14:30:34 #startmeeting Pulp Triage 2016-09-16
14:30:34 #info asmacdo has joined triage
14:30:35 Meeting started Fri Sep 16 14:30:34 2016 UTC and is due to finish in 60 minutes. The chair is asmacdo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:30:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:30:35 The meeting name has been set to 'pulp_triage_2016_09_16'
14:30:35 asmacdo has joined triage
14:30:38 !here
14:30:38 #info jcline has joined triage
14:30:39 jcline has joined triage
14:30:49 !here
14:30:49 #info pcreech has joined triage
14:30:49 pcreech has joined triage
14:30:56 !here
14:30:56 #info dkliban has joined triage
14:30:57 dkliban has joined triage
14:31:13 !here
14:31:13 #info mhrivnak has joined triage
14:31:14 mhrivnak has joined triage
14:31:16 !here
14:31:16 #info ttereshc has joined triage
14:31:17 ttereshc has joined triage
14:31:39 !next
14:31:40 9 issues left to triage: 2248, 2252, 2253, 2254, 2255, 2256, 2262, 2263, 2264
14:31:41 #topic metadata file copy results in error 'Content import of FILENAME failed - must be an existing file' - http://pulp.plan.io/issues/2248
14:31:41 RPM Support Issue #2248 [NEW] (unassigned) - Priority: Normal | Severity: High | Severity: High
14:31:42 metadata file copy results in error 'Content import of FILENAME failed - must be an existing file' - http://pulp.plan.io/issues/2248
14:31:52 did anyone talk with jsherrill about this?
14:32:19 !here
14:32:19 #info preethi has joined triage
14:32:19 preethi has joined triage
14:32:23 reading
14:32:41 !here
14:32:41 #info ipanova has joined triage
14:32:42 ipanova has joined triage
14:33:11 !here
14:33:11 #info bmbouter has joined triage
14:33:12 bmbouter has joined triage
14:33:47 asmacdo: i had opened it to track the issue, i'm not confident that ipanova's reproducer steps are what the user followed, as they said they had successfully done the copy after step 7, but later on saw the issue
14:34:05 which lead me to believe there maybe is some outstanding bug
14:34:12 but its all just conjecture at this point
14:34:30 !here
14:34:30 #info fdobrovo has joined triage
14:34:30 fdobrovo has joined triage
14:35:07 hmm. well we have the ipanova reproducer, even if it wasn't what the user saw. i am thinking we accept it and expand it if we can nail down the user's case?
14:35:08 jsherrill: depends which units have been copied
14:35:43 !propose accept
14:35:43 #idea Proposed for #2248: Leave the issue as-is, accepting its current state.
14:35:44 Proposed for #2248: Leave the issue as-is, accepting its current state.
14:35:57 (it is normal/high)
14:36:00 jsherrill: if these were units that were added after upgrade, then copy would succeed, or units that before the upgrade have not been copied, but copied after upgrade
14:36:14 ipanova: its a complete copy every time
14:36:25 if we're copying repo a to repo b
14:36:28 we clear out everything in repo b
14:36:32 and then copy everything from a to b
14:37:08 jsherrill: it would be good if you would fine more details from the user
14:37:15 s/fine/find
14:37:32 ipanova: which details exactly?
14:37:51 he's told me what he's done and a timeline of that
14:38:01 It seems like we know that the user can get in a state where the file has been deleted from the FS, right? We did a bug fix to prevent that from happening, but are we just missing a way to deal with the mess that may have been created pre-bugfix?
14:38:25 mhrivnak: what i'm saying is, based on what the user reported, it happened again post-bugfix
14:38:46 It seems worthy of a careful audit
14:38:56 jsherrill, I see. I suspect the file was already missing, and they just triggered a symptom of that post-upgrade.
14:39:02 jsherrill: so far i do not see any other way how this could happend - just in those exact steps i provided
14:39:19 right, and the user may be mis-remembering something, thats always possible
14:39:23 i'm just reporting what they said
14:39:28 jsherrill: exactly
14:39:55 they were pretty confident in their claims though
14:39:59 of the timing of things
14:40:19 i am convinced that we could make this high/high to at least investigate
14:41:03 mhrivnak: the only reason i suspect this is not the case, is that they claimed the same operation worked fine after the upgrade to 2.8.7
14:41:14 jsherrill, gotcha
14:41:28 We can't just close bugs because we're pretty sure it's probably fixed, can we? We should at least look into it, right?
14:41:51 jsherrill: maybe it was a newly created repo with feed, then copy would work fine after upgrade
14:42:05 jcline, definitely. I don't think anyone is suggesting we close it.
14:42:49 current proposal is normal/high. does anyone think it needs to be high/high or can we move on?
14:42:55 !propose triage high medium
14:42:55 #idea Proposed for #2248: Priority: High, Severity: Medium
14:42:55 Proposed for #2248: Priority: High, Severity: Medium
14:43:16 medium severity since there's a workaround that probably works in most cases.
14:43:21 but I could be talked into high severity.
14:43:38 but am thinking about severity inflation. :)
14:43:43 I'm fine with that. ipanova?
14:43:44 mhrivnak: please read the last paragraph of my comment on the issue
14:44:21 ipanova, I see. At a minimum, making recovery steps more accessible to users could be valuable.
14:44:28 agree
14:44:39 either just on this issue, or in a release note, troubleshooting docs, etc.
14:44:49 +1
14:45:10 !accept
14:45:10 #agreed Priority: High, Severity: Medium
14:45:10 Current proposal accepted: Priority: High, Severity: Medium
14:45:12 #topic Repo sync with "epoch" related error message. - http://pulp.plan.io/issues/2252
14:45:12 8 issues left to triage: 2252, 2253, 2254, 2255, 2256, 2262, 2263, 2264
14:45:12 RPM Support Issue #2252 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
14:45:13 Repo sync with "epoch" related error message. - http://pulp.plan.io/issues/2252
14:46:09 This one is pretty crazy. As bmbouter pointed out, it seems like they must have bad data in the DB.
14:46:10 I think this is another data quality issue (as I wrote on it)
14:46:22 seems like that
14:46:37 This user was running manual mongo update commands in the mongo shell trying to solve other issues.
14:46:44 So it's at least possible that something went wrong there.
14:46:44 shall we defer that till next triage and have some feedback?
14:47:03 the commands he ran were given by ipanova and I
14:47:15 and he has a backup
14:47:21 yeah, those commands had nothing to do with epoch deletion field
14:47:34 we can defer or accept it I'm ok with either
14:47:48 bmbouter, indeed. I saw them and they looked totally sane. But it's still possible he had a copy-paste error, tried some of his own commands, or who knows.
14:47:49 some info from him would be good and a script can probably be written
14:48:06 ipanova++
14:48:06 ipanova's karma is now 23
14:48:08 !propose needinfo
14:48:08 #idea Proposed for #2252: This issue cannot be triaged without more info.
14:48:08 #info smyers has joined triage
14:48:09 smyers has joined triage
14:48:10 Proposed for #2252: This issue cannot be triaged without more info.
14:48:35 +1
14:48:41 works for me.
14:48:44 +1
14:48:47 +1
14:48:48 !accept
14:48:48 #agreed This issue cannot be triaged without more info.
14:48:49 Current proposal accepted: This issue cannot be triaged without more info.
14:48:50 #topic pulp status call hangs if qpid is unresponsive - http://pulp.plan.io/issues/2253
14:48:50 7 issues left to triage: 2253, 2254, 2255, 2256, 2262, 2263, 2264
14:48:51 Pulp Issue #2253 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
14:48:52 pulp status call hangs if qpid is unresponsive - http://pulp.plan.io/issues/2253
14:49:17 this seems like a high/high to me
14:49:22 agreed
14:49:36 propose triage high high
14:49:39 !propose triage high high
14:49:39 #idea Proposed for #2253: Priority: High, Severity: High
14:49:40 Proposed for #2253: Priority: High, Severity: High
14:49:48 Why high severity?
14:50:04 cause katello relies on that API
14:50:35 Yeah this seems important
14:50:54 +1
14:50:59 This bug only happens if qpid is already in a bad state. I agree this is quite important, but it comes down to a failure to report other bad state.
14:51:03 qpid becomes unresponsive on its own, aiui, and there's no workaround for fixing it unless the pulp admin is able to also control qpid, which isn't guaranteed.
14:51:07 Presumably you could also kill the whole system by locking up worker threads, right?
14:51:08 Pulp should handle it well.
14:52:04 mhrivnak, are you satisfied?
14:53:03 I'd go with high/medium personally, because fixing this won't change the fact that qpid is broken in that circumstance and everything's borked, but I'm ok with any severity.
14:53:20 This hasn't got anything to do with qpid
14:53:26 It's about how pulp handles a broken qpid
14:53:49 It shouldn't crap its pants entirely when the broker's...er...broken
14:54:00 from a user perspective of katello, we hit that status api quite a bit before performing an action, so from a user perspective if you want them to try to perform and action and immediate get an error about 'qpid is down' vs the entire system hanging
14:54:00 It's only the status API we're talking about, right?
14:54:00 lol smyers
14:54:15 jsherrill++
14:54:15 jsherrill's karma is now 1
14:54:26 only 1? jsherrill needs more love
14:54:43 !accept
14:54:43 #agreed Priority: High, Severity: High
14:54:43 Current proposal accepted: Priority: High, Severity: High
14:54:43 maybe i can transfer some from other bots
14:54:44 #topic Distributor options are not documented - http://pulp.plan.io/issues/2254
14:54:44 6 issues left to triage: 2254, 2255, 2256, 2262, 2263, 2264
14:54:45 OSTree Support Issue #2254 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
14:54:46 Distributor options are not documented - http://pulp.plan.io/issues/2254
14:54:47 jsherrill++
14:54:47 jsherrill's karma is now 2
14:54:49 You make a request to the status API and it hangs instead of reporting failure?
14:54:54 jsherrill++
14:54:54 jsherrill's karma is now 3
14:54:55 mhrivnak, if you can lock a thread making a call, imagine what happens when you make lots of calls. You could bring the entire REST API down.
14:54:57 let's share love
14:55:01 oh sorry mhrivnak would you like to go back?
14:55:02 mhrivnak: thats what the issue was about
14:55:04 jsherrill++
14:55:04 jsherrill's karma is now 4
14:55:22 gotcha, ok. we can move on.
14:55:46 jcline, that's a great point. thanks.
14:56:35 lets accept as is
14:56:41 !propose accept
14:56:41 #idea Proposed for #2254: Leave the issue as-is, accepting its current state.
14:56:41 Proposed for #2254: Leave the issue as-is, accepting its current state.
14:57:06 +1
14:57:08 +1
14:57:10 +1
14:57:16 +1
14:57:24 !accept
14:57:24 #agreed Leave the issue as-is, accepting its current state.
14:57:25 Current proposal accepted: Leave the issue as-is, accepting its current state.
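
A minimal sketch of the behaviour asked for in the #2253 discussion: a status check that reports failure instead of hanging when the broker does not answer. This is not Pulp's status code. `probe_with_timeout` and the `probe` callable are hypothetical names, and `probe` stands in for whatever call actually contacts qpid.

```python
import threading
import time


def probe_with_timeout(probe, timeout):
    """Run probe() in a daemon thread and report failure rather than block forever.

    `probe` is a hypothetical zero-argument callable that contacts the broker.
    """
    result = {}

    def target():
        try:
            probe()
            result["connected"] = True
        except Exception as exc:  # e.g. connection refused
            result["connected"] = False
            result["error"] = str(exc)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The probe is still blocked; answer the caller anyway.
        return {"connected": False, "error": "timed out after %ss" % timeout}
    return dict(result)


if __name__ == "__main__":
    # A probe that blocks longer than the allowed timeout.
    print(probe_with_timeout(lambda: time.sleep(0.5), 0.05))
```

The blocked thread is abandoned rather than stopped, so under many requests threads could still pile up, which is the concern raised about locking up the REST API. A timeout on the broker connection itself would likely be the sturdier fix; the sketch only shows the status call returning promptly.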
14:57:25 #topic Distributor behaviour is not documented - http://pulp.plan.io/issues/2255
14:57:26 5 issues left to triage: 2255, 2256, 2262, 2263, 2264
14:57:27 OSTree Support Issue #2255 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
14:57:28 Distributor behaviour is not documented - http://pulp.plan.io/issues/2255
14:58:13 !propose accept
14:58:13 #idea Proposed for #2255: Leave the issue as-is, accepting its current state.
14:58:14 Proposed for #2255: Leave the issue as-is, accepting its current state.
14:58:18 +1
14:58:18 +1
14:58:20 +1
14:58:24 +1
14:58:50 !accept
14:58:50 #agreed Leave the issue as-is, accepting its current state.
14:58:50 Current proposal accepted: Leave the issue as-is, accepting its current state.
14:58:51 4 issues left to triage: 2256, 2262, 2263, 2264
14:58:52 #topic Document cancel_publish_repo for distributors - http://pulp.plan.io/issues/2256
14:58:52 Pulp Issue #2256 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
14:58:53 Document cancel_publish_repo for distributors - http://pulp.plan.io/issues/2256
14:59:07 !propose accept
14:59:07 #idea Proposed for #2256: Leave the issue as-is, accepting its current state.
14:59:08 Proposed for #2256: Leave the issue as-is, accepting its current state.
14:59:11 +1
14:59:14 +1
14:59:30 btw, we use to make a day for docs bug
14:59:39 shall we do that again one day?
14:59:49 we should
14:59:53 ipanova++
14:59:53 ipanova's karma is now 24
15:00:00 ipanova++
15:00:00 ipanova's karma is now 25
15:00:01 ipanova, certainly for 3.0 we should if not sooner.
15:00:03 we have a lot of content problems
15:00:29 +1
15:00:39 mhrivnak, i think we should have a docs day for the 2.y line before we ditch it
15:01:04 ack
15:01:04 !accept
15:01:04 #agreed Leave the issue as-is, accepting its current state.
15:01:05 Current proposal accepted: Leave the issue as-is, accepting its current state.
15:01:06 #topic Rewrite all shebangs to explicitly call python3 in 3.0-dev - http://pulp.plan.io/issues/2262
15:01:06 3 issues left to triage: 2262, 2263, 2264
15:01:07 RPM Support Issue #2262 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
15:01:08 Rewrite all shebangs to explicitly call python3 in 3.0-dev - http://pulp.plan.io/issues/2262
15:01:27 !propose other switch to task
15:01:27 #idea Proposed for #2262: switch to task
15:01:28 Proposed for #2262: switch to task
15:01:35 +1
15:01:47 +1
15:01:52 +1
15:02:05 Woops
15:02:08 prio normal medium?
15:02:09 I thought it was a task :D
15:02:17 ++11
15:02:23 er...
15:02:24 +1
15:02:37 ppccreeeccchh++++
15:02:38 I don'toh god and it's in rpm support
15:02:41 was I drunk?
15:02:46 oh man
15:03:10 !accept
15:03:10 #agreed switch to task
15:03:11 Current proposal accepted: switch to task
15:03:12 #topic Fast-forward yum publish skips units if previous publish was cancelled - http://pulp.plan.io/issues/2263
15:03:12 2 issues left to triage: 2263, 2264
15:03:13 RPM Support Issue #2263 [NEW] (unassigned) - Priority: Normal | Severity: High | Severity: High
15:03:14 Fast-forward yum publish skips units if previous publish was cancelled - http://pulp.plan.io/issues/2263
15:03:49 !propose triage high high
15:03:49 #idea Proposed for #2263: Priority: High, Severity: High
15:03:49 is this fixable without transactions?
15:03:49 Proposed for #2263: Priority: High, Severity: High
15:03:58 yes, it should be.
15:04:12 +1 for high/high
15:04:31 It appears the last_published attribute is getting updated in a case where it should not. It should be easy to just not update it.
15:05:09 !accept
15:05:09 #agreed Priority: High, Severity: High
15:05:09 Current proposal accepted: Priority: High, Severity: High
15:05:10 1 issues left to triage: 2264
15:05:11 #topic better error reporting during import/association - http://pulp.plan.io/issues/2264
15:05:11 Pulp Issue #2264 [NEW] (unassigned) - Priority: Normal | Severity: Medium | Severity: Medium
15:05:12 yes, it should be at easy change
15:05:12 better error reporting during import/association - http://pulp.plan.io/issues/2264
15:05:16 s/at/an
15:05:41 oooo a PR. That's always nice to see.
15:05:48 yeah it is
15:05:48 we can accept as is. he submitted a PR for this issue, but it still needs to have unit tests fixed
15:05:53 !propose other switch to story
15:05:53 #idea Proposed for #2264: switch to story
15:05:54 Proposed for #2264: switch to story
15:06:00 asmacdo: though i would not accept that for high/high, because there is force full flag
15:06:31 k ipanova we can go back to 2263 next
15:06:33 it does change a significant thing though in that 1 line
15:06:50 we catch Exception and raise PulpCodedException
15:07:32 so maybe now is an ok time to do that
15:07:50 we do this all over the codebase
15:08:09 I think this can stay as a bug if we're ok thinking of it as "pulp hides a perfectly good error message from the user"
15:08:21 that sounds good to me
15:08:30 this one does seem to swallow useful errors more often than other PulpCodedExceptions
15:08:37 I agree
15:08:41 !propsoe accept
15:08:41 Error: "propsoe" is not a valid command.
15:08:42 ok let's do this
15:08:48 !proppse accept
15:08:48 Error: "proppse" is not a valid command.
15:08:53 yes
15:08:54 so yes
15:08:55 also, it raises a PulpExecutionException, which does not appear ot be a PulpCodedException.
15:08:56 !propose accept
15:08:56 #idea Proposed for #2264: Leave the issue as-is, accepting its current state.
15:08:57 Proposed for #2264: Leave the issue as-is, accepting its current state.
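
A minimal sketch of the concern raised for #2264, that catching `Exception` and raising a coded exception can hide a perfectly good error message. It shows one way to wrap an error while keeping the original text. `CodedError`, `import_unit` and the `IMPORT_FAILED` code are hypothetical stand-ins, not Pulp's `PulpCodedException` or its importer API, and the `raise ... from` form assumes Python 3.

```python
class CodedError(Exception):
    """Hypothetical stand-in for a coded exception; not Pulp's real class."""

    def __init__(self, code, details):
        super().__init__("%s: %s" % (code, details))
        self.code = code
        self.details = details


def import_unit(importer, unit):
    """Call importer(unit), wrapping any failure without discarding its message."""
    try:
        return importer(unit)
    except Exception as exc:
        # Carry the original text into the coded error and chain the cause,
        # so both the error code and the underlying message reach the user.
        raise CodedError("IMPORT_FAILED", str(exc)) from exc


if __name__ == "__main__":
    def failing_importer(unit):
        raise ValueError("must be an existing file")

    try:
        import_unit(failing_importer, "FILENAME")
    except CodedError as err:
        print(err)
        print(repr(err.__cause__))
```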
15:09:02 oh gosh
15:09:12 well then let's definitly do this
15:09:16 +1
15:09:25 !accept
15:09:25 #agreed Leave the issue as-is, accepting its current state.
15:09:26 Current proposal accepted: Leave the issue as-is, accepting its current state.
15:09:27 No issues to triage.
15:09:29 and assign to jluza_
15:09:36 !issue 2263
15:09:36 #topic Fast-forward yum publish skips units if previous publish was cancelled - http://pulp.plan.io/issues/2263
15:09:37 RPM Support Issue #2263 [NEW] (unassigned) - Priority: High | Severity: High | Severity: High
15:09:37 :)
15:09:37 Fast-forward yum publish skips units if previous publish was cancelled - http://pulp.plan.io/issues/2263
15:09:56 so talking about how no-op works, from logical point is correct, because since last publish there were no units added/removed, no config change. so that's why publish says- nothing changed, and that's correct from logical point of view. if you canceled it, you just need to keep in mind to make a force publish
15:10:05 ichimonji10: I've been trying to validate the changes of the python3 branch and I am having connections issues, I will create a sample job to test that on jenkins I think that will be more productive
15:10:16 this is the forth time it fails :(
15:11:17 ipanova, if it was cancelled you can't say that last_publish occurred at that time
15:11:19 Just thinking about the last_publish attribute, it seems reasonable for it to only represent successful publishes.
15:11:22 Yeesh. :(
15:11:24 because it was cancelled)
15:11:34 i am with mhrivnak on this one
15:11:35 If a publish was not successful, then the published data did not change.
15:11:51 what if a publish was partially successful?
15:11:59 asmacdo: there is no such thing
15:12:08 oh hey, thats good
15:12:10 agreed, no such thing.
15:12:31 are sure we do not set the finish_time if publish failed?
15:12:32 indeed. :)
15:12:41 if - yes, then, i agree with mhrivnak
15:13:00 ipanova, finish_time as in the TaskStatus field?
15:13:31 i think we take the finish_time from task and put it into last_published
15:13:36 but i am not 100% sure
15:13:58 let's investigate that at least but with lower severity
15:14:21 i can be on board with lower severity than high. the workaround is to force a full publish
15:14:21 I think setting finish_time is appropriate, because it can be evaluated within the context of the "state" field.
15:15:33 !propose triage high medium
15:15:33 #idea Proposed for #2263: Priority: High, Severity: Medium
15:15:34 Proposed for #2263: Priority: High, Severity: Medium
15:15:41 mhrivnak: i mean we're taking finished time from task and inject it to the last_publish no matter task was ok or failed. but i might be wrong
15:15:42 +1
15:15:43 one quick point...
15:16:01 If a user hits this bug, they likely won't realize it.
15:16:13 there will be missing packages from their published repo.
15:16:26 But unless they're paying close attention, they'll have no idea.
15:16:47 good point
15:17:04 ipanova, you may be right on that. And if that's what pulp is doing, we should change it to only do it if the state was successful.
15:17:04 mhrivnak: sure, but that's why we write release notes
15:17:24 mhrivnak: yes
15:17:25 agreed
15:17:56 !propose triage high high
15:17:56 #idea Proposed for #2263: Priority: High, Severity: High
15:17:57 Proposed for #2263: Priority: High, Severity: High
15:18:00 I just don't think release notes are much help when we're talking about missing packages, which are likely to be updates.
15:18:13 +11
15:18:16 +1
15:18:48 ok
15:18:50 thinking you just published a critical errata and then updated your machines, when actually something was missing without any error message, is potentially dangerous.
15:19:27 maybe will be worth to leave a note about checking if we update last_publish only with successful one
15:19:41 definitely.
15:19:46 !accept
15:19:46 #agreed Priority: High, Severity: High
15:19:47 Current proposal accepted: Priority: High, Severity: High
15:19:48 No issues to triage.
15:19:52 ipanova, could you leave that note?
15:19:56 !end
15:19:56 #endmeeting
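
A minimal sketch of the fix suggested during the #2263 discussion: copy the task's finish time into `last_publish` only when the publish task succeeded, so a cancelled or failed publish does not make the next fast-forward publish believe nothing changed. This is not Pulp's code. The plain dict standing in for the distributor record, the `record_publish` helper, and the state name `"finished"` are assumptions for illustration.

```python
from datetime import datetime, timezone

# Assumed name for the successful terminal task state.
FINISHED = "finished"


def record_publish(distributor, task_state, finish_time):
    """Update last_publish only when the publish task succeeded.

    `distributor` is a plain dict standing in for the distributor record.
    """
    if task_state == FINISHED:
        distributor["last_publish"] = finish_time
    return distributor


if __name__ == "__main__":
    first = datetime(2016, 9, 16, 15, 0, tzinfo=timezone.utc)
    second = datetime(2016, 9, 16, 16, 0, tzinfo=timezone.utc)

    dist = {"last_publish": None}
    record_publish(dist, "finished", first)
    # A cancelled publish leaves the timestamp of the last successful one.
    record_publish(dist, "canceled", second)
    print(dist["last_publish"] == first)
```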