14:31:30 <daviddavis> #startmeeting Pulp Triage 2020-04-21
14:31:30 <daviddavis> !start
14:31:30 <pulpbot> Meeting started Tue Apr 21 14:31:30 2020 UTC. The chair is daviddavis. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:31:30 <pulpbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:31:30 <pulpbot> The meeting name has been set to 'pulp_triage_2020-04-21'
14:31:30 <daviddavis> #info daviddavis has joined triage
14:31:30 <pulpbot> daviddavis: daviddavis has joined triage
14:31:38 <ttereshc> ah holiday
14:31:41 <ttereshc> #info ttereshc has joined triage
14:31:41 <ttereshc> !here
14:31:41 <pulpbot> ttereshc: ttereshc has joined triage
14:31:45 <dalley> #info dalley has joined triage
14:31:45 <dalley> !here
14:31:45 <pulpbot> dalley: dalley has joined triage
14:31:53 <x9c4> #info x9c4 has joined triage
14:31:53 <x9c4> !here
14:31:53 <pulpbot> x9c4: x9c4 has joined triage
14:32:14 <dkliban> #info dkliban has joined triage
14:32:14 <dkliban> !here
14:32:14 <pulpbot> dkliban: dkliban has joined triage
14:32:17 <daviddavis> !next
14:32:18 <pulpbot> daviddavis: 5 issues left to triage: 6534, 6533, 6521, 6520, 6463
14:32:18 <daviddavis> #topic https://pulp.plan.io/issues/6534
14:32:19 <pulpbot> RM 6534 - ttereshc - NEW - Having same content in one batch can cause issues in _post_save of ContentSaver
14:32:20 <pulpbot> https://pulp.plan.io/issues/6534
14:32:46 <bmbouter> #info bmbouter has joined triage
14:32:46 <bmbouter> !here
14:32:46 <pulpbot> bmbouter: bmbouter has joined triage
14:33:10 <bmbouter> I put this on the agenda for the pulpcore meeting
14:33:20 <dkliban> just accept?
14:33:25 <bmbouter> I think so
14:33:25 <ttereshc> skip?
14:33:27 <ttereshc> :)
14:33:28 <bmbouter> skip
14:33:30 <dkliban> ok
14:33:36 <daviddavis> !skip
14:33:37 <daviddavis> #topic https://pulp.plan.io/issues/6533
14:33:37 <pulpbot> daviddavis: 4 issues left to triage: 6533, 6521, 6520, 6463
14:33:38 <pulpbot> RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:33:39 <pulpbot> https://pulp.plan.io/issues/6533
14:33:57 <ggainey> #info ggainey has joined triage
14:33:57 <ggainey> !here
14:33:57 <pulpbot> ggainey: ggainey has joined triage
14:34:00 <dkliban> this is the issue i was experiencing on friday
14:34:05 <dkliban> we need to accept and add to sprint
14:34:12 <ttereshc> I think it might be the same issue with rq
14:34:24 <daviddavis> #idea Proposed for #6533: accept and add to sprint
14:34:24 <daviddavis> !propose other accept and add to sprint
14:34:24 <pulpbot> daviddavis: Proposed for #6533: accept and add to sprint
14:34:24 <ipanova> #info ipanova has joined triage
14:34:24 <ipanova> !here
14:34:25 <pulpbot> ipanova: ipanova has joined triage
14:34:33 <x9c4> +1
14:34:38 <ipanova> +1
14:34:47 <daviddavis> #agreed accept and add to sprint
14:34:47 <daviddavis> !accept
14:34:47 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:34:48 <daviddavis> #topic https://pulp.plan.io/issues/6521
14:34:48 <pulpbot> daviddavis: 3 issues left to triage: 6521, 6520, 6463
14:34:49 <pulpbot> RM 6521 - lmjachky - NEW - An internal server error is raised when creating a new content using repository_version instead of repository
14:34:50 <ttereshc> when I experienced the issue, I had 68 queues and some work was dispatched to the stalled/old workers
14:34:51 <pulpbot> https://pulp.plan.io/issues/6521
14:35:05 <ipanova> #idea Proposed for #6521: accept and add to sprint
14:35:05 <ipanova> !propose other accept and add to sprint
14:35:05 <pulpbot> ipanova: Proposed for #6521: accept and add to sprint
14:35:10 <dkliban> +1
14:35:20 <x9c4> +1
14:35:33 <ttereshc> +1
14:35:35 <dkliban> ttereshc: but did the tasks in your case have a 'waiting' state or 'running' state?
14:35:54 <ttereshc> first running, and after subsequent restarts waiting
14:36:02 <ttereshc> until full clean up
14:36:03 <dalley> I've seen that before
14:36:14 <daviddavis> interesting
14:36:16 <dalley> whenever that happens, for me, redis-cli FLUSHALL usually fixes it
14:36:31 <dalley> the subsequent restarts waiting bit I mean
14:36:31 <bmbouter> we have this bug from bin-li and the reproducer from david also
14:36:53 <bmbouter> if you restart redis, postgresql holds onto the task and it doesn't cancel correctly but it's lost from RQ
14:37:11 <bmbouter> maybe we chat more at open floor about it and skip for now
14:37:14 <daviddavis> ok
14:37:16 <dkliban> ok
14:37:18 <daviddavis> #agreed accept and add to sprint
14:37:18 <daviddavis> !accept
14:37:18 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:37:19 <pulpbot> daviddavis: 2 issues left to triage: 6520, 6463
14:37:19 <daviddavis> #topic https://pulp.plan.io/issues/6520
14:37:20 <pulpbot> RM 6520 - ipanova@redhat.com - POST - Regression: publishing an empty ISO repo no longer publishes PULP_MANIFEST
14:37:21 <pulpbot> https://pulp.plan.io/issues/6520
14:37:22 <bmbouter> I'm ok to add to sprint also but without a reproducer who can take
14:37:32 <bmbouter> at POST, accept?
14:37:46 <ipanova> yes, actually should be modified
14:37:57 <ipanova> will move
14:37:57 <daviddavis> +1
14:38:10 <daviddavis> can you accept it too?
14:38:14 <dkliban> this is confusing ... is this pulp 2?
14:38:15 <daviddavis> #idea Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:15 <daviddavis> !propose accept
14:38:15 <pulpbot> daviddavis: Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:18 <ipanova> it is pulp2
14:38:19 <daviddavis> dkliban: yes
14:38:21 <dkliban> ok
14:38:22 <daviddavis> it's tagged pulp 2
14:38:29 <dkliban> i see that now
14:38:31 <dkliban> thanks
14:38:33 <daviddavis> cool
14:38:39 <ipanova> done
14:38:45 <daviddavis> #agreed Leave the issue as-is, accepting its current state.
14:38:45 <daviddavis> !accept
14:38:45 <pulpbot> daviddavis: Current proposal accepted: Leave the issue as-is, accepting its current state.
14:38:46 <daviddavis> #topic https://pulp.plan.io/issues/6463
14:38:46 <pulpbot> daviddavis: 1 issues left to triage: 6463
14:38:47 <pulpbot> RM 6463 - binlinf0 - NEW - pulp 3.2.1 duplicate key error when sync
14:38:48 <pulpbot> https://pulp.plan.io/issues/6463
14:39:00 <dkliban> let's skip again
14:39:05 <daviddavis> !skip
14:39:06 <pulpbot> daviddavis: No issues to triage.
14:39:08 <bmbouter> agreed, we emailed asking for more info again today
14:39:23 <ipanova> dkliban: he replied on the list that re-creating repos fixes the issue
14:39:43 <dkliban> i know ... i think he must have created the repos originally when there was a bug related to this
14:40:00 <ipanova> can be
14:40:07 <dkliban> cause he's been using pulpcore since 3.0.0
14:40:21 <dkliban> since before then, but i suspect he did a rebuild at 3.0.0
14:40:50 <dkliban> i am going to try to find an issue related ... if there is one
14:41:02 <mikedep333> #info mikedep333 has joined triage
14:41:02 <mikedep333> !here
14:41:02 <pulpbot> mikedep333: mikedep333 has joined triage
14:41:52 <daviddavis> do we want to discuss https://pulp.plan.io/issues/6533 more?
14:42:44 <dkliban> yes
14:43:00 <daviddavis> !issue 6533
14:43:01 <daviddavis> #topic https://pulp.plan.io/issues/6533
14:43:01 <pulpbot> RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:43:02 <pulpbot> https://pulp.plan.io/issues/6533
14:43:16 <dkliban> i have been able to reproduce consistently by running the migration plan and then trying to sync a migrated repository
14:43:28 <dkliban> this is on EL7 and python 3.6.8
14:43:49 <bmbouter> if there is a reproducer then someone could take it
14:43:56 <ipanova> bmbouter: there is a reproducer
14:43:57 <dkliban> i'll post it on the issue
14:44:10 <ipanova> i have provided all the steps there
14:44:40 <dkliban> ipanova: oh yeah ... you have the exact same reproduction steps i had in mind
14:45:15 <ttereshc> I wonder why the migration plugin task triggers that issue :/
14:46:01 <ipanova> yeah
14:46:02 <daviddavis> yea strange
14:46:07 <daviddavis> so accept and add to sprint?
14:46:11 <dkliban> yes please
14:46:13 <x9c4> +1
14:46:15 <ipanova> +1
14:46:23 <dalley> +1
14:46:24 <ttereshc> +1 if someone has capacity to investigate that
14:46:26 <bmbouter> so here's more I want to share about it
14:46:27 <daviddavis> #idea Proposed for #6533: accept and add to sprint
14:46:27 <daviddavis> !propose other accept and add to sprint
14:46:27 <pulpbot> daviddavis: Proposed for #6533: accept and add to sprint
14:46:30 <daviddavis> #agreed accept and add to sprint
14:46:30 <daviddavis> !accept
14:46:30 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:46:31 <pulpbot> daviddavis: No issues to triage.
14:46:34 <bmbouter> and I'll write on the issue the same but here for more visibility
14:46:52 <bmbouter> we tried py-spy and it didn't yield results showing what the task is doing which is ... strange
14:47:10 <daviddavis> weird
14:47:25 <bmbouter> so I recommend a gdb coredump of the child process (the process that is forked from the RQ parent for each task)
14:47:37 <bmbouter> that will for sure show each thread and where in the python code the interpreter is "stuck"
14:47:53 <bmbouter> and we should take a few core dumps over a few seconds and compare them to see if it's "halted" or "looping in one area"
14:48:13 <bmbouter> we need insight into what it's doing to learn more
14:48:27 <bmbouter> I'll write this on the issue also along w/ some links and commands on how to do this
14:48:28 <daviddavis> cool, sounds like a plan
14:48:50 <x9c4> There is also a builtin async debug option in python.
14:49:20 <x9c4> To show if any coroutine blocks the processor too long or if futures are not awaited.
14:51:39 <daviddavis> interesting, did not know that
14:51:59 <daviddavis> we should write up a guide to debugging stuck tasks
14:52:08 <daviddavis> capture all this info
14:52:18 <x9c4> Put a note in the issue.
14:52:38 <daviddavis> +1
14:52:49 <daviddavis> alright, last call for triage
14:53:27 <bmbouter> daviddavis: agreed, if someone filed it I could write such docs
14:53:33 <daviddavis> ok, I'll do it
14:53:39 <daviddavis> x9c4: is this what you were talking about https://docs.python.org/3.8/library/asyncio-dev.html
14:53:48 <x9c4> Yes!
14:53:52 <daviddavis> awesome
14:53:59 <x9c4> i linked it in the issue.
14:54:04 <daviddavis> great
14:54:13 <daviddavis> #endmeeting
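
bmbouter's recommendation above is an external gdb coredump of the stuck RQ child process; he planned to post the exact commands on the issue, and those are not reproduced here. As a complementary, Python-level sketch only, the standard-library faulthandler module can also dump every thread's stack from inside the process. The choice of SIGUSR1 and the assumption that such a hook could be added to the worker's startup code are illustrative, not part of what was agreed in the meeting.

```python
# Sketch (assumption: this hook could be added to the worker process at startup).
# Once registered, `kill -USR1 <worker pid>` makes the process write every
# thread's current Python traceback to stderr without stopping it. Taking a
# few dumps several seconds apart shows whether the task is halted in one
# place or looping through the same code, mirroring the coredump-comparison idea.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1, all_threads=True)
```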
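The "builtin async debug option" x9c4 mentions is asyncio's debug mode, described on the asyncio-dev page linked above. A minimal sketch of turning it on follows; the `do_work` coroutine is a hypothetical placeholder standing in for the real task body. In debug mode asyncio warns about coroutines that were never awaited and logs any callback or task step that blocks the event loop for longer than `loop.slow_callback_duration` (0.1 s by default).

```python
# Minimal sketch of asyncio debug mode; `do_work` is a hypothetical placeholder.
import asyncio
import logging
import time

logging.basicConfig(level=logging.DEBUG)  # surface asyncio's debug-mode warnings

async def do_work():
    # time.sleep blocks the event loop; in debug mode asyncio logs a warning
    # like "Executing <Task ...> took 0.2 seconds" because it exceeds
    # loop.slow_callback_duration (default 0.1 s).
    time.sleep(0.2)

if __name__ == "__main__":
    # Same effect as setting PYTHONASYNCIODEBUG=1 or calling loop.set_debug(True)
    asyncio.run(do_work(), debug=True)
```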