Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Mathieu Desnoyers via lttng-dev
- On Jun 13, 2022, at 11:55 PM, Minlan Wang wangmin...@szsandstone.com 
wrote:

> Hi, Mathieu,

Hi Minlan,

Thanks for the detailed bug report. Can I ask more precisely which commit ID
of the userspace-rcu stable-2.12 branch you are using ? Typically a 
"userspace-rcu-latest-0.12.tar.bz2"
gets generated from a git tree at a given point in time, but it does not give
me enough details to know which commit it refers to.

Thanks,

Mathieu

>   We are running CentOS 8.2 on an Intel(R) Xeon(R) CPU E5-2630 v4,
> and are using the workqueue interfaces in src/workqueue.h from
> userspace-rcu-latest-0.12.tar.bz2.
> [...]
> Do you have any idea why this is happening, and how to fix it?

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Mathieu Desnoyers via lttng-dev
- On Jun 14, 2022, at 9:39 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Jun 13, 2022, at 11:55 PM, Minlan Wang wangmin...@szsandstone.com
> wrote:
> 
>> Hi, Mathieu,
> 
> Hi Minlan,
> 
> Thanks for the detailed bug report. Can I ask more precisely which commit ID
> of the userspace-rcu stable-2.12 branch you are using ? Typically a

I meant "stable-0.12" branch here.

Thanks,

Mathieu

> "userspace-rcu-latest-0.12.tar.bz2"
> gets generated from a git tree at a given point in time, but it does not give
> me enough details to know which commit it refers to.
> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Minlan Wang via lttng-dev
Hi, Mathieu,
We are running CentOS 8.2 on an Intel(R) Xeon(R) CPU E5-2630 v4,
and are using the workqueue interfaces in src/workqueue.h from
userspace-rcu-latest-0.12.tar.bz2.
Recently, we found that the workqueue thread drives CPU usage up to 99%.
After some debugging, we found that the futex in struct urcu_workqueue had
reached a very large negative value, e.g., -12484, while qlen, cbs_tail, and
cbs_head all suggest that the workqueue is empty.
We added a watchpoint on workqueue->futex in workqueue_thread(), and got the
following log when workqueue->futex first went down to -2:
...
Old value = -1
New value = 0
0x737c1d6d in futex_wake_up (futex=0x5f74aa40) at workqueue.c:160
160 in workqueue.c
#0  0x737c1d6d in futex_wake_up (futex=0x5f74aa40) at
workqueue.c:160
#1  0x737c2737 in wake_worker_thread (workqueue=0x5f74aa00) at
workqueue.c:324
#2  0x737c29fb in urcu_workqueue_queue_work (workqueue=0x5f74aa00,
work=0x66e05e00, func=0x77523c90 ) at
workqueue.c:367
#3  0x7752c520 in aio_complete_cb (ctx=,
iocb=, res=, res2=) at
bio/aio_bio_adapter.c:152
#4  0x7752c696 in poll_io_complete (arg=0x62e4f4a0) at
bio/aio_bio_adapter.c:289
#5  0x772e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
#6  0x7415d96d in clone () from /usr/lib64/libc.so.6
[Switching to Thread 0x7fffde3f3700 (LWP 821768)]
Hardware watchpoint 4: -location workqueue->futex

Old value = 0
New value = -1
0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
490 ../include/urcu/uatomic.h: No such file or directory.
#0  0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
#1  workqueue_thread (arg=0x5f74aa00) at workqueue.c:250
#2  0x772e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x7415d96d in clone () from /usr/lib64/libc.so.6
Hardware watchpoint 4: -location workqueue->futex

Old value = -1
New value = -2
0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
490 in ../include/urcu/uatomic.h
#0  0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
#1  workqueue_thread (arg=0x5f74aa00) at workqueue.c:250
#2  0x772e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x7415d96d in clone () from /usr/lib64/libc.so.6
Hardware watchpoint 4: -location workqueue->futex

Old value = -2
New value = -3
0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
490 in ../include/urcu/uatomic.h
#0  0x737c2473 in __uatomic_dec (len=4, addr=0x5f74aa40) at
../include/urcu/uatomic.h:490
#1  workqueue_thread (arg=0x5f74aa00) at workqueue.c:250
#2  0x772e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x7415d96d in clone () from /usr/lib64/libc.so.6
Hardware watchpoint 4: -location workqueue->futex
...

After this, things went wild: workqueue->futex kept reaching ever larger
negative values, and the workqueue thread ate up the CPU it was running on.
This ended only when workqueue->futex eventually wrapped around back to 0.
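
For reference, the wake-up protocol that workqueue.c appears to implement
looks roughly like this (a simplified sketch reconstructed from the
backtraces above, not the actual liburcu code; with a single worker the
futex should only ever alternate between 0 and -1):
---
#include <stdint.h>

static int32_t wq_futex; /* 0 = running/woken, -1 = worker about to sleep */

/* Worker side (workqueue_thread): queue found empty, announce sleep. */
static void worker_prepare_wait(void)
{
        __atomic_fetch_sub(&wq_futex, 1, __ATOMIC_SEQ_CST); /* 0 -> -1 */
        /* re-check the queue, then futex(FUTEX_WAIT) while the value is -1 */
}

/* Enqueue side (futex_wake_up): reset the flag and wake a sleeping worker. */
static void producer_wake(void)
{
        if (__atomic_load_n(&wq_futex, __ATOMIC_SEQ_CST) == -1) {
                __atomic_store_n(&wq_futex, 0, __ATOMIC_SEQ_CST); /* -1 -> 0 */
                /* futex(FUTEX_WAKE, 1) */
        }
}
---
Repeated decrements without an intervening reset, as in the log above, drive
the value to -2, -3, and so on; FUTEX_WAIT with an expected value of -1 then
returns immediately and the worker spins.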

Do you have any idea why this is happening, and how to fix it?

B.R
Minlan Wang



___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Mathieu Desnoyers via lttng-dev
- On Jun 13, 2022, at 11:55 PM, Minlan Wang wangmin...@szsandstone.com 
wrote:

> Hi, Mathieu,
>   We are running CentOS 8.2 on an Intel(R) Xeon(R) CPU E5-2630 v4,

Also, can you provide more information about which exact Linux kernel version
you are using ?

Thanks,

Mathieu

> and are using the workqueue interfaces in src/workqueue.h from
> userspace-rcu-latest-0.12.tar.bz2.
> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [lttng-tools] Removal of root_regression tests

2022-06-14 Thread Marcel Hamer via lttng-dev
Hello Jonathan,

On Mon, Jun 13, 2022 at 11:21:49AM -0400, Jonathan Rajotte-Julien wrote:
> 
> Hi Marcel,
> 
> - Original Message -
> > From: "Marcel Hamer via lttng-dev" 
> > To: "lttng-dev" 
> > Sent: Monday, 13 June, 2022 07:49:39
> > Subject: [lttng-dev] [lttng-tools] Removal of root_regression tests
> 
> > Hello,
> >
> > Since version v2.12.9 of lttng-tools the root_regression file has been
> > emptied to make the tests part of the 'make check' sequence instead.
> >
> > We were always actively using that test file as part of our regression
> > testing. In our case we are working in a cross-compilation environment,
> > where the run.sh script was used on target for testing, not at compile
> > time. It is not easy to run a 'make check' sequence on a target.
> 
> I would suggest that you take a look at how OpenEmbedded does it with ptest;
> AFAIK it matches your requirements:
> 
> https://github.com/openembedded/openembedded-core/blob/c7e2901eacf3dcbd0c5bb91d2cc1d467b4a9aaf7/meta/recipes-kernel/lttng/lttng-tools_2.13.7.bb#L75
> 

That is a very good suggestion. I guess we were a bit too focused on our
existing solution of using run.sh. We will look into this.

> >
> > It is now also a bit unclear which tests actually require root access and
> > which tests do not. I understood this was the reason the file was called
> > 'root_regression'?
> 
> Yes, when the test suites primarily used `prove` via run.sh.
> 
> We have been slowly moving away from it for a good while and now mostly use
> the Automake test harness as much as possible.
> 
> The worst that will happen if you run a test that requires root as a
> non-root user is that `skip` TAP output will be emitted.
> 
> >
> > Some questions that get raised because of this:
> >
> > - Is there now an alternative way to run regressions on target in case of a
> >  cross-compilation environment?
> 
> AFAIU, this is out of scope for the lttng project. Still, I would recommend
> that you see how yocto/oe does it with ptest.
> 
> > - Would there be a possibility to fill the 'root_regression' file again and
> >  possibly revert this change?
> 
> Feel free to do it out-of-tree. I doubt that we are the only project
> WindRiver handles that uses the Automake test harness and that does not
> provide an easy way to run on-target for cross-compilation testing.

Yes, you are right and that is a fair point. We will look into the ptest
solution.

> 
> A quick grep with "isroot" should get you 95% there.
> 
> > - How are tests now identified that require root access?
> 
> All tests that require root access test for it at runtime.
> 
> Something along:
> 
> regression/tools/streaming/test_high_throughput_limits:
> 
>     if [ "$(id -u)" == "0" ]; then
>         isroot=1
>     else
>         isroot=0
>     fi
> 
>     skip $isroot "Root access is needed to set bandwidth limits. Skipping all tests." $NUM_TESTS ||
>     {
>         ...
>         Tests are done here.
>     }
> 
> Cheers

Thanks for the tip on how to identify test cases that require root privileges.

Kind regards,

Marcel
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Mathieu Desnoyers via lttng-dev
- On Jun 13, 2022, at 11:55 PM, Minlan Wang wangmin...@szsandstone.com 
wrote:

> Hi, Mathieu,
>   We are running CentOS 8.2 on an Intel(R) Xeon(R) CPU E5-2630 v4,
> and are using the workqueue interfaces in src/workqueue.h from
> userspace-rcu-latest-0.12.tar.bz2.

Also, I notice that you appear to be using an internal liburcu API (not public)
from outside of the liburcu project, which is not really expected.

If your process forks without exec, make sure you wire up the equivalent of
rculfhash pthread_atfork functions which call urcu_workqueue_pause_worker(),
urcu_workqueue_resume_worker() and urcu_workqueue_create_worker().
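
Something along these lines (an untested sketch; my_wq is a placeholder for
your workqueue pointer, and the handlers must be registered once before any
fork):
---
#include <pthread.h>
#include "workqueue.h" /* liburcu internal header, as used in your code */

static struct urcu_workqueue *my_wq;

static void atfork_prepare(void) { urcu_workqueue_pause_worker(my_wq); }
static void atfork_parent(void) { urcu_workqueue_resume_worker(my_wq); }
static void atfork_child(void) { urcu_workqueue_create_worker(my_wq); }

static void wire_up_workqueue_atfork(void)
{
        pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}
---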

Also, can you validate if you have many workqueue worker threads trying to
dequeue from the same workqueue in parallel ? This is unsupported and would
cause the kind of issues you are observing here.

Thanks,

Mathieu

> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

2022-06-14 Thread Minlan Wang via lttng-dev
Hi, Mathieu,
The commit on branch stable-0.12 corresponding to the tarball we downloaded
is this:

commit d5277e807192178ddb79f56ecbbd5ac3c4994f60 (HEAD -> v0.12.1.b, tag: v0.12.1)
Author: Mathieu Desnoyers 
Date:   Wed Apr 22 08:51:41 2020 -0400

Version 0.12.1

Signed-off-by: Mathieu Desnoyers 

The OS we are using is CentOS Linux release 7.9.2009 (Core), not CentOS 8.2
as mentioned before. The kernel version is 3.10.0-1160.el7.x86_64.

On Tue, Jun 14, 2022 at 11:53:16AM -0400, Mathieu Desnoyers wrote:
> Also, I notice that you appear to be using an internal liburcu API (not
> public) from outside of the liburcu project, which is not really expected.
We are trying to move some Linux kernel module functionality into userspace,
and found that the internal urcu workqueue.h has everything we need as a
replacement for the kernel workqueue, so we decided to give it a try.

> 
> If your process forks without exec, make sure you wire up the equivalent of
> rculfhash pthread_atfork functions which call urcu_workqueue_pause_worker(),
> urcu_workqueue_resume_worker() and urcu_workqueue_create_worker().
There is no fork/exec in the process that calls alloc_workqueue, and the
threads that enqueue work into the workqueue are created with pthread_create.

> 
> Also, can you validate if you have many workqueue worker threads trying to
> dequeue from the same workqueue in parallel ? This is unsupported and would
> cause the kind of issues you are observing here.
The workqueue thread is created by calling urcu_workqueue_create in the code
below, and it is the only thread that dequeues work from the workqueue.
There are, however, multiple threads that enqueue work by calling
urcu_workqueue_queue_work(wq, work, work->func).
---
static void workqueue_init_fn(struct urcu_workqueue *workqueue, void *priv)
{
        pthread_t tid;
        const char *name;
        char thread_name[16] = {0};

        if (!priv)
                return;

        name = (const char *)priv;
        tid = pthread_self();

        memcpy(thread_name, name, 15);
        if (pthread_setname_np(tid, thread_name)) {
                pr_err("failed to set thread name for workqueue %s\n", name);
        }

        urcu_memb_register_thread();
}

static void workqueue_finalize_fn(struct urcu_workqueue *workqueue, void *priv)
{
        urcu_memb_unregister_thread();
        if (priv)
                free(priv);
}

struct workqueue_struct *alloc_workqueue(const char *fmt,
                                         unsigned int flags,
                                         int max_active, ...)
{
        const char *name;

        name = strdup(fmt);
        if (!name) {
                pr_err("failed to dup name for workqueue %s\n", fmt);
                return NULL;
        }

        return urcu_workqueue_create(0, -1, (void *)name,
                                     NULL,                  /* grace */
                                     workqueue_init_fn,     /* init */
                                     workqueue_finalize_fn, /* finalize */
                                     NULL,                  /* before wait */
                                     NULL,                  /* after wake up */
                                     NULL,                  /* before pause */
                                     NULL);                 /* after resume */
}
---
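
For completeness, the enqueue side of our shim looks roughly like this (a
simplified sketch; struct work_struct with an embedded struct urcu_work is an
approximation of our adapter code, not something from liburcu):
---
#include "workqueue.h" /* liburcu internal header */

struct work_struct {
        struct urcu_work urcu_work;           /* embedded liburcu work item */
        void (*func)(struct urcu_work *work); /* callback run by the worker */
};

static void queue_work(struct urcu_workqueue *wq, struct work_struct *work)
{
        /* Many producer threads may call this concurrently; only the single
         * worker created by urcu_workqueue_create() dequeues. */
        urcu_workqueue_queue_work(wq, &work->urcu_work, work->func);
}
---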

B.R
Minlan


___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev