On 8/6/19 2:51 AM, Martin Jansa wrote:
This is the same reproducer I am using in:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434
but with this SRCREV I haven't reproduced it yet in the first 500 iterations, so it's definitely improving for me (it used to reproduce at least once in the first 500 iterations)

Now I'm testing the reproducer with "qmake -install qinstall".

Any update, Martin?



Using a variation of Juro's script and adding a little stress-ng load,
it _seems_ that I can make the problem happen more quickly than without
system stress but it's a shared system so _seems_ is underlined.

Using stress-ng was supposed to be a quick check to see if I could
get the reproducer down to minutes rather than around an hour.

Results are promising so I'll continue to use this approach as
I add debugging to pseudo and add an inline, immediate check in
the context of:

http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-core/glibc/glibc-locale.inc?h=master#n72
to see if the UID/GID are equal to my UID/GID.
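A minimal sketch of what such an immediate check might look like, outside
of pseudo's own code. The function name find_host_owned and its arguments
are hypothetical. The host UID/GID are passed in explicitly because,
inside a pseudo context, id -u would be expected to report 0 rather than
the real build user:

```shell
#!/bin/bash
# Sketch only: list every file under a directory whose owner or group
# matches the given (host) UID or GID. find_host_owned is a hypothetical
# helper, not part of pseudo or OE-core.
find_host_owned() {
    local dir=$1
    local host_uid=$2
    local host_gid=$3
    # find accepts numeric IDs for both -user and -group.
    find "$dir" \( -user "$host_uid" -o -group "$host_gid" \) -print
}
```

Called as find_host_owned "$dir" "$host_uid" "$host_gid" from the pseudo
context, any output would point at the first files that leaked the
host ownership.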

Test run summaries are below.

../Randy



cat src/distro/yocto/b/uid-diff/glibc-locale-stress
#!/bin/bash

fname='glibc-locale_master_august13'
max=100
for (( i=1; i <= $max; i++ ))
do
    echo "$i/$max  ${fname}_$i.log"
    # Redirect stdout first so that stderr follows it to the same place.
    bitbake glibc-locale -c cleanall > /dev/null 2>&1
    # add some stress
    stress-ng -t 1000 --switch 8 --switch-freq 50000 &
    bitbake glibc-locale > "${fname}_$i.log" 2>&1
    # Destress
    killall -9 stress-ng
    if grep -q "host-user-contaminated" "${fname}_$i.log"; then
        echo "error !"
        exit 2
    #else
      #rm ${fname}_$i.log
    fi
done
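
The load start/stop step in the loop above could also be written so that
only the stress-ng instance started by the script is stopped, instead of
every stress-ng the user owns. This is a sketch: run_under_load is a
hypothetical helper, and it assumes the load generator stops its own
workers when its parent process receives SIGTERM.

```shell
#!/bin/bash
# Sketch only: run a command while a background load generator is active,
# then stop just that load generator by PID.
# $1 is the load command as one string (word-split on purpose);
# the remaining arguments are the command to run under load.
run_under_load() {
    local load_cmd=$1
    shift
    # Start the load generator in the background and remember its PID.
    $load_cmd &
    local load_pid=$!
    # Run the real workload and keep its exit status.
    "$@"
    local status=$?
    # Stop only the process we started, then reap it.
    kill "$load_pid" 2>/dev/null
    wait "$load_pid" 2>/dev/null
    return "$status"
}
```

For example:
run_under_load "stress-ng -t 1000 --switch 8 --switch-freq 50000" bitbake glibc-locale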


On a (shared) system where lscpu shows 128 cores
and no stress:

Trial   Iteration of first error
1       44
2       19


stress-ng -t 1000 --switch 8 --switch-freq 50000

50000 was just the frequency that generated a load that was high enough
but not too high. On this system, each process used ~30% of a CPU.

Trial   Iteration of first error
1       3
2       18


stress-ng -t 1000 --switch 16 --switch-freq 50000

Trial   Iteration of first error
1       3
2       1
3       11

stress-ng -t 1000 --switch 32 --switch-freq 50000

Trial   Iteration of first error
1       2
2       9
3       8


stress-ng -t 1000 --switch 64 --switch-freq 50000

Trial   Iteration of first error
1       4
2       13
3       >6


stress-ng -t 1000 --mq 64
 128 processes using 98% cpu each

Trial   Iteration of first error
1       14
2       NaN

Trial 2 was precluded by other users of the shared system complaining!
The idea was to cause more rapid context switches. Later, I might try
this again with say 16 workers. If anyone has a better idea, please
reply.
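
For a rough comparison of the runs above, the mean iteration at which the
error first appeared can be computed per configuration. This is only a
sketch with a hypothetical helper, mean_iter. With two or three trials per
configuration the means are indicative at best, and the ">6" and "NaN"
entries are left out because they are not completed trials.

```shell
#!/bin/bash
# Sketch only: print the mean of the arguments to one decimal place.
mean_iter() {
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ sum += $1 } END { if (NR) printf "%.1f\n", sum / NR }'
}

mean_iter 44 19     # no stress      -> 31.5
mean_iter 3 18      # --switch 8     -> 10.5
mean_iter 3 1 11    # --switch 16    -> 5.0
mean_iter 2 9 8     # --switch 32    -> 6.3
```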

EOM


Regards,

On Tue, Aug 6, 2019 at 12:43 AM Bystricky, Juro <juro.bystri...@intel.com> wrote:

    I can reproduce the problem fairly easily (and, sadly, even with the
    latest commits, as of 060058bb29f70b244e685b3c704eb0641b736f73).
    In my case, it seems easy to reproduce if I have 40+ threads running.
    The reproducer script (below) typically fails within the first 10
    iterations.


    #!/bin/bash

    fname='glibc-locale_master_august8'
    max=1000
    for (( i=1; i <= $max; i++ ))
    do
         echo "$i/$max  ${fname}_$i.log"
         bitbake glibc-locale -c cleanall > /dev/null 2>&1
         bitbake glibc-locale > "${fname}_$i.log" 2>&1
         if grep -q "host-user-contaminated" "${fname}_$i.log"; then
             echo "error !"
             exit 2
         #else
           #rm ${fname}_$i.log
         fi

    done

    ________________________________________
    From: openembedded-core-boun...@lists.openembedded.org
    [openembedded-core-boun...@lists.openembedded.org] on behalf
    of Seebs [se...@seebs.net]
    Sent: Saturday, August 03, 2019 7:23 AM
    To: Khem Raj
    Cc: openembedded-core@lists.openembedded.org
    Subject: Re: [OE-core] [PATCH v2] pseudo: Upgrade to latest to fix
    openat() with a directory symlink [NAK]

    On Sat, 3 Aug 2019 05:33:46 -0700
    Khem Raj <raj.k...@gmail.com> wrote:

     > Will this fix the file ownership issue that we see with Glibc-locale
     > packages from time to time?

    I have no idea. Since I haven't got a reliable reproducer for it, I
    can't test it in a sane way.

    -s
    --
    _______________________________________________
    Openembedded-core mailing list
    Openembedded-core@lists.openembedded.org
    <mailto:Openembedded-core@lists.openembedded.org>
    http://lists.openembedded.org/mailman/listinfo/openembedded-core




--
# Randy MacLeod
# Wind River Linux