Re: Unusual threading behavior on single processes

2020-03-28 Thread Otto Moerbeek
On Fri, Mar 27, 2020 at 09:03:40PM +, Stefmorino wrote:

> I have a question about a performance quirk on OpenBSD, but I'm not really sure
> how to address it, or what the root cause even is; that being how multithreaded
> applications (libpthread?) behave (notably, games).
> 
> I have tested many applications, the behavior is the same in all of them, but
> I'll talk about OpenMW (an open-source game engine for morrowind) since I have
> the most useful information about how this program is threaded. By default,
> OpenMW uses 4 threads (cited here:
> https://openmw.readthedocs.io/en/stable/reference/modding/settings/cells.html),
> one for main/generic processing, one for graphics, one for audio, and one for
> preloading terrain. You can see this if you look at the thread usage under top
> while running the game; however, this is exactly where my question comes into
> play. Instead of each thread processing the game independently with their own
> limits, each thread is "capped" to the total limit of one thread (i.e. instead
> of openmw's process using 100% of 4 threads, or 400% cpu in top, the
> process uses 25% across 4 threads, or 100% cpu in top). I tested this using
> GENERIC instead of GENERIC.MP as well, and get identical performance on the one
> thread; it's almost like pthreads is acting as a placeholder of sorts and not
> actually improving performance where it should.
> 
> Is it a lock (spin is at 0)? A placeholder? A limitation of how Ryzen SMP is
> implemented?

Hard to tell, no idea what that game engine does.  But this is not a
general problem, e.g. the malloc_duel regress test
(/usr/src/regress/lib/libpthread/malloc_duel) goes well over 100% here.
I see > 100% as well with other multi-threaded programs.

32013 otto   60    0 6020K 1552K onproc/3  - 1:07 228.81% malloc_due

Wild guess: it could be that your program actually does not do real
threading, but userland threading. Check with top -H if it really
creates threads.  You should see multiple threads having the same PID.
Or all threads are using a resource that cannot be shared.
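
One way to make the per-process total explicit is to add up the per-thread CPU figures for a single PID. What follows is only a minimal sketch: it assumes input lines of the form `PID TID %CPU` (for instance copied from the corresponding columns of the `top -H` display; the exact column layout is an assumption), and `sum_thread_cpu` is a hypothetical helper name.

```shell
#!/bin/sh
# sum_thread_cpu PID: read "PID TID %CPU" lines on stdin and print how many
# threads belong to PID and their summed %CPU (hypothetical helper).
sum_thread_cpu() {
    awk -v pid="$1" '
        $1 == pid { total += $3; n++ }   # only rows for the requested PID
        END { printf "%d threads, %.1f%% total\n", n, total }
    '
}
```

A total that stays near 100% with several busy threads would suggest the threads are taking turns (for example on a shared resource) rather than running in parallel, whereas a figure like the 228.81% shown for malloc_duel indicates real concurrency.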

-Otto
> 
> I'd be happy to do any additional testing, I have a fresh -current source tree
> ready
> 
> dmesg
> OpenBSD 6.6-current (GENERIC.MP) #75: Tue Mar 24 12:56:37 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16603250688 (15834MB)
> avail mem = 16087437312 (15342MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0x986ec000 (62 entries)
> bios0: vendor LENOVO version "R0UET76W (1.56 )" date 11/05/2019
> bios0: LENOVO 20KVCTO1WW
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT CRAT CDIT UEFI MSDM BATB HPET APIC MCFG SBST WSMT IVRS FPDT SSDT SSDT SSDT UEFI SSDT
> acpi0: wakeup devices GPP0(S3) GPP1(S3) GPP2(S3) GPP3(S3) GPP4(S3) GPP5(S3) GPP6(S3) GP17(S3) XHC0(S3) XHC1(S3) GP18(S3) LID_(S3) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.61 MHz, 17-11-00
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.23 MHz, 17-11-00
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 

Zoom meeting via chromium web app

2020-03-28 Thread Alessandro De Laurenzis

Greetings,

I'm trying to use the Zoom meeting platform in OpenBSD through the 
Chromium web app (-current, very recent snapshot, Chromium 
80.0.3987.149, amd64).


When I click on the app icon, a new browser window opens and the sign-in 
web page appears, but soon after the browser is killed:


Mar 28 09:52:43 theseus /bsd: chrome(36809): pledge sysctl 2: 6 2
Mar 28 09:52:43 theseus /bsd: chrome[36809]: pledge "", syscall 202

Starting chrome with --disable-unveil doesn't help (same error).

Did anybody succeed in using this (or a similar) platform?

Any hints would be much appreciated.

All the best

--
Alessandro De Laurenzis
[mailto:jus...@atlantide.mooo.com]
Web: http://www.atlantide.mooo.com
LinkedIn: http://it.linkedin.com/in/delaurenzis



Re: Zoom meeting via chromium web app

2020-03-28 Thread Antoine Jacoutot
On Sat, Mar 28, 2020 at 10:00:28AM +0100, Alessandro De Laurenzis wrote:
> Greetings,
> 
> I'm trying to use the Zoom meeting platform in OpenBSD through the Chromium
> web app (-current, very recent snapshot, Chromium 80.0.3987.149, amd64).
> 
> When I click on the app icon, a new browser window opens and the sign-in web
> page appears, but soon after the browser is killed:
> 
> Mar 28 09:52:43 theseus /bsd: chrome(36809): pledge sysctl 2: 6 2
> Mar 28 09:52:43 theseus /bsd: chrome[36809]: pledge "", syscall 202
> 
> Starting chrome with --disable-unveil doesn't help (same error).
> 
> Did anybody succeed in using this (or a similar) platform?

You can use --no-sandbox.
But Zoom will not work anyway; at least for me, it recognizes neither my audio
nor my camera.
I use Windows for video conf.

-- 
Antoine



Re: Zoom meeting via chromium web app

2020-03-28 Thread Tristan Pilat
On March 28, 2020 11:40:25 AM GMT+01:00, Antoine Jacoutot wrote:
>On Sat, Mar 28, 2020 at 10:00:28AM +0100, Alessandro De Laurenzis wrote:
>> Greetings,
>> 
>> I'm trying to use the Zoom meeting platform in OpenBSD through the Chromium
>> web app (-current, very recent snapshot, Chromium 80.0.3987.149, amd64).
>> 
>> When I click on the app icon, a new browser window opens and the sign-in web
>> page appears, but soon after the browser is killed:
>> 
>> Mar 28 09:52:43 theseus /bsd: chrome(36809): pledge sysctl 2: 6 2
>> Mar 28 09:52:43 theseus /bsd: chrome[36809]: pledge "", syscall 202
>> 
>> Starting chrome with --disable-unveil doesn't help (same error).
>> 
>> Did anybody succeed in using this (or a similar) platform?
>
>You can use --no-sandbox.
>But Zoom will not work anyway; at least for me, it recognizes neither my audio
>nor my camera.
>I use Windows for video conf.

Hello,

I haven't tried Zoom but I successfully used Jitsi with Chromium on current. I
just had to chown /dev/videoX. It was working nicely until my system hung after
10 or 15 min though, likely because of the lack of hardware acceleration. You
should give it a try.

Cheers,

-- 
Tristan



RE: Unusual threading behavior on single processes

2020-03-28 Thread zeurkous
Haai,

Just to make a more-or-less general point (or two)...

"Otto Moerbeek"  wrote:
> On Fri, Mar 27, 2020 at 09:03:40PM +, Stefmorino wrote:
>
>> I have tested many applications, the behavior is the same in all of them, but
>> I'll talk about OpenMW (an open-source game engine for morrowind) since I have
>> the most useful information about how this program is threaded. By default,
>> OpenMW uses 4 threads (cited here:
>> https://openmw.readthedocs.io/en/stable/reference/modding/settings/cells.html),
>> one for main/generic processing, one for graphics, one for audio, and one for
>> preloading terrain.
>>[snip]
>>
>> Is it a lock (spin is at 0)? A placeholder? A limitation of how Ryzen SMP is
>> implemented?
>[snip]
>
> Wild guess: it could be that your program actually does not do real
> threading, but userland threading.

"Fibering", in other words.

> Check with top -H if it really
> creates threads. You should see multiple threads having the same PID.
> Or all threads are using a resource that cannot be shared.

Likely the latter. It's always funny, isn't it... A coder thinks "hey,
I want multi-threading 'cause it's 1337, I'll just neatly run these
subsystems within separate threads and I'm done!".

That such is frequently a naive proposition should be clear
to the more clueful reader. Games tend to be heavy on global state, and
are more likely to benefit from a multi-process model w/ carefully
thought-out boundaries, than from a shared-everything thread model.
While that need not be the case here, mestrongly suspects it is. Take
heed, and measure. Always measure.

Take care,

 --zeurkous.

> -Otto

-- 
Friggin' Machines!



/bin/sh: ctags: Argument list too long (make tags)

2020-03-28 Thread Greg Steuck
Apparently the number of files in kern is on the hairy edge of ARG_MAX on
OpenBSD 6.6-current amd64. If I run the same command in /usr/src, it works,
making the problem easy to ignore until more files are added.

Should ctags grow an option to take a list of inputs from a file, or is -a
smart enough to be used with xargs to resolve this problem?
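
On the second question: ctags(1) documents -a as appending to the tags file, so batching through xargs looks workable as long as the file is emptied first and sorted afterwards. A minimal sketch under those assumptions follows; `tags_in_batches` and the `CTAGS` override are hypothetical names, not part of the kernel Makefile.

```shell
#!/bin/sh
# Sketch: build a tags file in batches so that no single ctags invocation
# exceeds ARG_MAX. CTAGS is a hypothetical override (e.g. CTAGS=echo for a
# dry run that only prints the command lines xargs would execute).
: "${CTAGS:=ctags}"

# tags_in_batches TAGFILE: read file names on stdin, one or more per line.
tags_in_batches() {
    tagfile=$1
    : > "$tagfile" || return 1       # -a appends, so start from an empty file
    # xargs splits stdin into as many invocations as the limit requires
    xargs "$CTAGS" -wd -a -f "$tagfile" || return 1
    sort -o "$tagfile" "$tagfile"    # appended batches are not globally sorted
}
```

Usage would look like `find . -name '*.[ch]' | tags_in_batches tags`. In the Makefile, where the names already sit in shell variables, something like `echo ${CFILES} ${HFILES} | tags_in_batches ...` should avoid the limit too, since echo is a shell builtin and is not exec'ed. File names containing whitespace would need more care, because xargs splits its input on blanks and newlines.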

cd /home/greg/s/src/sys/kern; make tags
...
TDIR=`mktemp -d /tmp/_tagXX` || exit 1;  eval
"S=/home/greg/s/src/sys/arch/amd64/../.." &&  config -s
/home/greg/s/src/sys/arch/amd64/../.. -b ${TDIR}
/home/greg/s/src/sys/arch/amd64/conf/GENERIC.MP &&  eval "_arch=\"`make -V
_arch -f ${TDIR}/Makefile`\"" &&  eval "_mach=\"`make -V _mach -f
${TDIR}/Makefile`\"" &&  eval
"_machdir=\/home/greg/s/src/sys/arch/amd64/../../arch/${_mach}" &&  eval
"_archdir=\/home/greg/s/src/sys/arch/amd64/../../arch/${_arch}" &&  eval
"HFILES=\"`find /home/greg/s/src/sys/arch/amd64/../.. \( -path
/home/greg/s/src/sys/arch/amd64/../../'arch' -o -path
/home/greg/s/src/sys/arch/amd64/../../stand -o -path
/home/greg/s/src/sys/arch/amd64/../../lib/libsa -o -path
/home/greg/s/src/sys/arch/amd64/../..'/lib/libkern/arch' \) -prune -o -name
'*.h'; find ${_machdir} ${_archdir}
/home/greg/s/src/sys/arch/amd64/../../lib/libkern/arch/${_mach} \( -name
boot -o -name stand \) -prune -o -name '*.h'`\"" &&  eval "SFILES=\"`make
-V SFILES -f ${TDIR}/Makefile`\"" &&  eval "CFILES=\"`make -V CFILES -f
${TDIR}/Makefile`\"" &&  eval "AFILES=\"`make -V AFILES -f
${TDIR}/Makefile`\"" &&  ctags -wd -f /home/greg/s/src/sys/arch/amd64/tags
${CFILES} ${HFILES} &&  egrep "^[_A-Z]*ENTRY[_A-Z]*\(.*\)" ${SFILES}
${AFILES} |  sed "s;\\([^:]*\\):\\([^(]*\\)(\\([^, )]*\\)\\(.*\\);\\3 \\1
/^\\2(\\3\\4$/;"  >> /home/greg/s/src/sys/arch/amd64/tags &&  sort -o
/home/greg/s/src/sys/arch/amd64/tags /home/greg/s/src/sys/arch/amd64/tags
&&  rm -rf ${TDIR}
/bin/sh: ctags: Argument list too long
*** Error 1 in arch/amd64 (Makefile:42 'tags')

-- 
nest.cx is Gmail hosted, use PGP:
https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0


Re: Unusual threading behavior on single processes

2020-03-28 Thread Stefmorino
Thank you, your information was very helpful. I compiled and ran
malloc_duel and it's working as intended. I wasn't aware of the -H flag
for top, and I can see programs are threading as you say, though the
bottleneck to my poor performance is still a mystery.

I took some screen captures so you can see what I'm seeing:
Xonotic:
https://0x0.st/iBCD.png
OpenMW:
https://0x0.st/iBC5.png
Terraria: (fnaify if curious, thanks thfr :>)
https://0x0.st/iMrr.png

In the case of OpenMW, the bottleneck actually seems pretty obvious with
what top -H reports. I don't really know what to say about the other
examples.

I would break out a profiling tool at this stage, but the results of
testing with top -H have left me with no idea where the bottleneck is
(except openmw where it might actually be CPU); digging through systat
hasn't really given me any revelations either. :/

If anyone has a hunch where I should check, or if you need me to test
different software, I'd be more than happy to.

Regards,
Stefmorino


On Sat, Mar 28, 2020, at 09:00:21AM +, Otto Moerbeek wrote:

> On Fri, Mar 27, 2020 at 09:03:40PM +, Stefmorino wrote:

>> I have a question about a performance quirk on OpenBSD, but I'm not really sure
>> how to address it, or what the root cause even is; that being how multithreaded
>> applications (libpthread?) behave (notably, games).
>>
>> I have tested many applications, the behavior is the same in all of them, but
>> I'll talk about OpenMW (an open-source game engine for morrowind) since I have
>> the most useful information about how this program is threaded. By default,
>> OpenMW uses 4 threads (cited here:
>> https://openmw.readthedocs.io/en/stable/reference/modding/settings/cells.html),
>> one for main/generic processing, one for graphics, one for audio, and one for
>> preloading terrain. You can see this if you look at the thread usage under top
>> while running the game; however, this is exactly where my question comes into
>> play. Instead of each thread processing the game independently with their own
>> limits, each thread is "capped" to the total limit of one thread (i.e. instead
>> of openmw's process using 100% of 4 threads, or 400% cpu in top, the
>> process uses 25% across 4 threads, or 100% cpu in top). I tested this using
>> GENERIC instead of GENERIC.MP as well, and get identical performance on the one
>> thread; it's almost like pthreads is acting as a placeholder of sorts and not
>> actually improving performance where it should.
>>
>> Is it a lock (spin is at 0)? A placeholder? A limitation of how Ryzen SMP is
>> implemented?
>
> Hard to tell, no idea what that game engine does.  But this is not a
> general problem, e.g. the malloc_duel regress test
> (/usr/src/regress/lib/libpthread/malloc_duel) goes well over 100% here.
> I see > 100% as well with other multi-threaded programs.
>
> 32013 otto   60    0 6020K 1552K onproc/3  - 1:07 228.81% malloc_due
>
> Wild guess: it could be that your program actually does not do real
> threading, but userland threading. Check with top -H if it really
> creates threads.  You should see multiple threads having the same PID.
> Or all threads are using a resource that cannot be shared.
>
>  -Otto
>>
>> I'd be happy to do any additional testing, I have a fresh -current source tree
>> ready
>>
>> dmesg
>> OpenBSD 6.6-current (GENERIC.MP) #75: Tue Mar 24 12:56:37 MDT 2020
>> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 16603250688 (15834MB)
>> avail mem = 16087437312 (15342MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0x986ec000 (62 entries)
>> bios0: vendor LENOVO version "R0UET76W (1.56 )" date 11/05/2019
>> bios0: LENOVO 20KVCTO1WW
>> acpi0 at bios0: ACPI 5.0
>> acpi0: sleep states S0 S3 S4 S5
>> acpi0: tables DSDT FACP SSDT SSDT CRAT CDIT UEFI MSDM BATB HPET APIC MCFG SBST WSMT IVRS FPDT SSDT SSDT SSDT UEFI SSDT
>> acpi0: wakeup devices GPP0(S3) GPP1(S3) GPP2(S3) GPP3(S3) GPP4(S3) GPP5(S3) GPP6(S3) GP17(S3) XHC0(S3) XHC1(S3) GP18(S3) LID_(S3) SLPB(S3)
>> acpitimer0 at acpi0: 3579545 Hz, 32 bits
>> acpihpet0 at acpi0: 14318180 Hz
>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>> cpu0 at mainbus0: apid 0 (boot processor)
>> cpu0: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.61 MHz, 17-11-00
>> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
>> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
>> 64b/line 8-way L2 cache, 4MB 64b/line 16-w