Re: [Mjpeg-users] [Mjpeg-developer] yuvdenoise performance patch
Hello,

> Oh yeah, this isn't on Linux. OS X probably has some kind of API for
> checking whether SSE2 is available. Using CPUID isn't enough, because SSE
> requires OS support that might not be there, i.e., the CPU supports
> SSE2 but you're not actually able to use it. Probably not much of an issue
> on OS X.
>
> An easy fix would be to change the SSE detection asm to save and
> restore ebx:
>
> __asm__ volatile("pushl %%ebx ; cpuid ; popl %%ebx" : "=d"(d) : "a"(1) :
> "ecx");
>
> or better:
>
> uint32_t tmp;
> __asm__ volatile("movl %%ebx, %1; cpuid; movl %1, %%ebx" : "=d"(d),
> "=&g"(tmp) : "a"(1) : "ecx");
>
> The latter is safer in general, as you can't use push or pop around any
> asm code that has a parameter with a constraint that allows memory
> references. The memory reference might be relative to esp, in which
> case the push/pop would move it. Or it might not be relative to esp, in
> which case the push/pop doesn't move it. So there's no way to adjust
> for it.

I tested your better version. It compiles on both my Linux box and my Intel OS X box. I also ran a quick test of the new version on the Linux box, and it works well.

I would appreciate feedback on whether it works on a Mac.

See you hopefully soon,
Berni the Chaos of Woodquarter

Email: shadowl...@utanet.at
www: http://www.lysator.liu.se/~gz/bernhard

___
Mjpeg-users mailing list
Mjpeg-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mjpeg-users
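For readers who want to try the quoted "better" variant in isolation, here is a minimal sketch of how it might be wrapped in a function. The helper name `cpu_reports_sse2` is hypothetical and does not come from the mjpegtools sources. The sketch also marks eax as read-write (`"+a"`), on the assumption that the compiler should be told CPUID overwrites it; that detail is an addition, not part of the quoted snippet. SSE2 is reported in bit 26 of EDX for CPUID leaf 1. As the quoted text notes, this says only that the CPU has the feature, not that the OS supports it.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the thread): returns 1 if CPUID
 * leaf 1 reports SSE2 (EDX bit 26), 0 otherwise. This reflects CPU
 * capability only; it does not prove the OS enables SSE state. */
static int cpu_reports_sse2(void)
{
#if defined(__i386__)
    uint32_t eax = 1, edx, saved_ebx;
    /* Park ebx in a compiler-chosen register or memory slot instead of
     * pushing it, so esp-relative memory operands keep their meaning. */
    __asm__ volatile(
        "movl %%ebx, %1\n\t"
        "cpuid\n\t"
        "movl %1, %%ebx"
        : "=d"(edx), "=&g"(saved_ebx), "+a"(eax)
        :
        : "ecx");
    return (int)((edx >> 26) & 1u);
#elif defined(__x86_64__)
    /* Assumption: in the default x86-64 code models rbx is not the PIC
     * register, so it can simply be declared as an output. */
    uint32_t eax = 1, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    (void)ebx;
    return (int)((edx >> 26) & 1u);
#else
    return 0; /* not an x86 target: no CPUID instruction */
#endif
}
```

The 32-bit branch is the one that matters for the PIC build discussed here; the other branches exist only so the sketch compiles elsewhere.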
Re: [Mjpeg-users] [Mjpeg-developer] yuvdenoise performance patch
* Bernhard Praschinger on Thursday, October 14, 2010 at 19:06:33 +0200
> I tested your better version. And it compiles here on my linux and Intel
> osx box. I did also a quick test with the new version on the linux box.
> And it works well.
>
> So I would appreciate a feedback if it works on a mac.

Thanks for looking into this, but I get:

gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../utils -O3 -funroll-all-loops -ffast-math -march=nocona -mtune=nocona -g -O2 -I/sw/include -no-cpp-precomp -D_THREAD_SAFE -Wall -Wunused -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
main.c: In function ‘main’:
main.c:1339: error: PIC register ‘ebx’ clobbered in ‘asm’
make: *** [main.o] Error 1

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L30

c
-- 
theatre - books - texts - movies
Black Trash Productions at home: http://www.blacktrash.org
Black Trash Productions on Facebook: http://www.facebook.com/blacktrashproductions
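Since this report comes from Mac OS X, the earlier remark in the thread that OS X probably offers an API for the SSE2 check can be illustrated with `sysctlbyname`. This is only a sketch. It assumes the `hw.optional.sse2` key is available on the target system, which was not verified on 10.5.8. The helper name `os_reports_sse2` is hypothetical. On non-Apple builds it simply reports "unknown".

```c
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: 1 if the OS reports SSE2 as usable, 0 if it
 * reports it as unavailable, -1 if the answer is unknown (non-Apple
 * build, or the sysctl key is missing). */
static int os_reports_sse2(void)
{
#ifdef __APPLE__
    int value = 0;
    size_t len = sizeof value;
    /* Assumption: "hw.optional.sse2" exists on this system. */
    if (sysctlbyname("hw.optional.sse2", &value, &len, NULL, 0) != 0)
        return -1;
    return value != 0;
#else
    return -1;
#endif
}
```

Asking the OS this way needs no inline asm at all, so the PIC register problem does not arise. The cost is a platform-specific code path.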
Re: [Mjpeg-users] [Mjpeg-developer] yuvdenoise performance patch
* Christian Ebert on Thursday, October 14, 2010 at 23:43:20 +0200
> Thanks for looking into this, but I get:
>
> gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../utils -O3 -funroll-all-loops
> -ffast-math -march=nocona -mtune=nocona -g -O2 -I/sw/include -no-cpp-precomp
> -D_THREAD_SAFE -Wall -Wunused -MT main.o -MD -MP -MF .deps/main.Tpo -c -o
> main.o main.c
> main.c: In function ‘main’:
> main.c:1339: error: PIC register ‘ebx’ clobbered in ‘asm’
> make: *** [main.o] Error 1
>
> $ sw_vers
> ProductName:    Mac OS X
> ProductVersion: 10.5.8
> BuildVersion:   9L30

Just came across this:
http://lists.mplayerhq.hu/pipermail/mplayer-users/2010-October/081276.html

c
-- 
\black\trash movie _COWBOY CANOE COMA_
Ein deutscher Western/A German Western
--->> http://www.blacktrash.org/underdogma/ccc.php
Re: [Mjpeg-users] [Mjpeg-developer] yuvdenoise performance patch
On Thu, Oct 14, 2010 at 2:43 PM, Christian Ebert wrote:
> Thanks for looking into this, but I get:
>
> gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../utils -O3 -funroll-all-loops
> -ffast-math -march=nocona -mtune=nocona -g -O2 -I/sw/include -no-cpp-precomp
> -D_THREAD_SAFE -Wall -Wunused -MT main.o -MD -MP -MF .deps/main.Tpo -c -o
> main.o main.c
> main.c: In function ‘main’:
> main.c:1339: error: PIC register ‘ebx’ clobbered in ‘asm’
> make: *** [main.o] Error 1

Looks like you didn't actually change the needed lines.