Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Paolo Bonzini
On 07/04/2016 14:54, Michael S. Tsirkin wrote:
>
> char check_zero(char *p, int len)
> {
>     char res = 0;
>     int i;
>
>     for (i = 0; i < len; i++) {
>         res = res | p[i];
>     }
>
>     return res;
> }
>
>
> If you compile this function with --tree-vectorize and --unroll-loop
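
For reference, the quoted check_zero loop can be written out as a standalone C function. This is only a sketch of the idea in the quote, not code from the patch series; the const qualifier and the comments are additions. In GCC the options in question are spelled -ftree-vectorize and -funroll-loops, and whether the loop actually gets vectorized depends on the compiler version and target.

```c
#include <stddef.h>

/*
 * OR every byte of the buffer into an accumulator.
 * The result is nonzero if and only if at least one byte is nonzero.
 * The loop has no early exit, which keeps it simple enough for a
 * compiler to auto-vectorize.
 */
static char check_zero(const char *p, int len)
{
    char res = 0;
    int i;

    for (i = 0; i < len; i++) {
        res = res | p[i];
    }

    return res;
}
```

Because there is no early exit, a buffer with a nonzero byte near the start is still scanned to the end; that is the trade-off of this form against a loop that returns at the first nonzero byte.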

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote: > On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote: > > * Eric Blake (ebl...@redhat.com) wrote: > > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > > > > > >> One thing I still can't understand, why the unit test in ho

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Michael S. Tsirkin
On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote: > * Eric Blake (ebl...@redhat.com) wrote: > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > > > >> One thing I still can't understand, why the unit test in host > > >> environment shows > > >> 'memcmp()' have bett

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote: > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > >> One thing I still can't understand, why the unit test in host environment > >> shows > >> 'memcmp()' have better performance? > > Have you tried running under a profiler, to see if there are ho

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Eric Blake
On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: >> One thing I still can't understand, why the unit test in host environment >> shows >> 'memcmp()' have better performance? Have you tried running under a profiler, to see if there are hotspots or at least get an idea of where the time is be

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote: > > >> > > > >> > I use your new code: > > >> > - > > >> >unsigned long *p = ... > > >> >if (p[0] || p[1] || p[2] || p[3] > > >> >|| memcmp(p+4, p, size - 4 * sizeof(unsigned long

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> >> > > >> > I use your new code: > >> > - > >> > unsigned long *p = ... > >> > if (p[0] || p[1] || p[2] || p[3] > >> > || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0) > >> > return BUFFER_NOT_ZERO; > >> > else > >> >
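
The memcmp()-based check quoted above can be sketched as a complete function. The function name, the bool return type (used here instead of the BUFFER_NOT_ZERO constant in the quote), and the stated preconditions are assumptions made for illustration; the snippet in the thread is truncated and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper name. Assumes buf is aligned for unsigned long
 * and that size is a multiple of sizeof(unsigned long) covering at
 * least four words.
 */
static bool buffer_is_zero_memcmp(const void *buf, size_t size)
{
    const unsigned long *p = buf;

    /* Check the first four words directly. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * Compare the buffer with itself shifted by four words. If every
     * word equals the word four positions before it, and the first
     * four words are zero, then every word is zero.
     */
    return memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) == 0;
}
```

Note that p + 4 is pointer arithmetic on unsigned long, so the shift is four words, not four bytes. The two memcmp() arguments overlap, which is fine because memcmp() only reads its inputs.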

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Juan Quintela
"Li, Liang Z" wrote: >> On 12/11/2015 10:40, Li, Liang Z wrote: >> > I migrate a 8GB RAM Idle guest, I think most of it's pages are zero pages. >> > >> > I use your new code: >> > - >> >unsigned long *p = ... >> >if (p[0] || p[1] || p[2] ||

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> On 12/11/2015 10:40, Li, Liang Z wrote: > > I migrate an 8GB RAM Idle guest, I think most of its pages are zero pages. > > > > I use your new code: > > - > > unsigned long *p = ... > > if (p[0] || p[1] || p[2] || p[3] > > || memcmp(

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 10:40, Li, Liang Z wrote: > I migrate an 8GB RAM Idle guest, I think most of its pages are zero pages. > > I use your new code: > - > unsigned long *p = ... > if (p[0] || p[1] || p[2] || p[3] > || memcmp(p+4, p,

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> >>> I am very surprised about the live migration performance result > >>> when I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics > >>> to check the zero pages. > >> > >> What code were you using? Remember I suggested using only unsigned > >> long checks, like > >> > >>unsigned

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 09:53, Li, Liang Z wrote: >> On 12/11/2015 03:49, Li, Liang Z wrote: >>> I am very surprised about the live migration performance result when >>> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to >>> check the zero pages. >> >> What code were you using? Remember I

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> On 12/11/2015 03:49, Li, Liang Z wrote: > > I am very surprised about the live migration performance result when > > I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to > > check the zero pages. > > What code were you using? Remember I suggested using only unsigned long > checks

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 03:49, Li, Liang Z wrote: > I am very surprised about the live migration performance result when > I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to > check the zero pages. What code were you using? Remember I suggested using only unsigned long checks, like

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-11 Thread Li, Liang Z
> > On 10/11/2015 10:26, Li, Liang Z wrote: > > I don't know Paolo's opinion about how to deal with the SSE2 > > Intrinsics, he is the author. From my personal view, now that we have > > found a better way, why to use such low level SSE2/AVX2 Intrinsics. > > I totally agree. :) > > Paolo Hi Pao

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:56, Li, Liang Z wrote: > > > I agree that your patch can be dropped, but go ahead and submit your > > > improvements! > > > > You mean I do this work? > > If you are busy, I can do this. > > It's not that I'm busy, it's that it's your idea. It doesn't matter if I > (and Peter

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:56, Li, Liang Z wrote: > > I agree that your patch can be dropped, but go ahead and submit your > > improvements! > > You mean I do this work? > If you are busy, I can do this. It's not that I'm busy, it's that it's your idea. It doesn't matter if I (and Peter Lieven too, act

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:41, Li, Liang Z wrote: > >> On 10/11/2015 10:26, Li, Liang Z wrote: > >>> I don't know Paolo's opinion about how to deal with the SSE2 > >>> Intrinsics, he is the author. From my personal view, now that we > >>> have found a better way, why to use such low level SSE2/AVX2 > >>> I

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:41, Li, Liang Z wrote: >> On 10/11/2015 10:26, Li, Liang Z wrote: >>> I don't know Paolo's opinion about how to deal with the SSE2 >>> Intrinsics, he is the author. From my personal view, now that we >>> have found a better way, why to use such low level SSE2/AVX2 >>> Intrinsics

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:26, Li, Liang Z wrote: > > I don't know Paolo's opinion about how to deal with the SSE2 > > Intrinsics, he is the author. From my personal view, now that we have > > found a better way, why to use such low level SSE2/AVX2 Intrinsics. > > I totally agree. :) > > Paolo Hi Paolo,

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:26, Li, Liang Z wrote: > I don't know Paolo's opinion about how to deal with the SSE2 > Intrinsics, he is the author. From my personal view, now that we have > found a better way, why to use such low level SSE2/AVX2 Intrinsics. I totally agree. :) Paolo

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> > Eric, thanks for your information. I didn't notice that discussion before. > > > > > > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo > length' > > then write a test program to check a large amount of zero pages, and > > use the 'time' to record the time taken by diff

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:13, Juan Quintela wrote: >> > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo >> > length' >> > then write a test program to check a large amount of zero pages, and >> > use the 'time' to >> > record the time taken by different optimizations. Test resul

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Juan Quintela
"Li, Liang Z" wrote: >> Rather than trying to cater to multiple assembly instruction implementations >> ourselves, have you tried taking the ideas in this earlier thread? >> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html >> >> Ideally, libc's memcmp() will already be using th

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-09 Thread Li, Liang Z
> Rather than trying to cater to multiple assembly instruction implementations > ourselves, have you tried taking the ideas in this earlier thread? > https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html > > Ideally, libc's memcmp() will already be using the most efficient assembly >

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-09 Thread Eric Blake
On 11/09/2015 07:51 PM, Liang Li wrote: > buffer_find_nonzero_offset() is a hot function during live migration. > Now it uses SSE2 instructions for optimization. For platforms that support > AVX2 instructions, using the AVX2 instructions for optimization can help > to improve the performance by about 30% compa