Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Paolo Bonzini
On 07/04/2016 14:54, Michael S. Tsirkin wrote:
>
> char check_zero(char *p, int len)
> {
>     char res = 0;
>     int i;
>
>     for (i = 0; i < len; i++) {
>         res = res | p[i];
>     }
>
>     return res;
> }
>
>
> If you compile this function with --tree-vectorize and --unroll-loop
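For readers who want to try the quoted loop, here is a minimal self-contained sketch of the same OR-reduction. The name check_zero_sketch and the switch to unsigned char and size_t are illustrative choices, not part of the original message. With GCC, the options the message alludes to are normally spelled -ftree-vectorize and -funroll-loops.

```c
#include <assert.h>
#include <stddef.h>

/* OR every byte together: the result is zero only if every byte is zero.
 * There is no early exit, so the loop body stays branch-free, which is
 * the shape an auto-vectorizer can usually handle. */
unsigned char check_zero_sketch(const unsigned char *p, size_t len)
{
    unsigned char res = 0;
    size_t i;

    assert(p != NULL || len == 0);
    for (i = 0; i < len; i++) {
        res |= p[i];
    }
    return res;
}
```

A caller would treat a return value of 0 as "buffer is all zero". Whether the compiler actually vectorizes the loop depends on the compiler version and target, so inspecting the generated assembly is the only way to be sure.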

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote: > On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote: > > * Eric Blake (ebl...@redhat.com) wrote: > > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > > > > > >> One thing I still can't understand, why the unit test in ho

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Michael S. Tsirkin
On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote: > * Eric Blake (ebl...@redhat.com) wrote: > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > > > >> One thing I still can't understand, why the unit test in host > > >> environment shows > > >> 'memcmp()' have bett

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2016-04-07 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote: > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: > > >> One thing I still can't understand, why the unit test in host environment > >> shows > >> 'memcmp()' have better performance? > > Have you tried running under a profiler, to see if there are ho

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Eric Blake
On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote: >> One thing I still can't understand, why the unit test in host environment >> shows >> 'memcmp()' have better performance? Have you tried running under a profiler, to see if there are hotspots or at least get an idea of where the time is be

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote: > > >> > > > >> > I use your new code: > > >> > - > > >> >unsigned long *p = ... > > >> >if (p[0] || p[1] || p[2] || p[3] > > >> >|| memcmp(p+4, p, size - 4 * sizeof(unsigned long

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> >> >
> >> > I use your new code:
> >> > -
> >> > unsigned long *p = ...
> >> > if (p[0] || p[1] || p[2] || p[3]
> >> >     || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> >> >     return BUFFER_NOT_ZERO;
> >> > else
> >> >
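The quoted check works because, once the first four words are known to be zero, comparing the buffer against itself shifted by four words forces every later byte to equal a byte already known to be zero. A minimal sketch of that idea follows. The function name buffer_is_zero_sketch and the bool return type are assumptions made for illustration, and the buffer is assumed to be suitably aligned and at least four words long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the 'size' bytes starting at 'p' are all zero.
 * Assumption: size >= 4 * sizeof(unsigned long) and p is word-aligned. */
bool buffer_is_zero_sketch(const unsigned long *p, size_t size)
{
    const size_t head = 4 * sizeof(unsigned long);

    assert(p != NULL && size >= head);

    /* Cheap early exit: non-zero data near the start is caught here. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /* Byte i (for i >= head) must equal byte i - head. Since the first
     * 'head' bytes are zero, equality implies the whole buffer is zero.
     * memcmp() only reads, so the overlapping ranges are fine. */
    return memcmp(p + 4, p, size - head) == 0;
}
```

The appeal of this form is that the bulk of the work is delegated to libc's memcmp(), so any vectorized implementation the C library provides is picked up without hand-written intrinsics.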

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Juan Quintela
"Li, Liang Z" wrote: >> On 12/11/2015 10:40, Li, Liang Z wrote: >> > I migrate a 8GB RAM Idle guest, I think most of it's pages are zero pages. >> > >> > I use your new code: >> > - >> >unsigned long *p = ... >> >if (p[0] || p[1] || p[2] ||

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> On 12/11/2015 10:40, Li, Liang Z wrote: > > I migrate a 8GB RAM Idle guest, I think most of it's pages are zero pages. > > > > I use your new code: > > - > > unsigned long *p = ... > > if (p[0] || p[1] || p[2] || p[3] > > || memcmp(

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 10:40, Li, Liang Z wrote: > I migrate a 8GB RAM Idle guest, I think most of it's pages are zero pages. > > I use your new code: > - > unsigned long *p = ... > if (p[0] || p[1] || p[2] || p[3] > || memcmp(p+4, p,

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> >>> I am very surprised about the live migration performance result > >>> when I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics > >>> to check the zero pages. > >> > >> What code were you using? Remember I suggested using only unsigned > >> long checks, like > >> > >>unsigned

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 09:53, Li, Liang Z wrote: >> On 12/11/2015 03:49, Li, Liang Z wrote: >>> I am very surprised about the live migration performance result when >>> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to >>> check the zero pages. >> >> What code were you using? Remember I

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Li, Liang Z
> On 12/11/2015 03:49, Li, Liang Z wrote: > > I am very surprised about the live migration performance result when > > I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to > > check the zero pages. > > What code were you using? Remember I suggested using only unsigned long > checks

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-12 Thread Paolo Bonzini
On 12/11/2015 03:49, Li, Liang Z wrote: > I am very surprised about the live migration performance result when > I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to > check the zero pages. What code were you using? Remember I suggested using only unsigned long checks, like

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-11 Thread Li, Liang Z
> > On 10/11/2015 10:26, Li, Liang Z wrote: > > I don't know Paolo's opinion about how to deal with the SSE2 > > Intrinsics, he is the author. From my personal view, now that we have > > found a better way, why to use such low level SSE2/AVX2 Intrinsics. > > I totally agree. :) > > Paolo Hi Pao

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:56, Li, Liang Z wrote: > > > I agree that your patch can be dropped, but go ahead and submit your > > > improvements! > > > > You mean I do this work? > > If you are busy, I can do this. > > It's not that I'm busy, it's that it's your idea. It doesn't matter if I > (and Peter

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:56, Li, Liang Z wrote: > > I agree that your patch can be dropped, but go ahead and submit your > > improvements! > > You mean I do this work? > If you are busy, I can do this. It's not that I'm busy, it's that it's your idea. It doesn't matter if I (and Peter Lieven too, act

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:41, Li, Liang Z wrote: > >> On 10/11/2015 10:26, Li, Liang Z wrote: > >>> I don't know Paolo's opinion about how to deal with the SSE2 > >>> Intrinsics, he is the author. From my personal view, now that we > >>> have found a better way, why to use such low level SSE2/AVX2 > >>> I

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:41, Li, Liang Z wrote: >> On 10/11/2015 10:26, Li, Liang Z wrote: >>> I don't know Paolo's opinion about how to deal with the SSE2 >>> Intrinsics, he is the author. From my personal view, now that we >>> have found a better way, why to use such low level SSE2/AVX2 >>> Intrinsics

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> On 10/11/2015 10:26, Li, Liang Z wrote: > > I don't know Paolo's opinion about how to deal with the SSE2 > > Intrinsics, he is the author. From my personal view, now that we have > > found a better way, why to use such low level SSE2/AVX2 Intrinsics. > > I totally agree. :) > > Paolo Hi Paolo,

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:26, Li, Liang Z wrote: > I don't know Paolo's opinion about how to deal with the SSE2 > Intrinsics, he is the author. From my personal view, now that we have > found a better way, why to use such low level SSE2/AVX2 Intrinsics. I totally agree. :) Paolo

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z
> > Eric, thanks for you information. I didn't notice that discussion before. > > > > > > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo > length' > > then write a test program to check a large amount of zero pages, and > > use the 'time' to recode the time takes by diff

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini
On 10/11/2015 10:13, Juan Quintela wrote: >> > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo >> > length' >> > then write a test program to check a large amount of zero pages, and >> > use the 'time' to >> > recode the time takes by different optimization. Test resul

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Juan Quintela
"Li, Liang Z" wrote: >> Rather than trying to cater to multiple assembly instruction implementations >> ourselves, have you tried taking the ideas in this earlier thread? >> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html >> >> Ideally, libc's memcmp() will already be using th

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-09 Thread Li, Liang Z
> Rather than trying to cater to multiple assembly instruction implementations > ourselves, have you tried taking the ideas in this earlier thread? > https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html > > Ideally, libc's memcmp() will already be using the most efficient assembly >

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-09 Thread Eric Blake
On 11/09/2015 07:51 PM, Liang Li wrote:
> buffer_find_nonzero_offset() is a hot function during live migration.
> Now it use SSE2 instructions for optimization. For platform supports
> AVX2 instructions, use the AVX2 instructions for optimization can help
> to improve the performance about 30% compa

[Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-09 Thread Liang Li
buffer_find_nonzero_offset() is a hot function during live migration. It currently uses SSE2 instructions for optimization. On platforms that support AVX2, using AVX2 instructions instead can improve performance by about 30% compared to SSE2. Zero page check can be faster with
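To make the idea concrete, the following is a rough sketch of an AVX2 zero check with a runtime fallback. It is not the code from this patch series: all function names are hypothetical, it only reports whether a buffer is zero rather than returning an offset as buffer_find_nonzero_offset() does by its name, and it assumes GCC or Clang on x86 for the target attribute and __builtin_cpu_supports().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1

/* Compiled for AVX2 regardless of the global -m flags. */
__attribute__((target("avx2")))
static bool is_zero_avx2(const uint8_t *buf, size_t len)
{
    __m256i acc = _mm256_setzero_si256();
    uint8_t tail = 0;
    size_t i = 0;

    /* OR 32 bytes at a time into an accumulator (unaligned loads). */
    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(buf + i));
        acc = _mm256_or_si256(acc, v);
    }
    /* Handle the remaining bytes with a scalar loop. */
    for (; i < len; i++) {
        tail |= buf[i];
    }
    /* _mm256_testz_si256(a, a) returns 1 when all bits of a are zero. */
    return _mm256_testz_si256(acc, acc) && tail == 0;
}
#endif

/* Portable fallback used when AVX2 is unavailable. */
static bool is_zero_scalar(const uint8_t *buf, size_t len)
{
    uint8_t res = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        res |= buf[i];
    }
    return res == 0;
}

/* Pick the AVX2 path at run time if the CPU supports it. */
bool is_zero_dispatch(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2")) {
        return is_zero_avx2(buf, len);
    }
#endif
    return is_zero_scalar(buf, len);
}
```

The runtime dispatch matters because a binary built with AVX2 code must still run on hosts without AVX2. Any speedup over SSE2 would need to be measured on the target hardware; the 30% figure above is the patch author's claim, not something this sketch demonstrates.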