On 07/04/2016 14:54, Michael S. Tsirkin wrote:
>
> char check_zero(char *p, int len)
> {
>     char res = 0;
>     int i;
>
>     for (i = 0; i < len; i++) {
>         res = res | p[i];
>     }
>
>     return res;
> }
>
>
> If you compile this function with --tree-vectorize and --unroll-loop
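For anyone who wants to try the quoted loop, here is a minimal sketch of the same OR-accumulate idea. The name or_all_bytes and the unsigned char / size_t types are assumptions made for illustration and are not from the quoted message. GCC spells the relevant options -ftree-vectorize and -funroll-loops; whether the loop actually gets vectorized depends on the compiler version and target.

```c
#include <stddef.h>

/* Hypothetical variant of the quoted check_zero(): OR every byte of the
 * buffer into one accumulator. The result is 0 only if all bytes are 0.
 * There is no early exit, so the loop body is branch-free, which is the
 * shape auto-vectorizers generally handle well. */
static unsigned char or_all_bytes(const unsigned char *p, size_t len)
{
    unsigned char res = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        res |= p[i];
    }

    return res;
}
```

The trade-off is that a buffer with a nonzero byte near the start is still scanned to the end, so this shape mainly pays off when most buffers are expected to be all zero.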
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote:
> > * Eric Blake (ebl...@redhat.com) wrote:
> > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
> > >
> > > >> One thing I still can't understand, why the unit test in ho
On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote:
> * Eric Blake (ebl...@redhat.com) wrote:
> > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
> >
> > >> One thing I still can't understand, why the unit test in host
> > >> environment shows
> > >> 'memcmp()' have bett
* Eric Blake (ebl...@redhat.com) wrote:
> On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
>
> >> One thing I still can't understand, why the unit test in host environment
> >> shows
> >> 'memcmp()' have better performance?
>
> Have you tried running under a profiler, to see if there are ho
On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
>> One thing I still can't understand, why the unit test in host environment
>> shows
>> 'memcmp()' have better performance?
Have you tried running under a profiler, to see if there are hotspots or
at least get an idea of where the time is be
* Li, Liang Z (liang.z...@intel.com) wrote:
> > >> >
> > >> > I use your new code:
> > >> > -
> > >> >unsigned long *p = ...
> > >> >if (p[0] || p[1] || p[2] || p[3]
> > >> >|| memcmp(p+4, p, size - 4 * sizeof(unsigned long
> >> >
> >> > I use your new code:
> >> > -
> >> >     unsigned long *p = ...
> >> >     if (p[0] || p[1] || p[2] || p[3]
> >> >         || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> >> >         return BUFFER_NOT_ZERO;
> >> >     else
> >> >
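The quoted snippet relies on a self-comparison trick: once the first four words are known to be zero, comparing the buffer against itself shifted by four words proves, byte by byte, that everything after them is zero too. A minimal self-contained sketch follows. The function name buffer_is_zero_memcmp, the bool return value and the short-buffer fallback are assumptions added for illustration and are not taken from the thread or from QEMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when all `size` bytes at `buf` are 0. */
static bool buffer_is_zero_memcmp(const void *buf, size_t size)
{
    const unsigned char *b = buf;
    const size_t head = 4 * sizeof(unsigned long);
    unsigned long w[4];
    size_t i;

    /* Buffers shorter than the four-word head: plain byte loop. */
    if (size < head) {
        for (i = 0; i < size; i++) {
            if (b[i]) {
                return false;
            }
        }
        return true;
    }

    /* Copy the head out so no alignment of `buf` has to be assumed. */
    memcpy(w, b, head);
    if (w[0] || w[1] || w[2] || w[3]) {
        return false;
    }

    /* The head is zero. If every later byte equals the byte `head`
     * positions before it, then by induction every byte is zero.
     * memcmp() only reads, so the overlapping ranges are fine. */
    return memcmp(b + head, b, size - head) == 0;
}
```

Whether this beats hand-written SSE2 or AVX2 code depends on how well the C library's memcmp() is tuned for the host CPU, which is the question the thread is debating.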
"Li, Liang Z" wrote:
>> On 12/11/2015 10:40, Li, Liang Z wrote:
>> > I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
>> >
>> > I use your new code:
>> > -
>> >unsigned long *p = ...
>> >if (p[0] || p[1] || p[2] ||
> On 12/11/2015 10:40, Li, Liang Z wrote:
> > I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
> >
> > I use your new code:
> > -
> > unsigned long *p = ...
> > if (p[0] || p[1] || p[2] || p[3]
> > || memcmp(
On 12/11/2015 10:40, Li, Liang Z wrote:
> I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
>
> I use your new code:
> -
> unsigned long *p = ...
> if (p[0] || p[1] || p[2] || p[3]
> || memcmp(p+4, p,
> >>> I am very surprised about the live migration performance result
> >>> when I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics
> >>> to check the zero pages.
> >>
> >> What code were you using? Remember I suggested using only unsigned
> >> long checks, like
> >>
> >>unsigned
On 12/11/2015 09:53, Li, Liang Z wrote:
>> On 12/11/2015 03:49, Li, Liang Z wrote:
>>> I am very surprised about the live migration performance result when
>>> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to
>>> check the zero pages.
>>
>> What code were you using? Remember I
> On 12/11/2015 03:49, Li, Liang Z wrote:
> > I am very surprised about the live migration performance result when
> > I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to
> > check the zero pages.
>
> What code were you using? Remember I suggested using only unsigned long
> checks
On 12/11/2015 03:49, Li, Liang Z wrote:
> I am very surprised about the live migration performance result when
> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to
> check the zero pages.
What code were you using? Remember I suggested using only unsigned long
checks, like
>
> On 10/11/2015 10:26, Li, Liang Z wrote:
> > I don't know Paolo's opinion about how to deal with the SSE2
> > Intrinsics, he is the author. From my personal view, now that we have
> > found a better way, why to use such low level SSE2/AVX2 Intrinsics.
>
> I totally agree. :)
>
> Paolo
Hi Pao
> On 10/11/2015 10:56, Li, Liang Z wrote:
> > > I agree that your patch can be dropped, but go ahead and submit your
> > > improvements!
> >
> > You mean I do this work?
> > If you are busy, I can do this.
>
> It's not that I'm busy, it's that it's your idea. It doesn't matter if I
> (and Peter
On 10/11/2015 10:56, Li, Liang Z wrote:
> > I agree that your patch can be dropped, but go ahead and submit your
> > improvements!
>
> You mean I do this work?
> If you are busy, I can do this.
It's not that I'm busy, it's that it's your idea. It doesn't matter if
I (and Peter Lieven too, act
> On 10/11/2015 10:41, Li, Liang Z wrote:
> >> On 10/11/2015 10:26, Li, Liang Z wrote:
> >>> I don't know Paolo's opinion about how to deal with the SSE2
> >>> Intrinsics, he is the author. From my personal view, now that we
> >>> have found a better way, why to use such low level SSE2/AVX2
> >>> I
On 10/11/2015 10:41, Li, Liang Z wrote:
>> On 10/11/2015 10:26, Li, Liang Z wrote:
>>> I don't know Paolo's opinion about how to deal with the SSE2
>>> Intrinsics, he is the author. From my personal view, now that we
>>> have found a better way, why to use such low level SSE2/AVX2
>>> Intrinsics
> On 10/11/2015 10:26, Li, Liang Z wrote:
> > I don't know Paolo's opinion about how to deal with the SSE2
> > Intrinsics, he is the author. From my personal view, now that we have
> > found a better way, why to use such low level SSE2/AVX2 Intrinsics.
>
> I totally agree. :)
>
> Paolo
Hi Paolo,
On 10/11/2015 10:26, Li, Liang Z wrote:
> I don't know Paolo's opinion about how to deal with the SSE2
> Intrinsics, he is the author. From my personal view, now that we have
> found a better way, why to use such low level SSE2/AVX2 Intrinsics.
I totally agree. :)
Paolo
> > Eric, thanks for your information. I didn't notice that discussion before.
> >
> >
> > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
> length'
> > then wrote a test program to check a large number of zero pages, and
> > used 'time' to record the time taken by diff
On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrite the buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
>> > length'
>> > then wrote a test program to check a large number of zero pages, and
>> > used 'time' to
>> > record the time taken by different optimizations. Test resul
"Li, Liang Z" wrote:
>> Rather than trying to cater to multiple assembly instruction implementations
>> ourselves, have you tried taking the ideas in this earlier thread?
>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>>
>> Ideally, libc's memcmp() will already be using th
> Rather than trying to cater to multiple assembly instruction implementations
> ourselves, have you tried taking the ideas in this earlier thread?
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>
> Ideally, libc's memcmp() will already be using the most efficient assembly
>
On 11/09/2015 07:51 PM, Liang Li wrote:
> buffer_find_nonzero_offset() is a hot function during live migration.
> Now it uses SSE2 instructions for optimization. For platforms that support
> AVX2 instructions, using the AVX2 instructions for optimization can help
> to improve the performance by about 30% compa
buffer_find_nonzero_offset() is a hot function during live migration.
Now it uses SSE2 instructions for optimization. For platforms that support
AVX2 instructions, using the AVX2 instructions for optimization can help
to improve the performance by about 30% compared to SSE2.
Zero page check can be faster with