On 07/04/2016 14:54, Michael S. Tsirkin wrote:
>
> char check_zero(char *p, int len)
> {
> char res = 0;
> int i;
>
> for (i = 0; i < len; i++) {
> res = res | p[i];
> }
>
> return res;
> }
>
>
> If you compile this function with -ftree-vectorize and -funroll-loops
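For anyone who wants to try the quoted function outside the thread, here is a minimal sketch of it as a standalone translation unit. The `const`/`size_t` signature and the `check_zero_selftest()` helper are assumptions added for illustration; they are not in the original message. Building it with something like `gcc -O2 -ftree-vectorize -funroll-loops -S` should let you inspect whether the compiler vectorizes the loop on your target; the result depends on compiler version and architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * OR together every byte of the buffer. The result is zero only if
 * every byte is zero. The loop has no early exit, which keeps it
 * simple enough for a compiler to auto-vectorize.
 */
char check_zero(const char *p, size_t len)
{
    char res = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        res |= p[i];
    }

    return res;
}

/* Hypothetical self-check, not part of the original message. */
void check_zero_selftest(void)
{
    char page[4096];

    memset(page, 0, sizeof(page));
    assert(check_zero(page, sizeof(page)) == 0);

    page[sizeof(page) - 1] = 1;   /* dirty the last byte */
    assert(check_zero(page, sizeof(page)) != 0);
}
```

Note that the function returns the OR of all bytes, so callers should compare the result against zero and not rely on any particular non-zero value.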
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote:
> > * Eric Blake (ebl...@redhat.com) wrote:
> > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
> > >
> > > >> One thing I still can't understand, why the unit test in ho
On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote:
> * Eric Blake (ebl...@redhat.com) wrote:
> > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
> >
> > >> One thing I still can't understand: why does the unit test in the host
> > >> environment show that
> > >> 'memcmp()' has bett
* Eric Blake (ebl...@redhat.com) wrote:
> On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
>
> >> One thing I still can't understand: why does the unit test in the host
> >> environment show that 'memcmp()' has better performance?
>
> Have you tried running under a profiler, to see if there are ho
On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
>> One thing I still can't understand: why does the unit test in the host
>> environment show that 'memcmp()' has better performance?
Have you tried running under a profiler, to see if there are hotspots or
at least get an idea of where the time is be
* Li, Liang Z (liang.z...@intel.com) wrote:
> > >> >
> > >> > I use your new code:
> > >> > -
> > >> > unsigned long *p = ...
> > >> > if (p[0] || p[1] || p[2] || p[3]
> > >> > || memcmp(p+4, p, size - 4 * sizeof(unsigned long
> >> >
> >> > I use your new code:
> >> > -
> >> > unsigned long *p = ...
> >> > if (p[0] || p[1] || p[2] || p[3]
> >> > || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> >> > return BUFFER_NOT_ZERO;
> >> > else
> >> >
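The quoted snippet relies on a self-overlapping memcmp(): once the first four words are known to be zero, comparing the buffer against itself shifted by four words shows, by induction, that every later byte equals an earlier zero byte. A minimal sketch of that idea as a complete function follows. The name `buffer_is_zero_memcmp`, the `bool` return type, and the alignment and minimum-size preconditions are assumptions made for illustration, not code from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the whole buffer is zero.
 *
 * Assumptions (hypothetical, for this sketch only):
 *   - buf is suitably aligned for unsigned long
 *   - size is at least 4 * sizeof(unsigned long)
 */
bool buffer_is_zero_memcmp(const void *buf, size_t size)
{
    const unsigned long *p = buf;

    assert(size >= 4 * sizeof(unsigned long));

    /* Cheap early exit: most non-zero pages differ in the first words. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * Compare the buffer with itself shifted by four words. The two
     * ranges overlap, which is fine because memcmp() only reads.
     * Equality means byte i + 4 * sizeof(unsigned long) equals byte i
     * for every i, and the first four words are already known to be zero.
     */
    return memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) == 0;
}
```

Whether this beats hand-written SSE2/AVX2 code depends on the libc memcmp() implementation and on how many pages are actually zero, so it is worth measuring on the target host.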
"Li, Liang Z" wrote:
>> On 12/11/2015 10:40, Li, Liang Z wrote:
>> > I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
>> >
>> > I use your new code:
>> > -
>> > unsigned long *p = ...
>> > if (p[0] || p[1] || p[2] ||
> On 12/11/2015 10:40, Li, Liang Z wrote:
> > I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
> >
> > I use your new code:
> > -
> > unsigned long *p = ...
> > if (p[0] || p[1] || p[2] || p[3]
> > || memcmp(
On 12/11/2015 10:40, Li, Liang Z wrote:
> I migrated an idle guest with 8GB of RAM; I think most of its pages are zero pages.
>
> I use your new code:
> -
> unsigned long *p = ...
> if (p[0] || p[1] || p[2] || p[3]
> || memcmp(p+4, p,
> >>> I am very surprised by the live migration performance result
> >>> when I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics
> >>> to check for zero pages.
> >>
> >> What code were you using? Remember I suggested using only unsigned
> >> long checks, like
> >>
> >>unsigned
On 12/11/2015 09:53, Li, Liang Z wrote:
>> On 12/11/2015 03:49, Li, Liang Z wrote:
>>> I am very surprised by the live migration performance result when
>>> I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics to
>>> check for zero pages.
>>
>> What code were you using? Remember I
> On 12/11/2015 03:49, Li, Liang Z wrote:
> > I am very surprised by the live migration performance result when
> > I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics to
> > check for zero pages.
>
> What code were you using? Remember I suggested using only unsigned long
> checks
On 12/11/2015 03:49, Li, Liang Z wrote:
> I am very surprised by the live migration performance result when
> I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics to
> check for zero pages.
What code were you using? Remember I suggested using only unsigned long
checks, like
>
> On 10/11/2015 10:26, Li, Liang Z wrote:
> > I don't know Paolo's opinion on how to deal with the SSE2
> > intrinsics; he is the author. In my view, now that we have
> > found a better way, why use such low-level SSE2/AVX2 intrinsics?
>
> I totally agree. :)
>
> Paolo
Hi Pao
> On 10/11/2015 10:56, Li, Liang Z wrote:
> > > I agree that your patch can be dropped, but go ahead and submit your
> > > improvements!
> >
> > You mean I do this work?
> > If you are busy, I can do this.
>
> It's not that I'm busy, it's that it's your idea. It doesn't matter if I
> (and Peter
On 10/11/2015 10:56, Li, Liang Z wrote:
> > I agree that your patch can be dropped, but go ahead and submit your
> > improvements!
>
> You mean I do this work?
> If you are busy, I can do this.
It's not that I'm busy, it's that it's your idea. It doesn't matter if
I (and Peter Lieven too, act
> On 10/11/2015 10:41, Li, Liang Z wrote:
> >> On 10/11/2015 10:26, Li, Liang Z wrote:
> >>> I don't know Paolo's opinion on how to deal with the SSE2
> >>> intrinsics; he is the author. In my view, now that we
> >>> have found a better way, why use such low-level SSE2/AVX2
> >>> I
On 10/11/2015 10:41, Li, Liang Z wrote:
>> On 10/11/2015 10:26, Li, Liang Z wrote:
>>> I don't know Paolo's opinion on how to deal with the SSE2
>>> intrinsics; he is the author. In my view, now that we
>>> have found a better way, why use such low-level SSE2/AVX2
>>> intrinsics
> On 10/11/2015 10:26, Li, Liang Z wrote:
> > I don't know Paolo's opinion on how to deal with the SSE2
> > intrinsics; he is the author. In my view, now that we have
> > found a better way, why use such low-level SSE2/AVX2 intrinsics?
>
> I totally agree. :)
>
> Paolo
Hi Paolo,
On 10/11/2015 10:26, Li, Liang Z wrote:
> I don't know Paolo's opinion on how to deal with the SSE2
> intrinsics; he is the author. In my view, now that we have
> found a better way, why use such low-level SSE2/AVX2 intrinsics?
I totally agree. :)
Paolo
> > Eric, thanks for your information. I didn't notice that discussion before.
> >
> >
> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
> > length'
> > then wrote a test program to check a large number of zero pages, and
> > used 'time' to record the time taken by diff
On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
>> > length'
>> > then wrote a test program to check a large number of zero pages, and
>> > used 'time' to
>> > record the time taken by different optimizations. Test resul
"Li, Liang Z" wrote:
>> Rather than trying to cater to multiple assembly instruction implementations
>> ourselves, have you tried taking the ideas in this earlier thread?
>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>>
>> Ideally, libc's memcmp() will already be using th
> Rather than trying to cater to multiple assembly instruction implementations
> ourselves, have you tried taking the ideas in this earlier thread?
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>
> Ideally, libc's memcmp() will already be using the most efficient assembly
>
On 11/09/2015 07:51 PM, Liang Li wrote:
> buffer_find_nonzero_offset() is a hot function during live migration.
> It currently uses SSE2 instructions for optimization. On platforms that
> support AVX2, using AVX2 instructions instead can
> improve performance by about 30% compa