> > Eric, thanks for your information. I didn't notice that discussion before.
> >
> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
> > length', then wrote a test program that checks a large number of zero
> > pages, and used 'time' to record the time taken by the different
> > optimizations. The test results are:
> >
> > SSE2:
> > ------------------------------
> >           | test 1 | test 2
> > ------------------------------
> > Time (s): | 13.696 | 13.533
> > ------------------------------
> >
> > AVX2:
> > ------------------------------
> >           | test 1 | test 2
> > ------------------------------
> > Time (s): | 10.583 | 10.306
> > ------------------------------
> >
> > memeqzero4_paolo:
> > ------------------------------
> >           | test 1 | test 2
> > ------------------------------
> > Time (s): |  9.718 |  9.817
> > ------------------------------
> >
> > Paolo's implementation has the best performance. It seems that we can
> > remove the SSE2 related intrinsics.
>
> How should I understand that comment? That you are about to send an email
> to remove the SSE2 support and that I can forget about this patch?
>
> Thanks, Juan.

I don't know Paolo's opinion on how to deal with the SSE2 intrinsics; he is
the author. In my personal view, now that we have found a better way, why
keep using such low-level SSE2/AVX2 intrinsics?

I don't know whether someone else is working on this. If not, and the
relevant maintainer agrees to remove them, I am happy to send out a new
patch. Let's forget my patch for the moment.

Liang
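
For readers following the thread, here is a rough sketch of what a
plain C zero check without SSE2/AVX2 intrinsics can look like. This is not
Paolo's memeqzero4_paolo; the name is_all_zero and the 8-byte word loop are
assumptions made only for illustration, and a real implementation would
likely unroll and combine several words per iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the function benchmarked above):
 * returns true if all len bytes starting at buf are zero.
 */
static bool is_all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t w;

    /* Check 8 bytes at a time; memcpy keeps unaligned reads well defined. */
    while (len >= sizeof(w)) {
        memcpy(&w, p, sizeof(w));
        if (w != 0) {
            return false;
        }
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Check the remaining tail bytes one by one. */
    while (len > 0) {
        if (*p != 0) {
            return false;
        }
        p++;
        len--;
    }
    return true;
}
```

A test program along the lines described above could call such a function
on a large number of zeroed 4096-byte buffers and be run under 'time'.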