On Wed, Jan 21, 2015 at 03:44:23AM +0000, Wang, Zhihong wrote:
> Neil, Bruce,
>
> Some data first.
>
> Sandy Bridge without AVX2:
> 1. original w/ 10 constant memcpy: 2'25"
> 2. patch w/ 12 constant memcpy: 2'41"
> 3. patch w/ 63 constant memcpy: 9'41"
>
> Haswell with AVX2:
> 1. original w/ 10 constant memcpy: 1'57"
> 2. patch w/ 12 constant memcpy: 1'56"
> 3. patch w/ 63 constant memcpy: 3'16"
>
> Also, to address Bruce's question, we have to reduce test case to cut down
> compile time. Because we use:
> 1. intrinsics instead of assembly for better flexibility and can utilize more
> compiler optimization
> 2. complex function body for better performance
> 3. inlining
> This increases compile time.
> But I think it'd be okay to do that as long as we can select a fair set of
> test points.
>
> It'd be great if you could give some suggestion, say, 12 points.
>
> Zhihong (John)
>

Hi Zhihong,

Just for comparison, I've done a clean DPDK compile on my SNB system this
morning. Using parallel make (which is pretty normal, I suspect), I get the
following numbers:

    real    0m52.549s
    user    0m36.034s
    sys     0m10.014s

So total compile time is 52 seconds.

Running a make uninstall and then make install again with "-j 1" gives the
following numbers:

    real    0m32.751s
    user    0m16.041s
    sys     0m7.946s

Obviously, caching effects are being completely ignored by this unscientific
study (rerunning the first test again gives a 13-second time), but the upshot
is that the compile time for DPDK right now is well under a minute in the
normal case. Adding a new file that, in the best case, takes two minutes to
compile is going to increase our compile time many times over.

Regards,
/Bruce