On 04/28/2013 07:14 PM, Cojocaru Alexandru wrote: > On Sun, 28 Apr 2013 15:04:31 +0100 > Pádraig Brady <p...@draigbrady.com> wrote: > >> On 04/28/2013 12:44 PM, Cojocaru Alexandru wrote: >>> Another problem may be the merging and the call to `xrealloc' that >>> we do at the end of `set_fields'. >> >> That's only called at startup so I wouldn't worry too much. >> What was your specific concern here? > The file you used with the benchmarks was quite small, so I was > worring that both the loop used for the merging and the call to > `xrealloc' was affecting too much the benchmarks.
Ah right. I've enough to see relative differences quite easily, but will increase further benchmark sizes if needed. >>>> but with the advantage of disassociating mem usage from range width. >>>> I'll split the patch into two for the mem change and the cpu change, >>>> and might follow up with a subsequent patch to reinstate the bit array >>>> for the common case of small -[bcf] and no --output-delim. >>> My primary goal was to simplify the code. Even if the attached patch >>> dosen't work, I think that detecting small -[bcf] ranges would just make >>> the code more cumbersome. >> >> Yes it's a trade off. For often used tools such as coreutils though, >> it's sometimes worth a little bit extra complexity for performance reasons. >> Here we might be able to guide the compiler around the branches like: >> >> print_kth() >> { >> if likely(bitarray_used) >> ... >> else >> ... >> } > Ok. > >> Anyway I'll wait for your patch before carefully considering >> to reinstate the bit array. > Please, give me some more time. I think that it would be possible to > use the sentinel to speed up things a bit. Sure. > [...] >> Sure. Eagerly waiting the patch :) > Attached. That changes the else to an || I thought gcc would optimize that to the same code. While the assembly generated is a little different, the performance of both is essentially the same. thanks, Pádraig.