Op Sat, 11 Jun 2005, schreef Christian Iversen:
> > With all the discussion about speed of lowercase/uppercase recently, I thought > I'd chip in. I don't have the time to actually implement and test this, but I > hope somebody else will. > > This semi-pseudocode should be a bit faster than what is currently done: > > function lowercase(const value: string): pchar; > var > X,L,C: Integer; > B: Char; > begin > C := length-of(value); > set-length(result, C); > for X := 0 to C/4-1 do > begin > L = PIntegers(value)[X]; > // Char 1/4 > B := char(L and $FF); > if (B in ['A'..'Z']) then > result[x shl 2] := B or $20 else > result[x shl 2] := B; > // Char 2/4 > B := char((L shr 8) and $FF); > if (B in ['A'..'Z']) then > result[(x shl 2)+1] := B or $20 else > result[(x shl 2)+1] := B; > ..etc.. > end; > ..fix remainging 0-3 chars.. > end; > > Basically, a simple loop unroll, with the twist that the fetch is collapsed to > every 4th iteration. This, combined with the dynamic scheduling of modern > CPUs, should increase the speed considerably. Since each iteration is very > collapsable in terms of data-dependencies, this could work out to be quite > fast. > > I could be wrong, of course :-) > > Does somebody want to try this? I expect it would be hard to get any speed advantage using this approach, since the overhead of expressions like char((L shr 8) and $FF) is quite severe. The real penalty is the expression in ['A'..'Z']. This expression needs two branches which both cannot be predicted by the processer. They depend on the input, the processors branch caches don't help here. You get a pipeline stall in 50% of the cases. This is the reason why a branch-less solution like I posted is so much faster. Of course, working with dwords can be a huge advantage, but only if you can process data in parallel or if you are memory bound. At least, this is what I expect and theory and practise don't allways agree. Dani�l _______________________________________________ fpc-devel maillist - [email protected] http://lists.freepascal.org/mailman/listinfo/fpc-devel
