On Sun, Dec 22, 2013 at 2:52 PM, Branko Čibej <br...@wandisco.com> wrote:
> On 22.12.2013 14:16, Stefan Fuhrmann wrote:
> >
> > On Mon, Dec 9, 2013 at 11:01 AM, Branko Čibej <br...@wandisco.com> wrote:
> >
> > > To clarify, the most often used pattern where the initial membuf
> > > size is 0 is when normalizing UTF-8 strings, where we let the
> > > utf8proc code determine how large the allocation has to be, based
> > > on its analysis of the string; the only alternative is to allocate
> > > a far larger buffer than you can ever need, and incidentally making
> > > assumptions about how the normalization is implemented. The extra
> > > allocation you introduced here does not speed anything up; rather
> > > the opposite.
> >
> > It is not an extra allocation. For 0 bytes we simply get a valid pointer
> > but the next allocation will return the same pointer. So, there is no
> > waste.

[Last post to this topic as this is *really* a minor change.]

> How on earth do you know that? Do you have a crystal ball that tells you
> that there will be no intervening allocations from the same pool?

Even if the active block in the pool has been completely
allocated (zero free memory), allocating 0 extra bytes is
free in the current implementation.

> Or another one that tells you what will happen to APR's pool
> implementation in some future version?

I obviously can't tell - except that a major point of the
APR pool design is to be space efficient at the cost of
being unable to de-alloc selectively. If it ever were to
add some per-allocation overhead, its size would still be
small relative to the actual data buffer size.

> (On the other note about apr_palloc taking less time than a mispredicted
> conditional jump ... you're assuming that the apr_palloc code is in the
> L1 instruction cache,

Which it will be in most cases. If it is not, the initial
allocation will prime L1I for the following re-alloc.
In general, SVN has quite high L1I hit rates, i.e. high
temporal code locality.
> and you're assuming that everyone uses Intel Core
> processors

apr_palloc latency is dominated by L1D latency. The latter
is usually subject to the same design forces as pipeline
depth. Even for embedded PPC, 2xL1D latency <= branch
misprediction latency.

> and that everyone uses the same compiler you do. None of
> the above is likely to be true, in general.)

Well, with a good compiler, constant propagation will make
the old special-cased membuf_create() faster than the new one
calling apr_palloc (even if the latter gets a constant prop
code variant as well). The resize code is the place where
we can skip a NULL check.

-- Stefan^2.
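
To make the zero-size argument above concrete, here is a minimal sketch
using a toy bump allocator. It is not APR or Subversion code: toy_pool_t,
toy_palloc, toy_membuf_t, toy_membuf_create and toy_membuf_resize are
hypothetical names, and the 8-byte alignment and the fixed 4096-byte block
are assumptions made for illustration only. It shows the two properties
claimed in the reply: a 0-byte request yields a valid pointer without
consuming pool space, and the resize path can then copy the old contents
without first testing the data pointer for NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a pool: one fixed block and a bump offset.
   Hypothetical; a real pool would chain further blocks. */
typedef struct toy_pool_t {
  char block[4096];
  size_t used;
} toy_pool_t;

/* Round a request up to a multiple of 8 bytes (assumed alignment). */
static size_t toy_align(size_t n)
{
  return (n + 7u) & ~(size_t)7u;
}

/* Bump allocation: hand out the current position, then advance it.
   A 0-byte request returns the current position and advances by 0,
   so the next request gets the same address. */
static void *toy_palloc(toy_pool_t *pool, size_t size)
{
  size_t aligned = toy_align(size);
  void *mem;

  if (aligned > sizeof(pool->block) - pool->used)
    return NULL;               /* toy only: the fixed block is exhausted */

  mem = pool->block + pool->used;
  pool->used += aligned;
  return mem;
}

typedef struct toy_membuf_t {
  toy_pool_t *pool;
  void *data;
  size_t size;
} toy_membuf_t;

/* Create without a size == 0 special case: data is a valid pointer
   even for an empty buffer. */
static void toy_membuf_create(toy_membuf_t *buf, size_t size,
                              toy_pool_t *pool)
{
  buf->pool = pool;
  buf->size = toy_align(size);
  buf->data = toy_palloc(pool, buf->size);
}

/* Grow the buffer, preserving its contents. Returns 1 on success and
   0 if the toy pool is exhausted. buf->data is not tested for NULL
   before memcpy; that would be required if an empty buffer stored
   NULL, because memcpy with a null source is undefined even for a
   length of 0. */
static int toy_membuf_resize(toy_membuf_t *buf, size_t size)
{
  if (size > buf->size)
    {
      size_t new_size = toy_align(size);
      void *fresh = toy_palloc(buf->pool, new_size);

      if (fresh == NULL)       /* only needed because the toy pool is fixed */
        return 0;

      memcpy(fresh, buf->data, buf->size);
      buf->data = fresh;
      buf->size = new_size;
    }
  return 1;
}
```

With a zero-initialized toy_pool_t, toy_membuf_create(&buf, 0, &pool)
leaves pool.used at 0 and buf.data non-NULL. Whether a real pool
implementation keeps that property in future versions is exactly the
point under dispute in this thread; the sketch only models the behaviour
as described for the current one.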