Hi Paul,

> > Here is the generalization of 'strxfrm' to strings with embedded NUL bytes.
>
> Sorry, I didn't really notice this email until just now.  As it happens,
> coreutils has had an memxfrm implementation since 2006, which
> it never exported to gnulib.

And I'm sorry that I overlooked yours in coreutils when I contributed memxfrm
to gnulib in 2009.

> The coreutils memxfrm is closer to how
> strxfrm behaves, in that it does not allocate memory: it relies on the
> caller to do memory allocation.  The signatures differ as follows:
>
>   // coreutils returns number of bytes that were translated,
>   // (or would be translated if there were enough room).
>   // It also sets errno on error.
>   size_t memxfrm (char *restrict dst, size_t dstsize,
>                   char *restrict src, size_t srcsize);
>
>   // gnulib returns pointer to destination, which is possibly-different if
>   // the destination wasn't large enough.  It updates *DSTSIZEPTR to
>   // the newly allocated size, if it allocated storage.  It returns
>   // NULL (setting errno) on error.
>   char *memxfrm (char *src, size_t srcsize, char *dst, size_t *dstsizeptr);

Indeed the algorithm is virtually identical, and the only difference is the
calling convention.

> So I propose that the gnulib memxfrm be renamed to something else, to
> reflect the fact that it allocates memory.  I suggest the name
> "amemxfrm", as a leading "a" is the usual convention for variants that
> allocate memory (e.g., "asprintf").
>
> I guess the coreutils memxfrm could also be migrated into gnulib,
> afterwards.

This approach would make sense if the two functions had different
functionality. But they effectively do the same thing, only with different
calling conventions. Therefore I believe gnulib should have only one of these
functions: either the better of the two, or a version that combines the best
properties of both.

> For coreutils, the coreutils interface is more memory-efficient,
> because malloc is invoked at most once when comparing two lines.
> If the small buffer on the stack isn't large enough to hold the
> translated output for both strings, the two calls to memxfrm will tell
> sort.c exactly how big the buffer should be, and it can invoke malloc
> just once and then invoke memxfrm again (twice) to successfully do the
> translation.
>
> The gnulib interface is more convenient for applications that don't
> care about this sort of memory optimization, and I expect that for
> some (large) cases it is faster because it sometimes avoids translating
> the same chunk twice.  So it's useful as well.

Since you want to let the two functions compete on performance, find attached
a program that exercises both functions on a small string for 3 rounds, then
on a large string for 3 rounds, with 1000 calls in each round. Compiled like
this:

  $ gcc -O2 -Wall coreutils-memxfrm.c gnulib-memxfrm.c compare.c -I. -Drestrict=

I observe timings like this:

  Time for gnulib_memxfrm:    0,036002
  Time for coreutils_memxfrm: 0,036002
  Time for gnulib_memxfrm:    0,036002
  Time for coreutils_memxfrm: 0,036003
  Time for gnulib_memxfrm:    0,032002
  Time for coreutils_memxfrm: 0,036002
  Time for gnulib_memxfrm:    2,65217
  Time for coreutils_memxfrm: 3,45622
  Time for gnulib_memxfrm:    1,97612
  Time for coreutils_memxfrm: 3,42021
  Time for gnulib_memxfrm:    1,98012
  Time for coreutils_memxfrm: 3,42021

This means that when the stack buffer is sufficient (no mallocs needed on
either side), the timings are the same: 36 μsec per call on each side. But
when the stack buffer is not sufficient, the use of coreutils memxfrm is
30% to 70% slower than the use of gnulib memxfrm, with a difference of at
least 700 μsec per call.

You argue that the benefit of coreutils' memxfrm is that it requires one less
malloc. True, but a malloc of 40 KB is much, much cheaper than a call to
memxfrm on 40 KB (think of all the locale-dependent processing that it must
do). To get figures about this, I added an extra strdup + free to the first
loop in compare().
The timings are indistinguishable:

  $ ./a.out
  Time for gnulib_memxfrm:    0,032002
  Time for coreutils_memxfrm: 0,036002
  Time for gnulib_memxfrm:    0,036002
  Time for coreutils_memxfrm: 0,032002
  Time for gnulib_memxfrm:    0,036002
  Time for coreutils_memxfrm: 0,036003
  Time for gnulib_memxfrm:    2,18814
  Time for coreutils_memxfrm: 3,41621
  Time for gnulib_memxfrm:    1,98012
  Time for coreutils_memxfrm: 3,42021
  Time for gnulib_memxfrm:    1,98012
  Time for coreutils_memxfrm: 3,42021

In summary, I think that gnulib memxfrm is faster than coreutils memxfrm. It
is also easier to use: 3 lines of code for gnulib memxfrm vs. 7 lines of code
for coreutils memxfrm. I'd therefore suggest keeping the gnulib one and
having coreutils start to use it (via a modified xmemxfrm wrapper).

Bruno
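
For reference, the two calling conventions compared above can be sketched as follows. This is only a minimal illustration, not the coreutils or gnulib source: the names fixed_xfrm, alloc_xfrm, use_fixed and use_alloc are hypothetical, the locale-dependent transformation is replaced by a toy one that doubles every byte, and storing the result length in *dstsizeptr is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy transformation (an assumption of this sketch): every source byte
   becomes two bytes.  Real code would do locale-dependent work here.  */

/* Caller-allocates convention: fill DST if DSTSIZE suffices and return
   the number of bytes the full result needs.  A real implementation has
   to perform the transformation to learn that size, which is why a
   retry repeats the work.  */
size_t
fixed_xfrm (char *dst, size_t dstsize, const char *src, size_t srcsize)
{
  size_t needed = 2 * srcsize;
  if (needed <= dstsize)
    for (size_t i = 0; i < srcsize; i++)
      dst[2 * i] = dst[2 * i + 1] = src[i];
  return needed;
}

/* Allocating convention: use DST if *DSTSIZEPTR is large enough,
   otherwise malloc a buffer.  Returns the result buffer and stores its
   length in *DSTSIZEPTR, or returns NULL with errno set.  */
char *
alloc_xfrm (const char *src, size_t srcsize, char *dst, size_t *dstsizeptr)
{
  size_t needed = 2 * srcsize;
  char *result = dst;
  if (dst == NULL || *dstsizeptr < needed)
    {
      result = malloc (needed ? needed : 1);
      if (result == NULL)
        {
          errno = ENOMEM;
          return NULL;
        }
    }
  for (size_t i = 0; i < srcsize; i++)
    result[2 * i] = result[2 * i + 1] = src[i];
  *dstsizeptr = needed;
  return result;
}

/* Caller side, first convention: try the stack buffer, and if it is too
   small, allocate the exact size and transform a second time.  */
char *
use_fixed (const char *src, size_t srcsize,
           char *stackbuf, size_t stacksize, size_t *lenp)
{
  char *buf = stackbuf;
  size_t needed = fixed_xfrm (buf, stacksize, src, srcsize);
  if (needed > stacksize)
    {
      buf = malloc (needed);
      if (buf == NULL)
        return NULL;
      needed = fixed_xfrm (buf, needed, src, srcsize);  /* second pass */
    }
  *lenp = needed;
  return buf;
}

/* Caller side, second convention: a single call does everything.  */
char *
use_alloc (const char *src, size_t srcsize,
           char *stackbuf, size_t stacksize, size_t *lenp)
{
  *lenp = stacksize;
  return alloc_xfrm (src, srcsize, stackbuf, lenp);
}
```

In both cases the caller must free the returned buffer when it differs from the stack buffer. The second pass in use_fixed is where the first convention pays for the transformation twice once the stack buffer is too small.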
compare.tar.gz
Description: application/tgz