On Mon, Jan 13, 2014 at 11:19:49AM -0800, Silvan Jegen wrote:
> I have rewritten "tr" to use mmap and the wchar.h functions. It seems
> to be quite slow but as far as I can tell it works reasonably well (at
> least when using a UTF-8 locale). Comments/review and testing welcome
> (I am relatively new to C so beware)!
Thanks for your patch!

> If you think adding this version of "tr" to sbase makes sense I can
> prepare a man page that points out all the shortcomings (e. g. no
> character classes) of this implementation.

Yes, we will need the man page before we can merge this.

> +void
> +handle_escapes(char *s)

We try to avoid underscores when naming things.

> +void
> +parse_mapping(char *set1, char *set2, wchar_t *mappings)
> +{
> +	char *s;
> +	wchar_t runeleft;
> +	wchar_t runeright;
> +	int leftbytes;
> +	int rightbytes;
> +	size_t n = 0;
> +	size_t lset2;
> +
> +	if(set2) {
> +		lset2 = strnlen(set2, 255 * sizeof(wchar_t));
> +	} else {
> +		set2 = (char*) &set1[0];

Here you use `(char*)' but further down you use `(const char *)'. Pick
one style and use it consistently.

> +		lset2 = 0;
> +	}
> +
> +	s = set1;
> +	while(*s) {
> +		if(*s == '\\') {
> +			handle_escapes(++s);
> +		}
> +
> +		leftbytes = mbtowc(&runeleft, (const char *) s, 4);
> +		if(*(set2 + n))

Write set2[n] instead of *(set2 + n). It is also worth comparing
explicitly against '\0', i.e. `if (set2[n] != '\0')'. It is a good idea
to use `if (p) { ... }' or `if (!p) { ... }' only when p is a pointer.

> +int
> +main(int argc, char *argv[])
> +{
> +	wchar_t *mappings;
> +	char *buf = NULL;
> +	size_t size = 0;
> +	void (*mapfunc) (const wchar_t*, char*);
> +
> +	setlocale(LC_ALL, "");
> +
> +	mappings = (wchar_t *) mmap(NULL, 0x110000 * sizeof(wchar_t), PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

Break this line into two lines.

> +
> +	ARGBEGIN {
> +	default:
> +		usage();
> +	} ARGEND;
> +
> +	if(!argc)

Same as above: argc is not a pointer, so write `if (argc == 0)'.

> +	if (ferror(stdin)) {
> +		eprintf("<stdin>: read error:");
> +		return EXIT_FAILURE;

There is no need for `return EXIT_FAILURE': eprintf() already exits.
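
To illustrate the naming and comparison points above, here is a rough
sketch of the mapping setup. It is not sbase code and not your patch.
The names parsemapping(), translate(), newmap() and NCODEPOINTS are made
up for this example. It uses malloc() instead of mmap(), and it swaps
mbtowc() for mbrtowc() with one mbstate_t per string, so the two sets do
not share mbtowc()'s hidden conversion state. It assumes wchar_t values
are Unicode code points (true on glibc and musl, not on Windows) and
does not handle escapes, ranges, an empty set2 or -d.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* One slot per code point U+0000..U+10FFFF, like the patch's table.
 * NCODEPOINTS is a hypothetical name for this sketch. */
#define NCODEPOINTS 0x110000

/* Hypothetical helper: allocate the table (the patch uses mmap()). */
static wchar_t *
newmap(void)
{
	return malloc(NCODEPOINTS * sizeof(wchar_t));
}

/* Hypothetical helper: map[c] becomes the replacement for c. Code points
 * not in set1 map to themselves; if set2 is shorter than set1, its last
 * character is reused. Returns -1 on an empty set2 or invalid input. */
static int
parsemapping(const char *set1, const char *set2, wchar_t *map)
{
	mbstate_t st1, st2;
	size_t len1 = strlen(set1), len2 = strlen(set2), n, i;
	wchar_t l, r = 0;

	assert(map != NULL);
	if (len2 == 0)
		return -1; /* empty set2 is not handled in this sketch */
	memset(&st1, 0, sizeof(st1));
	memset(&st2, 0, sizeof(st2));
	for (i = 0; i < NCODEPOINTS; i++)
		map[i] = (wchar_t)i;

	while (len1 != 0) {
		n = mbrtowc(&l, set1, len1, &st1);
		if (n == (size_t)-1 || n == (size_t)-2)
			return -1;
		set1 += n;
		len1 -= n;
		/* explicit test: len2 is a count, not a pointer */
		if (len2 != 0) {
			n = mbrtowc(&r, set2, len2, &st2);
			if (n == (size_t)-1 || n == (size_t)-2)
				return -1;
			set2 += n;
			len2 -= n;
		}
		if ((unsigned long)l < NCODEPOINTS)
			map[l] = r;
	}
	return 0;
}

/* Hypothetical helper: translate the multibyte string in through map into
 * out (outsize bytes, NUL-terminated). Returns -1 on invalid input or if
 * out is too small. */
static int
translate(const char *in, const wchar_t *map, char *out, size_t outsize)
{
	mbstate_t ist, ost;
	size_t len = strlen(in), n, w;
	char tmp[MB_LEN_MAX];
	wchar_t wc;

	memset(&ist, 0, sizeof(ist));
	memset(&ost, 0, sizeof(ost));
	while (len != 0) {
		n = mbrtowc(&wc, in, len, &ist);
		if (n == (size_t)-1 || n == (size_t)-2)
			return -1;
		in += n;
		len -= n;
		if ((unsigned long)wc < NCODEPOINTS)
			wc = map[wc];
		w = wcrtomb(tmp, wc, &ost);
		/* keep one byte free for the terminating NUL */
		if (w == (size_t)-1 || w >= outsize)
			return -1;
		memcpy(out, tmp, w);
		out += w;
		outsize -= w;
	}
	*out = '\0';
	return 0;
}
```

With set1 "abc" and set2 "xyz", translate() turns "aabbcc-d" into
"xxyyzz-d". Note that the table costs 0x110000 * sizeof(wchar_t) bytes
(about 4.4 MB with a 4-byte wchar_t) no matter how many characters are
mapped, which is the same trade-off your mmap() call makes.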