On Tue, Nov 26, 2013 at 12:01:01PM -0800, Silvan Jegen wrote:
> Hi
>
> This is a braindead and incomplete implementation of tr that only
> works for one-byte encodings. Do you think it makes sense to use this
> implementation as some kind of stopgap-measure until we have a more
> robust version of tr?

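For context, a one-byte tr is essentially a 256-entry lookup table, which is why it cannot handle UTF-8: a multibyte character spans several table slots. A rough sketch of the idea follows. The names build_map and translate are made up for illustration and are not taken from the patch, and reusing the last byte of set2 when it is shorter than set1 is an assumption of this sketch (implementations differ on that case).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the patch: fill a 256-entry table that
 * maps each byte of set1 to the byte at the same position in set2.
 * Assumption: if set2 is shorter, its last byte is reused. */
static void
build_map(unsigned char map[256], const char *set1, const char *set2)
{
	size_t i, len2 = strlen(set2);

	/* Every byte maps to itself unless set1 says otherwise. */
	for (i = 0; i < 256; i++)
		map[i] = (unsigned char)i;
	if (len2 == 0)
		return;
	for (i = 0; set1[i] != '\0'; i++)
		map[(unsigned char)set1[i]] =
		    (unsigned char)set2[i < len2 ? i : len2 - 1];
}

/* Hypothetical helper: translate n bytes of buf in place. */
static void
translate(const unsigned char map[256], char *buf, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		buf[i] = (char)map[(unsigned char)buf[i]];
}
```

Calling build_map(map, "el", "ip") and then translate() on a buffer replaces every 'e' with 'i' and every 'l' with 'p', one byte at a time.
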
This particular version of the patch does not introduce a manpage, which
would be necessary to document the limited behaviour of the current
program.

I am starting to wonder: do you guys think it would make sense to have a
staging branch that we can use for incomplete tools? Currently some of
the tools implement only a subset of the total behaviour, but I'd like
to believe that they implement that subset correctly. As long as we
document that, they can go in master, with possible
eprintf("not implemented"); calls for the options that we care about.
Programs that are obviously buggy can go in the staging branch.

> If you would rather not take this version, what approach would
> you take for the character set mapping when using UTF-8? A hashmap-,
> or B-tree-based solution or something else entirely?

I am not knowledgeable enough about UTF-8, so I can't answer this. A
B-tree is, I think, overkill for sbase. We do not have a nice
implementation of a hash table in sbase because we have not needed one,
but if we go down that path it makes sense to put it in util/ so other
programs can benefit. Currently we don't have a reusable implementation
of a singly linked list either, but that is trivial enough and we've
re-implemented it wherever needed (with the minimum set of operations
needed for each tool).

I can send an implementation of a hash table that I've used for my own
programs. It is MIT/X licensed and simple enough.

Regarding UTF-8, some other programs in sbase also lack proper handling
of UTF-8. Do you think we could embed libutf8 from suckless.org and use
it?

> +usage(void)
> +{
> +        eprintf("usage: tr set1 [set2]\n");
> +}

Use %s and argv0 here, i.e.
eprintf("usage: %s set1 [set2]\n", argv0);

> +void
> +handle_escapes(char *s)
> +{
> +        switch(*s) {
> +        case 'n':
> +                *s = '\x0A';
> +                break;
> +        case 't':
> +                *s = '\x09';
> +                break;
> +        case '\\':
> +                *s = '\x5c';
> +                break;
> +        }
> +}

I have not yet applied this patch but I suspect you have mixed
whitespace + tabs here. Use tabs only.

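To illustrate the tabs-only layout, here is a minimal sketch of the same escape handling. The name unescape is made up for illustration, and returning the value instead of modifying the string in place is a change of mine, not something from the patch. It also uses character literals in place of the hex escapes, which are equivalent on ASCII systems.

```c
/* Hypothetical variant of handle_escapes from the patch: given the
 * character that follows a backslash, return the byte it stands for.
 * Characters that are not recognised are returned unchanged. */
static char
unescape(char c)
{
	switch (c) {
	case 'n':
		return '\n';
	case 't':
		return '\t';
	case '\\':
		return '\\';
	default:
		return c;
	}
}
```

Every indentation level above is a single tab, with the case labels at the same level as the switch.
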
> +        if (ferror(stdin)) {
> +                eprintf("<stdin>: read error:");
> +                return EXIT_FAILURE;
> +        }

Indentation issues.

I'll have a look at the rest of the code once I have some time today.

Thanks,
sin