On 21 September 2010 01:07, Bruno Haible <br...@clisp.org> wrote:
> Reuben,
>
>> Heh. My point precisely: 3 functions and 50 lines versus 1 flag and 5
>> lines (RE_PLAIN) to solve the same problem
>
> I agree that if we had the opportunity to invent regex APIs from scratch
> now, all 4 syntaxes (literals, wildcards, basic regular expression, extended
> regular expression) would be worth supporting equally.
>
> But the fact is that POSIX standardizes the regex API, and therefore there
> is a border between "in glibc" and "outside glibc". Functionality in glibc
> is available at no cost; functionality outside glibc requires additional
> link options and increased startup time or a 50KB bigger executable.

Equally, libc APIs such as the crappy standard string handling functions waste my time on a daily basis, whereas APIs from other libraries save it. C does make linking harder than newer languages do, but link options and increased startup time and/or a bigger executable have never put me off (the penalties are tiny on modern machines), although I do spend time considering the licensing and portability of the libraries I use. Indeed, libc's general weakness in so many areas means I consider third-party libraries much more often than in other languages. (glibc is at least better than many libcs in this respect, by covering a lot more ground, though it is a pity that it carries so much non-standard crud just for backwards binary compatibility.)

In this particular case, the point is somewhat moot: GNU regex is still not synced with glibc, many applications continue to use internal copies unconditionally (though, thanks to hard work by GNU developers, most GNU programs now use gnulib), and tons of other applications use other regex libraries altogether. So I am not really making things much worse by proposing extensions to the POSIX API, and indeed I am leaving the door open to make things better: the chances of any other C regex API ever being standardised are practically zero, so applications using non-POSIX APIs are always going to suffer the penalty of an external library, whereas API- and ABI-compatible extensions at least have a chance of one day being added to the standard.

Not to mention the big picture: the vast majority of C apps these days use either POSIX or PCRE regex APIs. On most GNU systems, there will today still be plenty of apps in which POSIX regexes are compiled in statically via GNU regex (old glibcs and/or old apps). My suggestions aim at a state of affairs in which, in a few years, things are much the same but the application code is cleaner (and there are not lots of statically-linked "quote_regexp" functions).
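
To make the "quote_regexp" workaround concrete, here is a minimal sketch of the kind of helper an application ends up carrying when it wants a literal match through the unextended POSIX API. The names quote_regexp and contains_literal are hypothetical and are not taken from any particular program; the set of escaped characters is the POSIX ERE special-character list, and the proposed RE_PLAIN flag is deliberately not used because it is not part of the standard API.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of POSIX or glibc): return a malloc'd copy
   of `s` in which every POSIX ERE special character is backslash-escaped,
   so that the result matches `s` literally under REG_EXTENDED.
   Returns NULL on allocation failure; the caller frees the result. */
static char *quote_regexp(const char *s)
{
    static const char special[] = ".[\\()*+?{|^$";
    assert(s != NULL);
    char *out = malloc(2 * strlen(s) + 1);  /* worst case: all escaped */
    if (out == NULL)
        return NULL;
    char *p = out;
    for (; *s != '\0'; s++) {
        if (strchr(special, *s) != NULL)
            *p++ = '\\';
        *p++ = *s;
    }
    *p = '\0';
    return out;
}

/* Hypothetical wrapper: 1 if `needle` occurs literally in `haystack`,
   0 if it does not, -1 on error. */
static int contains_literal(const char *haystack, const char *needle)
{
    char *quoted = quote_regexp(needle);
    if (quoted == NULL)
        return -1;
    regex_t re;
    int rc = regcomp(&re, quoted, REG_EXTENDED | REG_NOSUB);
    free(quoted);
    if (rc != 0)
        return -1;
    rc = regexec(&re, haystack, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With this in place, contains_literal("1+1=2", "1+1") finds the literal text, whereas passing "1+1" straight to regcomp would treat the "+" as a repetition operator. Every application that needs this has to ship its own copy of such a helper, which is the duplication a flag on regcomp would avoid.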
And then, a few years later, the changes get into glibc and the statically linked copies disappear. Not to mention that plenty of mature programs won't want any of my extensions, and therefore will not need statically-linked POSIX regex.

The fundamental point is this: the two scenarios are pretty much equal with respect to time and space overhead, but evolving standard APIs over time wins hands down when it comes to improving things for application programmers and reducing application code. Developer time is a much more important resource than machine cycles or bytes, and that is not to disrespect users, because the time developers save on coding can be spent on useful optimisation.

(I am also rather alarmed at the way that gnulib seems to be growing without bound; it should be making bits of itself redundant just as fast as it can, so that it's the glue between the near past and the near future, not just another big ball of wax that keeps accreting.)

-- 
http://rrt.sc3d.org