Hello all,
I recently had a discussion with rdivacky@ about the status of a few
tools: bc, dc, grep, sort and iconv. He persuaded me to
write a summary here in case someone else is interested in contributing
to them, so here it is.
BSD bc/dc will come just after 8.0-RELEASE. They are quite mature and
delphij@ offered to help me get them into the tree by reviewing and
approving my changes (I only have doc/ports bits).
BSD grep is also quite mature; I fixed the last critical bug
recently. My only concern is performance. GNU grep is fast but has ~8
KSLOC. BSD grep is slightly slower but has only ~1.5 KSLOC. That is a huge
difference in complexity. GNU grep is very hard to read, but it uses
a lot of custom optimizations to get this performance. I think we should
go another way and have a well-optimized and mature regex library. The
current one is very old, doesn't have wchar support, is slow as
hell and doesn't support the custom GNU extensions, which are unfortunately
necessary to maintain compatibility (e.g. "(a|)" is considered invalid
in strict POSIX regex but GNU accepts it!). Because of this, BSD grep is
linked to the GNU regex library at the moment, but because of the custom
magic in GNU grep, it's still a bit slower. If we can live with this slight
performance hit, I think we can commit it, because it's quite
feature-complete. You know, I'm a beginner but I think that the code of
BSD grep is so tiny and simple that there is almost no way
to optimize it further by simplifying the code, so I think further
optimization should be done in the regex library. As for the regex
library, NetBSD's SoC project is worth a look. I'm interested in this
but I have too many things in the queue to start another one...
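
The "(a|)" compatibility point above is easy to check against whatever regex
library a system links by default. The following is only a minimal sketch
using the standard regcomp(3) interface; try_compile() is a hypothetical
helper name and is not part of BSD grep.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * try_compile() is a hypothetical helper, not code from BSD grep.
 * It returns 0 if the pattern compiles as a POSIX extended regex with
 * the system's regcomp(3), or the nonzero regcomp() error code otherwise.
 */
int
try_compile(const char *pattern)
{
	regex_t re;
	int rc;

	assert(pattern != NULL);
	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc == 0)
		regfree(&re);	/* only a successfully compiled regex is freed */
	return rc;
}
```

Calling try_compile("(a|)") shows how the local library treats the empty
alternative: a GNU-compatible library is expected to return 0, while a
strict one should return an error code that regerror(3) can turn into a
message.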
As for sort, it isn't so mature yet. I've just made a TODO list of the
known missing features or bugs:
- sometimes it segfaults when reading huge files
- the -k option isn't implemented yet
- the -n option doesn't work correctly
- preproc() optimization (I don't know what it refers to actually, but I had
it on my previous TODO list; I will have to check)
- polishing man page
- adding some more test cases to the regression test
- checking performance (in this case it really matters, because sorting
is at heart an algorithmic problem and I'm not an algorithmic guru. This
version of sort was written by me from scratch, since the OpenBSD one
isn't wchar-clean and can't be fixed by design. This sort is much
smaller, but it seems the algorithm isn't optimal.)
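
To make the -n item in the list above concrete, here is a rough sketch of a
wchar-based numeric comparison. It is not taken from the sort sources:
numeric_cmp() and sort_numeric() are hypothetical names, and wcstod(3) is a
simplification because it also accepts exponents and hex floats, which a
real -n should not treat as numbers.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical comparator, not taken from the sort sources: orders two
 * wide-character lines by their leading numeric value. Lines that do
 * not start with a number parse as 0.
 */
int
numeric_cmp(const void *a, const void *b)
{
	const wchar_t *la = *(const wchar_t * const *)a;
	const wchar_t *lb = *(const wchar_t * const *)b;
	double na, nb;

	assert(la != NULL && lb != NULL);
	na = wcstod(la, NULL);
	nb = wcstod(lb, NULL);
	if (na < nb)
		return -1;
	if (na > nb)
		return 1;
	return wcscmp(la, lb);	/* tie-break so the result is deterministic */
}

/* Sort an array of n wide-character lines in place with qsort(3). */
void
sort_numeric(const wchar_t **lines, size_t n)
{
	qsort(lines, n, sizeof(*lines), numeric_cmp);
}
```

With the lines "10", "9" and "2.5" this yields 2.5, 9, 10, whereas a plain
wcscmp() ordering would put "10" first. The sketch says nothing about the
performance question, since qsort(3) does not promise a particular
algorithm.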
As for iconv, I'll keep working on it as part of my BSc thesis. The forward
(foo -> UTF-32) conversions are almost completely GNU-compatible, the reverse
ones not so much. GNU has optional transliteration, while BSD iconv
uses it by default, so I compared the output to GNU's transliterated
output, and GNU has some more advanced mappings for this. Apart from
this, almost all encodings that we have in locale(1) charmaps are
supported, but the Big5 module segfaults. I hope I'll be able to solve
these issues and check performance as part of my BSc thesis.
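
For anyone who wants to compare the forward conversions themselves, a
minimal sketch using the standard iconv(3) API might look like the
following. to_utf32le() is a hypothetical helper, and picking "UTF-32LE"
as the target is an assumption made here to fix the byte order and avoid
a byte order mark in the output.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * to_utf32le() is a hypothetical helper. It converts the NUL-terminated
 * string "in", encoded as "from", into UTF-32LE and writes at most
 * outsize bytes to "out". Returns the number of bytes written, or
 * (size_t)-1 on any error.
 */
size_t
to_utf32le(const char *from, const char *in, char *out, size_t outsize)
{
	iconv_t cd;
	char *inp = (char *)in;	/* iconv(3) takes char ** even for input */
	char *outp = out;
	size_t inleft, outleft = outsize, rc;

	assert(from != NULL && in != NULL && out != NULL);
	inleft = strlen(in);
	cd = iconv_open("UTF-32LE", from);
	if (cd == (iconv_t)-1)
		return (size_t)-1;	/* unknown or unsupported encoding pair */
	rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	iconv_close(cd);
	if (rc == (size_t)-1)
		return (size_t)-1;	/* invalid input or output buffer too small */
	return outsize - outleft;
}
```

Running the same input through two iconv implementations and comparing the
output buffers byte by byte is one simple way to spot mapping differences.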
Regards,
--
Gabor Kovesdan
FreeBSD Volunteer
EMAIL: ga...@freebsd.org .:|:. ga...@kovesdan.org
WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org