On Tue, 3 Feb 2015 15:18:59 -0800 Evan Gates <evan.ga...@gmail.com> wrote:
Hey Evan,

> I finally have a new sed implementation. It's littered with FIXMEs and
> there are some points that need to be discussed, but for the most part
> it works like it should. It definitely has some work until it sucks
> less.

That sounds very cool! sed(1) is a big point on the TODO list and I'm
glad you sat down and worked on it! Let me get to your points:

> 1) When should we choose POSIX behavior and when should we choose GNU
> behavior? There are a number of differences, many times the GNU
> behavior seems to make more sense. (no final newline,
> semicolon/whitespace terminated labels and filenames, etc.)

Sometimes, going the third way is best: make it as flexible as
possible. For instance, if a delimiter is limited to one char,
implement it in a way that allows arbitrary strings (this could be
improved here).
GNU behaviour can be erratic in many cases and, on reflection, the
POSIX people mostly have good reasons to make software behave as it
does. In some cases, POSIX is not suckless though.
Can you be more specific and list the cases where GNU differs from
POSIX? I am absolutely sure this has to be decided on a case-by-case
basis.

> 2) Should we strictly enforce valid UTF-8? In the script? In the
> input? Currently it's enforced in a few places of the script because
> that made it easier for me, but it's not enforced in the input file.

Use chartorunearray instead of hand-rolling it. Having a Rune array can
make things simpler.
Also, readrune already deals with invalid UTF-8 in such a way that
partial reads are returned with RuneError.

> 3) Pending a resolution on (2), should we allow nul bytes in the
> input? Currently I'm using libc's string functions so nul bytes cause
> bad things to happen. If we decide to support nul bytes it'll be a
> rather large change.

I'm not an expert in this area. Can anybody give a reason to support
nul bytes? If there are benefits, it should be supported.
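To make the nul byte problem from (3) concrete, here is a minimal
sketch of why libc's string functions lose data and what a
length-carrying alternative could look like. The struct span type and
both function names are made up for illustration; they are not taken
from sbase or from the sed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying buffer; not a type from sbase. */
struct span {
	const char *data;
	size_t len;
};

/* Length as the libc string functions see it: stops at the first nul. */
static size_t
cstr_len(const char *s)
{
	assert(s != NULL);
	return strlen(s);
}

/*
 * Find byte c in a span using memchr(3), which takes an explicit
 * length and therefore does not stop at nul bytes. Returns the offset
 * of the first match, or sp.len if c does not occur.
 */
static size_t
span_find(struct span sp, int c)
{
	const char *p = memchr(sp.data, c, sp.len);

	return p ? (size_t)(p - sp.data) : sp.len;
}
```

With a six-byte input line such as "ab\0cd\n", cstr_len() only sees
the first two bytes and strchr() never finds the newline, while
span_find() locates it at offset 5. Supporting nul bytes would mean
moving every str*() call to this kind of explicit-length handling,
which is likely why it would be a rather large change.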
GNU bullshit should never be a reference for a _lack_ of features, as
seen in many other cases.

> 4) Which extensions over POSIX should be implemented? (\t for tab in
> regex and s replacement text? etc.)

Look at unescape(). POSIX defines escaped characters rather
inconsistently across the base, which is not good.

Use util functions where possible and try to keep the code as concise
as possible.
I have not tested it yet, but if you say it works as of now, nothing
speaks against pulling it into sbase.

Keep up the great work!

Cheers

FRIGN

-- 
FRIGN <d...@frign.de>