On May 9, 6:40 pm, "Nathan Harmston" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've been playing with the CSV module for parsing a few files. A row
> in a file looks like this:
>
> some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n
>
> so the lineterminator is \t\n and the delimiter is \t|\t, however when
> I subclass Dialect and try to set delimiter to "\t|\t" it says
> delimiter can only be a character.
>
> I know it's an easy fix to just do .strip("\t") on the output I get,
> but I was wondering
> a) if there's a better way of doing this when the file is actually
> being parsed by the csv module

No; usually one would want at least to do .strip() on each field anyway
to remove *all* leading and trailing whitespace. Replacing multiple
whitespace characters with one space is often a good idea. One may want
to get fancier and ensure that NO-BREAK SPACE, aka &nbsp; (\xA0 in many
encodings), is treated as whitespace. So your gloriously redundant tabs
vanish, for free.

> b) Why are delimiters only allowed to be one character in length.

Speed. The reader is a hand-crafted finite-state machine designed to
operate on a byte at a time. Allowing for variable-length delimiters
would increase the complexity and lower the speed -- for what gain? How
often does one see 2-byte or 3-byte delimiters?
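
A minimal sketch of the approach described above, not a definitive
implementation: let the csv reader split on the single character "|" and
tidy each field afterwards. It assumes Python 3, where str.split() with
no arguments already treats \xA0 as whitespace; the helper names clean
and read_rows are made up for illustration.

```python
import csv


def clean(field):
    # Hypothetical helper: split() with no arguments splits on any run of
    # whitespace (in Python 3 this includes NO-BREAK SPACE, \xa0) and drops
    # leading/trailing whitespace; joining with " " collapses each run to
    # a single space.
    return " ".join(field.split())


def read_rows(lines):
    # Hypothetical helper: the delimiter must be one character, so use "|"
    # and let clean() remove the redundant tabs around every field.
    for row in csv.reader(lines, delimiter="|"):
        yield [clean(field) for field in row]


if __name__ == "__main__":
    sample = ["some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n"]
    for fields in read_rows(sample):
        print(fields)
    # prints ['some_id', 'some_data', 'some_more_data', 'last_data']
```

read_rows accepts any iterable of lines, so an open file object (opened
with newline="") can be passed in place of the sample list.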