csv module
Hello,

I'm using the standard csv module under Python 2.3 / 2.4 to write a tab-delimited file, using the "excel-tab" dialect. To read the file back, I pass a sample of the data to csv.Sniffer in order to detect the dialect.

The problem is that the detected dialect is wrong: the sniffer returns an empty string as the delimiter. It is probably a bug in the _guess_delimiter() method. The message I get is:

    TypeError: bad argument type for built-in operation

Do you know a way to sniff tab-delimited data? Is this a known bug?

Bye.
-- http://mail.python.org/mailman/listinfo/python-list
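One way to keep the sniffer from settling on the wrong character is to restrict its candidates through the optional delimiters argument of sniff(). The following is a minimal sketch, assuming a current Python 3 interpreter (behaviour under Python 2.3 / 2.4 may differ); the sample string and the variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: two tab-delimited rows with Windows line endings.
sample = "aaa\tbbb\r\nAAA\tBBB\r\n"

# Only a tab is accepted as a candidate delimiter.
dialect = csv.Sniffer().sniff(sample, delimiters="\t")

# Parse the same data with the detected dialect.
# newline="" leaves the line endings untouched for the csv reader.
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
```

If the file is known to be tab-delimited, the sniffing step can also be skipped altogether by reading it with the same "excel-tab" dialect that was used to write it.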
csv.Sniffer: wrong detection of the end of line delimiter
Hello,

I'm using the standard csv module under Python 2.3 / 2.4 to read a CSV file. The file is opened in binary mode, so the end-of-line terminator is preserved.

csv.Sniffer appears to force the line terminator to '\r\n'. That is fine under Windows but wrong under Linux or Macintosh.

More about this line terminator: there is a potential bug in the _guess_delimiter() method. Its first line of code splits the data incorrectly:

    data = filter(None, data.split('\n'))

It does not take the real line terminator into account!

Here is a patch (not a perfect one):

# --- begin of patch ---

import csv

class PatchedSniffer(csv.Sniffer):

    def __init__(self):
        csv.Sniffer.__init__(self)

    def sniff(self, p_data, p_delimiters=None):
        # Let the standard sniffer work, then fix the line terminator.
        t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
        t_dialect.lineterminator = self._guessLineTerminator(p_data)
        return t_dialect

    def _guessLineTerminator(self, p_data):
        # '\r\n' is tested first so that it is not mistaken for '\n' or '\r'.
        for t_lineTerminator in ['\r\n', '\n', '\r']:
            if t_lineTerminator in p_data:
                return t_lineTerminator
        else:
            return '\r\n'  # Windows default (Excel)

    def _formatDataForGuess(self, p_data):
        # Normalize to '\n', which is what _guess_delimiter() splits on.
        t_lineTerminator = self._guessLineTerminator(p_data)
        return '\n'.join(p_data.split(t_lineTerminator))

    def _guess_delimiter(self, p_data, p_delimiters):
        t_data = self._formatDataForGuess(p_data)
        (t_delimiter, t_skipInitialSpace) = \
            csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)
        # Fall back to a tab if nothing was detected but the data contains one.
        if t_delimiter == '' and '\t' in p_data:
            t_delimiter = '\t'
        return (t_delimiter, t_skipInitialSpace)

# --- end of patch ---

Bye.
--- Laurent.
Re: csv module
Sorry, here is my example:

Python 2.3.1 (#1, Sep 29 2003, 15:42:58)
[GCC 2.96 2731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> t_sniffer = csv.Sniffer()
>>> t_data = "aaa\tbbb\r\n\r\nAAA\tBBB\r\n"
>>> t_dialect = t_sniffer.sniff(t_data)
>>> t_dialect.delimiter
''

In fact, I found the problem (thanks to you): I had added an extra newline '\r\n' to separate the header from the records...
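The empty line between the header and the records is what confuses the delimiter guess. A possible workaround, sketched here under the assumption of a current Python 3 interpreter, is to drop such lines from the sample before sniffing. The helper name sniff_without_blank_lines is hypothetical and not part of the csv module.

```python
import csv

def sniff_without_blank_lines(sample, delimiters=None):
    """Hypothetical helper: sniff a sample after removing empty lines."""
    # splitlines(True) keeps each line's own terminator, so the
    # remaining lines can be joined back together unchanged.
    lines = [line for line in sample.splitlines(True)
             if line.strip("\r\n")]  # drop lines holding only a terminator
    return csv.Sniffer().sniff("".join(lines), delimiters)

# Same data as in the session above, with the blank separator line.
t_data = "aaa\tbbb\r\n\r\nAAA\tBBB\r\n"
t_dialect = sniff_without_blank_lines(t_data, delimiters="\t")
```

Only the sample handed to the sniffer is filtered; the file itself is left as it is.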
Re: csv module
In fact, there is another bug.

In my CSV file, every record ends with a trailing tab '\t' because the last field is always empty; only the header does not. For example, I get:

>>> import csv
>>> t_sniffer = csv.Sniffer()
>>> t_data = "aaa\tbbb\r\nAAA\t\r\nBBB\t\r\n"
>>> t_dialect = t_sniffer.sniff(t_data)
>>> t_dialect.delimiter
''

This happens in the _guess_delimiter() method while the frequency tables are built. Each line is stripped before counting (why??). If I change:

    freq = line.strip().count(char)

to:

    freq = line.count(char)

it works fine.

Do you have a workaround for that?

--- Laurent.
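The effect of the strip() call can be reproduced without the csv module. str.strip() removes leading and trailing whitespace, and a tab counts as whitespace, so the trailing tab of each record disappears before it is counted. A small illustration on the sample data from this message (the variable names are for illustration only):

```python
# The header has two fields; each record ends with a trailing tab.
t_data = "aaa\tbbb\r\nAAA\t\r\nBBB\t\r\n"
lines = [line for line in t_data.split("\n") if line]

# Tab count per line when each line is stripped first.
stripped_counts = [line.strip().count("\t") for line in lines]

# Tab count per line when the line is left untouched.
raw_counts = [line.count("\t") for line in lines]

print(stripped_counts)  # [1, 0, 0]
print(raw_counts)       # [1, 1, 1]
```

With the stripped counts, the tab no longer occurs the same number of times on every line, which would explain why no delimiter is found.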