csv module

2005-12-28 Thread Laurent Laporte
hello,

I'm using the standard csv module under Python 2.3 / 2.4 to write a
tab-delimited file. I use the "excel-tab" dialect to do that.

To read my CSV file, I chose to 'sniff' a data sample in order to
get the dialect.
The problem is that I get a wrong dialect: the sniffer returns an
empty string as the delimiter. It is probably a bug in the
_guess_delimiter() method.

The message I obtain is:
TypeError: bad argument type for built-in operation

Do you know a way to sniff tab-delimited data?
Is it a known bug?
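
One possible way around the question, as a minimal sketch: since the file is written with the "excel-tab" dialect, the reader can be given the same dialect explicitly instead of sniffing it. The sketch uses Python 3 syntax and an in-memory buffer in place of a real file; the helper names write_rows and read_rows are made up for illustration.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows as tab-delimited text using the excel-tab dialect."""
    # newline="" leaves the line endings entirely to the csv module.
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect="excel-tab").writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse tab-delimited text with the dialect named explicitly (no sniffing)."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect="excel-tab"))


rows = [["aaa", "bbb"], ["AAA", "BBB"]]
text = write_rows(rows)
print(read_rows(text))
```

This only applies when the format of the file is known in advance; it does not help with files of unknown origin.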

Bye.

-- 
http://mail.python.org/mailman/listinfo/python-list


csv.Sniffer: wrong detection of the end of line delimiter

2005-12-28 Thread Laurent Laporte
hello,

I'm using the standard csv module under Python 2.3 / 2.4 to read a CSV
file. The file is opened in binary mode, so I keep the end-of-line
terminator.

It appears that csv.Sniffer forces the line terminator to be
'\r\n'. That is fine under Windows but wrong under Linux or
Macintosh.
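
A minimal sketch of one workaround, in Python 3 syntax: the lineterminator of the sniffed dialect can be overwritten before the dialect is used for writing. The sample data and the choice of '\n' are assumptions for illustration. According to the csv documentation, the reader recognises '\r' or '\n' as end-of-line and ignores lineterminator, so the attribute matters for the writer only.

```python
import csv
import io

sample = "a,b,c\n1,2,3\n4,5,6\n"  # hypothetical Unix-style sample

dialect = csv.Sniffer().sniff(sample)
sniffed_terminator = dialect.lineterminator  # value reported by the sniffer

# Override the terminator so that rows written with this dialect end in '\n'.
dialect.lineterminator = "\n"

out = io.StringIO(newline="")
csv.writer(out, dialect=dialect).writerow(["x", "y", "z"])
print(repr(sniffed_terminator), repr(out.getvalue()))
```

Passing lineterminator="\n" as a keyword argument to csv.writer() is an alternative that leaves the sniffed dialect untouched.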

More about this line terminator: there is a potential bug in the
_guess_delimiter() method.
Its first line of code splits the data incorrectly:
data = filter(None, data.split('\n'))
It does not take the real line terminator into account!
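
To illustrate the splitting problem, here is a small sketch in Python 3 syntax. The sample string is made up; it uses bare '\r' line endings, the case where splitting on '\n' fails most visibly.

```python
sample = "aaa\tbbb\rAAA\tBBB\r"  # hypothetical sample with '\r' line endings

# Splitting on '\n' only: no '\n' is present, so everything stays in one "line".
naive_lines = [line for line in sample.split("\n") if line]

# str.splitlines() treats '\r\n', '\n' and '\r' (among others) as line boundaries.
real_lines = [line for line in sample.splitlines() if line]

print(len(naive_lines), len(real_lines))
```

With '\r\n' endings the split on '\n' does find the lines, but each one keeps a trailing '\r'.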

Here is a patch (not a perfect one):
# --- begin of patch ---
import csv


class PatchedSniffer(csv.Sniffer):

    def __init__(self):
        csv.Sniffer.__init__(self)

    def sniff(self, p_data, p_delimiters=None):
        # Sniff as usual, then replace the hard-coded line terminator.
        t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
        t_dialect.lineterminator = self._guessLineTerminator(p_data)
        return t_dialect

    def _guessLineTerminator(self, p_data):
        # Test '\r\n' first, since it contains both '\r' and '\n'.
        for t_lineTerminator in ['\r\n', '\n', '\r']:
            if t_lineTerminator in p_data:
                return t_lineTerminator
        return '\r\n'  # Windows default (Excel)

    def _formatDataForGuess(self, p_data):
        # Normalize to '\n' so that the base class splits the lines correctly.
        t_lineTerminator = self._guessLineTerminator(p_data)
        return '\n'.join(p_data.split(t_lineTerminator))

    def _guess_delimiter(self, p_data, p_delimiters):
        t_data = self._formatDataForGuess(p_data)
        (t_delimiter, t_skipInitialSpace) = \
            csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)
        # Fall back to tab when the base class finds no delimiter.
        if t_delimiter == '' and '\t' in p_data:
            t_delimiter = '\t'
        return (t_delimiter, t_skipInitialSpace)
# --- end of patch ---

Bye.
--- Laurent.



Re: csv module

2005-12-28 Thread Laurent Laporte
Sorry,

Here is my example:

Python 2.3.1 (#1, Sep 29 2003, 15:42:58)
[GCC 2.96 2731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> t_sniffer = csv.Sniffer()
>>> t_data = "aaa\tbbb\r\n\r\nAAA\tBBB\r\n"
>>> t_dialect = t_sniffer.sniff(t_data)
>>> t_dialect.delimiter
''

In fact, I found the problem (thanks to you): I had added a blank line
('\r\n') to separate the header from the records...
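
A possible workaround, sketched in Python 3 syntax: drop the blank lines from the sample before sniffing, restrict the candidates with the delimiters argument, and fall back to the excel-tab dialect if the sniffer still cannot decide. The function name sniff_tab_sample is hypothetical. In Python 3, sniff() raises csv.Error when no delimiter is found, instead of returning an empty delimiter as in the session above.

```python
import csv


def sniff_tab_sample(sample):
    """Sniff a sample after removing blank lines; fall back to excel-tab."""
    # Keep the line endings, but skip lines that hold nothing except '\r' or '\n'.
    lines = [line for line in sample.splitlines(keepends=True)
             if line.strip("\r\n")]
    cleaned = "".join(lines)
    try:
        # Only the tab character is accepted as a candidate delimiter.
        return csv.Sniffer().sniff(cleaned, delimiters="\t")
    except csv.Error:
        return csv.excel_tab


dialect = sniff_tab_sample("aaa\tbbb\r\n\r\nAAA\tBBB\r\n")
print(repr(dialect.delimiter))
```

The fallback assumes the data is expected to be tab-delimited; for data of unknown format it would be safer to let the error propagate.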



Re: csv module

2005-12-28 Thread Laurent Laporte
In fact, there is another bug:

In my CSV file, every record except the header ends with a trailing
tab '\t', because the last field is always empty.

For example, I get:
>>> import csv
>>> t_sniffer = csv.Sniffer()
>>> t_data = "aaa\tbbb\r\nAAA\t\r\nBBB\t\r\n"
>>> t_dialect = t_sniffer.sniff(t_data)
>>> t_dialect.delimiter
''

This happens in the _guess_delimiter() method while the frequency
tables are built. Each line is stripped (why??)
If I replace:
  freq = line.strip().count(char)
with:
  freq = line.count(char)
it works fine.
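
The effect of the strip can be checked in isolation. In this small sketch (Python 3 syntax), strip() without arguments removes all surrounding whitespace, and the tab is whitespace, so the trailing delimiter disappears before it is counted. The header line keeps its one tab while the records show zero, which is why no consistent frequency is found. The rstrip("\r\n") variant is an alternative suggested here, not something taken from the csv source.

```python
line = "AAA\t\r\n"  # a record whose last field is empty

stripped_count = line.strip().count("\t")         # strip() removes the trailing tab too
plain_count = line.count("\t")                    # the tab is counted
eol_only_count = line.rstrip("\r\n").count("\t")  # only the line ending is removed

print(stripped_count, plain_count, eol_only_count)
```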

Do you have a workaround for that?

--- Laurent.
