[issue29405] improve csv.Sniffer().sniff() behavior

2017-01-31 Thread Milt Epstein

New submission from Milt Epstein:

I'm trying to use csv.Sniffer().sniff(sample_data) to determine the delimiter 
on a number of input files.  After some trial and error, many "Could not 
determine delimiter" errors, and some analysis of how this routine behaves, I 
settled on sample_data being a fixed number of lines of the input file, 
specifically 30.  This value seems to let the routine succeed more often, 
although not always, particularly on short input files.
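
For illustration, the approach described above could be sketched roughly as 
follows.  The helper name sniff_delimiter and the choice to take the first 
max_lines lines of the file are assumptions made for this example; only 
csv.Sniffer().sniff() and csv.Error come from the csv module itself.

```python
import csv
import io
from itertools import islice


def sniff_delimiter(lines, max_lines=30):
    """Guess the delimiter from at most max_lines lines (hypothetical helper).

    Returns the delimiter string, or None if the Sniffer gives up.
    """
    # Each line from a file object keeps its newline, so a plain join works.
    sample = "".join(islice(lines, max_lines))
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        # csv.Error is what carries "Could not determine delimiter".
        return None


# Usage with an in-memory file; an open text file works the same way.
rows = io.StringIO(
    "id,name,score\n1,alpha,10\n2,beta,20\n3,gamma,30\n4,delta,40\n"
)
print(sniff_delimiter(rows))
```

Returning None instead of letting csv.Error propagate is just one way to 
handle the failure case; a caller could equally retry with a different sample 
size.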

I realize the way this routine works is somewhat idiosyncratic, and it won't be 
so easy to improve it generally, but there's one simple change that occurred to 
me that would help in some cases.  Currently the function _guess_delimiter() in 
csv.py contains the following lines:

# build a list of possible delimiters
modeList = modes.items()
total = float(chunkLength * iteration)

So total grows by chunkLength on each iteration.  The problem occurs when 
total exceeds the number of lines in sample_data, that is, when the current 
chunk would run past the end of sample_data.  The read itself is handled fine 
(it is truncated at the end of sample_data), but total is still counted as if 
the chunk were full, so it ends up too high.  My suggested change is to add 
the following two lines after the above:

if total > len(data):
    total = float(len(data))
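
To make the overshoot concrete, here is a simplified model of the chunking 
arithmetic, not the actual csv.py code.  The function name chunk_totals is 
made up for this example, and the chunk size of 10 lines is an assumption 
about how _guess_delimiter() sets chunkLength.

```python
def chunk_totals(num_lines, chunk_length=10):
    """Return (uncapped, capped) totals for each pass over the sample.

    Simplified model: num_lines stands in for len(data), and every pass
    advances by one full chunk, as the loop in the issue is described.
    """
    chunk_length = min(chunk_length, num_lines)
    results = []
    iteration = 0
    start = 0
    while start < num_lines:
        iteration += 1
        uncapped = float(chunk_length * iteration)  # current behavior
        capped = min(uncapped, float(num_lines))    # with the proposed cap
        results.append((uncapped, capped))
        start += chunk_length
    return results


# A sample that is not a multiple of the chunk size shows the difference.
for uncapped, capped in chunk_totals(25):
    print(uncapped, capped)
```

With 25 lines the third pass only has 5 lines left to read, yet the uncapped 
total is 30.0 while the capped total stays at 25.0.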

--
components: Library (Lib)
messages: 286570
nosy: mepstein
priority: normal
severity: normal
status: open
title: improve csv.Sniffer().sniff() behavior
type: behavior
versions: Python 3.5

___
Python tracker 
<http://bugs.python.org/issue29405>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2017-01-31 Thread Milt Epstein

Milt Epstein added the comment:

FWIW, it might be more concise and more consistent with the existing code to 
change the one existing line that sets total to:

total = min(float(chunkLength * iteration), float(len(data)))

--





2017-02-03 Thread Milt Epstein

Milt Epstein added the comment:

That's right: with 11 lines in the sample data, total will become 20 on the 
second iteration, and that throws off some of the computations done in that 
function.
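
As a rough worked example of the 11-line case: the sketch below assumes that a 
candidate delimiter is accepted only when the number of consistent rows 
divided by total reaches a threshold of about 0.9.  That threshold and the 
names used here are assumptions for illustration, not quotes from csv.py.

```python
THRESHOLD = 0.9  # assumed minimum consistency for accepting a delimiter


def consistency(consistent_rows, total):
    """Fraction of rows that agree on the delimiter count (illustrative)."""
    return consistent_rows / float(total)


lines = 11         # rows in the sample, all consistent in this example
chunk_length = 10  # assumed chunk size
iteration = 2      # second pass over the sample

uncapped = float(chunk_length * iteration)  # 20.0, current behavior
capped = min(uncapped, float(lines))        # 11.0, with the proposed cap

print(consistency(lines, uncapped) >= THRESHOLD)  # False: 11 / 20 = 0.55
print(consistency(lines, capped) >= THRESHOLD)    # True: 11 / 11 = 1.0
```

So even a sample in which every row agrees can fall below the threshold on 
the second pass when total is not capped.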

Your patch looks good, in that it will achieve what I'm requesting.  But :-), 
your pointing out that other redundant min() made me take a closer look at the 
code, and led me to produce the attached patch as an alternate suggestion.  I 
think it makes the code a bit more sensible and cleaner.  Please review, and go 
with what you think best.

Thanks.

--
Added file: http://bugs.python.org/file46512/csv.py.patch
