On 2015-10-06 12:24, Jaydip Chakrabarty wrote:
On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote:
On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase
<python.l...@tim.thechases.com> wrote:
That way, if you determine by line 3 that your million-row CSV file has
no blank columns, you can get away with not processing all million
rows.
Sure, although that effectively means the entire job is moot. I kinda
assume that the OP knows that there are some blank columns (maybe lots
of them). The extra check is unnecessary unless it's actually plausible
that there'll be no blanks whatsoever.
Incidentally, you have an ordered_headers list which is the blank
columns in order; I think the OP was looking for a list of the
_non_blank columns. But that's a trivial difference, easy to tweak.
ChrisA
Thanks to you all. I got it this far. But while writing back to another
csv file, I got this error - "ValueError: dict contains fields not in
fieldnames: None". Here is my code.
rdr = csv.DictReader(fin, delimiter=',')
header_set = set(rdr.fieldnames)
Initially, header_set contains all of the field names.
for r in rdr:
    header_set = set(h for h in header_set if not r[h])
Keeping the field name if the field is empty.
    if not header_set:
        break
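At this point, header_set will contain the names of the fields whose
values were empty in every row read (or nothing at all, if the loop
broke out early).
Wasn't the original question about excluding columns where all of the
values are empty? header_set now holds exactly those columns, so using
it as the list of fields to write keeps the blank columns and drops the
ones that have data.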
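To check what the loop actually leaves in header_set, here is a minimal sketch run on in-memory data. The sample rows and the names blank_columns and non_blank_columns are made up for illustration; only the filtering loop is taken from the code above.

```python
import csv
import io

# Hypothetical sample data: column "b" is blank in every row,
# column "c" is blank in only one row.
sample = "a,b,c\n1,,x\n2,,\n3,,z\n"

rdr = csv.DictReader(io.StringIO(sample))
header_set = set(rdr.fieldnames)
for r in rdr:
    # Keep a name only while its field is still empty in this row.
    header_set = set(h for h in header_set if not r[h])
    if not header_set:
        break

# Preserve the original column order when splitting the names.
blank_columns = [h for h in rdr.fieldnames if h in header_set]
non_blank_columns = [h for h in rdr.fieldnames if h not in header_set]
print(blank_columns)      # ['b']
print(non_blank_columns)  # ['a', 'c']
```

So the set ends up holding the all-blank columns, and the columns worth writing out are its complement.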
for r in rdr:
    data = list(r[i] for i in header_set)
data will contain each processed row in turn. Because of the
indentation, only the final data (row) would be written out.
dw = csv.DictWriter(fout, header_set)
dw.writeheader()
dw.writerows(data)
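One possible way to put the pieces together is sketched below. It rests on a few assumptions: the whole file fits in memory (the rows are read once into a list, because a DictReader cannot be iterated a second time after the first loop has consumed it), and the goal is to write every column except the all-blank ones. The function name write_non_blank_columns and the variables blank and keep are hypothetical, not from the code above.

```python
import csv
import io

def write_non_blank_columns(fin, fout):
    """Copy CSV rows from fin to fout, omitting columns blank in every row."""
    rdr = csv.DictReader(fin)
    fieldnames = rdr.fieldnames or []
    rows = list(rdr)  # read once; the reader is exhausted afterwards

    # Start with every column and drop a name as soon as a value shows up.
    blank = set(fieldnames)
    for r in rows:
        blank = set(h for h in blank if not r[h])
        if not blank:
            break  # no all-blank columns remain, so stop checking

    keep = [h for h in fieldnames if h not in blank]
    # extrasaction="ignore" makes the writer skip keys that are not in keep
    # instead of raising ValueError.
    dw = csv.DictWriter(fout, keep, extrasaction="ignore", lineterminator="\n")
    dw.writeheader()
    dw.writerows(rows)
    return keep

# Usage with in-memory files (hypothetical data).
src = io.StringIO("a,b,c\n1,,x\n2,,\n")
dst = io.StringIO()
kept = write_non_blank_columns(src, dst)
print(kept)
print(dst.getvalue())
```

For a file too large to hold in memory, the same idea should work with two passes over the file (rewinding it with fin.seek(0) and building a fresh reader) rather than a list of rows.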
Also, there is a difference between len(header_set) and
len(data[0].keys()). Why is that so?
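One possible explanation, assuming the rows come straight from the DictReader: when a data row has more fields than the header (a trailing comma is enough), DictReader stores the surplus values under the key given by restkey, which defaults to None. That row then has one more key than there are field names, and DictWriter, whose extrasaction defaults to 'raise', rejects it. The sample data below is made up to show the effect.

```python
import csv
import io

# Hypothetical data: the second data row has one more field than the header.
sample = "a,b\n1,2\n3,4,5\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# The long row carries an extra None key holding the surplus values.
print(rows[1])
print(len(rows[0]), len(rows[1]))

error = None
dw = csv.DictWriter(io.StringIO(), ["a", "b"])
try:
    dw.writerow(rows[1])  # the None key is not among the fieldnames
except ValueError as exc:
    error = str(exc)
    print(error)
```

Passing extrasaction='ignore' to DictWriter, or deleting the None key from each row before writing, avoids the exception.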
Thanks again for all your help.
Thanks.