On Tue, 06 Oct 2015 14:33:51 +0200, Peter Otten wrote: > Jaydip Chakrabarty wrote: > >> On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote: >> >>> On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase >>> <python.l...@tim.thechases.com> wrote: >>>> That way, if you determine by line 3 that your million-row CSV file >>>> has no blank columns, you can get away with not processing all >>>> million rows. >>> >>> Sure, although that effectively means the entire job is moot. I kinda >>> assume that the OP knows that there are some blank columns (maybe lots >>> of them). The extra check is unnecessary unless it's actually >>> plausible that there'll be no blanks whatsoever. >>> >>> Incidentally, you have an ordered_headers list which is the blank >>> columns in order; I think the OP was looking for a list of the >>> _non_blank columns. But that's a trivial difference, easy to tweak. >>> >>> ChrisA >> >> Thanks to you all. I got it this far. But while writing back to another >> csv file, I got this error - "ValueError: dict contains fields not in >> fieldnames: None". Here is my code. >> >> rdr = csv.DictReader(fin, delimiter=',') >> header_set = set(rdr.fieldnames) >> for r in rdr: >> header_set = set(h for h in header_set if not r[h]) >> if not header_set: >> break >> >> for r in rdr: >> data = list(r[i] for i in header_set) >> >> dw = csv.DictWriter(fout, header_set) >> dw.writeheader() >> dw.writerows(data) > > Sorry, this is not the code you ran. I could guess what the missing > parts might be, but it is easier for both sides if you provide a small > script that actually can be executed and a small dataset that shows the > behaviour you describe. Then post the session and especially the > traceback. Example: > > $ cat my_data.csv 0 > $ cat my_code.py print 1/int(open("my_data.csv").read()) > $ python my_code.py Traceback (most recent call last): > File "my_code.py", line 1, in <module> > print 1/int(open("my_data.csv").read()) > ZeroDivisionError: integer division or modulo by zero > > Don't retype, use cut and paste. Thank you.
I downloaded gmail contacts in google csv format. There are so many columns. So I was trying to create another csv with the required columns. Now when I tried to open the gmail csv file with csv DictReader, it said the file contained NULL characters. So first I did - data = open(fn, 'rb').read() fout = open(ofn, 'wb') fout.write(data.replace('\x00', '')) fout.close() shutil.move(ofn, fn) Then I found, there were some special characters in the file. So, once again I opened the file and did - data = open(fn, 'rb').read() fout = open(ofn, 'wb') fout.write(data.replace('\xff\xfe', '')) fout.close() shutil.move(ofn, fn) Now it seemed right. So I started to remove empty columns. fin = open(fn, 'rb') fout = open(ofn, 'wb') rdr = csv.DictReader(fin, delimiter=',') flds = rdr.fieldnames header_set = set(rdr.fieldnames) for r in rdr: header_set = set(h for h in header_set if not r[h]) if not header_set: break for r in rdr: data = list(r[i] for i in header_set) dw = csv.DictWriter(fout, data[0].keys()) dw.writeheader() dw.writerows(data) fin.close() fout.close() But, I am getting error at dw.writerows(data). I put the whole code here. Please help. Thanks. -- https://mail.python.org/mailman/listinfo/python-list