Jaydip Chakrabarty wrote:

> On Tue, 06 Oct 2015 14:33:51 +0200, Peter Otten wrote:
>
>> Jaydip Chakrabarty wrote:
>>
>>> On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote:
>>>
>>>> On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase
>>>> <python.l...@tim.thechases.com> wrote:
>>>>> That way, if you determine by line 3 that your million-row CSV file
>>>>> has no blank columns, you can get away with not processing all
>>>>> million rows.
>>>>
>>>> Sure, although that effectively means the entire job is moot. I kinda
>>>> assume that the OP knows that there are some blank columns (maybe lots
>>>> of them). The extra check is unnecessary unless it's actually
>>>> plausible that there'll be no blanks whatsoever.
>>>>
>>>> Incidentally, you have an ordered_headers list which is the blank
>>>> columns in order; I think the OP was looking for a list of the
>>>> _non_blank columns. But that's a trivial difference, easy to tweak.
>>>>
>>>> ChrisA
>>>
>>> Thanks to you all. I got it this far. But while writing back to another
>>> csv file, I got this error - "ValueError: dict contains fields not in
>>> fieldnames: None". Here is my code.
>>>
>>> rdr = csv.DictReader(fin, delimiter=',')
>>> header_set = set(rdr.fieldnames)
>>> for r in rdr:
>>>     header_set = set(h for h in header_set if not r[h])
>>>     if not header_set:
>>>         break
>>>
>>> for r in rdr:
>>>     data = list(r[i] for i in header_set)
>>>
>>> dw = csv.DictWriter(fout, header_set)
>>> dw.writeheader()
>>> dw.writerows(data)
>>
>> Sorry, this is not the code you ran. I could guess what the missing
>> parts might be, but it is easier for both sides if you provide a small
>> script that actually can be executed and a small dataset that shows the
>> behaviour you describe. Then post the session and especially the
>> traceback.
>> Example:
>>
>> $ cat my_data.csv
>> 0
>> $ cat my_code.py
>> print 1/int(open("my_data.csv").read())
>> $ python my_code.py
>> Traceback (most recent call last):
>>   File "my_code.py", line 1, in <module>
>>     print 1/int(open("my_data.csv").read())
>> ZeroDivisionError: integer division or modulo by zero
>>
>> Don't retype, use cut and paste. Thank you.
>
> I downloaded gmail contacts in google csv format. There are so many
> columns. So I was trying to create another csv with the required columns.
> Now when I tried to open the gmail csv file with csv DictReader, it said
> the file contained NULL characters.
> So first I did -
>
> data = open(fn, 'rb').read()
> fout = open(ofn, 'wb')
> fout.write(data.replace('\x00', ''))
> fout.close()
> shutil.move(ofn, fn)
>
> Then I found, there were some special characters in the file. So, once
> again I opened the file and did -
>
> data = open(fn, 'rb').read()
> fout = open(ofn, 'wb')
> fout.write(data.replace('\xff\xfe', ''))
> fout.close()
> shutil.move(ofn, fn)
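The two symptoms in the message above (NUL characters, then a leading '\xff\xfe') are what a little-endian UTF-16 file with a byte order mark looks like when its raw bytes are read: every character below U+0100 is stored as its Latin-1 byte followed by a zero byte. The short Python 3 illustration below uses a made-up header line (not the actual Gmail export) to show why stripping the bytes appears to work and where it breaks.

```python
# A made-up header line, stored the way a little-endian UTF-16 file with a
# byte order mark (BOM) stores it on disk.
raw = b"\xff\xfe" + "Name,E-mail\n".encode("utf-16-le")
print(raw[:8])  # b'\xff\xfeN\x00a\x00m\x00'

# The two replace() calls from the quoted message give the right bytes here...
naive = raw.replace(b"\x00", b"").replace(b"\xff\xfe", b"")
print(naive)  # b'Name,E-mail\n'

# ...but only because every character is below U+0100. "\u0141" (the letter
# L with stroke) has no zero byte to strip, so it would turn into b'A\x01'.
print("\u0141".encode("utf-16-le"))  # b'A\x01'

# Decoding with the utf-16 codec reads the BOM, picks the byte order, and
# drops the BOM and the zero bytes in one step.
text = raw.decode("utf-16")
print(repr(text))  # 'Name,E-mail\n'
```

In other words, the byte-stripping approach silently corrupts any name or address containing a character outside Latin-1, while decoding handles all of them.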
Uh, this looks like the file is in UTF-16. Use

import codecs

fn = ...
ofn = ...

with codecs.open(fn, encoding="utf-16") as f:
    with codecs.open(ofn, "w", encoding="utf-8") as g:
        g.writelines(f)

to convert it to UTF-8, which is compatible with the csv module of Python 2.

> Now it seemed right.

Only if all characters are encodable as iso-8859-1.

> So I started to remove empty columns.
>
> fin = open(fn, 'rb')
> fout = open(ofn, 'wb')
>
> rdr = csv.DictReader(fin, delimiter=',')
> flds = rdr.fieldnames
> header_set = set(rdr.fieldnames)
> for r in rdr:
>     header_set = set(h for h in header_set if not r[h])
>     if not header_set:
>         break
> for r in rdr:
>     data = list(r[i] for i in header_set)
>
> dw = csv.DictWriter(fout, data[0].keys())
> dw.writeheader()
> dw.writerows(data)
>
> fin.close()
> fout.close()
>
> But, I am getting error at dw.writerows(data). I put the whole code here.
> Please help.

I really meant it when I asked you to post the code you actually ran, and
the traceback it produces. When I fill in the blanks by guessing

$ cat in.csv
one,two,three
foo,,
bar,,baz
$ cat remove_empty_colums.py
import csv
fn = "in.csv"
ofn = "out.csv"

fin = open(fn, 'rb')
fout = open(ofn, 'wb')

rdr = csv.DictReader(fin, delimiter=',')
flds = rdr.fieldnames
header_set = set(rdr.fieldnames)
for r in rdr:
    header_set = set(h for h in header_set if not r[h])
    if not header_set:
        break
for r in rdr:
    data = list(r[i] for i in header_set)

dw = csv.DictWriter(fout, data[0].keys())
dw.writeheader()
dw.writerows(data)

fin.close()
fout.close()

and then run the resulting script, I get

$ python remove_empty_colums.py
Traceback (most recent call last):
  File "remove_empty_colums.py", line 18, in <module>
    dw = csv.DictWriter(fout, data[0].keys())
NameError: name 'data' is not defined

So this is my traceback. While the NameError is trivial to fix (reopen the
file or do a seek), that alone is not sufficient to make the script do what
you want, and it doesn't seem to be the problem you ran into.
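Rewinding the input with a seek, as suggested above, leads to a two-pass approach. A minimal sketch of the whole job follows, written for Python 3 rather than the Python 2 used in this thread; the helper name drop_blank_columns is hypothetical, and the sample data is the in.csv guessed above, held in memory. The first pass narrows a set down to the columns that are blank in every row, the input is rewound with seek(0), and the second pass writes only the remaining columns. With a real file you would open it in text mode with newline="" and, assuming the Gmail export really is UTF-16, encoding="utf-16", which also removes the need for the byte-stripping steps. Passing extrasaction="ignore" to DictWriter drops keys that are not being written, including the None key that csv.DictReader (default restkey=None) creates for rows with more fields than the header; an unexpected None key of that kind is one way to get "ValueError: dict contains fields not in fieldnames: None".

```python
import csv
import io


def drop_blank_columns(fin, fout):
    """Copy CSV data from fin to fout, leaving out columns that are empty in
    every row. Hypothetical helper; fin must be seekable."""
    rdr = csv.DictReader(fin)
    fieldnames = rdr.fieldnames or []

    # First pass: keep only the names of columns that are still blank in every
    # row seen so far (short rows give None, which also counts as blank).
    blank = set(fieldnames)
    for row in rdr:
        blank = {h for h in blank if not row[h]}
        if not blank:
            break  # every column has data somewhere, nothing to drop

    # A list (not a set) keeps the surviving columns in their original order.
    keep = [h for h in fieldnames if h not in blank]

    # Second pass: rewind and build a fresh reader, which re-reads the header.
    fin.seek(0)
    rdr = csv.DictReader(fin)
    # extrasaction="ignore" skips keys that are not in keep, including the
    # None key DictReader uses (its default restkey) for surplus fields.
    dw = csv.DictWriter(fout, fieldnames=keep, extrasaction="ignore")
    dw.writeheader()
    dw.writerows(rdr)


# Usage with the guessed in.csv, held in memory instead of on disk.
sample = io.StringIO("one,two,three\nfoo,,\nbar,,baz\n")
out = io.StringIO()
drop_blank_columns(sample, out)
print(out.getvalue())  # one,three / foo, / bar,baz, each line ending in "\r\n"
```

Column "two" is the only one that is empty in every row, so it is the only one dropped. Building the output field list from the original header, rather than from a set, also avoids the arbitrary column order a set would give.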
So you have a different script. I'd really like to see it, and the traceback
it produces.

-- 
https://mail.python.org/mailman/listinfo/python-list