On 24Sep2019 15:55, Mihir Kothari <mihir.koth...@gmail.com> wrote:
I am using python 3.4. I have a CSV file as below:
ABC,PQR,(TEST1,TEST2)
FQW,RTE,MDE
Really? No quotes around the (TEST1,TEST2) column value? I would have
said this is invalid data, but that does not help you.
Basically comma-separated rows, where some rows have data in a column which
is array-like, i.e. in brackets.
So I need to read the file and treat such columns as one i.e. do not
separate based on comma if it is inside the bracket.
In short I need to read a CSV file where separator inside the brackets
needs to be ignored.
Output:
Column:  1    2    3
Row1:    ABC  PQR  (TEST1,TEST2)
Row2:    FQW  RTE  MDE
Can you please help with the snippet?
I would be reaching for a regular expression. If you partition your
values into 2 types: those starting and ending in a bracket, and those
not, you could write a regular expression for the former:
\([^)]*\)
which matches a string like (.....) (with, importantly, no embedded
brackets, only those at the beginning and end).
And you can write a regular expression like:
[^,]*
for a value containing no commas i.e. all the other values.
Test the bracketed one first, because the second one always matches
something.
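A quick check of why the order matters, as a minimal sketch. The names
bracketed_re and no_commas_re are the ones used in the parser further down,
and the sample string is made up from the data above:

```python
import re

bracketed_re = re.compile(r'\([^)]*\)')
no_commas_re = re.compile(r'[^,]*')

# Sample text assembled from the example data (an illustration only).
text = '(TEST1,TEST2),MDE'

# The bracketed pattern consumes the whole bracketed value, comma included.
print(bracketed_re.match(text).group())          # (TEST1,TEST2)

# The no-commas pattern stops at the first comma, cutting the value in half.
print(no_commas_re.match(text).group())          # (TEST1

# It also matches the empty string, so it never fails to match.
print(repr(no_commas_re.match(',MDE').group()))  # ''
```

Trying the no-commas pattern first would therefore always "succeed" and
the bracketed pattern would never get a chance.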
Then you would not use the CSV module (which expects better formed data
than you have) and instead write a simple parser for a line of text
which tries to match one of these two expressions repeatedly to consume
the line. Something like this (UNTESTED):
    import re

    bracketed_re = re.compile(r'\([^)]*\)')
    no_commas_re = re.compile(r'[^,]*')

    def split_line(line):
        line = line.rstrip()  # drop trailing whitespace/newline
        fields = []
        offset = 0
        while offset < len(line):
            m = bracketed_re.match(line, offset)
            if m:
                field = m.group()
            else:
                m = no_commas_re.match(line, offset)  # this always matches
                field = m.group()
            fields.append(field)
            offset += len(field)
            if line.startswith(',', offset):
                # another column
                offset += 1
            elif offset < len(line):
                raise ValueError(
                    "incomplete parse at offset %d, line=%r" % (offset, line))
        if line.endswith(','):
            # a trailing comma means a final empty column
            fields.append('')
        return fields
Then read the lines of the file and split them into fields:
    rows = []
    with open(datafilename) as f:
        for line in f:
            fields = split_line(line)
            rows.append(fields)
So basically you're writing a little parser. If you have nested brackets
things get harder.
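For the nested case a regular expression like the one above is no longer
enough, because it cannot count brackets. One possible approach (a sketch
only, not tested against your real data) is to scan the line character by
character and keep a nesting depth, splitting only on commas seen at depth
zero. The function name split_nested is hypothetical, and it assumes that
round brackets are the only grouping characters and that they balance:

```python
def split_nested(line):
    """Split on commas that are not inside round brackets (hypothetical helper)."""
    line = line.rstrip()  # drop trailing whitespace/newline
    if not line:
        return []
    fields = []
    depth = 0   # current bracket nesting level
    start = 0   # offset where the current field begins
    for pos, ch in enumerate(line):
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError(
                    "unbalanced ')' at offset %d, line=%r" % (pos, line))
            depth -= 1
        elif ch == ',' and depth == 0:
            # a top level comma ends the current field
            fields.append(line[start:pos])
            start = pos + 1
    if depth:
        raise ValueError("unclosed '(' in line=%r" % (line,))
    fields.append(line[start:])  # the final field
    return fields

print(split_nested('ABC,PQR,(TEST1,(X,Y),TEST2)'))
# ['ABC', 'PQR', '(TEST1,(X,Y),TEST2)']
```

Note that this is looser than the regular expression version: it honours
brackets anywhere inside a value, not only values which start and end with
a bracket.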
Cheers,
Cameron Simpson <c...@cskk.id.au>