On 01/20/2013 05:04 PM, Garry wrote:
I'm trying to manipulate family tree data using Python.
I'm using Linux and Python 2.7.3, and my data files are saved as Linux-formatted
CSV files.
The data appears in this format:
Marriage,Husband,Wife,Date,Place,Source,Note0x0a
Note: the Source field or the Note field can contain quoted data (same as the
Place field)
Actual data:
[F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a
code snippet follows:
import os
import re
#I'm using the following regex in an attempt to decode the data:
RegExp2 = "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d{,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"
#
line = "[F0244],[I0690],[I0354],1916-06-08,\"Neely's Landing, Cape Gir. Co, MO\",,"
#
(Marriage,Husband,Wife,Date,Place,Source,Note) = re.split(RegExp2,line)
#
#However, this does not decode the 7 fields.
# The following error is displayed:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
#
# When I assign the result to a single variable xx, the fields apparently get split.
xx = re.split(RegExp2,line)
#
print xx[0]
print xx[1]
[F0244]
print xx[5]
"Neely's Landing, Cape Gir. Co, MO"
print xx[6]
print xx[7]
print xx[8]
Why is there an extra NULL field before and after my record contents?
I'm stuck, comments and solutions greatly appreciated.
Garry
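On the extra empty fields: re.split() returns the text before the match, then every captured group, then the text after the match. Your pattern matches the whole line, so the text before and after it is the empty string, which gives 9 items for 7 names. A minimal sketch of that behaviour, using a shortened two-field stand-in for your pattern and line (my simplification, not your original regex):

```python
import re

# Simplified stand-in for the original pattern: only two captured fields.
pattern = r"^(\[[A-Z]\d+\]),(\[[A-Z]\d+\])$"
line = "[F0244],[I0690]"

# re.split() keeps the text around the match as well as the groups,
# so an empty string appears at each end of the list.
parts = re.split(pattern, line)
print(parts)  # ['', '[F0244]', '[I0690]', '']

# re.match() returns only the captured groups, which unpack cleanly.
match = re.match(pattern, line)
if match:
    marriage, husband = match.groups()
    print(marriage)
    print(husband)
```

If you do stay with a regex, re.match(...).groups() is probably closer to what you meant than re.split().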
Gosh, you really don't want to use a regex to split CSV lines like that...
Use the csv module:
>>> s
'[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,0x0a'
>>> import csv
>>> r = csv.reader([s])
>>> for l in r: print(l)
...
['[F0244]', '[I0690]', '[I0354]', '1916-06-08', "Neely's Landing, Cape Gir. Co, MO", '', '0x0a']
The argument to csv.reader can be a file object (or a list of lines).
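A rough sketch of the unpacking you were after, using the two sample records from your message as a list of lines (without the 0x0a marker, which I am assuming just stands for the newline). The variable names are only illustrative:

```python
import csv

# The two sample records from the question, one string per line.
lines = [
    '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,',
    '[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,',
]

records = []
for row in csv.reader(lines):
    # Each row is a list of exactly seven strings, so it unpacks directly.
    marriage, husband, wife, date, place, source, note = row
    records.append(row)
    print(place)
```

This should print the two place names with the quotes removed and the embedded commas intact. With a real file, pass the open file object instead of the list; the csv docs suggest opening it in 'rb' mode on Python 2 (or with newline='' on Python 3).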
- mitya
--
Lark's Tongue Guide to Python: http://lightbird.net/larks/