On 2015-05-14 01:06, Ethan Furman wrote:
On 05/13/2015 04:24 PM, 20/20 Lab wrote:
I'm a beginner to Python, reading here and there. I've written a couple of
short, simple programs to make life easier around the office.
That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.
I have a LARGE CSV file that I need to process: 110+ columns, 72k
rows. I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.
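One thing worth knowing at this stage: the standard `csv` module hands every field back as a string, so a Qty column has to be converted with `int()` before quantities can be added together. Below is a minimal sketch under assumptions: the column positions in `WANTED` and the helper name `load_rows` are hypothetical placeholders (the real file's 110+ columns will need different indices), and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Hypothetical positions of account, staff, item, qty and noise in the
# full CSV; replace with the real column indices.
WANTED = (0, 1, 2, 3, 4)
QTY = 3  # position of Qty within the reduced five-column row

def load_rows(f):
    """Read CSV records from file object f, keep only the WANTED columns,
    and convert Qty to int (csv.reader always yields strings)."""
    rows = []
    for record in csv.reader(f):
        row = [record[i] for i in WANTED]
        row[QTY] = int(row[QTY])
        rows.append(row)
    return rows

# Stand-in for open("data.csv", newline="") on a real file.
sample = io.StringIO("123,XXX,Item,2,Noise\n72976,YYY,Item,5,Noise\n")
all_rows = load_rows(sample)
print(all_rows)  # [['123', 'XXX', 'Item', 2, 'Noise'], ['72976', 'YYY', 'Item', 5, 'Noise']]
```

If the file has a header line, calling `next()` on the reader once before the loop skips it.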
Now is where I have my problem:
myList = [ [123, "XXX", "Item", "Qty", "Noise"],
           [72976, "YYY", "Item", "Qty", "Noise"],
           [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
Basically, I need to check for rows with a duplicate account (row[0]) and
staff (row[1]), and if so, remove that row and add its Qty to the
original row. I really don't have a clue how to go about this. The
number of rows changes based on which run it is, so I couldn't even get
away with using hundreds of compare loops.
If someone could point me to some documentation on the functions I would
need, or a tutorial, it would be a great help.
You could try using a dictionary, combining when needed:
# untested
data = {}
for row in all_rows:
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        qty += row[3]  # Qty must already be a number, not a string
    else:
        item, qty, noise = row[2:]
    data[key] = item, qty, noise

for (account, staff), (item, qty, noise) in data.items():
    do_stuff_with(account, staff, item, qty, noise)
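To see the dictionary approach actually run, here is a self-contained version wrapped in a function. The name `merge_rows` and the sample rows (with integer quantities in place of the "Qty" placeholder) are made up for illustration; the loop body is the same logic as above.

```python
def merge_rows(all_rows):
    """Merge rows sharing (account, staff), summing Qty (index 3).
    The first row seen for a key keeps its Item and Noise values."""
    data = {}
    for row in all_rows:
        key = row[0], row[1]
        if key in data:
            item, qty, noise = data[key]
            qty += row[3]
        else:
            item, qty, noise = row[2:]
        data[key] = item, qty, noise
    return data

# Hypothetical sample: the third row duplicates the first's account/staff.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 5, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]
for (account, staff), (item, qty, noise) in merge_rows(rows).items():
    print(account, staff, item, qty, noise)
```

Note that the duplicate row's "ItemTypo" is discarded: only its quantity is folded into the first row for that key.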
At the end, data should have what you want. It won't, however, be in
the same order, so hopefully that's not an issue for you.
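That caveat was true of the Python versions current at the time. Since Python 3.7, plain dicts are guaranteed to keep insertion order, and reassigning an existing key does not move it, so the merged dict already lists keys in order of first appearance. A small sketch with made-up sample rows:

```python
# Relies on dict insertion order, guaranteed from Python 3.7 onward.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 5, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]

data = {}
for row in rows:
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        data[key] = item, qty + row[3], noise  # update in place; key keeps its position
    else:
        data[key] = tuple(row[2:])

# Keys come out in order of first appearance: (123, 'XXX') then (72976, 'YYY').
print(list(data))
```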
Starting from that, if the order matters, you can do it this way:
data = {}
order = {}
for index, row in enumerate(all_rows):
    key = row[0], row[1]
    if key in data:
        item, qty, noise = data[key]
        qty += row[3]
    else:
        item, qty, noise = row[2:]
    data[key] = item, qty, noise
    order.setdefault(key, index)

merged_rows = [(account, staff, item, qty, noise)
               for (account, staff), (item, qty, noise) in data.items()]

def original_order(row):
    key = row[0], row[1]
    return order[key]

merged_rows.sort(key=original_order)
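An alternative that avoids the separate `order` dict and the final sort is `collections.OrderedDict` (available since Python 2.7 and 3.1), which remembers the order in which keys were first inserted. This is a swapped-in technique, not the approach above; the function name `merge_rows_ordered` is hypothetical, and the tuple unpacking assumes every row has exactly five columns.

```python
from collections import OrderedDict

def merge_rows_ordered(all_rows):
    """Merge rows sharing (account, staff), summing Qty, and return the
    merged rows in order of each key's first appearance."""
    data = OrderedDict()
    for account, staff, item, qty, noise in all_rows:
        key = account, staff
        if key in data:
            first_item, total, first_noise = data[key]
            # Reassigning an existing key keeps its original position.
            data[key] = first_item, total + qty, first_noise
        else:
            data[key] = item, qty, noise
    # key + value concatenates the two tuples into one five-field row.
    return [key + value for key, value in data.items()]

rows = [
    [72976, "YYY", "Item", 5, "Noise"],
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "ItemTypo", 1, "Noise"],
]
print(merge_rows_ordered(rows))
# [(72976, 'YYY', 'Item', 6, 'Noise'), (123, 'XXX', 'Item', 2, 'Noise')]
```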
--
https://mail.python.org/mailman/listinfo/python-list