On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:
I'm a beginner to python. Reading here and there. Written a couple of
short and simple programs to make life easier around the office.
That being said, I'm not even sure what I need to ask for. I've never
worked with external data before.
I have a LARGE csv file that I need to process. 110+ columns, 72k
rows. I managed to write enough to reduce it to a few hundred rows, and
the five columns I'm interested in.
That's not large. Large is millions of rows, or tens of millions if you have
enough memory. What's large to you and me is usually small to the computer.
You should use the csv module for handling the CSV file, if you aren't
already doing so. Do you need a url to the docs?
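As a rough illustration of that suggestion, here is a minimal sketch of reading a CSV with the csv module and keeping only a few columns. The column names (Account, Staff, Item, Qty, Noise), the sample data and the helper read_rows are all made up for the example; the real file's headers will differ.

```python
import csv
import io

# Hypothetical sample standing in for the real 110+ column file.
# For a file on disk, use: open("data.csv", newline="") instead.
sample = io.StringIO(
    "Account,Staff,Item,Qty,Noise\n"
    "123,XXX,Item,2,a\n"
    "72976,YYY,Item,1,b\n"
    "123,XXX,ItemTypo,3,c\n"
)


def read_rows(handle, wanted):
    """Return a list of lists holding only the wanted columns, in order."""
    reader = csv.DictReader(handle)  # uses the first line as column names
    return [[record[name] for name in wanted] for record in reader]


rows = read_rows(sample, ["Account", "Staff", "Item", "Qty", "Noise"])
```

Note that the csv module returns every field as a string, so quantities need converting with int() or float() before any arithmetic.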
I actually stumbled across the csv module after coding enough to make a
list of lists. That is part of the reason I approached the list:
nothing like spending hours (or days) coding something that already
exists that you just don't know about.
Now is where I have my problem:
    myList = [ [123, "XXX", "Item", "Qty", "Noise"],
               [72976, "YYY", "Item", "Qty", "Noise"],
               [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
Basically, I need to check for rows with duplicate accounts (row[0]) and
staff (row[1]), and if so, remove that row, and add its Qty to the
original row. I really don't have a clue how to go about this.
Is the order of the rows important? If not, the problem is simpler.
    processed = {}  # hold the processed data in a dict
    for row in myList:
        account, staff = row[0:2]
        key = (account, staff)  # Put them in a tuple.
        if key in processed:
            # We've already seen this combination.
            processed[key][3] += row[3]  # Add the quantities.
        else:
            # Never seen this combination before.
            processed[key] = row

    newlist = list(processed.values())
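One possible variation on the above, as a sketch only: it assumes the quantities arrive as strings (as they would from the csv module) and that the original rows should be left untouched, so it copies each row and converts Qty with int(). The function name merge_duplicates and the sample quantities are made up for illustration.

```python
def merge_duplicates(rows):
    """Combine rows sharing (account, staff), summing the quantity column."""
    processed = {}
    for row in rows:
        key = (row[0], row[1])   # (account, staff)
        qty = int(row[3])        # assumes Qty is a whole number
        if key in processed:
            # Seen before: add this row's quantity to the first occurrence.
            processed[key][3] += qty
        else:
            # First occurrence: store a copy so the input is not modified.
            merged = list(row)
            merged[3] = qty
            processed[key] = merged
    return list(processed.values())


# Hypothetical data with numeric quantities in place of the "Qty" placeholder.
my_list = [
    [123, "XXX", "Item", "2", "Noise"],
    [72976, "YYY", "Item", "1", "Noise"],
    [123, "XXX", "ItemTypo", "3", "Noise"],
]
new_list = merge_duplicates(my_list)
```

On Python 3.7 and later a dict keeps insertion order, so the result follows the order in which each account and staff pair first appeared; on older versions collections.OrderedDict would be needed if the order matters.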
Does that help?
It does, immensely. I'll make this work. Thank you again for the link
from yesterday and apologies for hitting the wrong reply button. I'll
have to study more on the usage and implementations of dictionaries and
tuples.