On 05/13/2015 08:45 PM, 20/20 Lab wrote:
You accidentally replied to me, rather than the mailing list. Please
use reply-list, or if your mailer can't handle that, do a Reply-All, and
remove the parts you don't want.
>
> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python. Reading here and there. Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process. 110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not. However I'm currently doing this
> whole process using office. So it can be a bit daunting.
I'm not familiar with the "office" operating system.
>>> I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>> [72976, "YYY", "Item", "Qty", "Noise"],
>>> [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>>
>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment. Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff"
>
> The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> and notes about the sale (notes aren't particularly needed, but the
> higher ups would like them in the report)
>>
>>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>>> staff (row[1]), and if so, remove that row, and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be? Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this. The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>
>>> If someone could point me to some documentation on the functions I
>>> would need, or a tutorial, it would be a great help.
>>>
>>
>> Is the order significant? Do you have to preserve the order that the
>> accounts appear? I'll assume not.
>>
>> Have you studied dictionaries? Seems to me the way to handle the
>> problem is to read in a row, create a dictionary with key of (account,
>> staff), and data of the rest of the line.
>>
>> Each time you read a row, you check if the key is already in the
>> dictionary. If not, add it. If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant. No, I've not really studied dictionaries, but
> a few people have mentioned it. I'll have to read up on them and, more
> importantly, their applications. Seems that they are more versatile
> than I thought.
>
> Thank you.
You have to realize that a tuple can be used as a key, in your case a
tuple of Account and Staff.
You'll have to decide how you're going to merge the ItemSold,
QuantitySold, and notes.
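
For what it's worth, here is a minimal sketch of that approach. It
assumes the quantity column holds numbers rather than strings, and that
for a duplicate (Account, Staff) pair you keep the first row's item and
notes and simply add up the quantities. Both of those are guesses on my
part, not something you've specified. The name merge_rows and the sample
values are made up for illustration.

```python
# Sample rows: account, staff, item sold, quantity sold, notes.
# The values are invented; quantities are assumed to be integers.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]


def merge_rows(rows):
    """Combine rows that share the same (account, staff) pair.

    Assumed merge policy: keep the item and notes from the first row
    seen for a pair, and sum the quantities of any later duplicates.
    """
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)  # a tuple is hashable, so it can be a dict key
        if key not in merged:
            # First time we see this pair: store the rest of the row.
            merged[key] = [item, qty, notes]
        else:
            # Duplicate pair: add its quantity to the original row.
            merged[key][1] += qty
    # Turn the dict back into a list of lists.
    return [[account, staff] + rest for (account, staff), rest in merged.items()]


result = merge_rows(rows)
for row in result:
    print(row)
```

The key has to be a tuple rather than a list, since lists are mutable
and can't be used as dictionary keys. If you'd rather keep every item
name or concatenate the notes, only the else branch needs to change.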
--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list