Re: Looking for direction

Dave Angel Wed, 13 May 2015 18:15:12 -0700

On 05/13/2015 08:45 PM, 20/20 Lab wrote:

You accidentally replied to me, rather than the mailing list. Please use reply-list, or if your mailer can't handle that, do a Reply-All, and remove the parts you don't want.


>
> On 05/13/2015 05:07 PM, Dave Angel wrote:
>> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
>>> I'm a beginner to python.  Reading here and there.  Written a couple of
>>> short and simple programs to make life easier around the office.
>>>
>> Welcome to Python, and to this mailing list.
>>
>>> That being said, I'm not even sure what I need to ask for. I've never
>>> worked with external data before.
>>>
>>> I have a LARGE csv file that I need to process.  110+ columns, 72k
>>> rows.
>>
>> That's not very large at all.
>>
> In the grand scheme, I guess not.  However I'm currently doing this
> whole process using office.  So it can be a bit daunting.

I'm not familiar with the "office" operating system.

>>>  I managed to write enough to reduce it to a few hundred rows, and
>>> the five columns I'm interested in.
>>
>>>
>>> Now is where I have my problem:
>>>
>>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>>             [72976, "YYY", "Item", "Qty", "Noise"],
>>>             [123, "XXX", "ItemTypo", "Qty", "Noise"]    ]
>>>
>>
>> It'd probably be useful to identify names for your columns, even if
>> it's just in a comment.  Guessing from the paragraph below, I figure
>> the first two columns are "account" & "staff"
>
> The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> and notes about the sale (notes aren't particularly needed, but the
> higher ups would like them in the report)
>>
>>> Basically, I need to check for rows with duplicate accounts row[0] and
>>> staff (row[1]), and if so, remove that row, and add its Qty to the
>>> original row.
>>
>> And which column is that supposed to be?  Shouldn't there be a number
>> there, rather than a string?
>>
>>> I really don't have a clue how to go about this.  The
>>> number of rows changes based on which run it is, so I couldn't even get
>>> away with using hundreds of compare loops.
>>>

>>> If someone could point me to some documentation on the functions I would
>>> need, or a tutorial it would be a great help.
>>>
>>
>> Is the order significant?  Do you have to preserve the order that the
>> accounts appear?  I'll assume not.
>>
>> Have you studied dictionaries?  Seems to me the way to handle the
>> problem is to read in a row, create a dictionary with key of (account,
>> staff), and data of the rest of the line.
>>
>> Each time you read a row, you check if the key is already in the
>> dictionary.  If not, add it.  If it's already there, merge the data as
>> you say.
>>
>> Then when you're done, turn the dict back into a list of lists.
>>
> The order is irrelevant.  No, I've not really studied dictionaries, but
> a few people have mentioned it.  I'll have to read up on them and, more
> importantly, their applications.  Seems that they are more versatile
> than I thought.
>
> Thank you.

You have to realize that a tuple can be used as a key, in your case a tuple of Account and Staff.

You'll have to decide how you're going to merge the ItemSold, QuantitySold, and notes.
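
To make that concrete, here is a minimal sketch of the dictionary approach, not a finished solution. It assumes each row is [account, staff, item, qty, notes] and that qty has already been converted to a number. The function name merge_rows and the merge policy (keep the first item name, add the quantities, join the notes with "; ") are only illustrative assumptions; substitute whatever rule your report actually needs.

```python
def merge_rows(rows):
    """Combine rows that share the same (account, staff) pair.

    Assumed row layout: [account, staff, item, qty, notes], qty numeric.
    Assumed merge policy: keep the first item name, sum the quantities,
    and join the notes with "; ".
    """
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)  # a tuple is hashable, so it works as a dict key
        if key not in merged:
            # First time this account/staff pair is seen: store the rest of the row.
            merged[key] = [item, qty, notes]
        else:
            entry = merged[key]
            entry[1] += qty  # add this row's quantity to the original row
            if notes:
                entry[2] = entry[2] + "; " + notes if entry[2] else notes
    # Turn the dict back into a list of lists.
    return [[account, staff] + rest for (account, staff), rest in merged.items()]


rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "More noise"],
]
result = merge_rows(rows)
```

With the made-up sample rows above, the two rows for (123, "XXX") collapse into a single row with a quantity of 5. The output order follows the dict's iteration order, which is only guaranteed to match first appearance on Python 3.7 and later; since you said order is irrelevant, that should not matter here.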


--
DaveA

