Hello,



I am into my 1st BSc Statistics and into a project for an MNC. I am
trying my first hand at linux and was recently introduced to gawk.



I am having trouble processing a statistical dump that is provided to us in the
form of a csv file. The format of the file is given below



C_ID, ID_NO, stat1, vol2, amount3,... 



There are around 40 fields and the csv file has close to a million records



The C_ID, is the customer id and is only way to identify the customer.

The ID_NO field is the premium plan the customer is in

stat1, vol2, amounts are all numbers



I can write a query that uses a few if statements and gets the details if the
ID_NO is sequential. 

1.       However,
there are over 1000 different ID_NO (legacy) and we need to add the stat1, vol2,
amount3 for each of the ID_NOs separately (I need to group the ID_NO, and sum
of the fields)

2.       If
I have the C_ID of the customers in a separate csv, is it possible to compare
the C_ID with that of the C_ID in the dump and determine the sum of stat1,
vol2, amount3…  (sum of the fields only
for a set of customers and not for the ID_NOs in whole)

We are extensively using
MS-Access for this and it has been a pain. A friend suggested that I try my
hand at using tools in linux.

I am not sure if this is the
right mailing list for this. 

Will really appreciate any help
in this regard.

Thanks in advance

Siva

_______________________________________________
To unsubscribe, email [email protected] with 
"unsubscribe <password> <address>"
in the subject or body of the message.  
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc

Reply via email to