Hello,
I am in the first year of my BSc in Statistics and am working on a project for
an MNC. I am trying my hand at Linux for the first time and was recently
introduced to gawk.
I am having trouble processing a statistical dump that is provided to us as a
CSV file. The format of the file is given below:
C_ID, ID_NO, stat1, vol2, amount3,...
There are around 40 fields, and the CSV file has close to a million records.
C_ID is the customer ID and is the only way to identify a customer.
ID_NO is the premium plan the customer is on.
stat1, vol2, amount3, ... are all numbers.
I can write a query with a few if statements that gets the details when the
ID_NO values are sequential.
1. However, there are over 1000 different ID_NOs (legacy plans), and we need to
sum stat1, vol2, amount3, ... separately for each ID_NO (i.e. group by ID_NO and
sum the fields).
2. If I have the C_IDs of a set of customers in a separate CSV file, is it
possible to match them against the C_IDs in the dump and compute the sums of
stat1, vol2, amount3, ... for just those customers (rather than for each ID_NO
as a whole)?
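A minimal sketch of both tasks, written in Python with the standard csv module rather than gawk. It assumes the dump has a header row with the column names shown above, that every column other than C_ID and ID_NO is numeric, and that the separate customer file has the C_ID in its first column; the function names are hypothetical. The dump is streamed one record at a time, so only the running totals (one set of sums per ID_NO) are held in memory, which should be manageable for a million records.

```python
import csv
from collections import defaultdict

# Assumption: the dump has a header row such as
# "C_ID, ID_NO, stat1, vol2, amount3, ...". skipinitialspace=True drops the
# blank after each comma in both the header and the data rows.
KEY_COLUMNS = ("C_ID", "ID_NO")


def read_dump(lines):
    """Yield one dict per record, converting every non-key column to float."""
    for row in csv.DictReader(lines, skipinitialspace=True):
        record = {"C_ID": row["C_ID"].strip(), "ID_NO": row["ID_NO"].strip()}
        for name, value in row.items():
            # name is None only for surplus fields on a malformed row; skip them.
            if name is None or name in KEY_COLUMNS:
                continue
            text = (value or "").strip()
            # Assumption: an empty field counts as zero.
            record[name] = float(text) if text else 0.0
        yield record


def sum_by_id_no(records):
    """Question 1: group by ID_NO and sum every numeric field per group."""
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        group = totals[rec["ID_NO"]]
        for name, value in rec.items():
            if name not in KEY_COLUMNS:
                group[name] += value
    return {id_no: dict(sums) for id_no, sums in totals.items()}


def load_customer_ids(lines):
    """Read the separate customer file; assumes C_ID is the first column."""
    return {row[0].strip() for row in csv.reader(lines) if row and row[0].strip()}


def sum_for_customers(records, customer_ids):
    """Question 2: sum the numeric fields over the listed customers only."""
    sums = defaultdict(float)
    for rec in records:
        if rec["C_ID"] in customer_ids:
            for name, value in rec.items():
                if name not in KEY_COLUMNS:
                    sums[name] += value
    return dict(sums)
```

For example, sum_by_id_no(read_dump(open("dump.csv", newline=""))) returns a dict mapping each ID_NO to its field totals, and sum_for_customers(read_dump(open("dump.csv", newline="")), load_customer_ids(open("customers.csv", newline=""))) gives the totals for the listed customers only; both file names are placeholders. Since read_dump is a generator, the dump has to be reopened for each pass. If the customer file has a header line, its text is simply read as one more ID and matches no record.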
We use MS-Access extensively for this, and it has been a pain. A friend
suggested that I try the tools available in Linux.
I am not sure if this is the right mailing list for this question.
I would really appreciate any help in this regard.
Thanks in advance
Siva
_______________________________________________
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc