Hello, I have a large CSV file in which continued records (consecutive rows sharing the same RecordID) carry related context, so I should treat these continued rows as ONE complete record. Also, the RecordID is reset to 1 from time to time, whenever the CSV dumper system thinks it necessary.
I'd like to get some suggestions on how to analyze this kind of file with Spark. For example, I need to count the complete records that consist of >= 2 continued rows. Obviously, "2, s2, 9, r1, 7, r2, 8, r3, 3" is one of my targets.

A sample of the CSV:

RecordID,stdID,stdVal,refID,refVal
1,s1,10,r1,7
2,s2,9,r1,7
2,s2,9,r2,8
2,s2,9,r3,3
3,s1,12,r2,10
...
42,s3,8,r7,5
1,s2,11,r3,5

Best regards,
Jiaqiang

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-deal-with-continued-records-tp23269.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
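The core of the question is a "group consecutive equal keys" problem: a reset to 1 starts a new group even though the RecordID value repeats later, so grouping must respect row order, not just key equality. A minimal sketch of that logic in plain Python (using the sample rows above; in Spark the same idea would need an order-preserving grouping, e.g. via zipWithIndex, since a plain groupBy on RecordID would wrongly merge rows from before and after a reset):

```python
from itertools import groupby

# Rows from the sample CSV; note RecordID resets to 1 in the last row.
rows = [
    ("1", "s1", "10", "r1", "7"),
    ("2", "s2", "9", "r1", "7"),
    ("2", "s2", "9", "r2", "8"),
    ("2", "s2", "9", "r3", "3"),
    ("3", "s1", "12", "r2", "10"),
    ("42", "s3", "8", "r7", "5"),
    ("1", "s2", "11", "r3", "5"),
]

# itertools.groupby only merges ADJACENT rows with an equal key, so a
# RecordID reset automatically starts a fresh group -- exactly the
# "continued records form one complete record" semantics.
complete_records = [list(g) for _, g in groupby(rows, key=lambda r: r[0])]

# Count the complete records built from >= 2 continued rows.
multi_row_records = [rec for rec in complete_records if len(rec) >= 2]
print(len(multi_row_records))  # -> 1 (the RecordID=2 run)
```

Here the two rows with RecordID 1 end up in separate groups because they are not adjacent, which is the behavior the reset semantics require.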
