Hello, I have a large CSV file in which continued records (consecutive rows sharing the same RecordID) carry related context, so I should treat these continued rows as ONE complete record. Also, the RecordID is reset to 1 from time to time, whenever the CSV dumper system thinks it necessary.
I'd like to get some suggestions on how to analyze this kind of file with Spark. For example, I need to count the complete records that consist of >= 2 continued rows. Obviously, "2, s2, 9, r1, 7, r2, 8, r3, 3" is one of my targets.

A sample of the CSV:

RecordID,stdID,stdVal,refID,refVal
1,s1,10,r1,7
2,s2,9,r1,7
2,s2,9,r2,8
2,s2,9,r3,3
3,s1,12,r2,10
...
42,s3,8,r7,5
1,s2,11,r3,5

Best regards,
Jiaqiang

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-deal-with-continued-records-tp23269.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
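The core of the question is a "group consecutive equal keys" problem: a reset to 1 starts a new group even though the RecordID value repeats later, so grouping must respect row order, not just key equality. A minimal sketch of that logic in plain Python (using the sample rows above; in Spark the same idea would need an order-preserving grouping, e.g. via zipWithIndex, since a plain groupBy on RecordID would wrongly merge rows from before and after a reset):

```python
from itertools import groupby

# Rows from the sample CSV; note RecordID resets to 1 in the last row.
rows = [
    ("1", "s1", "10", "r1", "7"),
    ("2", "s2", "9", "r1", "7"),
    ("2", "s2", "9", "r2", "8"),
    ("2", "s2", "9", "r3", "3"),
    ("3", "s1", "12", "r2", "10"),
    ("42", "s3", "8", "r7", "5"),
    ("1", "s2", "11", "r3", "5"),
]

# itertools.groupby only merges ADJACENT rows with an equal key, so a
# RecordID reset automatically starts a fresh group -- exactly the
# "continued records form one complete record" semantics.
complete_records = [list(g) for _, g in groupby(rows, key=lambda r: r[0])]

# Count the complete records built from >= 2 continued rows.
multi_row_records = [rec for rec in complete_records if len(rec) >= 2]
print(len(multi_row_records))  # -> 1 (the RecordID=2 run)
```

Here the two rows with RecordID 1 end up in separate groups because they are not adjacent, which is the behavior the reset semantics require.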
