Re: FlatMapValues

2015-01-05 Thread Sanjay Subramanian
cool let me adapt that. thanks a ton regards sanjay From: Sean Owen To: Sanjay Subramanian Cc: "user@spark.apache.org" Sent: Monday, January 5, 2015 3:19 AM Subject: Re: FlatMapValues For the record, the solution I was suggesting was about like this: inputRDD.flatM...

Re: FlatMapValues

2015-01-05 Thread Sean Owen
For the record, the solution I was suggesting was about like this:

inputRDD.flatMap { input =>
  val tokens = input.split(',')
  val id = tokens(0)
  val keyValuePairs = tokens.tail.grouped(2)
  val keys = keyValuePairs.map(_(0))
  keys.map(key => (id, key))
}

This is much more efficient. On Wed ...
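A self-contained version of that sketch, for reference. The input path, output path, and the id,reaction,version,reaction,version,... record layout are assumptions based on the sample data quoted later in this thread; this is a sketch, not the poster's exact job:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal runnable sketch (Spark 1.x style).
val sc = new SparkContext(new SparkConf().setAppName("FlatMapValuesSketch").setMaster("local[*]"))
val inputRDD = sc.textFile("/path/to/reactions.csv")   // hypothetical input path

val pairs = inputRDD.flatMap { input =>
  val tokens = input.split(',')              // e.g. "025126,Chills,8.10,Malaise,8.10,..."
  val id = tokens(0)                         // first field is the record id
  val keyValuePairs = tokens.tail.grouped(2) // (reaction, version) pairs
  val keys = keyValuePairs.map(_(0))         // keep only the reaction names
  keys.map(key => (id, key))                 // one (id, reaction) tuple per reaction
}
pairs.saveAsTextFile("/path/to/output")      // hypothetical output path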

Re: FlatMapValues

2015-01-02 Thread Sanjay Subramanian
... else { ("") } })
  .flatMap(str => str.split('\t'))
  .filter(line => line.toString.length() > 0)
  .saveAsTextFile("/data/vaers/msfx/reac/" + outFile)

From: Sanjay Subramanian To: Hitesh Khamesra Cc: ...
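For readers following the thread, a hypothetical reconstruction of the pipeline this fragment comes from. The if-condition and field indices are pieced together from quotes elsewhere in the thread, so treat this as a sketch of the shape, not the poster's exact code (inputRDD as in the sketch above):

// Hypothetical reconstruction: emit one tab-joined string per well-formed line,
// an empty string otherwise, then split on tabs, drop empties, and write out.
val outFile = "reac_out"                       // hypothetical file name
val flattened = inputRDD.map(line => {
  val fields = line.split(',')
  if (fields.length >= 10) {
    fields(0) + "\t" + fields(1) + "\t" + fields(3) + "\t" + fields(5) + "\t" + fields(7) + "\t" + fields(9)
  } else {
    ("")                                       // placeholder the rest of the thread objects to
  }
}).flatMap(str => str.split('\t'))
  .filter(line => line.toString.length() > 0)
flattened.saveAsTextFile("/data/vaers/msfx/reac/" + outFile)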

Re: FlatMapValues

2015-01-01 Thread Sanjay Subramanian
thanks let me try that out From: Hitesh Khamesra To: Sanjay Subramanian Cc: Kapil Malik ; Sean Owen ; "user@spark.apache.org" Sent: Thursday, January 1, 2015 9:46 AM Subject: Re: FlatMapValues How about this: apply flatMap per line, and in that function parse each ...
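The per-line parsing can be checked in a plain Scala REPL before wiring it into flatMap; a small sketch using a sample record from later in the thread:

// Plain Scala check of the per-line parse, no Spark needed.
// Record layout (from the thread's sample data): id,reaction1,version1,reaction2,version2,...
val line = "025126,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10"
val tokens = line.split(',')
val id = tokens(0)                                     // "025126"
val reactions = tokens.tail.grouped(2).map(_(0)).toList
// List(Chills, Injection site oedema, Injection site reaction)
val pairs = reactions.map(r => (id, r))                // what flatMap would emit for this line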

Re: FlatMapValues

2015-01-01 Thread Hitesh Khamesra
...and you need > to import org.apache.spark.SparkContext._ to use them > (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions) > > @Sean, yes indeed flatMap / flatMapValues both can be used. > > Regards, > > Kapil
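A minimal sketch of what that import enables (Spark 1.x style; the sample pair is hypothetical, and sc is the SparkContext provided by spark-shell):

import org.apache.spark.SparkContext._   // implicit conversion to PairRDDFunctions (Spark 1.x)

// Once the RDD's elements are (key, value) tuples, flatMapValues is available:
val kv = sc.parallelize(Seq(("025126", "Chills\tMalaise")))
val perValue = kv.flatMapValues(v => v.split('\t'))
// perValue: RDD[(String, String)] containing (025126,Chills) and (025126,Malaise)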

Re: FlatMapValues

2014-12-31 Thread Sanjay Subramanian
,Injection site oedema 025005,Injection site reaction thanks sanjay From: Kapil Malik To: Sean Owen ; Sanjay Subramanian Cc: "user@spark.apache.org" Sent: Wednesday, December 31, 2014 9:35 AM Subject: RE: FlatMapValues Hi Sanjay, Oh yes .. on flatMapValues, it's ...

RE: FlatMapValues

2014-12-31 Thread Kapil Malik
...both can be used. Regards, Kapil -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: 31 December 2014 21:16 To: Sanjay Subramanian Cc: user@spark.apache.org Subject: Re: FlatMapValues From the clarification below, the problem is that you are calling flatMapValues, which ...

Re: FlatMapValues

2014-12-31 Thread Sean Owen
From the clarification below, the problem is that you are calling flatMapValues, which is only available on an RDD of key-value tuples. Your map function returns a tuple in one case but a String in the other, so your RDD is a bunch of Any, which is not at all what you want. You need to return a tuple ...
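A small sketch of the type problem being described (field indices and the length check are illustrative, not the original code; inputRDD as in the earlier sketch):

// Returning a tuple in one branch and a String in the other means the RDD's
// element type is a common supertype rather than (String, String), so the
// pair-RDD operations such as flatMapValues are not defined on it.
val bad = inputRDD.map { line =>
  val fields = line.split(',')
  if (fields.length >= 10) (fields(0), fields(1)) else ""
}
// bad.flatMapValues(...)             // does not compile: not an RDD of pairs

// One way to keep a proper pair RDD: emit zero or one tuple per line.
val good = inputRDD.flatMap { line =>
  val fields = line.split(',')
  if (fields.length >= 10) Some((fields(0), fields(1))) else None
}
// good: RDD[(String, String)]        // flatMapValues is available here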

Re: FlatMapValues

2014-12-31 Thread Sanjay Subramanian
thanks regards sanjay From: Fernando O. To: Kapil Malik Cc: Sanjay Subramanian ; "user@spark.apache.org" Sent: Wednesday, December 31, 2014 6:06 AM Subject: Re: FlatMapValues Hi Sanjay, Doing an if inside a Map sounds like a bad idea, it seems like you actually want to filter and ...

Re: FlatMapValues

2014-12-31 Thread Fernando O.
Hi Sanjay, Doing an if inside a map sounds like a bad idea; it seems like you actually want to filter and then apply map. On Wed, Dec 31, 2014 at 9:54 AM, Kapil Malik wrote: > Hi Sanjay, > > I tried running your code on spark shell piece by piece – > > // Setup > > val line1 = "025126,C...
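A sketch of the filter-then-map shape being suggested. The field indices come from the transformation quoted later in the thread; the length guard is an assumption standing in for whatever the real validity check is (inputRDD as in the earlier sketch):

// Filter out malformed lines first, then map only lines known to be well formed;
// no empty-string placeholder branch is needed.
val cleaned = inputRDD
  .map(line => line.split(','))
  .filter(fields => fields.length >= 10)                // assumed guard
  .map(fields => (fields(0),
    fields(1) + "\t" + fields(3) + "\t" + fields(5) + "\t" + fields(7) + "\t" + fields(9)))
// cleaned: RDD[(String, String)], on which flatMapValues(_.split('\t')) now works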

RE: FlatMapValues

2014-12-31 Thread Kapil Malik
Hi Sanjay, I tried running your code on spark shell piece by piece –

// Setup
val line1 = "025126,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,Myalgia,8.10"
val line2 = "025127,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,M...
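The same piece-by-piece check can be reproduced in spark-shell with straight quotes (the curly quotes that email clients substitute will not compile); a minimal sketch using only the first sample line:

// spark-shell sketch: parallelize the sample record and inspect the split.
val line1 = "025126,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,Myalgia,8.10"
val rdd = sc.parallelize(Seq(line1))          // sc is provided by spark-shell
val fields = rdd.map(_.split(','))
fields.first().length                         // 11 tokens: the id plus five (reaction, version) pairs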

Re: FlatMapValues

2014-12-31 Thread Raghavendra Pandey
Why don't you push "\n" instead of "\t" in your first transformation [ (fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t" +fields(9)))] and then do saveAsTextFile? -Raghavendra On Wed Dec 31 2014 at 1:42:55 PM Sanjay Subramanian wrote: > hey guys > > My dataset is like this
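A sketch of that newline idea (field indices are taken from the quoted transformation; the output path is hypothetical, and inputRDD is as in the earlier sketch). Note that saveAsTextFile writes each element's toString followed by a newline, so the value is written out as a plain string and the embedded "\n" separators spread the fields across output lines:

// Keep the (id, value) shape from the quoted transformation, but join the
// selected fields with "\n" instead of "\t".
val withNewlines = inputRDD
  .map(line => line.split(','))
  .map(fields => (fields(0),
    fields(1) + "\n" + fields(3) + "\n" + fields(5) + "\n" + fields(7) + "\n" + fields(9)))

// Write the values as plain strings; each embedded "\n" becomes its own line
// in the part files, with no flatMapValues needed.
withNewlines.values.saveAsTextFile("/path/to/output")   // hypothetical path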