Re: Finding most occurrences in a JSON Nested Array

2015-01-21 Thread Pankaj Narang
Send me the current code here. I will fix it and send it back to you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-most-occurrences-in-a-JSON-Nested-Array-tp20971p21295.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Finding most occurrences in a JSON Nested Array

2015-01-19 Thread Pankaj Narang
I just checked the post. Do you still need help? I think getAs[Seq[String]] should help. If you are still stuck, let me know.

Re: Finding most occurrences in a JSON Nested Array

2015-01-16 Thread adstan
Hi Pankaj, I have another related problem... given the following data: I want to find the top 3 locations where the employees' certifications are obtained (ignoring the fact that geospatial comparisons are more than just equality). I tried the earlier approach; the challenge is the location field

Re: Finding most occurrences in a JSON Nested Array

2015-01-06 Thread Pankaj Narang
That's great. I did not have access to the developer machine, so I sent you only the pseudocode. Happy to see it's working. If you need any more help related to Spark, let me know anytime.

Re: Finding most occurrences in a JSON Nested Array

2015-01-06 Thread adstan
Many thanks Pankaj, I've got it working. For completeness, here's the whole segment (including the printout at different stages):

Re: Finding most occurrences in a JSON Nested Array

2015-01-05 Thread Pankaj Narang
Yes, row(1).collect would be wrong, as it is not a transformation on an RDD. Try getString(1) to fetch the field. As I already said, this is pseudocode. If it does not help, let me know and I will run the code and send it to you. get/getAs should work for you, for example:
    var hashTagsList = popularHashTags.flatM
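The type mismatch discussed in this thread comes from indexing a row: `row(1)` has static type `Any`, while `flatMap` needs something it can traverse. The following is only a minimal sketch in plain Scala collections, not Spark code; the `FlatMapCast` object, the `rows` sample data, and the use of `Seq[Any]` as a stand-in for a Spark row are all assumptions made for illustration. On a real Spark row, the `getAs[Seq[String]](1)` call suggested in the thread would presumably play the role of the cast.

```scala
// Plain-Scala stand-in: each row is a Seq[Any], so row(1) is typed Any,
// which mirrors the "found: Any" part of the reported error.
object FlatMapCast {
  // Hypothetical sample rows of the form (name, hobbies).
  val rows: Seq[Seq[Any]] = Seq(
    Seq("alice", Seq("swimming", "hiking")),
    Seq("bob", Seq("swimming"))
  )

  // rs.flatMap(row => row(1)) would not compile, because Any is not a collection.
  // Casting the field to its actual type gives flatMap something to flatten.
  def hobbies(rs: Seq[Seq[Any]]): Seq[String] =
    rs.flatMap(row => row(1).asInstanceOf[Seq[String]])

  def main(args: Array[String]): Unit =
    println(hobbies(rows))
}
```

The cast is unchecked, so it only works if the field really holds a sequence of strings; a mismatch would surface later as a runtime error rather than at compile time.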

Re: Finding most occurrences in a JSON Nested Array

2015-01-05 Thread adstan
I did try this earlier, but I got an error that I couldn’t comprehend:
    scala> var hobbies = results.flatMap(row => row(1))
    <console>:16: error: type mismatch;
     found   : Any
     required: TraversableOnce[?]
           var hobbies = results.flatMap(row => row(1))
I must be missing something, perhaps a

Re: Finding most occurrences in a JSON Nested Array

2015-01-05 Thread Pankaj Narang
If you need more help, let me know. -Pankaj Linkedin https://www.linkedin.com/profile/view?id=171566646 Skype pankaj.narang

Re: Finding most occurrences in a JSON Nested Array

2015-01-05 Thread Pankaj Narang
Try as below:
    results.map(row => row(1)).collect
Then try:
    var hobbies = results.flatMap(row => row(1))
It will create all the hobbies in a simple array. Now:
    var hbmap = hobbies.map(hobby => (hobby, 1)).reduceByKey((hobcnt1, hobcnt2) => hobcnt1 + hobcnt2)
It will aggregate hobbies as below:
    {swimming,2}, {hi
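The counting step in this message (pair each hobby with 1, then sum per key) can be tried without a cluster. The sketch below uses plain Scala collections as a stand-in for the RDD operations: `groupBy` plus a size count replaces `map(hobby => (hobby, 1)).reduceByKey(...)`, and a local sort replaces whatever ordering step follows on the RDD. The `HobbyCounts` object, its method names, and the sample hobbies are hypothetical and not taken from the thread.

```scala
object HobbyCounts {
  // Count occurrences of each hobby; a local equivalent of pairing each
  // element with 1 and summing the values per key.
  def countHobbies(hobbies: Seq[String]): Map[String, Int] =
    hobbies.groupBy(identity).map { case (hobby, occurrences) => (hobby, occurrences.size) }

  // Most frequent first; ties are broken alphabetically so the result is deterministic.
  def top(counts: Map[String, Int], n: Int): Seq[(String, Int)] =
    counts.toSeq.sortBy { case (hobby, count) => (-count, hobby) }.take(n)

  def main(args: Array[String]): Unit = {
    val hobbies = Seq("swimming", "hiking", "swimming") // hypothetical sample data
    println(top(countHobbies(hobbies), 3))
  }
}
```

Sorting on the negated count keeps the example to a single pass over the pairs; on an RDD the same idea would need a distributed sort instead of a local one.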