Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thanks a lot. That solved my problem. Thanks again for the quick response and solution. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html Sent from the Apache Spark User List mailing list

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread Cheng Lian
Hmm… my bad. The reason of the first exception is that the Iterator class is not serializable since my snippet tries to return something like RDD[(String, Iterator[(Double, Double)]]. As for the second one, the for expression returns an iterator rather than a collection, you need to traverse the it

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread Christopher Nguyen
Lakshmi, this is orthogonal to your question, but in case it's useful. It sounds like you're trying to determine the home location of a user, or something similar. If that's the problem statement, the data pattern may suggest a far more computationally efficient approach. For example, first map a

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Sorry Again. In this method, i see that the values for a <- positions.iterator b <- positions.iterator always remain the same. I tried to do a b <- positions.iterator.next, it throws an error: value filter is not a member of (Double, Double) Is there something I

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thank you for your response. While I tried your solution, .mapValues { positions => for { a <- positions.iterator b <- positions.iterator if lessThan(a, b) && distance(a, b) < 100 } yield { (a, b) } } I got the result

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Cheng Lian
Hi Imk, I think iterator and for-comprehension may help here. I wrote a snippet that implements your first 2 requirements: def distance(a: (Double, Double), b: (Double, Double)): Double = ??? // Defines some total ordering among locations. def lessThan(a: (Double, Double), b: (Double

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi Oleg/Andrew, Thanks much for the prompt response. We expect thousands of lat/lon pairs for each IP address. And that is my concern with the Cartesian product approach. Currently for a small sample of this data (5000 rows) I am grouping by IP address and then computing the distance between lat/

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Andrew Ash
When you group by IP address in step 1 to this: (ip1,(lat1,lon1),(lat2,lon2)) (ip2,(lat3,lon3),(lat4,lat5)) How many lat/lon locations do you expect for each IP address? avg and max are interesting. Andrew On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov wrote: > It is possi

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Oleg Proudnikov
It is possible if you use a cartesian product to produce all possible pairs for each IP address and 2 stages of map-reduce: - first by pairs of points to find the total of each pair and - second by IP address to find the pair for each IP address with the maximum count. Oleg On 4 June 2014 11