Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng,
Thanks a lot. That solved my problem. Thanks again for the quick response and solution.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread Cheng Lian
Hmm… my bad. The reason for the first exception is that the Iterator class is not serializable, since my snippet tries to return something like RDD[(String, Iterator[(Double, Double)])]. As for the second one, the for expression returns an iterator rather than a collection, so you need to traverse the iterator…
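A minimal sketch of the fix described above: materialize the for expression's iterator with .toList so the value handed back (e.g. from mapValues) is a serializable collection. Plain Scala collections stand in for the RDD here, and the distance / lessThan definitions are hypothetical placeholders (Euclidean distance, lexicographic order), not the thread's actual implementations:

```scala
// Hypothetical stand-ins for the thread's distance/lessThan.
def distance(a: (Double, Double), b: (Double, Double)): Double =
  math.hypot(a._1 - b._1, a._2 - b._2)

def lessThan(a: (Double, Double), b: (Double, Double)): Boolean =
  a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

val positions = Seq((0.0, 0.0), (0.5, 0.5), (200.0, 200.0))

// The for expression over two iterators yields an Iterator, which is not
// serializable; .toList materializes it into a serializable collection.
// Note that `positions.iterator` in the second generator is re-evaluated
// for every `a`, so the inner traversal starts fresh each time.
val closePairs = (for {
  a <- positions.iterator
  b <- positions.iterator
  if lessThan(a, b) && distance(a, b) < 100
} yield (a, b)).toList
```

In a Spark job the same expression would sit inside mapValues, with the .toList ensuring the returned value is an RDD of collections rather than of iterators.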

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread Christopher Nguyen
> In this case, how can I iterate and compare each coordinate pair with all the other pairs?
> Can this be done in a distributed manner, as this data set is going to have a few million records?
> Can we do this in map/reduce commands?
>
> Thanks.

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng,
Sorry again. In this method, I see that the values for

a <- positions.iterator
b <- positions.iterator

always remain the same. When I tried b <- positions.iterator.next, it throws an error: value filter is not a member of (Double, Double). Is there something I…

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng,
Thank you for your response. While I tried your solution,

.mapValues { positions =>
  for {
    a <- positions.iterator
    b <- positions.iterator
    if lessThan(a, b) && distance(a, b) < 100
  } yield {
    (a, b)
  }
}

I got the result…

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Cheng Lian
Hi lmk,
I think an iterator and a for-comprehension may help here. I wrote a snippet that implements your first two requirements:

def distance(a: (Double, Double), b: (Double, Double)): Double = ???

// Defines some total ordering among locations.
def lessThan(a: (Double, Double), b: (Double, Double)): Boolean = ???
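One plausible way to fill in the ??? stubs above, assuming the pairs are (latitude, longitude) in degrees: haversine great-circle distance in kilometres, and a lexicographic ordering so each unordered pair is compared exactly once. These concrete definitions are an editorial assumption, not Cheng's:

```scala
// Assumption: coordinates are (latitude, longitude) in degrees.
// Haversine great-circle distance in kilometres.
def distance(a: (Double, Double), b: (Double, Double)): Double = {
  val R = 6371.0 // mean Earth radius, km
  val dLat = math.toRadians(b._1 - a._1)
  val dLon = math.toRadians(b._2 - a._2)
  val h = math.pow(math.sin(dLat / 2), 2) +
    math.cos(math.toRadians(a._1)) * math.cos(math.toRadians(b._1)) *
      math.pow(math.sin(dLon / 2), 2)
  2 * R * math.asin(math.sqrt(h))
}

// Lexicographic ordering: a total order, so (a, b) and (b, a) are
// never both emitted by the lessThan(a, b) filter.
def lessThan(a: (Double, Double), b: (Double, Double)): Boolean =
  a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)
```

Any total ordering works for lessThan; its only job is to deduplicate the symmetric pairs produced by the double iteration.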

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi Oleg/Andrew,
Thanks much for the prompt response. We expect thousands of lat/lon pairs for each IP address, and that is my concern with the Cartesian product approach. Currently, for a small sample of this data (5000 rows), I am grouping by IP address and then computing the distance between lat/…
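A sketch of the per-IP grouping described above, with plain Scala collections standing in for RDD groupByKey/mapValues; the names and sample records are illustrative. Because pairs are formed only within each IP's group, the cost is the sum of each group's pair count rather than a Cartesian product over the full data set:

```scala
// Illustrative sample records: (ip, (lat, lon)).
val records: Seq[(String, (Double, Double))] = Seq(
  ("10.0.0.1", (0.0, 0.0)),
  ("10.0.0.1", (0.5, 0.5)),
  ("10.0.0.2", (1.0, 1.0))
)

// Placeholder distance (Euclidean) for the sketch.
def dist(a: (Double, Double), b: (Double, Double)): Double =
  math.hypot(a._1 - b._1, a._2 - b._2)

// Group by IP, then form each unordered pair once per group by pairing
// every element only with the elements after it.
val perIpPairs: Map[String, List[((Double, Double), (Double, Double), Double)]] =
  records.groupBy(_._1).map { case (ip, rows) =>
    val ps = rows.map(_._2)
    val pairs = for {
      (a, i) <- ps.zipWithIndex.toList
      b      <- ps.drop(i + 1)
    } yield (a, b, dist(a, b))
    ip -> pairs
  }
```

In Spark the groupBy would be a groupByKey and the pair-forming step would run inside mapValues, so each group's quadratic work stays local to one task.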

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Andrew Ash
> …Find the coordinate pair with the maximum occurrences
>
> In this case, how can I iterate and compare each coordinate pair with all the other pairs?
> Can this be done in a distributed manner, as this data set is going to have a few million records?
> …
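The "coordinate pair with the maximum occurrences" requirement quoted above can be sketched as a simple count-then-max, again with plain collections standing in for an RDD (in Spark this would typically be a map to (pair, 1), a reduceByKey, and a max; the sample data is illustrative):

```scala
// Illustrative coordinates; (1.0, 2.0) occurs twice.
val coords = Seq((1.0, 2.0), (3.0, 4.0), (1.0, 2.0))

// Count occurrences of each coordinate pair, then take the most frequent.
val (mostFrequent, count) =
  coords.groupBy(identity).map { case (c, cs) => (c, cs.size) }.maxBy(_._2)
```

Note that maxBy picks an arbitrary winner when several pairs tie for the maximum count; a distributed version would have the same caveat.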

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Oleg Proudnikov
> …with all the other pairs?
> Can this be done in a distributed manner, as this data set is going to have a few million records?
> Can we do this in map/reduce commands?
>
> Thanks.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001…

Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
…Can we do this in map/reduce commands?

Thanks.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-done-in-map-reduce-technique-in-parallel-tp6905.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.