Hi Cheng,
Thanks a lot. That solved my problem.
Thanks again for the quick response and solution.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hmm… my bad. The reason for the first exception is that the Iterator class
is not serializable, since my snippet tries to return something like
RDD[(String, Iterator[(Double, Double)])]. As for the second one, the for
expression returns an iterator rather than a collection, so you need to
traverse the iterator.
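The point about the for-expression can be seen with plain Scala, no Spark needed (the sample coordinates below are made up for illustration):

```scala
// A for-comprehension over Iterators yields an Iterator: it is lazy,
// can be traversed only once, and is not serializable, so it cannot
// live inside an RDD. Materialize it (e.g. with .toList) first.
val positions = List((1.0, 1.0), (2.0, 2.0), (3.0, 3.0))

val pairsIter: Iterator[((Double, Double), (Double, Double))] = for {
  a <- positions.iterator
  b <- positions.iterator
  if a._1 < b._1 // stand-in for lessThan(a, b)
} yield (a, b)

val pairs = pairsIter.toList // now a serializable collection
```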
> In this case, how can I iterate and compare each coordinate pair with all
> the other pairs?
> Can this be done in a distributed manner, as this data set is going to have
> a few million records?
> Can we do this in map/reduce commands?
>
> Thanks.
>
>
>
> --
Hi Cheng,
Sorry again.
In this method, I see that the values for
a <- positions.iterator
b <- positions.iterator
always remain the same. I tried b <- positions.iterator.next, and it
throws an error: value filter is not a member of (Double, Double).
Is there something I am missing?
Hi Cheng,
Thank you for your response. When I tried your solution,
.mapValues { positions =>
for {
a <- positions.iterator
b <- positions.iterator
if lessThan(a, b) && distance(a, b) < 100
} yield {
(a, b)
}
}
I got the result
Hi Imk,
I think iterators and a for-comprehension may help here. I wrote a snippet
that implements your first two requirements:
def distance(a: (Double, Double), b: (Double, Double)): Double = ???
// Defines some total ordering among locations.
def lessThan(a: (Double, Double), b: (Double, Double)): Boolean = ???
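For concreteness, here is one possible way to fill in those two stubs and run the for-comprehension over one group of positions. The haversine distance and the lexicographic ordering below are my assumptions; the original snippet left both as ???:

```scala
// Sketch only: haversine great-circle distance in km (assumed, not
// from the original thread).
def distance(a: (Double, Double), b: (Double, Double)): Double = {
  val R = 6371.0 // mean Earth radius in km
  val dLat = math.toRadians(b._1 - a._1)
  val dLon = math.toRadians(b._2 - a._2)
  val h = math.pow(math.sin(dLat / 2), 2) +
    math.cos(math.toRadians(a._1)) * math.cos(math.toRadians(b._1)) *
      math.pow(math.sin(dLon / 2), 2)
  2 * R * math.asin(math.sqrt(h))
}

// Defines some total ordering among locations, so each unordered
// pair is emitted exactly once.
def lessThan(a: (Double, Double), b: (Double, Double)): Boolean =
  a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

// Pairs within 100 km, computed for one group of positions:
val positions = List((0.0, 0.0), (0.5, 0.5), (40.0, 40.0))
val closePairs = (for {
  a <- positions.iterator
  b <- positions.iterator
  if lessThan(a, b) && distance(a, b) < 100
} yield (a, b)).toList
```

Only the (0.0, 0.0) / (0.5, 0.5) pair is within 100 km here; the point at (40.0, 40.0) is thousands of kilometers away.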
Hi Oleg/Andrew,
Thanks much for the prompt response.
We expect thousands of lat/lon pairs for each IP address. And that is my
concern with the Cartesian product approach.
Currently, for a small sample of this data (5,000 rows), I am grouping by IP
address and then computing the distance between lat/lon pairs.
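The group-then-compare shape can be sketched with plain Scala collections; on a Spark RDD the same pattern is rdd.groupByKey().mapValues { positions => ... }. The IP strings, coordinates, and the simplified near check below are made up for illustration:

```scala
// In-memory sketch of "group by IP, then compare pairs within each
// group". Data and the closeness test are illustrative only.
val records: List[(String, (Double, Double))] = List(
  ("10.0.0.1", (0.0, 0.0)),
  ("10.0.0.1", (0.5, 0.5)),
  ("10.0.0.2", (40.0, 40.0))
)

// Simplified stand-in for the real distance-threshold check.
def near(a: (Double, Double), b: (Double, Double)): Boolean =
  math.abs(a._1 - b._1) < 1.0 && math.abs(a._2 - b._2) < 1.0

val pairsPerIp: Map[String, List[((Double, Double), (Double, Double))]] =
  records.groupBy(_._1).map { case (ip, rows) =>
    val positions = rows.map(_._2)
    val pairs = for {
      (a, i) <- positions.zipWithIndex
      b <- positions.drop(i + 1) // each unordered pair exactly once
      if near(a, b)
    } yield (a, b)
    ip -> pairs
  }
```

Using drop(i + 1) instead of a full self-join avoids comparing each pair twice, which matters when each IP has thousands of positions.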
>> Find the coordinate pair with the maximum occurrences
>>
>> In this case, how can I iterate and compare each coordinate pair with all
>> the other pairs?
>> Can this be done in a distributed manner, as this data set is going to
>> have
>> a few million records?
Can we do this in map/reduce commands?
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-done-in-map-reduce-technique-in-parallel-tp6905.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.