Hi Cheng,
Thanks a lot. That solved my problem.
Thanks again for the quick response and solution.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html
Sent from the Apache Spark User List mailing list
Hmm… my bad. The reason for the first exception is that the Iterator class
is not serializable, and my snippet tries to return something like
RDD[(String, Iterator[(Double, Double)])]. As for the second one, the for
expression returns an iterator rather than a collection, so you need to
traverse the
it
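Here is a minimal sketch of the point above. It uses plain Scala collections rather than Spark so that it runs on its own. The for-comprehension produces a lazy Iterator, and calling toList traverses it once and stores the pairs in a List. The object name MaterializeSketch and the keep parameter are hypothetical; keep stands in for lessThan(a, b) && distance(a, b) < 100 from the snippet in this thread.

```scala
object MaterializeSketch {
  type Point = (Double, Double)

  // Hypothetical helper: produces the same kind of lazy Iterator that the
  // for-comprehension in the thread returns. keep stands in for
  // lessThan(a, b) && distance(a, b) < 100.
  def pairIterator(positions: Iterable[Point])(
      keep: (Point, Point) => Boolean): Iterator[(Point, Point)] =
    for {
      a <- positions.iterator
      b <- positions.iterator
      if keep(a, b)
    } yield (a, b)

  // Calling toList traverses the iterator once and stores the pairs in a
  // concrete List, which (unlike Iterator) is Serializable and can be
  // printed, counted, or returned from mapValues.
  def pairList(positions: Iterable[Point])(
      keep: (Point, Point) => Boolean): List[(Point, Point)] =
    pairIterator(positions)(keep).toList
}
```

In Spark, the same call would presumably go inside mapValues on the grouped RDD, for example grouped.mapValues(ps => MaterializeSketch.pairList(ps)(keep)), where grouped is a hypothetical result of groupByKey. Keep in mind that an Iterator can be traversed only once: after toList it is exhausted.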
Lakshmi, this is orthogonal to your question, but it may be useful.
It sounds like you're trying to determine the home location of a user, or
something similar.
If that's the problem statement, the data pattern may suggest a far more
computationally efficient approach. For example, first map a
Hi Cheng,
Sorry again.
In this method, I see that the values for
  a <- positions.iterator
  b <- positions.iterator
always remain the same. I tried b <- positions.iterator.next, but it
throws an error: value filter is not a member of (Double, Double)
Is there something I
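Why does b <- positions.iterator.next fail? A for-comprehension is rewritten into flatMap/withFilter/map calls, and the guard is applied to the right-hand side of the preceding generator. That right-hand side therefore has to be something with those methods, such as a collection or an iterator, not a single tuple. The sketch below compares the for form with a hand-desugared version. The object name and the keep predicate are hypothetical placeholders.

```scala
object DesugarSketch {
  type Point = (Double, Double)

  // Placeholder predicate (assumption) standing in for
  // lessThan(a, b) && distance(a, b) < 100.
  def keep(a: Point, b: Point): Boolean = a._1 < b._1

  // The for-comprehension as written in the thread, forced into a List.
  def withFor(ps: Seq[Point]): List[(Point, Point)] =
    (for {
      a <- ps.iterator
      b <- ps.iterator
      if keep(a, b)
    } yield (a, b)).toList

  // Roughly what the compiler rewrites it into. ps.iterator is evaluated
  // again for every a, so b restarts from the first position each time.
  // The guard becomes withFilter on that inner iterator; ps.iterator.next()
  // would instead be a single (Double, Double) tuple, which has neither
  // withFilter nor filter, hence the compile error.
  def desugared(ps: Seq[Point]): List[(Point, Point)] =
    ps.iterator.flatMap { a =>
      ps.iterator.withFilter(b => keep(a, b)).map(b => (a, b))
    }.toList
}
```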
Hi Cheng,
Thank you for your response. When I tried your solution,
  .mapValues { positions =>
    for {
      a <- positions.iterator
      b <- positions.iterator
      if lessThan(a, b) && distance(a, b) < 100
    } yield {
      (a, b)
    }
  }
I got the result
Hi Imk,
I think an iterator and a for-comprehension may help here. I wrote a snippet
that implements your first two requirements:
def distance(a: (Double, Double), b: (Double, Double)): Double = ???
// Defines some total ordering among locations.
def lessThan(a: (Double, Double), b: (Double
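One possible way to fill in the distance and lessThan stubs is shown below, though the thread does not say how the original versions were defined. It assumes positions are (latitude, longitude) in degrees and that distances should come out in kilometres. Both are assumptions, since the thread does not state units. The sketch uses the haversine great-circle formula with a mean Earth radius of 6371 km, and a lexicographic order on (lat, lon) as the total ordering. GeoSketch and EarthRadiusKm are hypothetical names.

```scala
object GeoSketch {
  type Point = (Double, Double) // assumed (latitude, longitude) in degrees

  // Assumption: mean Earth radius in kilometres, so distances are in km.
  val EarthRadiusKm = 6371.0

  // Haversine great-circle distance between two (lat, lon) points.
  def distance(a: Point, b: Point): Double = {
    val (lat1, lon1) = (math.toRadians(a._1), math.toRadians(a._2))
    val (lat2, lon2) = (math.toRadians(b._1), math.toRadians(b._2))
    val dLat = lat2 - lat1
    val dLon = lon2 - lon1
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(lat1) * math.cos(lat2) * math.pow(math.sin(dLon / 2), 2)
    // min guards against rounding pushing sqrt(h) slightly above 1.
    2 * EarthRadiusKm * math.asin(math.min(1.0, math.sqrt(h)))
  }

  // Total ordering: compare latitude first, then longitude. With
  // lessThan(a, b) as a filter, each unordered pair is kept exactly once.
  def lessThan(a: Point, b: Point): Boolean = {
    val c = java.lang.Double.compare(a._1, b._1)
    c < 0 || (c == 0 && java.lang.Double.compare(a._2, b._2) < 0)
  }
}
```

With these definitions, the threshold 100 in distance(a, b) < 100 would mean 100 km. If the data uses another unit, the radius or the threshold would need to change.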
Hi Oleg/Andrew,
Thanks much for the prompt response.
We expect thousands of lat/lon pairs for each IP address, and that is my
concern with the Cartesian product approach.
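Putting that concern in numbers: suppose one IP address has n locations. A per-IP Cartesian product then yields n² ordered pairs. If each unordered pair is kept only once and self-pairs are dropped, that leaves n(n−1)/2 pairs. The value n = 1000 below is only an illustrative stand-in for "thousands", not a figure measured from this data.

```latex
\text{ordered pairs} = n^2, \qquad
\text{unordered pairs} = \binom{n}{2} = \frac{n(n-1)}{2}

n = 1000:\quad n^2 = 1{,}000{,}000, \qquad \frac{1000 \cdot 999}{2} = 499{,}500
```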
Currently, for a small sample of this data (5000 rows), I am grouping by IP
address and then computing the distance between lat/
When you group by IP address in step 1, you get something like this:
(ip1,(lat1,lon1),(lat2,lon2))
(ip2,(lat3,lon3),(lat4,lon4))
How many lat/lon locations do you expect for each IP address? avg and max
are interesting.
Andrew
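The average and maximum number of locations per IP can be computed in a single pass over the (ip, (lat, lon)) records. The sketch below does this with plain Scala collections. It assumes the input is non-empty, because max on an empty collection throws. CountsSketch and locationStats are hypothetical names.

```scala
object CountsSketch {
  type Point = (Double, Double)

  // Input assumed to be (ip, (lat, lon)) records before grouping.
  // Returns (average, maximum) number of locations per IP address.
  // Assumes records is non-empty: counts.max throws on an empty sequence.
  def locationStats(records: Seq[(String, Point)]): (Double, Int) = {
    val counts = records.groupBy(_._1).values.map(_.size).toSeq
    (counts.sum.toDouble / counts.size, counts.max)
  }
}
```

On a Spark pair RDD, countByKey() returns a per-key count Map to the driver, and the same average and maximum could be taken from its values. That is only reasonable if the number of distinct IPs fits in driver memory.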
On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov wrote:
It is possible if you use a Cartesian product to produce all possible
pairs for each IP address and two stages of map-reduce:
- first by pairs of points, to find the total for each pair, and
- second by IP address, to find the pair with the maximum count for each
IP address.
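The two stages described above can be sketched with plain Scala collections, with each groupBy/sum playing the role of a reduceByKey. One assumption is loudly labelled: "total of each pair" is read here as the number of times that pair of points appears in the per-IP Cartesian product, so repeated lat/lon records raise the total. TwoStageSketch, maxPairPerIp and the lexicographic lessThan are hypothetical.

```scala
object TwoStageSketch {
  type Point = (Double, Double)
  type PointPair = (Point, Point)

  // Lexicographic order so each unordered pair is counted once (assumption).
  private def lessThan(a: Point, b: Point): Boolean =
    a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

  // Assumption: "total of each pair" = how many times that pair of points
  // appears in the per-IP Cartesian product. IPs with fewer than two
  // distinct points produce no pairs and are absent from the result.
  def maxPairPerIp(records: Seq[(String, Point)]): Map[String, (PointPair, Int)] = {
    val byIp: Map[String, Seq[Point]] =
      records.groupBy(_._1).map { case (ip, rs) => ip -> rs.map(_._2) }

    // Stage 1: key each pair by (ip, pair) and sum, like reduceByKey(_ + _).
    val pairTotals: Map[(String, PointPair), Int] =
      (for {
        (ip, pts) <- byIp.toSeq
        a <- pts
        b <- pts
        if lessThan(a, b)
      } yield ((ip, (a, b)), 1))
        .groupBy(_._1)
        .map { case (key, ones) => key -> ones.map(_._2).sum }

    // Stage 2: re-key by ip and keep the pair with the largest total.
    pairTotals.toSeq
      .map { case ((ip, pair), total) => (ip, (pair, total)) }
      .groupBy(_._1)
      .map { case (ip, vs) => ip -> vs.map(_._2).maxBy(_._2) }
  }
}
```

On an RDD, stage 1 would presumably be a map to ((ip, pair), 1) followed by reduceByKey(_ + _). Stage 2 would re-key by ip and use reduceByKey((x, y) => if (x._2 >= y._2) x else y). Either way the per-IP Cartesian product still materializes on the order of n² pairs per IP.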
Oleg
On 4 June 2014 11