You could use cogroup to combine the two RDDs into a single RDD, grouped by key, and then cross-reference them.
For example:

a.cogroup(b)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l) }
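
The cogroup approach above can be sketched as a small standalone program. This is only a sketch under a few assumptions: the object name `CogroupSketch`, the helper `keepMatching`, and the in-memory test data (standing in for the files under /sparktest/) are made up for illustration, and it runs against a local master. Note that the `map` in the snippet above yields `(k, l)` where `l` is a collection of values, so the sketch uses `flatMap` instead to emit plain pairs such as (3,"a").

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed on Spark releases before 1.3
import org.apache.spark.rdd.RDD

object CogroupSketch {
  // Hypothetical helper: keep the (key, value) pairs of `a` whose key also occurs in `b`.
  def keepMatching(a: RDD[(String, String)],
                   b: RDD[(String, String)]): RDD[(String, String)] =
    a.cogroup(b)
      // keep only keys that have at least one value on both sides
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      // emit one (key, value) pair per value that came from `a`
      .flatMap { case (k, (l, _)) => l.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cogroup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // In-memory stand-ins for the two input files, one number per element.
      val a = sc.parallelize(Seq("1", "2", "3", "4")).map((_, "a"))
      val b = sc.parallelize(Seq("3", "4", "5", "6")).map((_, "b"))
      keepMatching(a, b).collect().sorted.foreach(println) // prints (3,a) and (4,a)
    } finally {
      sc.stop()
    }
  }
}
```

If only one value per key is expected on each side, a plain `a.join(b)` followed by a `map` that drops the right-hand value would probably be a simpler alternative.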
Best Regards,
Raymond Liu
-----Original Message-----
From: marylucy [mailto:[email protected]]
Sent: Friday, August 29, 2014 9:26 PM
To: Matthew Farrellee
Cc: [email protected]
Subject: Re: how to filter value in spark
I see, it works well. Thank you!
But what should I do in the following situation?
var a = sc.textFile("/sparktest/1/").map((_,"a"))
var b = sc.textFile("/sparktest/2/").map((_,"b"))
How do I get (3,"a") and (4,"a")?
On Aug 28, 2014, at 19:54, "Matthew Farrellee" <[email protected]> wrote:
> On 08/28/2014 07:20 AM, marylucy wrote:
>> fileA contains 1 2 3 4, one number per line, saved in /sparktest/1/
>> fileB contains 3 4 5 6, one number per line, saved in /sparktest/2/
>> I want to get 3 and 4.
>>
>> var a = sc.textFile("/sparktest/1/").map((_,1))
>> var b = sc.textFile("/sparktest/2/").map((_,1))
>>
>> a.filter(param=>{b.lookup(param._1).length>0}).map(_._1).foreach(println)
>>
>> An error is thrown:
>> Scala.MatchError:Null
>> PairRDDFunctions.lookup...
>
> the issue is that the b rdd is used inside a transformation of the a rdd; rdd operations cannot be nested like that
>
> consider using intersection, it's more idiomatic
>
> a.intersection(b).foreach(println)
>
> but note that intersection will remove duplicates
>
> best,
>
>
> matt
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
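
The intersection suggestion in the quoted reply can be sketched as follows. This is a minimal illustration, not a definitive implementation: the object name `IntersectionSketch`, both helper names, and the in-memory sample data are assumptions. The broadcast variant is a swapped-in alternative that only makes sense if the second data set is small enough to collect on the driver. Neither helper refers to one RDD inside a transformation of another, which is what caused the original error.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IntersectionSketch {
  // Elements present in both RDDs; intersection returns each common element once.
  def common(a: RDD[String], b: RDD[String]): RDD[String] =
    a.intersection(b)

  // Alternative (assumes `b` fits in driver memory): ship `b` to the executors
  // as a broadcast Set and filter `a` against it. Duplicates in `a` are kept.
  def commonViaBroadcast(sc: SparkContext, a: RDD[String], b: RDD[String]): RDD[String] = {
    val lookup = sc.broadcast(b.collect().toSet)
    a.filter(x => lookup.value.contains(x))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("intersection-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // In-memory stand-ins for the two files; "3" is repeated on purpose.
      val a = sc.parallelize(Seq("1", "2", "3", "3", "4"))
      val b = sc.parallelize(Seq("3", "4", "5", "6"))
      common(a, b).collect().sorted.foreach(println)                 // prints 3 and 4
      commonViaBroadcast(sc, a, b).collect().sorted.foreach(println) // prints 3, 3 and 4
    } finally {
      sc.stop()
    }
  }
}
```

Because intersection compares whole elements, applying it directly to pairs such as (3,"a") and (3,"b") would return nothing; in that case it has to be applied to the keys only, or replaced by a key-based operation such as cogroup.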