Agree with Koert that UnionRDD should have narrow dependencies, although a
union of two RDDs does increase the number of tasks to be executed
(rdd1.partitions + rdd2.partitions).
If your two RDDs have the same number of partitions, you can also use
zipPartitions, which results in fewer tasks.
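For reference, a minimal sketch of both approaches (rdd1, rdd2, the partition
count of 4, and the output paths are illustrative; zipPartitions assumes the
two RDDs have the same number of partitions):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("union-vs-zipPartitions"))

  // Two example RDDs with the same number of partitions (4 each).
  val rdd1 = sc.parallelize(Seq("a", "b", "c"), 4)
  val rdd2 = sc.parallelize(Seq("d", "e", "f"), 4)

  // union: one task per parent partition, i.e. rdd1.partitions + rdd2.partitions = 8 tasks.
  rdd1.union(rdd2).saveAsTextFile("out/unioned")

  // zipPartitions: one task per partition pair = 4 tasks; each task walks the
  // corresponding partitions of both RDDs and emits their concatenation.
  rdd1.zipPartitions(rdd2) { (it1, it2) => it1 ++ it2 }
      .saveAsTextFile("out/zipped")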
I am surprised union introduces a stage. UnionRDD should have only narrow
dependencies.
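One way to check this (a sketch, reusing sc, rdd1 and rdd2 from the sketch
above): the union's dependencies are RangeDependency instances, i.e. narrow,
so the DAG up to the save should remain a single stage.

  val u = rdd1.union(rdd2)

  // Each parent shows up as a RangeDependency (a NarrowDependency),
  // not a ShuffleDependency, so no stage boundary is introduced.
  u.dependencies.foreach(d => println(d.getClass.getSimpleName))

  // The lineage printout should show no shuffle before the save.
  println(u.toDebugString)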
On Tue, Feb 2, 2016 at 11:25 PM, Koert Kuipers wrote:
> well the "hadoop" way is to save to a/b and a/c and read from a/* :)
>
> On Tue, Feb 2, 2016 at 11:05 PM, Jerry Lam wrote:
>
>> Hi Spark users and developers,
well the "hadoop" way is to save to a/b and a/c and read from a/* :)
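A sketch of that, using the example paths above (a/b, a/c, a/*) and the same
sc, rdd1 and rdd2 as in the earlier sketch:

  // Write each RDD to its own subdirectory; no union job is run at all.
  rdd1.saveAsTextFile("a/b")
  rdd2.saveAsTextFile("a/c")

  // Read everything back later with a glob; the Hadoop input format expands
  // a/* to both subdirectories, so the "union" happens at read time.
  val combined = sc.textFile("a/*")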
On Tue, Feb 2, 2016 at 11:05 PM, Jerry Lam wrote:
> Hi Spark users and developers,
>
> Does anyone know how to union two RDDs without the overhead of it?
>
> say rdd1.union(rdd2).saveAsTextFile(..)
> This requires a stage to union the two RDDs before saveAsTextFile (2 stages).
Hi Spark users and developers,
Does anyone know how to union two RDDs without the overhead of it?
say rdd1.union(rdd2).saveAsTextFile(..)
This requires a stage to union the two RDDs before saveAsTextFile (2 stages).
Is there a way to skip the union step but have the contents of the two RDDs
saved to the same directory?