Does Spark have an implementation similar to CompositeInputFormat in MapReduce?
CompositeInputFormat joins multiple datasets before the map phase, provided they are partitioned the same way with the same number of partitions; it uses the "part" number in each dataset's file names to figure out which file to join with its counterparts in the other datasets.

Here is a similar question from earlier this year:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201505.mbox/%3CCADrn=epwl6ghs9hfyo3csuxhshtycsrlbujcmpxrtz4zype...@mail.gmail.com%3E

From what I can tell, there's no way to tell Spark how a dataset was previously partitioned, so the only way to achieve a map-side join with a similarly partitioned dataset is to repartition it.
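
For reference, here is a minimal sketch of the repartition-then-join approach I'm describing (the HDFS paths, tab-separated key parsing, and partition count of 128 are placeholders, not anything Spark dictates):

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    object CoPartitionedJoin {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("co-partitioned-join").setMaster("local[*]"))

        // Hypothetical inputs: each line is "key<TAB>value".
        val left = sc.textFile("hdfs:///data/left").map { line =>
          val Array(k, v) = line.split("\t", 2); (k, v)
        }
        val right = sc.textFile("hdfs:///data/right").map { line =>
          val Array(k, v) = line.split("\t", 2); (k, v)
        }

        // Spark has no way of knowing the files on disk were already
        // partitioned by key, so both sides get repartitioned with the same
        // partitioner. Once the two RDDs share a partitioner (and are
        // materialized), the join is narrow: no additional shuffle.
        val partitioner = new HashPartitioner(128)
        val leftPart  = left.partitionBy(partitioner).persist()
        val rightPart = right.partitionBy(partitioner).persist()

        val joined = leftPart.join(rightPart)  // co-partitioned, so no extra shuffle
        joined.take(10).foreach(println)

        sc.stop()
      }
    }

The repartitioning itself is still a full shuffle of both datasets, which is exactly the cost CompositeInputFormat avoids by trusting the existing "part" layout on disk.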