Does Spark have an implementation similar to CompositeInputFormat in
MapReduce?

CompositeInputFormat joins multiple datasets before the mapper runs. The
datasets must be partitioned the same way, with the same number of
partitions; the "part" number in each file name is used to match a file with
its counterparts in the other datasets, so the join needs no shuffle. A rough
sketch of how it is typically configured follows below.
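For reference, here is a minimal sketch of that setup using the old "mapred"
API, written in Scala (the paths and the choice of KeyValueTextInputFormat
are just illustrative assumptions):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapred.{JobConf, KeyValueTextInputFormat}
    import org.apache.hadoop.mapred.join.CompositeInputFormat

    // Both inputs must already be sorted by key and split into the same
    // number of identically partitioned part files.
    val conf = new JobConf()
    conf.setInputFormat(classOf[CompositeInputFormat[Text]])
    conf.set("mapred.join.expr",
      CompositeInputFormat.compose(
        "inner",                              // join type
        classOf[KeyValueTextInputFormat],     // format of each input
        new Path("/data/left"),               // hypothetical paths
        new Path("/data/right")))
    // Each map task then receives a key plus a TupleWritable holding the
    // matching values from every input -- the join happens before the
    // mapper, with no shuffle.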

Here is a similar question from earlier this year:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201505.mbox/%3CCADrn=epwl6ghs9hfyo3csuxhshtycsrlbujcmpxrtz4zype...@mail.gmail.com%3E

From what I can tell, there's no way to tell Spark how a dataset was
previously partitioned; the only way to get a map-side join with a similarly
partitioned dataset is to repartition both sides within Spark.
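To make that workaround concrete, here is a sketch of the co-partitioning
approach in Scala (paths, delimiter, and partition count are made-up
assumptions): once both RDDs share the same partitioner and are cached, the
join itself is a narrow dependency and causes no further shuffle.

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    val sc = new SparkContext(
      new SparkConf().setAppName("copartitioned-join"))

    val partitioner = new HashPartitioner(128)

    // Pay the shuffle cost once per dataset, up front.
    val left = sc.textFile("hdfs:///data/left")
      .map { line => val Array(k, v) = line.split("\t", 2); (k, v) }
      .partitionBy(partitioner)
      .persist()

    val right = sc.textFile("hdfs:///data/right")
      .map { line => val Array(k, v) = line.split("\t", 2); (k, v) }
      .partitionBy(partitioner)
      .persist()

    // Both sides now have the same partitioner and partition count, so each
    // output partition is built from exactly one partition of each parent.
    val joined = left.join(right)

The difference from CompositeInputFormat is that Spark still has to
repartition the data itself rather than trusting how it was laid out on disk.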
