Does Spark have an implementation similar to CompositeInputFormat in MapReduce?
CompositeInputFormat joins multiple datasets before the map phase, provided they are partitioned the same way with the same number of partitions; it uses the "part" number in each dataset's file names to figure out which file to join with its counterparts in the other datasets.

Here is a similar question from earlier this year:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201505.mbox/%3CCADrn=epwl6ghs9hfyo3csuxhshtycsrlbujcmpxrtz4zype...@mail.gmail.com%3E

From what I can tell, there's no way to tell Spark how a dataset was previously partitioned, so the only way to achieve a map-side join with a similarly partitioned dataset is to repartition it.
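
For reference, here is a minimal sketch of the repartition-then-join approach I'm describing (the HDFS paths, tab-separated key parsing, and partition count of 128 are placeholders, not anything Spark dictates):

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    object CoPartitionedJoin {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("co-partitioned-join").setMaster("local[*]"))

        // Hypothetical inputs: each line is "key<TAB>value".
        val left = sc.textFile("hdfs:///data/left").map { line =>
          val Array(k, v) = line.split("\t", 2); (k, v)
        }
        val right = sc.textFile("hdfs:///data/right").map { line =>
          val Array(k, v) = line.split("\t", 2); (k, v)
        }

        // Spark has no way of knowing the files on disk were already
        // partitioned by key, so both sides get repartitioned with the same
        // partitioner. Once the two RDDs share a partitioner (and are
        // materialized), the join is narrow: no additional shuffle.
        val partitioner = new HashPartitioner(128)
        val leftPart  = left.partitionBy(partitioner).persist()
        val rightPart = right.partitionBy(partitioner).persist()

        val joined = leftPart.join(rightPart)  // co-partitioned, so no extra shuffle
        joined.take(10).foreach(println)

        sc.stop()
      }
    }

The repartitioning itself is still a full shuffle of both datasets, which is exactly the cost CompositeInputFormat avoids by trusting the existing "part" layout on disk.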