Here's a class which lets you provide a function that declares the preferred
location on a row-by-row basis:
https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala
It needs to be in o.a.spark as something you
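For comparison, the public `SparkContext.makeRDD` overload that takes `Seq[(T, Seq[String])]` covers the simplest per-row case without any code in o.a.spark. It creates one partition per element and records the given hostnames as that partition's location preferences, so it only fits small, driver-side collections. A minimal sketch; the `Shard` type, the paths and the hostnames are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MakeRddWithLocality {
  // Hypothetical record type: a path plus the hosts believed to hold its data.
  final case class Shard(path: String, hosts: Seq[String])

  // makeRDD(Seq[(T, Seq[String])]) creates one partition per element and uses
  // the hostnames as that partition's preferred locations.
  def build(sc: SparkContext, shards: Seq[Shard]): RDD[String] =
    sc.makeRDD(shards.map(s => (s.path, s.hosts)))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("locality-demo"))
    try {
      val rdd = build(sc, Seq(
        Shard("/data/a", Seq("host1")),
        Shard("/data/b", Seq("host2", "host3"))))
      // Print the location preferences the scheduler will see per partition.
      rdd.partitions.foreach { p =>
        println(s"partition ${p.index} -> ${rdd.preferredLocations(p)}")
      }
    } finally sc.stop()
  }
}
```

The hosts are only hints: if no executor is alive on a listed host, the task is scheduled elsewhere.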
AFAICT, `FileScanRDD` invokes the `FilePartition::preferredLocations()`
method, which orders hosts by how much of the partition's data they hold, to get
the partition's preferred locations. If there are other criteria to sort by, I'm
wondering if here[1] could be the place to add them. Or inheriting class `FilePartition`
with overridden `preferredL
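To make the size-based ordering concrete, here is a standalone sketch (no Spark dependency) that ranks hosts by the total bytes of a partition's files they hold. `FileSlice`, the `"localhost"` filter, the default limit of 3 hosts and the name-based tie-break are assumptions of this sketch, not a copy of Spark's `FilePartition` code; another sort criterion would slot into the `sortBy` key.

```scala
object HostRanking {
  // Hypothetical stand-in for a file split: its byte length and the hosts holding it.
  final case class FileSlice(length: Long, hosts: Seq[String])

  // Sum the bytes each host could serve locally, then return the hosts
  // holding the most bytes first, keeping at most `maxHosts` of them.
  def preferredHosts(slices: Seq[FileSlice], maxHosts: Int = 3): Seq[String] = {
    val bytesPerHost: Map[String, Long] =
      slices
        .flatMap(s => s.hosts.filter(_ != "localhost").map(h => h -> s.length))
        .groupBy(_._1)
        .map { case (host, pairs) => host -> pairs.map(_._2).sum }

    bytesPerHost.toSeq
      .sortBy { case (host, bytes) => (-bytes, host) } // most bytes first; name breaks ties
      .take(maxHosts)
      .map(_._1)
  }
}
```

For slices of 100 bytes on `a`,`b` and 300 bytes on `b`,`c`, host `b` (400 bytes) ranks first, then `c`, then `a`.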
Hi Spark developers,
I have created a new format by extending `FileFormat`. I see that `getPreferredLocations`
is available if a new custom RDD is created. Since `FileFormat` is based on `FileScanRDD`,
which uses the `readFile` function to read each partitioned file, is there a way to add
the desired preferred locations?
Appreciate
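At the RDD level, the hook in question is the protected `RDD.getPreferredLocations(split)`. One way to use it without touching `FileScanRDD` is a thin wrapper RDD that reuses a parent's partitions and data but supplies its own hosts per partition. This is a minimal sketch, assuming you can get hold of the RDD (for example via `df.rdd`) rather than plugging into the `FileFormat` read path; `RelocatedRDD` and `locate` are hypothetical names.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: same partitions and data as `parent`, but the
// preferred hosts for each partition come from the caller-supplied `locate`.
// `parent` is kept as an explicit field because `firstParent` is
// protected[spark] and not visible outside the o.a.spark package.
class RelocatedRDD[T: ClassTag](parent: RDD[T], locate: Partition => Seq[String])
    extends RDD[T](parent) { // one-to-one dependency on `parent`

  // Reuse the parent's partitioning unchanged.
  override protected def getPartitions: Array[Partition] = parent.partitions

  // Read each partition exactly as the parent would.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    parent.iterator(split, context)

  // Consulted by the scheduler when placing the task for `split`.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    locate(split)
}
```

As with any preferred location, these are hints: the scheduler waits up to `spark.locality.wait` for a matching executor before falling back to other hosts.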