Here's a class which lets you proved a function on a row by row basis to
declare location
https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala
needs to be in o.a.spark as something you
AFAICT, `FileScanRDD` invokes`FilePartition::preferredLocations()`
method, which is ordered by the data size, to get the partition
preferred locations. If there are other vectors to sort, I'm wondering
if here[1] can be a place to add. Or inheriting class `FilePartition`
with overridden `preferredL