Re: preferredlocations for hadoopfsrelations based baseRelations

2020-06-29 Thread Steve Loughran

Here's a class which lets you provide a function, on a row-by-row basis, to declare location: https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala It needs to be in o.a.spark as something you

Re: preferredlocations for hadoopfsrelations based baseRelations

2020-06-04 Thread ZHANG Wei

AFAICT, `FileScanRDD` invokes the `FilePartition::preferredLocations()` method, which orders hosts by data size, to get a partition's preferred locations. If there are other criteria to sort by, I'm wondering whether here[1] could be the place to add them. Or inheriting class `FilePartition` with overridden `preferredL
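
To make the "ordered by the data size" idea concrete, here is a minimal standalone sketch of ranking hosts by the number of bytes they hold for a partition. It is not Spark's code: `FileBlock`, `PreferredHosts`, `rank` and the default `limit` of 3 are hypothetical names and values chosen for illustration, and the tie-break on host name is an assumption added only to keep the result deterministic.

```scala
// Illustrative sketch only; this is NOT Spark's implementation.
// FileBlock is a hypothetical stand-in for one file split of a partition.
final case class FileBlock(path: String, length: Long, hosts: Seq[String])

object PreferredHosts {
  /** Rank hosts by the total bytes they hold across `blocks`,
    * largest first, and keep at most `limit` of them. */
  def rank(blocks: Seq[FileBlock], limit: Int = 3): Seq[String] = {
    // Sum the block lengths per host.
    val bytesPerHost: Map[String, Long] = blocks
      .flatMap(b => b.hosts.map(h => h -> b.length))
      .groupBy(_._1)
      .map { case (host, pairs) => host -> pairs.map(_._2).sum }

    // Sort by descending byte count; break ties by host name (an assumption).
    bytesPerHost.toSeq
      .sortBy { case (host, bytes) => (-bytes, host) }
      .take(limit)
      .map(_._1)
  }
}
```

Any additional ordering criterion would, in a sketch like this, become another component of the sort key; whether the real `FilePartition` is the right place for that is the open question in the thread.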