My Parquet files are partitioned first by environment and then by date, like:
env=testing/
  date=2018-03-04/
    part1.parquet
    part2.parquet
    part3.parquet
  date=2018-03-05/
    part1.parquet
    part2.parquet
    part3.parquet
  date=2018-03-06/
    part1.parquet
    part2.parquet
    part3.parquet

In our read stream, I do the following:

val tunerParquetDF = spark
  .readStream
  .schema(...)
  .format("parquet")
  .option("basePath", basePath)
  .option("path", basePath + "/env*")
  .option("maxFilesPerTrigger", 5)
  .load()

The expected behavior is that the read stream picks up the files in date order, but the observed behavior is that the files are processed in a seemingly random order. How do I force the Parquet files to be read in date order?
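
For reference, this is roughly how I am observing the shuffled order (a minimal sketch only; the source_file column name and the console sink are just for this check, not part of the real job):

import org.apache.spark.sql.functions.input_file_name

// Tag each row with the file it came from and watch the console sink;
// the file names show up shuffled rather than in date order.
val debugQuery = tunerParquetDF
  .withColumn("source_file", input_file_name())
  .select("source_file")
  .writeStream
  .format("console")
  .option("truncate", "false")
  .start()

debugQuery.awaitTermination()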