Hi All,

I have a 3 TB table with about 100 columns, stored as Parquet with Snappy compression. I am filtering the DataFrame on a date column (date between 20190501 and 20190530), selecting only 20 of the columns, and counting the rows. This operation takes about 45 minutes!
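Roughly, the query looks like the sketch below (PySpark; the path and column names are placeholders for the real ones, and I'm assuming the date column is stored as a yyyyMMdd value):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read the Parquet table (placeholder path)
    df = spark.read.parquet("s3://bucket/path/to/table")

    # Filter on the date column, keep ~20 of the ~100 columns, and count
    count = (
        df.filter(F.col("date").between(20190501, 20190530))
          .select("date", "col_a", "col_b")  # ... roughly 20 columns in total
          .count()
    )
    print(count)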
Shouldn't Parquet handle the predicate pushdown and filtering without scanning the entire dataset?

Regards,
Rishi Shah