Hi All,

I have a table with 3 TB of data, stored as Parquet with Snappy compression,
with 100 columns. I am filtering the DataFrame on a date column (date between
20190501 and 20190530), selecting only 20 columns, and counting. This
operation takes about 45 minutes!

Shouldn't Parquet apply predicate pushdown and filter without scanning
the entire dataset?
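
To make the question concrete, here is a minimal pure-Python sketch of how I
understand statistics-based row-group skipping to work. It is not Spark's or
Parquet's actual code: the RowGroup class, the row_groups_to_read function,
and the sample values are all hypothetical. The assumption is that a reader
can skip a row group only when the min/max range stored for the filter column
lies entirely outside the predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Hypothetical stand-in for one row group's date-column statistics."""
    min_date: int
    max_date: int
    num_rows: int


def row_groups_to_read(groups: List[RowGroup], lo: int, hi: int) -> List[RowGroup]:
    """Keep only the row groups whose [min_date, max_date] overlaps [lo, hi]."""
    return [g for g in groups if g.max_date >= lo and g.min_date <= hi]


# Case 1 (assumed layout): data clustered by date, so each row group
# covers a narrow date range.
clustered = [
    RowGroup(20190401, 20190430, 1000),
    RowGroup(20190501, 20190515, 1000),
    RowGroup(20190516, 20190530, 1000),
    RowGroup(20190601, 20190630, 1000),
]

# Case 2 (assumed layout): data not clustered by date, so every row group
# spans the full date range and none can be ruled out from statistics alone.
scattered = [RowGroup(20190401, 20190630, 1000) for _ in range(4)]

lo, hi = 20190501, 20190530
clustered_hits = row_groups_to_read(clustered, lo, hi)
scattered_hits = row_groups_to_read(scattered, lo, hi)

print(len(clustered_hits), "of", len(clustered), "row groups read (clustered)")
print(len(scattered_hits), "of", len(scattered), "row groups read (scattered)")
```

If my table looks more like the second case, that might explain the full
scan, and writing the data partitioned or sorted by date would presumably be
needed for the filter to skip anything.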

-- 
Regards,

Rishi Shah
