Hi,
Why does Hive still read so many records even with filter pushdown enabled, when the returned dataset is very small (about 4k out of 30 billion records)? Hive's RECORDS_IN counter still shows the full 30 billion count, and the MapReduce log shows lines like this:

org.apache.hadoop.hive.ql.exec.MapOperator: MAP[4]: records read - 100000

BTW, I am using Parquet as the storage format, and the filter pushdown does seem to be working, since I see this in the log:

AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(myid, 223)

Thanks,
Keith
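
P.S. In case it helps, here is a minimal sketch of what I mean by "filter pushdown enabled" (the table name is just a placeholder, and these are the settings I believe are the relevant ones):

    -- predicate pushdown settings
    SET hive.optimize.ppd=true;
    SET hive.optimize.index.filter=true;

    -- the kind of query in question; my_parquet_table stands in for my real Parquet table
    SELECT count(*) FROM my_parquet_table WHERE myid = 223;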