[ https://issues.apache.org/jira/browse/HIVE-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254633#comment-17254633 ]
Jesus Camacho Rodriguez commented on HIVE-24566:
------------------------------------------------

[~belugabehr], yes, I think this approach could improve performance for such queries. I guess you referred to a 'single multi-threaded processor' to avoid launching any jobs to compute these queries. For tables with a large number of files, computing from metadata, even if jobs are launched, would still be a useful optimization.

> Add Parquet Stats Optimization
> ------------------------------
>
>                 Key: HIVE-24566
>                 URL: https://issues.apache.org/jira/browse/HIVE-24566
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Major
>
> Parquet files store min/max/count data in footer metadata.
> When a query is submitted to a Parquet table, and stats are not available,
> Hive should launch a single multi-threaded processor that simply reads the
> metadata of each Parquet file instead of walking through every single record
> in the table.
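
For illustration only, here is a minimal sketch of the idea in the description: read per-file min/max/count from each footer on a small thread pool and merge the results, without scanning any records. This is not Hive's implementation. The names `FooterStatsSketch`, `FileStats`, and `aggregate` are hypothetical, and each `Callable` stands in for whatever code would actually read one Parquet file's footer.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: merge per-file footer stats in parallel.
class FooterStatsSketch {

    // Assumed shape of the stats one footer yields for a single column.
    static final class FileStats {
        // Identity element: merging it with any stats returns those stats.
        static final FileStats EMPTY = new FileStats(Long.MAX_VALUE, Long.MIN_VALUE, 0L);

        final long min;
        final long max;
        final long rowCount;

        FileStats(long min, long max, long rowCount) {
            this.min = min;
            this.max = max;
            this.rowCount = rowCount;
        }

        // Combine two files' stats: min of mins, max of maxes, sum of counts.
        FileStats merge(FileStats other) {
            return new FileStats(Math.min(min, other.min),
                                 Math.max(max, other.max),
                                 rowCount + other.rowCount);
        }
    }

    // Run every footer read on a fixed-size pool and fold the results together.
    // Each Callable is a placeholder for reading one file's footer metadata.
    static FileStats aggregate(List<Callable<FileStats>> footerReads, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            FileStats total = FileStats.EMPTY;
            for (Future<FileStats> result : pool.invokeAll(footerReads)) {
                total = total.merge(result.get());
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading footers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("footer read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `FooterStatsSketch.aggregate(reads, 4)` with one `Callable` per file would return the table-level min, max, and row count. The merge is associative, so the same fold would also work if the footer reads were spread across launched jobs rather than a single process, which is the variant the comment above suggests for tables with many files.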