[ https://issues.apache.org/jira/browse/HUDI-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-3834:
----------------------------------
    Description: 
Previously, while evaluating Data Skipping runtime in an EMR setting, we measured that reading ~800k records of the Column Stats Index takes about {*}60s{*}, which is *very slow* (~75µs / record).

Given that the total size of the log files involved is only ~60MB, there appear to be performance bottlenecks that we should investigate before the 0.11 release.

> Evaluate MT Column Stats Performance
> ------------------------------------
>
>                 Key: HUDI-3834
>                 URL: https://issues.apache.org/jira/browse/HUDI-3834
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
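The per-record cost and read throughput implied by the measurement can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that uses only the approximate figures quoted in the description (~800k records, ~60s, ~60 of log files). It assumes "MB" means decimal megabytes (10^6 bytes) and does not call into Hudi at all. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the Column Stats Index read measurement.
# Inputs are the approximate figures from the ticket; treating MB as
# 10**6 bytes is an assumption. The helper names are hypothetical.
RECORDS = 800_000
ELAPSED_S = 60.0
LOG_BYTES = 60 * 10**6


def per_record_us(records: int, elapsed_s: float) -> float:
    """Average wall-clock time spent per record, in microseconds."""
    return elapsed_s / records * 1e6


def throughput_mb_per_s(num_bytes: int, elapsed_s: float) -> float:
    """Average read throughput in decimal megabytes per second."""
    return num_bytes / 10**6 / elapsed_s


if __name__ == "__main__":
    print(f"{per_record_us(RECORDS, ELAPSED_S):.0f} us/record")
    print(f"{throughput_mb_per_s(LOG_BYTES, ELAPSED_S):.1f} MB/s")
    print(f"{RECORDS / ELAPSED_S:,.0f} records/s")
```

With these inputs the sketch works out to roughly 75µs per record, about 13k records/s and about 1MB/s of log data. Because the inputs are themselves approximate, these are order-of-magnitude estimates only.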