[
https://issues.apache.org/jira/browse/IMPALA-12162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950142#comment-17950142
]
Michael Smith edited comment on IMPALA-12162 at 5/7/25 11:35 PM:
-----------------------------------------------------------------
Oh, LOAD DATA also relies on UpdateCatalog to load metadata, which would also
need to compute checksums for each file. That makes me lean towards
parallelizing the checksum requests.
We also probably don't need to hold the table lock during
prepareInsertEventData.
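
A rough sketch of what parallelizing the per-file checksum requests could look like. This is not Impala's code: the class `ParallelChecksums`, the method `fetchAll`, and the `checksumFn` parameter are all hypothetical names, and `checksumFn` stands in for whatever call actually fetches a checksum from HDFS. The idea is to submit one task per file to a bounded thread pool so the slow remote calls overlap, and to do so before taking any table lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
public class ParallelChecksums {

  // Computes a checksum for every path using up to numThreads workers.
  // checksumFn is an assumed stand-in for the (slow) per-file HDFS request.
  // The returned map preserves the order of the input list.
  public static Map<String, String> fetchAll(List<String> paths,
      Function<String, String> checksumFn, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit all requests first so they can run concurrently.
      List<Future<String>> futures = new ArrayList<>();
      for (String path : paths) {
        Callable<String> task = () -> checksumFn.apply(path);
        futures.add(pool.submit(task));
      }
      // Collect results in input order; get() blocks until each task is done
      // and rethrows a task failure as ExecutionException.
      Map<String, String> result = new LinkedHashMap<>();
      for (int i = 0; i < paths.size(); i++) {
        result.put(paths.get(i), futures.get(i).get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

A caller would run `fetchAll` first, without holding the table lock, and only then take the lock for the short step that actually needs it. A real change would more likely reuse a shared pool than create one per call; the per-call pool here just keeps the sketch self-contained.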
was (Author: JIRAUSER288956):
Oh, LOAD DATA also relies on UpdateCatalog to load metadata. Which would also
need to compute checksums for each file. That makes me lean towards
parallelizing the requests
> makeInsertEventData() can be slow in fetching file checksums
> ------------------------------------------------------------
>
> Key: IMPALA-12162
> URL: https://issues.apache.org/jira/browse/IMPALA-12162
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Csaba Ringhofer
> Assignee: Quanlong Huang
> Priority: Critical
>
> Saw some INSERTs where most of the time was spent in
> https://github.com/apache/impala/blob/dc63ae514a445e3f197cab405b01a30c58015695/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L7011
> This was surprising, as I assumed that most of the time in
> updateCatalog()/createInsertEvents() is spent in HMS RPCs, but in the jstacks
> I saw it was mainly in calls to HDFS to compute file checksums.