foxtail463 opened a new issue, #63694:
URL: https://github.com/apache/doris/issues/63694

   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   For HMS external tables, Doris may estimate a table's row count by listing 
its Hive files when the HMS table parameters do not contain a row count and 
`enable_get_row_count_from_file_list` is enabled.
   
   Currently, this row-count estimation path may read Hive partition and file 
metadata without filling Doris' Hive external metadata cache. In a normal query 
planning flow, the scan planning phase needs the same partition and file 
metadata later, so Doris can end up reading the same HMS and file metadata 
twice while planning a single query.
   
   This is inefficient for Hive tables with many partitions or files, 
especially when HMS access is expensive.
   
   Expected behavior:
   
   - Query planning should be able to reuse Hive metadata fetched during 
row-count estimation.
   - Non-query metadata display paths, such as `SHOW TABLE STATUS`, `SHOW 
STATS`, or `information_schema.tables`, should still avoid filling heavy Hive 
metadata caches just for displaying cached row count.
   
   ### Solution
   
   Introduce separate row-count loading modes for the query planning and 
metadata display paths.
   
   - `ExternalTable.getRowCount()` should load row count in a query-planning 
mode that may fill external metadata cache.
   - `ExternalTable.getCachedRowCount()` and display-oriented paths should keep 
the lightweight behavior and avoid filling heavy metadata cache.
   - `HMSExternalTable` should choose between the cached and non-cached Hive 
metadata APIs, depending on the loading mode, when estimating row count from 
the file list.
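
The proposal above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the Doris implementation: `RowCountLoadMode`, `FakeMetastore`, `FileListCache`, `SketchTable`, and the fixed row-size estimate are all hypothetical names and assumptions made up for this example. Only the idea of choosing a cached or non-cached metadata API per caller comes from the issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Doris under these names.
class RowCountModeSketch {

    /** Assumed loading mode: which kind of caller asks for the row count. */
    enum RowCountLoadMode { QUERY_PLANNING, METADATA_DISPLAY }

    /** Stand-in for HMS plus the file system; counts how often it is hit. */
    static class FakeMetastore {
        int listCalls = 0;

        List<Long> listFileSizes(String table) {
            listCalls++;
            return List.of(4_000L, 6_000L); // made-up file sizes in bytes
        }
    }

    /** Stand-in for the Hive external metadata cache. */
    static class FileListCache {
        private final Map<String, List<Long>> entries = new HashMap<>();
        private final FakeMetastore metastore;

        FileListCache(FakeMetastore metastore) {
            this.metastore = metastore;
        }

        // Cached API: loads on a miss and keeps the result for later callers.
        List<Long> getOrLoad(String table) {
            return entries.computeIfAbsent(table, metastore::listFileSizes);
        }

        // Non-cached API: reads through without populating the cache.
        List<Long> loadWithoutCaching(String table) {
            return metastore.listFileSizes(table);
        }
    }

    /** Stand-in for an HMS external table. */
    static class SketchTable {
        private final String name;
        private final FileListCache cache;
        private final long estimatedRowSizeBytes; // assumed fixed for the sketch

        SketchTable(String name, FileListCache cache, long estimatedRowSizeBytes) {
            this.name = name;
            this.cache = cache;
            this.estimatedRowSizeBytes = estimatedRowSizeBytes;
        }

        // Estimate row count as total file size divided by the row size,
        // picking the metadata API according to the loading mode.
        long estimateRowCount(RowCountLoadMode mode) {
            List<Long> sizes = (mode == RowCountLoadMode.QUERY_PLANNING)
                    ? cache.getOrLoad(name)
                    : cache.loadWithoutCaching(name);
            long totalBytes = sizes.stream().mapToLong(Long::longValue).sum();
            return totalBytes / estimatedRowSizeBytes;
        }

        // Scan planning always goes through the cache; returns the file count.
        int planScan() {
            return cache.getOrLoad(name).size();
        }
    }

    public static void main(String[] args) {
        FakeMetastore metastore = new FakeMetastore();
        SketchTable table =
                new SketchTable("db.tbl", new FileListCache(metastore), 100L);

        long rows = table.estimateRowCount(RowCountLoadMode.QUERY_PLANNING);
        table.planScan(); // served from the cache filled by the estimate
        System.out.println("rows=" + rows + ", listings=" + metastore.listCalls);
    }
}
```

In this sketch, an estimate made in `QUERY_PLANNING` mode followed by `planScan()` lists files once, while an estimate made in `METADATA_DISPLAY` mode leaves the cache empty, so a later `planScan()` has to list the files again.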
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

