freemandealer opened a new issue, #17902: URL: https://github.com/apache/doris/issues/17902
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Description

ORC/Parquet files can be very highly compressed: a file on disk may be as small as 3% of its decompressed size, so a 3 GB file can expand to roughly 100 GB during loading. Users may then wonder why loading such a "small" file takes hours. We need to explain the data sizes to them (see the sketch after this issue body for the arithmetic).

### Solution

A load job involves three kinds of data size:

1. The original size that we read as input. It may be compressed and compactly encoded.
2. The size during processing, after each record is decompressed and decoded so that Doris can work with it.
3. The size we write to disk, again compressed and encoded.

We should report all three sizes to the user, both during and after the load, by:

1. enhancing the `show load` / `show stream load` statements;
2. improving the profile facilities;
3. surfacing them anywhere else the user can see them intuitively.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
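To make the size gap concrete, here is a minimal sketch (illustrative only, not part of the proposed Doris change) that reads Parquet footer metadata with pyarrow and compares the on-disk compressed bytes with the decompressed bytes; the path `data.parquet` is a placeholder:

```python
# Minimal sketch: estimate how much a Parquet file expands when decompressed.
# Assumes pyarrow is installed; "data.parquet" is a placeholder path.
import pyarrow.parquet as pq

md = pq.ParquetFile("data.parquet").metadata

compressed = 0
uncompressed = 0
for rg in range(md.num_row_groups):
    row_group = md.row_group(rg)
    for col in range(row_group.num_columns):
        chunk = row_group.column(col)  # metadata of one column chunk
        compressed += chunk.total_compressed_size
        uncompressed += chunk.total_uncompressed_size

# Note: "uncompressed" here still includes Parquet's encodings (dictionary,
# RLE, ...), so the fully decoded in-memory size, i.e. size (2) above,
# can be larger still.
print(f"on-disk (compressed): {compressed / 2**30:.2f} GiB")
print(f"decompressed:         {uncompressed / 2**30:.2f} GiB")
print(f"compression rate:     {compressed / uncompressed:.1%}")
```

At the 3% rate mentioned above, the last line would print roughly 3.0%, which is exactly the 3 GB to 100 GB expansion that users find surprising.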