wgtmac commented on PR #3556:
URL: https://github.com/apache/parquet-java/pull/3556#issuecomment-4609512333

   Thanks for looking into this! 
   
   While the problem is real, introducing a boolean flag
`dictionaryEarlyCheckEnabled` feels like a band-aid that pushes the burden
onto users. Most users won't know when they need to toggle it manually to
prevent storage inflation.
   
   Instead of a new config, could we make the heuristic itself more adaptive?
For example, we could delay the compression check until we've accumulated a
certain amount of raw data (e.g., `1MB`), or evaluate it over the first `N`
pages rather than only the first one.
   
   This would solve the issue out of the box without hurting usability. 
Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
