alamb commented on issue #11042:
URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2232995071

   > I also found https://github.com/apache/arrow-rs/issues/5828 which might be 
related and/or relevant.
   
   I would expect that the memory usage highlighted in 
https://github.com/apache/arrow-rs/issues/5828 would be directly reduced by 
setting the `data_page_row_limit`. 
   
   > After disabling it I see the memory increasing only marginally for every 
invocation (in the 100-200MB range) while with DICTIONARY_ENABLED true each 
invocation increases the memory usage in multiple GBs (2-3GB) and it seems it 
never gets freed again.
   
   I wonder if this could be related to DataFusion overriding the 
`data_page_row_limit` setting, as described in 
https://github.com/apache/datafusion/issues/11367 (which @wiedld is working on).
   
   I think you can set this option like this:
   
   ```sql
   COPY (
     SELECT col1, timestamp, col10, col12
     FROM my_table
     ORDER BY col1 ASC, timestamp ASC
   )
   TO './output' STORED AS PARQUET PARTITIONED BY (col1)
   OPTIONS (
     compression 'uncompressed', 
     'format.parquet.data_pagesize_limit' 20000
   );
   ```
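   
   As a rough sketch, the related writer settings could also be changed for 
the whole session before running the `COPY`. This assumes the session-level 
option names from the DataFusion configuration documentation 
(`datafusion.execution.parquet.*`); they may differ between versions, and the 
values are placeholders rather than recommendations:
   
   ```sql
   -- Assumed option names; verify against the DataFusion version in use.
   -- Maximum number of rows per data page
   SET datafusion.execution.parquet.data_page_row_count_limit = 20000;
   -- Best-effort maximum data page size, in bytes
   SET datafusion.execution.parquet.data_pagesize_limit = 1048576;
   -- Turn off dictionary encoding to compare memory usage
   SET datafusion.execution.parquet.dictionary_enabled = false;
   ```
   
   Settings applied this way affect every Parquet file written in the 
session, whereas the `OPTIONS (...)` clause applies to a single `COPY`.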


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

