alamb commented on issue #11367:
URL: https://github.com/apache/datafusion/issues/11367#issuecomment-2234282902

   @wiedld's PR https://github.com/apache/datafusion/pull/11524 has a very nice table of the current state of affairs:
   
   
   
   Here are the places where the current defaults differ:
   
   | setting_name                 | applied        | default, datafusion (ParquetOptions) | default, parquet (ArrowWriterOptions) |
   | ---------------------------- | -------------- | ------------------------------------ | ------------------------------------- |
   | data_page_row_count_limit    | file           |                           usize::MAX |                                20_000 |
   | column_index_truncate_length | file           |                                 None |                              Some(64) |
   | compression                  | column default |                                 zstd |                          uncompressed |
   | dictionary_enabled           | column default |                       None or true † |                                  true |
   | statistics_enabled           | column default |                       None or page † |                                  page |
   | max_statistics_size          | column default |                       None or 4096 † |                                  4096 |
   
   † For these settings, datafusion has no default (None). However, once datafusion's ParquetOptions are consumed by the external parquet crate (i.e. converted to parquet's ArrowWriterOptions), the parquet crate's defaults apply. Refer to the newly added tests.
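
The † fallback can be sketched in plain Rust. This is only an illustration, not datafusion's or parquet's code: the struct and function names below are hypothetical, and the fallback values are the ones listed in the table above.

```rust
/// Hypothetical stand-in for the three column-default settings marked with †.
/// `None` means "datafusion sets nothing for this option".
#[derive(Debug, Default, Clone, PartialEq)]
struct DatafusionColumnDefaults {
    dictionary_enabled: Option<bool>,
    statistics_enabled: Option<String>,
    max_statistics_size: Option<usize>,
}

/// Hypothetical view of the values the writer ends up using.
#[derive(Debug, PartialEq)]
struct EffectiveColumnDefaults {
    dictionary_enabled: bool,
    statistics_enabled: String,
    max_statistics_size: usize,
}

/// Fill every unset option with the parquet-side default from the table.
fn resolve(opts: &DatafusionColumnDefaults) -> EffectiveColumnDefaults {
    EffectiveColumnDefaults {
        dictionary_enabled: opts.dictionary_enabled.unwrap_or(true),
        statistics_enabled: opts
            .statistics_enabled
            .clone()
            .unwrap_or_else(|| "page".to_string()),
        max_statistics_size: opts.max_statistics_size.unwrap_or(4096),
    }
}
```

With every field left as `None`, `resolve` yields `true`, `"page"`, and `4096`, which is why the table shows "None or ..." for these rows. An explicitly set value always wins over the fallback.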
   
   
   Additionally, the bloom filter configuration differs when it is only partially defined (i.e. some settings are left at their defaults and others are set explicitly):
   
   | bloom_filter_on | fpp      | ndv      | if built from datafusion's ParquetOptions     | if built from (arrow-rs) parquet's WriterPropertiesBuilder |
   | --------------- | -------- | -------- | --------------------------------------------- | ---------------------------------------------------------- |
   | false           | none     | none     | None                                          | None                                                       |
   | **TRUE**        | none     | none     | Some(BloomFilterProperties::default)          | None                                                       |
   | true            | **SOME** | none     | Some(BloomFilterProperties: fpp,default_ndv)  | None                                                       |
   | true            | none     | **SOME** | Some(BloomFilterProperties: ndv,default_fpp)  | None                                                       |
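
The datafusion column of the bloom filter table can be modeled as a small function. This is a minimal sketch under stated assumptions, not the actual conversion code: `BloomFilterProperties` here is a local stand-in type, `from_datafusion_options` is a hypothetical name, and the two default constants are assumed placeholder values that do not come from the comment.

```rust
/// Assumed placeholder defaults (not stated in the comment above).
const DEFAULT_FPP: f64 = 0.05;
const DEFAULT_NDV: u64 = 1_000_000;

/// Local stand-in for the bloom filter properties named in the table.
#[derive(Debug, Clone, PartialEq)]
struct BloomFilterProperties {
    fpp: f64,
    ndv: u64,
}

impl Default for BloomFilterProperties {
    fn default() -> Self {
        Self { fpp: DEFAULT_FPP, ndv: DEFAULT_NDV }
    }
}

/// Models the "built from datafusion's ParquetOptions" column.
fn from_datafusion_options(
    bloom_filter_on: bool,
    fpp: Option<f64>,
    ndv: Option<u64>,
) -> Option<BloomFilterProperties> {
    if !bloom_filter_on {
        // The table only lists the disabled case with fpp and ndv unset;
        // returning None for every disabled case is an assumption.
        return None;
    }
    // When enabled, any unset value falls back to its default.
    Some(BloomFilterProperties {
        fpp: fpp.unwrap_or(DEFAULT_FPP),
        ndv: ndv.unwrap_or(DEFAULT_NDV),
    })
}
```

For the same four input rows, the table reports `None` in the WriterPropertiesBuilder column, so that side needs no modeling here. The discrepancy is that enabling the filter through datafusion always produces a fully populated `BloomFilterProperties`.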
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

