corwinjoy commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3059890834
This pull request introduces several updates across multiple components of the codebase, focusing on dependency management, feature enhancements, and code cleanup. The most significant changes include switching Arrow-related dependencies to a custom fork, deprecating the use of `max_statistics_size` in Parquet options, and extending support for additional decimal data types.

### Dependency Updates:

* Updated Arrow-related dependencies (`arrow`, `arrow-buffer`, `arrow-flight`, etc.) in `Cargo.toml` to use a custom fork hosted on GitHub (`https://github.com/rok/arrow-rs.git`) with the `multi-threaded_encrypted_writing` branch. This change enables multi-threaded encrypted writing features. [[1]](diffhunk://#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542L92-R106) [[2]](diffhunk://#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542L158-R159)

### Feature Enhancements:

* Added support for `Decimal32` and `Decimal64` data types in various parts of the codebase, including scalar value creation (`datafusion/common/src/scalar/mod.rs`), native type conversion (`datafusion/common/src/types/native.rs`), and hashing utilities (`datafusion/expr/src/utils.rs`). [[1]](diffhunk://#diff-49e275af8f09685c7bbc491db8ab3b9479960878f42ac558ec0e3e39570590bdL2137-R2139) [[2]](diffhunk://#diff-a066a5e9aa2819edbe027fab69f894a4ed6d7b29bb31d4f728d9a4c05961a12eL410-R413) [[3]](diffhunk://#diff-6ecfe2ad8756d38c607dc31a1972574157061e1121d2dbf73115aab4958489dcR819-R820)
* Modified the Parquet writer to allow single-file parallelism explicitly in tests and serialization tasks, while introducing a new `ArrowRowGroupWriterFactory` for parallel row group writing. [[1]](diffhunk://#diff-e0c626f9a057537911f8da6a300790aa37a424b344f69de824f02c2a2a166ebcR280) [[2]](diffhunk://#diff-a8919cf6209fb777550056cdd7decca3e6ed94370a2821a9395763fdd6271967L1461-R1459) [[3]](diffhunk://#diff-a8919cf6209fb777550056cdd7decca3e6ed94370a2821a9395763fdd6271967R1711-L1747)

### Code Cleanup:

* Deprecated the use of `max_statistics_size` in Parquet writer options and removed related code, including references to the deprecated constant and associated logic. [[1]](diffhunk://#diff-31437479de022c958ed226271029ecc34d8b4048b0afb32553118a15cb151cc4L38-R40) [[2]](diffhunk://#diff-31437479de022c958ed226271029ecc34d8b4048b0afb32553118a15cb151cc4L170-R178) [[3]](diffhunk://#diff-31437479de022c958ed226271029ecc34d8b4048b0afb32553118a15cb151cc4L271-R278)
* Removed unused example files (`flight_sql_server`, `flight_server`, `flight_client`) from `datafusion-examples/Cargo.toml`.

### Protobuf Updates:

* Extended the protobuf conversion logic to handle `Decimal32` and `Decimal64` types in `datafusion/proto-common/src/to_proto/mod.rs`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org