github
Messages by Thread
[PR] feat: reuse repeated exchanges in the distributed planner (ReuseExchange analog) [datafusion-ballista]
via GitHub
[PR] build(deps): bump astral-sh/setup-uv from 8.2.0 to 8.3.0 [datafusion-python]
via GitHub
[PR] build(deps): bump github/codeql-action/init from 4.36.2 to 4.36.3 [datafusion-python]
via GitHub
[PR] build(deps): bump github/codeql-action/analyze from 4.36.2 to 4.36.3 [datafusion-python]
via GitHub
Re: [I] Investigate use of `DynamicFilters` in ballista [datafusion-ballista]
via GitHub
[I] Reuse repeated exchanges/subplans in the distributed planner (ReuseExchange/ReuseSubquery analog) [datafusion-ballista]
via GitHub
[I] Spill large shuffle partition fetches to disk to bound decoded in-flight memory [datafusion-ballista]
via GitHub
[PR] feat: bound shuffle fetch with a reduce-side in-flight governor [datafusion-ballista]
via GitHub
[I] Adaptive query planning fails with remote object stores: "No suitable object store found" [datafusion-ballista]
via GitHub
[PR] feat: object-store data cache (memory tier) + cache-affinity scheduling [datafusion-comet]
via GitHub
[PR] feat: partial project fallback (JVM expression detour) [datafusion-comet]
via GitHub
[PR] fix: preserve aggregate scope when unparsing [datafusion]
via GitHub
Re: [I] physical_planner::tests::test_optimization_invariant_checker fails when run with release-nonlto profile [datafusion]
via GitHub
Re: [PR] [datafusion-spark] Support 2-argument ceil(value, scale) [datafusion]
via GitHub
[PR] Issue 1795 surface failed tasks [datafusion-ballista]
via GitHub
Re: [PR] fix(tui + rest) surface failed tasks [datafusion-ballista]
via GitHub
[I] PartitionedTopKRank over-accounts memory when a single batch has boundary ties across many partitions [datafusion]
via GitHub
Re: [PR] Push down topk through join [datafusion]
via GitHub
[PR] fix: bump iceberg-rust version to a rev with `ArrowReader` fix [datafusion-comet]
via GitHub
[PR] feat(scheduler): call try_acquire_job on active cache miss in update_task_statuses [datafusion-ballista]
via GitHub
[PR] fix: cache CLI AWS credentials until expiry [datafusion]
via GitHub
[PR] fix(cli): sort jobs by stage completion ratio [datafusion-ballista]
via GitHub
Re: [PR] fix(cli): sort jobs by stage completion ratio [datafusion-ballista]
via GitHub
Re: [PR] fix(cli): sort jobs by stage completion ratio [datafusion-ballista]
via GitHub
[PR] refactor(hash-aggr): Simplify aggregate hash table with templated functions [datafusion]
via GitHub
Re: [PR] feat: Support Spark expression: percentile_cont [datafusion-comet]
via GitHub
Re: [PR] fix: Add schema validation for native_datafusion Parquet scan [datafusion-comet]
via GitHub
Re: [PR] bench: improve Iceberg TPC workflow and plan capture [datafusion-comet]
via GitHub
Re: [PR] feat: Add native support for max_by and min_by [datafusion-comet]
via GitHub
Re: [PR] feat: Use single spill file for multiple partitions in native shuffle [datafusion-comet]
via GitHub
Re: [PR] fix(substrait): normalize table names from Substrait NamedTable for Calcite interop [datafusion]
via GitHub
Re: [PR] [datafusion-spark] Add Spark-compatible isnan function [datafusion]
via GitHub
Re: [PR] Query-aware statistics requests via ScanArgs / ScanResult (RFC for #21624) [datafusion]
via GitHub
[PR] fix: reuse a small pool of multiplexed connections per peer for shuffle fetches [datafusion-ballista]
via GitHub
Re: [PR] fix: reuse a small pool of multiplexed connections per peer for shuffle fetches [datafusion-ballista]
via GitHub
Re: [PR] fix: reuse a small pool of multiplexed connections per peer for shuffle fetches [datafusion-ballista]
via GitHub
Re: [PR] fix: reuse a small pool of multiplexed connections per peer for shuffle fetches [datafusion-ballista]
via GitHub
Re: [PR] fix: reuse a small pool of multiplexed connections per peer for shuffle fetches [datafusion-ballista]
via GitHub
[PR] fix: gate debug-only assertions in physical planner test test_optimization_invariant_checker [datafusion]
via GitHub
Re: [PR] fix: gate debug-only assertions in physical planner test test_optimization_invariant_checker [datafusion]
via GitHub
Re: [PR] fix: gate debug-only assertions in physical planner test test_optimization_invariant_checker [datafusion]
via GitHub
[PR] fix: hold pooled shuffle connection for the response stream lifetime [datafusion-ballista]
via GitHub
Re: [PR] fix: hold pooled shuffle connection for the response stream lifetime [datafusion-ballista]
via GitHub
Re: [PR] fix: hold pooled shuffle connection for the response stream lifetime [datafusion-ballista]
via GitHub
Re: [I] Coverage jobs are failing on master: `Failed to compile tests! Error: sql_integration: linking with `cc` failed: exit status: 1"` [datafusion]
via GitHub
Re: [I] Coverage jobs are failing on master: `Failed to compile tests! Error: sql_integration: linking with `cc` failed: exit status: 1"` [datafusion]
via GitHub
Re: [I] Coverage jobs are failing on master: `Failed to compile tests! Error: sql_integration: linking with `cc` failed: exit status: 1"` [datafusion]
via GitHub
Re: [I] Coverage jobs are failing on master: `Failed to compile tests! Error: sql_integration: linking with `cc` failed: exit status: 1"` [datafusion]
via GitHub
[I] Performance snapshot: Ballista vs Spark/Comet for TPC-H @ SF100 [datafusion-ballista]
via GitHub
Re: [I] Performance snapshot: Ballista vs Spark/Comet for TPC-H @ SF100 [datafusion-ballista]
via GitHub
Re: [I] Performance snapshot: Ballista vs Spark/Comet for TPC-H @ SF100 [datafusion-ballista]
via GitHub
Re: [I] Performance snapshot: Ballista vs Spark/Comet for TPC-H @ SF100 [datafusion-ballista]
via GitHub
[I] Shuffle fetch exhausts ephemeral ports at high target_partitions (client connection caching off by default) [datafusion-ballista]
via GitHub
Re: [I] Shuffle fetch exhausts ephemeral ports at high target_partitions (client connection caching off by default) [datafusion-ballista]
via GitHub
Re: [I] Shuffle fetch exhausts ephemeral ports at high target_partitions (client connection caching off by default) [datafusion-ballista]
via GitHub
Re: [I] Shuffle fetch exhausts ephemeral ports at high target_partitions (client connection caching off by default) [datafusion-ballista]
via GitHub
Re: [I] Shuffle fetch exhausts ephemeral ports at high target_partitions (client connection caching off by default) [datafusion-ballista]
via GitHub
Re: [I] Materialize Dictionaries in Group Keys [datafusion]
via GitHub
Re: [I] panic on SET datafusion.runtime.* when value ends with non-ASCII byte (split_at char boundary) [datafusion]
via GitHub
[I] chore: Explore combining `SessionConfig` and `RuntimeConfig` into same framework [datafusion]
via GitHub
Re: [I] chore: Explore combining `SessionConfig` and `RuntimeConfig` into same framework [datafusion]
via GitHub
Re: [I] chore: Explore combining `SessionConfig` and `RuntimeConfig` into same framework [datafusion]
via GitHub
[I] Partial project fallback: keep a projection/filter native by evaluating an unsupported subexpression in the JVM [datafusion-comet]
via GitHub
[PR] chore: Use native impl of `soundex` function [datafusion-comet]
via GitHub
[I] CLI utility does not check expiration of AWS credentials [datafusion]
via GitHub
Re: [I] CLI utility does not check expiration of AWS credentials [datafusion]
via GitHub
[PR] feat: support reading TPC-H benchmark data from S3-compatible object storage [datafusion-ballista]
via GitHub
Re: [PR] feat: support running TPC-H benchmarks against S3-compatible object storage [datafusion-ballista]
via GitHub
[PR] refactor(scheduler): keep submit_job non-breaking, add submit_physical_plan [datafusion-ballista]
via GitHub
Re: [PR] refactor(scheduler): keep submit_job non-breaking, add submit_physical_plan [datafusion-ballista]
via GitHub
[I] Restore backward-compatible submit_job signature after #1924 [datafusion-ballista]
via GitHub
Re: [I] Restore backward-compatible submit_job signature after #1924 [datafusion-ballista]
via GitHub
[I] Add a versioned Upgrade Guide to the documentation [datafusion-ballista]
via GitHub
Re: [I] Add a versioned Upgrade Guide to the documentation [datafusion-ballista]
via GitHub
[PR] docs: add versioned upgrade guide (54.0.0) [datafusion-ballista]
via GitHub
Re: [PR] docs: add versioned upgrade guide (54.0.0) [datafusion-ballista]
via GitHub
[I] Add support for `DataFrame.checkpoint()` [datafusion-ballista]
via GitHub
[PR] feat: support decimals in trunc UDF [datafusion]
via GitHub
Re: [I] Add support for `DataFrame.cache()` to Ballista [datafusion-ballista]
via GitHub
Re: [I] Add support for `DataFrame.cache()` to Ballista [datafusion-ballista]
via GitHub
[PR] Add regression tests for hash-join dynamic filter expression policy [datafusion]
via GitHub
Re: [PR] Add regression tests for hash-join dynamic filter expression policy [datafusion]
via GitHub
[PR] Enable mixed approx_count_distinct aggregation [datafusion-comet]
via GitHub
[I] Simplify aggregate table for ordered cases [datafusion]
via GitHub
Re: [PR] build: Enable Spark SQL tests for Spark 4.2 [will not merge until 4.2 is released] [datafusion-comet]
via GitHub
Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]
via GitHub
Re: [PR] [WIP] Explore extensible range partitioning for dynamic filters [datafusion]
via GitHub
[PR] fix: handle null sub-arrays in flatten [datafusion-comet]
via GitHub
Re: [I] [Proposal] Scan I/O acceleration: node-local fragment cache, asynchronous prefetch, and cache-affinity scheduling [datafusion-comet]
via GitHub
Re: [I] [Proposal] Scan I/O acceleration: node-local fragment cache, asynchronous prefetch, and cache-affinity scheduling [datafusion-comet]
via GitHub
Re: [I] [Proposal] Scan I/O acceleration: node-local fragment cache, asynchronous prefetch, and cache-affinity scheduling [datafusion-comet]
via GitHub
[I] SQL Unparser generates incorrect column references [datafusion]
via GitHub
Re: [I] SQL Unparser generates incorrect column references [datafusion]
via GitHub
Re: [I] SQL Unparser generates incorrect column references [datafusion]
via GitHub
Re: [PR] chore: deprecate record_batch macro in favor of upstream one [datafusion]
via GitHub
Re: [PR] chore: deprecate record_batch macro in favor of upstream one [datafusion]
via GitHub
Re: [PR] chore: deprecate record_batch macro in favor of upstream one [datafusion]
via GitHub
Re: [I] Rebalance deep associative binary expression chains (Add, Multiply, bitwise) to avoid protobuf recursion limit [datafusion-comet]
via GitHub
Re: [PR] feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit [datafusion-comet]
via GitHub
[PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
Re: [PR] fix: avoid panic parsing non-ASCII runtime config values [datafusion]
via GitHub
[PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] docs: document ClickBench setup details [datafusion]
via GitHub
Re: [PR] Use upstream arrow-rs record_batch! and create_array! macros [datafusion]
via GitHub
Re: [I] Improvements to `BooleanGroupValueBuilder` (grouping by boolean columns) [datafusion]
via GitHub
[PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [PR] chore: add Cargo http options to handle download errors [datafusion]
via GitHub
Re: [I] Reduce Github Action Usage [datafusion]
via GitHub
Re: [I] Reduce Github Action Usage [datafusion]
via GitHub
[PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: extend pre commit instructions for AI agents [datafusion]
via GitHub
Re: [PR] chore: Skip Code CI for non code changes [datafusion]
via GitHub
[PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] chore: Update to arrow/parquet 59.1.0 [datafusion]
via GitHub
Re: [PR] Add blog post: The Arrow C Data Interface: Zero-Copy Between Rust and the JVM in DataFusion Comet [datafusion-site]
via GitHub
Re: [PR] Add blog post: The Arrow C Data Interface: Zero-Copy Between Rust and the JVM in DataFusion Comet [datafusion-site]
via GitHub
[PR] IN LIST: add Float16 bitmap filter [datafusion]
via GitHub
Re: [PR] IN LIST: add Float16 bitmap filter [datafusion]
via GitHub
[PR] feat: support Spark 4.1 TIME type and expressions via codegen dispatch [datafusion-comet]
via GitHub
[I] [Enhancement] Enable mixed partial/final execution for approx_count_distinct (HyperLogLogPlusPlus) [datafusion-comet]
via GitHub
Re: [I] [Enhancement] Enable mixed partial/final execution for approx_count_distinct (HyperLogLogPlusPlus) [datafusion-comet]
via GitHub
Re: [I] [Enhancement] Enable mixed partial/final execution for approx_count_distinct (HyperLogLogPlusPlus) [datafusion-comet]
via GitHub
[PR] feat: support approx_count_distinct aggregate expression [datafusion-comet]
via GitHub
[PR] feat: support kurtosis aggregate [datafusion-comet]
via GitHub
Re: [PR] feat: Support duration type in approx_distinct [datafusion]
via GitHub
Re: [PR] feat: Support duration type in approx_distinct [datafusion]
via GitHub
Re: [PR] feat: Support duration type in approx_distinct [datafusion]
via GitHub
[PR] feat: support max_by aggregate expression [datafusion-comet]
via GitHub
[PR] feat: support listagg / string_agg aggregate (Spark 4.0+) [datafusion-comet]
via GitHub
[PR] feat: support grouping() and grouping_id() indicator functions [datafusion-comet]
via GitHub
Re: [PR] docs: add custom table provider filter pushdown examples [datafusion]
via GitHub
[I] Support grouping() and grouping_id() indicator functions [datafusion-comet]
via GitHub
Re: [I] Add a configurable cap on spill-file merge fan-in (max open files during external merge) [datafusion]
via GitHub
[I] Distinct-aggregate rewrite can split an incompatible-buffer aggregate across Comet and Spark, causing crash or wrong results [datafusion-comet]
via GitHub
Re: [PR] fix: count returning zero when scan is disabled and going through CometSparkColumnarToColumnar [datafusion-comet]
via GitHub
Re: [PR] fix: count returning zero when scan is disabled and going through CometSparkColumnarToColumnar [datafusion-comet]
via GitHub
Re: [PR] fix: count returning zero when scan is disabled and going through CometSparkColumnarToColumnar [datafusion-comet]
via GitHub
Re: [PR] fix: count returning zero when scan is disabled and going through CometSparkColumnarToColumnar [datafusion-comet]
via GitHub
Re: [PR] fix: count returning zero when scan is disabled and going through CometSparkColumnarToColumnar [datafusion-comet]
via GitHub
[PR] implement map_agg [datafusion]
via GitHub
Re: [PR] implement map_agg [datafusion]
via GitHub
Re: [PR] implement map_agg [datafusion]
via GitHub
Re: [PR] implement map_agg [datafusion]
via GitHub
[PR] fix: preserve precision in log() with mixed float-width arguments [datafusion]
via GitHub
[PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
Re: [PR] refactor(hash-aggr): Simplify aggregate hash table [datafusion]
via GitHub
[PR] fix: include scanHashCode in CometIcebergNativeScan identity to prevent incorrect exchange reuse [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
Re: [PR] fix: prevent wrong results from Iceberg native scan exchange reuse with different pushed filters [datafusion-comet]
via GitHub
[PR] Add NOT IN coverage for integer IN list tests [datafusion]
via GitHub
Re: [PR] feat: Allow submitting a pre-built physical plan to the scheduler [datafusion-ballista]
via GitHub