Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-23 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2009048465 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [I] Merge operation involving map field fails [datafusion]

2025-03-23 Thread via GitHub
goldmedal commented on issue #15351: URL: https://github.com/apache/datafusion/issues/15351#issuecomment-2746108880 According to the error message: ``` and else (None) to common types in CASE WHEN expression ``` https://github.com/apache/datafusion/blob/d460abb63f6ff3abce6de02283

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-23 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2009052462 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-23 Thread via GitHub
alamb commented on code in PR #15355: URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009077074 ## datafusion/physical-plan/src/spill.rs: ## @@ -223,25 +229,182 @@ impl IPCStreamWriter { } } +/// The `SpillManager` is responsible for the following tasks

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-23 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2741093458 New blog post by @XiangpengHao on parquet predicate evaluation: - https://datafusion.apache.org/blog/2025/03/20/parquet-pruning/ -- This is an automated message from the Apac

Re: [I] Support for user defined FFI functions [datafusion-ballista]

2025-03-23 Thread via GitHub
milenkovicm commented on issue #1215: URL: https://github.com/apache/datafusion-ballista/issues/1215#issuecomment-2746181514 I'm not sure what your idea is, @westhide. UDF support is not a one-size-fits-all solution, so I'm not sure how it fits with the current solution. We spent a lot of time

Re: [D] More thorough contribution guideline [datafusion]

2025-03-23 Thread via GitHub
GitHub user logan-keede edited a discussion: More thorough contribution guideline I am opening this discussion to talk about how to approach refactoring, and perhaps changes in general, to make things easier for downstream repos and make the review process more efficient. This came up while dis

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-03-23 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2009102876 ## src/parser/mod.rs: ## @@ -6928,13 +6874,122 @@ impl<'a> Parser<'a> { }; } +let plain_options = self.parse_plain

[I] Support transport UDF FFI library from client to Scheduler&Executor, and provide interface load FFI into `TaskDefinition` function_registry [datafusion-ballista]

2025-03-23 Thread via GitHub
westhide opened a new issue, #1215: URL: https://github.com/apache/datafusion-ballista/issues/1215 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated w

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-23 Thread via GitHub
zhuqi-lucas commented on PR #15348: URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2746127139 > Thank you @zhuqi-lucas -- this looks pretty sweet. I think we need to sort out nulls and safety comment and this will be good to go Thank you @alamb for review, good sugg

Re: [I] `batches_to_sort_string` differing from similar implementation in `assert_batches_sorted_eq` [datafusion]

2025-03-23 Thread via GitHub
Shreyaskr1409 commented on issue #15312: URL: https://github.com/apache/datafusion/issues/15312#issuecomment-2746162773 I am facing yet another issue, the same one mentioned in the following PR: https://github.com/apache/datafusion/pull/15288#discussion_r2004177854

[PR] Migrate physical plan tests to `insta` (Part-2) [datafusion]

2025-03-23 Thread via GitHub
Shreyaskr1409 opened a new pull request, #15364: URL: https://github.com/apache/datafusion/pull/15364 ## Which issue does this PR close? - Part of #15248. ## Rationale for this change Completely migrate physical plan tests to `insta` ## What changes are included in

[PR] Update the expected error message [datafusion-testing]

2025-03-23 Thread via GitHub
goldmedal opened a new pull request, #8: URL: https://github.com/apache/datafusion-testing/pull/8 Update the expected error message for https://github.com/apache/datafusion/issues/15359

Re: [PR] Support parquet_metadata for datafusion-cli [datafusion]

2025-03-23 Thread via GitHub
adriangb commented on PR #8413: URL: https://github.com/apache/datafusion/pull/8413#issuecomment-2746219812 I'd like to add `column_name` to the output. Would that be okay with folks or are we trying to match DuckDB 1:1?

Re: [PR] Always use `PartitionMode::Auto` in planner [datafusion]

2025-03-23 Thread via GitHub
Dandandan commented on code in PR #15339: URL: https://github.com/apache/datafusion/pull/15339#discussion_r2009124405 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -345,63 +345,68 @@ FROM physical_plan 01)┌───┐ -02)│CoalesceBat

[PR] Use `equals_datatype` to compare type when type coercion [datafusion]

2025-03-23 Thread via GitHub
goldmedal opened a new pull request, #15366: URL: https://github.com/apache/datafusion/pull/15366 ## Which issue does this PR close? - Closes #15351. ## Rationale for this change In the `delta-rs` case, they use a custom field name, `key_value`, for the m

Re: [I] Datafusion Native llm.txt [datafusion]

2025-03-23 Thread via GitHub
prrao87 commented on issue #13501: URL: https://github.com/apache/datafusion/issues/13501#issuecomment-2746267635 Hi, I'm curious about the progress on this. Has there been a standardized way to create/generate the `llms.txt` file during the build process of the docs? Would love to see a co

Re: [PR] Allow setting ClientOptions for all `datafusion.object_store` contexts [datafusion-python]

2025-03-23 Thread via GitHub
nathschmidt commented on code in PR #1083: URL: https://github.com/apache/datafusion-python/pull/1083#discussion_r2009206331 ## src/store.rs: ## @@ -164,6 +166,41 @@ impl PyGoogleCloudContext { } } +#[pyclass(name = "ClientOptions", module = "datafusion.store", subclass

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-03-23 Thread via GitHub
tomershaniii commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2746324591 @iffyio @mvzink thanks for the feedback, did some additional work per your comments (& commented on other) LMK if makes sense

Re: [PR] [Draft] Allow setting ClientOptions for all `datafusion.object_store` contexts [datafusion-python]

2025-03-23 Thread via GitHub
kylebarron commented on PR #1083: URL: https://github.com/apache/datafusion-python/pull/1083#issuecomment-2746392406 This is somewhat related to https://github.com/apache/datafusion-python/issues/899. Up for discussion with maintainers but I would personally argue for adopting my p

Re: [D] More thorough contribution guideline [datafusion]

2025-03-23 Thread via GitHub
GitHub user ozankabak added a comment to the discussion: More thorough contribution guideline > > iii. Collect feedback from downstream projects to reveal any possible > > design issues > > Do we have any communication channel for collecting feedback, or announcing > feature branch? We don'

Re: [PR] Support parquet_metadata for datafusion-cli [datafusion]

2025-03-23 Thread via GitHub
adriangb commented on PR #8413: URL: https://github.com/apache/datafusion/pull/8413#issuecomment-2746528066 I did not know of it, that's great!

Re: [I] Unsupported OS/arch [datafusion-comet]

2025-03-23 Thread via GitHub
jinwenjie123 commented on issue #1552: URL: https://github.com/apache/datafusion-comet/issues/1552#issuecomment-2746542634 @parthchandra With Ubuntu 16.04, it failed to build the dependencies. I also tried to manipulate the zigbuild project, but when it tries to execute `cd n

Re: [PR] fix: `core_expressions` feature flag broken, move `overlay` into `core` functions [datafusion]

2025-03-23 Thread via GitHub
shruti2522 commented on PR #15217: URL: https://github.com/apache/datafusion/pull/15217#issuecomment-2746480965 > > hey @alamb, I have already added a re-export at the end of `datafusion/functions/src/string/overlay.rs` like this > > Thanks @shruti2522 - that looks good to me >

Re: [PR] Add support for DISTINCT + ORDER BY in ARRAY_AGG [datafusion]

2025-03-23 Thread via GitHub
gabotechs commented on PR #14413: URL: https://github.com/apache/datafusion/pull/14413#issuecomment-2746066503 Hi @alamb, any chance that someone takes a look at this one any time soon? I promise that 80% of the code is just new tests 🙏.

[PR] Documentation updates: mention correct dataset on basics page [datafusion-python]

2025-03-23 Thread via GitHub
floscha opened a new pull request, #1081: URL: https://github.com/apache/datafusion-python/pull/1081 Currently, the [basics page](https://datafusion.apache.org/python/user-guide/basics.html) in the Python docs mentions the Pokemon dataset while in fact NYC taxi data is used. This MR

[I] Support Custom Function Registration with Catalog and Schema [datafusion]

2025-03-23 Thread via GitHub
goldmedal opened a new issue, #15363: URL: https://github.com/apache/datafusion/issues/15363 ### Is your feature request related to a problem or challenge? In my case, I would like to extend a default function's supporting signature. For example, the `avg` function in DataFusion doesn

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-23 Thread via GitHub
alamb commented on PR #15263: URL: https://github.com/apache/datafusion/pull/15263#issuecomment-2744247187 > Oh I'm terribly sorry that's probably my bad... I constantly have issues with those submodules and have not yet spent the time to figure out how to avoid it. no worries -- I t

Re: [I] Merge operation involving map field fails [datafusion]

2025-03-23 Thread via GitHub
ion-elgreco commented on issue #15351: URL: https://github.com/apache/datafusion/issues/15351#issuecomment-2746134153 @goldmedal this is the initial projection: ```rust crates/core/src/operations/merge/mod.rs:1270:5] &new_columns = [ ( "__delta_rs_c_foo",

Re: [I] Merge operation involving map field fails [datafusion]

2025-03-23 Thread via GitHub
ion-elgreco commented on issue #15351: URL: https://github.com/apache/datafusion/issues/15351#issuecomment-2746139415 @goldmedal it works fine for all other types except map, it fails here during plan creation: https://github.com/delta-io/delta-rs/blob/30dd5edf9717f6545ffbc73e65b676b8adaa0a

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-23 Thread via GitHub
alamb commented on code in PR #15355: URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009075036 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -379,46 +382,64 @@ impl ExternalSorter { /// How many bytes have been spilled to disk? fn spilled_by

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-23 Thread via GitHub
alamb commented on code in PR #15355: URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009075716 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -230,9 +219,14 @@ struct ExternalSorter { /// if `Self::in_mem_batches` are sorted in_mem_batches_sort

Re: [I] Extended tests failing on main [datafusion]

2025-03-23 Thread via GitHub
goldmedal commented on issue #15359: URL: https://github.com/apache/datafusion/issues/15359#issuecomment-2746154586 I created https://github.com/apache/datafusion-testing/pull/8 for this issue.

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-23 Thread via GitHub
milenkovicm commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2746183571 I won't be able to review this PR for a few days. Will follow up asap.

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-03-23 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2009103158 ## src/ast/helpers/stmt_create_table.rs: ## @@ -78,25 +79,22 @@ pub struct CreateTableBuilder { pub hive_formats: Option, pub table_prop

Re: [I] Merge operation involving map field fails [datafusion]

2025-03-23 Thread via GitHub
goldmedal commented on issue #15351: URL: https://github.com/apache/datafusion/issues/15351#issuecomment-2746211723 > [@goldmedal](https://github.com/goldmedal) it works fine for all other types except map, it fails here during plan creation: https://github.com/delta-io/delta-rs/blob/30dd5e

[I] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-03-23 Thread via GitHub
alamb opened a new issue, #15369: URL: https://github.com/apache/datafusion/issues/15369 This looks very similar to https://docs.rs/datafusion/latest/datafusion/physical_expr/aggregate/struct.AggregateExprBuilder.html#method.distinct Though it seems like that structure h

Re: [PR] Add "end to end parquet reading test" for WASM [datafusion]

2025-03-23 Thread via GitHub
XiangpengHao commented on PR #15362: URL: https://github.com/apache/datafusion/pull/15362#issuecomment-2746395748 LGTM this is really nice, thank you @jsai28

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-03-23 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2009043976 ## src/dialect/mysql.rs: ## @@ -141,6 +149,280 @@ impl Dialect for MySqlDialect { fn supports_set_names(&self) -> bool { true }

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-23 Thread via GitHub
rkrishn7 commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2746563234 @Omega359 Thanks for your thoughts! And yes, I agree that the problem isn't with `normalize::convert_batches`. It is just being surfaced there. > I have a thought as to

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2025-03-23 Thread via GitHub
wForget commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2009361760 ## pom.xml: ## @@ -58,6 +58,7 @@ under the License. 3.25.5 1.13.1 provided +3.3.4 Review Comment: Just for testing, I think we don't nee

Re: [I] Unsupported OS/arch [datafusion-comet]

2025-03-23 Thread via GitHub
jinwenjie123 commented on issue #1552: URL: https://github.com/apache/datafusion-comet/issues/1552#issuecomment-2747053567 I tried to run the benchmark for [TPCH_Q1](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpch/q1.sql) query, but it looks like it does not pr

Re: [I] GLIBC Version Not Compatible [datafusion-comet]

2025-03-23 Thread via GitHub
jinwenjie123 closed issue #1564: GLIBC Version Not Compatible URL: https://github.com/apache/datafusion-comet/issues/1564

Re: [I] Library Guide: Add SQL level user guide: [datafusion]

2025-03-23 Thread via GitHub
qstommyshu commented on issue #7302: URL: https://github.com/apache/datafusion/issues/7302#issuecomment-2746634501 Hi, I can try to add some documentation for SQL level user guide as I'm exploring the project.

[PR] Move `DataSink` to `datasource` and add session crate [datafusion]

2025-03-23 Thread via GitHub
jayzhan-synnada opened a new pull request, #15371: URL: https://github.com/apache/datafusion/pull/15371 ## Which issue does this PR close? - Closes #. ## Rationale for this change `DataSink` in physical plan is blocking us from adding a trait method that depends on `dataso

[I] Allow setting `ClientOptions` for `datafusion.object_store` contexts [datafusion-python]

2025-03-23 Thread via GitHub
nathschmidt opened a new issue, #1082: URL: https://github.com/apache/datafusion-python/issues/1082 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-23 Thread via GitHub
jonahgao merged PR #15183: URL: https://github.com/apache/datafusion/pull/15183

Re: [I] [DISCUSSION] Should DataFusion project use `lib.rs` pattern/antipattern? [datafusion]

2025-03-23 Thread via GitHub
shubhamdagar9854 commented on issue #11375: URL: https://github.com/apache/datafusion/issues/11375#issuecomment-2746811438 Example - Before (Traditional Approach) src/ │── lib.rs <-- Main entry point, contains logic │── models.rs │── utils.rs Example - After (Alternat

Re: [PR] Update datafusion-testing pin to fix extended tests [datafusion]

2025-03-23 Thread via GitHub
jonahgao merged PR #15368: URL: https://github.com/apache/datafusion/pull/15368

Re: [I] Extended tests failing on main [datafusion]

2025-03-23 Thread via GitHub
jonahgao closed issue #15359: Extended tests failing on main URL: https://github.com/apache/datafusion/issues/15359

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-23 Thread via GitHub
jsai28 commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2746726480 take

Re: [PR] fix: add an "expr_planners" method to SessionState [datafusion]

2025-03-23 Thread via GitHub
niebayes commented on code in PR #15119: URL: https://github.com/apache/datafusion/pull/15119#discussion_r2009375028 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1632,7 +1632,7 @@ impl FunctionRegistry for SessionContext { } fn expr_planners(&self) -> Vec>

Re: [I] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-03-23 Thread via GitHub
Shreyaskr1409 commented on issue #15369: URL: https://github.com/apache/datafusion/issues/15369#issuecomment-2746418848 take

Re: [PR] Bigquery col alias [datafusion]

2025-03-23 Thread via GitHub
douenergy closed pull request #15370: Bigquery col alias URL: https://github.com/apache/datafusion/pull/15370

Re: [I] Upgrade to sqlparser 0.55.0 [datafusion]

2025-03-23 Thread via GitHub
jonahgao closed issue #15071: Upgrade to sqlparser 0.55.0 URL: https://github.com/apache/datafusion/issues/15071

Re: [PR] Migrate physical plan tests to `insta` (Part-2) [datafusion]

2025-03-23 Thread via GitHub
alamb commented on code in PR #15364: URL: https://github.com/apache/datafusion/pull/15364#discussion_r2009205442 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -1333,23 +1335,58 @@ mod tests { let exec = RepartitionExec::try_new(Arc::new(input), partition

Re: [PR] fix: add an "expr_planners" method to SessionState [datafusion]

2025-03-23 Thread via GitHub
niebayes commented on code in PR #15119: URL: https://github.com/apache/datafusion/pull/15119#discussion_r2009391285 ## datafusion/core/src/execution/session_state.rs: ## @@ -1950,6 +1955,16 @@ mod tests { use super::{SessionContextProvider, SessionStateBuilder}; use c

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-23 Thread via GitHub
alamb commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r200920 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) -> Result { let result = formatter.value

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-03-23 Thread via GitHub
comphead commented on code in PR #15334: URL: https://github.com/apache/datafusion/pull/15334#discussion_r2009201037 ## datafusion/sqllogictest/test_files/limit.slt: ## @@ -723,14 +723,14 @@ statement ok create table testSubQueryLimit (a int, b int) as values (1,2), (2,3), (3,4

[PR] Allow setting ClientOptions for all `datafusion.object_store` contexts [datafusion-python]

2025-03-23 Thread via GitHub
nathschmidt opened a new pull request, #1083: URL: https://github.com/apache/datafusion-python/pull/1083 # Which issue does this PR close? Closes #1082 # Rationale for this change This change adds a `PyClientOptions` wrapper around `object_store.ClientOptions` so t

[PR] Update datafusion-testing pin to fix extended tests [datafusion]

2025-03-23 Thread via GitHub
alamb opened a new pull request, #15368: URL: https://github.com/apache/datafusion/pull/15368 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/15359 ## Rationale for this change @goldmedal updated the output in https://githu

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-03-23 Thread via GitHub
niebayes commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2746782663 Good to see this rust generator. We have adopted it in our database projection for benchmarking.

Re: [PR] Add `LogicalPlanStats` to logical plan nodes [datafusion]

2025-03-23 Thread via GitHub
github-actions[bot] commented on PR #13618: URL: https://github.com/apache/datafusion/pull/13618#issuecomment-2746689386 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] `batches_to_sort_string` differing from similar implementation in `assert_batches_sorted_eq` [datafusion]

2025-03-23 Thread via GitHub
alamb commented on issue #15312: URL: https://github.com/apache/datafusion/issues/15312#issuecomment-2746394071 TLDR is I think this is "working as expected" (though somewhat confusing) and we can just refer to this ticket when migrating tests to `insta` if the output changes

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-23 Thread via GitHub
Kontinuation commented on PR #15355: URL: https://github.com/apache/datafusion/pull/15355#issuecomment-2746830204 > 3\. After we have collected 1MB of merged batch, one spill will be triggered. And this 1MB space will be cleared, the merging can continue. > **Inefficiency:** Now `Exter

Re: [PR] Support parquet_metadata for datafusion-cli [datafusion]

2025-03-23 Thread via GitHub
alamb commented on PR #8413: URL: https://github.com/apache/datafusion/pull/8413#issuecomment-2746383644 > I'd like to add `column_name` to the output. Would that be okay with folks or are we trying to match DuckDB 1:1? I don't think we need to match DuckDB 1:1 BTW you have pro