Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
aharpervc commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2067492348 ## src/dialect/mssql.rs: ## @@ -215,6 +218,78 @@ impl MsSqlDialect { })) } +/// Parse a SQL CREATE statement +fn parse_create

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2067961343 ## src/ast/helpers/stmt_create_table.rs: ## @@ -76,27 +78,20 @@ pub struct CreateTableBuilder { pub constraints: Vec, pub hive_distribut

Re: [I] Main is broken [datafusion]

2025-04-29 Thread via GitHub
gstvg commented on issue #15896: URL: https://github.com/apache/datafusion/issues/15896#issuecomment-2840980768 After https://github.com/apache/datafusion/pull/15149, flatten stopped working for `List(FixedSizeList)` because it expected the inner fixed size list to be casted to list, which
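As a std-only model of the behavior this comment expects (not DataFusion's Arrow-based implementation), each row of a `List(FixedSizeList(N))` column can be pictured as a list of fixed-size arrays. `flatten` then concatenates those arrays in order into one variable-length list per row. Nulls are left out of this sketch for brevity.

```rust
/// Flatten one row of a `List(FixedSizeList(N))`-shaped value: the outer
/// list's elements are fixed-size arrays, and flattening concatenates them
/// in order into a single variable-length list.
fn flatten_row<const N: usize>(row: &[[i32; N]]) -> Vec<i32> {
    row.iter().flat_map(|inner| inner.iter().copied()).collect()
}

/// Apply `flatten_row` to every row of a column (one outer list per row).
fn flatten_column<const N: usize>(column: &[Vec<[i32; N]>]) -> Vec<Vec<i32>> {
    column.iter().map(|row| flatten_row(row.as_slice())).collect()
}
```

For example, a column with rows `[[1, 2], [3, 4]]`, `[]` and `[[5, 6]]` flattens to `[1, 2, 3, 4]`, `[]` and `[5, 6]`.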

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #14687: URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2840392253 > LGTM. I think I'd like to see a test with multiple columns but the logic looks solid to me. I believe the use of unsafe is indeed ok given the conditions outlined. Thanks agai

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-29 Thread via GitHub
alamb commented on code in PR #14687: URL: https://github.com/apache/datafusion/pull/14687#discussion_r2067521368 ## datafusion/sqllogictest/test_files/union_function.slt: ## @@ -45,3 +49,19 @@ select union_extract(union_column, 1) from union_table; query error DataFusion err

[PR] chore: Prepare 0.8.1 release [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove opened a new pull request, #1699: URL: https://github.com/apache/datafusion-comet/pull/1699 ## Which issue does this PR close? N/A ## Rationale for this change We want to create a 0.8.1 release with a fix needed by Iceberg. ## What changes

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-04-29 Thread via GitHub
shruti2522 commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2840577253 Got it @alamb @goldmedal, will test and share results soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-29 Thread via GitHub
vbarua commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2067003785 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr, da

Re: [I] Set up Comet + Iceberg integration tests in CI [datafusion-comet]

2025-04-29 Thread via GitHub
hsiang-c commented on issue #1685: URL: https://github.com/apache/datafusion-comet/issues/1685#issuecomment-2840445298 take

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15168: URL: https://github.com/apache/datafusion/pull/15168#issuecomment-2840589691 This looks great to me -- I plan to merge it tomorrow and start collecting next steps in a new `EPIC` ticket unless someone beats me to it

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2067857586 ## src/parser/mod.rs: ## @@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
iffyio commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2067861834 ## src/dialect/mssql.rs: ## @@ -215,6 +218,78 @@ impl MsSqlDialect { })) } +/// Parse a SQL CREATE statement +fn parse_create(&s

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2067493757 ## src/ast/mod.rs: ## @@ -4225,11 +4267,10 @@ impl fmt::Display for Statement { Statement::Fetch { name,

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2067864705 ## src/ast/helpers/stmt_create_table.rs: ## @@ -76,27 +78,20 @@ pub struct CreateTableBuilder { pub constraints: Vec, pub hive_distribution: H

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-29 Thread via GitHub
hsiang-c commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2067903455 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is s

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
iffyio commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2067910538 ## src/test_utils.rs: ## @@ -166,6 +168,30 @@ impl TestedDialects { only_statement } +/// The same as [`one_statement_parses_to`] bu

Re: [I] Question: why is the Visitor trait limited to statements, relations & expressions? [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
ramnes commented on issue #934: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/934#issuecomment-2840901054 > Two options for a generalised Visitor trait come to mind: > > 1. expose pre + post trait method variants for every AST node type, or > 2. expose only two trai

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-29 Thread via GitHub
EmeraldShift commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2840412850 > They are currently exploring the possibility of using it alongside projections (a feature in ClickHouse akin to materialized views) to create secondary indexes and simila

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-29 Thread via GitHub
discord9 commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2067801065 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr,

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-29 Thread via GitHub
vbarua commented on code in PR #15794: URL: https://github.com/apache/datafusion/pull/15794#discussion_r2067574175 ## datafusion/substrait/src/logical_plan/consumer/mod.rs: ## @@ -0,0 +1,30 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-29 Thread via GitHub
vbarua commented on code in PR #15794: URL: https://github.com/apache/datafusion/pull/15794#discussion_r2067610862 ## datafusion/substrait/src/logical_plan/consumer/rex/extended_expression.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

Re: [I] Question: why is the Visitor trait limited to statements, relations & expressions? [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
freshtonic commented on issue #934: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/934#issuecomment-2840551720 > I can open a PR for this commit, but is this the direction we want to go in? I thought folks here wanted option 2 (which I didn't have the time to work on so far.

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-29 Thread via GitHub
codecov-commenter commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2840556859 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1699?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-29 Thread via GitHub
alamb commented on code in PR #14687: URL: https://github.com/apache/datafusion/pull/14687#discussion_r2067520961 ## datafusion/sqllogictest/test_files/union_function.slt: ## @@ -23,7 +26,8 @@ query ?I select union_column, union_extract(union_column, 'int') from union_table; -

Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-04-29 Thread via GitHub
alamb commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2840601952 It is a good idea -- another potential issue is that it would effectively "tax" other features in the sense that writing tests for unrelated features might trigger a substrait bug

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-04-29 Thread via GitHub
joroKr21 commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2840909664 Looks like a semantic merge conflict with #15160

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-29 Thread via GitHub
hsiang-c commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2067930692 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is s

[I] Register schema table, failed to resolve schema [datafusion]

2025-04-29 Thread via GitHub
shencangsheng opened a new issue, #15897: URL: https://github.com/apache/datafusion/issues/15897 ### Describe the bug I registered a table named report.user using ctx.register_csv, but encountered a "failed to resolve schema: report" error in ctx.sql. ```rust pub async fn re
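The error text suggests that the dotted name was read as a schema-qualified reference (schema `report`, table `user`) rather than as one table name; that reading is an assumption based on the message, not confirmed in this snippet. The sketch below is a simplified, hypothetical model of that splitting rule, including the usual SQL convention that a double-quoted part may contain a dot. It is not DataFusion's parser: it handles no escaped quotes, does no case normalization, and ignores any catalog part.

```rust
/// Hypothetical helper: split a possibly dotted table name into
/// (schema, table). Unquoted dots separate parts; a double-quoted part may
/// contain dots. Any catalog part (a third component) is ignored here.
fn split_table_name(name: &str) -> (Option<String>, String) {
    let mut parts: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in name.chars() {
        match c {
            // Toggle quoting; the quote characters themselves are dropped.
            '"' => in_quotes = !in_quotes,
            // An unquoted dot ends the current part.
            '.' if !in_quotes => parts.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    parts.push(current);
    let table = parts.pop().expect("at least one part");
    let schema = parts.pop(); // None when the name has no unquoted dot
    (schema, table)
}
```

Under this model `report.user` names table `user` in a schema `report`, which would have to exist, while `"report.user"` is a single table name.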

[PR] Fix `flatten` scalar function when inner list is `FixedSizeList` [datafusion]

2025-04-29 Thread via GitHub
gstvg opened a new pull request, #15898: URL: https://github.com/apache/datafusion/pull/15898 ## Which issue does this PR close? ## Rationale for this change After #15149, `flatten` stopped working for `List(FixedSizeList)` because it expected the inner fixed size list to be ca

Re: [PR] Keeping pull request in sync with the base branch [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15894: URL: https://github.com/apache/datafusion/pull/15894#issuecomment-2840606083 Thanks #15603

Re: [PR] Keeping pull request in sync with the base branch [datafusion]

2025-04-29 Thread via GitHub
alamb merged PR #15894: URL: https://github.com/apache/datafusion/pull/15894

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15876: URL: https://github.com/apache/datafusion/pull/15876#issuecomment-2840609788 FYI @jonahgao -- would you possibly have time to review this PR?

Re: [PR] doc: Update known users [datafusion]

2025-04-29 Thread via GitHub
alamb commented on code in PR #15895: URL: https://github.com/apache/datafusion/pull/15895#discussion_r2067724398 ## docs/source/user-guide/introduction.md: ## @@ -120,11 +120,11 @@ Here are some active projects using DataFusion: - [Polygon.io](https://polygon.io/) Stock Market

Re: [I] Keeping pull request in sync with the base branch [datafusion]

2025-04-29 Thread via GitHub
alamb closed issue #15877: Keeping pull request in sync with the base branch URL: https://github.com/apache/datafusion/issues/15877

Re: [PR] Implement intermeidate result blocked approach sketch [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2840614135 This is very much on my list to review, but I am backed up and likely won't have a chance for another day or two

Re: [PR] chore: Return NativeType instead of DataType for get_example_types [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] closed pull request #14778: chore: Return NativeType instead of DataType for get_example_types URL: https://github.com/apache/datafusion/pull/14778

Re: [PR] chore : migrated all the UDFS to invoke_with_args [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] closed pull request #14779: chore : migrated all the UDFS to invoke_with_args URL: https://github.com/apache/datafusion/pull/14779

Re: [PR] [wip] attach diagnostic to duplicate table name error [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] commented on PR #14767: URL: https://github.com/apache/datafusion/pull/14767#issuecomment-2840626137 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Make Expr::alias and alias_qualified smarter by calling unalias [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] commented on PR #14749: URL: https://github.com/apache/datafusion/pull/14749#issuecomment-2840626173 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Simple Functions Preview [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2840626218 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Draft: LogicalScalar [datafusion]

2025-04-29 Thread via GitHub
github-actions[bot] closed pull request #14609: Draft: LogicalScalar URL: https://github.com/apache/datafusion/pull/14609

Re: [PR] Standardize CREATE TABLE options equals signs [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
github-actions[bot] commented on PR #1751: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1751#issuecomment-2840628297 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2067496065 ## tests/sqlparser_mssql.rs: ## @@ -1393,6 +1394,85 @@ fn parse_mssql_declare() { let _ = ms().verified_stmt(declare_cursor_for_select); } +#

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-29 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2840347205 @iffyio anything else on this one?

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2840387613 It appears the tests started failing on main after this PR was merged: - https://github.com/apache/datafusion/actions/runs/14740728702/job/41378119017

[I] Main is broken [datafusion]

2025-04-29 Thread via GitHub
xudong963 opened a new issue, #15896: URL: https://github.com/apache/datafusion/issues/15896 https://github.com/user-attachments/assets/777fe494-87e5-4698-835b-180bc793dff6

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-04-29 Thread via GitHub
suibianwanwank commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2837737552 > Unify the optimizor for correlated query, regardless the query type (exists query, scalar query etc) I think a crucial starting point is to transform all correlated

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-29 Thread via GitHub
berkaysynnada commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2837754661 > @berkaysynnada do you think you'll be able to review soon? I know we wanted to get this in earlier in the 48 cycle to shake out any bugs since it is a big change Sorry

Re: [PR] Feat: introduce `ExecutionPlan::partition_statistics` API [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2065927159 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -941,49 +994,15 @@ impl ExecutionPlan for AggregateExec { } fn statistics(&self) -> Result {

Re: [PR] Set HashJoin seed [datafusion]

2025-04-29 Thread via GitHub
alamb commented on PR #15783: URL: https://github.com/apache/datafusion/pull/15783#issuecomment-2838662506 Thanks again @ctsk -- sorry for the delay in review / merge

Re: [PR] Set HashJoin seed [datafusion]

2025-04-29 Thread via GitHub
alamb merged PR #15783: URL: https://github.com/apache/datafusion/pull/15783

Re: [I] Questionable hash seed reuse between RepartitionExec and HashJoinExec [datafusion]

2025-04-29 Thread via GitHub
alamb closed issue #15620: Questionable hash seed reuse between RepartitionExec and HashJoinExec URL: https://github.com/apache/datafusion/issues/15620
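The concern in this issue comes down to modular arithmetic. If repartitioning sends a row to partition `h(k) % P` and the join's hash table inside that partition uses the same `h`, every key in partition `p` satisfies `h(k) % P == p`. With `B` buckets (`B` a multiple of `P`), at most `B / P` buckets can then be occupied. The std-only sketch below illustrates this; `seeded_hash` (a splitmix64-style mixer) and `buckets_used` are hypothetical stand-ins, not DataFusion's hashing code.

```rust
/// Hypothetical seeded 64-bit hash: the splitmix64 finalizer applied to
/// `key ^ seed`. Stands in for the hash shared by repartitioning and the join.
fn seeded_hash(key: u64, seed: u64) -> u64 {
    let mut z = (key ^ seed).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Count how many of `num_buckets` hash-table buckets are occupied by the
/// keys routed to `partition`, when routing hashes with `repartition_seed`
/// and the per-partition table hashes with `table_seed`.
fn buckets_used(
    keys: &[u64],
    num_partitions: u64,
    partition: u64,
    num_buckets: u64,
    repartition_seed: u64,
    table_seed: u64,
) -> usize {
    let mut used = vec![false; num_buckets as usize];
    for &k in keys {
        if seeded_hash(k, repartition_seed) % num_partitions == partition {
            used[(seeded_hash(k, table_seed) % num_buckets) as usize] = true;
        }
    }
    used.iter().filter(|&&u| u).count()
}
```

With equal seeds, `buckets_used(&keys, 4, 0, 16, 7, 7)` can never exceed 4, whatever the keys are. With independent seeds that constraint does not apply, which is the motivation for giving the join its own seed.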

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-29 Thread via GitHub
Blizzara commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2838771273 > @Blizzara applied your suggestions. I think scoping expressions and relations into their own modules made a lot of sense. Nice! I didn't dive into the details of each file,

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2066356877 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -56,23 +65,46 @@ trait DataTypeSupport { * @return * true if all fields in th

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-04-29 Thread via GitHub
thinkharderdev commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2838284869 > This missed the v47 train. Anything else needed to merge? @alamb @jayzhan211 Are you guys good with this?

Re: [I] Keeping pull request in sync with the base branch [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on issue #15877: URL: https://github.com/apache/datafusion/issues/15877#issuecomment-2838492442 > I took a quick look and didn't see the sync branch option, but I may have missed it I'll double check it

Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-04-29 Thread via GitHub
gabotechs commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2838305222 One idea that comes to mind is to be able to run the sqllogictests in "substrait roundtrip" mode. Upon building a logical DataFusion plan, it would be converted to Substrait an

Re: [I] Keeping pull request in sync with the base branch [datafusion]

2025-04-29 Thread via GitHub
alamb commented on issue #15877: URL: https://github.com/apache/datafusion/issues/15877#issuecomment-2838453774 This is a great idea -- thank you @xudong963 I think these settings are controlled by the asf .yaml file in our repo https://github.com/apache/datafusion/blob/main/.asf.yam

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-29 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? If the data isn't sorted by `userPrimaryKey` after `sort()` is called, then I think that is a bug. I wonder if you could provide an explain plan? Something like ```rust ctx .table("tabl
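One way to check the expectation in this comment, that the output of `sort()` is ordered across record batch boundaries and not merely within each batch, is to test the concatenated key column. The sketch below models each batch as a plain `Vec` of key values (a stand-in for an Arrow column such as `userPrimaryKey`). With real record batches, the key column would first need to be extracted from each batch.

```rust
/// Returns true if the concatenation of `batches` (each a column of sort
/// keys) is in non-decreasing order, i.e. the ordering also holds across
/// batch boundaries, not just within each batch.
fn is_globally_sorted<T: PartialOrd>(batches: &[Vec<T>]) -> bool {
    let mut prev: Option<&T> = None;
    for value in batches.iter().flatten() {
        if let Some(p) = prev {
            if p > value {
                return false;
            }
        }
        prev = Some(value);
    }
    true
}
```

For example, batches `[1, 4]` and `[2, 3]` are each sorted internally but fail this check, which is the situation the comment would treat as a bug.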

Re: [PR] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-29 Thread via GitHub
gatesn commented on PR #15532: URL: https://github.com/apache/datafusion/pull/15532#issuecomment-2838467529 Slightly off topic, but this query can also be hugely (we've seen ~50% reduction) accelerated by inserting a repartition that ensures the batch fits into the L1 cache.

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-29 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? I took a brief look around and I could not find any ticket that describes the "row number from a file" case, so I will create one GitHub link: https://github.com/apache/datafusion/discussions/1

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-29 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? Sorry I just re-read this > My goal is that I will have a fully sorted file sorted by primary key where > each fileRowNumber is the index of that row in the file. I am not sure you will be able

Re: [PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove merged PR #1670: URL: https://github.com/apache/datafusion-comet/pull/1670

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-29 Thread via GitHub
alamb commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2838915277 Related discord thread: https://discord.com/channels/885562378132000778/1363995762182193373

Re: [I] Allow to filter null in `array_agg` [datafusion]

2025-04-29 Thread via GitHub
thinkharderdev closed issue #13742: Allow to filter null in `array_agg` URL: https://github.com/apache/datafusion/issues/13742

Re: [PR] Respect ignore_nulls in array_agg [datafusion]

2025-04-29 Thread via GitHub
thinkharderdev merged PR #15544: URL: https://github.com/apache/datafusion/pull/15544

Re: [PR] chore: fix build errors [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove merged PR #1690: URL: https://github.com/apache/datafusion-comet/pull/1690

[I] New Null Handling Behavior in Joins: Null Matches Everything [datafusion]

2025-04-29 Thread via GitHub
tobixdev opened a new issue, #15891: URL: https://github.com/apache/datafusion/issues/15891 ### Is your feature request related to a problem or challenge? Building a system that works with graph-like data on DataFusion will stumble upon the need to join the intermediate results of gra
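The semantics named in this issue's title can be stated as a join-key predicate. Below is a minimal std-only model, not a DataFusion API. A null (`None`) key on either side matches any key, while two non-null keys match only when equal; under standard SQL equi-join semantics a null key never matches. The nested loop is chosen only for clarity. A hash join cannot probe a single bucket for a null key under this rule.

```rust
/// Join-key predicate for "null matches everything": a null key on either
/// side matches any key; two non-null keys match only when equal.
fn keys_match<K: PartialEq>(left: &Option<K>, right: &Option<K>) -> bool {
    match (left, right) {
        (None, _) | (_, None) => true,
        (Some(l), Some(r)) => l == r,
    }
}

/// Naive nested-loop inner join returning matching (left_index, right_index) pairs.
fn nested_loop_join<K: PartialEq>(left: &[Option<K>], right: &[Option<K>]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if keys_match(l, r) {
                out.push((i, j));
            }
        }
    }
    out
}
```

For example, joining left keys `[Some(1), None]` with right keys `[Some(1), Some(2)]` yields `(0, 0)`, `(1, 0)` and `(1, 1)`. The null on the left matches both right rows.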

[PR] add unit tests for expression functions [datafusion-python]

2025-04-29 Thread via GitHub
timsaucer opened a new pull request, #1121: URL: https://github.com/apache/datafusion-python/pull/1121 # Which issue does this PR close? Follow on to https://github.com/apache/datafusion-python/issues/1116 # Rationale for this change We don't have code coverage for these

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-04-29 Thread via GitHub
goldmedal commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2838641022 > I think the groups accumulator will result in faster performance, not new functionality. Thanks for the explanation. If it's a performance improvement, it's better to run
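The performance argument for a groups accumulator comes from how it stores state. The sketch below is a simplified std-only model, not DataFusion's actual `GroupsAccumulator` trait, whose methods operate on Arrow arrays. Per-group sums and counts live in flat vectors indexed by group id, so each batch is processed in one tight loop instead of dispatching to a separate accumulator object per group. Durations are assumed to be `i64` nanoseconds, and overflow is ignored.

```rust
/// Simplified model of a vectorized ("groups") accumulator for AVG over
/// durations stored as i64 nanoseconds.
struct AvgDurationGroups {
    sums: Vec<i64>,
    counts: Vec<u64>,
}

impl AvgDurationGroups {
    fn new() -> Self {
        Self { sums: Vec::new(), counts: Vec::new() }
    }

    /// Add a batch of values; `group_indices[i]` is the group of `values[i]`.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state vectors to cover any newly seen groups.
        self.sums.resize(total_num_groups, 0);
        self.counts.resize(total_num_groups, 0);
        for (&v, &g) in values.iter().zip(group_indices) {
            self.sums[g] += v;
            self.counts[g] += 1;
        }
    }

    /// Final (truncated) averages, one per group; None for empty groups.
    fn evaluate(&self) -> Vec<Option<i64>> {
        self.sums
            .iter()
            .zip(&self.counts)
            .map(|(&s, &c)| if c == 0 { None } else { Some(s / c as i64) })
            .collect()
    }
}
```

Feeding values `[10, 20, 5]` for groups `[0, 0, 1]` and then `[30]` for group `2` yields averages `15`, `5` and `30`.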

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-04-29 Thread via GitHub
timsaucer closed issue #14247: Change `ReturnTypeInfo` to return a `Field` rather than `DataType` URL: https://github.com/apache/datafusion/issues/14247

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-29 Thread via GitHub
timsaucer merged PR #15646: URL: https://github.com/apache/datafusion/pull/15646

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-04-29 Thread via GitHub
milenkovicm commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2838931035 Not sure if it will help with direction, [Debunking the Myth of Join Ordering: Toward Robust SQL Analytics](https://arxiv.org/abs/2502.15181), but it costs nothing to share :)

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-29 Thread via GitHub
timsaucer commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2838939271 Thank you, everyone, for the thoughtful discussions and reviews.

Re: [PR] add unit tests for expression functions [datafusion-python]

2025-04-29 Thread via GitHub
timsaucer commented on PR #1121: URL: https://github.com/apache/datafusion-python/pull/1121#issuecomment-2838964166 FYI @deanm

Re: [I] [DISCUSSION] Sorts being removed from subqueries [datafusion]

2025-04-29 Thread via GitHub
maxburke commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2838965128 I have to say it was very much unexpected. As a sanity check, I compared to Postgres which does not remove the sorting operation. The Postgres docs say that CTEs "effectively s

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-29 Thread via GitHub
Omega359 commented on PR #14687: URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2838977501 I'll review it today @alamb

[PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove opened a new pull request, #1694: URL: https://github.com/apache/datafusion-comet/pull/1694 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-29 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? I think you might be able to get what you want by running a query for each file: Something like ```rust ctx .read_parquet("file1.parquet") .await? .window(vec![row_number
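Running `row_number()` over each file separately, as suggested here, gives every row its position within its own file. A single query over all files would instead number rows across the concatenation. The std-only sketch below models the per-file numbering. Files are represented as hypothetical `(name, rows)` pairs rather than Parquet files, and indices are 0-based, whereas SQL's `row_number()` starts at 1.

```rust
/// Assign each row its 0-based position within its own file: the counter
/// restarts at every file boundary, unlike a single row number computed
/// over the concatenation of all files.
fn file_row_indices<R: Clone>(files: &[(&str, Vec<R>)]) -> Vec<(String, usize, R)> {
    let mut out = Vec::new();
    for (name, rows) in files {
        for (idx, row) in rows.iter().enumerate() {
            out.push((name.to_string(), idx, row.clone()));
        }
    }
    out
}
```

With files `a.parquet` holding rows `x`, `y` and `b.parquet` holding `z`, the indices are `0`, `1` and then `0` again for `z`.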

[I] Support file row index / row id for each file in a `ListingTableProvider` [datafusion]

2025-04-29 Thread via GitHub
alamb opened a new issue, #15892: URL: https://github.com/apache/datafusion/issues/15892 ### Is your feature request related to a problem or challenge? - Quoting @daphnenhuch-at from https://github.com/apache/datafusion/discussions/15711: > My goal is that I will have a fully s

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-29 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? - I filed https://github.com/apache/datafusion/issues/15892 GitHub link: https://github.com/apache/datafusion/discussions/15711#discussioncomment-12980009 This is an automatically sent ema

Re: [I] [DISCUSSION] Sorts being removed from inner expressions [datafusion]

2025-04-29 Thread via GitHub
alamb commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2838541036 > I agree that, to my knowledge, quite a few engines remove sorting in subqueries without LIMIT/OFFSET. Adding a configuration option is a good solution. I also agree that

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-04-29 Thread via GitHub
andygrove commented on code in PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694#discussion_r2066551903 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -397,19 +397,34 @@ class CometSparkSessionExtensions op,

[PR] migrate tests in `push_down_filters.rs` to use snapshot assertions [datafusion]

2025-04-29 Thread via GitHub
qstommyshu opened a new pull request, #15893: URL: https://github.com/apache/datafusion/pull/15893 ## Which issue does this PR close? - Related #15396 , #15446, #15884 ## Rationale for this change ## What changes are included in this PR? This is

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on PR #15865: URL: https://github.com/apache/datafusion/pull/15865#issuecomment-2837714159 @alamb I'll merge the PR and continue to fix tests in https://github.com/apache/datafusion/pull/15852

Re: [I] ListingTable statistics improperly merges statistics when files have different schemas [datafusion]

2025-04-29 Thread via GitHub
xudong963 closed issue #15689: ListingTable statistics improperly merges statistics when files have different schemas URL: https://github.com/apache/datafusion/issues/15689

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-29 Thread via GitHub
xudong963 merged PR #15865: URL: https://github.com/apache/datafusion/pull/15865
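Issue #15689 and PR #15865 concern files whose schemas differ from the table schema. In that case, per-file column statistics cannot be merged by position. One way to handle this, assumed here, is to match statistics to table columns by name before merging across files. The std-only sketch below tracks only null counts. `None` means unknown. Columns absent from a file are treated as unknown, which is a conservative choice made for this sketch. Merging sums the counts only when every file's count is known. The function names are hypothetical and are not DataFusion APIs.

```rust
use std::collections::HashMap;

/// Reorder one file's null counts to match the table's column order, by name.
/// `None` means unknown; columns missing from the file are treated as unknown.
pub fn map_file_stats_to_table(
    table_columns: &[&str],
    file_columns: &[&str],
    file_null_counts: &[Option<u64>],
) -> Vec<Option<u64>> {
    let by_name: HashMap<&str, Option<u64>> = file_columns
        .iter()
        .copied()
        .zip(file_null_counts.iter().copied())
        .collect();
    table_columns
        .iter()
        .map(|col| by_name.get(col).copied().flatten())
        .collect()
}

/// Merge per-file null counts (already in table order) column by column.
/// Summing `Option<u64>` yields `None` if any file's count is unknown.
pub fn merge_null_counts(per_file: &[Vec<Option<u64>>], num_columns: usize) -> Vec<Option<u64>> {
    (0..num_columns)
        .map(|i| per_file.iter().map(|file| file[i]).sum::<Option<u64>>())
        .collect()
}
```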

Re: [PR] chore(deps): bump blake3 from 1.8.1 to 1.8.2 [datafusion]

2025-04-29 Thread via GitHub
xudong963 merged PR #15890: URL: https://github.com/apache/datafusion/pull/15890

Re: [PR] Feat: introduce `ExecutionPlan::partition_statistics` API [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on PR #15852: URL: https://github.com/apache/datafusion/pull/15852#issuecomment-2838052185 > I also think you should add a note to the [upgrade guide](https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading.md) for this change Added

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-04-29 Thread via GitHub
UBarney commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2065894121 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -345,10 +345,10 @@ fn roundtrip_statement_with_dialect_2() -> Result<(), DataFusionError> { #[test] fn round

Re: [PR] feat: make execution_graph.stages() public [datafusion-ballista]

2025-04-29 Thread via GitHub
milenkovicm commented on code in PR #1256: URL: https://github.com/apache/datafusion-ballista/pull/1256#discussion_r2065869484 ## ballista/scheduler/src/state/execution_graph.rs: ## @@ -218,7 +218,7 @@ impl ExecutionGraph { new_tid } -pub(crate) fn stages(&s

Re: [I] Add diagrams for relationship between `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on issue #15887: URL: https://github.com/apache/datafusion/issues/15887#issuecomment-2838065846 I added a good first issue label, it's a good chance to learn the relationship between sources

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2838088958 My lists: - [ ] Work with @suremarc @alamb and @wiedld to make [Optimized SPM](https://github.com/apache/datafusion/issues/6672) cross the finish line - [ ] Finish all my

Re: [PR] Feat: introduce `ExecutionPlan::partition_statistics` API [datafusion]

2025-04-29 Thread via GitHub
xudong963 commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2065766204 ## datafusion/physical-plan/src/coalesce_batches.rs: ## @@ -196,7 +196,14 @@ impl ExecutionPlan for CoalesceBatchesExec { } fn statistics(&self) -> R

[PR] chore(deps): bump blake3 from 1.8.1 to 1.8.2 [datafusion]

2025-04-29 Thread via GitHub
dependabot[bot] opened a new pull request, #15890: URL: https://github.com/apache/datafusion/pull/15890 Bumps [blake3](https://github.com/BLAKE3-team/BLAKE3) from 1.8.1 to 1.8.2. Release notes Sourced from blake3's releases (https://github.com/BLAKE3-team/BLAKE3/releases). 1

Re: [I] Build failure when default features are disabled [datafusion-ballista]

2025-04-29 Thread via GitHub
milenkovicm closed issue #1254: Build failure when default features are disabled URL: https://github.com/apache/datafusion-ballista/issues/1254

Re: [PR] feat: ballista client collects (few) metrics [datafusion-ballista]

2025-04-29 Thread via GitHub
milenkovicm merged PR #1251: URL: https://github.com/apache/datafusion-ballista/pull/1251

Re: [PR] bug: build fails with `--no-default-features` [datafusion-ballista]

2025-04-29 Thread via GitHub
milenkovicm merged PR #1255: URL: https://github.com/apache/datafusion-ballista/pull/1255
