Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2648082088 > > Making the examples in the doc rendered with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
ugoa commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2648104131 Yeah, with just ``` it will be treated as rust code and be run and tested -- This is an automated message from the Apache Git Service. To respond to the message, please log
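For readers unfamiliar with the rustdoc behaviour discussed here, it can be approximated in a few lines. This is a minimal sketch. It assumes the usual rustdoc rule: a fence with no info string, or one carrying only Rust attributes such as `no_run` or `should_panic`, is compiled as a doctest, while any other tag (for example `sql` or `text`) opts the block out. `is_rust_doctest` is a hypothetical helper, not part of rustdoc or DataFusion. The real rules also cover more attributes, such as edition tags.

```rust
/// Roughly decides whether rustdoc would compile a fenced block as a
/// doctest, based only on the fence's info string (the text after ```).
/// Simplified: real rustdoc recognises more attributes than listed here.
fn is_rust_doctest(info: &str) -> bool {
    // Attributes that keep a block compiled as Rust. `ignore` is left out
    // on purpose: such blocks are Rust, but rustdoc does not compile them.
    const RUST_ATTRS: [&str; 4] = ["rust", "should_panic", "no_run", "compile_fail"];
    info.split(|c: char| c == ',' || c.is_whitespace())
        .filter(|tok| !tok.is_empty())
        // An empty info string yields no tokens, so `all` returns true:
        // an untagged fence is treated as Rust.
        .all(|tok| RUST_ATTRS.contains(&tok))
}
```

With this rule, a plain ``` fence is compiled as Rust. Tagging SQL examples as ```sql is what keeps them out of the doctest run.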

Re: [I] [DISCUSS] More extensive pre-release testing [datafusion]

2025-02-10 Thread via GitHub
edmondop commented on issue #13661: URL: https://github.com/apache/datafusion/issues/13661#issuecomment-2648104372 @alamb I did some personal testing with Cargo mutants. We can have a manually runnable pipeline, or something scheduled every day, that could run mutation testing and inform us

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-10 Thread via GitHub
findepi commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2648109414 I appreciate the conciseness of representing all columns of a table with a wildcard. This doesn't change the fact that `Expr::Wildcard` is not an expression. If there are a

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-10 Thread via GitHub
findepi commented on code in PR #14470: URL: https://github.com/apache/datafusion/pull/14470#discussion_r1949108541 ## datafusion/functions-nested/src/max.rs: ## @@ -0,0 +1,137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1949114839 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Has

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1949116013 ## datafusion/common/src/utils/mod.rs: ## @@ -602,26 +602,46 @@ pub fn base_type(data_type: &DataType) -> DataType { /// /// let data_type = DataType::List(Arc:
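The snippet above comes from the doc comment of `base_type` in `datafusion/common/src/utils/mod.rs`, which deals with nested lists. The sketch below is a rough illustration only. It is a toy recursive version over a simplified stand-in `DataType`. Arrow's real `List` variant wraps a `Field`, not a bare `DataType`, and the real function may handle more cases. The toy peels off list layers until it reaches the element type.

```rust
use std::sync::Arc;

/// Toy stand-in for Arrow's `DataType`, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    List(Arc<DataType>),
    FixedSizeList(Arc<DataType>, i32),
}

/// Recursively strips list wrappers and returns the innermost element type.
fn base_type(data_type: &DataType) -> DataType {
    match data_type {
        // `inner` is `&Arc<DataType>`, which derefs to `&DataType`.
        DataType::List(inner) | DataType::FixedSizeList(inner, _) => base_type(inner),
        other => other.clone(),
    }
}
```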

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2648095948 > [@alamb](https://github.com/alamb), it would be great if we can add the contents of this issue body, and the application guidelines sub-issue to our web page. I think it would l

Re: [I] proposal: deprecate `Expr::Wildcard` [datafusion]

2025-02-10 Thread via GitHub
findepi commented on issue #7765: URL: https://github.com/apache/datafusion/issues/7765#issuecomment-2648101769 `count(*)` is a special syntax that needs to be handled explicitly -- perhaps even at the parsing phase. It doesn't constitute a reason to keep wildcard as an expression.

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2648113354 > Yeah, with just ``` it will be treated as rust code and be run and tested I pushed a fix in [e0b360c](https://github.com/apache/datafusion/pull/14544/commits/e0b360c9c1e28a3079

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2648087614 Ah, I see now the issue is that we'll try and test all examples: https://github.com/apache/datafusion/actions/runs/13241873169/job/36958947894?pr=14544 So they need to be marked

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14567: URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2648116457 FYI @adriangb

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14538: URL: https://github.com/apache/datafusion/pull/14538#issuecomment-2648118134 FYI @berkaysynnada and @ozankabak

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948920668 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -373,9 +373,10 @@ pub fn ensure_sorting( return adjust_window_sort_removal(requiremen

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948960557 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(()) }

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948961642 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(()) }

Re: [PR] Minor: remove unnecessary dependencies in `datafusion-sqllogictest` [datafusion]

2025-02-10 Thread via GitHub
alamb commented on code in PR #14578: URL: https://github.com/apache/datafusion/pull/14578#discussion_r1948962864 ## datafusion/sqllogictest/Cargo.toml: ## @@ -42,9 +42,6 @@ bytes = { workspace = true, optional = true } chrono = { workspace = true, optional = true } clap = { v

[PR] Minor: remove unnecessary dependencies in `datafusion-sqllogictest` [datafusion]

2025-02-10 Thread via GitHub
alamb opened a new pull request, #14578: URL: https://github.com/apache/datafusion/pull/14578 ## Which issue does this PR close? Noticed while working with @logan-keede on - https://github.com/apache/datafusion/pull/14555 ## Rationale for this change Since `dat

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-10 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2647837223 @alamb, it would be great if we can add the contents of this issue body, and the application guidelines sub-issue to our web page. I think it would look better for both prospe

Re: [I] [DISCUSSION] 2024 Q4 / 2025 Q1 Roadmap [datafusion]

2025-02-10 Thread via GitHub
alamb closed issue #13274: [DISCUSSION] 2024 Q4 / 2025 Q1 Roadmap URL: https://github.com/apache/datafusion/issues/13274

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2647956520 I also double checked "hide lines starting with `#`" and it looks great! ![Screenshot 2025-02-10 at 8 14 32  AM](https://github.com/user-attachments/assets/7f1aa86e-43a9-4056-

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2647969513 @alamb I'll also do some updates in the issue summary. Considering that this is the first time I've been involved in this process, could you please remind me if some cri

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2647967289 I will push a few commits to fix this

Re: [PR] Round floats but not decimals in SqlLogicTests [datafusion]

2025-02-10 Thread via GitHub
findepi closed pull request #13743: Round floats but not decimals in SqlLogicTests URL: https://github.com/apache/datafusion/pull/13743

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
ugoa commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2648055540 > Making the examples in the doc rendered with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the gen

Re: [I] Document PREPARE statements [datafusion]

2025-02-10 Thread via GitHub
jonahgao commented on issue #13570: URL: https://github.com/apache/datafusion/issues/13570#issuecomment-2648055675 > I've started to put together some basic documentation for prepared statements, but when generating examples I'm struggling to use named parameters. > > Using `SessionCo

[PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
wForget opened a new pull request, #1384: URL: https://github.com/apache/datafusion-comet/pull/1384 ## Which issue does this PR close? minor fix ## Rationale for this change The child of CometShuffleExchangeExec should be columnar ## What changes are included in th

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
ugoa commented on code in PR #14544: URL: https://github.com/apache/datafusion/pull/14544#discussion_r1949098300 ## docs/rustdoc_trim.py: ## @@ -0,0 +1,72 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE
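The `docs/rustdoc_trim.py` script is cut off in this message, so its exact logic is not shown. The rustdoc convention that such a script has to mirror can be sketched in Rust. The sketch assumes the documented rustdoc rule: a line that is just `#`, or that starts with `# `, is hidden from the rendered example, and a leading `##` renders as a literal `#`. `visible_lines` is a hypothetical name, and this is not a port of the Python script.

```rust
/// Returns the lines of a doc example as a reader would see them once
/// rustdoc-style hidden lines are removed (simplified rule set).
fn visible_lines(example: &str) -> Vec<String> {
    example
        .lines()
        .filter_map(|line| {
            let trimmed = line.trim_start();
            if trimmed == "#" || trimmed.starts_with("# ") {
                // Hidden boilerplate: compiled in the doctest, not rendered.
                None
            } else if let Some(rest) = trimmed.strip_prefix("##") {
                // `##` escapes a literal leading `#`; keep the indentation.
                let indent = &line[..line.len() - trimmed.len()];
                Some(format!("{indent}#{rest}"))
            } else {
                // Attributes like `#[derive(...)]` have no space after `#`,
                // so they stay visible.
                Some(line.to_string())
            }
        })
        .collect()
}
```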

Re: [I] Support FixedSizeList in `TryFrom<&DataType> for ScalarValue` [datafusion]

2025-02-10 Thread via GitHub
findepi commented on issue #8742: URL: https://github.com/apache/datafusion/issues/8742#issuecomment-2648070163 In https://github.com/apache/datafusion/pull/14558 the API changed, let me update the title accordingly

Re: [I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-02-10 Thread via GitHub
Dandandan commented on issue #14246: URL: https://github.com/apache/datafusion/issues/14246#issuecomment-2648075345 Nice! Looks like we have some more competition now from DuckDB: [results](https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQiI6ZmFsc2UsIkFsbG95REIgKHR1bmVkKSI6ZmFsc2UsIkF0a

Re: [I] Test DataFusion 45.0.0 with Sail [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14408: URL: https://github.com/apache/datafusion/issues/14408#issuecomment-2648260952 > > > [@findepi](https://github.com/findepi) do you mean we should relax the check to ignore nullable / non nullable annotations? -- I think that would probably be ok too. > >

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2648266814 I wonder if this PR is related to - https://github.com/apache/datafusion/pull/14569

Re: [I] Hide boilerplate in documentation examples [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14557: URL: https://github.com/apache/datafusion/issues/14557#issuecomment-2648285636 > Please help review workaround solution in [997504f](https://github.com/apache/datafusion/commit/997504f85e37ef6269e5ee660a1136031d6d2a2) in [#14544](https://github.com/apache/da

Re: [PR] Make it easier to create a ScalarValure representing typed null (#14548) [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14558: URL: https://github.com/apache/datafusion/pull/14558#issuecomment-2648301293 > Thank you! > > **As a follow-up** we should remove `impl TryFrom<&DataType> for ScalarValue`. This will be a mechanical but potentially large change to downstream consu
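The direction described here is a dedicated constructor for typed nulls instead of `impl TryFrom<&DataType> for ScalarValue`. It can be illustrated with toy types. This is a sketch only: the enums are heavily reduced stand-ins, and `try_new_null` is an illustrative name here, not a confirmed signature of the API introduced in #14558.

```rust
/// Toy subset of a data type enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Null,
    Duration, // stands in for any type this toy does not support
}

/// Toy scalar in which a `None` payload represents a typed NULL.
#[derive(Debug, PartialEq)]
enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    Null,
}

impl ScalarValue {
    /// Explicitly named constructor for a NULL of the given type, so the
    /// intent is clear at the call site (unlike `ScalarValue::try_from(&dt)`).
    fn try_new_null(data_type: &DataType) -> Result<Self, String> {
        match data_type {
            DataType::Int64 => Ok(ScalarValue::Int64(None)),
            DataType::Utf8 => Ok(ScalarValue::Utf8(None)),
            DataType::Null => Ok(ScalarValue::Null),
            other => Err(format!("typed null not supported for {other:?}")),
        }
    }
}
```

A named constructor makes each downstream call site state what it means. That is also why removing the `TryFrom` impl would be a mechanical but wide-reaching change.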

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2648300594 > It might be a better approach to not modify the accepted argument types (i.e. don't convert `FixedSizeList` to `List` in `get_valid_types()`, and instead move the logic to `return_

Re: [I] Implement physical plan for EXISTS subquery [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #123: URL: https://github.com/apache/datafusion/issues/123#issuecomment-2648298073 Looks good -- thanks for checking @logan-keede -- we can open a new issue if we find another hole

Re: [I] [DISCUSS] More extensive pre-release testing [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #13661: URL: https://github.com/apache/datafusion/issues/13661#issuecomment-2648295735 > [@alamb](https://github.com/alamb) I did some personal testing with Cargo mutants. We can have a manually runnable pipeline, or something scheduled every day, that could run mut

Re: [I] Implement physical plan for EXISTS subquery [datafusion]

2025-02-10 Thread via GitHub
alamb closed issue #123: Implement physical plan for EXISTS subquery URL: https://github.com/apache/datafusion/issues/123

Re: [I] proposal: deprecate `Expr::Wildcard` [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #7765: URL: https://github.com/apache/datafusion/issues/7765#issuecomment-2648305668 > `count(*)` is a special syntax that needs to be handled explicitly -- perhaps even at the parsing phase. > It doesn't constitute a reason to keep wildcard as an expression.

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#discussion_r1949690727 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2387,6 +2387,8 @@ object QueryPlanSerde extends Logging with ShimQueryPla

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#discussion_r1949694482 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -292,4 +292,89 @@ class CometArrayExpressionSuite extends CometTestBas

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2648941470 Thanks @andygrove for trying. You may have to increase the memory because the fair memory pool limits memory usage earlier in order to leave the rest for other thre
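A toy model shows why a fair pool can refuse memory earlier than a greedy one. In it, each of N consumers may reserve at most `limit / N` bytes, regardless of how much of the pool is free. This is an assumption-laden sketch: `FairPool`, `try_grow` and `shrink` are hypothetical and do not reproduce the `fair_memory_pool.rs` implementation from the PR.

```rust
/// Toy "fair" memory pool: each consumer may hold at most
/// `limit / number_of_consumers` bytes, even if other consumers use nothing.
struct FairPool {
    limit: usize,
    used: Vec<usize>, // bytes currently reserved by each consumer
}

impl FairPool {
    fn new(limit: usize, num_consumers: usize) -> Self {
        assert!(num_consumers > 0, "need at least one consumer");
        FairPool { limit, used: vec![0; num_consumers] }
    }

    /// Tries to reserve `bytes` more for `consumer`, refusing anything
    /// beyond that consumer's fair share.
    fn try_grow(&mut self, consumer: usize, bytes: usize) -> Result<(), String> {
        let share = self.limit / self.used.len();
        let wanted = self.used[consumer] + bytes;
        if wanted > share {
            return Err(format!(
                "consumer {consumer} would hold {wanted} bytes; fair share is {share}"
            ));
        }
        self.used[consumer] = wanted;
        Ok(())
    }

    /// Releases `bytes` previously reserved by `consumer`.
    fn shrink(&mut self, consumer: usize, bytes: usize) {
        self.used[consumer] = self.used[consumer].saturating_sub(bytes);
    }
}
```

Take a 100-byte pool shared by 4 consumers. Consumer 0 is refused at 30 bytes even though only 20 bytes of the pool are in use. This is consistent with the advice to raise the configured memory when switching to the fair pool.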

[PR] Speedup `date_trunc` (~20% time reduction) [datafusion]

2025-02-10 Thread via GitHub
simonvandel opened a new pull request, #14593: URL: https://github.com/apache/datafusion/pull/14593 ## Which issue does this PR close? N/A ## Rationale for this change I haven't looked at the generated code, but I presume that using `try_unary` can lead to better vectori
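The PR's rationale refers to arrow's `try_unary` kernel, which applies a fallible function to every element of a primitive array. Below is a std-only analogue of that pattern. It assumes nanosecond timestamps and truncation to the UTC day. `trunc_day` and `try_map` are hypothetical helpers; the code does not use arrow and makes no claim about vectorization or the ~20% figure.

```rust
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Truncates a nanosecond timestamp to the start of its UTC day.
/// `div_euclid` rounds toward negative infinity, so pre-1970 values work,
/// and `checked_mul` reports overflow near `i64::MIN` instead of wrapping.
fn trunc_day(ts: i64) -> Result<i64, String> {
    ts.div_euclid(NANOS_PER_DAY)
        .checked_mul(NANOS_PER_DAY)
        .ok_or_else(|| format!("overflow truncating {ts}"))
}

/// Applies a fallible kernel to every value and stops at the first error,
/// in the spirit of (but not identical to) arrow's `try_unary`.
fn try_map(
    values: &[i64],
    f: impl Fn(i64) -> Result<i64, String>,
) -> Result<Vec<i64>, String> {
    values.iter().map(|&v| f(v)).collect()
}
```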

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2649003746 If someone would be so kind as to [generate a flamegraph](https://datafusion.apache.org/library-user-guide/profiling.html#example-flamegraph-for-a-benchmark) for the benchmark

Re: [PR] Add support for MS Varbinary(MAX) (#1714) [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
TylerBrinks commented on PR #1715: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1715#issuecomment-2649495366 I think I got it right in the most recent push. Awaiting approval if it passes the criteria.

Re: [PR] doc: update ballista client front page [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1171: URL: https://github.com/apache/datafusion-ballista/pull/1171

Re: [I] `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove closed issue #1174: `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` URL: https://github.com/apache/datafusion-ballista/issues/1174

Re: [PR] fix: rest api `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1175: URL: https://github.com/apache/datafusion-ballista/pull/1175

Re: [PR] chore: generate change log for 44.0.0 [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1173: URL: https://github.com/apache/datafusion-ballista/pull/1173

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949783816 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discoun

Re: [I] [DISCUSSION] 2025 Q1-Q2 Roadmap [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14580: URL: https://github.com/apache/datafusion/issues/14580#issuecomment-2649254014 My personal list - Find/fix cause of [with_column extremely poor performance](https://github.com/apache/datafusion/issues/14563) - Add [SessionConfig to ScalarFunc

[I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 opened a new issue, #14591: URL: https://github.com/apache/datafusion/issues/14591 ### Describe the bug https://github.com/apache/datafusion/actions/workflows/extended.yml We likely need to add cargo clean between steps. ### To Reproduce _No response_

Re: [I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 closed issue #14591: Extended tests are currently failing with 'No space left on device' URL: https://github.com/apache/datafusion/issues/14591

Re: [I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14591: URL: https://github.com/apache/datafusion/issues/14591#issuecomment-2649265913 Dup of https://github.com/apache/datafusion/issues/14576

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14576: URL: https://github.com/apache/datafusion/issues/14576#issuecomment-2649274047 take

[PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 opened a new pull request, #14592: URL: https://github.com/apache/datafusion/pull/14592 ## Which issue does this PR close? - Closes #14576 ## Rationale for this change Get CI working again. ## What changes are included in this PR? added carg

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950111843 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd,

Re: [I] proposal: deprecate `Expr::Wildcard` [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on issue #7765: URL: https://github.com/apache/datafusion/issues/7765#issuecomment-2649611265 I don't think we use wildcard for count in datafusion; `COUNT_STAR_EXPANSION`, which is `count(1)`, is used instead. As long as we have an alternative representation of wildcard (i.
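Here is a toy version of the rewrite mentioned here, in which `count(*)` is replaced by `count(1)` so that no wildcard reaches the aggregate. The `Expr` enum below is a hypothetical, heavily reduced stand-in for DataFusion's. `expand_count_star` is an illustrative name, and this is not DataFusion's code.

```rust
/// Toy expression tree: `Wildcard` models `*`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Wildcard,
    Literal(i64),
    Column(String),
    Count(Box<Expr>),
}

/// Rewrites `count(*)` into `count(1)`; every other expression is
/// returned unchanged.
fn expand_count_star(expr: Expr) -> Expr {
    match expr {
        Expr::Count(arg) if *arg == Expr::Wildcard => Expr::Count(Box::new(Expr::Literal(1))),
        other => other,
    }
}
```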

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950125583 ## datafusion/functions-nested/src/remove.rs: ## @@ -98,7 +99,7 @@ impl ScalarUDFImpl for ArrayRemove { } fn return_type(&self, arg_types: &[DataTyp

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2649641630 > Since `exprlist_to_fields` is called in the builder, it seems that wildcard expansion still hasn't been delayed. Computing schema for wildcard is different from expan

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-10 Thread via GitHub
shehabgamin commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2649643809 @jayzhan211 I will re-review by tomorrow EOD!

Re: [PR] Implement RightSemi join for SortMergeJoin [datafusion]

2025-02-10 Thread via GitHub
github-actions[bot] closed pull request #13584: Implement RightSemi join for SortMergeJoin URL: https://github.com/apache/datafusion/pull/13584

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1949709211 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2648964068 Went ahead and kept the default pool choice @andygrove

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14592: URL: https://github.com/apache/datafusion/pull/14592#issuecomment-2649389285 It didn't solve the issue :(

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
ugoa commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2649706593 Hey @alamb, btw we probably need to remove the installation of `cmake` in the `.github/actions/setup-builder/action.yaml` since it was added for `snmalloc-rs = "0.3"` which we don't ne

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
andygrove commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949798289 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_disco

[PR] Introducing mutation testing [datafusion]

2025-02-10 Thread via GitHub
edmondop opened a new pull request, #14590: URL: https://github.com/apache/datafusion/pull/14590 Adds an ad-hoc pipeline for running cargo mutants as discussed in #14589

[I] DuplicateQualifiedField With Paritioned Data [datafusion-python]

2025-02-10 Thread via GitHub
cfis opened a new issue, #1018: URL: https://github.com/apache/datafusion-python/issues/1018 This might be more of an arrow issue, but I am running into this error: `Exception: DataFusion error: SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "data" }, name: "year" }, S

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-10 Thread via GitHub
alan910127 commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2649567282 @eliaperantoni thank you so much for the detailed explanation! Since `Expr::Negative` and `Expr::Not` just wrap another `Expr`, does it mean that I need to make span availab

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1950091410 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -596,75 +594,93 @@ fn get_valid_types( vec![vec![target_type; *num]] }

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-10 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2649578244 @parthchandra @andygrove @kazuyukitanimura @mbutrovich

[I] Fails to parse FORCE INDEX(idx_test) [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
jasonbhansen opened a new issue, #1722: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1722 `fn main() -> ProfilerResult<()> { // tracing_subscriber::fmt() // .with_max_level(tracing::Level::DEBUG) // .init(); use sqlparser::dialect::GenericD

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14592: URL: https://github.com/apache/datafusion/pull/14592#issuecomment-2649406057 I recommend we disable that test until someone is able to spend more time looking into why it is using so much disk space

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-10 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r194450 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr: im

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2649418794 Is this an issue with DataFusion's website or something else? I suspect this ticket was filed against the wrong project.

Re: [I] [EPIC] A(nother) list of performance improvement tickets [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14482: URL: https://github.com/apache/datafusion/issues/14482#issuecomment-2649427623 I know the dataframe api isn't used by many but it needs some love too: https://github.com/apache/datafusion/issues/14563

Re: [PR] POC to show performance improvements of not copying token [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
github-actions[bot] commented on PR #1561: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1561#issuecomment-2649649028 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on PR #14470: URL: https://github.com/apache/datafusion/pull/14470#issuecomment-2649652793 > > I have the same question for `array_min`, but if many people are interested in this function then adding it to datafusion core is not a bad idea. [#14417 (comment)](h

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-10 Thread via GitHub
hayman42 commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649656840 @parthchandra With comet shuffle disabled, the plan is almost like vanilla spark's because it replaces comet SHJ with spark SHJ. And thus it preserves spark's performance. H
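Some background for the BuildRight discussion: a hash join builds a hash table from one input and probes it with the other. Building from the larger side therefore costs more memory and time. The sketch below uses a hypothetical `choose_build_side` function that picks the smaller side when both size estimates are known. It is not Spark's or Comet's actual selection logic, which also depends on join type and hints.

```rust
#[derive(Debug, PartialEq)]
enum BuildSide {
    Left,
    Right,
}

/// Chooses the smaller input (by estimated bytes) as the build side.
/// Ties and missing statistics fall back to `Right`, the fixed choice
/// described in the issue.
fn choose_build_side(left_bytes: Option<u64>, right_bytes: Option<u64>) -> BuildSide {
    match (left_bytes, right_bytes) {
        (Some(l), Some(r)) if l < r => BuildSide::Left,
        _ => BuildSide::Right,
    }
}
```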

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
wForget closed pull request #1384: fix: correct the logic of transform shuffle exchange URL: https://github.com/apache/datafusion-comet/pull/1384 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
wForget commented on code in PR #1384: URL: https://github.com/apache/datafusion-comet/pull/1384#discussion_r1950145187 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -862,7 +862,7 @@ class CometSparkSessionExtensions newOp match

[I] Ad-hoc or scheduled mutation based testing [datafusion]

2025-02-10 Thread via GitHub
edmondop opened a new issue, #14589: URL: https://github.com/apache/datafusion/issues/14589 As described in #13661 we want to improve our pre-release testing. I propose we adopt [mutation based testing](https://en.wikipedia.org/wiki/Mutation_testing) using [`cargo mutants`](https://mutants

Re: [I] [DISCUSS] More extensive pre-release testing [datafusion]

2025-02-10 Thread via GitHub
edmondop commented on issue #13661: URL: https://github.com/apache/datafusion/issues/13661#issuecomment-2649098650 I have logged [this](https://github.com/apache/datafusion/issues/14589) @alamb there are some more details in that issue -- This is an automated message from the Apache Git

Re: [I] coercion of input types in `coalesce` leads to type unsupported arrow cast [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14581: URL: https://github.com/apache/datafusion/issues/14581#issuecomment-2649156397 I believe @jayzhan211 is working on coercion - https://github.com/apache/datafusion/pull/14440 -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1949114839 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Has

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura merged PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsub

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2649342229 Merged, thanks @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-10 Thread via GitHub
alamb closed issue #14576: Extended tests are (still) failing on main URL: https://github.com/apache/datafusion/issues/14576 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] chore(deps): bump serialize-javascript and copy-webpack-plugin in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2025-02-10 Thread via GitHub
dependabot[bot] opened a new pull request, #14594: URL: https://github.com/apache/datafusion/pull/14594 Bumps [serialize-javascript](https://github.com/yahoo/serialize-javascript) to 6.0.2 and updates ancestor dependency [copy-webpack-plugin](https://github.com/webpack-contrib/copy-webpack-

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
alamb merged PR #14592: URL: https://github.com/apache/datafusion/pull/14592 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649541224 @hayman42 what a great find. I have not observed this myself even at SF1, probably because by default we were falling back to SMJ. Would you be able to compare the

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on code in PR #1384: URL: https://github.com/apache/datafusion-comet/pull/1384#discussion_r1950074930 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -862,7 +862,7 @@ class CometSparkSessionExtensions newOp

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14538: URL: https://github.com/apache/datafusion/pull/14538#issuecomment-2649370553 I ran the sqlite sqllogictests against your branch and it passed so none of those files covered union (all) by name -- This is an automated message from the Apache Git Service. To

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1949980327 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Du

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2649751148 > [@xudong963](https://github.com/xudong963) when would you like to start making the release? Maybe we should target the week of Feb 24 🤔 yes, the week is suitable. --

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
adriangb commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1950314013 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1717,56 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'foo%'`,
