Re: [I] Simple Functions [datafusion]

2025-02-10 Thread via GitHub
findepi commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2650048313 Going back to this topic. Not much happened on the Simple Functions front, but a lot happened in the world. New type system changes (Logical types can be found in type signature

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
codecov-commenter commented on PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386#issuecomment-2650028890 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1386?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-10 Thread via GitHub
eliaperantoni commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2650017992 > @eliaperantoni thank you so much for the detailed explanation! Since `Expr::Negative` and `Expr::Not` just wraps another `Expr`, does it mean that I need to make span av

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-10 Thread via GitHub
zuston commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2649984295 From the org of [datafusion-contrib](https://github.com/datafusion-contrib?q=hdfs&type=all&language=&sort=), I see many hdfs crates, which one is best for comet?

[PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura opened a new pull request, #1386: URL: https://github.com/apache/datafusion-comet/pull/1386 ## Which issue does this PR close? ## Rationale for this change DataFusion has an unbounded memory pool. I found it useful for experimental purpose. ## What chang

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
adriangb commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1950314013 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1717,56 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'foo%'`,

Re: [PR] minor: check size overflow before string repeat build [datafusion]

2025-02-10 Thread via GitHub
wForget commented on PR #14575: URL: https://github.com/apache/datafusion/pull/14575#issuecomment-2649905045 > thanks @wForget I feel it is great to avoid fallible scenario. Please create a bench like for other string functions to see if we get a performance downside introducing new branch

[PR] enable full decimal to decimal support [datafusion-comet]

2025-02-10 Thread via GitHub
himadripal opened a new pull request, #1385: URL: https://github.com/apache/datafusion-comet/pull/1385 Completes #375 - enable decimal to decimal - remove hard coded castoptions to pass to native execution - use a regex to match arrow invalid argument error. ## Which is

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
UBarney commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1950259492 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1710,6 +1717,56 @@ fn build_like_match( Some(combined) } +// For predicate `col NOT LIKE 'foo%'`, w

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
UBarney commented on code in PR #14567: URL: https://github.com/apache/datafusion/pull/14567#discussion_r1950259274 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1590,6 +1590,13 @@ fn build_statistics_expr( )), )) } +Opera

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-10 Thread via GitHub
UBarney commented on PR #14567: URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2649842916 > I assume this should also work with no wildcards `col not like 'foo'`? Yes. add some test to demonstrate it There's also an optimization to rewrite `col not like 'foo'` as

[PR] refactor: remove uses of arrow_schema and use reexport in arrow instead [datafusion]

2025-02-10 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #14597: URL: https://github.com/apache/datafusion/pull/14597 ## Which issue does this PR close? - Closes #14115. ## Rationale for this change ## What changes are included in this PR? As https://github

Re: [PR] cli: Add nested expressions [datafusion]

2025-02-10 Thread via GitHub
jkosh44 closed pull request #14596: cli: Add nested expressions URL: https://github.com/apache/datafusion/pull/14596 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950225082 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Has

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-10 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r194450 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr: im

[PR] cli: Add nested expressions [datafusion]

2025-02-10 Thread via GitHub
jkosh44 opened a new pull request, #14596: URL: https://github.com/apache/datafusion/pull/14596 ## Which issue does this PR close? None ## Rationale for this change Without this I am unable to use nested expressions with the CLI. ## What changes are included in thi

Re: [I] ListingTable cannot handle partition evolution [datafusion]

2025-02-10 Thread via GitHub
zhuqi-lucas commented on issue #13270: URL: https://github.com/apache/datafusion/issues/13270#issuecomment-2649754388 @adriangb Sorry for the delay, I am starting to investigate this issue this week.

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-10 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2649751148 > [@xudong963](https://github.com/xudong963) when would you like to start making the release? Maybe we should target the week of Feb 24 🤔 yes, the week is suitable.

Re: [I] Implement nested join optimization [datafusion]

2025-02-10 Thread via GitHub
clflushopt commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2649742371 Hey @alamb thanks for the clear answer, yes that sounds good! It seems that both tickets for interval boundary and selectivity analysis for `AND` & `OR` conjunctions seem ope

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649739992 > [@parthchandra](https://github.com/parthchandra) With comet shuffle disabled, the plan is almost like vanilla spark's because it replaces comet SHJ to spark SHJ. And

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-10 Thread via GitHub
ugoa commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2649706593 Hey @alamb , btw we probably need to remove the installation of `cmake` in the `.github/actions/setup-builder/action.yaml` since it was added for `snmalloc-rs = "0.3"` which we don't ne

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
wForget commented on code in PR #1384: URL: https://github.com/apache/datafusion-comet/pull/1384#discussion_r1950145187 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -862,7 +862,7 @@ class CometSparkSessionExtensions newOp match

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
wForget closed pull request #1384: fix: correct the logic of transform shuffle exchange URL: https://github.com/apache/datafusion-comet/pull/1384

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-10 Thread via GitHub
hayman42 commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649656840 @parthchandra With comet shuffle disabled, the plan is almost like vanilla spark's because it replaces comet SHJ to spark SHJ. And thus it preserves spark's performance. H

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on PR #14470: URL: https://github.com/apache/datafusion/pull/14470#issuecomment-2649652793 > > I have the same question for `array_min`, but if this function is highly interested from many people then adding it to datafusion core is not a bad idea. [#14417 (comment)](h

Re: [PR] POC to show performance improvements of not copying token [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
github-actions[bot] commented on PR #1561: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1561#issuecomment-2649649028 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] Implement RightSemi join for SortMergeJoin [datafusion]

2025-02-10 Thread via GitHub
github-actions[bot] closed pull request #13584: Implement RightSemi join for SortMergeJoin URL: https://github.com/apache/datafusion/pull/13584

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-10 Thread via GitHub
shehabgamin commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2649643809 @jayzhan211 I will re-review by tomorrow EOD!

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2649641630 > Since `exprlist_to_fields` is called in the builder, it seems that wildcard expansion still hasn't been delayed. Computing schema for wildcard is different from expan

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950125583 ## datafusion/functions-nested/src/remove.rs: ## @@ -98,7 +99,7 @@ impl ScalarUDFImpl for ArrayRemove { } fn return_type(&self, arg_types: &[DataTyp

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1950111843 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd,

Re: [I] proposal: deprecate `Expr::Wildcard` [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on issue #7765: URL: https://github.com/apache/datafusion/issues/7765#issuecomment-2649611265 I don't think we use wildcard for count in datafusion, `COUNT_STAR_EXPANSION` is used instead which is `count(1)`. As long as we have alternative representation of wildcard (i.

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2649605687 > but some users do not want the casts introduced by `TypeCoercion` We can make it optional

Re: [PR] support simple lateral joins [datafusion]

2025-02-10 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2649598864 cc @alamb would you please help take a look as you were engaged on lateral join discussions before? Thanks :)

[PR] support simple lateral joins [datafusion]

2025-02-10 Thread via GitHub
skyzh opened a new pull request, #14595: URL: https://github.com/apache/datafusion/pull/14595 ## Which issue does this PR close? - part of https://github.com/apache/datafusion/issues/10048 ## Rationale for this change Add partial lateral join support.

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-10 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2649578244 @parthchandra @andygrove @kazuyukitanimura @mbutrovich

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-10 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1950091410 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -596,75 +594,93 @@ fn get_valid_types( vec![vec![target_type; *num]] }

[I] DuplicateQualifiedField With Paritioned Data [datafusion-python]

2025-02-10 Thread via GitHub
cfis opened a new issue, #1018: URL: https://github.com/apache/datafusion-python/issues/1018 This might be more of an arrow issue, but I am running into this error: `Exception: DataFusion error: SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "data" }, name: "year" }, S

Re: [I] Attach `Diagnostic` to "incompatible type in unary expression" error [datafusion]

2025-02-10 Thread via GitHub
alan910127 commented on issue #14433: URL: https://github.com/apache/datafusion/issues/14433#issuecomment-2649567282 @eliaperantoni thank you so much for the detailed explanation! Since `Expr::Negative` and `Expr::Not` just wraps another `Expr`, does it mean that I need to make span availab

Re: [PR] fix: correct the logic of transform shuffle exchange [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on code in PR #1384: URL: https://github.com/apache/datafusion-comet/pull/1384#discussion_r1950074930 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -862,7 +862,7 @@ class CometSparkSessionExtensions newOp

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649541224 @hayman42 what a great find. I have not observed this myself even at SF1, probably because by default we were falling back to SMJ. Would you be able to compare the

Re: [PR] chore: generate change log for 44.0.0 [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1173: URL: https://github.com/apache/datafusion-ballista/pull/1173

Re: [PR] fix: rest api `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1175: URL: https://github.com/apache/datafusion-ballista/pull/1175

Re: [I] `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove closed issue #1174: `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` URL: https://github.com/apache/datafusion-ballista/issues/1174

Re: [PR] doc: update ballista client front page [datafusion-ballista]

2025-02-10 Thread via GitHub
andygrove merged PR #1171: URL: https://github.com/apache/datafusion-ballista/pull/1171

Re: [PR] Add support for MS Varbinary(MAX) (#1714) [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
TylerBrinks commented on PR #1715: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1715#issuecomment-2649495366 I think I got it right in the updated most recent push. Awaiting approval if it passes the criteria.

Re: [I] [EPIC] A(nother) list of performance improvement tickets [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14482: URL: https://github.com/apache/datafusion/issues/14482#issuecomment-2649427623 I know the dataframe api isn't used by many but it needs some love too: https://github.com/apache/datafusion/issues/14563

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2649418794 Is this an issue with DataFusion's website or something else? I'm leaning towards you filing this ticket on the wrong project here.

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14592: URL: https://github.com/apache/datafusion/pull/14592#issuecomment-2649406057 I recommend we disable that test until someone is able to spend more time looking into why it is using so much disk space

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14592: URL: https://github.com/apache/datafusion/pull/14592#issuecomment-2649389285 It didn't solve the issue :(

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1949980327 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Du

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on PR #14538: URL: https://github.com/apache/datafusion/pull/14538#issuecomment-2649370553 I ran the sqlite sqllogictests against your branch and it passed so none of those files covered union (all) by name

[PR] chore(deps): bump serialize-javascript and copy-webpack-plugin in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2025-02-10 Thread via GitHub
dependabot[bot] opened a new pull request, #14594: URL: https://github.com/apache/datafusion/pull/14594 Bumps [serialize-javascript](https://github.com/yahoo/serialize-javascript) to 6.0.2 and updates ancestor dependency [copy-webpack-plugin](https://github.com/webpack-contrib/copy-webpack-

Re: [PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
alamb merged PR #14592: URL: https://github.com/apache/datafusion/pull/14592

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-10 Thread via GitHub
alamb closed issue #14576: Extended tests are (still) failing on main URL: https://github.com/apache/datafusion/issues/14576

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2649342229 Merged, thanks @andygrove

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura merged PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-10 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1949114839 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Has

[PR] Speedup `date_trunc` (~20% time reduction) [datafusion]

2025-02-10 Thread via GitHub
simonvandel opened a new pull request, #14593: URL: https://github.com/apache/datafusion/pull/14593 ## Which issue does this PR close? N/A ## Rationale for this change I haven't looked at the generated code, but I presume that using `try_unary` can lead to better vectori

[PR] Adding cargo clean at the end of every step [datafusion]

2025-02-10 Thread via GitHub
Omega359 opened a new pull request, #14592: URL: https://github.com/apache/datafusion/pull/14592 ## Which issue does this PR close? - Closes #14576 ## Rationale for this change Get CI working again. ## What changes are included in this PR? added carg

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14576: URL: https://github.com/apache/datafusion/issues/14576#issuecomment-2649274047 take

Re: [I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14591: URL: https://github.com/apache/datafusion/issues/14591#issuecomment-2649265913 Dup of https://github.com/apache/datafusion/issues/14576

Re: [I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 closed issue #14591: Extended tests are currently failing with 'No space left on device' URL: https://github.com/apache/datafusion/issues/14591

[I] Extended tests are currently failing with 'No space left on device' [datafusion]

2025-02-10 Thread via GitHub
Omega359 opened a new issue, #14591: URL: https://github.com/apache/datafusion/issues/14591 ### Describe the bug https://github.com/apache/datafusion/actions/workflows/extended.yml We likely need to add cargo clean between steps. ### To Reproduce _No response_

Re: [I] [DISCUSSION] 2025 Q1-Q2 Roadmap [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14580: URL: https://github.com/apache/datafusion/issues/14580#issuecomment-2649254014 My personal list - Find/fix cause of [with_column extremely poor performance](https://github.com/apache/datafusion/issues/14563) - Add [SessionConfig to ScalarFunc

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949864649 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discoun

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2649198349 https://github.com/apache/datafusion/pull/14273#discussion_r1949864649

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi closed pull request #14273: Fix Float and Decimal coercion URL: https://github.com/apache/datafusion/pull/14273

Re: [I] coercion of input types in `coalesce` leads to type unsupported arrow cast [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14581: URL: https://github.com/apache/datafusion/issues/14581#issuecomment-2649156397 I believe @jayzhan211 is working on coercion - https://github.com/apache/datafusion/pull/14440

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
andygrove commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949798289 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_disco

[PR] Introducing mutation testing [datafusion]

2025-02-10 Thread via GitHub
edmondop opened a new pull request, #14590: URL: https://github.com/apache/datafusion/pull/14590 Adds an ad-hoc pipeline for running cargo mutants as discussed in #14589

Re: [I] [DISCUSS] More extensive pre-release testing [datafusion]

2025-02-10 Thread via GitHub
edmondop commented on issue #13661: URL: https://github.com/apache/datafusion/issues/13661#issuecomment-2649098650 I have logged [this](https://github.com/apache/datafusion/issues/14589) @alamb there are some more details in that issue

[I] Ad-hoc or scheduled mutation based testing [datafusion]

2025-02-10 Thread via GitHub
edmondop opened a new issue, #14589: URL: https://github.com/apache/datafusion/issues/14589 As described in #13661 we want to improve our pre-release testing. I propose we adopt [mutation based testing](https://en.wikipedia.org/wiki/Mutation_testing) using [`cargo mutants`](https://mutants

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2649082381 > Marking as a draft as I don't think this is waiting for review anymore (nor have we figured out a consensus either) I think we're having a lively discussion. It's definitely

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
findepi commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949783816 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discoun

[I] Fails to parse FORCE INDEX(idx_test) [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
jasonbhansen opened a new issue, #1722: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1722 `fn main() -> ProfilerResult<()> { // tracing_subscriber::fmt() // .with_max_level(tracing::Level::DEBUG) // .init(); use sqlparser::dialect::GenericD

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2649003746 If someone would be so kind as to [generate a flamegraph](https://datafusion.apache.org/library-user-guide/profiling.html#example-flamegraph-for-a-benchmark) for the benchmark

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2648964068 Went ahead and kept the default pool choice @andygrove

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1949709211 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#issuecomment-2648941470 Thanks @andygrove for trying. You may have to increase the memory because the fair memory pool limits memory usage earlier in order to leave the rest for other thre

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#discussion_r1949694482 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -292,4 +292,89 @@ class CometArrayExpressionSuite extends CometTestBas

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-02-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#discussion_r1949690727 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2387,6 +2387,8 @@ object QueryPlanSerde extends Logging with ShimQueryPla

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-02-10 Thread via GitHub
andygrove commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1949552271 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_disco

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-10 Thread via GitHub
parthchandra commented on PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#issuecomment-2648702671 @andygrove updates this to fallback, updated the unit tests and removed the draft tag

Re: [I] Question: `to_char(date, timstamp format)` [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14536: URL: https://github.com/apache/datafusion/issues/14536#issuecomment-2648671927 > I had the impression (although perhaps it is dated) that DataFusion sought to be compatible with Postgres to the extent reasonable. Assuming that's still the case, is there a r

Re: [I] Custom CLI Mode with Manual ```import``` for Functions [datafusion]

2025-02-10 Thread via GitHub
Spaarsh commented on issue #14588: URL: https://github.com/apache/datafusion/issues/14588#issuecomment-2648658163 I am willing to work on this issue once it is validated by the community.

[I] Custom CLI Mode with Manual ```import``` for Functions [datafusion]

2025-02-10 Thread via GitHub
Spaarsh opened a new issue, #14588: URL: https://github.com/apache/datafusion/issues/14588 ### Is your feature request related to a problem or challenge? As we increase the number of functions in our core, it might lead to an increased runtime footprint for datafusion-cli in the futur

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-02-10 Thread via GitHub
Rachelint commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2648640934 On the optimizer side, I doubt whether `single_distinct_to_groupby` really improves performance in the current version.

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14576: URL: https://github.com/apache/datafusion/issues/14576#issuecomment-2648636639 I noticed that the GitHub runner generated several warnings about disk space: https://github.com/user-attachments/assets/918fb3a4-1149-41b7-9027-e62721ac8800

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-02-10 Thread via GitHub
alamb commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2648628261 I merged this test up from main

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-02-10 Thread via GitHub
Rachelint commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2648625729 > > I am trying a PoC that supports the block approach by modifying only the group values code (we also need to modify the GroupAccumulator code too in [#11943](https://github.co

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2648610440 > I am trying a PoC that supports the block approach by modifying only the group values code (we also need to modify the GroupAccumulator code too in https://github.com/apache/datafu

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-02-10 Thread via GitHub
Rachelint commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2648602542 One low-hanging fruit is #13617; I plan to finish it this week. And maybe it is time to push #11943 forward... I am trying a PoC that supports the `block approach` by `

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2648602300 Discussion about making more improvements: - https://the-asf.slack.com/archives/C04RJ0C85UZ/p1739204225620989

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-02-10 Thread via GitHub
alamb commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2648588773 I took a brief look at [some results](https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQiI6ZmFsc2UsIkFsbG95REIgKHR1bmVkKSI6ZmFsc2UsIkF0aGVuYSAocGFydGl0aW9uZWQpIjpmYWxzZSwi

Re: [PR] Add support for PostgreSQL and Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-10 Thread via GitHub
benrsatori closed pull request #1721: Add support for PostgreSQL and Redshift geometric operators URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1721
