[PR] Add regexp_extract func [datafusion]

2025-01-24 Thread via GitHub
SKY-ALIN opened a new pull request, #14282: URL: https://github.com/apache/datafusion/pull/14282 ## Which issue does this PR close? Closes #14280. ## Rationale for this change Adding more functions ## What changes are included in this PR?

[PR] Minor: Update documentations for memory pool [datafusion]

2025-01-24 Thread via GitHub
appletreeisyellow opened a new pull request, #14278: URL: https://github.com/apache/datafusion/pull/14278 ## Which issue does this PR close? Closes #. ## Rationale for this change While using `MemoryPool` downstream, I noticed some typos and outdated documentatio

[I] Regression in CASE expression since DF 43 [datafusion]

2025-01-24 Thread via GitHub
andygrove opened a new issue, #14277: URL: https://github.com/apache/datafusion/issues/14277 ### Describe the bug I am trying to upgrade Comet to the latest DataFusion and see some queries fail. The regression seems to have been introduced in https://github.com/apache/datafusi

Re: [I] Document Schema metadata expectations [datafusion]

2025-01-24 Thread via GitHub
westonpace commented on issue #12736: URL: https://github.com/apache/datafusion/issues/12736#issuecomment-2613005807 I'm not sure if this is related or not but I encountered an error during optimization: ``` Error: join_selection caused by Internal error: PhysicalOptimizer rule 'j

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
mwylde commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2613150709 This is great! Underscores are also supported in Postgres 16 (https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-DATATYPES), so maybe it could be added to the p

Re: [PR] chore: Refactor QueryPlanSerde to allow logic to be moved to individual classes per expression [datafusion-comet]

2025-01-24 Thread via GitHub
andygrove commented on code in PR #1331: URL: https://github.com/apache/datafusion-comet/pull/1331#discussion_r1929180456 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -826,723 +826,790 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerd

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2613378310 > [@alamb](https://github.com/alamb) Excited to see further optimization about `late materialization`, it is really an important feature as I think! I tried to use it in `HoraeDB`

Re: [I] ParquetScan with filter takes too much time to process [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #13298: URL: https://github.com/apache/datafusion/issues/13298#issuecomment-2613379095 - See https://github.com/apache/arrow-rs/pull/6921 for some motion on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613392271 > Also a note that using a Field would require a serialization/deserialization every time the extension type is used (whereas some core "datatype" based on the `ExtensionType` is

Re: [PR] fix: run sqllogictest with complete [datafusion]

2025-01-24 Thread via GitHub
Omega359 commented on code in PR #14254: URL: https://github.com/apache/datafusion/pull/14254#discussion_r1929242195 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -64,6 +66,171 @@ fn value_normalizer(s: &String) -> String { s.trim_end().to_string() } +struct Cus

Re: [I] Document Schema metadata expectations [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #12736: URL: https://github.com/apache/datafusion/issues/12736#issuecomment-2613360188 That definitely sounds like a bug

Re: [PR] chore: Refactor QueryPlanSerde to allow logic to be moved to individual classes per expression [datafusion-comet]

2025-01-24 Thread via GitHub
andygrove merged PR #1331: URL: https://github.com/apache/datafusion-comet/pull/1331

Re: [PR] fix: run sqllogictest with complete [datafusion]

2025-01-24 Thread via GitHub
logan-keede commented on code in PR #14254: URL: https://github.com/apache/datafusion/pull/14254#discussion_r1929233787 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -64,6 +66,171 @@ fn value_normalizer(s: &String) -> String { s.trim_end().to_string() } +struct

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613389124 > I am fine with DF not shipping extension types (ie no extension types until we add them explicitly in [#12644](https://github.com/apache/datafusion/issues/12644)). Let's look at

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
findepi commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613388724 If we check & interpret the `extension.name`, then we're at home: https://github.com/apache/datafusion/blob/6686e034dd1008fc7303c482cf664402eca25d67/datafusion/common/src/types/l

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
findepi commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613408823 Yes, once plan is lowered into "container" arrow types (like assembly), we no longer need to remember what were the logical/extension types. Before the lowering happens, the fun

Re: [PR] Document SQL dialect guidance [datafusion]

2025-01-24 Thread via GitHub
findepi commented on PR #13706: URL: https://github.com/apache/datafusion/pull/13706#issuecomment-2613412047 What is remaining to get this in?

Re: [PR] chore: Prepare for DataFusion 45 [datafusion-comet]

2025-01-24 Thread via GitHub
codecov-commenter commented on PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#issuecomment-2613393542 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1332?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-24 Thread via GitHub
viirya commented on code in PR #1335: URL: https://github.com/apache/datafusion-comet/pull/1335#discussion_r1929220044 ## .github/workflows/pr_build.yml: ## @@ -137,7 +137,7 @@ jobs: spark-version: ['3.4', '3.5'] scala-version: ['2.12', '2.13'] fail-fast

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-01-24 Thread via GitHub
findepi commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2613421578 @alamb thanks for your feedback. I agree it's important to avoid back-and-forth with a change, so the broader review the better. > What the current coercion behavior is

Re: [PR] chore(deps): bump serde_json from 1.0.135 to 1.0.137 in /datafusion-cli [datafusion]

2025-01-24 Thread via GitHub
comphead merged PR #14261: URL: https://github.com/apache/datafusion/pull/14261

Re: [I] [DISCUSSION]: Unified approach for joins to output batches close to `batch_size` [datafusion]

2025-01-24 Thread via GitHub
comphead commented on issue #14238: URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2613003876 thanks @korowa totally agree for the memory perspective, having splitter won't help as the memory already allocated for the batch. However another path related to coales

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
ozankabak commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613004105 > 1. If it is relatively easy, we can do a quick follow-on PR to parse/type literals more intelligently. Maybe always parse as the smallest/narrowest datatype that will hold the co
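
The `Decimal(20, 0)` target named in this PR's title can be motivated with a little arithmetic: no 64-bit integer type holds both every `UInt64` value and every negative `Int64` value, and the largest `UInt64` value has 20 decimal digits. The sketch below only illustrates that arithmetic; the helper names are hypothetical and are not DataFusion APIs or code from the PR.

```rust
/// Number of decimal digits needed to print `v`.
/// Hypothetical helper for illustration, not part of DataFusion.
fn decimal_digits(v: u128) -> u32 {
    v.to_string().len() as u32
}

/// Smallest precision `p` such that a scale-0 decimal with `p` digits
/// can hold every `u64` value and every `i64` value.
fn min_precision_for_u64_and_i64() -> u32 {
    // Largest unsigned magnitude: u64::MAX.
    let unsigned = decimal_digits(u64::MAX as u128);
    // Largest signed magnitude: |i64::MIN|.
    let signed = decimal_digits(i64::MIN.unsigned_abs() as u128);
    unsigned.max(signed)
}

fn main() {
    // Neither 64-bit type can represent the other's full range.
    assert!(i64::try_from(u64::MAX).is_err());
    assert!(u64::try_from(-1i64).is_err());

    println!("minimum precision: {}", min_precision_for_u64_and_i64());
}
```

Whether a decimal is the right coercion target, as opposed to a wider integer or a float, is the question the thread itself debates; the sketch takes no position on that.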

[PR] make AnalysisContext aware of empty sets to represent certainly false bounds [datafusion]

2025-01-24 Thread via GitHub
buraksenn opened a new pull request, #14279: URL: https://github.com/apache/datafusion/pull/14279 ## Which issue does this PR close? Closes #14226 ## Rationale for this change Details from #14226: The [`AnalysisContext`](https://github.com/apache/datafusion/blob/2aff9

[I] regexp_extract func from Spark [datafusion]

2025-01-24 Thread via GitHub
SKY-ALIN opened a new issue, #14280: URL: https://github.com/apache/datafusion/issues/14280 ### Is your feature request related to a problem or challenge? Yes ### Describe the solution you'd like I'd like to implement [regexp_extract](https://spark.apache.org/docs/latest

[I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-01-24 Thread via GitHub
senyosimpson opened a new issue, #14281: URL: https://github.com/apache/datafusion/issues/14281 ### Describe the bug When making a query _with a predicate_ against Parquet files generated with [parquet-go](https://github.com/parquet-go/parquet-go) , DataFusion errors saying the data

Re: [PR] Minor: Update documentations for memory pool [datafusion]

2025-01-24 Thread via GitHub
appletreeisyellow commented on code in PR #14278: URL: https://github.com/apache/datafusion/pull/14278#discussion_r1929003494 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -140,9 +143,9 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug { /// [`MemoryReservation`] i

[PR] add tests to check precision loss fix [datafusion]

2025-01-24 Thread via GitHub
himadripal opened a new pull request, #14284: URL: https://github.com/apache/datafusion/pull/14284 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-24 Thread via GitHub
korowa merged PR #14236: URL: https://github.com/apache/datafusion/pull/14236

Re: [I] Array repeat, upcast issue [datafusion]

2025-01-24 Thread via GitHub
korowa closed issue #14228: Array repeat, upcast issue URL: https://github.com/apache/datafusion/issues/14228

Re: [I] Regression in CASE expression since DF 44 [datafusion]

2025-01-24 Thread via GitHub
andygrove commented on issue #14277: URL: https://github.com/apache/datafusion/issues/14277#issuecomment-2613176947 @aweltsch I have created a [PR](https://github.com/apache/datafusion/pull/14283) with a repro for this issue and wondered if you had thoughts on the best way to fix the regres

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-24 Thread via GitHub
korowa commented on PR #14236: URL: https://github.com/apache/datafusion/pull/14236#issuecomment-2613186549 Thank you @jatin510 @alamb

Re: [I] [DISCUSSION]: Unified approach for joins to output batches close to `batch_size` [datafusion]

2025-01-24 Thread via GitHub
korowa commented on issue #14238: URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2613214394 > However another path related to coalesce might help downstream nodes or direct consumer not to struggle because of swarm of small batches I don't have a strong opinion he

Re: [PR] Support arrays_overlap function [datafusion]

2025-01-24 Thread via GitHub
erenavsarogullari commented on PR #14217: URL: https://github.com/apache/datafusion/pull/14217#issuecomment-2613347999 cc @jayzhan211 @alamb

[PR] chore: Upgrade to Arrow 53.4.0 [datafusion-comet]

2025-01-24 Thread via GitHub
andygrove opened a new pull request, #1338: URL: https://github.com/apache/datafusion-comet/pull/1338 ## Which issue does this PR close? N/A ## Rationale for this change Make it explicit that we want to use `53.4.0` ## What changes are included in t

Re: [PR] feat: executor supports pluggable arrow flight server [datafusion-ballista]

2025-01-24 Thread via GitHub
andygrove merged PR #1170: URL: https://github.com/apache/datafusion-ballista/pull/1170

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
alamb commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2612053357 Here is a PR that tries to demonstrate more of what is happening: - https://github.com/apache/datafusion/pull/14270

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2612213307 > Using Field directly doesn't yet answer the question "how to interpret given field". I agree -- but at least it makes it possible to get the info > So we gonna hav

Re: [PR] Fix DF 43 Regression: Coerce Various Scalar Func Args to String [datafusion]

2025-01-24 Thread via GitHub
jayzhan211 commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1928519797 ## datafusion/functions/src/string/ascii.rs: ## @@ -61,7 +64,15 @@ impl Default for AsciiFunc { impl AsciiFunc { pub fn new() -> Self { Self { -

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
berkaysynnada commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2612423408 I think this change brings more harm than it helps. I wrote a simple benchmark: * Table having 100_000 rows, and one column in UInt64 * The values are randomly generat

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-24 Thread via GitHub
andygrove commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2613057921 @alamb I took the liberty of adding https://github.com/apache/datafusion/issues/14277 to the "must fix" list

[PR] wip: Repro for regression in CASE expression [datafusion]

2025-01-24 Thread via GitHub
andygrove opened a new pull request, #14283: URL: https://github.com/apache/datafusion/pull/14283 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/14277 ## Rationale for this change Add unit test that demonstrates the

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-24 Thread via GitHub
korowa commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929071971 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: &LexOrdering) -> Vec Result<()>

[PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-24 Thread via GitHub
viirya opened a new pull request, #1335: URL: https://github.com/apache/datafusion-comet/pull/1335 ## Which issue does this PR close? Closes #. ## Rationale for this change The feature `upload-test-reports` for macos-13 runner was disabled (https://github

Re: [PR] fix: do not compile `keda.proto` if feature not used. [datafusion-ballista]

2025-01-24 Thread via GitHub
andygrove merged PR #1168: URL: https://github.com/apache/datafusion-ballista/pull/1168

Re: [PR] chore: publicly expose datafusion in ballista client [datafusion-ballista]

2025-01-24 Thread via GitHub
andygrove merged PR #1169: URL: https://github.com/apache/datafusion-ballista/pull/1169

Re: [PR] chore: update to DF.44 [datafusion-ballista]

2025-01-24 Thread via GitHub
andygrove merged PR #1153: URL: https://github.com/apache/datafusion-ballista/pull/1153

Re: [I] Regression in CASE expression since DF 44 [datafusion]

2025-01-24 Thread via GitHub
aweltsch commented on issue #14277: URL: https://github.com/apache/datafusion/issues/14277#issuecomment-2613331984 Hi @andygrove thank you for notifying me of this regression. I took a cursory look today and it seems like you are right that casting the then expression might also solve this

Re: [I] Regression in CASE expression since DF 44 [datafusion]

2025-01-24 Thread via GitHub
andygrove commented on issue #14277: URL: https://github.com/apache/datafusion/issues/14277#issuecomment-261905 Thanks @aweltsch, I appreciate it. I do have a PR up with a fix but it may not be the best fix.

[PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-24 Thread via GitHub
alamb opened a new pull request, #14286: URL: https://github.com/apache/datafusion/pull/14286 Note: This PR contains a (substantial) example and supporting code. It has no changes to the core. ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issu

[I] [Experimental] Integrate Comet native reader with remote HDFS [datafusion-comet]

2025-01-24 Thread via GitHub
comphead opened a new issue, #1336: URL: https://github.com/apache/datafusion-comet/issues/1336 ### What is the problem the feature request solves? Currently Apache DataFusion Comet reads the data from underlying sources using builtin Comet reader which lacks support for nested types

[I] [DISCUSSION] Add `DedicatedExecutor` into the DataFusion crates to make using multiple threadpools easier [datafusion]

2025-01-24 Thread via GitHub
alamb opened a new issue, #14285: URL: https://github.com/apache/datafusion/issues/14285 ### Is your feature request related to a problem or challenge? - Related to https://github.com/apache/datafusion/issues/12393 ### Describe the solution you'd like As explained on the

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2613344382 Thanks for the comments! Seems Postgres ([patch](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=faff8f8e4)) is using the same syntax rule as Clickhouse

Re: [PR] Document SQL dialect guidance [datafusion]

2025-01-24 Thread via GitHub
alamb commented on PR #13706: URL: https://github.com/apache/datafusion/pull/13706#issuecomment-2613429200 > What is remaining to get this in? From my perspective we need to get some sort of consensus on what the system currently does / aims to do. I didn't get the sense the commente

Re: [PR] add try_swapping_with_projection method to ExecutionPlan trait [datafusion]

2025-01-24 Thread via GitHub
alamb commented on PR #14235: URL: https://github.com/apache/datafusion/pull/14235#issuecomment-2613431349 > @alamb are we okay with this API extension? I think it is inevitable at some point. I have a WIP PR (which has been in progress for a very long time, but I hope to get back to it soo

Re: [I] Build time regression [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2613493318 > After `physical-optimizer`, `datasource` could be a potential target to move out of core. Yes, 100% splitting out datasource is my next thing I would love to see (and I t

Re: [PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-24 Thread via GitHub
codecov-commenter commented on PR #1335: URL: https://github.com/apache/datafusion-comet/pull/1335#issuecomment-2613446876 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1335?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] add try_swapping_with_projection method to ExecutionPlan trait [datafusion]

2025-01-24 Thread via GitHub
alamb commented on code in PR #14235: URL: https://github.com/apache/datafusion/pull/14235#discussion_r1929280874 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -431,6 +434,20 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { fn cardinality_effect(&sel

Re: [PR] add tests to check precision loss fix [datafusion]

2025-01-24 Thread via GitHub
alamb commented on PR #14284: URL: https://github.com/apache/datafusion/pull/14284#issuecomment-2613454520 Thanks @himadripal I started the CI checks on this PR

[I] Alternative approaches to "fan-out" style RepartitionExec [datafusion]

2025-01-24 Thread via GitHub
westonpace opened a new issue, #14287: URL: https://github.com/apache/datafusion/issues/14287 ### Is your feature request related to a problem or challenge? `RepartitionExec` is often used to fan out batches from a single partition into multiple partitions. For example, if we are sca

Re: [PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-24 Thread via GitHub
viirya commented on code in PR #1335: URL: https://github.com/apache/datafusion-comet/pull/1335#discussion_r1929302848 ## .github/workflows/pr_build.yml: ## @@ -197,7 +197,7 @@ jobs: test-target: [java] spark-version: ['4.0'] fail-fast: false -if: gi

Re: [PR] fix: run sqllogictest with complete [datafusion]

2025-01-24 Thread via GitHub
logan-keede commented on code in PR #14254: URL: https://github.com/apache/datafusion/pull/14254#discussion_r1929312022 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -64,6 +66,171 @@ fn value_normalizer(s: &String) -> String { s.trim_end().to_string() } +struct

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-24 Thread via GitHub
alamb commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1929294151 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4247,8 +4247,10 @@ logical_plan physical_plan 01)CoalesceBatchesExec: target_batch_size=3, fetch=2 02)--Ha

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2612001363 If we wanted to juice our numbers we could turn off utf8 validation too but I feel like that would be cheating (as most/many systems would never run without validation on)

Re: [I] DDL Statement Propagation (`INSERT INTO` support) [datafusion-ballista]

2025-01-24 Thread via GitHub
alamb commented on issue #1164: URL: https://github.com/apache/datafusion-ballista/issues/1164#issuecomment-2612012112 > the problem is that SessionContext will execute DDL statements immediately and LogicalPlan::DDL will be swapped with LogicalPlan::Empty, thus no DDL information will rea

[PR] chore(deps): update sqlparser requirement from 0.53.0 to 0.54.0 [datafusion]

2025-01-24 Thread via GitHub
dependabot[bot] opened a new pull request, #14269: URL: https://github.com/apache/datafusion/pull/14269 Updates the requirements on [sqlparser](https://github.com/apache/datafusion-sqlparser-rs) to permit the latest version. Changelog Sourced from https://github.com/apache/datafus

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2611999149 > I think Q8, Q16~18, Q35 can be closer to `hyper` in 44.0, they are improved in [#12996](https://github.com/apache/datafusion/pull/12996) And Q35 can be even much faster when [#

Re: [I] [DISCUSSION]: Inconsistent Behavior Between prefer_existing_sort and AggregateExec's required_input_ordering [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #14231: URL: https://github.com/apache/datafusion/issues/14231#issuecomment-2612024839 My preference is to try and keep the core of datafusion focused on executing the plans as provided as much as possible, and performing "always good optimizations" For optim

Re: [PR] chore(deps): bump rstest from 0.22.0 to 0.24.0 in /datafusion-cli [datafusion]

2025-01-24 Thread via GitHub
findepi merged PR #14262: URL: https://github.com/apache/datafusion/pull/14262

Re: [PR] Only support escape literals for Postgres, Redshift and generic dialect [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
iffyio merged PR #1674: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1674

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
alamb commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2612038896 @ozankabak note the coercion rules @jonahgao refers to above in https://github.com/apache/datafusion/pull/14223#issuecomment-2611707619 are for `UNION`, not for example, integer a

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
Dandandan commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2612039339 Q23 might be improved if it can utilize filter pushdown? I think a >5x improvement might come from that.

[PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-24 Thread via GitHub
alamb opened a new pull request, #14270: URL: https://github.com/apache/datafusion/pull/14270 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/14223 - Follow on to https://github.com/apache/datafusion/pull/14250 ## Rationale

[PR] BigQuery: Support trailing commas in column definitions list [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
iffyio opened a new pull request, #1682: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1682 Adds support for BigQuery's `CREATE` statement containing trailing commas in the columns definitions list ```sql CREATE TABLE T (x INT64, y INT64,); ``` https://cloud.go

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
iffyio commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1927412763 ## tests/sqlparser_bigquery.rs: ## @@ -48,34 +48,34 @@ fn parse_literal_string() { let select = dialect.verified_only_select(sql); assert_eq!(

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
paleolimbot commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2612829804 Also a note that using a Field would require a serialization/deserialization every time the extension type is used (whereas some core "datatype" based on the `ExtensionType`

Re: [PR] Deprecate max statistics size properly [datafusion]

2025-01-24 Thread via GitHub
logan-keede commented on PR #14188: URL: https://github.com/apache/datafusion/pull/14188#issuecomment-2612848166 I thought this was done, apparently not. Please review the changes and let me know if we need to make more changes before merging this PR. @alamb Thanks for your hard wo

Re: [PR] Add support for mysql table hints [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
iffyio commented on code in PR #1675: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1675#discussion_r1928898319 ## src/parser/mod.rs: ## @@ -11225,6 +11283,11 @@ impl<'a> Parser<'a> { let alias = self.maybe_parse_table_alias()?; +// ma

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1928954728 ## src/ast/value.rs: ## @@ -97,6 +97,32 @@ pub enum Value { Placeholder(String), } +impl Into for Value { +fn into(self) -> String { +

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1928955736 ## tests/sqlparser_bigquery.rs: ## @@ -48,34 +48,34 @@ fn parse_literal_string() { let select = dialect.verified_only_select(sql); assert_eq!(1

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1928957606 ## tests/sqlparser_bigquery.rs: ## @@ -39,7 +39,7 @@ fn parse_literal_string() { r#"'''triple-single'unescaped''', "#, r#""double\"esca

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2612951228 Yeah, seems like this number behavior is still failing under the bignumbers feature. I can reproduce it locally with `cargo test --all-features`. I don't know enough to fix

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1928959147 ## tests/sqlparser_bigquery.rs: ## @@ -2214,6 +2214,30 @@ fn test_select_as_value() { assert_eq!(Some(ValueTableMode::AsValue), select.value_table_m

Re: [PR] fix: run sqllogictest with complete [datafusion]

2025-01-24 Thread via GitHub
Omega359 commented on code in PR #14254: URL: https://github.com/apache/datafusion/pull/14254#discussion_r1928766203 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -64,6 +66,171 @@ fn value_normalizer(s: &String) -> String { s.trim_end().to_string() } +struct Cus

Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2025-01-24 Thread via GitHub
findepi closed pull request #13517: Reject CREATE TABLE/VIEW with duplicate column names URL: https://github.com/apache/datafusion/pull/13517 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [I] RFC: Should we remove pyarrow feature from datafusion core [datafusion]

2025-01-24 Thread via GitHub
robtandy commented on issue #14197: URL: https://github.com/apache/datafusion/issues/14197#issuecomment-2612681193 +1 remove

Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2025-01-24 Thread via GitHub
findepi commented on PR #13517: URL: https://github.com/apache/datafusion/pull/13517#issuecomment-2612664105 Given no-one else was involved so far, your personal preference is a verdict.

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-24 Thread via GitHub
andygrove commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2612692120 I created an issue to track our progress with upgrading Comet to use DataFusion 45 and linked to it from the PR description: https://github.com/apache/datafusion/issues/14274

Re: [PR] fix: run sqllogictest with complete [datafusion]

2025-01-24 Thread via GitHub
Omega359 commented on PR #14254: URL: https://github.com/apache/datafusion/pull/14254#issuecomment-2612695218 I'll do my best to find time today to review this

[I] Avoid deriving Fields for each invocation of `struct` and `named_struct` [datafusion]

2025-01-24 Thread via GitHub
pepijnve opened a new issue, #14275: URL: https://github.com/apache/datafusion/issues/14275 ### Is your feature request related to a problem or challenge? `struct` and `named_struct` do not yet implement `invoke_with_args`; only `invoke_batch` is implemented. Since `invoke_batch` does

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-24 Thread via GitHub
findepi commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2612715603 I am fine with DF not shipping extension types (ie no extension types until we add them explicitly in https://github.com/apache/datafusion/issues/12644). Let's look at the exa

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-24 Thread via GitHub
iffyio commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2612739921 @graup there seems to be a failing test on the PR

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-24 Thread via GitHub
ozankabak commented on PR #14270: URL: https://github.com/apache/datafusion/pull/14270#issuecomment-2612284386 I think adding tests for comparison operations will probably expose the possible issue with the linked PR. (Or give us peace of mind)

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
Rachelint commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2612300838 @alamb Excited to see further optimization around `late materialization`; I think it is a really important feature! I tried to use it in `HoraeDB` last year, and found the

Re: [PR] Fix DF 43 Regression: Coerce Various Scalar Func Args to String [datafusion]

2025-01-24 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1928539454 ## datafusion/functions/src/string/ascii.rs: ## @@ -61,7 +64,15 @@ impl Default for AsciiFunc { impl AsciiFunc { pub fn new() -> Self { Self { -

Re: [I] Improve efficiency of CI checks (so we can add MORE!) [datafusion]

2025-01-24 Thread via GitHub
logan-keede commented on issue #13845: URL: https://github.com/apache/datafusion/issues/13845#issuecomment-2612302035 > Thank you for the good ideas [@logan-keede](https://github.com/logan-keede) > > Given we have > > 1. Very limited bandwidth for maintenance (and even less for

Re: [PR] Fix DF 43 Regression: Coerce Various Scalar Func Args to String [datafusion]

2025-01-24 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1928535027 ## datafusion/functions/src/string/ascii.rs: ## @@ -61,7 +64,15 @@ impl Default for AsciiFunc { impl AsciiFunc { pub fn new() -> Self { Self { -

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
findepi commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2612059136 > 1. If it is relatively easy, we can do a quick follow-on PR to parse/type literals more intelligently. Maybe always parse as the smallest/narrowest datatype that will hold the cons

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-24 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2612064902 > Q23 might be improved if it can utilize filter pushdown? I think a >5x improvement might come from that. Running without filter pushdown (the default) ```sql

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-24 Thread via GitHub
jonahgao commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2612102948 Some information may be useful: 1. This coercion rule applies to union and comparisons (for example, a > b, a <= b), but does not apply to arithmetic operations (for example,
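
A note on the `Decimal(20, 0)` target named in the PR title above: the following is a minimal Rust sketch, not DataFusion code, of why 20 digits of precision suffice when a `UInt64` value meets a signed 64-bit integer. The helper names `decimal_digits` and `fits_in_decimal` are hypothetical and exist only for this illustration; the sketch assumes a decimal with scale 0 can represent any integer whose magnitude has at most `precision` digits.

```rust
/// Number of decimal digits needed to write the magnitude `v`.
/// (Hypothetical helper for illustration; not part of DataFusion.)
fn decimal_digits(mut v: u128) -> u32 {
    let mut digits = 1;
    while v >= 10 {
        v /= 10;
        digits += 1;
    }
    digits
}

/// Returns true if every integer in [min, max] can be written with at most
/// `precision` decimal digits, i.e. fits in a Decimal(precision, 0).
/// (Hypothetical helper for illustration; not part of DataFusion.)
fn fits_in_decimal(min: i128, max: i128, precision: u32) -> bool {
    decimal_digits(min.unsigned_abs()) <= precision
        && decimal_digits(max.unsigned_abs()) <= precision
}

fn main() {
    // The combined value range of Int64 and UInt64.
    let lowest = i64::MIN as i128;
    let highest = u64::MAX as i128;

    println!("digits in u64::MAX: {}", decimal_digits(highest.unsigned_abs()));
    println!("digits in |i64::MIN|: {}", decimal_digits(lowest.unsigned_abs()));
    println!("fits in Decimal(19, 0): {}", fits_in_decimal(lowest, highest, 19));
    println!("fits in Decimal(20, 0): {}", fits_in_decimal(lowest, highest, 20));
}
```

The sketch only checks value ranges; which operations the coercion rule actually applies to is described in the comment above, not decided by this code.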
