[PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb opened a new pull request, #90: URL: https://github.com/apache/datafusion-site/pull/90 @adamreeve had a good point on the parquet mailing list about putting the index directly into the footer. - https://lists.apache.org/thread/54yg6dxj2jygd5fom8yo8qw7l41ntwn9 I think this is

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
alamb commented on code in PR #16639: URL: https://github.com/apache/datafusion/pull/16639#discussion_r2213084931 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -394,12 +503,12 @@ SELECT COUNT(*) FROM ts_data_secs where ts > to_timestamp_seconds('2020-09-08 12 que

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
crepererum commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2213133593 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-featur

[I] Release 0.56.1 (backport/fix release) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
crepererum opened a new issue, #1952: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1952 Since DataFusion has fallen a bit behind, having a stable intermediate step which we could use before jumping to 0.57 and beyond would be nice. Sadly, we cannot use 0.56 due to #1898. So

Re: [PR] Optimize char expression [datafusion]

2025-07-17 Thread via GitHub
ajita-asthana commented on PR #16076: URL: https://github.com/apache/datafusion/pull/16076#issuecomment-3083700208 @logan-keede could you please review this when you have time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213223988 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213484985 ## datafusion-examples/examples/custom_file_casts.rs: ## @@ -0,0 +1,204 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084255495 > > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. > > > > > > Th

Re: [I] [EPIC] Implement expressions as ScalarUDFImpl [datafusion-comet]

2025-07-17 Thread via GitHub
akupchinskiy commented on issue #1819: URL: https://github.com/apache/datafusion-comet/issues/1819#issuecomment-3084259942 One limitation I faced when trying to switch from PhysicalExpr to ScalarUDFImpl is the lack of a way to extract the batch size. That is why it won't work for non-determ

[I] `SessionState::sql_to_expr` does not report unconsumed input [datafusion]

2025-07-17 Thread via GitHub
pepijnve opened a new issue, #16810: URL: https://github.com/apache/datafusion/issues/16810 ### Describe the bug When the SQL string passed to `SessionState::sql_to_expr` contains trailing tokens this is silently ignored. This can lead to rather unexpected results. It would be better

[PR] Report error when `SessionState::sql_to_expr_with_alias` does not consume all input [datafusion]

2025-07-17 Thread via GitHub
pepijnve opened a new pull request, #16811: URL: https://github.com/apache/datafusion/pull/16811 ## Which issue does this PR close? - Closes #16810. ## Rationale for this change When parsing SQL strings into expressions it's preferable to get parse errors when unprocesse

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
comphead commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3084571582 > This patch looks good to me, and it reminds me another problem with fs-hdfs. > > The `HdfsErr` returned by `fs-hdfs` read functions does not contain JVM stack traces.

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-17 Thread via GitHub
mbutrovich commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2213341348 ## native/spark-expr/src/nondetermenistic_funcs/randn.rs: ## @@ -0,0 +1,265 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-3084267767 Are we worried about memory overhead with this? One thing I think we could do is set a reasonable limit to the cache size - only write to the cache if `size` is less than 1024 * 102

Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-07-17 Thread via GitHub
jatin510 commented on PR #16793: URL: https://github.com/apache/datafusion/pull/16793#issuecomment-3084767084 made some changes @gabotechs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Eliminate Self Joins [datafusion]

2025-07-17 Thread via GitHub
jonathanc-n commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-3084775589 @berkaysynnada I'll be happy to review it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
viirya commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084789737 > There's some discussion in [#14993](https://github.com/apache/datafusion/issues/14993). Basically if we want to be able to customize how expressions are evaluated for a specifi

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213294534 ## datafusion/physical-plan/src/sorts/streaming_merge.rs: ## @@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> { enable_round_robin_tie_breake

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213305286 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3084017814 @2010YOUY01 I've updated based on your comments and commented back on some -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3084693715 > what about using explicit casting in applications? For example: > > ```shell > > select not(arrow_cast(1, 'Boolean')); > +--+

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
parthchandra commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084716341 > please take a look at [#16803](https://github.com/apache/datafusion/pull/16803). Thank you for this pointer to the example. -- This is an automated message from

Re: [PR] WIP: Update `object_store` 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16753: URL: https://github.com/apache/datafusion/pull/16753#issuecomment-3084461568 > Didn't see this PR here, I've also "fixed" the dependabot PR #16807 😅 > > If someone just wants to approve+merge the dependabot version, that's fine (I personally won't do t

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
comphead commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213640997 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -101,7 +101,7 @@ pub struct ListingTableConfig { /// Optional [`SchemaAdapterFactory`] for creating

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213648171 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -101,7 +101,7 @@ pub struct ListingTableConfig { /// Optional [`SchemaAdapterFactory`] for creating

Re: [PR] Update `upgrading.md` for new unified config for sql string mapping to utf8view [datafusion]

2025-07-17 Thread via GitHub
comphead commented on code in PR #16809: URL: https://github.com/apache/datafusion/pull/16809#discussion_r2213682081 ## docs/source/library-user-guide/upgrading.md: ## @@ -120,6 +120,56 @@ SET datafusion.execution.spill_compression = 'zstd'; For more details about this config

Re: [I] Integration tests are not being run [datafusion]

2025-07-17 Thread via GitHub
kosiew commented on issue #16801: URL: https://github.com/apache/datafusion/issues/16801#issuecomment-3084526034 Certainly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Update upgrade md for new unified config for sql string mapping to utf8view when we release datafusion 49.0.0 [datafusion]

2025-07-17 Thread via GitHub
comphead closed issue #16428: Update upgrade md for new unified config for sql string mapping to utf8view when we release datafusion 49.0.0 URL: https://github.com/apache/datafusion/issues/16428 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Update `upgrading.md` for new unified config for sql string mapping to utf8view [datafusion]

2025-07-17 Thread via GitHub
comphead merged PR #16809: URL: https://github.com/apache/datafusion/pull/16809 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084107356 > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. The results are similar

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084151418 > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. > > The results a

Re: [PR] Implement equals for stateful functions [datafusion]

2025-07-17 Thread via GitHub
findepi commented on PR #16781: URL: https://github.com/apache/datafusion/pull/16781#issuecomment-3084158547 @alamb @timsaucer @kosiew would you like to take a look at new code pushed here since the time you last reviewed? -- This is an automated message from the Apache Git Service. To r

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
findepi commented on code in PR #16639: URL: https://github.com/apache/datafusion/pull/16639#discussion_r2213410294 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -394,12 +503,12 @@ SELECT COUNT(*) FROM ts_data_secs where ts > to_timestamp_seconds('2020-09-08 12 q

Re: [PR] Eliminate Self Joins [datafusion]

2025-07-17 Thread via GitHub
berkaysynnada commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-3084658348 > @atahanyorganci Hello, would you still be interested in continuing with this? I’ll drive this to completion tomorrow. -- This is an automated message from the Apache

Re: [PR] Postgres: ALTER TABLE SET ( storage_parameters ) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
achristmascarl commented on code in PR #1947: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1947#discussion_r2213786861 ## src/ast/ddl.rs: ## @@ -351,6 +351,10 @@ pub enum AlterTableOperation { ValidateConstraint { name: Ident, }, +/// `SET

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb merged PR #16803: URL: https://github.com/apache/datafusion/pull/16803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
alamb commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2213096517 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-features =

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
JigaoLuo commented on PR #90: URL: https://github.com/apache/datafusion-site/pull/90#issuecomment-3083683290 Thank you @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[PR] Update upgrade md for new unified config for sql string mapping to utf8view [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas opened a new pull request, #16809: URL: https://github.com/apache/datafusion/pull/16809 ## Which issue does this PR close? - Closes [#16428](https://github.com/apache/datafusion/issues/16428) ## Rationale for this change Update upgrade md for new u

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212846002 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213289055 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr, No

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-3084383240 > Are we worried about memory overhead with this? One thing I think we could do is set a reasonable limit to the cache size - only write to the cache if `size` is less than 1024 * 1

[PR] SGA-11414 Added support for odbc escape sequencing for time date and … [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
etgarperets opened a new pull request, #1953: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1953 …timestamp literals. For this I modified TypedString by adding uses_odbc_syntax flag. -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084230441 > Could you clarify the latter two? From your description, they sound like areas where `PhysicalExprAdapter` could bring benefits — but I'm not quite sure how `SchemaAdapter` f

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16790: URL: https://github.com/apache/datafusion/pull/16790#issuecomment-3084406929 > I think a little more docs / context / comments are needed otherwise this is good to merge. @akoshchiy could you add some comments explaining what's going on for future ref

Re: [PR] Postgres: ALTER TABLE SET ( storage_parameters ) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio commented on code in PR #1947: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1947#discussion_r2212658558 ## src/ast/ddl.rs: ## @@ -351,6 +351,10 @@ pub enum AlterTableOperation { ValidateConstraint { name: Ident, }, +/// `SET ( storag

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-17 Thread via GitHub
shehabgamin commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3083162826 Almost missed this thread. Will start testing when there is a `49` branch! -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-17 Thread via GitHub
akupchinskiy commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2212698030 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2765,6 +2765,26 @@ class CometExpressionSuite extends CometTestBase with Adap

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-17 Thread via GitHub
Huy1Ng commented on PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3083153716 - I reused the workflows from datafusion repo. `cancel` was removed in this commit: https://github.com/apache/datafusion/commit/0820eb987ff9555b48a0e1704f5a8644ea4ab087#diff-2

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3083213166 I am wondering if we need to do a performance benchmark for this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-17 Thread via GitHub
milenkovicm commented on PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3083345764 You don't have to if you think this is a better solution. I don't know much about this topic. What do you think? I guess we need to publish docker, cancel job is you

Re: [PR] Fix for Postgres regex and like binary operators [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio merged PR #1928: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1928 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Snowflake: Improve accuracy of lookahead in implicit LIMIT alias [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio merged PR #1941: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1941 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[I] `DataFusionError` leaks inner types to the user [datafusion]

2025-07-17 Thread via GitHub
90degs2infty opened a new issue, #16805: URL: https://github.com/apache/datafusion/issues/16805 ### Describe the bug Some of `DataFusionError`'s variants leak the inner type to the user. E.g. [`DataFusionError::ObjectStore` simply wraps `object_store::Error`](https://docs.rs/datafusi

Re: [I] Postgres dialect fails to parse "~ any(...)" [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio closed issue #1776: Postgres dialect fails to parse "~ any(...)" URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1776 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] chore(deps): bump object_store from 0.12.2 to 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
dependabot[bot] opened a new pull request, #16807: URL: https://github.com/apache/datafusion/pull/16807 Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.2 to 0.12.3. Changelog Sourced from https://github.com/apache/arrow-rs-object-store/blob/main/CHAN

[PR] chore(deps): bump substrait from 0.58.0 to 0.59.0 [datafusion]

2025-07-17 Thread via GitHub
dependabot[bot] opened a new pull request, #16808: URL: https://github.com/apache/datafusion/pull/16808 Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.58.0 to 0.59.0. Release notes Sourced from https://github.com/substrait-io/substrait-rs/releases substrai

Re: [PR] MySQL: EXPLAIN ANALYZE format type [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio commented on code in PR #1945: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1945#discussion_r2212639205 ## src/ast/mod.rs: ## @@ -7641,13 +7641,32 @@ impl fmt::Display for DuplicateTreatment { } } +#[derive(Debug, Copy, Clone, PartialEq, PartialOrd

[PR] chore(deps): bump the proto group with 2 updates [datafusion]

2025-07-17 Thread via GitHub
dependabot[bot] opened a new pull request, #16806: URL: https://github.com/apache/datafusion/pull/16806 Bumps the proto group with 2 updates: [pbjson-build](https://github.com/influxdata/pbjson) and [prost-build](https://github.com/tokio-rs/prost). Updates `pbjson-build` from 0.7.0 t

Re: [PR] Snowflake create database [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio commented on code in PR #1939: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1939#discussion_r2212632085 ## src/ast/mod.rs: ## @@ -9524,6 +9550,29 @@ impl Display for Tag { } } +/// Snowflake `WITH CONTACT ( purpose = contact [ , purpose = contact .

Re: [PR] benchmark: Add parquet h2o support [datafusion]

2025-07-17 Thread via GitHub
2010YOUY01 commented on code in PR #16804: URL: https://github.com/apache/datafusion/pull/16804#discussion_r221265 ## benchmarks/bench.sh: ## @@ -100,15 +100,24 @@ clickbench_pushdown:ClickBench queries against partitioned (100 files) parqu clickbench_extended:Clic

Re: [PR] Add support for `DROP USER` statement [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
iffyio merged PR #1951: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1951 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] benchmark: Add parquet h2o support [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on code in PR #16804: URL: https://github.com/apache/datafusion/pull/16804#discussion_r2212668550 ## benchmarks/bench.sh: ## @@ -100,15 +100,24 @@ clickbench_pushdown:ClickBench queries against partitioned (100 files) parqu clickbench_extended:Cli

Re: [PR] benchmark: Add parquet h2o support [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on code in PR #16804: URL: https://github.com/apache/datafusion/pull/16804#discussion_r2212666921 ## benchmarks/bench.sh: ## @@ -775,6 +840,7 @@ data_h2o() { # Set virtual environment directory VIRTUAL_ENV="${PWD}/venv" +rm -rf "$VIRTUAL_ENV

Re: [PR] benchmark: Add parquet h2o support [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on PR #16804: URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3083126587 > Thank you! it LGTM. I have also tested it locally. Thank you @2010YOUY01 for review! -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212828103 ## datafusion/core/tests/fuzz_cases/sort_fuzz.rs: ## @@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) -> Vec { batches } + Review Co

Re: [PR] CI: Fix slow join test [datafusion]

2025-07-17 Thread via GitHub
crepererum merged PR #16796: URL: https://github.com/apache/datafusion/pull/16796 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-17 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2212916510 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -677,7 +677,14 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Benchmark for char expression [datafusion]

2025-07-17 Thread via GitHub
crepererum merged PR #16743: URL: https://github.com/apache/datafusion/pull/16743 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2212827518 ## datafusion/core/tests/fuzz_cases/sort_fuzz.rs: ## @@ -377,3 +388,335 @@ fn make_staggered_i32_utf8_batches(len: usize) -> Vec { batches } + +#[tokio::

Re: [I] joins::nested_loop_join::tests::join_maintains_right_order tests take over 60 seconds [datafusion]

2025-07-17 Thread via GitHub
crepererum closed issue #16792: joins::nested_loop_join::tests::join_maintains_right_order tests take over 60 seconds URL: https://github.com/apache/datafusion/issues/16792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
crepererum commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2212938215 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-featur

Re: [PR] WIP: Update `object_store` 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
crepererum commented on PR #16753: URL: https://github.com/apache/datafusion/pull/16753#issuecomment-3083485457 Didn't see this PR here, I've also "fixed" the dependabot PR #16807 😅 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [I] Release 0.56.1 (backport/fix release) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
alamb commented on issue #1952: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1952#issuecomment-3085079477 I will plan to make a release candidate later today or tomorrow. FYI @iffyio (No action required on your part, just FYI that I plan to make a patch release)

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
Dimchikkk commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2214246096 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-feature

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16639: URL: https://github.com/apache/datafusion/pull/16639#issuecomment-3085157083 Thanks again @findepi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [I] Different result of double to timestamp(9) cast when source value is constant [datafusion]

2025-07-17 Thread via GitHub
alamb closed issue #16636: Different result of double to timestamp(9) cast when source value is constant URL: https://github.com/apache/datafusion/issues/16636

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
alamb merged PR #16639: URL: https://github.com/apache/datafusion/pull/16639

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3085737075 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085737166 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3085745852 Ok @iffyio I believe I addressed all your latest feedback in https://github.com/apache/datafusion-sqlparser-rs/pull/1927/commits/1466e2ab212744bc2270546369a04ad8ab2e

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3086629367 > @comphead We're talking about implicit conversions > > PG: postgres=# SELECT not(1); ERROR: argument of NOT must be type boolean, not type integer LINE 1: SELECT not(1);

[I] Improve performance on ClickBench [datafusion-comet]

2025-07-17 Thread via GitHub
Iskander14yo opened a new issue, #2035: URL: https://github.com/apache/datafusion-comet/issues/2035 Hi! Just made a [PR](https://github.com/ClickHouse/ClickBench/pull/557) to add Comet to [ClickBench](https://benchmark.clickhouse.com/) - one of the popular benchmarks for analytical w

Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-07-17 Thread via GitHub
LiaCastaneda commented on code in PR #16793: URL: https://github.com/apache/datafusion/pull/16793#discussion_r2215099809 ## datafusion/substrait/src/logical_plan/producer/types.rs: ## @@ -96,7 +96,15 @@ pub(crate) fn to_substrait_type( nullability,

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3086584860 > > > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. > > > > > >

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb merged PR #90: URL: https://github.com/apache/datafusion-site/pull/90

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb commented on PR #90: URL: https://github.com/apache/datafusion-site/pull/90#issuecomment-3085214456 Thank you @timsaucer 🙏

Re: [PR] Snowflake: Improve accuracy of lookahead in implicit LIMIT alias [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
alamb commented on PR #1941: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1941#issuecomment-3085085656 It is pretty epic that this code keeps rolling along

Re: [PR] fix: The inconsistency between scalar and array on the cast decimal to timestamp [datafusion]

2025-07-17 Thread via GitHub
findepi commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2214149519 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,14 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] fix: The inconsistency between scalar and array on the cast decimal to timestamp [datafusion]

2025-07-17 Thread via GitHub
findepi commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3085264189 > What is the status of this PR? Shall we merge it? Or are there outstanding issues to resolve? requires an update -- https://github.com/apache/datafusion/pull/16539#discussio

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
viirya commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3085726035 > Shall we file (another) ticket to discuss the process? Yea, as it is not directly related to this change. > One way we could proceed is to document some rough guide

[PR] chore: use `equals_datatype` for `BinaryExpr` [datafusion]

2025-07-17 Thread via GitHub
comphead opened a new pull request, #16813: URL: https://github.com/apache/datafusion/pull/16813 ## Which issue does this PR close? - Closes #. ## Rationale for this change Current type check in `BinaryExpr` is erroneous (it doesn't consider element names diff

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3085658739 > One way we could proceed is to document some rough guidelines in the docs site, and then maybe add a label we can use to tag issues with proposals, so the current list is eas
