Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-07-17 Thread via GitHub
LiaCastaneda commented on code in PR #16793: URL: https://github.com/apache/datafusion/pull/16793#discussion_r2215099809 ## datafusion/substrait/src/logical_plan/producer/types.rs: ## @@ -96,7 +96,15 @@ pub(crate) fn to_substrait_type( nullability,

[I] Improve performance on ClickBench [datafusion-comet]

2025-07-17 Thread via GitHub
Iskander14yo opened a new issue, #2035: URL: https://github.com/apache/datafusion-comet/issues/2035 Hi! Just made a [PR](https://github.com/ClickHouse/ClickBench/pull/557) to add Comet to [ClickBench](https://benchmark.clickhouse.com/) - one of the popular benchmarks for analytical w

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3086629367 > @comphead We're talking about implicit conversions > > PG: postgres=# SELECT not(1); ERROR: argument of NOT must be type boolean, not type integer LINE 1: SELECT not(1);

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3086584860 > > > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, I am not sure if it will make a difference. > > > > > >

Re: [PR] benchmark: Add parquet h2o support [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on PR #16804: URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3086556019 Thank you @alamb @jonathanc-n for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3086549129 > 🤖: Benchmark completed > > Details > > ``` > group main reduce_expr_size > -
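The PR title above is about boxing one `Expr` variant to shrink the whole enum. The general Rust effect can be sketched with hypothetical types (`BigPayload`, `Inline` and `Boxed` are illustrative only, not DataFusion's `Expr`): an enum is at least as large as its largest variant, so moving that variant's payload behind a `Box` caps its contribution at one pointer.

```rust
use std::mem::size_of;

/// Hypothetical payload standing in for the fields of a large enum variant.
#[allow(dead_code)]
struct BigPayload {
    name: String,
    qualifier: Option<String>,
    extra: [u64; 4],
}

/// Payload stored inline: every value of this enum reserves room for `BigPayload`.
#[allow(dead_code)]
enum Inline {
    Small(u64),
    Big(BigPayload),
}

/// Payload stored behind a pointer: the `Big` variant costs one heap pointer.
#[allow(dead_code)]
enum Boxed {
    Small(u64),
    Big(Box<BigPayload>),
}

/// Returns (size of `Inline`, size of `Boxed`) in bytes on the current target.
fn enum_sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}
```

The trade-off is an extra heap allocation and pointer indirection whenever the boxed variant is built or inspected, which is usually acceptable when that variant is rare compared to the others.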

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3086544110 Thank you @alamb , no regression from above benchmark.
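The idea in this PR title, splitting one large batch into several smaller ones, can be sketched as computing zero-copy slice ranges. This is a minimal sketch under assumptions: `split_ranges` is a hypothetical helper, and with arrow-rs each `(offset, len)` pair could be handed to `RecordBatch::slice(offset, length)`; the PR's actual implementation may differ.

```rust
/// Splits `num_rows` rows into consecutive `(offset, len)` ranges of at most
/// `batch_size` rows each; the last range holds any remainder.
fn split_ranges(num_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..num_rows)
        .step_by(batch_size)
        .map(|offset| (offset, batch_size.min(num_rows - offset)))
        .collect()
}
```

For example, `split_ranges(10, 4)` yields `[(0, 4), (4, 4), (8, 2)]`. Because slicing only adjusts offsets into shared buffers, no row data is copied.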

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-17 Thread via GitHub
2010YOUY01 commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3086452777 @comphead We're talking about implicit conversions PG: postgres=# SELECT not(1); ERROR: argument of NOT must be type boolean, not type integer LINE 1: SELECT not(1)

Re: [PR] Fix: common_sub_expression_eliminate optimizer rule failed [datafusion]

2025-07-17 Thread via GitHub
github-actions[bot] commented on PR #16066: URL: https://github.com/apache/datafusion/pull/16066#issuecomment-3086432006 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

[PR] chore: use `equals_datatype` for `BinaryExpr` [datafusion]

2025-07-17 Thread via GitHub
comphead opened a new pull request, #16813: URL: https://github.com/apache/datafusion/pull/16813 ## Which issue does this PR close? - Closes #. ## Rationale for this change Current type check in `BinaryExpr` is erroneous (it doesn't consider element names diff
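The rationale above concerns nested types whose child field names differ. In arrow-rs, `DataType::equals_datatype` compares two types while ignoring nested field names and metadata. The toy model below illustrates the difference between strict `==` and a name-insensitive comparison; `ToyType`, `equals_ignoring_names` and `list_of` are hypothetical and are not the arrow-rs API.

```rust
/// Hypothetical, simplified stand-in for a nested data type. Like Arrow's
/// `List`, the list variant records the name of its child field.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    List { item_name: String, item: Box<ToyType> },
}

impl ToyType {
    /// Structural comparison that ignores child field names.
    fn equals_ignoring_names(&self, other: &ToyType) -> bool {
        match (self, other) {
            (ToyType::List { item: a, .. }, ToyType::List { item: b, .. }) => {
                a.equals_ignoring_names(b)
            }
            // Non-list types carry no names to ignore, so fall back to `==`.
            (a, b) => a == b,
        }
    }
}

/// Convenience constructor for a list type with a named child field.
fn list_of(item_name: &str, item: ToyType) -> ToyType {
    ToyType::List { item_name: item_name.to_string(), item: Box::new(item) }
}
```

Two lists of `Int32` whose child fields are named `item` and `element` are unequal under `==` but equal under `equals_ignoring_names`, which is the kind of mismatch a strict type check would wrongly reject.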

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085826016 🤖: Benchmark completed Details ``` Comparing HEAD and perf-16717 Benchmark clickbench_extended.json ┏━━

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085796124 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085796079 🤖: Benchmark completed Details ``` Comparing HEAD and perf-16717 Benchmark tpch_mem_sf10.json ┏

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3085745852 Ok @iffyio I believe I addressed all your latest feedback in https://github.com/apache/datafusion-sqlparser-rs/pull/1927/commits/1466e2ab212744bc2270546369a04ad8ab2e

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3085737075 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085737166 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
viirya commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3085726035 > Shall we file (another) ticket to discuss the process? Yea, as it is not directly related to this change. > One way we could proceed is to document some rough guide

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-308570 I agree. Let's move forward with this then. I'll allow a couple more days for review since we're in no rush.

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2214414069 ## src/parser/mod.rs: ## @@ -16514,6 +16549,10 @@ impl<'a> Parser<'a> { Ok(None) } } + +pub fn in_normal_state(&se

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3085658739 > One way we could proceed is to document some rough guidelines in the docs site, and then maybe add a label we can use to tag issues with proposals, so the current list is eas

Re: [PR] Automatically split large single RecordBatches in `MemorySource` into smaller batches [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3085652294 > I am wondering if we need to do a performance benchmark for this PR. It is a good idea -- I kicked off some benchmarks. I am not sure how many actually use an in-memory batch.

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
alamb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3085646642 > I wonder if it would be beneficial for DataFusion to adopt a similar lightweight proposal process for major design changes - I think it would be super helpful. Thank you

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
alamb commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2214384501 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -433,7 +433,7 @@ impl ListingTableConfig { /// `SchemaAdapterFactory` is set, in which case only the

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3085634080 I think this PR is an alternative to the proposed fix in - https://github.com/apache/datafusion/pull/16781 Is that right? Are you happy with the code in https://github.com/ap
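The PR title above describes a pointer-based default for UDF equality that implementations can override. A minimal sketch of that pattern, assuming a hypothetical trait `UdfLike` (not DataFusion's `ScalarUDFImpl`, although `ScalarUDFImpl` also exposes an `as_any` hook for downcasting): the default treats two values as equal only when they are the same object, and a stateful implementation overrides it to compare its parameters.

```rust
use std::any::Any;

/// Hypothetical UDF-like trait used only for illustration.
trait UdfLike {
    fn as_any(&self) -> &dyn Any;

    /// Default: equal only if `self` and `other` are the same object in memory.
    fn equals(&self, other: &dyn UdfLike) -> bool {
        std::ptr::addr_eq(self, other)
    }
}

/// Has no comparable state, so it relies on the pointer-based default.
struct Opaque {
    #[allow(dead_code)]
    name: String,
}

impl UdfLike for Opaque {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Carries state, so it overrides `equals` to compare that state.
struct Scale {
    factor: i64,
}

impl UdfLike for Scale {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn equals(&self, other: &dyn UdfLike) -> bool {
        // Equal only if `other` is also a `Scale` with the same factor.
        other
            .as_any()
            .downcast_ref::<Scale>()
            .is_some_and(|o| o.factor == self.factor)
    }
}
```

The pointer-based default is conservative: it never reports two distinct objects as equal, so an implementation only loses optimization opportunities (not correctness) until it provides a state-aware override.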

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3085636073 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] CI: Fix slow join test [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16796: URL: https://github.com/apache/datafusion/pull/16796#issuecomment-3085626043 Thank you @2010YOUY01 and @xudong963 / @crepererum for the review

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3085617517 Merged. Thank you for the reviews @andygrove @comphead @Kontinuation!

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra merged PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3085613885 @comphead @Kontinuation @drexler-sky Opened a tracking issue for fs-hdfs issues: https://github.com/apache/datafusion-comet/issues/2034

Re: [I] Upgrade to DataFusion 49.0.0 [datafusion-comet]

2025-07-17 Thread via GitHub
comphead commented on issue #1993: URL: https://github.com/apache/datafusion-comet/issues/1993#issuecomment-3085612153 object_store upgrade example https://github.com/apache/datafusion/pull/16807

[I] Tracking fs-hdfs issues [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra opened a new issue, #2034: URL: https://github.com/apache/datafusion-comet/issues/2034 We are using https://github.com/datafusion-contrib/fs-hdfs to implement the `hdfs_object_store` and are actively addressing issues in the package. This is to keep track of them - [ ]

[PR] chore(deps): bump on-headers and compression in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2025-07-17 Thread via GitHub
dependabot[bot] opened a new pull request, #16812: URL: https://github.com/apache/datafusion/pull/16812 Bumps [on-headers](https://github.com/jshttp/on-headers) and [compression](https://github.com/expressjs/compression). These dependencies needed to be updated together. Updates `on-head

Re: [PR] chore(deps): bump object_store from 0.12.2 to 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
alamb merged PR #16807: URL: https://github.com/apache/datafusion/pull/16807

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
alamb commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2214358059 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-features =

Re: [PR] chore(deps): bump object_store from 0.12.2 to 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16807: URL: https://github.com/apache/datafusion/pull/16807#issuecomment-3085563885 Thank you @crepererum

Re: [PR] WIP: Update `object_store` 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
alamb closed pull request #16753: WIP: Update `object_store` 0.12.3 URL: https://github.com/apache/datafusion/pull/16753

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-3085558681 I for one am not really worried about memory as we are talking about `8k * Int64` = 64k for the largest index size. I expect most people to use a single index size (e.g. Int32) so that
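Assuming "8k" refers to 8,192 rows (for example, DataFusion's default `batch_size`) and that an `Int64` key occupies 8 bytes, the estimate above works out as:

$$8192 \times 8\ \text{bytes} = 65536\ \text{bytes} = 64\ \text{KiB}$$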

Re: [PR] fix: support nullable columns in pre-sorted data sources [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16783: URL: https://github.com/apache/datafusion/pull/16783#issuecomment-308738 Thanks @crepererum

Re: [I] CopyTo plan loses ordering requirement during physical plan optimization [datafusion]

2025-07-17 Thread via GitHub
alamb closed issue #16784: CopyTo plan loses ordering requirement during physical plan optimization URL: https://github.com/apache/datafusion/issues/16784

Re: [PR] Fix: Preserve sorting for the COPY TO plan [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16785: URL: https://github.com/apache/datafusion/pull/16785#issuecomment-3085556000 Thanks again @bert-beyondloops

Re: [PR] fix: support nullable columns in pre-sorted data sources [datafusion]

2025-07-17 Thread via GitHub
alamb merged PR #16783: URL: https://github.com/apache/datafusion/pull/16783

Re: [PR] Fix: Preserve sorting for the COPY TO plan [datafusion]

2025-07-17 Thread via GitHub
alamb merged PR #16785: URL: https://github.com/apache/datafusion/pull/16785

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
Dimchikkk commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2214246096 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-feature

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3085286658 Thanks for the latest round of feedback @iffyio, I've been busy with some unrelated work so haven't had time to look at it all yet or start on the changes but hopef

Re: [PR] fix: The inconsistency between scalar and array on the cast decimal to timestamp [datafusion]

2025-07-17 Thread via GitHub
findepi commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3085264189 > What is the status of this PR? Shall we merge it? Or are there outstanding issues to resolve? requires an update -- https://github.com/apache/datafusion/pull/16539#discussio

Re: [PR] fix: The inconsistency between scalar and array on the cast decimal to timestamp [datafusion]

2025-07-17 Thread via GitHub
findepi commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2214149519 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,14 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb merged PR #90: URL: https://github.com/apache/datafusion-site/pull/90

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb commented on PR #90: URL: https://github.com/apache/datafusion-site/pull/90#issuecomment-3085214456 Thank you @timsaucer 🙏

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
alamb merged PR #16639: URL: https://github.com/apache/datafusion/pull/16639

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
alamb commented on PR #16639: URL: https://github.com/apache/datafusion/pull/16639#issuecomment-3085157083 Thanks again @findepi

Re: [I] Different result of double to timestamp(9) cast when source value is constant [datafusion]

2025-07-17 Thread via GitHub
alamb closed issue #16636: Different result of double to timestamp(9) cast when source value is constant URL: https://github.com/apache/datafusion/issues/16636

Re: [PR] Add alternate index strategy footnote to parquet indexing blog [datafusion-site]

2025-07-17 Thread via GitHub
alamb commented on PR #90: URL: https://github.com/apache/datafusion-site/pull/90#issuecomment-3085112270 Thanks @zhuqi-lucas and @JigaoLuo

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-17 Thread via GitHub
alamb commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3085092592 I am still behind -- I will likely start in earnest on this task tomorrow (Friday) and over the weekend, and hopefully we'll be ready to cut a Release Candidate (RC) early next wee

Re: [PR] Snowflake: Improve accuracy of lookahead in implicit LIMIT alias [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
alamb commented on PR #1941: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1941#issuecomment-3085085656 It is pretty epic that this code keeps rolling along

Re: [I] Release 0.56.1 (backport/fix release) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
alamb commented on issue #1952: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1952#issuecomment-3085079477 I will plan to make a release candidate later today or tomorrow. FYI @iffyio (No action required on your part, just FYI that I plan to make a patch release)

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3084912625 > > reminds me of another problem with fs-hdfs. > > The `HdfsErr` returned by `fs-hdfs` read functions does not contain JVM stack traces. If there's a read failure caused by

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
parthchandra commented on code in PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2213926681 ## native/hdfs/src/object_store/hdfs.rs: ## @@ -88,19 +88,33 @@ impl HadoopFileSystem { fn read_range(range: &Range, file: &HdfsFile) -> Result {
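The fix title "read into buffer fully" points at a common pitfall: a single `read` call may return fewer bytes than requested. A minimal sketch of the usual remedy, written against `std::io::Read` rather than the `fs-hdfs` API (whose read signature is not shown here, so `read_fully` and `ShortReader` are hypothetical): keep reading until the buffer is full, and treat a zero-byte read as a premature end of stream.

```rust
use std::io::{self, Read};

/// Keeps calling `read` until `buf` is full, since one call may return a
/// "short read" with fewer bytes than requested.
fn read_fully<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes means end of stream before the buffer was filled.
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "stream ended before buffer was filled",
                ))
            }
            Ok(n) => filled += n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Test double that returns at most `chunk` bytes per `read` call.
struct ShortReader {
    data: Vec<u8>,
    pos: usize,
    chunk: usize,
}

impl Read for ShortReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(out.len()).min(self.data.len() - self.pos);
        out[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

For readers that implement `std::io::Read`, the standard `Read::read_exact` method already provides this guarantee; a hand-written loop like this is only needed when the underlying reader exposes a different read API.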

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
viirya commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084789737 > There's some discussion in [#14993](https://github.com/apache/datafusion/issues/14993). Basically if we want to be able to customize how expressions are evaluated for a specifi

Re: [PR] Eliminate Self Joins [datafusion]

2025-07-17 Thread via GitHub
jonathanc-n commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-3084775589 @berkaysynnada I'll be happy to review it

Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-07-17 Thread via GitHub
jatin510 commented on PR #16793: URL: https://github.com/apache/datafusion/pull/16793#issuecomment-3084767084 made some changes @gabotechs

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
parthchandra commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084716341 > please take a look at [#16803](https://github.com/apache/datafusion/pull/16803). Thank you for this pointer to the example.

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3084693715 > what about using explicit casting in applications? For example: > > ```shell > > select not(arrow_cast(1, 'Boolean')); > +--+

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb merged PR #16803: URL: https://github.com/apache/datafusion/pull/16803

Re: [PR] Postgres: ALTER TABLE SET ( storage_parameters ) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
achristmascarl commented on code in PR #1947: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1947#discussion_r2213786861 ## src/ast/ddl.rs: ## @@ -351,6 +351,10 @@ pub enum AlterTableOperation { ValidateConstraint { name: Ident, }, +/// `SET

Re: [PR] Eliminate Self Joins [datafusion]

2025-07-17 Thread via GitHub
berkaysynnada commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-3084658348 > @atahanyorganci Hello, would you still be interested in continuing with this? I’ll drive this to completion tomorrow.

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-17 Thread via GitHub
comphead commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3084571582 > This patch looks good to me, and it reminds me of another problem with fs-hdfs. > > The `HdfsErr` returned by `fs-hdfs` read functions does not contain JVM stack traces.

Re: [I] Update upgrade md for new unified config for sql string mapping to utf8view when we release datafusion 49.0.0 [datafusion]

2025-07-17 Thread via GitHub
comphead closed issue #16428: Update upgrade md for new unified config for sql string mapping to utf8view when we release datafusion 49.0.0 URL: https://github.com/apache/datafusion/issues/16428

Re: [PR] Update `upgrading.md` for new unified config for sql string mapping to utf8view [datafusion]

2025-07-17 Thread via GitHub
comphead merged PR #16809: URL: https://github.com/apache/datafusion/pull/16809

Re: [PR] Update `upgrading.md` for new unified config for sql string mapping to utf8view [datafusion]

2025-07-17 Thread via GitHub
comphead commented on code in PR #16809: URL: https://github.com/apache/datafusion/pull/16809#discussion_r2213682081 ## docs/source/library-user-guide/upgrading.md: ## @@ -120,6 +120,56 @@ SET datafusion.execution.spill_compression = 'zstd'; For more details about this config

Re: [I] Integration tests are not being run [datafusion]

2025-07-17 Thread via GitHub
kosiew commented on issue #16801: URL: https://github.com/apache/datafusion/issues/16801#issuecomment-3084526034 Certainly.

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213648171 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -101,7 +101,7 @@ pub struct ListingTableConfig { /// Optional [`SchemaAdapterFactory`] for creating

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
comphead commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213640997 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -101,7 +101,7 @@ pub struct ListingTableConfig { /// Optional [`SchemaAdapterFactory`] for creating

Re: [PR] WIP: Update `object_store` 0.12.3 [datafusion]

2025-07-17 Thread via GitHub
comphead commented on PR #16753: URL: https://github.com/apache/datafusion/pull/16753#issuecomment-3084461568 > Didn't see this PR here, I've also "fixed" the dependabot PR #16807 😅 > > If someone just wants to approve+merge the dependabot version, that's fine (I personally won't do t

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16790: URL: https://github.com/apache/datafusion/pull/16790#issuecomment-3084406929 > I think a little more docs / context / comments are needed otherwise this is good to merge. @akoshchiy could you add some comments explaining what's going on for future ref

[PR] SGA-11414 Added support for odbc escape sequencing for time date and … [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
etgarperets opened a new pull request, #1953: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1953 …timestamp literals. For this I modified TypedString by adding uses_odbc_syntax flag.
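ODBC escape sequences wrap date, time, and timestamp literals in braces, e.g. `{d '2025-07-17'}`, `{t '12:30:00'}` and `{ts '2025-07-17 12:30:00'}`. A minimal sketch of how a `uses_odbc_syntax` flag could drive round-trip printing, assuming a simplified, hypothetical `TypedStringSketch` struct (the real sqlparser `TypedString` fields and `Display` implementation may differ):

```rust
use std::fmt;

/// Literal kinds that have an ODBC escape form.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LiteralKind {
    Date,
    Time,
    Timestamp,
}

/// Hypothetical, simplified stand-in for a typed string literal.
struct TypedStringSketch {
    kind: LiteralKind,
    value: String,
    /// True if the literal was written as `{d '...'}` rather than `DATE '...'`.
    uses_odbc_syntax: bool,
}

impl fmt::Display for TypedStringSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.uses_odbc_syntax {
            // ODBC escape form: `{d '...'}`, `{t '...'}`, `{ts '...'}`.
            let tag = match self.kind {
                LiteralKind::Date => "d",
                LiteralKind::Time => "t",
                LiteralKind::Timestamp => "ts",
            };
            write!(f, "{{{tag} '{}'}}", self.value)
        } else {
            // Standard SQL typed-string form: `DATE '...'` etc.
            let keyword = match self.kind {
                LiteralKind::Date => "DATE",
                LiteralKind::Time => "TIME",
                LiteralKind::Timestamp => "TIMESTAMP",
            };
            write!(f, "{keyword} '{}'", self.value)
        }
    }
}
```

Keeping the flag on the AST node lets a parser print the literal back in the same syntax the user wrote, instead of normalizing every ODBC literal to the `DATE '...'` form.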

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-3084383240 > Are we worried about memory overhead with this? One thing I think we could do is set a reasonable limit to the cache size - only write to the cache if `size` is less than 1024 * 1

[PR] Report error when `SessionState::sql_to_expr_with_alias` does not consume all input [datafusion]

2025-07-17 Thread via GitHub
pepijnve opened a new pull request, #16811: URL: https://github.com/apache/datafusion/pull/16811 ## Which issue does this PR close? - Closes #16810. ## Rationale for this change When parsing SQL strings into expressions it's preferable to get parse errors when unprocesse

[I] `SessionState::sql_to_expr` does not report unconsumed input [datafusion]

2025-07-17 Thread via GitHub
pepijnve opened a new issue, #16810: URL: https://github.com/apache/datafusion/issues/16810 ### Describe the bug When the SQL string passed to `SessionState::sql_to_expr` contains trailing tokens, they are silently ignored. This can lead to rather unexpected results. It would be better
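The bug above is that parsing stops after the first complete expression and drops whatever follows. A toy sketch of the difference, assuming a hypothetical grammar of whitespace-separated integers joined by `+` (this is not DataFusion's or sqlparser's parser): the strict entry point checks that every token was consumed and reports the first leftover one.

```rust
/// Parses one integer token, with a readable error on failure.
fn parse_int(t: &str) -> Result<i64, String> {
    t.parse::<i64>().map_err(|e| format!("bad integer {t:?}: {e}"))
}

/// Parses `<int> (+ <int>)*` from the front of `tokens`.
/// Returns the sum and the number of tokens consumed.
fn parse_sum(tokens: &[&str]) -> Result<(i64, usize), String> {
    let first = *tokens.first().ok_or("empty input")?;
    let mut total = parse_int(first)?;
    let mut pos = 1;
    while pos + 1 < tokens.len() && tokens[pos] == "+" {
        total += parse_int(tokens[pos + 1])?;
        pos += 2;
    }
    Ok((total, pos))
}

/// Lenient entry point: silently ignores anything after the expression.
fn parse_lenient(sql: &str) -> Result<i64, String> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    parse_sum(&tokens).map(|(value, _)| value)
}

/// Strict entry point: errors if any tokens remain after the expression.
fn parse_strict(sql: &str) -> Result<i64, String> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    let (value, consumed) = parse_sum(&tokens)?;
    match tokens.get(consumed) {
        None => Ok(value),
        Some(t) => Err(format!("unexpected trailing token {t:?}")),
    }
}
```

With input `"1 + 2 3"`, the lenient version returns `Ok(3)` and quietly discards the `3`, while the strict version returns an error naming the unconsumed token.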

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on PR #16789: URL: https://github.com/apache/datafusion/pull/16789#issuecomment-3084267767 Are we worried about memory overhead with this? One thing I think we could do is set a reasonable limit to the cache size - only write to the cache if `size` is less than 1024 * 102
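The suggestion above is to write to the cache only below a size limit. A minimal sketch, assuming a hypothetical `KeyCache` that stores all-zero `i64` key vectors by length (a stand-in for cached dictionary keys, not the PR's code) and a hypothetical `MAX_CACHED_LEN`; the limit in the original comment is truncated, so the value here is illustrative only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical size limit (illustrative; not the value proposed in the thread).
const MAX_CACHED_LEN: usize = 4096;

/// Caches all-zero key vectors by length, but only for small lengths.
struct KeyCache {
    inner: Mutex<HashMap<usize, Arc<Vec<i64>>>>,
}

impl KeyCache {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Returns zero-filled keys of length `len`, reusing a cached copy when allowed.
    fn get_or_create(&self, len: usize) -> Arc<Vec<i64>> {
        if len >= MAX_CACHED_LEN {
            // Too large to keep around: build a fresh vector and skip the cache.
            return Arc::new(vec![0; len]);
        }
        let mut map = self.inner.lock().unwrap();
        Arc::clone(map.entry(len).or_insert_with(|| Arc::new(vec![0; len])))
    }

    /// Number of lengths currently cached.
    fn cached_entries(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```

Bounding by length caps the worst-case memory held by the cache at roughly one vector per distinct small length, while large requests still succeed without being retained.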

Re: [PR] Add example of custom file schema casting rules [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on code in PR #16803: URL: https://github.com/apache/datafusion/pull/16803#discussion_r2213484985 ## datafusion-examples/examples/custom_file_casts.rs: ## @@ -0,0 +1,204 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] [EPIC] Implement expressions as ScalarUDFImpl [datafusion-comet]

2025-07-17 Thread via GitHub
akupchinskiy commented on issue #1819: URL: https://github.com/apache/datafusion-comet/issues/1819#issuecomment-3084259942 One limitation I faced when trying to switch from PhysicalExpr to ScalarUDFImpl is the lack of a way to extract the batch size. That is why it won't work for non-determ

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084255495 > > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, I am not sure if it will make a difference. > > > > > > Th

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-17 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084230441 > Could you clarify the latter two? From your description, they sound like areas where `PhysicalExprAdapter` could bring benefits β€” but I'm not quite sure how `SchemaAdapter` f

Re: [PR] Fix discrepancy in Float64 to timestamp(9) casts for constants [datafusion]

2025-07-17 Thread via GitHub
findepi commented on code in PR #16639: URL: https://github.com/apache/datafusion/pull/16639#discussion_r2213410294 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -394,12 +503,12 @@ SELECT COUNT(*) FROM ts_data_secs where ts > to_timestamp_seconds('2020-09-08 12 q

Re: [PR] Implement equals for stateful functions [datafusion]

2025-07-17 Thread via GitHub
findepi commented on PR #16781: URL: https://github.com/apache/datafusion/pull/16781#issuecomment-3084158547 @alamb @timsaucer @kosiew would you like to take a look at new code pushed here since the time you last reviewed?

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084151418 > > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. > > The results a

Re: [I] Optimize the join operators [datafusion]

2025-07-17 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3084107356 > Updated: our benchmark is using datafusion internal source to benchmark instead of datafusion-python, i am not sure if it will make a difference. The results are similar

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-17 Thread via GitHub
mbutrovich commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2213341348 ## native/spark-expr/src/nondetermenistic_funcs/randn.rs: ## @@ -0,0 +1,265 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3084017814 @2010YOUY01 I've updated based on your comments and commented back on some

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213305286 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213294534 ## datafusion/physical-plan/src/sorts/streaming_merge.rs: ## @@ -131,14 +168,42 @@ impl<'a> StreamingMergeBuilder<'a> { enable_round_robin_tie_breake

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213289055 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1067,14 +1074,13 @@ impl GroupedHashAggregateStream { sort_batch(&batch, &expr, No

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213223988 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213137522 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-17 Thread via GitHub
crepererum commented on code in PR #16456: URL: https://github.com/apache/datafusion/pull/16456#discussion_r2213133593 ## Cargo.toml: ## @@ -167,7 +167,10 @@ recursive = "0.1.1" regex = "1.8" rstest = "0.25.0" serde_json = "1" -sqlparser = { version = "0.55.0", default-featur

[I] Release 0.56.1 (backport/fix release) [datafusion-sqlparser-rs]

2025-07-17 Thread via GitHub
crepererum opened a new issue, #1952: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1952 Since DataFusion has fallen a bit behind, having a stable intermediate step which we could use before jumping to 0.57 and beyond would be nice. Sadly, we cannot use 0.56 due to #1898. So

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-17 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2213099661 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,342 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri