Re: [D] DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12? [datafusion]

2025-04-14 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12? I arrive on June 8th and any time past 5PM should work on the 9th GitHub link: https://github.com/apache/datafusion/discussions/15657#discussioncommen

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2802933973 > > FYI @rluvaton as I think this is very related to > > > > * [deprecating `return_type` in favor of `return_type_from_args`  #15662](https://github.com/apache/datafusion/issues/

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2042834988 ## src/parser/mod.rs: ## @@ -5135,6 +5142,69 @@ impl<'a> Parser<'a> { })) } +/// Parse `CREATE FUNCTION` for [MsSql] +/// +

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
mbutrovich commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042835936 ## datafusion/common/src/config.rs: ## @@ -459,6 +459,14 @@ config_namespace! { /// BLOB instead. pub binary_as_string: bool, default = false

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on code in PR #15694: URL: https://github.com/apache/datafusion/pull/15694#discussion_r2042898541 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -811,58 +822,164 @@ impl BinaryExpr { } } +enum ShortCircuitStrategy<'a> { +None, +R

Re: [I] Issue with partitioned data [datafusion-ballista]

2025-04-14 Thread via GitHub
milenkovicm commented on issue #1239: URL: https://github.com/apache/datafusion-ballista/issues/1239#issuecomment-2802934407 I guess a single file with few rows will do, I can create directory structure by hand to reproduce it -- This is an automated message from the Apache Git Service.

Re: [I] Issue with partitioned data [datafusion-ballista]

2025-04-14 Thread via GitHub
mshauneu commented on issue #1239: URL: https://github.com/apache/datafusion-ballista/issues/1239#issuecomment-2802926137 The data is too big to attach. You can generate it locally by: 1. Download: ``` seq -w 01 12 | xargs -I {} wget "https://d37ci6vzurychx.cloudfront.net/trip-data/y

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
mbutrovich commented on PR #15537: URL: https://github.com/apache/datafusion/pull/15537#issuecomment-2802943864 > I checked that the data seems to come out ok with datafusion 46. Can you remind me what the difference would be with this option (that the timestamp type is different?) >

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
alamb commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042825468 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -569,6 +582,46 @@ pub fn apply_file_schema_type_coercions( )) } +/// Coerces the file schema if th

Re: [PR] ci: fix workflow triggering extended tests from pr comments. [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15704: URL: https://github.com/apache/datafusion/pull/15704#issuecomment-2802872793 I tested it out on https://github.com/apache/datafusion/pull/15708#issuecomment-2802871540 and it seems to be working well so far 👌 Thank you @ashdnazg

Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-14 Thread via GitHub
alamb commented on issue #15686: URL: https://github.com/apache/datafusion/issues/15686#issuecomment-2802965975 Here is a sql reproducer of what happens today ```sql > create table t as values (arrow_cast(x'deadbeef', 'FixedSizeBinary(4)')); 0 row(s) fetched. Elapsed 0.006 sec

Re: [I] Weekly Plan (Andrew Lamb) April 7, 2025 [datafusion]

2025-04-14 Thread via GitHub
alamb closed issue #15616: Weekly Plan (Andrew Lamb) April 7, 2025 URL: https://github.com/apache/datafusion/issues/15616

Re: [I] Weekly Plan (Andrew Lamb) April 7, 2025 [datafusion]

2025-04-14 Thread via GitHub
alamb commented on issue #15616: URL: https://github.com/apache/datafusion/issues/15616#issuecomment-2802967742 I am going to be focused this week on working down the DataFusion review queue and pushing along things in arrow and I will be out next week, so I am not going to file another week

Re: [PR] chore: Prepare for datafusion 47.0.0 + arrow-rs 55.0.0 [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove closed pull request #1642: chore: Prepare for datafusion 47.0.0 + arrow-rs 55.0.0 URL: https://github.com/apache/datafusion-comet/pull/1642

Re: [PR] tests: only refresh the minimum sysinfo in mem limit tests. [datafusion]

2025-04-14 Thread via GitHub
alamb merged PR #15702: URL: https://github.com/apache/datafusion/pull/15702

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2803501353 Hm that doesn't make much sense as > > Thanks for sharing the results @zhuqi-lucas this is really interesting! > > I think it mainly shows that we probably should try and

[PR] feat: transfer Apache Spark runtime conf to native engine [datafusion-comet]

2025-04-14 Thread via GitHub
comphead opened a new pull request, #1649: URL: https://github.com/apache/datafusion-comet/pull/1649 ## Which issue does this PR close? Related #1360. ## Rationale for this change Very often the native engine behavior depends on external Spark job params (HDFS co

Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub
lalaorya closed issue #15665: GlobalLimitExec execution offset pagination query results in internal error URL: https://github.com/apache/datafusion/issues/15665

[PR] chore: clean up `planner.rs` [datafusion-comet]

2025-04-14 Thread via GitHub
comphead opened a new pull request, #1650: URL: https://github.com/apache/datafusion-comet/pull/1650 ## Which issue does this PR close? Closes #. ## Rationale for this change While working on #1360, I found that `execution_props` is not used and basically can be used fr

Re: [PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-14 Thread via GitHub
changsun20 commented on PR #15696: URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2802985874 @comphead I understand your concern. If displaying warnings to end users is what you'd like to see in this PR, could you confirm if @eliaperantoni's proposed solution in #14434 of

[PR] Enable setting default values for target_partitions and planning_concurrency [datafusion]

2025-04-14 Thread via GitHub
nuno-faria opened a new pull request, #15712: URL: https://github.com/apache/datafusion/pull/15712 ## Which issue does this PR close? - None ## Rationale for this change The [documentation](https://datafusion.apache.org/user-guide/configs.html) for the `d

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-14 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2803881895 It seems that when we merge the sorted batches, we are already using interleave to merge the sorted indices; here is the code: ```rust /// Drains the in_progress row indexe

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2803478057 > > Thanks for sharing the results @zhuqi-lucas this is really interesting! > > I think it mainly shows that we probably should try and use more efficient in memory sorting (e.g.

Re: [PR] chore: clean up `planner.rs` [datafusion-comet]

2025-04-14 Thread via GitHub
codecov-commenter commented on PR #1650: URL: https://github.com/apache/datafusion-comet/pull/1650#issuecomment-2803580942 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1650?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-14 Thread via GitHub
akurmustafa commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2803679589 Thanks @lalaorya. It may be worth: - checking whether the `output_partitioning` flag is implemented for your source operator. - checking whether the `EnforceDistribution` rule

Re: [PR] doc/document options clause [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15708: URL: https://github.com/apache/datafusion/pull/15708#issuecomment-2802869717 Run extended tests

Re: [I] CLI query result header for cast expressions with literals is confusing [datafusion]

2025-04-14 Thread via GitHub
alamb commented on issue #5221: URL: https://github.com/apache/datafusion/issues/5221#issuecomment-2802853558 > I believe the correct behaviour in DataFusion should be something like: That seems reasonable to me ``` +--+ | CAST('1' AS INT) | +---

Re: [PR] doc/document options clause [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15708: URL: https://github.com/apache/datafusion/pull/15708#issuecomment-2802871540 > Run extended tests - FYI, I am using this PR as a way to test https://github.com/apache/datafusion/pull/15704 from @ashdnazg

Re: [PR] Add `statistics_by_partition API` to ExecutionPlan [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2802923180 From my perspective, we should plan to merge this PR after we release DataFusion 47. I would be fine with modifying the existing `statistics` method as suggested by @berkaysynnada in

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
parthchandra commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042968195 ## datafusion/common/src/config.rs: ## @@ -459,6 +459,14 @@ config_namespace! { /// BLOB instead. pub binary_as_string: bool, default = fals

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
parthchandra commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042969802 ## datafusion/datasource-parquet/src/source.rs: ## @@ -438,6 +438,22 @@ impl ParquetSource { } } +/// Parses datafusion.common.config.ParquetOptions.c

Re: [PR] [WIP] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-14 Thread via GitHub
parthchandra commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2043091790 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,136 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benc

Re: [PR] Attach Diagnostic to syntax errors [datafusion]

2025-04-14 Thread via GitHub
logan-keede commented on PR #15680: URL: https://github.com/apache/datafusion/pull/15680#issuecomment-2803058957 > Thank you @logan-keede ! > > Is it possible to add a test for this feature, perhaps in > > https://github.com/apache/datafusion/blob/63f37a34404391f19114407c2a3965

Re: [PR] fix: fix spark/sql test failures in native_iceberg_compat [datafusion-comet]

2025-04-14 Thread via GitHub
parthchandra commented on PR #1593: URL: https://github.com/apache/datafusion-comet/pull/1593#issuecomment-2803362330 @andygrove I had introduced a regression which I fixed since your approval (if you want to look again) Basically, the results of the call to `findRowIndexColumnIndexInSch

Re: [PR] chore: Prepare for datafusion 47.0.0 and arrow-rs 55.0.0 [datafusion-comet]

2025-04-14 Thread via GitHub
comphead commented on PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#issuecomment-2803327686 > @comphead Could you review the HDFS changes? There was a change in objectstore to use `Range<u64>` instead of `Range<usize>` Thanks @andygrove I think these changes LGTM

[I] Postgres `CREATE SERVER` can't be parsed [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
achristmascarl opened a new issue, #1814: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1814 Postgres's `CREATE SERVER` fails to be parsed with the following error: ```rust ParserError("Expected: an object type after CREATE, found: SERVER at Line: 1, Column: 8") ```

[I] API to match against any error in chain [datafusion]

2025-04-14 Thread via GitHub
DerGut opened a new issue, #15713: URL: https://github.com/apache/datafusion/issues/15713 ### Is your feature request related to a problem or challenge? Wrapping a `DataFusionError` in a `DataFusionError::Context` can break behavior for users if they `match` on specific error variants
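The problem described in this issue can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: `ToyError`, `root_cause`, and the two `is_resources_exhausted_*` helpers are stand-ins invented for illustration and are not the DataFusion API. The sketch assumes only what the issue text states, namely that an error can be wrapped in a `Context` variant and that callers `match` on specific variants.

```rust
// Hypothetical stand-in for an error enum with a context-wrapping variant.
// This is NOT the real `DataFusionError`; names are assumptions for illustration.
#[derive(Debug)]
enum ToyError {
    ResourcesExhausted(String),
    Plan(String),
    // A message plus the wrapped inner error.
    Context(String, Box<ToyError>),
}

impl ToyError {
    // Walk through any number of `Context` layers and return the innermost error.
    fn root_cause(&self) -> &ToyError {
        let mut current = self;
        while let ToyError::Context(_, inner) = current {
            current = &**inner;
        }
        current
    }
}

// Matches only the outermost variant, so a wrapped error is missed.
fn is_resources_exhausted_naive(err: &ToyError) -> bool {
    matches!(err, ToyError::ResourcesExhausted(_))
}

// Matches against the innermost error, so wrapping does not change the answer.
fn is_resources_exhausted_in_chain(err: &ToyError) -> bool {
    matches!(err.root_cause(), ToyError::ResourcesExhausted(_))
}

fn main() {
    let wrapped = ToyError::Context(
        "while sorting".to_string(),
        Box::new(ToyError::ResourcesExhausted("memory limit reached".to_string())),
    );
    let other = ToyError::Plan("bad plan".to_string());

    println!("naive on wrapped: {}", is_resources_exhausted_naive(&wrapped));
    println!("chain on wrapped: {}", is_resources_exhausted_in_chain(&wrapped));
    println!("chain on other:   {}", is_resources_exhausted_in_chain(&other));
}
```

In this sketch the naive check stops matching as soon as a context layer is added, while the chain-aware check keeps working. Note that it inspects only the innermost error; an API that matches against any error in the chain, as the issue title requests, would also need to test the intermediate layers.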

[PR] doc : update RepartitionExec display tree [datafusion]

2025-04-14 Thread via GitHub
getChan opened a new pull request, #15710: URL: https://github.com/apache/datafusion/pull/15710 ## Which issue does this PR close? ## Rationale for this change #15606 changed repartitionExec display tree ## What changes are included in this PR?

Re: [PR] refactor!: consistent null handling in coercible signatures [datafusion]

2025-04-14 Thread via GitHub
alan910127 commented on PR #15404: URL: https://github.com/apache/datafusion/pull/15404#issuecomment-2802335458 Hi @alamb, I’ve updated `upgrading.md`. Since this is my first PR with an API change, I’m not 100% sure I did everything right. Would appreciate it if you could take a look and le

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-14 Thread via GitHub
geoffreyclaude commented on code in PR #15697: URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042765938 ## benchmarks/bench.sh: ## @@ -212,6 +212,10 @@ main() { # same data as for tpch data_tpch "1"

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-14 Thread via GitHub
andygrove commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2802283042 @alamb I am hoping that we can merge https://github.com/apache/datafusion/pull/15537 for this release. It was just rebased now that the arrow-rs upgrade is merged.

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
mbutrovich commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042769032 ## datafusion/datasource-parquet/src/source.rs: ## @@ -438,6 +438,22 @@ impl ParquetSource { } } +/// Parses datafusion.common.config.ParquetOptions.coe

Re: [PR] tests: only refresh the minimum sysinfo in mem limit tests. [datafusion]

2025-04-14 Thread via GitHub
jayzhan211 commented on PR #15702: URL: https://github.com/apache/datafusion/pull/15702#issuecomment-2800721115 Run extended tests

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-14 Thread via GitHub
acking-you commented on code in PR #15694: URL: https://github.com/apache/datafusion/pull/15694#discussion_r2041635786 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -811,58 +822,164 @@ impl BinaryExpr { } } +enum ShortCircuitStrategy<'a> { +None, +

[PR] doc/document options clause [datafusion]

2025-04-14 Thread via GitHub
marvelshan opened a new pull request, #15708: URL: https://github.com/apache/datafusion/pull/15708 ## Which issue does this PR close? - Closes #10451 ## Rationale for this change This PR adds documentation for the `OPTIONS` clause, including generic opt

Re: [PR] Per file filter evaluation [datafusion]

2025-04-14 Thread via GitHub
adriangb commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2801415154 > This is likely only applied to parquet filter so we can rewrite the filter when we know the filter + file_schema + table_schema (probably `build_row_filter`). We don't need optimi

Re: [PR] feat: support min/max for struct [datafusion]

2025-04-14 Thread via GitHub
chenkovsky commented on code in PR #15667: URL: https://github.com/apache/datafusion/pull/15667#discussion_r2042084363 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -610,10 +611,57 @@ fn min_batch(values: &ArrayRef) -> Result { min_binary_view

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-14 Thread via GitHub
suibianwanwank commented on code in PR #15697: URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042485745 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -202,27 +204,121 @@ impl TopK { }) .collect::>>()?; +// Selected indi

Re: [PR] Consolidate statistics merging code (try 2) [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15661: URL: https://github.com/apache/datafusion/pull/15661#issuecomment-2802249840 I hope you feel better!

Re: [PR] feat: Override MapBuilder values field with expected schema [datafusion-comet]

2025-04-14 Thread via GitHub
comphead commented on PR #1643: URL: https://github.com/apache/datafusion-comet/pull/1643#issuecomment-2802258771 Thanks @andygrove for the review

Re: [PR] feat: Override MapBuilder values field with expected schema [datafusion-comet]

2025-04-14 Thread via GitHub
comphead merged PR #1643: URL: https://github.com/apache/datafusion-comet/pull/1643

Re: [I] Read STRUCT of MAP fields with datafusion reader fails with schema issue [datafusion-comet]

2025-04-14 Thread via GitHub
comphead closed issue #1633: Read STRUCT of MAP fields with datafusion reader fails with schema issue URL: https://github.com/apache/datafusion-comet/issues/1633

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2042317653 ## tests/sqlparser_mssql.rs: ## @@ -2036,3 +2036,78 @@ fn parse_mssql_merge_with_output() { OUTPUT $action, deleted.ProductID INTO dsi.temp_

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2042331806 ## src/ast/mod.rs: ## @@ -4050,6 +4050,14 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Go (SQL Server) +

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-14 Thread via GitHub
acking-you commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2801131412 > Very cool. It would be nice to run some e2e benchmarks (TPC-H, clickbench) with this to see the impact here. I tried running clickbench, and there wasn't a significant imp

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on code in PR #15697: URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042788789 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -202,27 +204,121 @@ impl TopK { }) .collect::>>()?; +// Selected indices i

[PR] Refactor regexp slt tests [datafusion]

2025-04-14 Thread via GitHub
kumarlokesh opened a new pull request, #15709: URL: https://github.com/apache/datafusion/pull/15709 ## Which issue does this PR close? - Closes #14452. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-14 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2801793059 I'm working on the failures now

[I] Re-enable tests for FIRST/LAST [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove opened a new issue, #1646: URL: https://github.com/apache/datafusion-comet/issues/1646 ### What is the problem the feature request solves? During the upgrade to DataFusion 47.0.0 () it was necessary to disable some tests that use FIRST and LAST because the behavior of these

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r204227 ## src/ast/ddl.rs: ## @@ -2157,6 +2157,10 @@ impl fmt::Display for ClusteredBy { #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] #[c

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2042196775 ## src/parser/mod.rs: ## @@ -5135,6 +5146,63 @@ impl<'a> Parser<'a> { })) } +/// Parse `CREATE FUNCTION` for [SQL Server] +//

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2042207498 ## src/ast/mod.rs: ## @@ -4050,6 +4051,16 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Return (SQL Server

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2042206786 ## src/parser/mod.rs: ## @@ -5135,6 +5146,63 @@ impl<'a> Parser<'a> { })) } +/// Parse `CREATE FUNCTION` for [SQL Server] +//

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2042385712 ## src/parser/mod.rs: ## @@ -15017,6 +15026,48 @@ impl<'a> Parser<'a> { } } +fn parse_go(&mut self) -> Result { +// previ

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2042442753 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -251,3 +261,18 @@ pub trait GroupsAccumulator: Send { /// compute, not `O(num_groups)` fn siz

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2042444937 ## datafusion/expr-common/src/ordering.rs: ## Review Comment: Moved from `datafusion/physical-plan/src/ordering.rs` to be able to access it in `datafusion/e

Re: [PR] Add support for `PRINT` statement for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1811: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1811#discussion_r2042340044 ## src/ast/mod.rs: ## @@ -4050,6 +4050,14 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// ```sql +/// PR

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2800856291 | However, one point needs to be confirmed: [filter_record_batch](https://docs.rs/arrow-select/54.2.1/src/arrow_select/filter.rs.html#202-205) will retain rows that are null.

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#issuecomment-2802226543 FYI for reviewers I rebased this on the create function branch, due to the return statement logic here: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#disc

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
mbutrovich commented on PR #15537: URL: https://github.com/apache/datafusion/pull/15537#issuecomment-2802225475 I believe all dependencies are updated, marking this as ready for review.

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-14 Thread via GitHub
parthchandra commented on code in PR #15537: URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042746875 ## datafusion/sqllogictest/test_files/information_schema.slt: ## @@ -296,6 +297,7 @@ datafusion.execution.parquet.bloom_filter_fpp NULL (writing) Sets bloom

[PR] [WIP] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove opened a new pull request, #1647: URL: https://github.com/apache/datafusion-comet/pull/1647 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
mvzink commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2802376530 > could you please specify what you see as a blocker for this PR? No blockers IMO; like I said about the "putting subparsers inside `Dialect`" topic, that's just my

Re: [I] CLI query result header for cast expressions with literals is confusing [datafusion]

2025-04-14 Thread via GitHub
qstommyshu commented on issue #5221: URL: https://github.com/apache/datafusion/issues/5221#issuecomment-2801969793 Looks like this issue still persists in DataFusion CLI v46.0.1, and this behaviour is not the same as it is in PostgreSQL. In datafusion: ``` select cast('1' as int);

Re: [PR] feat: Override MapBuilder values field with expected schema [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove commented on code in PR #1643: URL: https://github.com/apache/datafusion-comet/pull/1643#discussion_r2042233446 ## native/core/src/execution/shuffle/row.rs: ## @@ -1876,937 +1878,834 @@ fn make_builders( (DataType::Boolean, DataType::Boolean) => {

Re: [PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-14 Thread via GitHub
DerGut commented on code in PR #15692: URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041911140 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1552,6 +1593,62 @@ mod tests { Ok(()) } +#[tokio::test] +async fn test_batch_reservati

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-14 Thread via GitHub
ashdnazg commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2800693022 Seems to be contention with `refresh_all` in the memory monitoring task. PR here: https://github.com/apache/datafusion/pull/15702

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#issuecomment-2802223801 This fails to parse on this branch, but should: ```mssql CREATE TRIGGER some_trigger ON some_table FOR INSERT AS BEGIN IF 1=1 BEGIN

Re: [PR] Upgrade to arrow/parquet 55, and `object_store` to `0.12.0` and pyo3 to `0.24.0` [datafusion]

2025-04-14 Thread via GitHub
alamb commented on PR #15466: URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2801288477 Thanks everyone!

Re: [PR] chore: Prepare for datafusion 47.0.0 and arrow-rs 55.0.0 [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove commented on PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#issuecomment-2801995151 @comphead Could you review the HDFS changes? There was a change in objectstore to use `Range<u64>` instead of `Range<usize>`

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2802187978 > IMO the downside of this approach is that it allows to misuse the interface - it shouldn't be possible to call `register_metadata` after calling `update_batch`. > > I don't

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2801979793 I found out that in `row_hash.rs` the ordering can be changed in the middle after spilling and it can be really beneficial, so unless we recreate the group accumulators after changi

Re: [PR] Add support for `PRINT` statement for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1811: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1811#discussion_r2042341589 ## tests/sqlparser_mssql.rs: ## @@ -2036,3 +2036,37 @@ fn parse_mssql_merge_with_output() { OUTPUT $action, deleted.ProductID INTO dsi.temp_

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2042359100 ## src/parser/mod.rs: ## @@ -475,6 +475,12 @@ impl<'a> Parser<'a> { if expecting_statement_delimiter && word.keyword == Keyword

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2042317653 ## tests/sqlparser_mssql.rs: ## @@ -2036,3 +2036,78 @@ fn parse_mssql_merge_with_output() { OUTPUT $action, deleted.ProductID INTO dsi.temp_

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-14 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2042434307 ## src/ast/mod.rs: ## @@ -4050,6 +4051,16 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Return (SQL Server

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on code in PR #15697: URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042643102 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -202,27 +204,121 @@ impl TopK { }) .collect::<Result<Vec<_>>>()?; +// Selected indices i

Re: [PR] tests: only refresh the minimum sysinfo in mem limit tests. [datafusion]

2025-04-14 Thread via GitHub
ashdnazg commented on PR #15702: URL: https://github.com/apache/datafusion/pull/15702#issuecomment-2800693884 Can we run the extended tests on this PR?

Re: [PR] Fix internal error in sort when hitting memory limit [datafusion]

2025-04-14 Thread via GitHub
DerGut commented on code in PR #15692: URL: https://github.com/apache/datafusion/pull/15692#discussion_r2041254566 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -759,12 +761,51 @@ impl ExternalSorter { if self.runtime.disk_manager.tmp_files_enabled() {

Re: [PR] Upgrade to arrow/parquet 55, and `object_store` to `0.12.0` and pyo3 to `0.24.0` [datafusion]

2025-04-14 Thread via GitHub
alamb merged PR #15466: URL: https://github.com/apache/datafusion/pull/15466

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-14 Thread via GitHub
alamb commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2801294273 Ok, I just merged https://github.com/apache/datafusion/pull/15466 / upgrade to dependencies (arrow/object_store/parquet) 47.0.0 I don't know of anything else we are now wait

Re: [PR] tests: only refresh the minimum sysinfo in mem limit tests. [datafusion]

2025-04-14 Thread via GitHub
2010YOUY01 commented on PR #15702: URL: https://github.com/apache/datafusion/pull/15702#issuecomment-2801137808 Thank you for the fix. To test this PR, I think you can push the change to your cloned repository of `datafusion`'s main branch, and the extended test CI will be triggered there.

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-14 Thread via GitHub
acking-you commented on code in PR #15694: URL: https://github.com/apache/datafusion/pull/15694#discussion_r2041670623 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -811,58 +822,164 @@ impl BinaryExpr { } } +enum ShortCircuitStrategy<'a> { +None, +

[PR] Make tree the Default EXPLAIN Format and Reorder Documentation Sections [datafusion]

2025-04-14 Thread via GitHub
kosiew opened a new pull request, #15706: URL: https://github.com/apache/datafusion/pull/15706 ## Which issue does this PR close? Closes #15705 ## Rationale for this change #15427 changed the default EXPLAIN format to `tree`. This PR updates the EXPLAIN docume

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-14 Thread via GitHub
Dandandan commented on code in PR #15697: URL: https://github.com/apache/datafusion/pull/15697#discussion_r2041528193 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -202,27 +204,99 @@ impl TopK { }) .collect::<Result<Vec<_>>>()?; +// selected indices +

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-14 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2801746776 > I've made an attempt [pydantic@2dfa8b8](https://github.com/pydantic/datafusion/commit/2dfa8b803f2103c6ff81cfa483dbb70150feeb67) > > I hope things become more clear now. I j

Re: [PR] chore: Prepare for datafusion 47.0.0 and arrow-rs 55.0.0 [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove commented on code in PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#discussion_r2042171775 ## native/core/src/execution/expressions/bloom_filter_might_contain.rs: ## @@ -140,4 +141,8 @@ impl PhysicalExpr for BloomFilterMightContain { A

Re: [PR] chore: Prepare for datafusion 47.0.0 and arrow-rs 55.0.0 [datafusion-comet]

2025-04-14 Thread via GitHub
andygrove commented on code in PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#discussion_r2042173423 ## native/core/src/parquet/parquet_exec.rs: ## @@ -80,23 +82,33 @@ pub(crate) fn init_datasource_exec( parquet_source = parquet_source.with_pre

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-04-14 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2041544882 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -431,12 +422,16 @@ impl ExternalSorter { let batches_to_spill = std::mem::take(globally_sorted_batch
