[PR] feat: expose submit and cancel job methods as public in scheduler [datafusion-ballista]

2025-05-09 Thread via GitHub
milenkovicm opened a new pull request, #1260: URL: https://github.com/apache/datafusion-ballista/pull/1260 # Which issue does this PR close? Closes #. # Rationale for this change - provide the ability to submit and cancel jobs without going through gRPC

Re: [I] [feature] allow pretty-printing [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
alamb commented on issue #1845: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1845#issuecomment-2867279673 If the idea is that `{:#}` printing of AST nodes would result in printing a "pretty" version of the sql statements (potentially with newlines, etc) sounds like a great
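
For readers unfamiliar with the `{:#}` flag mentioned above: in Rust, a `Display` implementation can check `Formatter::alternate()` and emit a different layout when the caller uses `{:#}`. The sketch below illustrates that mechanism only. The `Select` struct and its layout are hypothetical stand-ins, not sqlparser's AST types or the output format proposed in the issue.

```rust
use std::fmt;

/// Hypothetical, simplified AST node (not sqlparser's `Statement`).
struct Select {
    columns: Vec<String>,
    table: String,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}` was requested: one clause element per line, indented.
            write!(
                f,
                "SELECT\n  {}\nFROM\n  {}",
                self.columns.join(",\n  "),
                self.table
            )
        } else {
            // Plain `{}`: compact single-line form.
            write!(f, "SELECT {} FROM {}", self.columns.join(", "), self.table)
        }
    }
}

/// Builds a small example query for demonstration.
fn example() -> Select {
    Select {
        columns: vec!["a".to_string(), "b".to_string()],
        table: "t".to_string(),
    }
}

fn main() {
    let q = example();
    println!("{}", q); // compact form
    println!("{:#}", q); // pretty form
}
```

The appeal of this design is that existing `{}` callers keep the single-line output, while pretty-printing is opt-in through a standard formatting flag.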

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-09 Thread via GitHub
parthchandra commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2867552108 The config `CometConf.COMET_SCHEMA_EVOLUTION_ENABLED` is valid for Parquet files as well so removing it is not correct imo. Also, `ScanRule` is called at planning time wh

[PR] Migrate Optimizer tests to insta, part7 [datafusion]

2025-05-09 Thread via GitHub
qstommyshu opened a new pull request, #16010: URL: https://github.com/apache/datafusion/pull/16010 ## Which issue does this PR close? - Related #15396 , #15446, #15884, #15893, #15937, #15945, #15984 ## Rationale for this change ## What changes are include

[I] Support Aggregating by `RunArray`s [datafusion]

2025-05-09 Thread via GitHub
brancz opened a new issue, #16011: URL: https://github.com/apache/datafusion/issues/16011 ### Is your feature request related to a problem or challenge? It's currently not possible to aggregate by `RunArrays`. Example code grouping by a `RunArray` ```rust use arrow

Re: [I] Implement method to apply scalar or aggregate function to Array elements [datafusion]

2025-05-09 Thread via GitHub
alamb commented on issue #15882: URL: https://github.com/apache/datafusion/issues/15882#issuecomment-2867583963 > I'd like to work on implementing this approach. What do you think about organizing array operations this way? If this approach seems reasonable, I'm happy to start working on it

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-09 Thread via GitHub
alamb commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2082330048 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1494,6 +1494,14 @@ impl LogicalPlan { let mut param_types: HashMap> = HashMap::new(); self.a

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-09 Thread via GitHub
parthchandra commented on PR #1723: URL: https://github.com/apache/datafusion-comet/pull/1723#issuecomment-2867573764 One set of Spark Sql test failures in native_iceberg_compat is exactly because this schema evolution/type promotion check is not being performed correctly (or not being per

Re: [PR] Box field to reduce DatafusionError size [datafusion]

2025-05-09 Thread via GitHub
comphead commented on PR #15990: URL: https://github.com/apache/datafusion/pull/15990#issuecomment-2867598303 API changes are pretty expensive from the user migration point of view; I'd vote to keep this as is until we get an issue justifying the change -- This is an automated message from

Re: [PR] Box field to reduce DatafusionError size [datafusion]

2025-05-09 Thread via GitHub
ctsk commented on PR #15990: URL: https://github.com/apache/datafusion/pull/15990#issuecomment-2867591550 No, there's no particular issue motivating this change. I just think it's good practice to not have enum variants differ in size so much. In addition, `SchemaError::FieldNotFound` alrea
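
As background for the size argument above: a Rust enum is at least as large as its largest variant, so boxing a large payload shrinks every value of the enum at the cost of a heap allocation for that variant. The sketch below uses hypothetical types (`BigPayload`, `Unboxed`, `Boxed`), not DataFusion's actual error enum, and prints sizes rather than asserting specific byte counts, since those depend on the target platform.

```rust
use std::mem::size_of;

/// Hypothetical large payload (twelve u64 values).
#[allow(dead_code)]
struct BigPayload {
    data: [u64; 12],
}

/// Enum whose largest variant stores the payload inline.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Big(BigPayload),
}

/// Same enum, but the large payload lives behind a pointer.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Big(Box<BigPayload>),
}

fn main() {
    // Every `Unboxed` value pays for the largest variant, even `Small`.
    println!("payload: {} bytes", size_of::<BigPayload>());
    println!("unboxed: {} bytes", size_of::<Unboxed>());
    println!("boxed:   {} bytes", size_of::<Boxed>());
}
```

The trade-off raised in the thread still applies: changing a public variant's field type is an API change that downstream users have to migrate to.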

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#issuecomment-2867602988 Thanks for the reviews @comphead and @parthchandra

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio merged PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835

[I] Processing logic of the function `ceil`/`floor` for Decimal128 datatype seems to be wrong [datafusion-comet]

2025-05-09 Thread via GitHub
tlm365 opened a new issue, #1729: URL: https://github.com/apache/datafusion-comet/issues/1729 ### Describe the bug While adding unit tests for `ceil`/`floor` I accidentally discovered this issue, it seems like this is an error ### Steps to reproduce Refer PR #1728 (comme
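
For context on what `ceil`/`floor` mean for a decimal stored as a scaled integer, here is a minimal reference sketch. It assumes the value is held as an unscaled `i128` with `scale` fractional digits and that the result is returned as a whole number (scale 0). These helper names and that return convention are assumptions for illustration only; this is not Comet's implementation and makes no claim about where the reported bug lies.

```rust
/// Ceiling of `unscaled / 10^scale`, returned as a whole number.
/// Hypothetical helper for illustration.
fn decimal_ceil(unscaled: i128, scale: u32) -> i128 {
    let div = 10_i128.pow(scale);
    let q = unscaled / div; // integer division truncates toward zero
    let r = unscaled % div; // remainder has the sign of `unscaled`
    // Truncation already rounds negatives up; positives with a
    // fractional part need one more step.
    if r > 0 { q + 1 } else { q }
}

/// Floor of `unscaled / 10^scale`, returned as a whole number.
/// Hypothetical helper for illustration.
fn decimal_floor(unscaled: i128, scale: u32) -> i128 {
    let div = 10_i128.pow(scale);
    let q = unscaled / div;
    let r = unscaled % div;
    // Truncation already rounds positives down; negatives with a
    // fractional part need one more step.
    if r < 0 { q - 1 } else { q }
}

fn main() {
    // 1.5 and -1.5 stored with scale 1
    println!("ceil(1.5)   = {}", decimal_ceil(15, 1));
    println!("ceil(-1.5)  = {}", decimal_ceil(-15, 1));
    println!("floor(1.5)  = {}", decimal_floor(15, 1));
    println!("floor(-1.5) = {}", decimal_floor(-15, 1));
}
```

Negative inputs are the usual source of surprises, because Rust's `/` truncates toward zero rather than rounding down.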

Re: [PR] Minor: Add unit tests for `ceil`/`floor` functions [datafusion-comet]

2025-05-09 Thread via GitHub
tlm365 commented on code in PR #1728: URL: https://github.com/apache/datafusion-comet/pull/1728#discussion_r2082843668 ## native/spark-expr/src/math_funcs/ceil.rs: ## @@ -81,3 +81,162 @@ fn decimal_ceil_f(scale: &i8) -> impl Fn(i128) -> i128 { let div = 10_i128.pow_wrapping

Re: [PR] Postgresql ALTER TABLE operation: REPLICA IDENTITY [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio commented on code in PR #1844: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1844#discussion_r2082694788 ## src/ast/ddl.rs: ## @@ -39,6 +39,27 @@ use crate::ast::{ use crate::keywords::Keyword; use crate::tokenizer::Token; +#[derive(Debug, Clone, Partia

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-09 Thread via GitHub
adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2868157717 A couple of thoughts: 1. Needs cleanup. 2. Not sure how to construct the empty stream. 3. It might be nice to implement pruning for `Vec` where each statistic represents an

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2868158991 Closing this PR per my comment in https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2072719447 Thanks again @aharpervc for working on this and sorr

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio closed pull request #1809: Add support for `GO` batch delimiter in SQL Server URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809

[PR] Minor: Add unit tests for `ceil`/`floor` functions [datafusion-comet]

2025-05-09 Thread via GitHub
tlm365 opened a new pull request, #1728: URL: https://github.com/apache/datafusion-comet/pull/1728 ## Which issue does this PR close? Closes #. ## Rationale for this change The unit tests for the `ceil`/`floor` functions are missing ## What changes are included

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-09 Thread via GitHub
parthchandra commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2868109380 I thought the plan was to fix this by falling back to Spark? We can check if the schema has a field name matching `FileFormat.ROW_INDEX_TEMPORARY_COLUMN_NAME` and fall b

Re: [PR] fix: allow arbitrary operators with ANY and ALL on Postgres [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio commented on code in PR #1842: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1842#discussion_r2082691233 ## src/parser/mod.rs: ## @@ -3471,7 +3471,7 @@ impl<'a> Parser<'a> { right }; -if !matches!( +

Re: [I] Add support for event tracing for visualizing where time is spent during execution [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove closed issue #1705: Add support for event tracing for visualizing where time is spent during execution URL: https://github.com/apache/datafusion-comet/issues/1705

Re: [PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15995: URL: https://github.com/apache/datafusion/pull/15995#discussion_r2081509137 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -3585,12 +3605,10 @@ mod tests { prune_with_expr( // false -// constan

Re: [PR] Add h2o window benchmark [datafusion]

2025-05-09 Thread via GitHub
2010YOUY01 commented on code in PR #16003: URL: https://github.com/apache/datafusion/pull/16003#discussion_r2081536976 ## benchmarks/queries/h2o/groupby.sql: ## @@ -1,10 +1,19 @@ SELECT id1, SUM(v1) AS v1 FROM x GROUP BY id1; + Review Comment: It's a hack: before, the runne

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-05-09 Thread via GitHub
XiangpengHao commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2866642602 > Since you previously expressed interest in this project, would you be open to co-mentoring? > [@XiangpengHao](https://github.com/XiangpengHao) [@Rachelint](https://git

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15954: URL: https://github.com/apache/datafusion/pull/15954#discussion_r2081994480 ## datafusion/physical-plan/src/test/exec.rs: ## @@ -614,6 +614,103 @@ impl ExecutionPlan for StatisticsExec { } } +/// A mock execution plan that return

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15954: URL: https://github.com/apache/datafusion/pull/15954#discussion_r2081997257 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -3041,4 +3048,96 @@ mod tests { run_test_with_spill_pool_if_necessary(20_000, false).await?;

Re: [PR] Box field to reduce DatafusionError size [datafusion]

2025-05-09 Thread via GitHub
ctsk closed pull request #15990: Box field to reduce DatafusionError size URL: https://github.com/apache/datafusion/pull/15990

Re: [PR] Box field to reduce DatafusionError size [datafusion]

2025-05-09 Thread via GitHub
ctsk commented on PR #15990: URL: https://github.com/apache/datafusion/pull/15990#issuecomment-2867666348 That seems reasonable. Closing.

Re: [PR] feat(proto): udf decoding fallback [datafusion]

2025-05-09 Thread via GitHub
leoyvens commented on PR #15997: URL: https://github.com/apache/datafusion/pull/15997#issuecomment-2866429693 @alamb thanks for your review, I've extended a test to cover this.

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-05-09 Thread via GitHub
Rachelint commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2867248618 > Since you previously expressed interest in this project, would you be open to co-mentoring? >[@XiangpengHao](https://github.com/XiangpengHao) [@Rachelint](https://git

Re: [PR] Migrate Optimizer tests to insta, part7 [datafusion]

2025-05-09 Thread via GitHub
qstommyshu commented on PR #16010: URL: https://github.com/apache/datafusion/pull/16010#issuecomment-2867770264 This is the last PR for #15396. I also generalized `assert_optimized_plan_eq_snapshot` so that all optimizer test files can reuse this function.

Re: [I] Support metadata on literal values [datafusion]

2025-05-09 Thread via GitHub
timsaucer commented on issue #15797: URL: https://github.com/apache/datafusion/issues/15797#issuecomment-2867767697 In addition to the above, we are dropping metadata during optimization; specifically, it looks like it is happening in the `OptimizeProjections` rule. Still investigating.

Re: [I] Select with order by from empty table triggers SanityCheckPlan error [datafusion]

2025-05-09 Thread via GitHub
osipovartem commented on issue #16001: URL: https://github.com/apache/datafusion/issues/16001#issuecomment-2866169342 We cannot pass DF SanityCheck since FileScanConfig implements DataSource trait with ```rust impl DataSource for FileScanConfig { . fn output_partitionin

[I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove opened a new issue, #1725: URL: https://github.com/apache/datafusion-comet/issues/1725 ### What is the problem the feature request solves? DataFusion just added support for SMJ with RightSemi join in https://github.com/apache/datafusion/pull/15972, so we should be able to s

[PR] Update arrow/parquet `55.1.0` [datafusion]

2025-05-09 Thread via GitHub
alamb opened a new pull request, #16012: URL: https://github.com/apache/datafusion/pull/16012 ## Which issue does this PR close? N/a ## Rationale for this change We released a new version of arrow which is faster and better, so let's update it in DataFusion ## What ch

Re: [PR] Update arrow/parquet `55.1.0` [datafusion]

2025-05-09 Thread via GitHub
alamb commented on code in PR #16012: URL: https://github.com/apache/datafusion/pull/16012#discussion_r2082424139 ## datafusion/sqllogictest/test_files/struct.slt: ## @@ -53,9 +53,9 @@ select * from struct_values; query TT select arrow_typeof(s1), arrow_typeof(s2) from struct_

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2082425046 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2082424681 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

[I] CometMemoryPool sometimes goes negative [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove opened a new issue, #1726: URL: https://github.com/apache/datafusion-comet/issues/1726 ### Describe the bug Using the new profiling features added in https://github.com/apache/datafusion-comet/pull/1706, I have seen that the `used` memory attribute in `CometMemoryPool` some

[PR] build(deps): bump ring from 0.17.9 to 0.17.14 [datafusion-python]

2025-05-09 Thread via GitHub
dependabot[bot] opened a new pull request, #1124: URL: https://github.com/apache/datafusion-python/pull/1124 Bumps [ring](https://github.com/briansmith/ring) from 0.17.9 to 0.17.14. Changelog Sourced from [ring's changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md).

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.13 [datafusion-python]

2025-05-09 Thread via GitHub
dependabot[bot] commented on PR #1044: URL: https://github.com/apache/datafusion-python/pull/1044#issuecomment-2867809815 Superseded by #1124.

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.13 [datafusion-python]

2025-05-09 Thread via GitHub
dependabot[bot] closed pull request #1044: build(deps): bump ring from 0.17.9 to 0.17.13 URL: https://github.com/apache/datafusion-python/pull/1044

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-09 Thread via GitHub
berkaysynnada commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2866549061 Hi @Rachelint, it seems you've been focusing on other topics over the past few weeks and haven’t had a chance to pick this up. If you don't mind, I’d like to take this on

[PR] Add h2o window benchmark [datafusion]

2025-05-09 Thread via GitHub
2010YOUY01 opened a new pull request, #16003: URL: https://github.com/apache/datafusion/pull/16003 ## Which issue does this PR close? - Closes #. ## Rationale for this change DataFusion doesn't have an integration benchmark focused on window functions yet. Duck

Re: [I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-09 Thread via GitHub
alamb commented on issue #16000: URL: https://github.com/apache/datafusion/issues/16000#issuecomment-2867822624 I think passing the partitioned file (with the optional per-file statistics) is an excellent idea 👍

Re: [I] Support Aggregating by `RunArray`s [datafusion]

2025-05-09 Thread via GitHub
alamb commented on issue #16011: URL: https://github.com/apache/datafusion/issues/16011#issuecomment-2867819386 I think a specialization for RunEndArrays will likely be quite powerful

Re: [PR] Fix: `build_predicate_expression` method doesn't process `false` expr correctly [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15995: URL: https://github.com/apache/datafusion/pull/15995#discussion_r2081517734 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1538,13 +1550,21 @@ fn build_predicate_expression( build_predicate_expression(&right, schema

[I] [datafusion-spark] Optimize `char` expression [datafusion]

2025-05-09 Thread via GitHub
andygrove opened a new issue, #16009: URL: https://github.com/apache/datafusion/issues/16009 ### Is your feature request related to a problem or challenge? PR https://github.com/apache/datafusion/pull/15994 added an implementation of `char`. There was a suggested optimization in the r

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-09 Thread via GitHub
alamb commented on PR #15984: URL: https://github.com/apache/datafusion/pull/15984#issuecomment-2867208259 🎉

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-05-09 Thread via GitHub
alamb commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2867211346 > One of our projects on spilling execution has been accepted by Google, and I’ll be the primary mentor. I likewise will be very happy to help

Re: [PR] Add h2o window benchmark [datafusion]

2025-05-09 Thread via GitHub
Dandandan commented on PR #16003: URL: https://github.com/apache/datafusion/pull/16003#issuecomment-2866398017 > Looks like they're around 10 times faster Probably not very comparable, as they execute it differently (different machine / different way of loading the data). But would no

Re: [PR] [datafusion-spark] Add Spark-compatible `char` expression [datafusion]

2025-05-09 Thread via GitHub
andygrove commented on code in PR #15994: URL: https://github.com/apache/datafusion/pull/15994#discussion_r2082124914 ## datafusion/spark/src/function/string/char.rs: ## @@ -0,0 +1,130 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] chore(deps): bump nix from 0.29.0 to 0.30.1 [datafusion]

2025-05-09 Thread via GitHub
comphead merged PR #16002: URL: https://github.com/apache/datafusion/pull/16002

Re: [PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-09 Thread via GitHub
comphead commented on PR #16005: URL: https://github.com/apache/datafusion/pull/16005#issuecomment-2866887640 Please refer to the deprecation policy: https://datafusion.apache.org/contributor-guide/api-health.html#deprecation-guidelines

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-09 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2866894705 > Hi [@Rachelint](https://github.com/Rachelint), it seems you've been focusing on other topics over the past few weeks and haven’t had a chance to pick this up. If you don't m

Re: [PR] Box field to reduce DatafusionError size [datafusion]

2025-05-09 Thread via GitHub
comphead commented on PR #15990: URL: https://github.com/apache/datafusion/pull/15990#issuecomment-2866897984 Is 112 bytes still okay to keep on the stack? Is there any issue motivating this change?

Re: [PR] Implement RightSemi join for SortMergeJoin [datafusion]

2025-05-09 Thread via GitHub
comphead merged PR #15972: URL: https://github.com/apache/datafusion/pull/15972

[PR] refactor: remove deprecated `MemoryExec` [datafusion]

2025-05-09 Thread via GitHub
miroim opened a new pull request, #16007: URL: https://github.com/apache/datafusion/pull/16007 ## Which issue does this PR close? Closes of #15950 . ## Rationale for this change The `MemoryExec` structure was deprecated in DataFusion 46 and is scheduled for removal.

Re: [I] SortMergeJoin: Add RightSemi join support [datafusion]

2025-05-09 Thread via GitHub
comphead closed issue #13471: SortMergeJoin: Add RightSemi join support URL: https://github.com/apache/datafusion/issues/13471

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2081895986 ## src/parser/mod.rs: ## @@ -5203,19 +5203,73 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2081901411 ## src/parser/mod.rs: ## @@ -5203,19 +5203,73 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
aharpervc commented on PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#issuecomment-2866937145 > took a quick look and left a couple comments, @aharpervc could you rebase on main now that the other PR has landed, in order to remove the extra diff? Done 👍

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-09 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2867285850 > I think something like that is done already in the "convert to state" logic - it will dynamically decide to skip aggregating once it sees that the group vs input rows ratio is small.

[PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-09 Thread via GitHub
miroim opened a new pull request, #16005: URL: https://github.com/apache/datafusion/pull/16005 ## Which issue does this PR close? Part of #15950 . ## Rationale for this change The `JsonExec` structure was deprecated in DataFusion 46 and is scheduled for removal. Dev

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove merged PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-09 Thread via GitHub
qstommyshu commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2081559659 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1569,37 +1810,60 @@ mod test { let empty = empty_with_type(DataType::Boolean); l

[PR] Pin to pre-release object store [datafusion]

2025-05-09 Thread via GitHub
alamb opened a new pull request, #16013: URL: https://github.com/apache/datafusion/pull/16013 ## Which issue does this PR close? - Related to https://github.com/apache/arrow-rs-object-store/issues/287 ## Rationale for this change Let's test object store pre-release ##

Re: [I] CometMemoryPool sometimes goes negative [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on issue #1726: URL: https://github.com/apache/datafusion-comet/issues/1726#issuecomment-2867901734 This issue only affects the tracing feature, not actual Comet usage.

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-05-09 Thread via GitHub
adriangb commented on code in PR #15865: URL: https://github.com/apache/datafusion/pull/15865#discussion_r2082446025 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -334,4 +340,126 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::try_new_wi

[PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove opened a new pull request, #1727: URL: https://github.com/apache/datafusion-comet/pull/1727 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1726 ## Rationale for this change Fix a data race that caused

Re: [I] CometMemoryPool sometimes goes negative [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on issue #1726: URL: https://github.com/apache/datafusion-comet/issues/1726#issuecomment-2867885658 It appears to be a data race. ``` app-20250509150049-0037/0/stdout:CometTaskMemoryManager used went negative: -208311; totalAllocated=3653311522; totalReleased=3

Re: [PR] Update extending-operators.md [datafusion]

2025-05-09 Thread via GitHub
alamb commented on code in PR #15832: URL: https://github.com/apache/datafusion/pull/15832#discussion_r2082547957 ## docs/source/library-user-guide/extending-operators.md: ## @@ -19,4 +19,448 @@ # Extending DataFusion's operators: custom LogicalPlan and Execution Plans -Com

Re: [I] Support metadata on literal values [datafusion]

2025-05-09 Thread via GitHub
timsaucer commented on issue #15797: URL: https://github.com/apache/datafusion/issues/15797#issuecomment-2866320802 I believe I've tracked the issue down to `create_physical_expr` in the planner. This is where we are losing metadata. I think this is a relatively easy fix.

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-09 Thread via GitHub
blaginin merged PR #15984: URL: https://github.com/apache/datafusion/pull/15984

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-09 Thread via GitHub
blaginin commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r208156 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1569,37 +1810,60 @@ mod test { let empty = empty_with_type(DataType::Boolean); let

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-09 Thread via GitHub
qstommyshu commented on code in PR #15984: URL: https://github.com/apache/datafusion/pull/15984#discussion_r2081563280 ## datafusion/optimizer/src/test/mod.rs: ## @@ -99,29 +99,20 @@ pub fn get_tpch_table_schema(table: &str) -> Schema { } } Review Comment: Looks like

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-09 Thread via GitHub
miroim commented on code in PR #15973: URL: https://github.com/apache/datafusion/pull/15973#discussion_r2081824041 ## datafusion/datasource-parquet/src/mod.rs: ## @@ -32,511 +30,18 @@ mod row_group_filter; pub mod source; mod writer; -use std::any::Any; -use std::fmt::Format

[I] `GenericDialect` should support multi-table DELETE and DELETE without FROM clause [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
takaebato opened a new issue, #1846: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1846 `GenericDialect` is intended to parse any SQL that other dialects can handle, but it currently fails to parse multi-table DELETE statements and DELETE statements without a FROM clause.

Re: [I] Move datasource-parquet `should_enable_page_index` from `mod.rs` to `opener.rs` [datafusion]

2025-05-09 Thread via GitHub
miroim commented on issue #16008: URL: https://github.com/apache/datafusion/issues/16008#issuecomment-2866801684 take

Re: [PR] [wip] feat: Add framework for supporting multiple telemetry providers [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on PR #1722: URL: https://github.com/apache/datafusion-comet/pull/1722#issuecomment-2866738663 Current status: ## Chrome ![2025-05-09_08-11](https://github.com/user-attachments/assets/6f223e0a-1668-4208-92b9-351afaeaa108) ## OpenTelemetry I ca

[PR] refactor: remove deprecated `ArrowExec` [datafusion]

2025-05-09 Thread via GitHub
miroim opened a new pull request, #16006: URL: https://github.com/apache/datafusion/pull/16006 ## Which issue does this PR close? Part of #15950 . ## Rationale for this change The `ArrowExec` structure was deprecated in DataFusion 46 and is scheduled for removal. De

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15865: URL: https://github.com/apache/datafusion/pull/15865#discussion_r2082010143 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -334,4 +340,126 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::try_new_w

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#issuecomment-2867156825 @parthchandra @comphead this is now ready for another round of reviews

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on code in PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#discussion_r2082029591 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -133,29 +133,25 @@ class CometExecIterator( def getNextBatch(): Option[ColumnarB

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on code in PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#discussion_r2082030075 ## native/core/src/execution/tracing.rs: ## @@ -0,0 +1,111 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-09 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2867161790 @berkaysynnada If you don't mind, I can still help work on this (I started trying it yesterday). Sorry for the really long delay; I was finishing up some private things this week. Here is

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-09 Thread via GitHub
xudong963 commented on code in PR #15954: URL: https://github.com/apache/datafusion/pull/15954#discussion_r2081995641 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -3041,4 +3048,96 @@ mod tests { run_test_with_spill_pool_if_necessary(20_000, false).await?;

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-09 Thread via GitHub
andygrove commented on code in PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#discussion_r2082030586 ## native/core/src/execution/tracing.rs: ## @@ -0,0 +1,111 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-09 Thread via GitHub
parthchandra commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2082639273 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-09 Thread via GitHub
mbutrovich commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2868196546 > I thought the plan was to fix this by falling back to Spark? We can check if the schema has a field name matching `FileFormat.ROW_INDEX_TEMPORARY_COLUMN_NAME` and fall back

Re: [PR] implement pretty-printing with `{:#}` [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
fivemtebex commented on PR #1847: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1847#issuecomment-2868151947 Impressive project structure! The architecture really demonstrates solid design principles.

[PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-09 Thread via GitHub
adriangb opened a new pull request, #16014: URL: https://github.com/apache/datafusion/pull/16014 Closes #16000

Re: [I] Pass `PartitionedFile` into `FileSource` for late file stats based pruning [datafusion]

2025-05-09 Thread via GitHub
adriangb commented on issue #16000: URL: https://github.com/apache/datafusion/issues/16000#issuecomment-2868157070 POC at https://github.com/apache/datafusion/pull/16014

[PR] implement pretty-printing with {:#} [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
lovasoa opened a new pull request, #1847: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1847 closes https://github.com/apache/datafusion-sqlparser-rs/issues/1845

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-09 Thread via GitHub
berkaysynnada commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2865349477 Update: I tried to re-order the rules like: 1) EnforceSorting 2) FilterPushdown 3) EnforceDistribution 4) CombinePartialFinalAggregate ... 5) CoalesceBatches

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
iffyio commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2082704398 ## src/parser/mod.rs: ## @@ -5203,19 +5203,79 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-09 Thread via GitHub
codecov-commenter commented on PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#issuecomment-2868000912 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1727?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-09 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2868211237 > Closing this PR per my comment in [#1809 (comment)](https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2072719447) Thanks again @aharpervc for wo
