Re: [PR] Remove CoalescePartitions insertion from HashJoinExec [datafusion]

2025-03-30 Thread via GitHub
ctsk commented on PR #15476: URL: https://github.com/apache/datafusion/pull/15476#issuecomment-2764514013 Sorry about that! Thanks for tracking it down @goldmedal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [I] IDEA: Use one of the examples from Datafusion Blog 45 to complete custom logical plans/execution plans page [datafusion]

2025-03-30 Thread via GitHub
alamb commented on issue #15422: URL: https://github.com/apache/datafusion/issues/15422#issuecomment-2764516243 > I'd love to work on this! [@alamb](https://github.com/alamb) Could you share the link to the blog examples please? Most of them are linked from https://datafusion.apache.

Re: [PR] Revert #15476 to fix the datafusion-examples CI fail [datafusion]

2025-03-30 Thread via GitHub
alamb merged PR #15496: URL: https://github.com/apache/datafusion/pull/15496

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2764507295 @thinkharderdev shall we merge this PR?

Re: [PR] FIX : some benchmarks are failing [datafusion]

2025-03-30 Thread via GitHub
getChan commented on PR #15367: URL: https://github.com/apache/datafusion/pull/15367#issuecomment-2764524744 > Thank you for the fix! I noticed that two other tests panicked on the same line of source code. Is this fix still applicable? > > 1. Running sqllogictest with sqlite che

Re: [PR] feat: add missing PyLogicalPlan to_variant [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer commented on PR #1085: URL: https://github.com/apache/datafusion-python/pull/1085#issuecomment-2764531402 This is a really big PR and it's not immediately obvious what problem it's trying to solve. Can you expand the description or at least link it to an issue describing the prob

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153605 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -283,6 +284,47 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { /

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764559643 I'm not sure if I misunderstood something. The fields on both sides of the `=` here come from two tables. I think this is similar to SELECT * FROM A t1, A t2 where t1.id = t2.

Re: [PR] Decimal type support for `to_timestamp` [datafusion]

2025-03-30 Thread via GitHub
alamb commented on code in PR #15486: URL: https://github.com/apache/datafusion/pull/15486#discussion_r2020130924 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -416,6 +416,33 @@ SELECT to_timestamp(123456789.123456789) as c1, cast(123456789.123456789 as time

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2764527480 > I think one issue with the current approach is that loading from env will break. I pushed a fix and a test for this.

[PR] Update changelog and version number [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer opened a new pull request, #1089: URL: https://github.com/apache/datafusion-python/pull/1089 Merge 46 release into main

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-30 Thread via GitHub
alamb commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020139946 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -220,6 +221,13 @@ fn _to_char_scalar( } } +// eagerly cast Date32 values to Date64 to supp

Re: [PR] Document SQL dialect guidance [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #13706: URL: https://github.com/apache/datafusion/pull/13706#issuecomment-2764507105 As there isn't consensus for now, closing the PR to clear it from the review queue.

Re: [I] Organize fields inside `SortMergeJoinStream` [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on issue #15406: URL: https://github.com/apache/datafusion/issues/15406#issuecomment-2764656937 Oops, thanks for the correction, I get it.

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764481850 I guess you might want to see this code, which outputs a True Literal at the Join to determine if there is a matching line or not https://github.com/apache/datafusion/blob/

Re: [PR] FIX : some benchmarks are failing [datafusion]

2025-03-30 Thread via GitHub
2010YOUY01 commented on PR #15367: URL: https://github.com/apache/datafusion/pull/15367#issuecomment-2764458328 Thank you for the fix! I noticed that two other tests panicked on the same line of source code. Is this fix still applicable? 1. Running sqllogictest with sqlite check: ht

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
zhuqi-lucas commented on issue #15493: URL: https://github.com/apache/datafusion/issues/15493#issuecomment-2764477578 Reverting https://github.com/apache/datafusion/pull/15476 fixes it. Just tested: it also fixed the tpch bench.

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764483579 ``` statement count 0 create table t(a int, b int) as values (11, 2), (3, 0); statement count 0 create table t2(a int, b int) as values (11, 3), (13, 1); quer

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
westhide commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020114983 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![]; +

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764487354 > In your original query, we have eq check on the same column, so I think we can evaluate to false in `SimplifyExpressions` before `DecorrelatePredicateSubquery`. > > I

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764487737 ``` For query, select e.b ,(select case when max(e2.a) > 10 then 'a' else 'b' end from t e2 where e2.b = e.b+1 ) from t e; After `SimplifyExpressions`, we know e2.b =

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
milenkovicm commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020086784 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![];

Re: [PR] feat: Improve fetch partition performance, support skip validation arrow ipc files [datafusion-ballista]

2025-03-30 Thread via GitHub
westhide commented on PR #1216: URL: https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2764432837 > Thanks @westhide, this PR makes sense. > > It also brings a configurable validation option, which I did not really think about. I have a few questions, for which I do not

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on issue #15493: URL: https://github.com/apache/datafusion/issues/15493#issuecomment-2764446051 I saw similar error messages when running tpch sqllogictest in the latest main branch ``` ~/git/datafusion ▓▒░ INCLUDE_TPCH=true cargo test --test sqllogictests -- tpch

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on issue #15493: URL: https://github.com/apache/datafusion/issues/15493#issuecomment-2764449044 I guess #15476 is the root cause. When I checked out 907150326, the tests passed. ❌ 14635dab4 (HEAD -> main, origin/main, origin/HEAD, goldmedal/main) perf:

[I] Add `statistics_by_partition` API to `ExecutionPlan` [datafusion]

2025-03-30 Thread via GitHub
xudong963 opened a new issue, #15495: URL: https://github.com/apache/datafusion/issues/15495 After https://github.com/apache/datafusion/pull/15432 is merged, we'll have partition-level statistics in `DataSource`. To make them usable by query optimization, we should flow the info to ot

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on issue #15493: URL: https://github.com/apache/datafusion/issues/15493#issuecomment-2764454319 By the way, the failing example is `dataframe`, not `custom_datasource`.

[PR] Revert #15476 to fix the datafusion-examples CI fail [datafusion]

2025-03-30 Thread via GitHub
goldmedal opened a new pull request, #15496: URL: https://github.com/apache/datafusion/pull/15496 ## Which issue does this PR close? - Closes #15493. ## Rationale for this change ## What changes are included in this PR? - Revert #15476 #

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-30 Thread via GitHub
ctsk commented on code in PR #15479: URL: https://github.com/apache/datafusion/pull/15479#discussion_r2020104151 ## datafusion/physical-optimizer/src/coalesce_batches.rs: ## @@ -92,3 +92,73 @@ impl PhysicalOptimizerRule for CoalesceBatches { true } } + +/// Remove

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
acking-you commented on issue #15493: URL: https://github.com/apache/datafusion/issues/15493#issuecomment-2764464827 Interesting, I just noticed that the CI also failed at this point.

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2764497802 BTW thank you very much @ctsk -- it is really cool to see the joins get some careful love and attention ❤️

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-30 Thread via GitHub
Omega359 commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020191523 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -220,6 +221,13 @@ fn _to_char_scalar( } } +// eagerly cast Date32 values to Date64 to s

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-30 Thread via GitHub
Omega359 commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020170477 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -220,6 +221,13 @@ fn _to_char_scalar( } } +// eagerly cast Date32 values to Date64 to s

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-30 Thread via GitHub
alamb closed issue #15398: Migrate subtrait tests to `insta` URL: https://github.com/apache/datafusion/issues/15398

Re: [PR] feat: add missing PyLogicalPlan to_variant [datafusion-python]

2025-03-30 Thread via GitHub
chenkovsky commented on PR #1085: URL: https://github.com/apache/datafusion-python/pull/1085#issuecomment-2764593959 > This is a really big PR and it's not immediately obvious to me what problem it's trying to solve. Can you expand the description or at least link it to an issue describing

[PR] build(deps): bump astral-sh/setup-uv from 5 to 6 [datafusion-python]

2025-03-30 Thread via GitHub
dependabot[bot] opened a new pull request, #1090: URL: https://github.com/apache/datafusion-python/pull/1090 Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 5 to 6. Release notes Sourced from astral-sh/setup-uv (https://github.com/astral-sh/setup-uv/releases)

Re: [PR] Documentation updates: mention correct dataset on basics page [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer merged PR #1081: URL: https://github.com/apache/datafusion-python/pull/1081

Re: [PR] Migrate datafusion/sql tests to insta, part1 [datafusion]

2025-03-30 Thread via GitHub
qstommyshu commented on PR #15484: URL: https://github.com/apache/datafusion/pull/15484#issuecomment-2764550055 Sure, I'll do `git cherry-pick` and split them into multiple smaller PRs. They should be organized by modules instead of number of lines as my commits are mostly based on modules

Re: [PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-30 Thread via GitHub
xudong963 commented on PR #14955: URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2764553128 > > It seems the `statistics` generated in line 884 will be lost. > > > > > > https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/listing/ta

[PR] chore: update changelog for 45.0.0 [datafusion-ballista]

2025-03-30 Thread via GitHub
milenkovicm opened a new pull request, #1218: URL: https://github.com/apache/datafusion-ballista/pull/1218 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing c

Re: [PR] fix: Assertion fail in external sort [datafusion]

2025-03-30 Thread via GitHub
alamb commented on code in PR #15469: URL: https://github.com/apache/datafusion/pull/15469#discussion_r2020125627 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -397,16 +393,13 @@ impl ExternalSorter { self.metrics.spill_metrics.spill_file_count.value() } -

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764503913 > ``` >After `SimplifyExpressions`, we know e2.b = 2.b + 1 is false, since they are the same column. And then we can infer that >max(e2.a) is NULL. >select e.b ,(select

Re: [PR] Migrate datafusion/sql tests to insta, part1 [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15484: URL: https://github.com/apache/datafusion/pull/15484#issuecomment-2764504770 > The code changes are now done; please review carefully, as the code changes are LARGE. Thank you so much @qstommyshu -- this looks epic. Is there any chance you can

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764511786 FYI @ctsk this seems potentially related to some of the work you are doing as well

Re: [I] `custom_datasource` example panicked during `RepartitionExec` planning [datafusion]

2025-03-30 Thread via GitHub
alamb closed issue #15493: `custom_datasource` example panicked during `RepartitionExec` planning URL: https://github.com/apache/datafusion/issues/15493

Re: [I] Support macro `ensure_or_internal_err` to clean up repeated sanity checks [datafusion]

2025-03-30 Thread via GitHub
alan910127 commented on issue #15492: URL: https://github.com/apache/datafusion/issues/15492#issuecomment-2764511135 take

Re: [I] IDEA: Use one of the examples from Datafusion Blog 45 to complete custom logical plans/execution plans page [datafusion]

2025-03-30 Thread via GitHub
edmondop commented on issue #15422: URL: https://github.com/apache/datafusion/issues/15422#issuecomment-2764530536 I see two possible strategies: 1. first build a self-contained example (i.e. in the datafusion-examples crate) inspired by one of the blog posts, and store in datafusion-examples

Re: [PR] Update changelog and version number [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer merged PR #1089: URL: https://github.com/apache/datafusion-python/pull/1089

Re: [PR] feat: support unparser [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer merged PR #1088: URL: https://github.com/apache/datafusion-python/pull/1088

Re: [PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #14955: URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2764527928 > It seems the `statistics` generated in line 884 will be lost. > > https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/listing/table.rs#L884-L960

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -210,6 +212,7 @@ pub fn serialize_physical_expr( value: &Arc, codec: &dyn PhysicalExtensionCode

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764502074 > > Hi @xudong963, would you be interested in reviewing this PR? Any feedback would be greatly appreciated! > > I'm sorry, it's weird I didn't receive the notification,

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020153780 ## datafusion/physical-plan/src/dynamic_filters.rs: ## @@ -0,0 +1,226 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Revert #15476 to fix the datafusion-examples CI fail [datafusion]

2025-03-30 Thread via GitHub
alamb commented on PR #15496: URL: https://github.com/apache/datafusion/pull/15496#issuecomment-2764507735 Since CI is failing without this, let's merge it in.

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-30 Thread via GitHub
alamb commented on code in PR #15427: URL: https://github.com/apache/datafusion/pull/15427#discussion_r2020134939 ## datafusion-cli/tests/cli_integration.rs: ## @@ -74,6 +75,31 @@ fn cli_quick_test<'a>( assert_cmd_snapshot!(cmd); } +#[rstest] Review Comment: Thanks,

Re: [I] Support for MySQL := assignment operator [datafusion-sqlparser-rs]

2025-03-30 Thread via GitHub
barsela1 closed issue #1778: Support for MySQL := assignment operator URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1778

Re: [PR] Add `deserialize` to `BatchSerializer` [datafusion]

2025-03-30 Thread via GitHub
berkaysynnada commented on PR #15411: URL: https://github.com/apache/datafusion/pull/15411#issuecomment-2751437750 I've two questions: 1) Why don't we use BatchDeserializer to deserialize? 2) Why are we converting them to async as they don't yield at all already?

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-30 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020152758 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -186,6 +235,90 @@ impl TopK { Ok(()) } +fn calculate_dynamic_filters( +thresholds:

[PR] Migrate datafusion/sql tests to insta, part1 [datafusion]

2025-03-30 Thread via GitHub
qstommyshu opened a new pull request, #15497: URL: https://github.com/apache/datafusion/pull/15497 ## Which issue does this PR close? - Related #15397, this is a part of #15484 breaking down. ## Rationale for this change ## What changes are included in thi

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
milenkovicm commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020223746 ## ballista/scheduler/src/state/mod.rs: ## @@ -248,7 +262,7 @@ impl SchedulerState,` and `task.manager.launch_multi_task` and cancel the jobs there?

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
milenkovicm commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020224499 ## ballista/scheduler/src/state/task_manager.rs: ## @@ -524,24 +524,35 @@ impl TaskManager pub(crate) async fn launch_multi_task( &self,

Re: [PR] bench: Document how to use cross platform Samply profiler [datafusion]

2025-03-30 Thread via GitHub
alamb commented on code in PR #15481: URL: https://github.com/apache/datafusion/pull/15481#discussion_r2020122550 ## docs/source/library-user-guide/profiling.md: ## @@ -82,6 +82,43 @@ CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --root --bench sql_planner -- [Video: how

Re: [PR] Document SQL dialect guidance [datafusion]

2025-03-30 Thread via GitHub
alamb closed pull request #13706: Document SQL dialect guidance URL: https://github.com/apache/datafusion/pull/13706

Re: [PR] Decimal type support for `to_timestamp` [datafusion]

2025-03-30 Thread via GitHub
jatin510 commented on code in PR #15486: URL: https://github.com/apache/datafusion/pull/15486#discussion_r2020135443 ## datafusion/sqllogictest/test_files/timestamps.slt: ## @@ -416,6 +416,33 @@ SELECT to_timestamp(123456789.123456789) as c1, cast(123456789.123456789 as time -

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-03-30 Thread via GitHub
thinkharderdev commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2764520144 > @thinkharderdev shall we merge this PR? I think it's good to go. There are still some open comments so I was waiting for another approval, but if we're all agreed then

Re: [PR] 1065/enhancement/add ctx to `__init__.py` [datafusion-python]

2025-03-30 Thread via GitHub
timsaucer commented on PR #1072: URL: https://github.com/apache/datafusion-python/pull/1072#issuecomment-2764532709 I recommend closing this PR or at least changing it to: ```python global_ctx = SessionContext.global_ctx() ``` That *I think* will also help with making sure

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764506178 when we compare table1.a = table2.b, we compare them row by row. ``` t1.a t2.b 1 1 2 3 3 2 4 4 ``` ``` t1.a=t2.b true false false true
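jayzhan211's row-by-row comparison of `t1.a = t2.b` can be sketched in plain Rust. This is a minimal illustration, not DataFusion code: it assumes the two columns are already aligned row by row (as in a joined relation), uses plain slices in place of Arrow arrays, and `row_wise_eq` is a hypothetical helper name.

```rust
/// Hypothetical helper: compare two aligned columns position by position,
/// producing one boolean per row (no NULL handling in this sketch).
fn row_wise_eq(left: &[i32], right: &[i32]) -> Vec<bool> {
    // zip pairs up values from the same row; any extra rows are ignored
    left.iter().zip(right.iter()).map(|(l, r)| l == r).collect()
}

fn main() {
    // Same sample values as in the comment
    let t1_a = [1, 2, 3, 4];
    let t2_b = [1, 3, 2, 4];
    let result = row_wise_eq(&t1_a, &t2_b);
    println!("{:?}", result); // prints [true, false, false, true]
}
```

Each row's predicate depends only on the values in that row, which is why rows 2 and 3 evaluate to `false` even though both columns contain the same set of values.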

Re: [I] IDEA: Use one of the examples from Datafusion Blog 45 to complete custom logical plans/execution plans page [datafusion]

2025-03-30 Thread via GitHub
Max-Meldrum commented on issue #15422: URL: https://github.com/apache/datafusion/issues/15422#issuecomment-2764551007 > > I'd love to work on this! [@alamb](https://github.com/alamb) Could you share the link to the blog examples please? > > Most of them are linked from https://datafu

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
westhide commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020169629 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![]; +

Re: [PR] Migrate datafusion/sql tests to insta,sql integrations [datafusion]

2025-03-30 Thread via GitHub
qstommyshu closed pull request #15484: Migrate datafusion/sql tests to insta,sql integrations URL: https://github.com/apache/datafusion/pull/15484

Re: [I] [EPIC] Add support for all Map functions [datafusion-comet]

2025-03-30 Thread via GitHub
kurosch commented on issue #1044: URL: https://github.com/apache/datafusion-comet/issues/1044#issuecomment-2764660850 I would like to work on the `[]` operator. I assume it has to be implemented for arrays and maps.

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-30 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2764486169 In your original query, we have eq check on the same column, so I think we can evaluate to false in `SimplifyExpressions` before `DecorrelatePredicateSubquery`. In my query

[PR] datafusion-python 46.0.0 announcement [datafusion-site]

2025-03-30 Thread via GitHub
timsaucer opened a new pull request, #65: URL: https://github.com/apache/datafusion-site/pull/65 Work in progress

Re: [PR] Introduce load-balanced `split_groups_by_statistics` method [datafusion]

2025-03-30 Thread via GitHub
leoyvens commented on code in PR #15473: URL: https://github.com/apache/datafusion/pull/15473#discussion_r2020240967 ## datafusion/datasource/benches/split_groups_by_statistics.rs: ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] datafusion-python 46.0.0 announcement [datafusion-site]

2025-03-30 Thread via GitHub
renato2099 commented on PR #65: URL: https://github.com/apache/datafusion-site/pull/65#issuecomment-2764724217 looks like a lot was done for this release! thanks for putting this together Tim!

Re: [PR] bench: Document how to use cross platform Samply profiler [datafusion]

2025-03-30 Thread via GitHub
comphead commented on code in PR #15481: URL: https://github.com/apache/datafusion/pull/15481#discussion_r2020242855 ## docs/source/library-user-guide/profiling.md: ## @@ -82,6 +82,43 @@ CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --root --bench sql_planner -- [Video:

[I] Improve `String` to `&str` conversions [datafusion]

2025-03-30 Thread via GitHub
comphead opened a new issue, #15498: URL: https://github.com/apache/datafusion/issues/15498 ### Is your feature request related to a problem or challenge? I was following the https://lucumr.pocoo.org/2025/3/23/from-string/ blog post, where a cheaper way is introduced for converting a `String`

Re: [I] Improve `String` to `&str` conversions [datafusion]

2025-03-30 Thread via GitHub
comphead commented on issue #15498: URL: https://github.com/apache/datafusion/issues/15498#issuecomment-2764731939 In DataFusion there are some places where the conversion is done through indirections, `as_str`, or other methods. I'm thinking of creating a string utility method to make this conversi

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-03-30 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2764733544 @lmwnshn @matthewmturner we now have a live crate for integrations https://crates.io/crates/tpchgen and a cli available https://github.com/clflushopt/tpchgen-rs special thank

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-30 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2764720923 I tried patching this in and getting these failures. Any idea why? Unfortunately I can't share the full source. ``` Error fetching table metadata: Failed to collect dat

Re: [I] Organize fields inside `SortMergeJoinStream` [datafusion]

2025-03-30 Thread via GitHub
comphead commented on issue #15406: URL: https://github.com/apache/datafusion/issues/15406#issuecomment-2764735186 Thanks for taking care of that. The SMJ structure is historically too complicated, and breaking it down into smaller ones would be beneficial. For example, it can be broken down in

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-30 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2020248798 ## spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala: ## @@ -490,8 +490,7 @@ object CometScanExec extends DataTypeSupport { // TODO a

Re: [I] datafusion-cli: document reading partitioned parquet [datafusion]

2025-03-30 Thread via GitHub
marvelshan commented on issue #15309: URL: https://github.com/apache/datafusion/issues/15309#issuecomment-2764874334 take

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
milenkovicm commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020086784 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![];

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-03-30 Thread via GitHub
tomershaniii commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2764615971 Hi again @iffyio @mvzink, see the latest commit, per your suggestions: - Permissive parsing (removed value validation for key/value parameters) - Centralized all

Re: [PR] Introduce load-balanced `split_groups_by_statistics` method [datafusion]

2025-03-30 Thread via GitHub
xudong963 commented on code in PR #15473: URL: https://github.com/apache/datafusion/pull/15473#discussion_r2020323603 ## datafusion/datasource/benches/split_groups_by_statistics.rs: ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Feat/ffi scalar udf [datafusion-python]

2025-03-30 Thread via GitHub
CrystalZhou0529 commented on PR #1033: URL: https://github.com/apache/datafusion-python/pull/1033#issuecomment-2764914770 This is ready for review now!

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
westhide commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020090087 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![]; +

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-30 Thread via GitHub
comphead merged PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2025-03-30 Thread via GitHub
github-actions[bot] commented on PR #12523: URL: https://github.com/apache/datafusion/pull/12523#issuecomment-2764926021 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] test: attempt to analyze boundaries for select columns [datafusion]

2025-03-30 Thread via GitHub
github-actions[bot] commented on PR #14308: URL: https://github.com/apache/datafusion/pull/14308#issuecomment-2764925938 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-03-30 Thread via GitHub
alamb commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2764746130 I think the next question in my mind is exactly how to integrate this into datafusion-cli. We could follow the model of duckdb and create a table function like `dbgen(sf = 1

Re: [PR] feat: Add union_by_name, union_by_name_distinct to DataFrame api [datafusion]

2025-03-30 Thread via GitHub
berkaysynnada commented on code in PR #15489: URL: https://github.com/apache/datafusion/pull/15489#discussion_r2020254263 ## datafusion/core/src/dataframe/mod.rs: ## @@ -724,6 +764,45 @@ impl DataFrame { }) } +/// Calculate the union of two [`DataFrame`]s usi
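The doc comment in the diff is cut off, so the exact semantics are not visible here. Assuming the SQL `UNION BY NAME` behavior popularized by DuckDB (columns matched by name rather than position, with a column missing from one side filled with NULL), a toy sketch of the two operations might look like this. The `Table` type and both functions are hypothetical stand-ins, not DataFusion's `DataFrame` API or the PR's implementation.

```rust
use std::collections::HashSet;

/// A toy in-memory table: column names plus rows of nullable integers.
/// Hypothetical type for illustration only; not a DataFusion API.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<Option<i64>>>,
}

/// UNION ALL BY NAME (assumed semantics): the output schema is the left
/// columns followed by any right-only columns; a column absent from one
/// input is filled with NULL (`None`) for that input's rows.
pub fn union_by_name(left: &Table, right: &Table) -> Table {
    let mut columns = left.columns.clone();
    for c in &right.columns {
        if !columns.contains(c) {
            columns.push(c.clone());
        }
    }
    let mut rows = Vec::new();
    for input in [left, right] {
        for row in &input.rows {
            // For each output column, find its position in this input, if any.
            let out: Vec<Option<i64>> = columns
                .iter()
                .map(|c| input.columns.iter().position(|ic| ic == c).and_then(|i| row[i]))
                .collect();
            rows.push(out);
        }
    }
    Table { columns, rows }
}

/// The DISTINCT variant: same as above, keeping only the first copy of each row.
pub fn union_by_name_distinct(left: &Table, right: &Table) -> Table {
    let mut all = union_by_name(left, right);
    let mut seen = HashSet::new();
    all.rows.retain(|r| seen.insert(r.clone()));
    all
}
```

Matching by name is what lets inputs with reordered or partially overlapping columns be combined without first projecting them into a common positional schema.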

Re: [PR] Migrate datafusion/sql tests to insta, part1 [datafusion]

2025-03-30 Thread via GitHub
alamb merged PR #15497: URL: https://github.com/apache/datafusion/pull/15497

Re: [PR] feat: Add union_by_name, union_by_name_distinct to DataFrame api [datafusion]

2025-03-30 Thread via GitHub
Omega359 commented on code in PR #15489: URL: https://github.com/apache/datafusion/pull/15489#discussion_r2020279541 ## datafusion/core/src/dataframe/mod.rs: ## @@ -724,6 +764,45 @@ impl DataFrame { }) } +/// Calculate the union of two [`DataFrame`]s using co

[PR] WIP: non-trivial cases migration [datafusion]

2025-03-30 Thread via GitHub
qstommyshu opened a new pull request, #15499: URL: https://github.com/apache/datafusion/pull/15499 ## Which issue does this PR close? - Related #15397, this is a part of #15484 breaking down. - Checkout things to note of the whole migration in comments section of #15484.

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-30 Thread via GitHub
xudong963 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2764431052 > Would it be possible to add some unit tests for `compute_summary_statistics`? Something like: Thanks @alamb ! I'm cooking it

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-30 Thread via GitHub
westhide commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020084368 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -124,14 +128,36 @@ impl SchedulerGrpc }; let mut tasks = vec![]; +

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-30 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2764422078 > I think to support this selection vector, the executors need to be updated to interpret an additional metadata column. However, since executors are part of the public interface t
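The general idea behind selection-vector repartitioning, as opposed to copying rows into new per-partition batches, can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's design: keys are plain `i64` values, `std`'s `DefaultHasher` stands in for whatever hash function the real operator uses, and the helper names `hash_key`, `selection_vectors`, and `selected_rows` are hypothetical. `selected_rows` represents the extra work goldmedal mentions: a downstream operator must honor the mask instead of assuming every row in the batch is present.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a single key. `DefaultHasher::new()` uses fixed keys, so equal keys
/// always hash equally within one program; a real engine uses its own hasher.
fn hash_key(key: i64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Instead of copying rows into `num_partitions` new batches, build one
/// boolean selection vector per output partition over the unchanged input.
/// `masks[p][row]` is true iff `row` is routed to partition `p`.
fn selection_vectors(keys: &[i64], num_partitions: usize) -> Vec<Vec<bool>> {
    let mut masks = vec![vec![false; keys.len()]; num_partitions];
    for (row, &key) in keys.iter().enumerate() {
        let p = (hash_key(key) % num_partitions as u64) as usize;
        masks[p][row] = true;
    }
    masks
}

/// What a downstream operator must do with a (batch, mask) pair:
/// only rows whose mask bit is set are treated as present.
fn selected_rows<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter(|&(_, &keep)| keep)
        .map(|(&v, _)| v)
        .collect()
}
```

The trade-off is that repartitioning itself becomes cheaper (no row copies), while every consumer of the partitioned stream has to understand the mask, which is the public-interface concern raised in the comment.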
