codecov-commenter commented on PR #1954:
URL:
https://github.com/apache/datafusion-comet/pull/1954#issuecomment-3016356204
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1954?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
iffyio commented on code in PR #1908:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1908#discussion_r2173612089
##
src/parser/mod.rs:
##
@@ -8477,7 +8477,14 @@ impl<'a> Parser<'a> {
pub fn parse_alter_table_operation(&mut self) ->
Result {
let opera
2010YOUY01 commented on PR #16500:
URL: https://github.com/apache/datafusion/pull/16500#issuecomment-3016318383
Thank you! This implementation looks correct to me.
Since state transitions in joins are tricky, could you add a test (or
ensure there are some existing tests), to double
GitHub user debajyoti-truefoundry added a comment to the discussion:
DataSourceExec metrics explanation
Seems like this is the case.
Ref:
https://discord.com/channels/885562378132000778/1388186871594745956/1388748472491970662
GitHub link:
https://github.com/apache/datafusion/discussions/165
GitHub user 2010YOUY01 added a comment to the discussion: DataSourceExec
metrics explanation
My guess:
Suppose the data source is actually scanning during [0,10] and [15,20]. During
[10,15] it's not scanning because this data source operator is not scheduled in
the runtime, and its parents are using the CPU t
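To make the guess above concrete, here is a minimal sketch, not DataFusion's actual metrics code. It assumes the intervals are expressed in seconds, and the helper names `time_scanning_total` and `wall_clock_span` are hypothetical names chosen for illustration. It shows why a "sum of scanning time" metric can be smaller than the wall-clock span of the operator.

```rust
use std::time::Duration;

/// Sum of the lengths of the `(start, end)` intervals during which the
/// operator was actually scanning (interval bounds assumed to be seconds).
fn time_scanning_total(intervals: &[(u64, u64)]) -> Duration {
    intervals
        .iter()
        .map(|&(start, end)| Duration::from_secs(end.saturating_sub(start)))
        .sum()
}

/// Wall-clock span from the earliest start to the latest end, including
/// any gaps in which the operator was not scheduled.
fn wall_clock_span(intervals: &[(u64, u64)]) -> Duration {
    let start = intervals.iter().map(|&(s, _)| s).min().unwrap_or(0);
    let end = intervals.iter().map(|&(_, e)| e).max().unwrap_or(0);
    Duration::from_secs(end.saturating_sub(start))
}
```

With the intervals `[(0, 10), (15, 20)]`, the summed scanning time is 15 seconds while the wall-clock span is 20 seconds; the 5-second gap corresponds to the [10,15] window in which other operators hold the CPU.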
GitHub user debajyoti-truefoundry added a comment to the discussion:
DataSourceExec metrics explanation
Thanks for your response. I have a follow-up question.
For `time_scanning_total`,
> /// Sum of time between when the [`FileStream`] requests data from
/// the stream and when a [`Record
zhuqi-lucas commented on code in PR #16604:
URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173581271
##
datafusion/core/src/dataframe/mod.rs:
##
@@ -1615,11 +1617,27 @@ impl DataFrame {
/// # }
/// ```
pub fn explain(self, verbose: bool, analyze
chenkovsky opened a new pull request, #16610:
URL: https://github.com/apache/datafusion/pull/16610
## Which issue does this PR close?
- Closes #16607.
## Rationale for this change
The unparser for get_field checks the first parameter. Currently it only
allows a column.
chenkovsky commented on issue #16607:
URL: https://github.com/apache/datafusion/issues/16607#issuecomment-3016207543
take
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
Huy1Ng opened a new pull request, #1275:
URL: https://github.com/apache/datafusion-ballista/pull/1275
# Which issue does this PR close?
Part of #1128.
# Rationale for this change
In the issue
# What changes are included in this PR?
Improve the rust workflows by fol
alamb closed issue #16596: It should be disallowed to specify both order_by and
within_group.
URL: https://github.com/apache/datafusion/issues/16596
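The rule named in the issue title can be sketched as a small validation step. This is only an illustration under assumptions: the function name `check_ordering_clauses`, the use of string slices for the clause expressions, and the error message are all hypothetical, and this is not the code from the DataFusion SQL planner.

```rust
/// Reject a function call that specifies both an ORDER BY clause and a
/// WITHIN GROUP clause. Both arguments are placeholder lists of ordering
/// expressions; an empty slice means the clause was not given.
fn check_ordering_clauses(order_by: &[&str], within_group: &[&str]) -> Result<(), String> {
    if !order_by.is_empty() && !within_group.is_empty() {
        return Err("ORDER BY and WITHIN GROUP cannot both be specified".to_string());
    }
    Ok(())
}
```

Either clause alone passes the check; only the combination of the two is rejected.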
alamb commented on PR #16606:
URL: https://github.com/apache/datafusion/pull/16606#issuecomment-3016152969
Thank you @watchingthewheelsgo and @findepi
alamb merged PR #16606:
URL: https://github.com/apache/datafusion/pull/16606
theirix opened a new pull request, #16609:
URL: https://github.com/apache/datafusion/pull/16609
## Which issue does this PR close?
- Closes #16260.
## Rationale for this change
As mentioned in that issue, it's reasonable to autodetect a file suffix. For
example,
tglanz opened a new pull request, #1954:
URL: https://github.com/apache/datafusion-comet/pull/1954
## Which issue does this PR close?
- #1952
- #1953
## Rationale for this change
Described in #1819
## What changes are included in this PR?
- Move bloom
tglanz opened a new issue, #1953:
URL: https://github.com/apache/datafusion-comet/issues/1953
Supports #1952
tglanz opened a new issue, #1952:
URL: https://github.com/apache/datafusion-comet/issues/1952
### What is the problem the feature request solves?
Described in #1819
### Describe the potential solution
_No response_
### Additional context
_No response_
--
jkosh44 commented on PR #16608:
URL: https://github.com/apache/datafusion/pull/16608#issuecomment-3016026935
@gabotechs FYI
andygrove merged PR #1951:
URL: https://github.com/apache/datafusion-comet/pull/1951
Iskander14yo commented on issue #864:
URL:
https://github.com/apache/datafusion-comet/issues/864#issuecomment-3016021200
For anyone wondering (seems like HDFS issue?):
Set the _full path_ to the jar (with `hdfs://`) for `spark.jars` param and
just _the name of jar_ for both `spark.driver
jkosh44 opened a new pull request, #16608:
URL: https://github.com/apache/datafusion/pull/16608
## Which issue does this PR close?
- Closes #16273.
## Rationale for this change
This commit adds support for the Arrow Dictionary type in Substrait plans.
## What chang
jkosh44 commented on code in PR #16558:
URL: https://github.com/apache/datafusion/pull/16558#discussion_r2173512225
##
datafusion/substrait/src/logical_plan/producer/types.rs:
##
@@ -360,6 +372,11 @@ mod tests {
round_trip_type(DataType::Timestamp(TimeUnit::Nanoseco
jkosh44 commented on code in PR #16558:
URL: https://github.com/apache/datafusion/pull/16558#discussion_r2173511981
##
datafusion/substrait/src/logical_plan/producer/types.rs:
##
@@ -360,6 +372,11 @@ mod tests {
round_trip_type(DataType::Timestamp(TimeUnit::Nanoseco
jkosh44 commented on PR #16503:
URL: https://github.com/apache/datafusion/pull/16503#issuecomment-3016010500
@gabotechs This response from Substrait makes me a little nervous about this
approach:
https://github.com/substrait-io/substrait/issues/822#issuecomment-3008350100
Duration do
jkosh44 commented on issue #16273:
URL: https://github.com/apache/datafusion/issues/16273#issuecomment-3016000944
take
findepi commented on issue #8777:
URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015995295
Maintaining streaming processing is very useful in a number of
circumstances: a query with LIMIT, a query with TopN over sorted input data, an
interactive query.
I don't know
findepi commented on code in PR #16606:
URL: https://github.com/apache/datafusion/pull/16606#discussion_r2173497813
##
datafusion/sql/src/expr/function.rs:
##
@@ -227,6 +227,10 @@ impl SqlToRel<'_, S> {
OVER is for window functions, whereas WITHIN GROUP is for
dependabot[bot] opened a new pull request, #1174:
URL: https://github.com/apache/datafusion-python/pull/1174
Bumps [arrow](https://github.com/apache/arrow-rs) from 55.1.0 to 55.2.0.
Release notes
Sourced from arrow's releases (https://github.com/apache/arrow-rs/releases).
ar
Dimchikkk commented on PR #1899:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1899#issuecomment-3015983460
> @Dimchikkk I don't have a plan myself -- I think it will be driven by
someone who needs features of sqlparser in a new version of DataFusion. Perhaps
you can help out
leung-ming commented on PR #1893:
URL:
https://github.com/apache/datafusion-comet/pull/1893#issuecomment-3015975515
performance looks improved on my laptop (i7-10710U)
before:
```
aggregate/avg_decimal_datafusion
time: [1.5631 ms 1.5829 ms 1.6037 ms]
alamb commented on issue #13818:
URL: https://github.com/apache/datafusion/issues/13818#issuecomment-3015973967
> Moved, it's already on
https://github.com/datafusion-contrib/datafusion-fiddle. I made a small revamp
before moving it, so it should look a bit better now.
Nice! it is al
alamb commented on code in PR #16538:
URL: https://github.com/apache/datafusion/pull/16538#discussion_r2173481689
##
datafusion/sql/src/expr/function.rs:
##
@@ -404,6 +404,11 @@ impl SqlToRel<'_, S> {
}
(!within_group.is_empty()).then_so
alamb commented on PR #16606:
URL: https://github.com/apache/datafusion/pull/16606#issuecomment-3015971875
Thank you @watchingthewheelsgo 🙏 -- I kicked off the tests!
iffyio merged PR #1905:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905
iffyio closed issue #1904: Wrong join precedence parsing for non-Snowflake
dialects (nested joins parsed incorrectly)
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1904
iffyio commented on code in PR #1905:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1905#discussion_r2173481353
##
src/dialect/mod.rs:
##
@@ -278,6 +278,34 @@ pub trait Dialect: Debug + Any {
false
}
+/// Indicates whether the dialect supports
suibianwanwank commented on issue #8777:
URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015967757
Hi, I’d like to take a try at this task.
My plan is to first support `CTE` with the WITH ... AS MATERIALIZED syntax.
After that, we can explore broader optimizations
leung-ming commented on code in PR #1951:
URL: https://github.com/apache/datafusion-comet/pull/1951#discussion_r2173445457
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -2681,7 +2681,7 @@ mod tests {
assert_eq!(casted.value(0), 4200);
// https://
kazantsev-maksim commented on code in PR #1932:
URL: https://github.com/apache/datafusion-comet/pull/1932#discussion_r2173441584
##
spark/src/test/scala/org/apache/comet/CometMapExpressionSuite.scala:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
drexler-sky commented on PR #1945:
URL:
https://github.com/apache/datafusion-comet/pull/1945#issuecomment-3015781373
Thanks @andygrove @parthchandra
gabotechs commented on issue #13818:
URL: https://github.com/apache/datafusion/issues/13818#issuecomment-3015777584
Moved, it's already on
https://github.com/datafusion-contrib/datafusion-fiddle. I made a small revamp
before moving it, so it should look a bit better now.
Note: deploy
andygrove commented on code in PR #1943:
URL: https://github.com/apache/datafusion-comet/pull/1943#discussion_r2173423255
##
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##
@@ -1179,6 +1179,29 @@ object QueryPlanSerde extends Logging with CometExprShim
{
andygrove commented on code in PR #1932:
URL: https://github.com/apache/datafusion-comet/pull/1932#discussion_r2173421986
##
spark/src/test/scala/org/apache/comet/CometMapExpressionSuite.scala:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
Dandandan commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015708840
> Here is a proposed alternative:
>
> * [Add comments to ClickBench queries about setting binary_as_string
#16605](https://github.com/apache/datafusion/pull/16605)
>
>
andygrove commented on issue #1856:
URL:
https://github.com/apache/datafusion-comet/issues/1856#issuecomment-3015648326
I plan on creating the first release candidate on Monday
andygrove commented on code in PR #1951:
URL: https://github.com/apache/datafusion-comet/pull/1951#discussion_r2173395678
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -2681,7 +2681,7 @@ mod tests {
assert_eq!(casted.value(0), 4200);
// https://g
hmadison opened a new issue, #16607:
URL: https://github.com/apache/datafusion/issues/16607
### Describe the bug
When attempting to invoke `plan_to_sql` on a plan which includes accessing a
field inside of a nested structure, the invocation fails with the following
error:
```
nirnayroy commented on PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#issuecomment-3015598680
Hi @blaginin , thanks for the help and suggestions for improvement.
I have addressed the requested changes. Please have another look.
> Tests are failing. If that helps, yo
nirnayroy commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2173375931
##
datafusion/functions/src/regex/regexpinstr.rs:
##
@@ -0,0 +1,804 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
chenkovsky commented on issue #16590:
URL: https://github.com/apache/datafusion/issues/16590#issuecomment-3015556166
Here's a PR, #16161, that leaves more structure in the logical plan after
resolving the grouping expr. I will try to solve the Substrait creation problem
as the next step.
watchingthewheelsgo opened a new pull request, #16606:
URL: https://github.com/apache/datafusion/pull/16606
## Which issue does this PR close?
- Closes #16596.
## Rationale for this change
## What changes are included in this PR?
raise e
nirnayroy commented on code in PR #15928:
URL: https://github.com/apache/datafusion/pull/15928#discussion_r2173329256
##
datafusion/functions/benches/regx.rs:
##
@@ -127,6 +128,46 @@ fn criterion_benchmark(c: &mut Criterion) {
})
});
+c.bench_function("regexp
korowa commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173258366
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream {
right_side_ordered: bool,
/// Current st
alamb commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015414174
Here is a proposed alternative:
- https://github.com/apache/datafusion/pull/16605
> As mentioned earlier, I wonder though if most of the query performance
might be solved by m
zhuqi-lucas commented on code in PR #16604:
URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173316232
##
datafusion/core/src/dataframe/mod.rs:
##
@@ -1615,11 +1617,27 @@ impl DataFrame {
/// # }
/// ```
pub fn explain(self, verbose: bool, analyze
alamb opened a new pull request, #16605:
URL: https://github.com/apache/datafusion/pull/16605
## Which issue does this PR close?
- Closes https://github.com/apache/datafusion/issues/16591
- Closes https://github.com/apache/datafusion/pull/16599
## Rationale for this c
alamb commented on code in PR #16604:
URL: https://github.com/apache/datafusion/pull/16604#discussion_r2173310059
##
datafusion/core/src/dataframe/mod.rs:
##
@@ -1615,11 +1617,27 @@ impl DataFrame {
/// # }
/// ```
pub fn explain(self, verbose: bool, analyze: bool
Dandandan commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015377418
I agree, we should default to doing the correct thing. The binary_as_string
setting is nice for fixing the benchmark, but by default we shouldn't do it.
As mentioned earlier,
zhuqi-lucas commented on PR #16604:
URL: https://github.com/apache/datafusion/pull/16604#issuecomment-3015305319
Testing result, it looks good:
```rust
cargo run --profile release-nonlto --target aarch64-apple-darwin --bin
dfbench -- clickbench --queries-path ./benchmarks/querie
zhuqi-lucas commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015303000
> > Now it defaults to false, but I am not sure if it will break other
things.
>
> Yeah I think it will break other things -- it isn't correct in general to
treat bin
zhuqi-lucas commented on issue #16591:
URL: https://github.com/apache/datafusion/issues/16591#issuecomment-3015301156
A side topic for debugging the benchmark:
https://github.com/apache/datafusion/issues/16603
I submitted a PR that tries to make the benchmark debug mode use tree format for
zhuqi-lucas opened a new pull request, #16604:
URL: https://github.com/apache/datafusion/pull/16604
## Which issue does this PR close?
- Closes [#16603](https://github.com/apache/datafusion/issues/16603)
## Rationale for this change
When debugging the issue:
https://github
alamb commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015291713
> Now it defaults to false, but I am not sure if it will break other
things.
Yeah I think it will break other things -- it isn't correct in general to
treat binary columns
alamb commented on PR #16580:
URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3015286049
Thank you @tlm365 🙏
alamb commented on PR #16580:
URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3015286001
FYI @andygrove and @shehabgamin
alamb commented on PR #16351:
URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3015285765
Thanks again @corwinjoy / @adamreeve and everyone else. This is great
alamb closed issue #15216: Support integration with Parquet modular encryption
URL: https://github.com/apache/datafusion/issues/15216
alamb merged PR #16351:
URL: https://github.com/apache/datafusion/pull/16351
zhuqi-lucas commented on issue #16603:
URL: https://github.com/apache/datafusion/issues/16603#issuecomment-3015281832
take
zhuqi-lucas opened a new issue, #16603:
URL: https://github.com/apache/datafusion/issues/16603
### Is your feature request related to a problem or challenge?
Currently, our benchmark debug mode does not use the explain tree format; this
ticket will switch it to tree format.
### Describ
alamb commented on issue #16588:
URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3015280286
Thanks @ianthetechie -- I have marked this issue as a regression in 48
2010YOUY01 opened a new issue, #16602:
URL: https://github.com/apache/datafusion/issues/16602
### Is your feature request related to a problem or challenge?
Original discussion question:
https://github.com/apache/datafusion/discussions/16572
There are many fine grained metrics
GitHub user 2010YOUY01 added a comment to the discussion: DataSourceExec
metrics explanation
Take `time_elapsed_opening`, for example: unfortunately we have to search for
this name in the codebase, and then follow several indirections to find the
comment that explains this metric:
https://github.com/
zhuqi-lucas commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015250746
Debugging further now: it only happens when we use
```rust
./datafusion-cli -c ""
```
But it does not happen for an internal datafusion-cli run:
```rust
./da
codecov-commenter commented on PR #1951:
URL:
https://github.com/apache/datafusion-comet/pull/1951#issuecomment-3015235423
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1951?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
zhuqi-lucas commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015221855
> Actually I am now torn about this as it will further diverge
datafusion-cli and the core library.
>
> Maybe we can just solve the human error part with comments in the qu
l1t1 opened a new issue, #16601:
URL: https://github.com/apache/datafusion/issues/16601
I read the guide at
https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html,
and could only find **adding-a-scalar-async-udf** and **writing-the-udtf**.
ref:
https://github.com/apache
l1t1 commented on issue #16600:
URL: https://github.com/apache/datafusion/issues/16600#issuecomment-3015204515
> I think you can use `cargo build --profile release-nonlto` which will turn
off link time optimization and reduce final link time and memory usage
>
> In terms of taking lot
alamb commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015200737
Actually I am now torn about this as it will further diverge datafusion-cli
and the core library.
Maybe we can just solve the human error part with comments in the queries.
I'l
alamb commented on issue #16600:
URL: https://github.com/apache/datafusion/issues/16600#issuecomment-3015195619
I think you can use `cargo build --profile release-nonlto` which will turn
off link time optimization and reduce final link time and memory usage
In terms of taking lots of
alamb commented on PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3015194305
Given how much we/I use datafusion-cli to test benchmark performance
(clickbench in particular) I think this is a good change to help
alamb commented on code in PR #16599:
URL: https://github.com/apache/datafusion/pull/16599#discussion_r2173214983
##
datafusion-cli/src/main.rs:
##
@@ -171,7 +171,13 @@ async fn main_inner() -> Result<()> {
env::set_current_dir(p).unwrap();
};
-let session_co
osipovartem commented on PR #16571:
URL: https://github.com/apache/datafusion/pull/16571#issuecomment-3015191081
Let me fix it
leung-ming commented on PR #1915:
URL:
https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3015182461
> > I have not implemented a new dragonbox; I just copied it and added 4 `pub` to
expose the decimal interface.
>
> Could this be done in the original crate? I understand that t
l1t1 opened a new issue, #16600:
URL: https://github.com/apache/datafusion/issues/16600
### Is your feature request related to a problem or challenge?
When I compile the benchmark binaries with `cargo build --release` and use
`mold` as the linker, the following errors occur:
```
(s
alamb commented on PR #16571:
URL: https://github.com/apache/datafusion/pull/16571#issuecomment-3015178652

I can't merge this PR because it has a conflict that must
Dimchikkk commented on PR #1905:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1905#issuecomment-3015171544
Thanks for the review, @iffyio - I've addressed your feedback and it’s ready
for another round when you are.
Samyak2 commented on PR #16500:
URL: https://github.com/apache/datafusion/pull/16500#issuecomment-3015164616
Rebased on latest main
kazuyukitanimura commented on issue #16577:
URL: https://github.com/apache/datafusion/issues/16577#issuecomment-3015103616
I just realized we can just do
```
to_char(from_unixtime(expression[, timezone]), format)
```
However, `to_char` has this problem
https://github.com/apache/d
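The composition suggested in this comment (convert Unix seconds to a timestamp, then format it) can be illustrated with a standalone sketch. It is not DataFusion's implementation: the functions below only borrow the SQL function names, work in UTC only (the optional timezone argument is ignored), and support a small assumed subset of strftime-style specifiers (`%Y %m %d %H %M %S`). The `DateTimeUtc` struct and `civil_from_days` helper are hypothetical names introduced for the example.

```rust
/// Broken-down UTC timestamp (hypothetical type for this sketch).
struct DateTimeUtc {
    year: i64,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
}

/// Convert days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar, using the well-known era-based algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Sketch of `from_unixtime`: seconds since the Unix epoch to a UTC timestamp.
fn from_unixtime(secs: i64) -> DateTimeUtc {
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    DateTimeUtc {
        year,
        month,
        day,
        hour: (rem / 3_600) as u32,
        minute: (rem % 3_600 / 60) as u32,
        second: (rem % 60) as u32,
    }
}

/// Sketch of `to_char`: format a timestamp with a tiny specifier subset.
/// Unknown specifiers are copied through unchanged.
fn to_char(ts: &DateTimeUtc, format: &str) -> String {
    let mut out = String::new();
    let mut chars = format.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('Y') => out.push_str(&format!("{:04}", ts.year)),
            Some('m') => out.push_str(&format!("{:02}", ts.month)),
            Some('d') => out.push_str(&format!("{:02}", ts.day)),
            Some('H') => out.push_str(&format!("{:02}", ts.hour)),
            Some('M') => out.push_str(&format!("{:02}", ts.minute)),
            Some('S') => out.push_str(&format!("{:02}", ts.second)),
            Some(other) => {
                out.push('%');
                out.push(other);
            }
            None => out.push('%'),
        }
    }
    out
}
```

Calling `to_char(&from_unixtime(1_000_000_000), "%Y-%m-%d %H:%M:%S")` mirrors the nested SQL call shape: the inner function produces the timestamp and the outer one applies the format string.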