xudong963 commented on code in PR #15852:
URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062983955
##
datafusion/physical-plan/src/execution_plan.rs:
##
@@ -430,6 +430,32 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
Ok(Statistics::new
xudong963 commented on code in PR #15852:
URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062952007
##
datafusion/datasource/src/file_groups.rs:
##
@@ -421,7 +421,7 @@ impl FileGroup {
}
/// Get the statistics for this group
-pub fn statistics(&
chenkovsky commented on PR #15867:
URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2834018098
I tested a shared concurrent hash set (DashSet) to avoid the clone, but saw no
performance gain.
Something like:
```rust
static global_values: LazyLock> =
LazyLock::new(|| Das
joroKr21 commented on PR #15149:
URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2833989444
This missed the v47 train. Anything else needed to merge?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use th
joroKr21 commented on PR #15544:
URL: https://github.com/apache/datafusion/pull/15544#issuecomment-2833988848
This missed the v47 train. Anything else needed to merge?
kosiew commented on PR #1086:
URL:
https://github.com/apache/datafusion-python/pull/1086#issuecomment-2833965602
Closing this.
Moving the configuration from Rust to Python in #1119
kosiew closed pull request #1086: Partial fix for #1078 — [Add Dataframe
display config]
URL: https://github.com/apache/datafusion-python/pull/1086
kosiew opened a new pull request, #1119:
URL: https://github.com/apache/datafusion-python/pull/1119
## Which issue does this PR close?
partial fix for #1078
## Rationale for this change
This change improves the flexibility and performance of DataFrame rendering
in no
comphead commented on issue #1681:
URL:
https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2833858936
The simplest test
```
test("native reader - read a STRUCT subfield - field from second") {
testSingleLineQuery(
"""
|select named_struct('
kosiew opened a new issue, #1118:
URL: https://github.com/apache/datafusion-python/issues/1118
**Describe the bug**
When running `pytest` on the `main` branch, tests fail with the following
error:
```
import functions as F
E ModuleNotFoundError: No module named 'funct
kosiew commented on issue #1118:
URL:
https://github.com/apache/datafusion-python/issues/1118#issuecomment-2833823260
cc @deanm
chenkovsky opened a new pull request, #1117:
URL: https://github.com/apache/datafusion-python/pull/1117
# Which issue does this PR close?
No
# Rationale for this change
expr depends on functions.
functions depends on expr.
there's a recursive import.
jayzhan211 commented on issue #15872:
URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2833729992
It seems like the syntax you mentioned is not supported yet
```
statement count 0
create table t(a varchar) as values ('a'), ('b');
query error DataFusion e
jayzhan211 commented on PR #15861:
URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833715141
> Whereas the DataFusion::AvroError is only produced by the avro reader but
it affects every place where DataFusionError can appear.
How about we convert the error into the
jayzhan211 commented on PR #15867:
URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2833713281
> BTW, if we want to run count distinct in a big data scenario, we have to use
> a two-step process, so I think we have to add a configuration option to
> toggle this optimization.
We ca
juju4 opened a new issue, #15872:
URL: https://github.com/apache/datafusion/issues/15872
### Describe the bug
From https://github.com/openobserve/openobserve/discussions/6584
regexp_match does not seem to work with length or space matches. see below.
### To Reproduce
`
blaginin opened a new pull request, #15871:
URL: https://github.com/apache/datafusion/pull/15871
## Which issue does this PR close?
- Closes https://github.com/apache/datafusion/issues/258
## Rationale for this change
## What changes are included in this PR?
blaginin commented on code in PR #15793:
URL: https://github.com/apache/datafusion/pull/15793#discussion_r2062708916
##
datafusion/common/src/config.rs:
##
@@ -1995,11 +2052,11 @@ config_namespace! {
}
}
-pub trait FormatOptionsExt: Display {}
+pub trait OutputFormatExt:
codecov-commenter commented on PR #1680:
URL:
https://github.com/apache/datafusion-comet/pull/1680#issuecomment-2833580963
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1680?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
rroelke commented on PR #15861:
URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833570923
From the lint description:
> Enum size is bounded by the largest variant. Having one large variant can
penalize the memory layout of that enum.
That is to say, the pres
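The lint text quoted above can be illustrated with a minimal sketch. The types below are hypothetical stand-ins, not the real `DataFusionError` or any Avro type; they only show how one large variant sets the size of the whole enum and how boxing that variant shrinks it.

```rust
use std::mem::size_of;

// Hypothetical large payload, standing in for a big error type.
#[allow(dead_code)]
struct LargePayload([u8; 256]);

// Every value of this enum is at least as large as `LargePayload`,
// even when it holds the one-byte `Small` variant.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Large(LargePayload),
}

// Boxing the large variant stores only a pointer inline, so the enum
// stays small; the payload lives on the heap when it is actually used.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<LargePayload>),
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());
}
```

The trade-off is an extra heap allocation whenever the boxed variant is constructed, which is usually acceptable on an error path.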
huaxingao commented on PR #1683:
URL:
https://github.com/apache/datafusion-comet/pull/1683#issuecomment-2833571912
Iceberg shades Parquet. In our internal version of Iceberg, we remove the
shading. In OSS, when enabling Comet native execution in
https://github.com/apache/iceberg/pull/12709
rroelke commented on code in PR #15861:
URL: https://github.com/apache/datafusion/pull/15861#discussion_r2062689096
##
datafusion/common/src/error.rs:
##
@@ -59,7 +59,7 @@ pub enum DataFusionError {
ParquetError(ParquetError),
/// Error when reading Avro data.
#[c
klemniops commented on code in PR #15861:
URL: https://github.com/apache/datafusion/pull/15861#discussion_r2062687991
##
datafusion/common/src/error.rs:
##
@@ -59,7 +59,7 @@ pub enum DataFusionError {
ParquetError(ParquetError),
/// Error when reading Avro data.
#
klemniops commented on PR #15861:
URL: https://github.com/apache/datafusion/pull/15861#issuecomment-2833568011
From the lint description:
> Enum size is bounded by the largest variant. Having one large variant can
penalize the memory layout of that enum.
That is to say, the presenc
comphead commented on code in PR #1680:
URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2062680371
##
native/spark-expr/src/array_funcs/array_repeat.rs:
##
@@ -0,0 +1,216 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribu
gabotechs commented on code in PR #15857:
URL: https://github.com/apache/datafusion/pull/15857#discussion_r2062676790
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -726,6 +726,8 @@ fn extract_window_frame_target_type(col_type: &DataType) ->
Result {
Ok(D
LucaCappelletti94 opened a new pull request, #1830:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830
This pull request adds support for the [`CREATE
DOMAIN`](https://www.postgresql.org/docs/current/sql-createdomain.html) syntax
and the tests to validate whether the implement
deanm commented on PR #1076:
URL:
https://github.com/apache/datafusion-python/pull/1076#issuecomment-2833514544
I got it from polars, not clever on my part.
LucaCappelletti94 opened a new issue, #1829:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1829
[`CREATE
DOMAIN`](https://www.postgresql.org/docs/current/sql-createdomain.html) is not
supported.
Statements such as the following fail parsing at this time:
```sql
berkaysynnada commented on code in PR #15793:
URL: https://github.com/apache/datafusion/pull/15793#discussion_r2062667570
##
datafusion/common/src/config.rs:
##
@@ -1995,11 +2052,11 @@ config_namespace! {
}
}
-pub trait FormatOptionsExt: Display {}
+pub trait OutputForma
berkaysynnada commented on code in PR #15852:
URL: https://github.com/apache/datafusion/pull/15852#discussion_r2062662498
##
datafusion/datasource/src/file_groups.rs:
##
@@ -421,7 +421,7 @@ impl FileGroup {
}
/// Get the statistics for this group
-pub fn statisti
LucaCappelletti94 opened a new pull request, #1828:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1828
This pull request:
* Adds support for [`DROP
DOMAIN`](https://www.postgresql.org/docs/current/sql-dropdomain.html) syntax
resolving issue #1827
* Adds tests for `DROP
LucaCappelletti94 opened a new issue, #1827:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1827
The [`DROP
DOMAIN`](https://www.postgresql.org/docs/current/sql-dropdomain.html) syntax is
not currently supported.
Statements such as the following, therefore, cannot be p
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2833519008
@Dandandan @alamb this pr may be ready now.
timsaucer opened a new issue, #1116:
URL: https://github.com/apache/datafusion-python/issues/1116
**Describe the bug**
https://github.com/apache/datafusion-python/pull/1074 introduced a
regression on `main`. It has two issues: functions is not imported properly in
`expr.py` and also
timsaucer commented on PR #1074:
URL:
https://github.com/apache/datafusion-python/pull/1074#issuecomment-2833517926
I must have missed that CI didn't run on this. We have a circular import and
broke CI on `main`.
LucaCappelletti94 opened a new pull request, #1826:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826
This pull request resolves the bug described in issue #1825, which was
caused by an incorrect implementation of the named argument parsing. It also
adds a few tests to verify
timsaucer merged PR #1074:
URL: https://github.com/apache/datafusion-python/pull/1074
timsaucer commented on issue #1115:
URL:
https://github.com/apache/datafusion-python/issues/1115#issuecomment-2833485753
If we do not merge in #1112 then we at least need to make a different PR
that includes the documentation changes so the site will build without issues
and render properl
timsaucer commented on PR #1085:
URL:
https://github.com/apache/datafusion-python/pull/1085#issuecomment-2833467666
I'm sorry it took me so long to get around to reviewing this. Thank you for
the contribution!
timsaucer merged PR #1076:
URL: https://github.com/apache/datafusion-python/pull/1076
timsaucer closed issue #1075: Add a Col class instead of just col function to
use __getattr__ method
URL: https://github.com/apache/datafusion-python/issues/1075
LucaCappelletti94 opened a new issue, #1825:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1825
Attempting to parse a function such as the following currently fails with
`ParserError("Expected: ), found: INT")`
```sql
CREATE OR REPLACE FUNCTION check_values_differen
LucaCappelletti94 commented on issue #1807:
URL:
https://github.com/apache/datafusion-sqlparser-rs/issues/1807#issuecomment-2833477202
Ping to ask what I am expected to do with this. Happy to remove the test if
it is indeed pointless.
LucaCappelletti94 closed issue #1804: Missing support for `INHERITS` operation
from PostgreSQL
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1804
LucaCappelletti94 commented on issue #1804:
URL:
https://github.com/apache/datafusion-sqlparser-rs/issues/1804#issuecomment-2833476990
Closing issue as relevant pull request has been merged.
timsaucer commented on PR #1108:
URL:
https://github.com/apache/datafusion-python/pull/1108#issuecomment-2833465864
Thank you again!
timsaucer merged PR #1108:
URL: https://github.com/apache/datafusion-python/pull/1108
Adez017 commented on PR #15832:
URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833455331
> You can rebase with main
Does this solve the issue?
Rachelint opened a new issue, #15870:
URL: https://github.com/apache/datafusion/issues/15870
### Is your feature request related to a problem or challenge?
I found that the aggregation fuzzer is still hard to use from a user's
perspective.
Some points I noticed that can be improved:
crystalxyz commented on PR #1112:
URL:
https://github.com/apache/datafusion-python/pull/1112#issuecomment-2833418001
@timsaucer The documentation changes look good to me! Do you think that
adding a comment in `python/datafusion/user_defined` to explain the renaming
would be helpful? People
iffyio commented on code in PR #1747:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2062561210
##
src/parser/mod.rs:
##
@@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> {
if let Token::Word(word) = self.peek_token().token {
xudong963 commented on PR #15832:
URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833377199
Would anyone happen to know how to preview the HTML format for the PR
changes?
xudong963 commented on PR #15832:
URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2833376316
You can rebase with main
kosiew commented on PR #1108:
URL:
https://github.com/apache/datafusion-python/pull/1108#issuecomment-2833375574
Thank you @timsaucer for the detailed review.
I have corrected the above.
xudong963 commented on issue #14554:
URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2833359178
Also, there is a newer paper for the topic:
https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf
xudong963 commented on PR #15865:
URL: https://github.com/apache/datafusion/pull/15865#issuecomment-2833360326
Fyi @friendlymatthew
chenkovsky commented on PR #15867:
URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2833360269
BTW, if we want to run count distinct in a big data scenario, we have to use a
two-step process, so I think we have to add a configuration option to toggle
this optimization.
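For readers unfamiliar with the two-step process mentioned here, a rough sketch follows. It assumes the usual partial/final split, in which each partition deduplicates locally and the partial sets are merged afterwards. The function names `partial_distinct` and `final_count_distinct` are hypothetical, and this is not the code in the PR or DataFusion's actual aggregate implementation.

```rust
use std::collections::HashSet;

/// Step 1 (partial): each partition deduplicates its own rows independently.
fn partial_distinct(partition: &[i64]) -> HashSet<i64> {
    partition.iter().copied().collect()
}

/// Step 2 (final): merge the per-partition sets, then count the result.
fn final_count_distinct(partials: Vec<HashSet<i64>>) -> usize {
    let mut merged = HashSet::new();
    for set in partials {
        merged.extend(set);
    }
    merged.len()
}

fn main() {
    // Three hypothetical partitions with overlapping values.
    let partitions = vec![vec![1, 2, 2, 3], vec![3, 4], vec![1, 5, 5]];
    let partials: Vec<HashSet<i64>> =
        partitions.iter().map(|p| partial_distinct(p)).collect();
    println!("distinct = {}", final_count_distinct(partials));
}
```

Only the deduplicated sets cross partition boundaries, which is why this shape suits large inputs; a single-step variant would have to see every row in one place.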
iffyio commented on code in PR #1780:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1780#discussion_r2062522304
##
src/parser/mod.rs:
##
@@ -15375,6 +15391,17 @@ impl<'a> Parser<'a> {
}
}
+fn prefixed_expr(expr: Expr, prefix: Option) -> Expr {
Review Comm
iffyio commented on code in PR #1759:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1759#discussion_r2062500408
##
src/parser/mod.rs:
##
@@ -10574,11 +10598,96 @@ impl<'a> Parser<'a> {
for_clause,
settings,
format
EmilyMatt commented on code in PR #1670:
URL: https://github.com/apache/datafusion-comet/pull/1670#discussion_r2062485241
##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -430,55 +430,43 @@ class CometSparkSessionExtensions
op,
EmilyMatt commented on code in PR #1670:
URL: https://github.com/apache/datafusion-comet/pull/1670#discussion_r2062480644
##
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##
@@ -430,55 +430,43 @@ class CometSparkSessionExtensions
op,