alamb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2845809583
See discussion on https://github.com/apache/datafusion/issues/15173
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
github-actions[bot] closed pull request #14362: Support marking columns as
system columns via Field's metadata
URL: https://github.com/apache/datafusion/pull/14362
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL a
github-actions[bot] commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2819872465
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or
adriangb commented on code in PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964480019
##
datafusion/common/src/dfschema.rs:
##
@@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier:
Option<&TableReference>, name: &str) -> String
}
}
+///
adriangb commented on code in PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964478571
##
datafusion/common/src/dfschema.rs:
##
@@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier:
Option<&TableReference>, name: &str) -> String
}
}
+///
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2646050299
because in this PR system columns are propagated by default. I guess we have
to write a lot of monkey patch to prevent system column leak.
--
This is an automated message from t
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2646049486
could you please also test
```rust
let tb = ctx.table("test").await.unwrap();
tb.write_csv(".csv", DataFrameWriteOptions::new(),
None).await.unwrap();
```
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2645665768
1. could you please add more dataframe tests?
e.g.
```
assert!(ctx.table("test").await.unwrap().union(ctx.table("test").await.unwrap()).unwrap().select(vec![col("_rowi
jayzhan211 commented on code in PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#discussion_r1946315799
##
datafusion/common/src/dfschema.rs:
##
@@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier:
Option<&TableReference>, name: &str) -> String
}
}
+/
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2628399078
Makes sense.
I *think* I've gotten all of the important cases here working, could you
take a look at the tests and see if you are happy with them?
I will note that if w
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626887571
> One thing I didn't understand from you PR @chenkovsky: you got it to work
only modifying TableScan? I had trouble understanding when I was grokking your
PR. If you have a better
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626145613
For now I pushed
https://github.com/apache/datafusion/pull/14362/commits/c2b58ee1632b7090188f4f7d5af6c78fbf40462e
which has a test that I think covers all of the cases discussed.
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626122404
One thing I didn't understand from you PR @chenkovsky: you got it to work
only modifying TableScan? I had trouble understanding when I was grokking your
PR. If you have a better way
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625987325
I see. That makes sense for something like row_id: once you write it to a
new file it may not be valid anymore. But for other metadata you might want to
round trip it as a system co
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625969618
> > > So you're worried that if a Parquet file has this special metadata on
a field we will wrongly interpret it as a system column? Or are you saying
that's a good thing / the go
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-262595
See also https://github.com/apache/datafusion/issues/12736
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use t
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625961004
> > So you're worried that if a Parquet file has this special metadata on a
field we will wrongly interpret it as a system column? Or are you saying that's
a good thing / the goal?
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625954231
> So you're worried that if a Parquet file has this special metadata on a
field we will wrongly interpret it as a system column? Or are you saying that's
a good thing / the goal?
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625952339
So you're worried that if a Parquet file has this special metadata on a
field we will wrongly interpret it as a system column? Or are you saying that's
a good thing / the goal?
--
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625942249
> I don't think (2) or (3) are real concerns. Things manipulating metadata
should already be careful about not clobbering existing metadata and I really,
really don't think the pe
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625266767
> > Another thing I haven't checked for this approach is that system column
doesn't only apply on scan. it can also be applied on Projection, Join,
SubqueryAlias
>
> Could
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625004316
> Another thing I haven't checked for this approach is that system column
doesn't only apply on scan. it can also be applied on Projection, Join,
SubqueryAlias
Could you give
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624999006
For (1) yes implementers of TableProvider will have to decide what fields
they return, but isn't that already the case? I don't know how DataFusion would
automatically determine wha
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624991055
I don't think (2) or (3) are real concerns. Things manipulating metadata
should already be careful about not clobbering existing metadata and I really,
really don't think the perfor
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624968962
i dont mean that this way cannot reach the target. but when i use spark
style api i dont need to take care it. it's battery included. no need to call
extra api to do it.
pl
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624764005
> @chenkovsky does 3cfce67b035c82f3c442c7e12fb50c2cfb836a80 satisfy the use
case?
yes,it's the case. but here you decided whether this is system field. it
should be decided
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624775441
> it should be decided by table provider automatically
I'm not sure I follow. At some point you have to mark a field as a system
column. I used explicit batches / schemas just to
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624730749
@chenkovsky does aa5abb94774faa9a607361232c6e5784b5ac1744 satisfy the use
case?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624083257
by the way, I also encapsulate METADATA_OFFSET.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL ab
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624080673
Yes, it supports. it's my fault. I have added the missing ut.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623202562
Was that working in #14057? I didn't see a test for it.
Hypothetically speaking we could do something in DFSchema to deduplicate but
I worry that won't make it work e.g. we'll
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623174593
> Is this that important to support? The example seems a bit contrived, I
think it'd be more reasonable if it occurred naturally as part of a join or
something where a user could
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623148014
I will still give a shot at adding that feature tomorrow. But I'm not sold
on the behavior being ideal even if that's what Spark does. Besides if it's an
error now we can always mak
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623096983
something like this.
```rust
#[tokio::test]
async fn test_name_conflict() {
let batch = record_batch!(
("_rowid", UInt32, [0, 1, 2]),
("_ro
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623125556
Is this that important to support? The example seems a bit contrived, I
think it'd be more reasonable if it occurred naturally as part of a join or
something where a user could une
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622993100
@chenkovsky could you maybe translate that into a test that we can add? I'm
having trouble imagining in what sorts of situations this would apply.
Generally in SQL if you have two c
chenkovsky commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622977789
as described in document,
https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/sql/connector/catalog/SupportsMetadataColumns.html
If a table column and a metadata c
adriangb commented on PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622510720
@chenkovsky @jayzhan211 could you please review?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abo
adriangb commented on code in PR #14362:
URL: https://github.com/apache/datafusion/pull/14362#discussion_r1934364024
##
datafusion/expr/src/utils.rs:
##
@@ -736,11 +802,18 @@ pub fn exprlist_to_fields<'a>(
.into_iter()
.map(|c| c.flat_na
39 matches
Mail list logo