Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-05-01 Thread via GitHub
alamb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2845809583 See discussion on https://github.com/apache/datafusion/issues/15173 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] closed pull request #14362: Support marking columns as system columns via Field's metadata URL: https://github.com/apache/datafusion/pull/14362 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2819872465 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-20 Thread via GitHub
adriangb commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964480019 ## datafusion/common/src/dfschema.rs: ## @@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier: Option<&TableReference>, name: &str) -> String } } +///

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-20 Thread via GitHub
adriangb commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1964478571 ## datafusion/common/src/dfschema.rs: ## @@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier: Option<&TableReference>, name: &str) -> String } } +///

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-08 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2646050299 because in this PR system columns are propagated by default. I guess we have to write a lot of monkey patch to prevent system column leak. -- This is an automated message from t

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-08 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2646049486 could you please also test ```rust let tb = ctx.table("test").await.unwrap(); tb.write_csv(".csv", DataFrameWriteOptions::new(), None).await.unwrap(); ```

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-08 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2645665768 1. could you please add more dataframe tests? e.g. ``` assert!(ctx.table("test").await.unwrap().union(ctx.table("test").await.unwrap()).unwrap().select(vec![col("_rowi

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-02-07 Thread via GitHub
jayzhan211 commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1946315799 ## datafusion/common/src/dfschema.rs: ## @@ -1056,6 +1079,107 @@ pub fn qualified_name(qualifier: Option<&TableReference>, name: &str) -> String } } +/

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-31 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2628399078 Makes sense. I *think* I've gotten all of the important cases here working, could you take a look at the tests and see if you are happy with them? I will note that if w

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-31 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626887571 > One thing I didn't understand from you PR @chenkovsky: you got it to work only modifying TableScan? I had trouble understanding when I was grokking your PR. If you have a better

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626145613 For now I pushed https://github.com/apache/datafusion/pull/14362/commits/c2b58ee1632b7090188f4f7d5af6c78fbf40462e which has a test that I think covers all of the cases discussed.

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626122404 One thing I didn't understand from you PR @chenkovsky: you got it to work only modifying TableScan? I had trouble understanding when I was grokking your PR. If you have a better way

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625987325 I see. That makes sense for something like row_id: once you write it to a new file it may not be valid anymore. But for other metadata you might want to round trip it as a system co

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625969618 > > > So you're worried that if a Parquet file has this special metadata on a field we will wrongly interpret it as a system column? Or are you saying that's a good thing / the go

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-262595 See also https://github.com/apache/datafusion/issues/12736 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625961004 > > So you're worried that if a Parquet file has this special metadata on a field we will wrongly interpret it as a system column? Or are you saying that's a good thing / the goal?

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625954231 > So you're worried that if a Parquet file has this special metadata on a field we will wrongly interpret it as a system column? Or are you saying that's a good thing / the goal?

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625952339 So you're worried that if a Parquet file has this special metadata on a field we will wrongly interpret it as a system column? Or are you saying that's a good thing / the goal? --

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625942249 > I don't think (2) or (3) are real concerns. Things manipulating metadata should already be careful about not clobbering existing metadata and I really, really don't think the pe

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625266767 > > Another thing I haven't checked for this approach is that system column doesn't only apply on scan. it can also be applied on Projection, Join, SubqueryAlias > > Could

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2625004316 > Another thing I haven't checked for this approach is that system column doesn't only apply on scan. it can also be applied on Projection, Join, SubqueryAlias Could you give

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624999006 For (1) yes implementers of TableProvider will have to decide what fields they return, but isn't that already the case? I don't know how DataFusion would automatically determine wha

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624991055 I don't think (2) or (3) are real concerns. Things manipulating metadata should already be careful about not clobbering existing metadata and I really, really don't think the perfor

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624968962 i dont mean that this way cannot reach the target. but when i use spark style api i dont need to take care it. it's battery included. no need to call extra api to do it. pl

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624764005 > @chenkovsky does 3cfce67b035c82f3c442c7e12fb50c2cfb836a80 satisfy the use case? yes,it's the case. but here you decided whether this is system field. it should be decided

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624775441 > it should be decided by table provider automatically I'm not sure I follow. At some point you have to mark a field as a system column. I used explicit batches / schemas just to

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624730749 @chenkovsky does aa5abb94774faa9a607361232c6e5784b5ac1744 satisfy the use case? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624083257 by the way, I also encapsulate METADATA_OFFSET. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-30 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2624080673 Yes, it supports. it's my fault. I have added the missing ut. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623202562 Was that working in #14057? I didn't see a test for it. Hypothetically speaking we could do something in DFSchema to deduplicate but I worry that won't make it work e.g. we'll

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623174593 > Is this that important to support? The example seems a bit contrived, I think it'd be more reasonable if it occurred naturally as part of a join or something where a user could

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623148014 I will still give a shot at adding that feature tomorrow. But I'm not sold on the behavior being ideal even if that's what Spark does. Besides if it's an error now we can always mak

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623096983 something like this. ```rust #[tokio::test] async fn test_name_conflict() { let batch = record_batch!( ("_rowid", UInt32, [0, 1, 2]), ("_ro

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2623125556 Is this that important to support? The example seems a bit contrived, I think it'd be more reasonable if it occurred naturally as part of a join or something where a user could une

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622993100 @chenkovsky could you maybe translate that into a test that we can add? I'm having trouble imagining in what sorts of situations this would apply. Generally in SQL if you have two c

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622977789 as described in document, https://spark.apache.org/docs/3.5.1/api/java/org/apache/spark/sql/connector/catalog/SupportsMetadataColumns.html If a table column and a metadata c

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2622510720 @chenkovsky @jayzhan211 could you please review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-29 Thread via GitHub
adriangb commented on code in PR #14362: URL: https://github.com/apache/datafusion/pull/14362#discussion_r1934364024 ## datafusion/expr/src/utils.rs: ## @@ -736,11 +802,18 @@ pub fn exprlist_to_fields<'a>( .into_iter() .map(|c| c.flat_na