kosiew closed pull request #15295: Enhance Schema adapter to accommodate
evolving struct
URL: https://github.com/apache/datafusion/pull/15295
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spec
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-3043339058
Reimplementation completed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specifi
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2901313004
Thanks @alamb , @adriangb for the review.
Working on the smaller PRs..
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
adriangb commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2100981813
##
datafusion/core/src/datasource/listing/table.rs:
##
@@ -1178,6 +1207,31 @@ impl ListingTable {
}
}
+/// Extension trait for FileSource to allow schema
alamb commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2100961322
##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,943 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor li
alamb commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2898922232
> This PR is large and cumbersome to review.
> I propose to close it and re-implement as:
The break up as you suggest sounds reasonable to me
@mbutrovich and @adrian
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2895687205
@kosiew that sounds good but please keep this working branch around on the
`schema-adapter` branch. Maybe you can cherry-pick the changes onto a new
branch when you break these
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2893829674
I propose to close this large PR and re-implement as:
## PR 1: Extract and test core SchemaAdapter helpers
**Description**
Pull the field‐mapping logic out of `Defaul
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2887827231
Ok I verified it works. Can you update the PR to remove all the extra
logging and put it up for review? This is amazing work. Thank you!
cc @alamb any chance you can take
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2879192184
@kosiew sounds good, I can’t get to a computer for the next few days but can
try afterwards. In the meantime can you try one last thing - in your test order
the files backwards.
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2875527091
@kosiew in case it's helpful I ended up building your datafusion fork (on
branch schema-adapter) locally so you can get more useful stack traces. See
below:
```
Error
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2875480275
@kosiew did you update the code or just add a test? I'm getting the same
error
```
Error fetching table metadata: Failed to collect data frame results:
Shared(ArrowErr
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2871799892
hi @TheBuilderJR ,
I pushed a simpler test_datafusion_schema_evolution-
https://github.com/apache/datafusion/pull/15295/files#diff-f73022e8850396e8f5595b58e1b24a7fd08499dc6ac
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2865192764
@kosiew thanks I tried that but am getting this error:
```
Error fetching table metadata: Failed to collect data frame results:
Shared(ArrowError(ExternalError(Executio
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2864869245
> there's 10 v1 files, 5 v2 files, 3 v3 files and 1 v4 files. Ideally
ListingTableConfig could just derive the mapping from each. Is that possible
with your abstraction?
# Here
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2861173475
@kosiew so I think the tricky part is that there are actually multiple
evolutions.
Basically my code currenty looks like this
```
let con
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2857995748
hi @TheBuilderJR ,
In the [2 schemas you
quoted](https://github.com/apache/datafusion/pull/15295#issuecomment-2851094063),
there are:
1. file_schema (from the parquet files)
kosiew opened a new pull request, #15295:
URL: https://github.com/apache/datafusion/pull/15295
## Which issue does this PR close?
- Closes #14757.
## Rationale for this change
[arrow-rs suggests that SchemaAdapter is better approach for handling
evolving
struct](https:/
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2856949489
@TheBuilderJR ,
Thanks for providing the schemas.
I'll look at this again and update later.
--
This is an automated message from the Apache Git Service.
To respond to the messa
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2851094063
Actually this is the two schemas:
Error fetching table metadata: Failed to collect data frame results:
Shared(Plan("Cannot cast file schema field additionalInfo of type St
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2851086728
@kosiew sorry I just saw this, I don't remember the schemas but happy to
test again. Is this PR ready?
--
This is an automated message from the Apache Git Service.
To respond
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2837154356
Closing, no update from @TheBuilderJR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
kosiew closed pull request #15295: Enhance Schema adapter to accommodate
evolving struct
URL: https://github.com/apache/datafusion/pull/15295
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spec
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2743256589
hi @TheBuilderJR
I haven't tidied up the PR yet.
Here's how I used the NestedSchema in
test_datafusion_schema_evolution_with_compaction
```rust
use datafusion:
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2764961704
@TheBuilderJR ,
Thanks for testing this.
`Unexpected batch schema from file, expected 1 cols but got 12`
Can you share the 2 schemas?
--
This is an automated
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2764720923
I tried patching this in and getting these failures. Any idea why?
Unfortunately I can't share the full source.
```
Error fetching table metadata: Failed to collect dat
kosiew commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2013310548
##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,868 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
TheBuilderJR commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2012299357
##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,868 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
kosiew commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2750776919
hi @TheBuilderJR ,
I looked at #15259 and wanted to try incorporating its `struct_coercion`.
But the amended `struct_coercion` caused `sqllogictests` to fail. So I did not
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2745948813
Nice! Fwiw another edge case I found recently that's probably worth testing
is a List where the Struct evolves. I ended up solving it by updating
list_coersion but curious if yo
TheBuilderJR commented on PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2739119123
@kosiew any chance you can try running the test case in
https://github.com/apache/datafusion/issues/14757? It's a real world example of
schema evolution that I hope can be solve
kosiew commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2000486806
##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,582 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
kosiew commented on code in PR #15295:
URL: https://github.com/apache/datafusion/pull/15295#discussion_r2000486806
##
datafusion/datasource/src/nested_schema_adapter.rs:
##
@@ -0,0 +1,582 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
kosiew opened a new pull request, #15295:
URL: https://github.com/apache/datafusion/pull/15295
## Which issue does this PR close?
- Closes #14757.
## Rationale for this change
This PR introduces a `NestedStructSchemaAdapter` to improve schema evolution
handling in DataFu
34 matches
Mail list logo