Re: [EXT] Re: Struct evolution

2022-12-14 Thread Matthew Scanlon
Hello, I just wanted to follow up on our previous conversation and gain a bit more insight into the behavior of pyarrow tables reading from/writing to dataframes. I have noticed some interesting behavior related to the issue mentioned above; specifically that this does not seem to be an issue when

Re: [EXT] Re: Struct evolution

2022-11-28 Thread Matthew Scanlon
Hi Micah, I was wondering where the Arrow project stands on this issue, as there do not appear to be many workarounds to using pyarrow.list_(pyarrow.struct()): many other data types that would "fit the bill" of what a list of structs achieves raise a pyarrow.lib.ArrowNotImplementedError when c

Re: [EXT] Re: Struct evolution

2022-11-14 Thread Matthew Scanlon
Hi Micah, sorry for the late reply; I have been on holiday. I am referring to Datasets. I noticed this specifically in Python, though I would imagine the issue applies to other languages as well. On Tue, Nov 8, 2022 at 12:53 PM Micah Kornfield wrote: > Hi Matthew, > Could you gi

Re: Struct evolution

2022-11-10 Thread David Li
IIRC the discovery step does already try to unify the schemas; it's just that right now, schema unification is basically not implemented. There's a long-standing Jira/PR [1] that might be good for someone to pick up and push over the finish line. [1]: https://github.com/apache/arrow/pull/12000

Re: Struct evolution

2022-11-10 Thread Weston Pace
> I’ve done something like this in the past. It was two parts - first figure > out the desired schema and then when reading files make them conform to > that schema. Good point. So far I've just been focusing on the second part. There is a dataset discovery step that will try and do the first pa

Re: Struct evolution

2022-11-09 Thread Ben Chambers
I’ve done something like this in the past. It was two parts - first figure out the desired schema and then when reading files make them conform to that schema. The first step could be by specifying the schema or by unioning the schemas. Fields appearing in only some files are treated as null in th
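
A minimal sketch of the two-part approach described in this message, in plain Python rather than pyarrow so that it does not depend on any particular Arrow API. Everything here is an assumption made for illustration: schemas are modeled as flat dicts of field name to type name, and the helpers `union_schemas` and `conform_rows` are hypothetical names, not Arrow functions. Nested struct fields would need a recursive version, which is not shown.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schema model: field name -> type name.
Schema = Dict[str, str]


def union_schemas(schemas: List[Schema]) -> Schema:
    """Step 1: build the desired schema by unioning per-file schemas."""
    unified: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            existing = unified.setdefault(name, type_name)
            if existing != type_name:
                # This sketch does not attempt type promotion.
                raise TypeError(
                    f"field {name!r} has conflicting types: "
                    f"{existing} vs {type_name}"
                )
    return unified


def conform_rows(rows: List[Dict[str, Any]], target: Schema) -> List[Dict[str, Any]]:
    """Step 2: make rows conform to the target schema.

    Fields missing from a row are filled with None (null).
    """
    return [{name: row.get(name) for name in target} for row in rows]


# Two "files" whose schemas differ by one field each.
schema_a = {"id": "int64", "name": "string"}
schema_b = {"id": "int64", "score": "float64"}

target = union_schemas([schema_a, schema_b])
rows_a = conform_rows([{"id": 1, "name": "x"}], target)
rows_b = conform_rows([{"id": 2, "score": 0.5}], target)
```

Specifying the target schema by hand instead of unioning would simply replace the `union_schemas` call with a literal dict; the conform step stays the same.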

Re: Struct evolution

2022-11-09 Thread Weston Pace
From a datasets / Acero perspective I have been thinking about this in the back of my mind for a while and decided to write my thoughts down in a document. I will send it in a separate email. On Tue, Nov 8, 2022 at 9:53 AM Micah Kornfield wrote: > > Hi Matthew, > Could you give some more specif

Re: Struct evolution

2022-11-08 Thread Micah Kornfield
Hi Matthew, Could you give some more specifics about what language/component you are using? In general, Arrow at a specification level doesn't deal with schema evolution. Is this in regard to Datasets or a different component? Thanks, Micah On Mon, Nov 7, 2022 at 5:06 PM Matthew Scanlon < matth