Hello,
I just wanted to follow up on our previous conversation and gain a bit more
insight into the behavior of pyarrow tables reading from/writing to
dataframes.
I have noticed some interesting behavior related to the issue mentioned
above; specifically that this does not seem to be an issue when
Hi Micah,
I was wondering where the Arrow project stands on this issue, as it looks
like there are not many workarounds to using
pyarrow.list_(pyarrow.struct()), as many other datatypes that would "fit the
bill" of what a list of structs achieves raise a
pyarrow.lib.ArrowNotImplementedError when c
Hi Micah. Sorry for the late reply; I have been on holiday.
I am referring to Datasets. This was specifically noticed in Python,
though I would imagine the issue generalizes to other languages as
well.
On Tue, Nov 8, 2022 at 12:53 PM Micah Kornfield
wrote:
> Hi Matthew,
> Could you gi
IIRC the discovery step does already try to unify the schemas, it's just that
right now, schema unification is basically not implemented. There's a
long-standing Jira/PR [1] that might be good for someone to pick up and push
over the finish line.
[1]: https://github.com/apache/arrow/pull/12000
> I’ve done something like this in the past. It was two parts - first figure
> out the desired schema and then when reading files make them conform to
> that schema.
Good point. So far I've just been focusing on the second part. There
is a dataset discovery step that will try and do the first pa
I’ve done something like this in the past. It was two parts - first figure
out the desired schema and then when reading files make them conform to
that schema.
The first step could be by specifying the schema or by unioning the
schemas. Fields appearing in only some files are treated as null in th
From a datasets / Acero perspective I have been thinking about this in
the back of my mind for a while and decided to write my thoughts down
in a document. I will send it in a separate email.
On Tue, Nov 8, 2022 at 9:53 AM Micah Kornfield wrote:
>
> Hi Matthew,
> Could you give some more specif
Hi Matthew,
Could you give some more specifics about what language/component you are
using? In general, Arrow at a specification level doesn't deal with schema
evolution. Is this in regard to Datasets or a different component?
Thanks,
Micah
On Mon, Nov 7, 2022 at 5:06 PM Matthew Scanlon <
matth