> I’ve done something like this in the past. It was two parts - first figure
> out the desired schema and then when reading files make them conform to
> that schema.

Good point. So far I've just been focusing on the second part. There is a dataset discovery step that will try and do the first part but it isn't terribly flexible at the moment. Improving this is probably worth consideration as well.

On Wed, Nov 9, 2022 at 5:25 PM Ben Chambers <bchamb...@apache.org> wrote:
>
> I’ve done something like this in the past. It was two parts - first figure
> out the desired schema and then when reading files make them conform to
> that schema.
>
> The first step could be by specifying the schema or by unioning the
> schemas. Fields appearing in only some files are treated as null in the
> others. Fields with different types are up cast.
>
> The second step then involves for each file figuring out how to convert to
> the desired. I found it easiest to do this per column of the desired
> schema. Then it can be (1) reference a column (2) reference a column and
> cast or (3) create a column of nulls of a given type.
>
> Is something like that you had in mind?
>
> On Wed, Nov 9, 2022 at 5:11 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> > From a datasets / Acero perspective I have been thinking about this in
> > the back of my mind for a while and decided to write my thoughts down
> > in a document. I will send it in a separate email.
> >
> > On Tue, Nov 8, 2022 at 9:53 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> > >
> > > Hi Matthew,
> > > Could you give some more specifics about what language/component you are
> > > using. In general, Arrow at a specification level doesn't deal with
> > > schema evolution. Is this in regard to Datasets or a different component?
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Mon, Nov 7, 2022 at 5:06 PM Matthew Scanlon
> > > <matthew.scan...@exosfinancial.com> wrote:
> > >
> > > > Good afternoon, I wanted to reach out and open a dialog about structs,
> > > > the evolution of them in schemas, and if support for such a feature is
> > > > on the road map or a hard pass for the arrow team.
> > > >
> > > > Currently, it appears structs support removing a field, but will there
> > > > be support for adding fields later on? Are there any recommended
> > > > patterns for supporting such a field. For example, if a field foo is a
> > > > struct with sub_fields A, B and then later field C gets added, the old
> > > > data can not be loaded using the new schema.
> > > >
> > > > Thank you.
> > > >
> > > > Matthew Scanlon
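
For reference, the two-step approach Ben describes in the thread (union the schemas, then conform each file column by column) could be sketched roughly as below, applied to Matthew's example of a struct foo that gains a sub-field C. This is a toy model in plain Python, not Arrow code: schemas are dicts, a nested dict stands in for a struct, and the names unify_schemas, plan_column, conform_row, the CASTS table and the numeric widening order are all assumptions made up for illustration.

```python
# Toy model only: a schema maps field name -> type, where a type is either a
# primitive name or a nested dict standing in for a struct. Not an Arrow API.

# Assumed widening order for upcasts; real promotion rules are richer.
NUMERIC_ORDER = ["int32", "int64", "float64"]

# Assumed Python-level conversions used when a cast is planned.
CASTS = {"int32": int, "int64": int, "float64": float, "string": str}


def unify_types(a, b):
    """Return a type wide enough to hold both a and b."""
    if isinstance(a, dict) and isinstance(b, dict):
        return unify_schemas([a, b])  # structs are unified field by field
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        return max(a, b, key=NUMERIC_ORDER.index)  # upcast to the wider type
    raise TypeError(f"cannot unify {a!r} and {b!r}")


def unify_schemas(schemas):
    """Step 1: union the fields of every schema, upcasting on conflicts."""
    unified = {}
    for schema in schemas:
        for name, typ in schema.items():
            unified[name] = unify_types(unified[name], typ) if name in unified else typ
    return unified


def plan_column(name, target_type, file_schema):
    """Step 2: decide how one target column is produced from one file."""
    if name not in file_schema:
        return ("nulls", target_type)       # (3) column of nulls
    source_type = file_schema[name]
    if isinstance(target_type, dict):
        if not isinstance(source_type, dict):
            raise TypeError(f"{name!r} is a struct in the target only")
        children = {c: plan_column(c, t, source_type) for c, t in target_type.items()}
        return ("struct", children)         # recurse into sub-fields
    if source_type == target_type:
        return ("reference", name)          # (1) reference a column
    return ("cast", target_type)            # (2) reference a column and cast


def apply_plan(plan, value):
    """Apply a plan from plan_column to a single value."""
    kind, arg = plan
    if kind == "nulls" or value is None:
        return None
    if kind == "reference":
        return value
    if kind == "cast":
        return CASTS[arg](value)
    return {c: apply_plan(p, value.get(c)) for c, p in arg.items()}


def conform_row(row, target_schema, file_schema):
    """Rewrite one row read with file_schema so it matches target_schema."""
    return {
        name: apply_plan(plan_column(name, typ, file_schema), row.get(name))
        for name, typ in target_schema.items()
    }


# Old files have foo{A, B}; newer files add foo.C and widen id.
old_schema = {"id": "int32", "foo": {"A": "int32", "B": "string"}}
new_schema = {"id": "int64", "foo": {"A": "int32", "B": "string", "C": "float64"}}

target = unify_schemas([old_schema, new_schema])
conformed = conform_row({"id": 1, "foo": {"A": 10, "B": "x"}}, target, old_schema)
```

Under these assumptions the old row can be read against the unified schema, with foo.C filled in as null rather than the load failing. Planning per target column, rather than per source column, keeps the three cases (reference, cast, nulls) explicit and lets struct fields reuse the same logic recursively.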