>
> With these two together, it would seem not too difficult to create a text
> representation for Arrow schemas that (at some point) has some
> compatibility guarantees, but maybe I'm missing something?


I think the main risk is that Flatbuffers JSON parsing might not handle
backward-compatible changes to the Arrow schema message.  Given how the
documentation describes the JSON functionality, I think that would be
considered a bug.
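
To make that risk concrete, here is a rough sketch in plain Python of what
"handling backward-compatible changes" would mean for a JSON reader: fill in
defaults for keys an older writer did not emit, and ignore keys a newer writer
added. This is not the Flatbuffers API or its JSON format; the field layout
(name/type/nullable/children) and the helper names are made up for
illustration.

```python
import json

# Hypothetical optional keys and their defaults; NOT the Flatbuffers JSON format.
FIELD_DEFAULTS = {"nullable": True, "children": []}


def parse_field(obj):
    """Parse one field, tolerating missing optional keys and unknown keys."""
    if "name" not in obj or "type" not in obj:
        raise ValueError("field needs at least 'name' and 'type'")
    field = {"name": obj["name"], "type": obj["type"]}
    for key, default in FIELD_DEFAULTS.items():
        value = obj.get(key, default)
        if key == "children":
            # Recurse so nested fields get the same tolerant treatment.
            value = [parse_field(child) for child in value]
        field[key] = value
    return field  # keys this reader does not know about are dropped


def parse_schema(text):
    """Parse a JSON document of the form {"fields": [...]}."""
    doc = json.loads(text)
    return [parse_field(f) for f in doc.get("fields", [])]


# Text from an "older" writer: no 'nullable' key at all.
old_text = '{"fields": [{"name": "id", "type": "int64"}]}'
# Text from a "newer" writer: carries a key this reader has never seen.
new_text = (
    '{"fields": [{"name": "id", "type": "int64",'
    ' "nullable": false, "future_key": 1}]}'
)

old_schema = parse_schema(old_text)
new_schema = parse_schema(new_text)
```

If the real parser rejected either of the two inputs above (in its own format,
of course), that is the kind of behaviour I would expect to be treated as a
bug.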

The one downside to calling the "schema" canonical is that the Flatbuffers
JSON functionality appears to be available only in C++ and in Java via JNI,
so it wouldn't have cross-language support.  I think this issue is more one
of semantics, though (i.e. does the JSON description become part of the
"Arrow spec", or does it live as a C++/Python-only feature?).
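
On the JSON Schema output you mentioned: as you say, that could be done outside
of the Arrow codebase. A minimal sketch of the idea in plain Python follows.
The type mapping, the dict layout for fields, and the choice to treat
non-nullable fields as "required" are all assumptions of mine, not anything
defined by Arrow or by Flatbuffers.

```python
import json

# Hypothetical mapping from a few Arrow type names to JSON Schema types.
ARROW_TO_JSON_TYPE = {
    "bool": "boolean",
    "int32": "integer",
    "int64": "integer",
    "float64": "number",
    "utf8": "string",
}


def to_json_schema(fields):
    """Build a JSON Schema object from {'name', 'type', 'nullable'} dicts."""
    properties = {}
    required = []
    for f in fields:
        if f["type"] not in ARROW_TO_JSON_TYPE:
            raise ValueError(f"type {f['type']!r} is outside the supported subset")
        json_type = ARROW_TO_JSON_TYPE[f["type"]]
        if f.get("nullable", True):
            # Nullable fields may hold the value or null, and may be omitted.
            properties[f["name"]] = {"type": [json_type, "null"]}
        else:
            properties[f["name"]] = {"type": json_type}
            required.append(f["name"])
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


fields = [
    {"name": "id", "type": "int64", "nullable": False},
    {"name": "label", "type": "utf8"},
]
json_schema = to_json_schema(fields)
print(json.dumps(json_schema, indent=2))
```

Nested, dictionary, and timestamp types would need more thought, which is
probably why "at least a subset" is the right scope to start with.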

-Micah


On Tue, Dec 10, 2019 at 10:51 AM Christian Hudon <chr...@elementai.com>
wrote:

> Micah: I didn't know that Flatbuffers supported serialization to/from JSON,
> thanks. That seems like a very good start, at least. I'll aim to create a
> draft pull request that at least wires everything up in Arrow so we can
> load/save a Schema.fbs instance from/to JSON. At least it'll make it easier
> for me to see how Arrow schemas would look in JSON with that.
>
> Otherwise, I'm still gathering requirements internally here. For example,
> one thing that would be nice would be to be able to output a JSON Schema
> from at least a subset of the Arrow schema. (That way our users could start
> by passing around JSON with a given schema, and transition pieces of a
> workflow to Arrow as they're ready.) But that part can also be done outside
> of the Arrow code, if deemed not relevant to have in the Arrow codebase
> itself.
>
> One core requirement for us, however, would be eventual compatibility
> between Arrow versions for a given text representation of a schema.
> Meaning, if you have a text description of a given Arrow schema, you can
> load it into different versions of Arrow and it creates a valid Schema
> Flatbuffer description, that Arrow can use. Wes, were you thinking of that,
> or of something else, when you wrote "only makes sense if it is offered
> without any backward/forward compatibility guarantees"?
>
> For now, assuming the JSON serialization done by the Flatbuffers libraries
> is usable, it seems to me we have all the pieces to make this happen:
> 1) The binary Schema.fbs data structures have to be compatible between
> different versions of Arrow, otherwise two processes with different Arrow
> versions won't be able to interoperate, no?
> 2) The Flatbuffer <-> JSON serialization supplied by the Flatbuffers
> library also has to be compatible between different versions of the
> Flatbuffers library, since the main use case seems to be storing
> Flatbuffers assets into version control. Breaking changes there will also
> be painful to their users.
>
> With these two together, it would seem not too difficult to create a text
> representation for Arrow schemas that (at some point) has some
> compatibility guarantees, but maybe I'm missing something?
>
> Thanks,
>
>   Christian
>
> On Mon, Dec 9, 2019 at 7:00 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > The only "canonical" representation of schemas at the moment is the
> > Flatbuffers data structure [1]
> >
> > Having a human-readable/parseable text representation I think only
> > makes sense if it is offered without any backward/forward
> > compatibility guarantees.
> >
> > Note I had previously opened
> > https://issues.apache.org/jira/browse/ARROW-3730 where I noted that
> > there's no way (aside from generating the Flatbuffers messages) to
> > generate a schema representation that can be used later to reconstruct
> > a schema in a program. If such a representation were human
> > readable/editable that seems beneficial.
> >
> >
> >
> > [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs
> >
> > On Sat, Dec 7, 2019 at 11:56 AM Maarten Ballintijn <maart...@xs4all.nl>
> > wrote:
> > >
> > >
> > > Is there a syntax specified for schemas?
> > >
> > > Cheers,
> > > Maarten.
> > >
> > >
> > > > On Dec 6, 2019, at 5:01 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > Hi Christian,
> > > > As far as I know no-one is working on a canonical text representation
> > > > for schemas.  A JSON serializer exists for integration test purposes, but
> > > > IMO it shouldn't be relied upon as canonical.
> > > >
> > > > It looks like Flatbuffers supports serialization to/from JSON [1],
> > > > using that functionality might be a promising avenue to pursue for a
> > > > human readable schema. I could see adding a helper method someplace under
> > > > IPC for this.  Would that meet your needs?  I think if there are other
> > > > requirements, then a proposal would be welcome.  Ideally, a solution
> > > > would not require additional build/runtime dependencies.
> > > >
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] See Text & schema parsing
> > > > https://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html
> > > >
> > > > On Fri, Dec 6, 2019 at 1:26 PM Christian Hudon <chr...@elementai.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> For the uses I would like to make of Arrow, I would need a human-readable
> > > >> and -writable version of an Arrow Schema, that could be converted to and
> > > >> from the Arrow Schema C++ object. Going through the doc for 0.15.1, I don't
> > > >> see anything to that effect, with the closest being the ToString() method
> > > >> on DataType instances, but which is meant for debugging only. (I need an
> > > >> expression of an Arrow Schema that people can read, and that can live
> > > >> outside of the code for a particular operation.)
> > > >>
> > > >> Is a text representation of an Arrow Schema something that is being worked
> > > >> on now? If not, would you folks be interested in me putting up an initial
> > > >> proposal for discussion? Any design constraints I should pay attention to,
> > > >> then?
> > > >>
> > > >> Thanks,
> > > >>
> > > >>  Christian
> > > >> --
> > > >>
> > > >>
> > > >> │ Christian Hudon
> > > >>
> > > >> │ Applied Research Scientist
> > > >>
> > > >>   Element AI, 6650 Saint-Urbain #500
> > > >>
> > > >>   Montréal, QC, H2S 3G9, Canada
> > > >>   Elementai.com
> > > >>
> > >
> >
>
>
> --
>
>
> │ Christian Hudon
>
> │ Applied Research Scientist
>
>    Element AI, 6650 Saint-Urbain #500
>
>    Montréal, QC, H2S 3G9, Canada
>    Elementai.com
>
