Micah: I didn't know that Flatbuffers supported serialization to/from JSON,
thanks. That seems like a very good start. I'll aim to create a draft pull
request that wires everything up in Arrow so we can load/save a Schema.fbs
instance from/to JSON. If nothing else, that will make it easier for me to
see how Arrow schemas would look in that JSON form.
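
To make that concrete, here is a rough sketch of what a small schema might
look like in such a JSON form. The key names (fields, name, nullable, type,
bitWidth, is_signed) follow my reading of Schema.fbs, and the "type_type" key
is how I understand Flatbuffers' JSON tags a union member; both are
assumptions until real output exists. Python is used only to keep the sketch
short, and load_schema/dump_schema are hypothetical helper names.

```python
import json

# Hypothetical JSON text for a two-column schema. Key names are assumed from
# Schema.fbs; the real Flatbuffers output may differ in details.
SCHEMA_JSON = """
{
  "endianness": "Little",
  "fields": [
    {"name": "id", "nullable": false,
     "type_type": "Int", "type": {"bitWidth": 64, "is_signed": true},
     "children": []},
    {"name": "label", "nullable": true,
     "type_type": "Utf8", "type": {},
     "children": []}
  ]
}
"""


def load_schema(text):
    """Parse the JSON text and do a minimal structural check."""
    schema = json.loads(text)
    for field in schema["fields"]:
        if "name" not in field or "type_type" not in field:
            raise ValueError("field is missing 'name' or 'type_type'")
    return schema


def dump_schema(schema):
    """Serialize back to JSON text with a stable key order, for diffing."""
    return json.dumps(schema, indent=2, sort_keys=True)


schema = load_schema(SCHEMA_JSON)
round_tripped = load_schema(dump_schema(schema))
```

The point of the round trip is only that a load followed by a save and another
load gives back the same structure, which is the minimum needed for keeping
such files in version control.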

Otherwise, I'm still gathering requirements internally here. For example,
it would be nice to be able to generate a JSON Schema from at least a
subset of the Arrow schema. (That way our users could start
by passing around JSON with a given schema, and transition pieces of a
workflow to Arrow as they're ready.) But that part can also be done outside
of the Arrow code, if deemed not relevant to have in the Arrow codebase
itself.
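
A minimal sketch of that JSON Schema idea, assuming the schema is available as
a plain dictionary with "fields", "name", "nullable" and "type_type" keys (an
assumed layout, not something Arrow defines today). The type mapping and the
to_json_schema name are hypothetical, and only a handful of types are covered.

```python
# Assumed mapping from a few Arrow type names to JSON Schema types.
ARROW_TO_JSON_TYPE = {
    "Bool": "boolean",
    "Int": "integer",
    "FloatingPoint": "number",
    "Utf8": "string",
}


def to_json_schema(arrow_schema):
    """Build a JSON Schema object describing one row of the given schema.

    Raises ValueError for types outside the supported subset.
    """
    properties = {}
    required = []
    for field in arrow_schema["fields"]:
        type_name = field["type_type"]
        if type_name not in ARROW_TO_JSON_TYPE:
            raise ValueError(f"unsupported Arrow type: {type_name}")
        json_type = ARROW_TO_JSON_TYPE[type_name]
        # Assumption: an omitted "nullable" means non-nullable.
        if field.get("nullable", False):
            # Nullable columns accept null in addition to the base type.
            properties[field["name"]] = {"type": [json_type, "null"]}
        else:
            properties[field["name"]] = {"type": json_type}
            required.append(field["name"])
    return {"type": "object", "properties": properties, "required": required}


example = {
    "fields": [
        {"name": "id", "nullable": False, "type_type": "Int"},
        {"name": "label", "nullable": True, "type_type": "Utf8"},
    ]
}
json_schema = to_json_schema(example)
```

Here non-nullable fields are marked as required and nullable ones may be null
or absent; that is one possible convention, and nested or parametrized types
would need more thought.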

One core requirement for us, however, would be eventual compatibility
between Arrow versions for a given text representation of a schema.
Meaning, if you have a text description of a given Arrow schema, you can
load it into different versions of Arrow and each one produces a valid
Schema Flatbuffer description that Arrow can use. Wes, were you thinking of that,
or of something else, when you wrote "only makes sense if it is offered
without any backward/forward compatibility guarantees"?

For now, it seems to me that, assuming the JSON serialization done by the
Flatbuffers library is usable, we have all the pieces to make this happen:
1) The binary Schema.fbs data structures have to be compatible between
different versions of Arrow, otherwise two processes with different Arrow
versions won't be able to interoperate, no?
2) The Flatbuffer <-> JSON serialization supplied by the Flatbuffers
library also has to be compatible between different versions of the
Flatbuffers library, since the main use case seems to be storing
Flatbuffers assets into version control. Breaking changes there will also
be painful to their users.

With these two together, it would seem not too difficult to create a text
representation for Arrow schemas that (at some point) has some
compatibility guarantees, but maybe I'm missing something?
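
As a sketch of what such a guarantee could mean for a reader of the text form:
keys missing from an older file get defaults, and keys added by a newer version
are ignored. The key names, the defaults table and normalize_field are all
hypothetical; this only illustrates the policy, not how Arrow or Flatbuffers
actually behave.

```python
# Hypothetical defaults for optional keys. A newer version would extend this
# table rather than change existing entries.
FIELD_DEFAULTS = {"nullable": False, "children": []}
KNOWN_FIELD_KEYS = {"name", "type_type", "type", "nullable", "children"}


def normalize_field(raw):
    """Return a field dict that this (hypothetical) version understands.

    Missing optional keys get defaults (text written by an older version);
    unknown keys are dropped (text written by a newer version).
    """
    if "name" not in raw or "type_type" not in raw:
        raise ValueError("field needs 'name' and 'type_type'")
    field = {k: v for k, v in raw.items() if k in KNOWN_FIELD_KEYS}
    for key, default in FIELD_DEFAULTS.items():
        field.setdefault(key, default)
    # Recurse so nested fields follow the same rules; this also replaces the
    # shared default list with a fresh one.
    field["children"] = [normalize_field(c) for c in field["children"]]
    return field


older_text = {"name": "id", "type_type": "Int"}
newer_text = {"name": "id", "type_type": "Int", "nullable": False,
              "children": [], "future_option": 1}
same = normalize_field(older_text) == normalize_field(newer_text)
```

Silently dropping unknown keys is itself a design decision; rejecting them, or
only allowing them under a version marker, would be stricter alternatives.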

Thanks,

  Christian

On Mon, Dec 9, 2019 at 7:00 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> The only "canonical" representation of schemas at the moment is the
> Flatbuffers data structure [1]
>
> Having a human-readable/parseable text representation I think only
> makes sense if it is offered without any backward/forward
> compatibility guarantees.
>
> Note I had previously opened
> https://issues.apache.org/jira/browse/ARROW-3730 where I noted that
> there's no way (aside from generating the Flatbuffers messages) to
> generate a schema representation that can be used later to reconstruct
> a schema in a program. If such a representation were human
> readable/editable that seems beneficial.
>
>
>
> [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs
>
> On Sat, Dec 7, 2019 at 11:56 AM Maarten Ballintijn <maart...@xs4all.nl> wrote:
> >
> >
> > Is there a syntax specified for schemas?
> >
> > Cheers,
> > Maarten.
> >
> >
> > > On Dec 6, 2019, at 5:01 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >
> > > Hi Christian,
> > > > As far as I know no-one is working on a canonical text representation
> > > > for schemas.  A JSON serializer exists for integration test purposes,
> > > > but IMO it shouldn't be relied upon as canonical.
> > >
> > > > It looks like Flatbuffers supports serialization to/from JSON [1],
> > > > using that functionality might be a promising avenue to pursue for a
> > > > human readable schema. I could see adding a helper method someplace
> > > > under IPC for this.  Would that meet your needs?  I think if there are
> > > > other requirements, then a proposal would be welcome.  Ideally, a
> > > > solution would not require additional build/runtime dependencies.
> > >
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] See Text & schema parsing
> > > https://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html
> > >
> > > > On Fri, Dec 6, 2019 at 1:26 PM Christian Hudon <chr...@elementai.com> wrote:
> > >
> > >> Hi,
> > >>
> > > >> For the uses I would like to make of Arrow, I would need a
> > > >> human-readable and -writable version of an Arrow Schema, that could be
> > > >> converted to and from the Arrow Schema C++ object. Going through the
> > > >> doc for 0.15.1, I don't see anything to that effect, with the closest
> > > >> being the ToString() method on DataType instances, but which is meant
> > > >> for debugging only. (I need an expression of an Arrow Schema that
> > > >> people can read, and that can live outside of the code for a
> > > >> particular operation.)
> > >>
> > > >> Is a text representation of an Arrow Schema something that is being
> > > >> worked on now? If not, would you folks be interested in me putting up
> > > >> an initial proposal for discussion? Any design constraints I should
> > > >> pay attention to, then?
> > >>
> > >> Thanks,
> > >>
> > >>  Christian
> > >> --
> > >>
> > >>
> > >> │ Christian Hudon
> > >>
> > >> │ Applied Research Scientist
> > >>
> > >>   Element AI, 6650 Saint-Urbain #500
> > >>
> > >>   Montréal, QC, H2S 3G9, Canada
> > >>   Elementai.com
> > >>
> >
>


-- 


│ Christian Hudon

│ Applied Research Scientist

   Element AI, 6650 Saint-Urbain #500

   Montréal, QC, H2S 3G9, Canada
   Elementai.com
