> With these two together, it would seem not too difficult to create a text
> representation for Arrow schemas that (at some point) has some
> compatibility guarantees, but maybe I'm missing something?

I think the main risk is that Flatbuffers' JSON parsing might somehow fail to
handle backward-compatible changes to the Arrow schema message. Given how the
documentation describes the JSON functionality, I think this would be
considered a bug. The one downside to calling the JSON "schema" canonical is
that the Flatbuffers JSON functionality only appears to be available in C++
(and in Java via JNI), so it wouldn't have cross-language support. I think
this issue is more one of semantics, though (i.e., does the JSON description
become part of the "Arrow spec", or does it live on as a C++/Python-only
feature?).

-Micah

On Tue, Dec 10, 2019 at 10:51 AM Christian Hudon <chr...@elementai.com> wrote:

> Micah: I didn't know that Flatbuffers supported serialization to/from JSON,
> thanks. That seems like a very good start, at least. I'll aim to create a
> draft pull request that at least wires everything up in Arrow so we can
> load/save a Schema.fbs instance from/to JSON. At the very least, it will
> make it easier for me to see how Arrow schemas would look in JSON.
>
> Otherwise, I'm still gathering requirements internally here. For example,
> one thing that would be nice would be to be able to output a JSON Schema
> from at least a subset of the Arrow schema. (That way our users could start
> by passing around JSON with a given schema, and transition pieces of a
> workflow to Arrow as they're ready.) But that part can also be done outside
> of the Arrow code, if deemed not relevant to have in the Arrow codebase
> itself.
>
> One core requirement for us, however, would be eventual compatibility
> between Arrow versions for a given text representation of a schema.
> That is, if you have a text description of a given Arrow schema, you can
> load it into different versions of Arrow and get a valid Schema
> Flatbuffer description that Arrow can use.
> Wes, were you thinking of that, or of something else, when you wrote "only
> makes sense if it is offered without any backward/forward compatibility
> guarantees"?
>
> For now, for me, assuming the JSON serialization done by the Flatbuffers
> libraries is usable, it seems we have all the pieces to make this happen:
> 1) The binary Schema.fbs data structures have to be compatible between
> different versions of Arrow; otherwise, two processes with different Arrow
> versions won't be able to interoperate, no?
> 2) The Flatbuffers <-> JSON serialization supplied by the Flatbuffers
> library also has to be compatible between different versions of the
> Flatbuffers library, since the main use case seems to be storing
> Flatbuffers assets in version control. Breaking changes there would also
> be painful for their users.
>
> With these two together, it would seem not too difficult to create a text
> representation for Arrow schemas that (at some point) has some
> compatibility guarantees, but maybe I'm missing something?
>
> Thanks,
>
> Christian
>
> On Mon, Dec 9, 2019 at 07:00, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > The only "canonical" representation of schemas at the moment is the
> > Flatbuffers data structure [1].
> >
> > Having a human-readable/parseable text representation, I think, only
> > makes sense if it is offered without any backward/forward
> > compatibility guarantees.
> >
> > Note I had previously opened
> > https://issues.apache.org/jira/browse/ARROW-3730 where I noted that
> > there's no way (aside from generating the Flatbuffers messages) to
> > generate a schema representation that can be used later to reconstruct
> > a schema in a program. If such a representation were human-readable and
> > editable, that would seem beneficial.
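The thread's two concerns, a text representation that can later be used to reconstruct a schema in a program and compatibility of that text across versions, can be illustrated with a small round trip. This is a minimal sketch under stated assumptions, not Arrow's API and not the Flatbuffers JSON format: the `Field` dataclass, the `format_version` key, and the type-name strings are all hypothetical, and only Python's standard `json` module is used. Unknown keys are ignored on load, loosely analogous to the way older Flatbuffers readers never look at fields added in newer schema versions.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical version tag for this illustrative text format (not an Arrow spec).
# It is bumped only for breaking changes; additive changes keep the same number.
FORMAT_VERSION = 1


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Arrow field; not pyarrow.Field."""
    name: str
    type: str  # plain type-name string, e.g. "int64", chosen for illustration
    nullable: bool = True


def schema_to_text(fields: List[Field]) -> str:
    """Serialize fields to human-readable, hand-editable JSON."""
    doc = {
        "format_version": FORMAT_VERSION,
        "fields": [
            {"name": f.name, "type": f.type, "nullable": f.nullable}
            for f in fields
        ],
    }
    return json.dumps(doc, indent=2, sort_keys=True)


def schema_from_text(text: str) -> List[Field]:
    """Reconstruct fields from text, e.g. text produced by schema_to_text.

    Unknown keys are ignored, so a document from a newer producer that only
    added optional keys still loads; a newer format_version is rejected.
    """
    doc = json.loads(text)
    version = doc.get("format_version")
    if not isinstance(version, int) or version > FORMAT_VERSION:
        raise ValueError(f"unsupported format_version: {version!r}")
    return [
        Field(name=f["name"], type=f["type"], nullable=f.get("nullable", True))
        for f in doc["fields"]
    ]


if __name__ == "__main__":
    fields = [Field("id", "int64", nullable=False), Field("name", "string")]
    text = schema_to_text(fields)
    print(text)
    assert schema_from_text(text) == fields
```

The design choice here separates additive changes (new optional keys, silently skipped by older readers) from breaking ones (a new `format_version`, which older readers refuse), which is one way to offer the kind of limited guarantee discussed in the thread.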
> >
> > [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs
> >
> > On Sat, Dec 7, 2019 at 11:56 AM Maarten Ballintijn <maart...@xs4all.nl>
> > wrote:
> >
> > > Is there a syntax specified for schemas?
> > >
> > > Cheers,
> > > Maarten.
> > >
> > > > On Dec 6, 2019, at 5:01 PM, Micah Kornfield <emkornfi...@gmail.com>
> > > > wrote:
> > > >
> > > > Hi Christian,
> > > > As far as I know, no one is working on a canonical text representation
> > > > for schemas. A JSON serializer exists for integration test purposes,
> > > > but IMO it shouldn't be relied upon as canonical.
> > > >
> > > > It looks like Flatbuffers supports serialization to/from JSON [1];
> > > > using that functionality might be a promising avenue to pursue for a
> > > > human-readable schema. I could see adding a helper method someplace
> > > > under IPC for this. Would that meet your needs? I think if there are
> > > > other requirements, then a proposal would be welcome. Ideally, a
> > > > solution would not require additional build/runtime dependencies.
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] See "Text & schema parsing":
> > > > https://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html
> > > >
> > > > On Fri, Dec 6, 2019 at 1:26 PM Christian Hudon <chr...@elementai.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > For the uses I would like to make of Arrow, I would need a
> > > > > human-readable and -writable version of an Arrow Schema that could be
> > > > > converted to and from the Arrow Schema C++ object. Going through the
> > > > > docs for 0.15.1, I don't see anything to that effect; the closest is
> > > > > the ToString() method on DataType instances, which is meant for
> > > > > debugging only.
> > > > > (I need an expression of an Arrow Schema that people can read, and
> > > > > that can live outside of the code for a particular operation.)
> > > > >
> > > > > Is a text representation of an Arrow Schema something that is being
> > > > > worked on now? If not, would you folks be interested in me putting up
> > > > > an initial proposal for discussion? Are there any design constraints
> > > > > I should pay attention to, then?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Christian
> > > > > --
> > > > > │ Christian Hudon
> > > > > │ Applied Research Scientist
> > > > > Element AI, 6650 Saint-Urbain #500
> > > > > Montréal, QC, H2S 3G9, Canada
> > > > > Elementai.com
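Christian mentions wanting to output a JSON Schema from at least a subset of an Arrow schema, possibly outside the Arrow codebase. A minimal sketch of that mapping follows, under loudly stated assumptions: the input is a hypothetical list of `(name, type_name, nullable)` tuples rather than a real Arrow schema object, the type-name strings are chosen for illustration (they resemble, but are not parsed from, Arrow's `ToString()` output), and the nullability policy (nullable fields accept `null` and may be omitted; non-nullable fields are `required`) is this sketch's own choice, not anything from an Arrow spec.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical mapping for a small subset of Arrow primitive types.
# Anything outside it (lists, structs, timestamps, ...) is rejected.
_ARROW_TO_JSON_SCHEMA: Dict[str, str] = {
    "bool": "boolean",
    # int8..int64 and uint8..uint64 all map to JSON Schema "integer"
    **{f"{s}int{b}": "integer" for s in ("", "u") for b in (8, 16, 32, 64)},
    "halffloat": "number",
    "float": "number",
    "double": "number",
    "string": "string",
    "large_string": "string",
}


def to_json_schema(fields: List[Tuple[str, str, bool]]) -> dict:
    """Build a JSON Schema for records with the given (name, type, nullable) fields.

    Assumption (not from any Arrow spec): nullable fields accept JSON null and
    may be omitted; non-nullable fields are listed under "required".
    """
    properties = {}
    required = []
    for name, arrow_type, nullable in fields:
        try:
            json_type = _ARROW_TO_JSON_SCHEMA[arrow_type]
        except KeyError:
            raise ValueError(
                f"no JSON Schema mapping for Arrow type {arrow_type!r}"
            ) from None
        properties[name] = {"type": [json_type, "null"] if nullable else json_type}
        if not nullable:
            required.append(name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
        "required": required,
    }


if __name__ == "__main__":
    schema = to_json_schema([("id", "int64", False), ("name", "string", True)])
    print(json.dumps(schema, indent=2))
```

Rejecting unmapped types outright, rather than guessing, keeps the exported JSON Schema honest about which part of the Arrow type system it actually covers.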