The Arrow IPC specification allows for custom metadata in both the Schema and the individual Fields: https://arrow.apache.org/docs/format/Columnar.html#schema-message
Might that work for you? Another alternative would be to track your
metadata in a separate object outside of the Arrow data.

Neal

On Fri, Feb 26, 2021 at 5:02 AM Michael Lavina <michael.lav...@factset.com> wrote:

> Hello Everyone,
>
> Some background. My name is Michael and I work at FactSet, which, if you
> use Arrow, you may have heard of because one of our architects did a talk
> on using Arrow and Dremio.
>
> https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight
>
> His team has decided to use Arrow as a tabular data interchange format.
> Other teams are doing other things. We are working on standardizing our
> tabular data interchange format at our company.
>
> We have our own open-sourced, columnar schema defined in protobuf.
> https://github.com/factset/stachschema
>
> We looked into Apache Arrow a few years ago, but decided not to use it as
> it was not mature enough at the time and we had two specific requirements:
>
> 1) We needed this data not just for analytics but for rendering as well,
>    and rendering requires much more information, such as the type of the
>    data and the relationships between data, i.e. grouping.
> 2) We need SDKs that support TypeScript/JavaScript, in both browser and
>    Node, and that support both creating and consuming Arrow.
>
> Now that Apache Arrow is more mature and stabilized, i.e. the schema and
> SDKs are post 1.x, we are looking into it again.
>
> 1. We are thinking of defining specific metadata, in a similar way to
>    what we do for STACH, that lets us describe rendering-specific
>    behavior, e.g. adding metadata to a Field called isHidden to denote
>    whether we should render the data column or not.
> 2. It seems like there is a well-developed JavaScript SDK that we can
>    use. I am still reading the source code and the Observable articles to
>    truly understand how it works.
>    1. I read in one of the issues that the JS library might be out of
>       sync, so do people know how actively that repo is maintained?
>    2. If there is work that needs to be done, I think we would be able to
>       help if we had some help getting started with understanding that
>       repo.
>
> If possible, we would be interested to continue to chat about the above
> ideas, get more information about whether Apache Arrow is right for the
> job, and hear whether other people are already using Arrow for rendering
> in addition to analytics.
>
> To clarify what I mean regarding existing render technologies: I know
> things like Falcon and Perspective exist, but those seem to be for basic
> rendering of simple tables. I mean to create a superset of Arrow by
> defining metadata that allows for complex nested headers and nested rows.
> Something like the image below. Then you can imagine even more data
> attached, such as descriptions of the data and its relationships to other
> data on the page. You can imagine that in the dataset there is some
> `personId` that is set to not be rendered. This personId can then be used
> to gather more information in another API call if you wanted to render a
> tooltip with, say, some bio information. In short, rendered tables require
> a lot more information than just the data. Does it make sense to build
> this upon Arrow?
>
> -Thanks
>
> Michael