The Arrow IPC specification allows for custom metadata in both the Schema
and the individual Fields:
https://arrow.apache.org/docs/format/Columnar.html#schema-message

Might that work for you? Another alternative would be to track your
metadata in a separate object outside of the Arrow data.
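
For that second option, a minimal sketch of a side-car object keyed by
column name might look like this (plain Python with no Arrow calls;
`ColumnRenderInfo`, `is_hidden`, `group`, and both helper functions are
hypothetical names, not part of any Arrow API):

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnRenderInfo:
    """Rendering hints for one column, kept outside the Arrow data."""
    is_hidden: bool = False
    group: Optional[str] = None  # e.g. a nested-header label


# One entry per column name; columns without an entry use the defaults.
render_info: Dict[str, ColumnRenderInfo] = {
    "personId": ColumnRenderInfo(is_hidden=True),
    "name": ColumnRenderInfo(group="Person"),
}


def to_json(info: Dict[str, ColumnRenderInfo]) -> str:
    """Serialize the side-car so it can be sent alongside the Arrow payload."""
    return json.dumps({col: asdict(meta) for col, meta in info.items()}, sort_keys=True)


def columns_to_render(column_names: List[str], info: Dict[str, ColumnRenderInfo]) -> List[str]:
    """Filter out the columns whose side-car entry marks them as hidden."""
    return [c for c in column_names if not info.get(c, ColumnRenderInfo()).is_hidden]


print(columns_to_render(["personId", "name", "age"], render_info))
```

The trade-off is that the side-car has to be kept in sync with the schema by
hand, whereas field metadata travels with the data.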

Neal

On Fri, Feb 26, 2021 at 5:02 AM Michael Lavina <michael.lav...@factset.com>
wrote:

> Hello Everyone,
>
>
>
> Some background: my name is Michael and I work at FactSet, which you may
> have heard of if you use Arrow, because one of our architects gave a talk
> on using Arrow and Dremio.
>
>
> https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight
>
>
>
> His team has decided to use Arrow as a tabular data interchange format.
> Other teams are doing other things. We are working on standardizing our
> tabular data interchange format at our company.
>
>
>
> We have our own open-source, column-based schema defined in protobuf:
> https://github.com/factset/stachschema
>
>
>
> We looked into Apache Arrow a few years ago, but decided not to use it
> because it was not mature enough at the time and we had two specific
> requirements:
>
> 1) We needed this data not just for analytics but for rendering as well,
> and rendering requires a lot more information, such as the type of the
> data and the relationships between data, e.g. grouping.
>
> 2) We need SDKs that support TypeScript/JavaScript, in both the browser
> and Node, and that support both creating and consuming Arrow.
>
>
>
> Now that Apache Arrow is more mature and stable, i.e. the schema and SDKs
> are post-1.x, we are looking into it again.
>
>
>
>    1. We are thinking of defining specific metadata, similar to what we do
>    for STACH, that lets us describe rendering-specific details, e.g. adding
>    a metadata entry called isHidden to a Field to denote whether we should
>    render that data column or not.
>    2. It seems like there is a well-developed JavaScript SDK that we can
>    use. I am still reading the source code and the Observable articles to
>    truly understand how it works.
>       1. I read in one of the issues that the JS library might be out of
>       sync, so does anyone know how actively that repo is maintained?
>       2. If there is work that needs to be done, I think we would be able
>       to help if we had some help getting started with understanding that
>       repo.
>
>
>
> If possible, we would be interested in continuing to chat about the above
> ideas, getting more information about whether Apache Arrow is right for
> the job, and hearing whether other people are already discussing or using
> Arrow for rendering in addition to analytics.
>
>
>
> To clarify what I mean by existing rendering technologies: I know that
> things like Falcon and Perspective exist, but those seem to be for basic
> rendering of simple tables. I mean to create a superset of Arrow by
> defining metadata that allows for complex nested headers and nested rows,
> something like the image below. You can then imagine even more data
> attached, such as descriptions of the data and its relationships to other
> data on the page. For example, imagine the dataset has some `personId`
> that is set to not be rendered. This personId can then be used to gather
> more information in another API call if you wanted to render a tooltip
> with, say, some bio information. In short, rendered tables require a lot
> more information than just the data. Does it make sense to build this on
> top of Arrow?
>
>
>
>
>
> -Thanks
>
> Michael
>
>
>
