Hey Neal,

Thanks for the response; I am glad I am using this correctly. I have never 
really used mailing lists, so hopefully this works.

That’s exactly what I was thinking of doing: creating a standard metadata 
schema built on top of Apache Arrow, with some predefined user types.

I guess I was just wondering if I was trying to use a screwdriver as a hammer. 
It can work, because we are using the metadata and that could be anything, but 
maybe, like you said, we should be creating a separate standard entirely for 
defining the schema to render tables instead of defining it within Arrow.

Does it defeat the value of Arrow if we are sending the data using buffers and 
streams plus a giant string of stringified metadata, when I could maybe define 
the metadata separately in protobuf binary?

In addition, I was curious: with all these visualization tools, has someone 
already developed a standard metadata for Arrow to help with rendering? Stuff 
like how to denote grouping of data, relationships between columns, and hidden 
information.
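
A minimal sketch of what such field-level rendering metadata could look like, 
assuming custom metadata is carried as string keys and string values, so typed 
hints have to be stringified. The names `RenderHints`, `encodeRenderHints`, 
`decodeRenderHints`, and the `render.` key prefix are hypothetical and are not 
part of Arrow or any existing standard; a plain object stands in for the 
metadata map, and no Arrow library calls are used.

```typescript
// Hypothetical rendering hints for one column; none of these names come from Arrow.
interface RenderHints {
  isHidden: boolean;   // whether the column should be rendered
  group?: string;      // header group the column belongs to
  relatesTo?: string;  // name of a related column, e.g. one it provides lookups for
}

// Assumed key namespace, to avoid clashing with other metadata entries.
const PREFIX = "render.";

// Every value is stringified because the metadata holds only string pairs.
function encodeRenderHints(hints: RenderHints): Record<string, string> {
  const meta: Record<string, string> = {};
  meta[PREFIX + "isHidden"] = String(hints.isHidden);
  if (hints.group !== undefined) meta[PREFIX + "group"] = hints.group;
  if (hints.relatesTo !== undefined) meta[PREFIX + "relatesTo"] = hints.relatesTo;
  return meta;
}

// Reads the hints back; a missing isHidden key is treated as "visible".
function decodeRenderHints(meta: Record<string, string>): RenderHints {
  const hints: RenderHints = { isHidden: meta[PREFIX + "isHidden"] === "true" };
  const group = meta[PREFIX + "group"];
  if (group !== undefined) hints.group = group;
  const relatesTo = meta[PREFIX + "relatesTo"];
  if (relatesTo !== undefined) hints.relatesTo = relatesTo;
  return hints;
}

// Example: a personId column that is hidden but related to a "name" column.
const personIdMeta = encodeRenderHints({ isHidden: true, relatesTo: "name" });
const roundTripped = decodeRenderHints(personIdMeta);
```

Keeping each hint under its own key, rather than one large JSON string, means 
a consumer can read a single hint without parsing the rest.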

-Michael

From: Neal Richardson <neal.p.richard...@gmail.com>
Date: Friday, February 26, 2021 at 1:38 PM
To: dev <dev@arrow.apache.org>
Subject: Re: [JS] Exploring usage of apache arrow at my company for complex table rendering

The Arrow IPC specification allows for custom metadata in both the Schema
and the individual Fields:
https://arrow.apache.org/docs/format/Columnar.html#schema-message

Might that work for you? Another alternative would be to track your
metadata in a separate object outside of the Arrow data.
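
A minimal sketch of the second alternative, assuming the rendering information 
lives in a sidecar object keyed by column name and the Arrow data is left 
untouched. The types `ColumnRender` and `RenderSidecar`, the `headerPath` 
field, and the function `visibleColumns` are hypothetical names used only for 
illustration.

```typescript
// Hypothetical per-column rendering description, kept outside the Arrow data.
interface ColumnRender {
  isHidden: boolean;     // whether the column should be drawn
  headerPath: string[];  // nested header labels, outermost first
}

// Hypothetical sidecar: one entry per column name found in the data's schema.
interface RenderSidecar {
  version: number;
  columns: Record<string, ColumnRender>;
}

// Returns the columns to draw, preserving the order of the data's schema.
function visibleColumns(columnNames: string[], sidecar: RenderSidecar): string[] {
  return columnNames.filter((name) => {
    const col = sidecar.columns[name];
    // Columns without a sidecar entry default to visible.
    return col === undefined || !col.isHidden;
  });
}

const sidecar: RenderSidecar = {
  version: 1,
  columns: {
    personId: { isHidden: true, headerPath: [] },
    name: { isHidden: false, headerPath: ["Person", "Name"] },
  },
};

// "age" has no entry in the sidecar, so it is rendered by default.
const toRender = visibleColumns(["personId", "name", "age"], sidecar);
```

With this split the sidecar can be serialized in whatever format suits it, 
independently of how the Arrow data is sent.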

Neal

On Fri, Feb 26, 2021 at 5:02 AM Michael Lavina <michael.lav...@factset.com>
wrote:

> Hello Everyone,
>
>
>
> Some background: my name is Michael and I work at FactSet, which, if you
> use Arrow, you may have heard of because one of our architects gave a talk
> on using Arrow and Dremio.
>
>
> https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight
>
>
>
> His team has decided to use Arrow as a tabular data interchange format.
> Other teams are doing other things. We are working on standardizing our
> tabular data interchange format at our company.
>
>
>
> We have our own open-sourced, columnar-based schema defined in protobuf.
> https://github.com/factset/stachschema
>
>
>
> We looked into Apache Arrow a few years ago, but decided not to use it as
> it was not mature enough at the time and we had two specific requirements:
>
> 1) We needed this data not just for analytics but for rendering as well,
> and rendering requires a lot more complicated information, such as
> understanding the type of data and the relationships between data, i.e.
> grouping.
>
> 2) We need SDKs that support TypeScript/JavaScript, in both the browser
> and Node, and that support both creating and consuming Arrow.
>
>
>
> Now that Apache Arrow is more mature and stabilized, i.e. the schema and
> SDKs are post 1.x, we are looking into it again.
>
>
>
>    1. We are thinking of defining specific metadata, in a similar way to
>    what we do for STACH, that lets us define some rendering-specific
>    information, e.g. adding a metadata entry called isHidden to a Field to
>    denote whether we should render the data column or not.
>    2. It seems like there is a well-developed JavaScript SDK that we can
>    use. I am still reading the source code and the Observable articles to
>    truly understand how it works.
>       1. I read in one of the issues that the JS library might be out of
>       sync, so do people know how actively that repo is maintained?
>       2. If there needs to be work done, I think we would be able to help
>       if we had some help getting started with understanding that repo.
>
>
>
> If possible, we would be interested in continuing to chat about the above
> ideas, getting more information about whether Apache Arrow is right for
> the job, and hearing whether there is already discussion of other people
> using Arrow for rendering in addition to analytics.
>
>
>
> To clarify what I mean by existing render technologies: I know stuff like
> Falcon and Perspective exist, but those seem to be for basic rendering of
> simple tables. I mean to create a superset of Arrow by defining metadata
> that allows for complex nested headers and nested rows. Something like the
> image below. Then you can imagine even more data attached, such as
> describing the data and its relationships to other data on the page. You
> can imagine that in the dataset there is some `personId` that is set to
> not be rendered. This personId can then be used to gather more information
> in another API call if you wanted to render a tooltip with maybe some bio
> information. In short, rendered tables require a lot more information than
> just the data. Does it make sense to build this upon Arrow?
>
>
>
>
>
> -Thanks
>
> Michael
>
>
>
