Hello Everyone,

Some background: my name is Michael and I work at FactSet. If you use Arrow, 
you may have heard of us, because one of our architects gave a talk on using 
Arrow and Dremio:
https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight

His team has decided to use Arrow as its tabular data interchange format, while 
other teams have taken different approaches. We are now working on 
standardizing a single tabular data interchange format across the company.

We also have our own open-source, columnar schema called STACH, defined in 
Protocol Buffers:
https://github.com/factset/stachschema

We looked into Apache Arrow a few years ago but decided not to use it, because 
it was not mature enough at the time and we had two specific requirements:
1) We needed this data not just for analytics but also for rendering, and 
rendering requires much richer information, such as the type of each piece of 
data and the relationships between data (e.g. grouping).
2) We need SDKs for TypeScript/JavaScript, in both the browser and Node, that 
support both creating and consuming Arrow data.

Now that Apache Arrow is more mature and stable (i.e. the schema and SDKs are 
past 1.0), we are looking into it again. Some thoughts so far:


  1.  We are thinking of defining specific metadata, similar to what we do for 
STACH, that lets us express rendering-specific information, e.g. adding a 
metadata key called isHidden to a Field in the Schema to denote whether that 
data column should be rendered.
  2.  There seems to be a well-developed JavaScript SDK that we can use. I am 
still reading the source code and the Observable articles to fully understand 
how it works.
     *   I read that one issue is that the JS library might be out of sync. Does 
anyone know how actively that repo is maintained?
     *   If work needs to be done, I think we would be able to contribute, given 
some help getting started with the codebase.
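To make item 1 concrete, here is a minimal TypeScript sketch of how a renderer 
might honor an isHidden flag. It assumes the flag lives in field-level custom 
metadata; since Arrow custom metadata is a set of string key/value pairs, the 
boolean is stored as the string "true". FieldLike and SchemaLike are 
hypothetical, simplified stand-ins for the apache-arrow Field and Schema 
classes so the example runs without the library, and the isHidden key and the 
"score" column are illustrative assumptions, not part of Arrow.

```typescript
// Sketch only: "isHidden" is our proposed metadata key, not part of the Arrow
// spec. Arrow custom metadata values are strings, so the boolean is stored as
// "true"/"false".
// FieldLike/SchemaLike are simplified stand-ins for apache-arrow's Field and
// Schema so this runs without the library.
interface FieldLike {
  name: string;
  metadata: Map<string, string>;
}

interface SchemaLike {
  fields: FieldLike[];
}

const IS_HIDDEN_KEY = "isHidden";

// A missing key means "render it"; only the exact string "true" hides a column.
function isHidden(field: FieldLike): boolean {
  return field.metadata.get(IS_HIDDEN_KEY) === "true";
}

// Names of the columns a renderer should actually draw, in schema order.
function visibleColumns(schema: SchemaLike): string[] {
  return schema.fields.filter((f) => !isHidden(f)).map((f) => f.name);
}

const schema: SchemaLike = {
  fields: [
    { name: "personId", metadata: new Map<string, string>([[IS_HIDDEN_KEY, "true"]]) },
    { name: "name", metadata: new Map<string, string>() },
    { name: "score", metadata: new Map<string, string>([[IS_HIDDEN_KEY, "false"]]) },
  ],
};

console.log(visibleColumns(schema)); // ["name", "score"]
```

Treating a missing key as "visible" means plain Arrow data produced without any 
of our metadata still renders as-is.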

If possible, we would like to continue discussing these ideas, learn whether 
Apache Arrow is right for the job, and find out whether there is already 
discussion of other people using Arrow for rendering in addition to analytics.

To clarify what I mean: I know rendering technologies such as Falcon and 
Perspective exist, but they seem aimed at rendering basic, simple tables. What 
I have in mind is a superset of Arrow, defined through metadata, that allows 
for complex nested headers and nested rows, something like the image below. 
You can then imagine attaching even more information, such as descriptions of 
the data and its relationships to other data on the page. For example, the 
dataset might contain a `personId` column that is marked as not rendered. That 
personId could then be used in another API call to fetch more information, 
such as biographical details to show in a tooltip. In short, rendered tables 
require much more information than just the data. Does it make sense to build 
this on top of Arrow?

[Image not included: example of a rendered table with nested headers and nested rows]
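As a rough illustration of the nested-header idea, the TypeScript sketch below 
assumes a hypothetical headerPath metadata key holding a JSON-encoded array of 
group labels for each field (e.g. ["Returns", "1Y"]). Neither that key nor its 
encoding is an Arrow convention. The sketch skips columns marked isHidden, such 
as personId, and merges adjacent columns that share a header prefix into 
colspan-style header rows. The column names and labels are made up for the 
example.

```typescript
// Sketch only: "headerPath" and "isHidden" are hypothetical metadata keys.
// FieldLike is a simplified stand-in for an Arrow Field (name + string metadata).
interface FieldLike {
  name: string;
  metadata: Map<string, string>;
}

interface HeaderCell {
  label: string;
  span: number; // number of leaf columns this header cell covers
}

// Header path for a field; defaults to a single level with the column name.
function headerPath(field: FieldLike): string[] {
  const raw = field.metadata.get("headerPath");
  return raw ? (JSON.parse(raw) as string[]) : [field.name];
}

// One header row per nesting level; adjacent visible columns sharing the same
// path prefix up to that level are merged into a single cell.
function headerRows(fields: FieldLike[]): HeaderCell[][] {
  const visible = fields.filter((f) => f.metadata.get("isHidden") !== "true");
  const paths = visible.map(headerPath);
  const depth = Math.max(0, ...paths.map((p) => p.length));
  const rows: HeaderCell[][] = [];
  for (let level = 0; level < depth; level++) {
    const row: HeaderCell[] = [];
    let prevKey: string | null = null;
    for (const path of paths) {
      const key = JSON.stringify(path.slice(0, level + 1));
      if (key === prevKey) {
        row[row.length - 1].span += 1; // same group as the previous column
      } else {
        row.push({ label: path[level] ?? "", span: 1 });
      }
      prevKey = key;
    }
    rows.push(row);
  }
  return rows;
}

const meta = (entries: [string, string][]) => new Map<string, string>(entries);
const fields: FieldLike[] = [
  { name: "personId", metadata: meta([["isHidden", "true"]]) },
  { name: "name", metadata: meta([["headerPath", '["Person","Name"]']]) },
  { name: "age", metadata: meta([["headerPath", '["Person","Age"]']]) },
  { name: "ret1y", metadata: meta([["headerPath", '["Returns","1Y"]']]) },
  { name: "ret3y", metadata: meta([["headerPath", '["Returns","3Y"]']]) },
];

const rows = headerRows(fields);
// rows[0]: Person (span 2), Returns (span 2)
// rows[1]: Name, Age, 1Y, 3Y (span 1 each)
```

The hidden personId column stays in the data, so it remains available for a 
follow-up lookup (e.g. the tooltip case) even though it never appears in the 
header.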

-Thanks
Michael
