Re: Modeling N-Dim Arrays in Arrow

Robert Nishihara Thu, 16 Nov 2017 15:15:02 -0800

Great!

On Thu, Nov 16, 2017 at 3:04 PM Lewis John McGibbney <lewi...@apache.org> wrote:

> Fantastic, Robert, thank you for the pointers.
> The documentation and graphics on the Ray GitHub pages are very helpful.
> Lewis
>
> On 2017-11-16 11:20, Robert Nishihara <robertnishih...@gmail.com> wrote:
> > Yes, definitely! You can do this through the high-level Python APIs,
> > e.g., something like
> > https://github.com/apache/arrow/blob/ca3acdc138b1ac27c9111b236d33593988689a20/python/pyarrow/tests/test_serialization.py#L214-L216
> >
> > You can also share the numpy arrays using shared memory, e.g.,
> > https://issues.apache.org/jira/browse/ARROW-1792?focusedCommentId=16252940&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16252940
> >
> > You can also do this through C++.
> >
> > Some benchmarks are at
> > https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html
> >
> > On Thu, Nov 16, 2017 at 10:49 AM Lewis John McGibbney
> > <lewi...@apache.org> wrote:
> >
> > > Hi Folks,
> > >
> > > Array-oriented scientific data (such as satellite remote sensing data)
> > > is commonly encoded using the NetCDF [0] and HDF [1] data formats, as
> > > these formats have been designed and developed to offer, among other
> > > things, some or all of the following features:
> > >  * Self-Describing. A netCDF file includes information about the data
> > >    it contains.
> > >  * Portable. A netCDF file can be accessed by computers with different
> > >    ways of storing integers, characters, and floating-point numbers.
> > >  * Scalable. A small subset of a large dataset may be accessed
> > >    efficiently.
> > >  * Appendable. Data may be appended to a properly structured netCDF
> > >    file without copying the dataset or redefining its structure.
> > >  * Sharable. One writer and multiple readers may simultaneously access
> > >    the same netCDF file.
> > >  * Archivable. Access to all earlier forms of netCDF data will be
> > >    supported by current and future versions of the software.
> > >
> > > I am currently toying with the idea of exploring, and hopefully
> > > benchmarking, the use of storage-class memory hardware combined with
> > > Arrow as a mechanism for faster and more flexible data access and
> > > possibly analysis.
> > >
> > > Very first question: has anyone attempted to use, or is anyone
> > > currently using, Arrow to store N-Dim array-based data?
> > >
> > > Thanks in advance,
> > > Lewis
> > >
> > > [0] http://www.unidata.ucar.edu/software/netcdf/
> > > [1] https://www.hdfgroup.org/solutions/hdf5/
> > >
> >
>
