Great!

On Thu, Nov 16, 2017 at 3:04 PM Lewis John McGibbney <lewi...@apache.org> wrote:
> Fantastic Robert, thank you for the pointers.
> The documentation and graphics on ray github pages is very helpful.
> Lewis
>
> On 2017-11-16 11:20, Robert Nishihara <robertnishih...@gmail.com> wrote:
> > Yes definitely! You can do this through high level Python APIs, e.g.,
> > something like
> > https://github.com/apache/arrow/blob/ca3acdc138b1ac27c9111b236d33593988689a20/python/pyarrow/tests/test_serialization.py#L214-L216
> > .
> >
> > You can also share the numpy arrays using shared memory, e.g.,
> > https://issues.apache.org/jira/browse/ARROW-1792?focusedCommentId=16252940&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16252940
> >
> > You can also do this through C++.
> >
> > Some benchmarks at
> > https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html
> > .
> >
> > On Thu, Nov 16, 2017 at 10:49 AM Lewis John McGibbney <lewi...@apache.org>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > Array-oriented scientific data (such as satellite remote sensing data) is
> > > commonly encoded using NetCDF [0] and HDF [1] data formats as these formats
> > > have been designed and developed to offer amongst other things, some/all of
> > > the following features
> > > * Self-Describing. A netCDF file includes information about the data it
> > > contains.
> > > * Portable. A netCDF file can be accessed by computers with different ways
> > > of storing integers, characters, and floating-point numbers.
> > > * Scalable. A small subset of a large dataset may be accessed efficiently.
> > > * Appendable. Data may be appended to a properly structured netCDF file
> > > without copying the dataset or redefining its structure.
> > > * Sharable. One writer and multiple readers may simultaneously access the
> > > same netCDF file.
> > > * Archivable. Access to all earlier forms of netCDF data will be
> > > supported by current and future versions of the software.
> > >
> > > I am currently toying with the idea of exploring and hopefully
> > > benchmarking use of storage-class memory hardware combined with Arrow as a
> > > mechanism for improving both fast and flexible data access and possibly
> > > analysis.
> > >
> > > Very first question, has anyone attempted to/are currently using Arrow to
> > > store N-Dim array-based data?
> > >
> > > Thanks in advance,
> > > Lewis
> > >
> > > [0] http://www.unidata.ucar.edu/software/netcdf/
> > > [1] https://www.hdfgroup.org/solutions/hdf5/