Hi Pierre,

If you are sharing the schema out of band from the actual object storage, you probably need to use VectorLoader/VectorUnloader [1] to get an ArrowRecordBatch, and the corresponding methods on MessageSerializer [2] to read/write bytes. A simpler approach (with larger objects, since the schema is written alongside the data) would be to use ArrowStreamWriter/ArrowStreamReader [3], which are analogous to the stream readers/writers in Python.
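
For the simpler stream-based approach, a rough sketch might look like the following. It only shows the round trip between a VectorSchemaRoot and a byte array. The class name ArrowStreamRoundTrip, the single int column "x", and the sample values are made up for illustration. On the Plasma side I am assuming you store and fetch the resulting byte array the same way you already do for plain byte arrays, and that the Python side writes the data in the Arrow stream format rather than as a serialized Pandas object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class ArrowStreamRoundTrip {

    // Serialize one record batch with a single nullable int32 column "x"
    // (a hypothetical schema) into Arrow stream-format bytes.
    static byte[] write(BufferAllocator allocator) throws IOException {
        Schema schema = new Schema(Collections.singletonList(
                Field.nullable("x", new ArrowType.Int(32, true))));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
             // No dictionary-encoded columns here, so the provider is null.
             ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
            IntVector x = (IntVector) root.getVector("x");
            x.allocateNew(3);
            for (int i = 0; i < 3; i++) {
                x.set(i, i * 10);
            }
            root.setRowCount(3);

            writer.start();      // writes the schema message
            writer.writeBatch(); // writes the current contents of root
            writer.end();        // writes the end-of-stream marker
        }
        return out.toByteArray();
    }

    // Read every batch in the stream and collect the values of column "x".
    static int[] read(byte[] bytes, BufferAllocator allocator) throws IOException {
        try (ArrowStreamReader reader =
                     new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)) {
            // The reader owns this root and refills it on each loadNextBatch().
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            int[] values = new int[0];
            while (reader.loadNextBatch()) {
                IntVector x = (IntVector) root.getVector("x");
                int start = values.length;
                values = Arrays.copyOf(values, start + root.getRowCount());
                for (int i = 0; i < root.getRowCount(); i++) {
                    values[start + i] = x.get(i);
                }
            }
            return values;
        }
    }

    // Write then read back within one allocator, standing in for the
    // put/get against the object store.
    static int[] roundTrip() throws IOException {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
            return read(write(allocator), allocator);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(roundTrip()));
    }
}
```

Because the stream carries its own schema, the reader does not need a predefined Schema on the Java side; it can be inspected with root.getSchema() after the reader is opened.
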
Hope this helps.

-Micah

[1] https://arrow.apache.org/docs/java/org/apache/arrow/vector/VectorUnloader.html
[2] https://arrow.apache.org/docs/java/org/apache/arrow/vector/ipc/message/MessageSerializer.html#deserializeRecordBatch-org.apache.arrow.flatbuf.Message-io.netty.buffer.ArrowBuf-
[3] https://arrow.apache.org/docs/java/org/apache/arrow/vector/ipc/ArrowStreamReader.html

On Wed, Feb 26, 2020 at 5:02 AM Pierre Avérous <piaver...@gmail.com> wrote:

> Yo,
>
> I'm Pierre, a french student, currently working on a cross-runtime project
> and would like to use the apache arrow Plasma Store. The project would get
> data in a Python runtime, store it in the plasma store, and some job in a
> Java runtime would then process the data asynchronously.
>
> I'm having trouble with the Java part of this, as the Python implementation
> is very well documented. I managed to read and write byte arrays into the
> plasma store from my Java runtime, but I could not quite figure out how to
> process more complex objects. For instance, i'd like to dump a Pandas
> dataframe into the plasma store from the Python runtime, and read it in
> Java. I struggle with the metadata of the object put in the plasma store. I
> tried setting up a VectorSchemaRoot, with a predefined Schema in Java, but
> could not figure out how to write it to the plasma store, or to read from
> the plasma store into a VectorSchemaRoot.
>
> Would you be able to help me out with this? A small code sample of how it
> should be used in Java would help a lot.
>
> Best,
> Pierre Averous