Hi Micah and Tewfik,

The functionality is exposed in Python, see e.g.
https://github.com/apache/arrow/blob/apache-arrow-0.16.0/python/pyarrow/tests/test_ipc.py#L685

As Micah said, very small batches aren't necessarily optimized for
compactness (for example, buffers are padded to multiples of 8). Give this
a try, though, and see how it works.
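Roughly, the pattern is the one in that test: serialize the schema once
with Schema.serialize, serialize each batch with RecordBatch.serialize, and
put them back together with pa.ipc.read_schema / pa.ipc.read_record_batch.
A quick, untested sketch (the feature columns and values are made up):

import pyarrow as pa

# One-row batch standing in for a feature vector (illustrative columns).
batch = pa.record_batch(
    [pa.array([1.5]), pa.array([2.5])],
    names=["feature_a", "feature_b"],
)

# Serialize the schema once and store it out of band, not with every key.
schema_buf = batch.schema.serialize()

# Serialize just the batch; this IPC message does not repeat the schema.
batch_buf = batch.serialize()
print(schema_buf.size, batch_buf.size)

# Later, reconstruct the batch from the two pieces.
schema = pa.ipc.read_schema(schema_buf)
restored = pa.ipc.read_record_batch(batch_buf, schema)
assert restored.equals(batch)

Dictionary-encoded columns need more care (the dictionaries travel as
separate IPC messages), so the sketch above assumes plain types.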
Thanks,
Wes

On Sun, Feb 16, 2020 at 9:26 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> I should note, it isn't necessarily just the extra metadata. For
> single-row values, there is also an overhead for padding requirements.
> You should be able to measure this by looking at the size of the buffer
> you are using before writing any batches to the stream (I believe the
> schema is written eagerly), and subtracting that from the final size.
>
> Looking at the Python documentation I don't see it exposed, but the
> underlying function does exist in C++ [1]. People more familiar with the
> Python bindings may be able to offer more details.
>
> I think for this type of use case it probably makes sense to expose it.
> Want to try to create a patch for it?
>
> Thanks,
> Micah
>
> [1]
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.h#L215
>
> On Fri, Feb 14, 2020 at 3:09 PM Tewfik Zeghmi <zeg...@gmail.com> wrote:
>
> > Hi Micah,
> >
> > The primary language is Python. I'm hoping that the overhead of the
> > metadata is small compared to the schema information.
> >
> > Thank you!
> >
> > On Fri, Feb 14, 2020 at 3:07 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > Hi Tewfik,
> > > What language? It is possible to serialize them separately, but the
> > > right hooks might not be exposed in all languages.
> > >
> > > There is still going to be a higher overhead for single-row values in
> > > Arrow compared to Avro due to metadata requirements.
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Fri, Feb 14, 2020 at 1:33 PM Tewfik Zeghmi <zeg...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a use case of building a feature store to serve low-latency
> > > > traffic. Given a key, we need the ability to save and read a feature
> > > > vector in a low-latency key-value store. Serializing an Arrow table
> > > > with one row takes 1344 bytes, while the same single row serialized
> > > > with Avro without the schema uses 236 bytes.
> > > >
> > > > Is it possible to serialize an Arrow table/RecordBatch independently
> > > > of the schema? Ideally, we'd like to serialize the schema once, not
> > > > along with every feature key, and then be able to read the
> > > > RecordBatch with the schema.
> > > >
> > > > Thank you!
> >
> > --
> > Taleb Tewfik Zeghmi