Dominik's point is that the IPC reader currently first reads all of the
data into a Vec<u8> and then copies all of it into the arrow::Buffers,
using the IPC::Buffer offsets and lengths. It therefore performs two full
memcpys of the data and needs to hold twice the required memory (the
Vec<u8> plus the arrow::Buffers).
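A minimal, std-only sketch of the two-copy pattern described above, to make the 2x memory cost concrete. `BufferSpec` and `read_two_copies` are hypothetical names standing in for the offsets and lengths carried by `IPC::Buffer`; this is not the actual arrow reader code.

```rust
use std::io::Read;

/// Hypothetical (offset, length) pair describing one buffer inside an IPC
/// message body, standing in for what `IPC::Buffer` carries.
struct BufferSpec {
    offset: usize,
    length: usize,
}

/// Two-copy pattern: copy 1 pulls the whole body into a Vec<u8>; copy 2
/// slices each buffer out of that Vec into its own allocation. While both
/// exist, memory use is roughly twice the body size.
fn read_two_copies<R: Read>(
    reader: &mut R,
    specs: &[BufferSpec],
) -> std::io::Result<Vec<Vec<u8>>> {
    let mut body = Vec::new();
    reader.read_to_end(&mut body)?; // copy 1: whole body
    let buffers = specs
        .iter()
        // copy 2: each buffer copied again out of `body`
        .map(|s| body[s.offset..s.offset + s.length].to_vec())
        .collect();
    Ok(buffers)
}
```

Only after `body` is dropped at the end of the function does the memory overhead go away.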

I noticed this while going through it in my proposal repo, and I rewrote it
using `Read + Seek`
<https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/deserialize.rs#L66>
to read directly into typed buffers. Coincidentally, this also enabled
reading big-endian data: since we know what is in each buffer, we know how
to handle endianness using the to_le and from_be methods implemented on
Rust's native types.
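A rough sketch of the seek-and-decode idea, assuming an i32 buffer located at a known byte offset with a known byte order. `read_i32_buffer` is a hypothetical helper, not arrow2's API (the linked deserialize.rs is the real implementation); it only shows that each buffer can be decoded straight from a `Read + Seek` source without staging the whole message in a Vec<u8>.

```rust
use std::io::{Read, Seek, SeekFrom};

/// Hypothetical helper: reads `len` i32 values stored at byte `offset` of
/// `reader`, decoding each with the given byte order. Only this buffer's
/// bytes are read, via a 4-byte scratch array, so no whole-message Vec<u8>
/// is ever allocated.
fn read_i32_buffer<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    len: usize,
    is_little_endian: bool,
) -> std::io::Result<Vec<i32>> {
    reader.seek(SeekFrom::Start(offset))?;
    let mut values = Vec::with_capacity(len);
    let mut bytes = [0u8; 4];
    for _ in 0..len {
        reader.read_exact(&mut bytes)?;
        // Knowing the buffer holds i32s tells us how to swap bytes.
        values.push(if is_little_endian {
            i32::from_le_bytes(bytes)
        } else {
            i32::from_be_bytes(bytes)
        });
    }
    Ok(values)
}
```

Reading one value at a time keeps the sketch short; a real reader would more likely read the buffer in larger chunks.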

Best,
Jorge


On Mon, Mar 8, 2021 at 11:12 PM Andrew Lamb <al...@influxdata.com> wrote:

> Thank you for filing the ticket.
>
> I wonder if you mean this reader:
>
> https://docs.rs/arrow/3.0.0/arrow/ipc/reader/struct.FileReader.html#method.try_new
>
> If so, while it is called a `FileReader` I think that is somewhat
> misleading. It requires something that implements `std::io::Read` -- which
> `&[u8]` does.
>
> https://doc.rust-lang.org/std/io/trait.Read.html#impl-Read-2
>
> So you should be able to read directly from the `[u8]` without having to
> make any copies.
>
> I may be missing something, though.
>
> On Thu, Mar 4, 2021 at 10:53 AM Dominik Moritz <domor...@cmu.edu> wrote:
>
> > I just remembered a bigger issue I ran into. I wanted to read from IPC,
> > but I don’t have a file; I already have the data as a [u8]. The current
> > API incurs more copies than necessary (I think), and therefore the
> > performance of reading IPC is worse than in JS
> > (https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11696).
> >
> > On Mar 1, 2021 at 23:29:18, Dominik Moritz <domor...@cmu.edu> wrote:
> >
> > > I am looking forward to speaking with you then. I’ll talk about the
> > > motivation.
> > >
> > > My experience with the library has been good. I ran into a few
> > > limitations that I filed Jiras for. I struggled a bit with some of the
> > > error handling and Arc types, but that’s probably because I am not very
> > > experienced with Rust and wasm-bindgen doesn’t support all Rust features.
> > >
> > > I had some bigger issues with the DataFusion and Parquet libraries, as
> > > they don’t support wasm right now (I filed Jiras for those as well).
> > >
> > > On Feb 27, 2021 at 11:14:27, Andrew Lamb <al...@influxdata.com> wrote:
> > >
> > >> Hi Dominik,
> > >>
> > >> That sounds really interesting -- thank you for the offer.
> > >>
> > >> I for one would enjoy seeing a demo and suggest that 10 minutes might be
> > >> a good length. The next call (details are also in the announcement [1])
> > >> is scheduled for Wednesday March 10, 2021 at 09:00 PST / 12:00 EST /
> > >> 17:00 UTC. The link is https://meet.google.com/ctp-yujs-aee
> > >>
> > >> I would personally be interested in hearing about your experience as a
> > >> user of the Rust library (what was good, what was challenging, how we
> > >> can improve).
> > >>
> > >> Thanks!
> > >> Andrew
> > >>
> > >> [1]
> > >> https://lists.apache.org/thread.html/raa72e1a8a3ad5dbb8366e9609a041eccca87f85545c3bc3d85170cfc%40%3Cdev.arrow.apache.org%3E
> > >>
> > >> On Fri, Feb 26, 2021 at 4:17 AM Fernando Herrera <
> > >> fernando.j.herr...@gmail.com> wrote:
> > >>
> > >> Hi Dominik,
> > >>
> > >> I would be interested in a demo. I’m curious to see your implementation
> > >> and what advantages you have seen over JavaScript.
> > >>
> > >> Thanks,
> > >> Fernando
> > >>
> > >>
> > >> On Thu, Feb 25, 2021 at 10:39 PM Dominik Moritz <domor...@cmu.edu> wrote:
> > >>
> > >> > Hello Rust Arrow Devs,
> > >> >
> > >> > I have been working on a wasm version of Arrow using the Rust library
> > >> > (https://github.com/domoritz/arrow-wasm). I was wondering whether you
> > >> > would be interested in having me demo it in the Arrow Rust sync call.
> > >> > If so, when would the next one be, and how much time would you want
> > >> > to allocate for it? Also, would you like me to dive into anything in
> > >> > particular?
> > >> >
> > >> > Cheers,
> > >> > Dominik
> > >>
> > >>
> >
>
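
To illustrate Andrew's point in the quoted thread that `&[u8]` implements `std::io::Read`: a minimal std-only sketch in which a byte slice is handed to a `Read`-only API directly, and wrapped in `std::io::Cursor` (which borrows rather than copies) when an API also needs `Seek`. `first_u32_le`, `u32_le_at`, and `u32_le_at_in_slice` are hypothetical stand-ins for such reader APIs, not arrow functions.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical stand-in for an API that only needs `Read`:
/// decodes the first 4 bytes as a little-endian u32.
fn first_u32_le<R: Read>(mut reader: R) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Hypothetical stand-in for an API that needs `Read + Seek`:
/// jumps to `offset` before decoding.
fn u32_le_at<R: Read + Seek>(mut reader: R, offset: u64) -> std::io::Result<u32> {
    reader.seek(SeekFrom::Start(offset))?;
    first_u32_le(reader)
}

/// Adapts a borrowed slice to the `Read + Seek` API; the `Cursor`
/// borrows `bytes`, so the data is not copied.
fn u32_le_at_in_slice(bytes: &[u8], offset: u64) -> std::io::Result<u32> {
    u32_le_at(Cursor::new(bytes), offset)
}
```

Calling `first_u32_le(&bytes[..])` passes the slice itself as the reader, with no intermediate buffer.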
