hi Eric -- there have not been any patches related to it yet. I'm currently in the midst of some internal restructuring of the Parquet C++ library to address long-standing efficiency and memory use issues. I intend to make the data frame project one of my next focus areas, likely after Labor Day.
- Wes

On Mon, Aug 12, 2019 at 10:28 AM Eric Erhardt <eric.erha...@microsoft.com.invalid> wrote:
>
> Hey Wes,
>
> I just wanted to check in on this work. Have there been any updates to the
> Arrow "data frame" project worth sharing?
>
> Thanks,
> Eric
>
> -----Original Message-----
> From: Wes McKinney <wesmck...@gmail.com>
> Sent: Tuesday, May 21, 2019 8:17 AM
> To: dev@arrow.apache.org
> Subject: Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++
> libraries
>
> On Tue, May 21, 2019, 8:43 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > On 21/05/2019 at 13:42, Wes McKinney wrote:
> > > hi Antoine,
> > >
> > > On Tue, May 21, 2019 at 5:48 AM Antoine Pitrou <anto...@python.org> wrote:
> > >>
> > >>
> > >> Hi Wes,
> > >>
> > >> How does copy-on-write play together with memory-mapped data? It
> > >> seems that, depending on whether the memory map has several
> > >> concurrent users (a condition which may be timing-dependent), we
> > >> will either persist changes on disk or make them ephemeral in
> > >> memory. That doesn't sound very user-friendly, IMHO.
> > >
> > > With memory-mapping, any Buffer is sliced from the parent MemoryMap
> > > [1] so mutating the data on disk using this interface wouldn't be
> > > possible with the way that I've framed it.
> >
> > Hmm... I always forget that SliceBuffer returns a read-only view.
> >
>
> The more important issue is that parent_ is non-null. The idea is that no
> mutation is allowed if we reason that another Buffer object has access to the
> address space of interest. I think this style of copy-on-write is a
> reasonable compromise that prevents most kinds of defensive copying.
>
> >
> > Regards
> >
> > Antoine.
> >