Thanks Julien for organizing the meeting and taking notes. I wrote up some initial thoughts on shared memory IPC on https://issues.apache.org/jira/browse/ARROW-263
I'll try to flesh out a more concrete spec today/tomorrow.

-Micah

On Thu, Aug 18, 2016 at 10:25 AM, Julien Le Dem <jul...@dremio.com> wrote:

> My notes: (I'll schedule another one in 2 weeks, but people should feel free
> to have ad-hoc discussions in the meantime)
>
> Attendees and their topics of interest for today:
> - Micah Kornfield: dictionary encoding, reusing dictionaries across record
>   batches, shared memory, memory management, releasing memory shared across
>   processes
> - Wes McKinney: finalize types (Category, ...), file format, RPC format, IPC
> - Julien Le Dem: finalize metadata (RPC, IPC, File), file format
>   implementation, UDF use case
> - Erol: shared memory across Java and C++ to share large amounts of data
>
> Arrow IPC:
> - Shared memory:
>   - the current version doesn't do schema negotiation yet.
>   - all unit tests read and write memory with a predefined schema and a
>     known base address.
>   - no dictionary encoding yet.
>   - issues to discuss:
>     - communicating the base memory address:
>       - possibly use RPC for coordination.
>     - options for shared memory:
>       - forking a process: anonymous shared memory is shared implicitly
>       - starting a new process: need to spawn alternate shared memory that
>         needs to be cleaned up
>       - direct memory-mapped system call (communicate the file name to the
>         subprocess).
>   - Action (Micah): create a JIRA to sum this up
>
> - Memory management:
>   - the process producing the data will allocate the memory and pass it
>     read-only. It needs to wait for the consumer to be done before
>     releasing it.
>   - one option is a memory-mapped file (persistent, independent of the
>     process)
>   - each process is responsible for its memory. The reader needs to release
>     memory.
>   - mechanism for handling too much memory allocation.
>   - In the case of record batches over RPC this is not an issue (memory is
>     copied over).
>
> - RPC transport:
>   definition of the protocol and how we send messages.
> - File transport
>
> - Dictionary encoding:
>   - start simple: simple buffer<int> layout
>   - enable extension in the future (v2: bit packing?)
>
> - Category type:
>   - Semantic difference from dictionary encoding.
>   - TODO(Julien): Add Category type in Parquet?
>
>
> On Thu, Aug 18, 2016 at 9:39 AM, Julien Le Dem <jul...@dremio.com> wrote:
>
> > Hi Nicole.
> > Can you try again?
> > I was accepting you but it did not seem to work.
> > Julien
> >
> > On Thu, Aug 18, 2016 at 9:26 AM, Nicole Nemer <nicole.ne...@rms.com> wrote:
> >
> >> I am trying to join and it is not letting me in...
> >> nn
> >> --
> >> Nicole Nemer, PhD
> >> Technical Architect/Dev Manager
> >>
> >> 303-641-3340
> >>
> >> On 8/18/16, 10:00 AM, "Julien Le Dem" <jul...@dremio.com> wrote:
> >>
> >> >And this is starting now.
> >> >https://plus.google.com/hangouts/_/dremio.com/arrow
> >> >
> >> >On Wed, Aug 17, 2016 at 7:07 PM, Julien Le Dem <jul...@dremio.com> wrote:
> >> >
> >> >> Here is the hangout link for tomorrow:
> >> >> https://plus.google.com/hangouts/_/dremio.com/arrow
> >> >>
> >> >> I have also added everyone who replied to that thread to a Google calendar event.
> >> >>
> >> >> On Wed, Aug 17, 2016 at 6:12 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >>
> >> >>> hi folks,
> >> >>>
> >> >>> Reminder that the Arrow sync is tomorrow morning at 09:00 Pacific
> >> >>> (http://timesched.pocoo.org/?date=2016-08-18&tz=pacific-standard-time!&range=540,600).
> >> >>> I believe Julien will send a public Google hangout link to the mailing list for you all to join.
> >> >>>
> >> >>> Thanks
> >> >>> Wes
> >> >>>
> >> >>> On Tue, Aug 16, 2016 at 11:07 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >>> > +1. If there is demand for an Asia-friendly time we can change things up from week to week.
> >> >>> >
> >> >>> >> On Aug 16, 2016, at 10:52 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >> >>> >>
> >> >>> >> sounds good
> >> >>> >>
> >> >>> >>> On Tue, Aug 16, 2016 at 10:39 AM, Julien Le Dem <jul...@dremio.com> wrote:
> >> >>> >>>
> >> >>> >>> Based on the feedback I'm proposing Thursday Aug 18 at 4PM UTC as the first Arrow sync.
> >> >>> >>> That's:
> >> >>> >>> - 9AM PDT (San Francisco)
> >> >>> >>> - 12PM EDT (New York)
> >> >>> >>> - 5PM BST (London)
> >> >>> >>> - 6PM CEST (Paris, Berlin)
> >> >>> >>>
> >> >>> >>>> On Tue, Aug 9, 2016 at 6:45 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> >> >>> >>>>
> >> >>> >>>> +1 for bi-weekly and European-friendly times: CET (GMT+1)
> >> >>> >>>>
> >> >>> >>>>> On 09.08.2016 at 00:39, Julien Le Dem <jul...@dremio.com> wrote:
> >> >>> >>>>>
> >> >>> >>>>> Also, to all who are responding, let me know your timezone as well.
> >> >>> >>>>>
> >> >>> >>>>> On Mon, Aug 8, 2016 at 3:30 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >> >>> >>>>>
> >> >>> >>>>>> Sounds good to me as well. Biweekly would be preferred.
> >> >>> >>>>>>
> >> >>> >>>>>>> On Monday, August 8, 2016, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >>> >>>>>>>
> >> >>> >>>>>>> hi Julien -- this sounds like a good idea, also +1 for bi-weekly. I
> >> >>> >>>>>>> will do my best to join when possible. So far we've mostly been
> >> >>> >>>>>>> communicating via pull request, so I think periodic syncs will be
> >> >>> >>>>>>> helpful.
> >> >>> >>>>>>>
> >> >>> >>>>>>> - Wes
> >> >>> >>>>>>>
> >> >>> >>>>>>> On Mon, Aug 8, 2016 at 2:45 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
> >> >>> >>>>>>>> +1
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> My preference would be for bi-weekly.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> -Taylor
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>> On Aug 8, 2016, at 5:25 PM, Julien Le Dem <jul...@dremio.com> wrote:
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> Hi all,
> >> >>> >>>>>>>>> My experience with Parquet is that a regular sync-up over hangout helps
> >> >>> >>>>>>>>> with keeping in touch and staying updated about what everyone is doing.
> >> >>> >>>>>>>>> I was thinking of scheduling it weekly or bi-weekly.
> >> >>> >>>>>>>>> Who would join?
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> The way it goes is: first we do a round table where people introduce
> >> >>> >>>>>>>>> themselves and list the topics they'd like to talk or hear about.
> >> >>> >>>>>>>>> That makes the agenda and we go through it.
> >> >>> >>>>>>>>> At the end we send notes to the mailing list with discussions and action
> >> >>> >>>>>>>>> items (for example: open JIRA, comment on JIRA, review PR, etc).
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> --
> >> >>> >>>>>>>>> Julien
> >> >>> >>>>>
> >> >>> >>>>> --
> >> >>> >>>>> Julien
> >> >>> >>>
> >> >>> >>> --
> >> >>> >>> Julien
> >> >>> >>>
> >> >>>
> >> >>
> >> >> --
> >> >> Julien
> >> >>
> >> >
> >> >--
> >> >Julien
> >>
> >
> > --
> > Julien
> >
>
> --
> Julien
>