My notes: (I'll schedule another one in 2 weeks but people should feel free to do ad-hoc discussion in the meantime)
Attendees and their topics of interest for today:
- Micah Kornfield: Dictionary encoding, reusing dictionaries across record batches, shared memory, memory management, releasing memory shared across processes
- Wes McKinney: Finalize types (Category, ...), file format, RPC format, IPC
- Julien Le Dem: Finalize metadata (RPC, IPC, File), file format implementation, UDF use case
- Erol: Shared memory across Java and C++ to share large amounts of data

Arrow IPC:
- Shared memory:
  - current version doesn’t do Schema negotiation yet.
  - all unit tests read and write memory with a predefined schema and a known base address.
  - no dictionary encoding yet.
  - issues to discuss:
    - communicating the base memory address:
      - possibly use RPC for coordination.
    - options for shared memory:
      - forking a process: anonymous shared memory is inherited implicitly.
      - starting a new process: need to spawn alternate shared memory that needs to be cleaned up.
      - direct memory mapped system call (communicate file name to subprocess).
  - Action (Micah): create a JIRA to sum this up.
- Memory management:
  - the process producing the data will allocate the memory and pass it read only. It needs to wait for the consumer to be done to release it.
  - one option is a memory mapped file (persistent, independent of the process).
  - each process is responsible for its memory. Reader needs to release memory.
  - mechanism for handling too much memory allocation.
  - In the case of record batches over RPC this is not an issue (memory is copied over).
- RPC transport: definition of the protocol and how we send messages.
- File transport
- Dictionary encoding:
  - start simple: simple buffer<int> layout
  - enable extension in the future (v2: bit packing?)
- Category type:
  - Semantic difference with Dictionary encoded.
  - TODO(Julien): Add Category type in Parquet?

On Thu, Aug 18, 2016 at 9:39 AM, Julien Le Dem <jul...@dremio.com> wrote:

> Hi Nicole.
> Can you try again?
> I was accepting you but it did not seem to work.
> Julien
>
> On Thu, Aug 18, 2016 at 9:26 AM, Nicole Nemer <nicole.ne...@rms.com> wrote:
>
>> I am trying to join and it is not letting me in...
>> nn
>> --
>> Nicole Nemer, PhD
>> Technical Architect/Dev Manager
>>
>> 303-641-3340
>>
>> On 8/18/16, 10:00 AM, "Julien Le Dem" <jul...@dremio.com> wrote:
>>
>> >And this is starting now.
>> >https://plus.google.com/hangouts/_/dremio.com/arrow
>> >
>> >On Wed, Aug 17, 2016 at 7:07 PM, Julien Le Dem <jul...@dremio.com> wrote:
>> >
>> >> Here is the hangout link for tomorrow:
>> >> https://plus.google.com/hangouts/_/dremio.com/arrow
>> >>
>> >> I have also added to a google calendar event everyone who replied to
>> >> that thread.
>> >>
>> >> On Wed, Aug 17, 2016 at 6:12 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>
>> >>> hi folks,
>> >>>
>> >>> Reminder that the Arrow sync is tomorrow morning at 09:00 Pacific
>> >>> (http://timesched.pocoo.org/?date=2016-08-18&tz=pacific-standard-time!&range=540,600).
>> >>> I believe Julien will send a public Google hangout link to the mailing
>> >>> list for you all to join.
>> >>>
>> >>> Thanks
>> >>> Wes
>> >>>
>> >>> On Tue, Aug 16, 2016 at 11:07 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>> > +1. If there is demand for an Asia-friendly time we can change things
>> >>> > up from week to week.
>> >>> >
>> >>> >> On Aug 16, 2016, at 10:52 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>> >>> >>
>> >>> >> sounds good
>> >>> >>
>> >>> >>> On Tue, Aug 16, 2016 at 10:39 AM, Julien Le Dem <jul...@dremio.com> wrote:
>> >>> >>>
>> >>> >>> Based on the feedback I'm proposing Thursday Aug 18 at 4PM UTC as the
>> >>> >>> first Arrow sync.
>> >>> >>> That's:
>> >>> >>> - 9AM PDT (San Francisco)
>> >>> >>> - 12PM EDT (New York)
>> >>> >>> - 5PM BST (London)
>> >>> >>> - 6PM CEST (Paris, Berlin)
>> >>> >>>
>> >>> >>>> On Tue, Aug 9, 2016 at 6:45 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
>> >>> >>>>
>> >>> >>>> +1 for bi-weekly and European-friendly times: CET (GMT+1)
>> >>> >>>>
>> >>> >>>>> On 09.08.2016 at 00:39, Julien Le Dem <jul...@dremio.com> wrote:
>> >>> >>>>>
>> >>> >>>>> Also to all who are responding let me know your timezone as well.
>> >>> >>>>>
>> >>> >>>>> On Mon, Aug 8, 2016 at 3:30 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>> >>> >>>>>
>> >>> >>>>>> Sounds good to me as well. Biweekly would be preferred.
>> >>> >>>>>>
>> >>> >>>>>>> On Monday, August 8, 2016, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>> >>>>>>>
>> >>> >>>>>>> hi Julien -- this sounds like a good idea, also +1 for bi-weekly. I
>> >>> >>>>>>> will do my best to join when possible. So far we've mostly been
>> >>> >>>>>>> communicating via pull request, so I think periodic syncs will be
>> >>> >>>>>>> helpful.
>> >>> >>>>>>>
>> >>> >>>>>>> - Wes
>> >>> >>>>>>>
>> >>> >>>>>>> On Mon, Aug 8, 2016 at 2:45 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>> >>> >>>>>>>> +1
>> >>> >>>>>>>>
>> >>> >>>>>>>> My preference would be for bi-weekly.
>> >>> >>>>>>>>
>> >>> >>>>>>>> -Taylor
>> >>> >>>>>>>>
>> >>> >>>>>>>>> On Aug 8, 2016, at 5:25 PM, Julien Le Dem <jul...@dremio.com> wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Hi all,
>> >>> >>>>>>>>> My experience with Parquet is that a regular sync up over hangout helps
>> >>> >>>>>>>>> keeping in touch and staying updated about what everyone is doing.
>> >>> >>>>>>>>> I was thinking of scheduling it weekly or bi-weekly.
>> >>> >>>>>>>>> Who would join?
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> The way it goes is first we do a round table where people introduce
>> >>> >>>>>>>>> themselves and list the topics they'd like to talk or hear about.
>> >>> >>>>>>>>> That makes the agenda and we go through it.
>> >>> >>>>>>>>> At the end we send notes to the mailing list with discussions and action
>> >>> >>>>>>>>> items (for example: open JIRA, comment on JIRA, review PR, etc).
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> --
>> >>> >>>>>>>>> Julien
>> >>> >>>>>
>> >>> >>>>> --
>> >>> >>>>> Julien
>> >>> >>>
>> >>> >>> --
>> >>> >>> Julien
>> >>> >>>
>> >>>
>> >>
>> >> --
>> >> Julien
>> >>
>> >
>> >--
>> >Julien
>>
>
> --
> Julien
>

--
Julien