Hi Radu,

It might be easier to get feedback on some concrete code. Perhaps make a PR with a proof of concept and we can discuss there?

Neal

On Fri, Sep 4, 2020 at 4:27 AM Radu Teodorescu <radukay...@yahoo.com.invalid> wrote:

> Micah and all,
> Thanks for that pointer, I certainly didn’t follow it in detail at the
> time.
>
> My question/thoughts are actually more limited in scope: I am
> specifically targeting features that are supported by the standard AND
> by other major parquet implementations.
>
> Specifically, I would like to enable support for having RowGroups in
> separate files and (as a side effect) be able to keep metadata in a
> separate file.
> This seems to be supported by the spec and by most readers, including
> arrow (at least from scanning the code).
>
> If the above are true (or at least not known to be false), it seems like
> the writer can be modified fairly easily to support that, and I am happy
> to look into making that change.
>
> Thoughts?
> Radu
>
> PS: I don’t mean to be stubborn by keeping it on the arrow group, but it
> seems like it is an arrow implementation-specific goal.
>
> > On Sep 3, 2020, at 6:42 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > Hi Radu,
> > This is a conversation best had on dev@parquet. It came up recently [1]
> > and I cross-posted there as well.
> >
> > [1]
> > https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
> >
> > On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu
> > <radukay...@yahoo.com.invalid> wrote:
> >
> >> Hello,
> >> What is the current thinking around allowing the logical content of a
> >> parquet file to be split across multiple files?
> >> I see that in theory there is support for reading files where different
> >> row groups are in separate files, but I cannot see any features that
> >> allow that for writing.
> >>
> >> On a somewhat related note, what are the thoughts on supporting a
> >> parquet file append mode?
> >> Specifically, if the metadata is stored in a standalone file, one can
> >> easily add new row groups to an existing dataset and create a new
> >> version of the metadata file without affecting potential consumers of
> >> the existing data.
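
To make the layout discussed in this thread concrete, here is a toy sketch in plain Python. It is not Parquet encoding and does not use any Arrow or parquet-cpp API. Every name in it (`write_row_group`, `write_metadata`, `append_row_group`, `read_all`, the `part-N.json` and `_metadata.vN.json` file names, and the JSON layout) is hypothetical and chosen only for illustration. It shows the two ideas raised above: each row group lives in its own data file referenced by a path in a standalone metadata file, and an append writes one new data file plus a new metadata version while leaving the old files untouched.

```python
import json
import tempfile
from pathlib import Path


def write_row_group(root: Path, index: int, rows: list) -> str:
    """Write one toy "row group" to its own data file; return its relative path."""
    name = f"part-{index}.json"  # hypothetical naming scheme
    (root / name).write_text(json.dumps(rows))
    return name


def write_metadata(root: Path, version: int, row_groups: list) -> Path:
    """Write a standalone metadata file listing every row group and its file."""
    path = root / f"_metadata.v{version}.json"  # hypothetical naming scheme
    path.write_text(json.dumps({"version": version, "row_groups": row_groups}))
    return path


def append_row_group(root: Path, old_meta: Path, rows: list) -> Path:
    """Add a row group without modifying existing files; emit a new metadata version."""
    meta = json.loads(old_meta.read_text())
    groups = list(meta["row_groups"])
    file_path = write_row_group(root, len(groups), rows)
    groups.append({"file_path": file_path, "num_rows": len(rows)})
    return write_metadata(root, meta["version"] + 1, groups)


def read_all(meta_path: Path) -> list:
    """Read the logical table by following each row group's file_path."""
    meta = json.loads(meta_path.read_text())
    rows = []
    for group in meta["row_groups"]:
        data_file = meta_path.parent / group["file_path"]
        rows.extend(json.loads(data_file.read_text()))
    return rows


def demo() -> tuple:
    """Build a one-row-group dataset, append to it, and read both versions."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first = write_row_group(root, 0, [{"x": 1}, {"x": 2}])
        v1 = write_metadata(root, 1, [{"file_path": first, "num_rows": 2}])
        v2 = append_row_group(root, v1, [{"x": 3}])
        # A consumer still holding the v1 metadata sees the original rows only.
        return read_all(v1), read_all(v2)
```

Calling `demo()` returns the rows visible through the old metadata file and through the new one. In this toy model the old version keeps working after the append because no existing file is rewritten, which is the property the append-mode question depends on.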