Hi Radu,
It might be easier to get feedback on some concrete code. Perhaps make a PR
with a proof of concept and we can discuss there?

Neal

On Fri, Sep 4, 2020 at 4:27 AM Radu Teodorescu <radukay...@yahoo.com.invalid>
wrote:

> Micah and all,
> Thanks for that pointer, I certainly didn’t follow it in detail at the
> time.
>
> My question/thoughts are actually more limited in scope: I am
> specifically targeting features that are supported by the standard AND by
> other major Parquet implementations.
>
> Specifically, I would like to enable support for having RowGroups in
> separate files and (as a side effect) be able to keep metadata in a
> separate file.
> This seems to be supported by the spec and by most readers including arrow
> (at least from scanning the code).
>
> If the above are true (or at least not known to be false), it seems like
> the writer can be modified fairly easily to support that and I am happy to
> look into making that change.
>
> Thoughts?
> Radu
>
> PS: I don’t mean to be stubborn by keeping it on the arrow group, but it
> seems like an Arrow-implementation-specific goal.
>
> > On Sep 3, 2020, at 6:42 PM, Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > Hi Radu,
> > This is a conversation best had on dev@parquet.  It came up recently [1]
> > and I cross-posted there as well.
> >
> > [1] https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
> >
> > On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu
> > <radukay...@yahoo.com.invalid> wrote:
> >
> >> Hello,
> >> What is the current thinking around allowing the logical content of a
> >> parquet file to be split across multiple files?
> >> I see that in theory there is support for reading files where different
> >> row groups are in separate files but I cannot see any features that
> >> allow that for writing.
> >>
> >> On a somewhat related note, what are the thoughts on supporting parquet
> >> file append mode?
> >> Specifically, if the metadata is stored in a standalone file, one can
> >> easily add new row groups to an existing file and create a new version
> >> of the metadata file without affecting potential consumers of the
> >> existing data.
> >>
> >>
> >>
>
>
