Do we want to create data file format specific maps for this metadata to
better separate the different formats? I don't think the Parquet
footer size is relevant for Avro or ORC.
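
For illustration, here is a minimal sketch of what per-format maps could look like. All names in it (DataFileEntry, format_metadata, ALLOWED_KEYS, and the key strings) are hypothetical and are not part of the Iceberg spec or API.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical per-format key sets; none of these names come from the spec.
ALLOWED_KEYS = {
    "parquet": {"footer-size-in-bytes"},
    "orc": {"footer-size-in-bytes", "postscript-length"},
    "avro": set(),
}


@dataclass
class DataFileEntry:
    file_path: str
    file_format: str
    file_size_in_bytes: int
    # Metadata that only makes sense for this entry's file format.
    format_metadata: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self):
        # Reject keys that are not defined for this file format.
        allowed = ALLOWED_KEYS.get(self.file_format, set())
        unknown = set(self.format_metadata) - allowed
        if unknown:
            raise ValueError(
                f"keys {sorted(unknown)} are not defined for {self.file_format}"
            )
```

With this shape an Avro entry simply carries an empty map, and a Parquet-only key cannot leak into entries of another format.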

On Mon, Feb 10, 2025, 06:52 Xuanwo <xua...@apache.org> wrote:

> +1 non-binding from me.
>
> I love this idea. Even though S3 supports reading from the tail, this
> value can still be useful in cases where the size is incorrectly hinted,
> requiring an additional read for the Parquet footer size.
>
> On Mon, Feb 10, 2025, at 09:21, Anton Okolnychyi wrote:
>
> +1 from me, I'd love to see this implemented (maybe even in V3 if anyone
> is willing to pick it up?).
>
> Eduard and I were discussing DV file compaction where we need to know the
> ratio of live vs orphan DVs in a particular DV file. Manifests contain
> sizes of individual DV blobs as well as the total DV file size. In order to
> compute the ratio of live DVs accurately, we have to subtract the footer
> size from the total file size. Doing this without an extra read would be
> great.
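
The arithmetic described above can be sketched roughly as follows. This assumes the DV file is a Puffin file whose blobs sit back to back between a 4-byte leading magic and the footer, and that the footer size is already known from metadata. The function name and parameters are hypothetical.

```python
from typing import Iterable

PUFFIN_MAGIC_LEN = 4  # leading magic bytes that precede the first blob


def live_dv_ratio(
    live_blob_sizes: Iterable[int],
    file_size_in_bytes: int,
    footer_size_in_bytes: int,
) -> float:
    """Fraction of the blob region that is still referenced by live DVs."""
    # Bytes that hold blobs: everything except the leading magic and the footer.
    blob_region = file_size_in_bytes - footer_size_in_bytes - PUFFIN_MAGIC_LEN
    if blob_region <= 0:
        return 0.0
    return sum(live_blob_sizes) / blob_region
```

For example, two live blobs of 100 and 200 bytes in a 1100-byte file with a 96-byte footer leave a 1000-byte blob region, so 30% of it is live. Without the footer size, the denominator would include the footer and the ratio would be underestimated.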
>
> - Anton
>
> On Sun, Feb 9, 2025 at 13:51, Daniel Weeks <dwe...@apache.org> wrote:
>
> Hey Sreeram,
>
> Sounds like there's a fair amount of interest/support for this.  Anton
> also mentioned that having this information would help estimate orphaned
> DVs, so there are multiple cases where this would be beneficial.
>
> We might want to tie this change to a format version release (even if just
> an optional field) because any metadata rewrites may result in dropping the
> value.
>
> Did you want to put together a proposal for the changes?
>
> Best,
> -Dan
>
> On Sat, Feb 8, 2025 at 11:31 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> +1  I think this is probably useful for wider schemas that typically have
> larger footers that go past the heuristic.
>
> It would be good to have some concrete numbers on how much this impacts
> workloads before committing to it.
>
> Cheers,
> Micah
>
> On Thu, Jan 30, 2025 at 7:10 AM Steve Loughran <ste...@cloudera.com.invalid>
> wrote:
>
>
> Knowing the footer offset would be really useful if passed down to
> whatever is implementing the input stream, along with the actual file size.
>
> This can be used for prefetching the footer, as well as caching it (Azure
> ABFS, google GCS connectors): right now they guess that about 1MB is all
> they need.
>
> While readTail() can get bytes off the end, it doesn't pass that
> information down to the stream so that the stream can do its own thing.
>
> The Analytics stream which the AWS S3 team are getting into the s3a code (
> https://issues.apache.org/jira/browse/HADOOP-19363) goes one step further
> than the others: it parses that footer itself and tries to predict where
> application code is going to read next: as you read one row group it
> speculatively fetches the next one, even as the first one is downloaded.
>
> Again, it guesses on footer size: pass that in and they will know what to
> fetch and store. Ideally this should be accompanied by file type (parquet,
> avro) and your actual read plans (vectored, random, sequential,
> whole-file). With this information you can cut out a number of
> wasted/inefficient S3 calls, and tune fetching/caching policy appropriately.
>
> Anyway:
> +1 to footer length, and if already known, file length should come down
> too, along with that read plan. Saying "parquet, vectored, random" will be
> enough, which is what a draft PR I have for hadoop fileIO does.
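
A rough sketch of how such hints might drive footer prefetching in a stream implementation. OpenHints and footer_prefetch_range are hypothetical names, not an existing Hadoop or Iceberg API. The 1 MB fallback mirrors the guess mentioned above, and footer_length is assumed to cover everything from the start of the footer to the end of the file.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_FOOTER_GUESS = 1024 * 1024  # fallback when no footer length is passed


@dataclass(frozen=True)
class OpenHints:
    file_type: str                       # e.g. "parquet"
    read_policy: str                     # e.g. "vectored, random"
    file_length: Optional[int] = None
    footer_length: Optional[int] = None  # assumed to run to end of file


def footer_prefetch_range(hints: OpenHints) -> Optional[Tuple[int, int]]:
    """Return (offset, length) to prefetch, or None if file length is unknown."""
    if hints.file_length is None:
        return None
    if hints.footer_length is not None:
        length = hints.footer_length   # exact: fetch only what is needed
    else:
        length = DEFAULT_FOOTER_GUESS  # guess: may over-fetch or under-fetch
    length = min(length, hints.file_length)
    return (hints.file_length - length, length)
```

With a known 300-byte footer in a 10,000,000-byte file, the range is the last 300 bytes. Without the hint, the same call falls back to the last 1 MB.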
>
> On Wed, 22 Jan 2025 at 03:39, Sreeram Garlapati <gsreeramku...@gmail.com>
> wrote:
>
> Thanks for the nice idea/suggestion, Dan.
> Yes, we have been employing a technique similar to the one you noted below
> and arrived at the conclusion that there is no deterministic way to achieve
> the optimal situation, i.e., a single I/O call to S3 to read the Parquet
> footer.
>
> Best,
> Sreeram
>
> On Tue, Jan 21, 2025 at 4:20 PM Daniel Weeks <dwe...@apache.org> wrote:
>
> Hey Sreeram,
>
> I think it's worthwhile to consider what value would be added by tracking
> the footer size in metadata, but there are other options to address these
> optimization use cases.
>
> For example, if you take a look at the RangeReadable
> <https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L68>
>  interface
> for FileIO implementations, there's a readTail method so that you can
> optimistically read from the tail end of the file to try to fetch the full
> footer in a single read.  This is even optimized in some of the
> implementations (like S3InputStream) to leverage backward reads as opposed
> to seek operations which might have overhead.
>
> Depending on the size of the file, you may want to load just the tail or
> the whole file to avoid further reads.  Having the exact value will
> definitely make this more precise, but I feel like using the above approach
> can approximate the same performance benefits.
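
The optimistic tail read described above could look roughly like this. TailReader is an in-memory stand-in whose read_tail method is only loosely modeled on readTail. It is not the Iceberg API, and the guess size is an arbitrary parameter. The sketch relies on the Parquet layout, where the file ends with the footer, a 4-byte little-endian footer length, and the PAR1 magic.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


class TailReader:
    """In-memory stand-in for a store; counts how many reads are issued."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read_tail(self, length: int) -> bytes:
        self.reads += 1
        length = min(length, len(self.data))
        return self.data[len(self.data) - length:]


def read_footer_optimistic(reader: TailReader, guess: int) -> bytes:
    """Read `guess` bytes from the tail; re-read only if the footer is larger."""
    tail = reader.read_tail(max(guess, TRAILER_LEN))
    if tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    needed = footer_len + TRAILER_LEN
    if needed > len(tail):
        # The guess was too small, so a second read is unavoidable.
        tail = reader.read_tail(needed)
    return tail[-needed:-TRAILER_LEN]
```

A guess that covers the footer costs one read and a guess that falls short costs two, which is the non-determinism the exact footer size would remove.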
>
> Just a thought,
> -Dan
>
> On Tue, Jan 21, 2025 at 12:17 PM Sreeram Garlapati <
> gsreeramku...@gmail.com> wrote:
>
> Hello Team!
>
> This is a small improvement proposal to store the *parquet footer size*
> as part of the *data_file* metadata in the iceberg manifest
> <https://iceberg.apache.org/spec/#manifests>.
> *manifest_entry   >   (2) data_file  >  (146 Optional)
> footer_size_in_bytes*
>
> *Motivation*:
>
>    - We have several sub-second read use cases on Iceberg tables. We
>    store the Iceberg tables and their Parquet files on S3. Every hop to S3
>    is very expensive (P99 of >200 milliseconds). Hence we are trying to see
>    if we can optimize by cutting down any of these hops. One such hop occurs
>    during the Parquet file read: the first read, which fetches the last 8
>    bytes to get the footer size and the PAR1 magic sequence.
>    - Iceberg metadata already includes the file_size_in_bytes. Including
>    the footer size benefits all readers, i.e., readers can directly issue 1
>    I/O call to read the footer - *read_parquet_footer(filehandle,
>    offset=file_size_in_bytes-footer_size_in_bytes-1)*
>    - This is similar to what we have in the iceberg specification in the
>    case of storing Table statistics
>    <https://iceberg.apache.org/spec/#table-statistics>, puffins >
>    *file-footer-size-in-bytes*.
>    - This can be easily extended to ORC as needed too. Perhaps, in the
>    ORC case, an additional property to store the postscript length is also
>    needed.
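
A minimal sketch of the difference between the two read paths, using an in-memory stand-in for S3 that counts ranged reads. All function and class names are hypothetical. The sketch assumes footer_size_in_bytes counts only the footer metadata and not the trailing 8 bytes (length plus PAR1 magic), so the exact offset would shift if the field were defined to include them.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


class CountingReader:
    """In-memory stand-in for an object store; counts ranged reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read_range(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + length]


def read_footer_two_hops(reader: CountingReader, file_size: int) -> bytes:
    # Hop 1: the last 8 bytes give the footer length and the magic.
    trailer = reader.read_range(file_size - TRAILER_LEN, TRAILER_LEN)
    (footer_len,) = struct.unpack("<I", trailer[:4])
    # Hop 2: the footer itself.
    return reader.read_range(file_size - TRAILER_LEN - footer_len, footer_len)


def read_footer_one_hop(
    reader: CountingReader, file_size: int, footer_size_in_bytes: int
) -> bytes:
    # Single hop: footer plus trailer, located purely from manifest metadata.
    start = file_size - TRAILER_LEN - footer_size_in_bytes
    tail = reader.read_range(start, footer_size_in_bytes + TRAILER_LEN)
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if tail[-4:] != MAGIC or footer_len != footer_size_in_bytes:
        raise ValueError("stale or incorrect footer size in metadata")
    return tail[:footer_len]
```

Reading the trailer along with the footer keeps the single call self-checking: if the stored size is wrong, the reader can detect it and fall back to the two-hop path.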
>
> Truly appreciate your thoughts,
> Sreeram <https://www.linkedin.com/in/sreeramgarlapati>
>
> Xuanwo
>
> https://xuanwo.io/
>
>
