Do we want to create data-file-format-specific maps for this metadata, to better separate it across the different formats? I don't think having Parquet footer size is relevant for Avro or ORC.
On Mon, Feb 10, 2025, 06:52 Xuanwo <xua...@apache.org> wrote:

> +1 non-binding from me.
>
> I love this idea. Even though S3 supports reading from the tail, this value can still be useful in cases where the size is incorrectly hinted, requiring an additional read for the Parquet footer size.
>
> On Mon, Feb 10, 2025, at 09:21, Anton Okolnychyi wrote:
>
> +1 from me, I'd love to see this implemented (maybe even in V3 if anyone is willing to pick it up?).
>
> Eduard and I were discussing DV file compaction where we need to know the ratio of live vs orphan DVs in a particular DV file. Manifests contain sizes of individual DV blobs as well as the total DV file size. In order to compute the ratio of live DVs accurately, we have to subtract the footer size from the total file size. Doing this without an extra read would be great.
>
> - Anton
>
> On Sun, Feb 9, 2025 at 13:51 Daniel Weeks <dwe...@apache.org> wrote:
>
> Hey Sreeram,
>
> Sounds like there's a fair amount of interest/support for this. Anton also mentioned that having this information would help estimate orphaned DVs, so there are multiple cases where this would be beneficial.
>
> We might want to tie this change to a format version release (even if just an optional field) because any metadata rewrites may result in dropping the value.
>
> Did you want to put together a proposal for the changes?
>
> Best,
> -Dan
>
> On Sat, Feb 8, 2025 at 11:31 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> +1 I think this is probably useful for wider schemas that typically have larger footers that go past the heuristic.
>
> It would be good to have some concrete numbers on how much this impacts workloads before committing to it.
>
> Cheers,
> Micah
>
> On Thu, Jan 30, 2025 at 7:10 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>
> Knowing the footer offset would be really useful if passed down to whatever is implementing the input stream, along with the actual file size.
>
> This can be used for prefetching the footer, as well as caching it (Azure ABFS, Google GCS connectors): right now they guess that about 1 MB is all they need.
>
> While readTail() can get bytes off the end, it doesn't pass that information down to the stream, to do its own thing.
>
> The Analytics stream which the AWS S3 team are getting into the s3a code (https://issues.apache.org/jira/browse/HADOOP-19363) goes one step further than the others: it parses that footer itself and tries to predict where application code is going to read next: as you read one row group it speculatively fetches the next one, even as the first one is downloaded.
>
> Again, it guesses on footer size: pass that in and they will know what to fetch and store. Ideally this should be accompanied by file type (parquet, avro) and your actual read plans (vectored, random, sequential, whole-file). With this information you can cut out a number of wasted/inefficient S3 calls, and tune fetching/caching policy appropriately.
>
> Anyway:
> +1 to footer length, and if already known, file length should come down too, along with that read plan. Saying "parquet, vectored, random" will be enough, which is what a draft PR I have for Hadoop FileIO does.
>
> On Wed, 22 Jan 2025 at 03:39, Sreeram Garlapati <gsreeramku...@gmail.com> wrote:
>
> Thanks for the nice idea/suggestion, Dan.
> Yes, we have been employing a technique similar to the one you noted below and kind of arrived at the conclusion that there is no deterministic way to achieve the optimal situation, i.e., a single I/O call to S3 to read the Parquet footer.
>
> Best,
> Sreeram
>
> On Tue, Jan 21, 2025 at 4:20 PM Daniel Weeks <dwe...@apache.org> wrote:
>
> Hey Sreeram,
>
> I think it's worthwhile to consider what value would be added by tracking the footer size in metadata, but there are other options to address these optimization use cases.
>
> For example, if you take a look at the RangeReadable <https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L68> interface for FileIO implementations, there's a readTail method so that you can optimistically read from the tail end of the file to try to fetch the full footer in a single read. This is even optimized in some of the implementations (like S3InputStream) to leverage backward reads as opposed to seek operations which might have overhead.
>
> Depending on the size of the file, you may want to load just the tail or the whole file to avoid further reads. Having the exact value definitely will make this more exact, but I feel like using the above approach can approximate the same performance benefits.
>
> Just a thought,
> -Dan
>
> On Tue, Jan 21, 2025 at 12:17 PM Sreeram Garlapati <gsreeramku...@gmail.com> wrote:
>
> Hello Team!
>
> This is a small improvement proposal to store the *parquet footer size* as part of the *data_file* metadata in the iceberg manifest <https://iceberg.apache.org/spec/#manifests>.
> *manifest_entry > (2) data_file > (146 Optional) > footer_size_in_bytes*
>
> *Motivation*:
>
> - We have several sub-second read use cases on Iceberg tables. We store Iceberg metadata and Parquet files on S3. Every hop to S3 is very expensive (P99 of >200 milliseconds). Hence we are trying to see if we can optimize by cutting down any of these hops. One such hop occurs during the Parquet file read: the first read to the Parquet file fetches the last 8 bytes, which hold the footer size and the PAR1 sequence.
> - Iceberg metadata already includes the file_size_in_bytes. Including the footer size benefits all the readers, i.e., readers can directly issue 1 I/O call to read the footer: *read_parquet_footer(filehandle, offset=file_size_in_bytes-footer_size_in_bytes-1)*
> - This is similar to what we have in the Iceberg specification in the case of storing Table statistics <https://iceberg.apache.org/spec/#table-statistics>: Puffin's *file-footer-size-in-bytes*.
> - This can be easily extended to ORC as needed too. Perhaps, in the ORC case, an additional property to store the postscript length is also needed.
>
> Truly appreciate your thoughts,
> Sreeram <https://www.linkedin.com/in/sreeramgarlapati>
>
> Xuanwo
>
> https://xuanwo.io/
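
To make the one-read vs two-read difference discussed in this thread concrete, here is a minimal Python sketch. It is not Iceberg or Parquet library code: CountingReader, read_range, make_fake_file, footer_range and the two read_footer_* functions are hypothetical names made up for illustration. It assumes the hinted footer size counts only the serialized footer metadata, and that the file ends with the standard Parquet trailer (4-byte little-endian footer length followed by the 4-byte "PAR1" magic); the exact offset arithmetic in a real implementation would depend on how the proposed field ends up being defined.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


class CountingReader:
    """In-memory stand-in for an object store; counts range requests."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read_range(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + length]


def make_fake_file(body: bytes, footer: bytes) -> bytes:
    """Lay bytes out like a Parquet file: magic, body, footer, length, magic."""
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


def footer_range(file_size: int, footer_size: int) -> tuple[int, int]:
    """Return (offset, length) of one read covering footer plus trailer."""
    length = footer_size + TRAILER_LEN
    if footer_size < 0 or length + len(MAGIC) > file_size:
        raise ValueError("footer size does not fit in the file")
    return file_size - length, length


def read_footer_without_hint(reader: CountingReader, file_size: int) -> bytes:
    """Two reads: the 8-byte trailer first, then the footer it describes."""
    trailer = reader.read_range(file_size - TRAILER_LEN, TRAILER_LEN)
    if trailer[4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (footer_size,) = struct.unpack("<I", trailer[:4])
    offset, _ = footer_range(file_size, footer_size)
    return reader.read_range(offset, footer_size)


def read_footer_with_hint(reader: CountingReader, file_size: int,
                          footer_size: int) -> bytes:
    """One read when the hint is correct; one extra read if it is stale."""
    offset, length = footer_range(file_size, footer_size)
    buf = reader.read_range(offset, length)
    if buf[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (declared,) = struct.unpack("<I", buf[-8:-4])
    if declared != footer_size:
        # The hint disagreed with the file itself: trust the file.
        real_offset, _ = footer_range(file_size, declared)
        return reader.read_range(real_offset, declared)
    return buf[:footer_size]
```

With a correct hint the reader issues a single range request; without one it needs the trailer read first. Checking the hint against the length stored in the trailer keeps a stale manifest value from silently returning the wrong bytes.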