Hi Iceberg Dev,
I searched the specification for the following but couldn't find anything
explicit:
1. Is it assumed that all data files and delete files will always have
globally unique names in a table?
2. Is it expected that the pairwise intersection of all manifest files in
a snapshot is empt
Hi Iceberg dev
As we all know, in the current Apache Iceberg write path, the ORC file
writer cannot simply roll over to a new file once its byte size reaches the
expected threshold. The core reason we have not supported this so far is
the lack of a correct approach to estimate the byte size from
Hi Openinx.
Thanks for bringing this to our attention. And many thanks to hiliwei for
their willingness to tackle big problems and little problems.
I wanted to say that I think almost anything that’s relatively close would
most likely be better than the current situation (where the feature is
disab
Thanks to openinx for opening this discussion.
One thing to note: the current approach has a problem. Because of some
optimization mechanisms, when a large amount of duplicate data is written,
the estimated size deviates somewhat from the actual size.
However, when cached data is flush
> As their widths are not the same, I think we may need to use an average
> width minus the batch.size (which is row count actually).
@Kyle, sorry, I mistyped the word before. I meant "need an average width
multiplied by the batch.size".
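
To make the corrected formula concrete, here is a rough sketch only. It is
not the actual Iceberg or ORC writer code: the function names, the idea of
sampling row widths, and the roll-over check are all assumptions made for
illustration. It estimates the buffered bytes as the average width
multiplied by the batch size (the row count).

```python
from typing import Sequence


def estimate_buffered_bytes(sampled_row_widths: Sequence[float], batch_size: int) -> float:
    """Estimate in-memory batch size as (average width) * (row count).

    `sampled_row_widths` is a hypothetical list of observed row widths in
    bytes; `batch_size` is the number of rows currently buffered.
    """
    if not sampled_row_widths or batch_size <= 0:
        return 0.0
    average_width = sum(sampled_row_widths) / len(sampled_row_widths)
    return average_width * batch_size


def should_roll_over(flushed_bytes: float,
                     sampled_row_widths: Sequence[float],
                     batch_size: int,
                     target_file_bytes: float) -> bool:
    """Hypothetical check: bytes already flushed plus the estimated
    buffered bytes have reached the target file size."""
    estimated_total = flushed_bytes + estimate_buffered_bytes(sampled_row_widths, batch_size)
    return estimated_total >= target_file_bytes
```

The estimate is only as good as the sampled widths, so it can drift from
the real file size, which is why it is treated as approximate here.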
On Fri, Mar 4, 2022 at 1:29 PM liwei li wrote:
> Thanks to