Yeah, I would probably ignore the column size metric. That's really more
for columnar formats, where we could use it to estimate how much data from
a row group is being projected. For Avro, we'd have to read the same amount
either way.
For this, I'd probably create an appender that wraps another appender.
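A wrapping appender along those lines could look roughly like the sketch below. This is a hypothetical illustration, not Iceberg's actual API: the `FileAppender` interface, `ListAppender`, and `MetricsAppender` names here are simplified stand-ins, and the only stat tracked is a record count (a real implementation would also gather per-column value counts, null counts, and min/max bounds).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an appender interface (not Iceberg's real one).
interface FileAppender<T> {
    void add(T datum);
    void close();
}

// Trivial delegate used for demonstration: collects rows in memory.
class ListAppender<T> implements FileAppender<T> {
    final List<T> rows = new ArrayList<>();
    public void add(T datum) { rows.add(datum); }
    public void close() {}
}

// Decorator that gathers stats on the way through, then forwards
// each record to the wrapped appender.
class MetricsAppender<T> implements FileAppender<T> {
    private final FileAppender<T> delegate;
    private long recordCount = 0;

    MetricsAppender(FileAppender<T> delegate) { this.delegate = delegate; }

    public void add(T datum) {
        recordCount++;        // stats hook: counts, per-column min/max, etc.
        delegate.add(datum);  // actual writing stays in the wrapped appender
    }

    public void close() { delegate.close(); }

    long recordCount() { return recordCount; }
}
```

Usage: `new MetricsAppender<>(realAppender)` is handed to the writer in place of the bare appender, so stats collection stays inside Iceberg and needs nothing from the Avro encoder.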
Feedback/guidance request:
Byte size info in Avro is encapsulated in the encoder
(org.apache.avro.io.BufferedBinaryEncoder) and is not exposed by the Avro
API. Should we carry on with the task ignoring that metric (gathering as
much info as we can inside Iceberg)?
Is it feasible to get Avro modified (to
Hi Ryan,
I'll give it a try.
Regards,
L.
On Thu, 12 Mar 2020 at 18:16, Ryan Blue wrote:
> Hi Luis,
>
> You're right about what's happening. Because the Avro appender doesn't
> track column-level stats, Iceberg can't determine that the file only
> contains matching data rows and can be deleted. Parquet does keep those
> stats, so even though the partitioning doesn't guarantee the delete is
> safe,