Hi Ryan,

I'll give it a try.
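Roughly the direction I'm thinking of (just a sketch of the bookkeeping in
plain Java, not the real Iceberg API; presumably it would plug in where
AvroFileAppender currently builds its Metrics from the record count alone):

// Sketch: accumulate per-column bounds while records are appended,
// mirroring the stats Parquet gets for free from its file footers.
// Records are simplified to Map<String, Object> here; the real change
// would walk the Iceberg schema and key the stats by field id.
import java.util.HashMap;
import java.util.Map;

class ColumnStats {
  Comparable<Object> lowerBound;
  Comparable<Object> upperBound;
  long valueCount = 0;
  long nullCount = 0;

  @SuppressWarnings("unchecked")
  void update(Object value) {
    valueCount++;
    if (value == null) {
      nullCount++;
      return;
    }
    Comparable<Object> v = (Comparable<Object>) value;
    if (lowerBound == null || v.compareTo(lowerBound) < 0) {
      lowerBound = v;
    }
    if (upperBound == null || v.compareTo(upperBound) > 0) {
      upperBound = v;
    }
  }
}

class StatsTrackingAppender {
  private final Map<String, ColumnStats> statsByColumn = new HashMap<>();
  private long recordCount = 0;

  // add() would also delegate to the wrapped Avro writer; only the
  // stats bookkeeping is shown here.
  void add(Map<String, Object> record) {
    recordCount++;
    record.forEach((column, value) ->
        statsByColumn.computeIfAbsent(column, c -> new ColumnStats())
            .update(value));
  }

  // On close(), these bounds would feed the Metrics object that
  // StrictMetricsEvaluator reads, so it could prove that every row in
  // a file like file-2.avro has id == 1 and the delete is safe.
  Map<String, ColumnStats> stats() {
    return statsByColumn;
  }
}

The real plumbing (field ids, encoding the bounds, truncating long
strings) will need more care, but that's the gist. I'll dig into the
details.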
Regards,
L.

On Thu, 12 Mar 2020 at 18:16, Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Luis,
>
> You're right about what's happening. Because the Avro appender doesn't
> track column-level stats, Iceberg can't determine that the file only
> contains matching data rows and can be deleted. Parquet does keep those
> stats, so even though the partitioning doesn't guarantee the delete is
> safe, Iceberg can determine that it is.
>
> The solution is to add column-level stats for Avro files. Is that
> something you're interested in working on?
>
> rb
>
> On Thu, Mar 12, 2020 at 10:09 AM Luis Otero <lote...@gmail.com> wrote:
>
>> Hi,
>>
>> AvroFileAppender doesn't report min/max values (
>> https://github.com/apache/incubator-iceberg/blob/80cbc60ee55911ee627a7ad3013804394d7b5e9a/core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java#L60
>> ).
>>
>> As a side effect (I think), overwrite operations fail with "Cannot delete
>> file where some, but not all, rows match filter" when there are data files
>> in the same partition, because StrictMetricsEvaluator can't confirm that
>> all rows match.
>>
>> For instance, if you modify TestLocalScan with:
>>
>> this.partitionSpec =
>>     PartitionSpec.builderFor(SCHEMA).bucket("id", 10).build();
>>
>> this.file1Records = new ArrayList<Record>();
>> file1Records.add(record.copy(ImmutableMap.of("id", 60L, "data",
>>     UUID.randomUUID().toString())));
>> DataFile file1 = writeFile(sharedTable.location(),
>>     format.addExtension("file-1"), file1Records);
>>
>> this.file2Records = new ArrayList<Record>();
>> file2Records.add(record.copy(ImmutableMap.of("id", 1L, "data",
>>     UUID.randomUUID().toString())));
>> DataFile file2 = writeFile(sharedTable.location(),
>>     format.addExtension("file-2"), file2Records);
>>
>> this.file3Records = new ArrayList<Record>();
>> file3Records.add(record.copy(ImmutableMap.of("id", 1L, "data",
>>     UUID.randomUUID().toString())));
>> DataFile file3 = writeFile(sharedTable.location(),
>>     format.addExtension("file-3"), file3Records);
>>
>> sharedTable.newAppend()
>>     .appendFile(file1)
>>     .commit();
>>
>> sharedTable.newAppend()
>>     .appendFile(file2)
>>     .commit();
>>
>> sharedTable.newOverwrite()
>>     .overwriteByRowFilter(equal("id", 1L))
>>     .addFile(file3)
>>     .commit();
>>
>> it fails with 'org.apache.iceberg.exceptions.ValidationException: Cannot
>> delete file where some, but not all, rows match filter ref(name="id") == 1:
>> file:/AVRO/file-2.avro' for the AVRO format, but works fine for PARQUET.
>>
>> Am I missing something here?
>>
>> Thanks!!
>
> --
> Ryan Blue
> Software Engineer
> Netflix