Hi,

I am trying to understand how Transactions behave in Iceberg when
"commit.retry.num-retries" is set to zero. My requirement is that a
transaction must fail if the table is updated by any concurrent
transaction after the transaction was opened.
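
To make the requirement concrete, here is a minimal stand-alone sketch of the
semantics I am after. It is not Iceberg code: StrictTable, StrictTransaction
and TableState are hypothetical names, and IllegalStateException stands in for
CommitFailedException. The commit is a single compare-and-swap against the
state seen when the transaction was opened, with no retry and no rebase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for table metadata: an immutable list of file paths.
final class TableState {
  final long version;
  final List<String> files;

  TableState(long version, List<String> files) {
    this.version = version;
    this.files = Collections.unmodifiableList(new ArrayList<>(files));
  }
}

// Hypothetical table whose current state can only be replaced atomically.
final class StrictTable {
  private final AtomicReference<TableState> current =
      new AtomicReference<>(new TableState(0, new ArrayList<>()));

  TableState current() {
    return current.get();
  }

  // The transaction remembers the state that was current when it was opened.
  StrictTransaction newTransaction() {
    return new StrictTransaction(this, current.get());
  }

  // Succeeds only if nobody has replaced 'expected' in the meantime.
  boolean swap(TableState expected, TableState updated) {
    return current.compareAndSet(expected, updated);
  }
}

final class StrictTransaction {
  private final StrictTable table;
  private final TableState base;
  private final List<String> pending = new ArrayList<>();

  StrictTransaction(StrictTable table, TableState base) {
    this.table = table;
    this.base = base;
  }

  void appendFile(String path) {
    pending.add(path);
  }

  // One attempt only: fail if the table moved past the base state.
  void commitTransaction() {
    List<String> files = new ArrayList<>(base.files);
    files.addAll(pending);
    TableState updated = new TableState(base.version + 1, files);
    if (!table.swap(base, updated)) {
      throw new IllegalStateException(
          "Table changed since the transaction was opened");
    }
  }
}
```

With this sketch, if t1 and t2 are both opened on the empty table and t1
commits first, t2.commitTransaction() throws and the table keeps only the
file added by t1. That is the behaviour I expected from Iceberg with zero
retries.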

I wrote the following unit test in TestHadoopTables.java to verify the
behaviour. I see that both transactions commit one after the other,
leaving the table in an unexpected state. Could anyone please confirm
whether I am doing something wrong, or whether the Iceberg transaction
commit logic needs to change?

The test is very simple. It opens two transactions one after another, adds
a file in each transaction, and commits them one after the other. My
requirement is that the second transaction must fail with
CommitFailedException, but it commits successfully.

  @Test
  public void testSimpleConcurrentTransaction() {
    PartitionSpec spec = PartitionSpec.builderFor(SCHEMA)
            .build();

    // set table property to avoid retries during commit
    final Map<String, String> tableProperties = Stream.of(new String[][] {
            { TableProperties.COMMIT_NUM_RETRIES, "0" }
    }).collect(Collectors.toMap(d -> d[0], d -> d[1]));

    final DataFile FILE_A = DataFiles.builder(spec)
            .withPath("/path/to/data-a.parquet")
            .withFileSizeInBytes(10)
            .withRecordCount(1)
            .build();

    Table table = TABLES.create(SCHEMA, spec, tableProperties,
            tableDir.toURI().toString());

    // It is an empty table, so there is no snapshot yet
    Assert.assertEquals("Current snapshot must be null", null,
            table.currentSnapshot());

    // start transaction t1
    Transaction t1 = table.newTransaction();

    // start transaction t2
    Transaction t2 = table.newTransaction();

    // t1 is adding a data file
    t1.newAppend()
            .appendFile(FILE_A)
            .commit();

    // t2 is adding a data file
    t2.newAppend()
            .appendFile(FILE_A)
            .commit();

    // commit transaction t1
    t1.commitTransaction();

    // commit transaction t2: My requirement is that the following commit
    // must fail
    t2.commitTransaction();

    table.refresh();
    List<ManifestFile> manifests = table.currentSnapshot().allManifests();

    // The following assert fails since each transaction added one
    // manifest file
    Assert.assertEquals("Should have 1 manifest file", 1, manifests.size());
  }

Please suggest whether there is a way to commit transactions such that the
second one fails. Thank you so much.
