Whoops, I didn’t see that Ryan had already answered.
> On May 4, 2023, at 3:18 PM, Szehon Ho <szehon...@apple.com.INVALID> wrote:
>
> Hi,
>
> I believe it only matters if you have conflicting commits. For the
> single-writer case, I think you are right that it should not matter, so you
> may gain a very small amount of performance by switching to snapshot
> isolation. The checks are metadata checks, though, so I would not expect a
> significant performance difference.
>
> In general, the isolation levels in Iceberg work by checking, just before
> commit, whether any changes made since the operation started (i.e., since
> its starting snapshot ID) conflict with the data files about to be
> committed. So if there is a failure due to the isolation level, I believe
> the error bubbles back to the application to try again, hence ‘pessimistic’.
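
[Inline note] The commit-time check described above can be illustrated with a toy model. This is only a sketch under assumptions, not Iceberg's code: the names Snapshot, ValidationError, validate, and outcome are hypothetical, snapshot IDs are assumed to increase with commit order (real Iceberg follows snapshot lineage instead), and the difference between the two levels is reduced to one rule, namely that serializable also rejects files added concurrently inside the range the operation read.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    snapshot_id: int                           # assumed to grow with commit order
    added: set = field(default_factory=set)    # data files added by this commit
    removed: set = field(default_factory=set)  # data files removed by this commit


class ValidationError(Exception):
    """Raised when a concurrent commit conflicts with the pending operation."""


def validate(history, start_id, rewritten, in_scope, level):
    """Check every snapshot committed after start_id against a pending operation.

    rewritten: files the pending operation wants to replace or delete.
    in_scope:  predicate for "this file is inside the range the operation read".
    level:     "serializable" or "snapshot".
    """
    for snap in history:
        if snap.snapshot_id <= start_id:
            continue
        # Both levels: a file we are rewriting must not have been removed.
        gone = rewritten & snap.removed
        if gone:
            raise ValidationError(f"files removed concurrently: {sorted(gone)}")
        # Serializable only: no new files may have appeared in the range we read.
        if level == "serializable":
            new = {f for f in snap.added if in_scope(f)}
            if new:
                raise ValidationError(f"files added concurrently: {sorted(new)}")


def outcome(history, start_id, rewritten, in_scope, level):
    """Return "commit" if validation passes, "retry" otherwise."""
    try:
        validate(history, start_id, rewritten, in_scope, level)
        return "commit"
    except ValidationError:
        return "retry"


history = [
    Snapshot(1, added={"day=1/a.parquet"}),
    Snapshot(2, added={"day=1/b.parquet"}),  # concurrent append after we started
]


def in_day1(path):
    return path.startswith("day=1/")
```

With an operation that started at snapshot 1 and rewrites day=1/a.parquet, outcome(history, 1, {"day=1/a.parquet"}, in_day1, "serializable") gives "retry" while the same call with "snapshot" gives "commit". With a single writer no later snapshot exists, so the loop has nothing to reject at either level.
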
>
> Note that metadata conflicts are retried automatically and should rarely
> bubble up to the user, so error handling is required only for a data
> isolation-level conflict (e.g., you delete a file that is currently being
> rewritten by another operation).
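
[Inline note] For the data-conflict case mentioned above, application-side handling could look roughly like the loop below. It is a sketch only: DataConflict, run_with_retry, and flaky_delete are hypothetical names, and the simulated operation stands in for a real write that would re-read the table on each attempt.

```python
class DataConflict(Exception):
    """Hypothetical stand-in for the validation failure a writer would see."""


def run_with_retry(operation, max_attempts=3):
    """Call operation() until it commits or the attempts run out.

    operation is expected to re-read the current table state on every call,
    so each attempt starts from a fresh snapshot. Returns (result, attempts).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except DataConflict:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller


# Simulated operation: conflicts on the first call, succeeds on the second.
calls = {"n": 0}


def flaky_delete():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DataConflict("file is being rewritten by another operation")
    return "committed"


result, attempts = run_with_retry(flaky_delete)
```

Here result is "committed" and attempts is 2. The attempt cap is a design choice so that a persistent conflict still reaches the caller instead of looping forever.
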
>
> Hope that helps
> Szehon
>
>> On May 4, 2023, at 12:19 PM, Nirav Patel <nira...@gmail.com> wrote:
>>
>> I am trying to ingest data into an Iceberg table using Spark streaming.
>> There are currently no concurrent writers to the same data. According to
>> the Iceberg API
>> <https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/IsolationLevel.html#:%7E:text=Both%20of%20them%20provide%20a,environments%20with%20many%20concurrent%20writers.>
>> the default isolation level for a table is serializable. If only a single
>> application (a single Spark streaming job in my case) is writing to the
>> Iceberg table, is there any advantage or disadvantage to using serializable
>> rather than snapshot isolation? Is there any performance impact from using
>> serializable when only one application is writing to the table? Also, it
>> seems Iceberg lets all writers write against a snapshot and uses OCC to
>> decide whether one needs to retry because it was late. In that case, how is
>> it serializable at all? Isn't serializability achieved via pessimistic
>> concurrency control? I would like to understand how Iceberg implements the
>> serializable isolation level and how it differs from snapshot isolation.
>>
>> Thanks
>