Re: tradeoffs between serializable vs snapshot isolation for single writer

Szehon Ho Thu, 04 May 2023 15:22:24 -0700
Whoops, I didn’t see that Ryan had already answered.

> On May 4, 2023, at 3:18 PM, Szehon Ho <szehon...@apple.com.INVALID> wrote:
> 
> Hi,
> 
> I believe it only matters if you have conflicting commits.  For the single 
> writer case, I think you are right that it should not matter, so you may 
> save very slightly on performance by switching to snapshot isolation.  The 
> checks are metadata checks though, so I would not expect a significant 
> performance difference.
> 
> In general, the isolation levels in Iceberg work by checking before commit to 
> see if there are any conflicting changes to the data files about to be 
> committed, made since the operation first started (i.e., since its starting 
> snapshot id).  So if there is a failure due to the isolation level, I believe 
> the error bubbles back to the application to try again, hence ‘pessimistic’.
> 
> Note, metadata conflicts are automatically retried and should rarely bubble 
> up to the user, so error handling will be required only in the case of a 
> data isolation-level conflict (i.e., you delete a file that is currently 
> being rewritten by another operation).
> 
> Hope that helps
> Szehon 
> 
>> On May 4, 2023, at 12:19 PM, Nirav Patel <nira...@gmail.com> wrote:
>> 
>> I am trying to ingest data into an Iceberg table using Spark streaming. There 
>> are not multiple writers to the same data at the moment. According to the 
>> Iceberg API 
>> <https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/IsolationLevel.html#:%7E:text=Both%20of%20them%20provide%20a,environments%20with%20many%20concurrent%20writers.>
>> the default isolation level for a table is serializable. I want to 
>> understand: if there is only a single application (a single Spark streaming 
>> job in my case) writing to the Iceberg table, is there any advantage or 
>> disadvantage to using serializable versus snapshot isolation? Is there any 
>> performance impact from using serializable when only one application is 
>> writing to the table? Also, it seems Iceberg allows all writers to write 
>> into a snapshot and uses OCC to decide whether one needs to retry because it 
>> was late. In that case, how is it serializable at all? Isn't serializability 
>> achieved via pessimistic concurrency control? I would like to understand how 
>> Iceberg implements the serializable isolation level and how it differs from 
>> snapshot isolation.
>> 
>> Thanks
>
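
The commit-time check described in the quoted reply can be illustrated with a
toy model. This is only a sketch under stated assumptions, not Iceberg's
implementation: ToyTable, Snapshot, ValidationError, and commit() are
hypothetical names invented for the example, and the model treats any
concurrently added data file as a conflict under serializable, whereas the real
check would consider only files that might match the operation's condition.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Raised when a commit conflicts with a concurrent change (toy model)."""


@dataclass
class Snapshot:
    snapshot_id: int
    added: frozenset = frozenset()    # data files added by this commit
    removed: frozenset = frozenset()  # data files removed by this commit


@dataclass
class ToyTable:
    # Hypothetical stand-in for a table: an ordered list of snapshots.
    snapshots: list = field(default_factory=lambda: [Snapshot(0)])

    def current_id(self):
        return self.snapshots[-1].snapshot_id

    def commit(self, start_id, added, removed, isolation="serializable"):
        # Only snapshots committed after the operation started can conflict.
        concurrent = [s for s in self.snapshots if s.snapshot_id > start_id]
        for snap in concurrent:
            # Both levels: fail if a file we rewrite or delete is already gone.
            if snap.removed & set(removed):
                raise ValidationError("file removed concurrently")
            # Serializable only (simplified): fail if any data file was added,
            # since it might contain rows matching our condition.
            if isolation == "serializable" and snap.added:
                raise ValidationError("data added concurrently")
        new = Snapshot(self.current_id() + 1, frozenset(added), frozenset(removed))
        self.snapshots.append(new)
        return new.snapshot_id


table = ToyTable()
table.commit(0, {"a.parquet"}, set())
start = table.current_id()                   # a rewrite of a.parquet starts here
table.commit(start, {"b.parquet"}, set())    # a concurrent append lands first
# The rewrite passes under snapshot isolation; it would raise under serializable.
table.commit(start, {"a2.parquet"}, {"a.parquet"}, isolation="snapshot")
```

With a single writer, every operation starts from the current snapshot, so the
list of concurrent snapshots is always empty and the two levels behave the same
in this model. That matches the point made in the reply: the level only matters
when there are conflicting commits.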