Re: tradeoffs between serializable vs snapshot isolation for single writer

Szehon Ho Thu, 04 May 2023 15:19:27 -0700

Hi,

I believe it only matters if you have conflicting commits. For the single-writer
case, I think you are right and it should not matter, so you may save very
slightly on performance by switching to snapshot isolation. The checks are
metadata checks, though, so I would not expect a significant performance
difference.

In general, the isolation levels in Iceberg work by checking, just before
commit, whether any changes committed since the operation first started (i.e.,
since its starting snapshot id) conflict with the data files about to be
committed. So if there is a failure due to the isolation level, I believe the
error bubbles back to the application to try again, hence ‘pessimistic’.
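As a rough illustration, here is a minimal Python sketch of that pre-commit check. It is a toy model, not Iceberg's implementation: Snapshot, validate_commit, ValidationFailed and in_p1 are hypothetical names, data files are plain strings, and the "read scope" predicate stands in for the operation's conflict-detection filter. It assumes, as I understand Iceberg's row-level operations, that both levels reject a commit whose target files were removed concurrently, while serializable additionally rejects new files added inside the data the operation read.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationLevel(Enum):
    SERIALIZABLE = "serializable"
    SNAPSHOT = "snapshot"


@dataclass(frozen=True)
class Snapshot:
    """Toy snapshot: which data files one commit added and removed (hypothetical)."""
    snapshot_id: int
    added_files: frozenset = frozenset()
    removed_files: frozenset = frozenset()


class ValidationFailed(Exception):
    """A concurrent commit conflicts with the pending operation (not retried)."""


def validate_commit(history, start_id, files_to_remove, in_read_scope, level):
    """Check every snapshot committed after start_id against the pending operation.

    history:         snapshots in commit order, oldest first.
    start_id:        snapshot the operation read when it started.
    files_to_remove: data files the operation deletes or rewrites.
    in_read_scope:   predicate: is this file part of the data the operation read?
    """
    ids = [s.snapshot_id for s in history]
    concurrent = history[ids.index(start_id) + 1:]
    for snap in concurrent:
        # Both levels: a file we are about to remove must not have been
        # removed or rewritten by another commit in the meantime.
        gone = snap.removed_files & files_to_remove
        if gone:
            raise ValidationFailed(f"snapshot {snap.snapshot_id} already removed {sorted(gone)}")
        # Serializable only: new files in the data we read mean our result may
        # not match any serial order of the two operations.
        if level is IsolationLevel.SERIALIZABLE:
            new = sorted(f for f in snap.added_files if in_read_scope(f))
            if new:
                raise ValidationFailed(f"snapshot {snap.snapshot_id} added {new} in the read scope")


def in_p1(path):
    # Hypothetical read scope: the operation only read partition p1.
    return path.startswith("p1/")


# Our operation started at snapshot 1 and rewrites p1/a.parquet (say, a DELETE
# in p1); meanwhile another writer appended p1/c.parquet as snapshot 2.
history = [
    Snapshot(1, added_files=frozenset({"p1/a.parquet", "p2/b.parquet"})),
    Snapshot(2, added_files=frozenset({"p1/c.parquet"})),
]
validate_commit(history, 1, frozenset({"p1/a.parquet"}), in_p1, IsolationLevel.SNAPSHOT)  # passes
try:
    validate_commit(history, 1, frozenset({"p1/a.parquet"}), in_p1, IsolationLevel.SERIALIZABLE)
except ValidationFailed as err:
    print("serializable:", err)
```

With a single writer, nothing is committed between an operation's start and its commit, so the list of concurrent snapshots is empty and both levels pass trivially; the extra work serializable would do is only a scan of that (empty) list.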

Note that metadata conflicts are retried automatically and should rarely bubble
up to the user, so error handling is only required in the case of a data
isolation-level conflict (i.e., you delete a file that is currently being
rewritten by another operation).
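The split between retried metadata conflicts and surfaced data conflicts can be sketched the same way. Again this is a toy model, not Iceberg's code: ToyCatalog, commit_with_retries, CommitConflict and DataConflict are hypothetical names, loosely standing in for the catalog's atomic metadata-pointer swap and for Iceberg's CommitFailedException (retried internally) and ValidationException (thrown to the caller). The default max_retries=4 mirrors the default of the commit.retry.num-retries table property, and only conflicting deletes are validated to keep it short.

```python
class CommitConflict(Exception):
    """Metadata conflict: another commit moved the table pointer first (retried)."""


class DataConflict(Exception):
    """Isolation-level conflict found by validation (surfaced to the caller)."""


class ToyCatalog:
    """Holds the current table version; a commit is a compare-and-swap on it."""

    def __init__(self, racing_commits=()):
        self.version = 0
        self.log = []  # (version, files removed by that commit)
        # Commits from other writers that land just before our next swap attempt.
        self._racing = list(racing_commits)

    def _apply(self, removed):
        self.version += 1
        self.log.append((self.version, removed))

    def swap(self, expected_version, removed):
        if self._racing:  # simulate another writer winning the race
            self._apply(self._racing.pop(0))
        if self.version != expected_version:
            raise CommitConflict(f"expected v{expected_version}, found v{self.version}")
        self._apply(removed)


def commit_with_retries(catalog, start_version, removed, max_retries=4):
    """Commit an operation that removes `removed`, planned against start_version."""
    retries = 0
    while True:
        base = catalog.version  # refresh to the latest metadata
        # Re-validate against everything committed since the operation started;
        # a DataConflict escapes the loop and reaches the application.
        for v, other in catalog.log:
            if v > start_version and other & removed:
                raise DataConflict(f"v{v} already removed {sorted(other & removed)}")
        try:
            catalog.swap(base, removed)
            return catalog.version
        except CommitConflict:
            retries += 1  # metadata conflict: refresh and try again
            if retries > max_retries:
                raise


ours = frozenset({"p1/a.parquet"})
# A concurrent append (removes nothing): the first swap fails, the retry succeeds.
print(commit_with_retries(ToyCatalog(racing_commits=[frozenset()]), 0, ours))  # prints 2
# A concurrent delete of the same file: the retry's validation raises DataConflict.
try:
    commit_with_retries(ToyCatalog(racing_commits=[ours]), 0, ours)
except DataConflict as err:
    print("needs application-level handling:", err)
```

The design point is that validation runs inside the retry loop against the refreshed base, so a harmless concurrent commit costs only one extra attempt, while a genuine data conflict leaves the loop immediately.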

Hope that helps
Szehon 

> On May 4, 2023, at 12:19 PM, Nirav Patel <nira...@gmail.com> wrote:
> 
> I am trying to ingest data into an Iceberg table using Spark streaming. There
> are not multiple writers to the same data at the moment. According to the
> Iceberg API docs
> <https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/IsolationLevel.html#:%7E:text=Both%20of%20them%20provide%20a,environments%20with%20many%20concurrent%20writers.>,
> the default isolation level for a table is serializable. I want to understand
> whether, if only a single application (a single Spark streaming job in my case)
> is writing to the Iceberg table, there is any advantage or disadvantage to using
> serializable or snapshot isolation. Is there any performance impact of using
> serializable when only one application is writing to the table? Also, it seems
> Iceberg allows all writers to write into a snapshot and uses OCC to decide
> whether one needs to retry because it was late. In that case, how is it
> serializable at all? Isn't serializability achieved via pessimistic
> concurrency control? I would like to understand how Iceberg implements the
> serializable isolation level and how it differs from snapshot isolation.
> 
> Thanks
