Vivek,

You might want to try MERGE INTO again. You should be able to make it more efficient by adding predicates to the ON clause. Those will get pushed down to the target table to avoid a big scan.
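As a sketch of what I mean (table and column names here are made up; assume the target is partitioned by `event_date` and the source carries an `op` column):

```sql
-- Hypothetical tables: db.target (partitioned by event_date) and db.updates.
-- The extra event_date predicate in the ON clause can be pushed down to the
-- target table, so the merge scans only the affected partitions instead of
-- joining against the whole table.
MERGE INTO db.target t
USING db.updates u
  ON t.id = u.id
 AND t.event_date >= '2021-04-01'   -- pushdown predicate limits the target scan
WHEN MATCHED AND u.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.value = u.value
WHEN NOT MATCHED THEN INSERT *
```

The tighter the ON predicates describe which target files could possibly match, the less of the table the merge has to rewrite.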
Iceberg supports transactions to do what you want, but it doesn't use table locking. Instead, it uses optimistic concurrency for coordination: if another writer commits first, Iceberg applies both sets of operations by retrying the transaction commit. If the commit fails, nothing in your table is changed by the operations in the transaction. The difficulty is that Spark doesn't support transactions with SQL statements.

On Mon, May 3, 2021 at 9:54 AM vivek B <vivekbalachan...@gmail.com> wrote:

> On 2021/05/02 18:10:19, Ryan Blue <b...@apache.org> wrote:
> > Vivek,
> >
> > Currently, Spark doesn't support any of the BEGIN/COMMIT statements for
> > transactions, so I don't think that it is possible right now. What are
> > you trying to do? It may be that some of the newer commands, like
> > MERGE INTO, would work for you instead.
> >
> > On Thu, Apr 29, 2021 at 5:49 PM vivek B <vivekbalachan...@gmail.com> wrote:
> >
> > > Hey All,
> > > Is there a way to run multiple SQL operations via Spark as one single
> > > transaction?
> > >
> > > Thanks,
> > > vivek
> >
> > --
> > Ryan Blue
>
> I was using the MERGE INTO command to do insert, update, and delete in one
> SQL query, but found it to be slow. I am guessing that may be because
> MERGE INTO reads the whole Iceberg table into Spark and does a join.
>
> So I wanted to do explicit delete, update, and insert operations on the
> Iceberg table. I was asking whether there is a way to hold a lock on the
> Iceberg table (so that nobody else can write to it and increment the
> snapshot id), apply some SQL operations, and if anything goes wrong, roll
> back to the snapshot id that was there at the beginning of my operations.
>
> Thanks,
> vivek

--
Ryan Blue
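P.S. On the rollback part of your question: if your catalog has Iceberg's Spark stored procedures available (they exist in recent Iceberg releases; check your version), you can roll a table back to an earlier snapshot without any locking. A sketch, with catalog, table, and snapshot id as placeholders:

```sql
-- Record the current snapshot id before you start (catalog/table names
-- here are hypothetical; "snapshots" is Iceberg's metadata table):
SELECT snapshot_id
FROM prod.db.sample.snapshots
ORDER BY committed_at DESC
LIMIT 1;

-- ...run your delete / update / insert statements...

-- If anything goes wrong, roll back to the recorded snapshot id:
CALL prod.system.rollback_to_snapshot('db.sample', 1234567890123456789)
```

One caveat: because there is no lock, rolling back also discards any commits other writers made after your recorded snapshot, so this is only a safe substitute for a transaction if you know nobody else is writing to the table at the same time.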