Re: Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-14 Thread Jean-Baptiste Onofré
Hi, I agree with Ryan; that was my comment in a previous message: "About FileIO, we can always extend it, but as it's used in different Iceberg layers (like ResolvedFileIO for instance), we have to be careful adding new operations here, especially if it's specific for HadoopCatalog table/view ope

Re:Re: Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-14 Thread lisoda
Regarding HadoopTableOptions: if the filesystem supports rename operations that do not overwrite the target file, HadoopTableOptions does not need to use LockManager at all. One reason for keeping LockManager is simply that it was used in the original implementation.
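A minimal Java sketch of why a no-overwrite rename can replace the lock: each writer stages its metadata in a private temp file, then tries to publish it as `vN.metadata.json`; only one writer can claim a given name, and the loser refreshes and retries. It assumes a local POSIX-like filesystem and uses `Files.createLink` (a hard link, which fails with `FileAlreadyExistsException` if the target exists) as a stand-in for a rename that refuses to overwrite, because `Files.move` with `ATOMIC_MOVE` may replace an existing target depending on the platform. `NoOverwriteCommit` and the file naming are illustrative, not Iceberg code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch, not Iceberg code: commit metadata version N by claiming
// the name vN.metadata.json with an operation that fails if the name exists.
final class NoOverwriteCommit {
  private NoOverwriteCommit() {}

  /** Returns true if this writer claimed `version`, false if another writer already did. */
  static boolean tryCommit(Path metadataDir, int version, String metadataJson) throws IOException {
    // 1. Stage the new metadata under a unique temp name; writers never collide here.
    Path temp = Files.createTempFile(metadataDir, "tmp-", ".metadata.json");
    try {
      Files.write(temp, metadataJson.getBytes(StandardCharsets.UTF_8));
      Path target = metadataDir.resolve("v" + version + ".metadata.json");
      // 2. Publish atomically. Assumption: a hard link stands in for a
      //    no-overwrite rename; it throws if `target` already exists.
      Files.createLink(target, temp);
      return true;
    } catch (FileAlreadyExistsException e) {
      // Another writer won this version: refresh table state and retry with version + 1.
      return false;
    } finally {
      // 3. Drop the temp name; a published target keeps its own link to the data.
      Files.deleteIfExists(temp);
    }
  }
}
```

Whether a given store offers such a primitive is the open question: HDFS rename refuses to overwrite an existing file, while many object stores have no atomic rename at all.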

Re: Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-12 Thread Ryan Blue
FileIO purposely does not support a rename operation because we wanted to keep a minimal API that handled object stores correctly rather than using a FileSystem concept. While we may need some extensions outside of what the core provides for reading and writing tables, I think we still need to be c
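One way to reconcile a minimal core API with a store-specific extension, sketched here as an assumption rather than as anything the thread put into code: keep the base interface small and expose the extra operation as a separate, optional capability interface that callers probe with `instanceof`, in the spirit of Iceberg's existing optional FileIO mixins such as `SupportsBulkOperations`. `MinimalIO`, `SupportsNoOverwriteRename`, `InMemoryIO`, and `CommitStrategy` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Iceberg's FileIO API.
interface MinimalIO {
  void write(String path, String contents);
  String read(String path);
}

// Optional capability: only stores that can do it safely implement this.
interface SupportsNoOverwriteRename {
  /** Renames `from` to `to`; returns false and changes nothing if `to` exists or `from` is missing. */
  boolean renameIfAbsent(String from, String to);
}

// In-memory store that offers the capability; synchronized so the
// existence check and the move happen as one step.
final class InMemoryIO implements MinimalIO, SupportsNoOverwriteRename {
  private final Map<String, String> files = new HashMap<>();

  @Override public synchronized void write(String path, String contents) { files.put(path, contents); }

  @Override public synchronized String read(String path) { return files.get(path); }

  @Override public synchronized boolean renameIfAbsent(String from, String to) {
    if (files.containsKey(to) || !files.containsKey(from)) {
      return false;
    }
    files.put(to, files.remove(from));
    return true;
  }
}

final class CommitStrategy {
  private CommitStrategy() {}

  // Callers probe for the capability; stores without it fall back to a lock.
  static String choose(MinimalIO io) {
    return (io instanceof SupportsNoOverwriteRename) ? "rename-if-absent" : "lock-manager";
  }
}
```

The core interface never grows, and object-store implementations are not forced to emulate a rename they cannot perform atomically.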

Re: Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-11 Thread Jean-Baptiste Onofré
Thanks! I will take a look ASAP (I would like to complete the 1.6.0 release soon). Regards, JB On Fri, Jul 12, 2024 at 8:07 AM lisoda wrote: > > Hi, Sir. > I've finished extending the usual distributed locks. I think we won't need to > extend distributed locks for a long time. > > PR: https://git

Re:Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-11 Thread lisoda
Hi, Sir. I've finished extending the usual distributed locks. I think we won't need to extend distributed locks for a long time. PR: https://github.com/apache/iceberg/pull/10688 As a next step, I'm going to try to extend FileIO to support operations like rename. It would be great if you could gi

Re:Re:Re: Re: Refactor the code of HadoopTableOptions

2024-07-09 Thread lisoda
Sir, I have added some comments in the PR; I hope they help. If you have any other suggestions, please let me know. Thanks. On 2024-07-04 23:49:34, "lisoda" wrote: Yeah. If I'm not mistaken, the JDBC catalog has the same problem with concurrent commits. It doesn't have any locks to co

Re:Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-05 Thread lisoda
Thank you for your reply. I re-examined the JDBC catalog implementation, and it cleverly checks the number of updated records (updatedRecords) to detect a concurrent commit, so there is nothing wrong with the implementation. That was my mistake; thanks for pointing it out. At 2024-07-05 16:09:54, "Jean-Baptiste Onofré" wrote: >Hi, > >
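The check described above is an optimistic compare-and-swap: a commit succeeds only if the stored metadata location is still the one the writer started from. A minimal Java sketch, with `ConcurrentHashMap.replace(key, expected, updated)` standing in for the conditional SQL `UPDATE` whose updated-record count the JDBC catalog inspects; `PointerCatalog` and the SQL in the comment are illustrative, not the catalog's actual schema.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: one "row" per table holding its current metadata location.
final class PointerCatalog {
  private final ConcurrentHashMap<String, String> currentMetadata = new ConcurrentHashMap<>();

  void create(String table, String metadataLocation) {
    if (currentMetadata.putIfAbsent(table, metadataLocation) != null) {
      throw new IllegalStateException("Table already exists: " + table);
    }
  }

  /**
   * Analogous to a conditional update such as
   *   UPDATE tables SET metadata_location = :new
   *   WHERE table_name = :t AND metadata_location = :expected
   * followed by checking that exactly one record was updated.
   * Returns false if another writer committed first (stale `expectedLocation`).
   */
  boolean commit(String table, String expectedLocation, String newLocation) {
    return currentMetadata.replace(table, expectedLocation, newLocation);
  }

  String current(String table) {
    return currentMetadata.get(table);
  }
}
```

No separate lock is held: the database (or here, the map) makes the compare and the write a single atomic step, and a losing writer simply refreshes and retries.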

Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-05 Thread Jean-Baptiste Onofré
Hi, Actually, the JDBC catalog relies on the RDBMS backend for locking. That's one of the reasons why we use a single RDBMS table for both tables and views. So I don't think we need a lock mechanism for JDBC; the RDBMS one is OK for now. About FileIO, we can always extend it, but as it

Re:Re: Re: Refactor the code of HadoopTableOptions

2024-07-04 Thread lisoda
Yeah. If I'm not mistaken, the JDBC catalog has the same problem with concurrent commits: it doesn't have any locks to control concurrency. In other words, LockManager could be used for JdbcCatalog as well. Also, regarding unbundling Hadoop, I have a suggestion. Can we extend the FileIO inte

Re: Re: Refactor the code of HadoopTableOptions

2024-07-04 Thread Jean-Baptiste Onofré
Yeah, I agree about the distributed locking service. Maybe we can imagine a lock service that is pluggable (by configuration) depending on the user's infrastructure. For view support, I can take a look (as I worked on the JDBC catalog view support). Anyway, I'm gonna take a look at your PR. Thanks again for your
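A minimal sketch of a lock service that is pluggable by configuration: a small lock interface, a registry of implementations, and a factory that picks one from a config map. `TableLock`, `InProcessLock`, `LockFactory`, and the `lock.type` key are hypothetical names (Iceberg has its own `LockManager` interface, which this does not reproduce); only a single-JVM implementation is included, and a real deployment would register a distributed one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical lock SPI; Iceberg's own LockManager interface is not reproduced here.
interface TableLock {
  boolean tryAcquire(String entityId, String ownerId);
  boolean release(String entityId, String ownerId);
}

// Single-JVM implementation: useful for tests, provides no cross-process exclusion.
final class InProcessLock implements TableLock {
  private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

  @Override public boolean tryAcquire(String entityId, String ownerId) {
    // Succeeds only if nobody currently holds the lock for this entity.
    return owners.putIfAbsent(entityId, ownerId) == null;
  }

  @Override public boolean release(String entityId, String ownerId) {
    // Only the current owner can release.
    return owners.remove(entityId, ownerId);
  }
}

final class LockFactory {
  // Hypothetical configuration key.
  static final String LOCK_TYPE = "lock.type";

  // A deployment would register distributed implementations (e.g. ZooKeeper- or
  // database-backed) alongside this one.
  private static final Map<String, Supplier<TableLock>> REGISTRY =
      Map.<String, Supplier<TableLock>>of("in-process", InProcessLock::new);

  private LockFactory() {}

  static TableLock fromConfig(Map<String, String> conf) {
    String type = conf.getOrDefault(LOCK_TYPE, "in-process");
    Supplier<TableLock> supplier = REGISTRY.get(type);
    if (supplier == null) {
      throw new IllegalArgumentException("Unknown lock type: " + type);
    }
    return supplier.get();
  }
}
```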

Re: Refactor the code of HadoopTableOptions

2024-07-04 Thread Jean-Baptiste Onofré
Hi, Thanks for the heads-up and for working on this! My understanding of HadoopCatalog is that it would need more than an improved commit mechanism to be production-ready (I'm thinking of scalability, or view support). What are your thoughts? By the way, I'm happy to take a look at adding view sup