Re:Re: Re: Re: Refactor the code of HadoopTableOptions

lisoda Fri, 05 Jul 2024 02:55:05 -0700

Thank you for your reply. I re-examined the jdbc-catalog implementation, and it 
cleverly uses UpdatedRecords as the commit check. So there is nothing wrong with 
the implementation. That was my mistake; thanks for pointing it out.
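
For anyone following the thread, here is a minimal sketch of the idea as I understand it: the commit is a conditional UPDATE, and the number of updated rows says whether the commit won. This is not the actual JdbcCatalog code. SQLite stands in for the RDBMS, the table layout is simplified, and the `commit` helper and the metadata file names are hypothetical.

```python
import sqlite3


def commit(conn, table_name, expected_location, new_location):
    """Swap the metadata pointer only if it still equals expected_location.

    Hypothetical helper: the UPDATE matches a row only when nobody else has
    moved the pointer since we read it, so the updated-row count is the check.
    """
    cur = conn.execute(
        "UPDATE iceberg_tables "
        "SET metadata_location = ?, previous_metadata_location = ? "
        "WHERE table_name = ? AND metadata_location = ?",
        (new_location, expected_location, table_name, expected_location),
    )
    conn.commit()
    # One updated row: our commit won. Zero rows: a concurrent writer got there first.
    return cur.rowcount == 1


# Simplified stand-in for the catalog table (assumed layout, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iceberg_tables ("
    "table_name TEXT PRIMARY KEY, "
    "metadata_location TEXT, "
    "previous_metadata_location TEXT)"
)
conn.execute("INSERT INTO iceberg_tables VALUES ('db.t', 'v1.metadata.json', NULL)")
conn.commit()

# Two writers both start from v1; only the first conditional update matches a row.
first = commit(conn, "db.t", "v1.metadata.json", "v2.metadata.json")
second = commit(conn, "db.t", "v1.metadata.json", "v2b.metadata.json")
```

The database makes the single-row UPDATE atomic, so the losing writer sees zero updated rows and can retry against the new metadata, with no separate lock needed.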

At 2024-07-05 16:09:54, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>Hi,
>
>Actually, the JDBC catalog relies on the RDBMS backend for locking.
>That's one of the reasons why we are using a single RDBMS table for
>both tables and views. So, I don't think we need a separate lock
>mechanism for JDBC; the RDBMS one is OK for now.
>About FileIO, we can always extend it, but as it's used in different
>Iceberg layers (like ResolvedFileIO for instance), we have to be
>careful about adding new operations here, especially if they are
>specific to HadoopCatalog table/view operations. I will take a look.
>
>Thanks!
>Regards
>JB
>
>On Thu, Jul 4, 2024 at 4:49 PM lisoda <lis...@yeah.net> wrote:
>>
>> Yes. If I'm not mistaken, the jdbc catalog has the same problem with 
>> concurrent commits. It doesn't have any locks to control concurrency. In 
>> other words, LockManager could be used for jdbcCatalog as well.
>>
>> Also, about unbundling hadoop, I have a suggestion. Can we 
>> extend the FileIO interface so that all operations are implemented using 
>> FileIO?
>>
>> At 2024-07-04 23:38:30, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> >Yeah, I agree about the distributed locking service. Maybe we can
>> >imagine a lock service that is pluggable (by configuration), depending
>> >on the user's infrastructure.
>> >
>> >For the view support, I can take a look (as I worked on the JDBC
>> >catalog view support).
>> >
>> >Anyway, I'm going to take a look at your PR. Thanks again for your 
>> >contribution!
>> >
>> >Regards
>> >JB
>> >
>> >On Thu, Jul 4, 2024 at 4:05 PM lisoda <lis...@yeah.net> wrote:
>> >>
>> >> Hello.
>> >> Yes. Improving the commit mechanism is just the beginning. We also need 
>> >> to implement a distributed locking service for users who use object 
>> >> stores. I think the next step is to support iceberg-view and such.
>> >> But I've never used iceberg's views before. It will take me some time to 
>> >> familiarise myself with the view functionality if I'm to 
>> >> be of any assistance. But if you need my help, I'll do whatever I 
>> >> can.
>> >> Anyway, I'm glad to hear from you.
>> >>
>> >> At 2024-07-04 22:04:17, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> >> >Hi,
>> >> >
>> >> >Thanks for the heads-up and for working on this!
>> >> >
>> >> >My understanding of the HadoopCatalog is that we would need more than
>> >> >an improved commit mechanism to be production-ready (I'm thinking of
>> >> >scalability, or view support). What are your thoughts?
>> >> >By the way, I'm happy to take a look at adding view support if it helps.
>> >> >
>> >> >Regards
>> >> >JB
>> >> >
>> >> >On Thu, Jul 4, 2024 at 8:27 AM lisoda <lis...@yeah.net> wrote:
>> >> >>
>> >> >> Hi Team.
>> >> >> I've refactored the logic of the commit method in 
>> >> >> HadoopTableOptions. With this refactoring, I believe that hadoopCatalog 
>> >> >> is ready to be used in a production environment. HadoopTableOptions 
>> >> >> can now implement atomic commits while accommodating 
>> >> >> the differences in behaviour between block and object 
>> >> >> stores. Concurrency control is also supported. If anyone can assist me 
>> >> >> in reviewing this PR, that would be great.
>> >> >> Also, any FileSystemCatalog user is welcome to comment on this PR. Any 
>> >> >> advice would be invaluable to me.
>> >> >> Thank you all.
>> >> >>
>> >> >> PR: https://github.com/apache/iceberg/pull/10623
>> >> >> SLACK: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1719993403208859
