lisoda, I don't think there is a good way to fix the HadoopCatalog implementation. That's why we recommend not using it.
In the quickstart, the assumption is that you're using a Hive catalog. The HadoopCatalog example shows how to add additional catalogs (in this case, a local one for testing). I don't think it is too misleading, but perhaps we should change that so people are not confused. I'll start a thread about deprecating HadoopCatalog and HadoopTableOperations since the operations are unsafe.

On Wed, Jul 17, 2024 at 2:45 AM lisoda <lis...@yeah.net> wrote:

> Hello steven.
>
> HadoopCatalog does have many problems, but because the community added it
> to the QuickStart chapter in the first place, many users have actually
> stayed with hadoopCatalog. There is a huge cost to switching catalogs. In
> addition, HIVE even uses HadoopCatalog as an implementation of
> iceberg-external-table. In other words, HadoopCatalog is actually heavily
> used in production environments without the user's knowledge.
>
> Against this background, there are two things we can do:
> 1. Guide the user to replace the catalog implementation.
> 2. Fix hadoopCatalog.
>
> We chose the second option and received good feedback from our users. I'm
> proud of the results of our work, as we have actually solved a large number
> of user problems.
>
> In addition, based on our latest research, we are confident that we can
> actually manage catalogues reliably without relying on distributed locks,
> regardless of whether the file system supports atomic operations or not. We
> have initially implemented our internal implementation in the object store
> catalog with good results.
>
> In addition to serving these customers and solving their problems, if a
> message queuing system like kafka wants to interface its tiered storage to
> iceberg, I think a file system based catalog would be their favourite
> thing. Because they already use files to manage metadata. I think the idea
> that the filesystem catalog must need a distributed lock is completely
> wrong.
>
> But in any case, if the community wishes to stop supporting
> FileSystemCatalog, I will respect the community's choice.
>
> I'm glad to hear from you.
>
> Regards
> lisoda
>
> At 2024-07-16 23:18:42, "Steven Wu" <stevenz...@gmail.com> wrote:
>
> Lisoda, HadoopCatalog has many issues for production usage like Dan said.
> It has never been recommended in production. It was widely used in unit
> test code, which is also slowly moving toward InMemoryCatalog. As the
> community is aligned behind the REST catalog, it is preferable to limit the
> work related to hadoop catalog.
>
> On Sun, Jul 14, 2024 at 11:44 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Again, it's my "vision": if the community wants to maintain and move
>> forward on HadoopCatalog, that's fine (not sure it would be a good
>> idea regarding the "limitations" of filesystem based catalog :)).
>>
>> Let's see what the others are thinking.
>>
>> Regards
>> JB
>>
>> On Mon, Jul 15, 2024 at 8:29 AM lisoda <lis...@yeah.net> wrote:
>> >
>> > Okay. I see......
>> > I'm so sad. :(
>> > But anyway, thanks for answering all my questions.
>> >
>> > At 2024-07-15 14:25:16, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> > >Hi
>> > >
>> > >HadoopCatalog is not a "recommended" catalog for production (at least
>> > >up to now). So, we should consider either moving it to a separate
>> > >repo (if we have the guarantee that it's gonna be maintained, else it
>> > >doesn't make sense) or removing it to avoid confusion. My take here is
>> > >the same (for several months :)): we should privilege the REST Catalog
>> > >API and users should use a REST Catalog server implementation.
>> > >
>> > >Regards
>> > >JB
>> > >
>> > >On Mon, Jul 15, 2024 at 8:13 AM lisoda <lis...@yeah.net> wrote:
>> > >>
>> > >> Sir. Even if the entire hadoopCatalog can be used without lockManager, should we delete it?
>> > >>
>> > >> At 2024-07-15 14:08:40, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>> > >> >Hi
>> > >> >
>> > >> >My understanding is that lock manager is mostly used on the
>> > >> >HadoopCatalog. The other catalogs rely on a third party lock
>> > >> >mechanism: for instance, JDBC Catalog uses the RDBMS table/row
>> > >> >locking, REST Catalog uses implementation lock.
>> > >> >I would rather remove HadoopCatalog and the lock manager in favor of
>> > >> >the REST catalog and implementation lock mechanism.
>> > >> >
>> > >> >Just my $0.01 :)
>> > >> >
>> > >> >Regards
>> > >> >JB
>> > >> >
>> > >> >On Fri, Jul 12, 2024 at 7:41 AM lisoda <lis...@yeah.net> wrote:
>> > >> >>
>> > >> >> Currently, the only lockManager implementation in iceberg-core is InMemoryLockManager. This PR extends two LockManager implementations, one based on Redis, and another based on the Rest-API.
>> > >> >> In general, redisLockManager is sufficient for most users to cope with most scenarios; where Redis cannot meet the user's requirements, we can let the user provide a RestApi service to achieve this function. I believe that, for a long time, these two lock managers will satisfy most of the customers' needs.
>> > >> >>
>> > >> >> If someone could review this PR, that would be great.
>> > >> >>
>> > >> >> PR: https://github.com/apache/iceberg/pull/10688
>> > >> >> SLACK: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720761992982729
>> >

--
Ryan Blue
Databricks