Regarding HadoopTableOptions, if the filesystem supports rename operations that
do not overwrite the target file, HadoopTableOptions does not need to use
LockManager at all. One of the reasons for keeping LockManager is simply that
it was used in the original implementation.
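To illustrate the point about non-overwriting renames, here is a minimal sketch of a lock-free commit on a local POSIX-style filesystem. It is not the HadoopTableOptions code: the function name `commit_version` and the `v<N>.metadata.json` naming are assumptions made for this example, and `os.link` stands in for a filesystem rename that refuses to replace an existing target.

```python
import os
import tempfile
import uuid


def commit_version(table_dir: str, version: int, payload: bytes) -> bool:
    """Try to publish metadata version `version`.

    Returns True if this writer won the commit, False if another writer
    already published the same version. (Hypothetical helper, for
    illustration only.)
    """
    target = os.path.join(table_dir, f"v{version}.metadata.json")
    temp = os.path.join(table_dir, f".{uuid.uuid4().hex}.tmp")

    # Write the full content to a private temp file first, so readers
    # never observe a partially written metadata file.
    with open(temp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    try:
        # Creating a hard link fails with FileExistsError if the target
        # already exists, so exactly one concurrent writer can succeed.
        # This plays the role of "rename without overwrite".
        os.link(temp, target)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed in either case.
        os.remove(temp)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(commit_version(d, 1, b"first writer"))
    print(commit_version(d, 1, b"second writer"))
```

The second call loses because version 1 already exists, and it does so without any lock manager. Whether a given filesystem or object store offers an equivalent atomic primitive is exactly the open question in this thread.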
Currently, there are several implementations of Catalog that need to use
LockManager in their code: HadoopCatalog, GlueCatalog.
For GlueCatalog, I have not done any evaluation, because I do not use it much.
For HadoopCatalog, it works fine without LockManager and guarantees atomicity
of commi
I also agree with moving the Hadoop-related parts to a separate
module. Incidentally, if the filesystem supports concurrency control and atomic
operations, wouldn't it be nice to implement an abstract filesystem-based
catalog? Let's say we can quickly build a production-ready filesystem-based
cat
At present, the file system based catalogues have the following problems (this
is what I can think of at the moment, perhaps not comprehensive).
1. does not support renaming operations
2. commit does not support atomicity
3. atomic delete (sorry, I don't understand why we need it? What scenarios nee
lisoda,
Unfortunately, I don't agree with your assessment. The problems with file
system based catalog implementations are inherent and steps taken to
address them are not adequate to have confidence in the implementation.
Commit atomicity is not solved as it relies on locking, which has a numbe
Hey folks,
The next Seattle area Apache Iceberg meetup will be on July 18th, 2024 from
5:00 PM to 8:00 PM. More information is available at
https://sites.google.com/view/icebergmeetup
Be sure to RSVP before the event!
Come for a night of networking and lively discussions. We also have
presentatio
Sir, the support for distributed locks is only added because some customers
wish to use the file system catalog in object storage. If I do not support the
demands of these users, then the entire file system catalog does not need to
rely on distributed locks at all.
What I would like to understan
Additionally, I can also cease adding any specific implementation of a lock
manager. As long as the file system catalog is used only for block storage file
systems, similar to HDFS, local storage file systems, etc., it can function
properly without the need for any distributed locks.
Rep
Hi,
I noticed the community service is on at a similar time each time, which
unfortunately is 2am my local time. Have you considered staggering the time of
this meetup so that people from other time zones can attend?
Kind Regards,
Justin
Sir, following this PR, we can modify HadoopTableOptions to support atomic
commits based on filesystem catalogues without distributed locks:
"core:Refactor the code of HadoopTableOptions" by BsoBird, Pull Request #10623,
apache/iceberg (github.com)
On 2024-07-15 00:08:26, "Daniel Weeks"
Hi Casel
I think Daniel is correct, and there is indeed no official document about
Aliyun OSS integration. I think we can write a doc for this.
There are some other related documents about Aliyun Iceberg integration (though
not exactly for your case, I think), if you are interested.
[1]
https://www.
Hi
I agree with Ryan and it was my comment in a previous message:
"About FileIO, we can always extend it, but as it's used in different
Iceberg layers (like ResolvedFileIO for instance), we have to be
careful adding new operations here, especially if it's specific for
HadoopCatalog table/view ope
Hi
My understanding is that the lock manager is mostly used by the
HadoopCatalog. The other catalogs rely on a third-party lock
mechanism: for instance, JDBC Catalog uses the RDBMS table/row
locking, REST Catalog uses the implementation's lock.
I would rather remove HadoopCatalog and the lock manager in f
Sir, even if the entire HadoopCatalog can be used without LockManager, should
we still delete it?
On 2024-07-15 14:08:40, "Jean-Baptiste Onofré" wrote:
>Hi
>
>My understanding is that lock manager is mostly used on the
>HadoopCatalog. The other catalogs relays on a third party lock
>mechanism: fo
Hi
HadoopCatalog is not a "recommended" catalog for production (at least
up to now). So, we should consider either moving it to a separate
repo (if we have the guarantee that it's going to be maintained, else it
doesn't make sense) or removing it to avoid confusion. My take here is
the same (for sever
Okay. I see..
I'm so sad. :(
But anyway, thanks for answering all my questions.
On 2024-07-15 14:25:16, "Jean-Baptiste Onofré" wrote:
>Hi
>
>HadoopCatalog is not a "recommended" catalog for production (at least
>up to now). So, we should consider either to move it in a separate
>repo (i
Hi Micah
We agreed that any change to the spec is considered a code modification
and is submitted to a vote. I see your change is mostly
"documentation" and not actually a big change, so I get your point.
However, I think it's clearer to keep the same process even for
"small" changes.
I would recomme
Again, it's my "vision": if the community wants to maintain and move
forward on HadoopCatalog, that's fine (not sure it would be a good
idea regarding the "limitations" of filesystem based catalog :)).
Let's see what the others are thinking.
Regards
JB
On Mon, Jul 15, 2024 at 8:29 AM lisoda wro