I am not expressing any opinion on the product whatsoever.
What I will note is that I have spent 8 weeks full time this year dealing
with AWS Java SDK problems in the more foundational parts of the SDK.
https://github.com/steveloughran/engineering-proposals/blob/trunk/refactoring-s3a.md#aws-sdk-v
> - Whether we should build S3 Tables catalog support similar to what we do for
> AWS Glue.
Yes, happy to have someone start that discussion separately, if it makes sense
to do so. Amazon has already provided such a catalog implementation in
a separate Apache 2.0-licensed project called Amazon S3
I second Ryan’s opinion that a production-grade catalog is a much broader
concept than just CAS-ing the pointer.
What we observe in practice at our company is that users want to work with
large schemas (sometimes with literally thousands of schemas and millions of
tables), have support for common DDL o
Hi, Nikhil
Thank you very much for bringing the S3 Tables discussion here.
However, I would like to point out that S3 Tables is not the same concept we
are discussing here. It is not an object-storage-based catalog; instead, it is
a stateful service that provides dedicated APIs. It’s better to
> And I'm also looking forward to what Jack is alluding to.
AWS just announced *native* S3 support for Iceberg buckets! [0] This is
almost surely what Jack was alluding to.
This is very cool. It's a much deeper integration than I was expecting but
nonetheless one that fully satisfies my use case
> We deprecated this recently and we don't have to deprecate it if object
> stores support atomic operations like this.
I disagree because this misses many of the reasons for deprecation. It
isn't just that S3 didn't support a `putIfAbsent` operation. Other object
stores did and there are still seve
There's a PR up from Amazon to add this to the s3a connector:
https://github.com/apache/hadoop/pull/7011
targeting a 3.4.2 release early next year, though they've not updated the
PR as requested yet.
1. It doesn't give you the same semantics as a POSIX create-no-overwrite
call: you only get t
Ignore the last email, just re-read the proposal earlier in the email chain
On Wed, Nov 27, 2024 at 11:37 AM Alex Merced wrote:
> This is just a quick thought to put out there: If there will be a new
> reimagining of a file system catalog, would it be worth adding a
> multi-table layer on top?
>
This is just a quick thought to put out there: If there will be a new
reimagining of a file system catalog, would it be worth adding a
multi-table layer on top?
*As a rough example:*
- At the TOP is a JSON file that is just a mapping of the table name to the
directory where VERSION-HINT would be
I think one major issue with current HadoopCatalog is that there's no way
to manage tables by name. If adding one metadata layer on top of it, we
need to handle more consistency challenges.
Manu
On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab wrote:
> Hi All,
>
> Xuanwo, I recall the reasoning aga
Hi All,
Xuanwo, I recall the reasoning against HadoopCatalog was the other way
around: even though it is safe to use on HDFS, it is unsafe on object
storage. I believe that this functionality gap in object stores seems
to be going away, so for me HadoopCatalog would make even more sense now than
be
Hi
I believe we still need to deprecate HadoopCatalog since the operation is still
not safe on Hadoop. As raised by Jack Ye before, I suggest we consider having a
StorageCatalog or ObjectStorageCatalog that can only be used with storage
services supporting conditional writes. That would be a go
Makes sense! I'd be eager to chat more about this, but I'm afraid I won't be at
re:Invent. Maybe we can plan to circle back after re:Invent, once we see what
AWS announces?
On Tue, Nov 26, 2024 at 2:58 PM Jean-Baptiste Onofré wrote:
>
> Hi Nikhil
>
> Thanks for your message, very interesting.
>
> I th
Indeed, I got pointed at that feature on Bluesky earlier today [0]. I dredged up
the mailing list discussion that occurred around its deprecation, and this exact
point actually came up. There was some concern from Ryan that the complexity of
keeping the file system tables around just wasn't worth i
Interesting.
We already have file system tables [1] in Iceberg (HadoopCatalog implements
this spec).
We deprecated this recently and we don't have to deprecate it if object
stores support atomic operations like this.
[1] https://iceberg.apache.org/spec/#file-system-tables
- Ajantha
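
For readers following the thread, here is a minimal sketch of how a file-system-table commit could rely on an atomic put-if-absent instead of an atomic rename. It assumes the `v<V>.metadata.json` naming from the spec section linked above. The `InMemoryObjectStore` class, its `put_if_absent` method, and the `commit` helper are hypothetical names invented for illustration; they are not Iceberg or object-store APIs, and a real store would expose the primitive as a conditional write.

```python
import json
import threading


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with an atomic put-if-absent."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, data):
        # Atomically create the object; report failure if the key already exists.
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = data
            return True

    def keys(self):
        with self._lock:
            return list(self._objects)


def metadata_key(table_location, version):
    # Well-known file name for metadata version V (assumed naming scheme).
    return f"{table_location}/metadata/v{version}.metadata.json"


def current_version(store, table_location):
    # Highest committed version, or 0 if the table has no metadata yet.
    prefix = f"{table_location}/metadata/v"
    suffix = ".metadata.json"
    versions = [
        int(key[len(prefix):-len(suffix)])
        for key in store.keys()
        if key.startswith(prefix) and key.endswith(suffix)
    ]
    return max(versions, default=0)


def commit(store, table_location, base_version, metadata):
    # The commit succeeds only if nobody else has created version V+1 first.
    key = metadata_key(table_location, base_version + 1)
    return store.put_if_absent(key, json.dumps(metadata).encode("utf-8"))
```

Two writers that both start from version 0 would race on `v1.metadata.json`; only one `commit` call returns `True`, and the loser has to re-read the table and retry. This covers only the single-table pointer swap, not the wider catalog concerns raised elsewhere in the thread.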
On Wed, Nov
Ah, fascinating. Thanks very much for the pointer.
Here's the thread introducing the proposal [0], for anyone else curious.
[0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré wrote:
>
> Hi Vignesh
>
> Thanks for the reminde
Hi Vignesh
Thanks for the reminder, I remember we quickly discussed this during a
community meeting.
I will take a new look at the doc.
Regards
JB
On Tue, Nov 26, 2024 at 9:19 PM Vignesh wrote:
>
> Hi,
> There was a proposal along the same lines, for the read portion few weeks
> back by Ashvi
Hi,
There was a proposal along the same lines, for the read portion, a few weeks
back by Ashvin.
https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
Thanks,
Vignesh.
On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré wrote:
> Hi Nikhil
>
> Thanks for
Hi Nikhil
Thanks for your message, very interesting.
I think it would be great to involve the Polaris project here as well,
as a REST Catalog implementation.
The Polaris community is discussing storage/backend right now, so it
would be perfect timing to consider leveraging S3 conditional
writ
Talk about tenterhooks! But okay, I take your hint. :)
On Tue, Nov 26, 2024 at 2:17 PM Jack Ye wrote:
>
> Hi Nikhil,
>
> I am also personally very excited about S3 adding this support!
>
> I would suggest we discuss this after the AWS re:invent 2024 event that is
> coming right next week, as the
Hi Nikhil,
I am also personally very excited about S3 adding this support!
I would suggest we discuss this after the AWS re:Invent 2024 event that is
coming up next week, as there are going to be more S3 feature
announcements during that week, and the community can have a more
comprehensive di