from:"Steve Loughran"

Re: [Discuss] Analytics Accelerator Library for Amazon S3 as default S3 Input Stream

2025-07-31 Thread Steve Loughran

On Fri, 25 Jul 2025 at 17:28, Kevin Liu wrote: *> I think it would be great to also make these improvements available to older Iceberg clients.* Use the S3A connector and turn on vector reads through parquet and you currently get the same performance, about a 30% speedup in TPC benchmarks (I k

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread Steve Loughran

so this'll cut down on the # of manifest files read, won't it? So improving query planning. Does anyone have an estimate of what benefit this is likely to have in production deployments? On Thu, 29 May 2025 at 21:25, Ryan Blue wrote: > Hi everyone, > > Like Russell’s recent note, I’m starting a threa

Re: [Discuss] Apache Iceberg 1.9.0 release

2025-03-17 Thread Steve Loughran

Can I get this reviewed and merged? It gives all Hadoop filesystems with bulk delete calls the ability to issue bulk deletes up to their page sizes; off by default. Tested all the way through Iceberg to AWS S3 London. https://github.com/apache/iceberg/pull/10233 On Mon, 17 Mar 2025 at 12:32, Yuya
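
A minimal sketch of what "bulk deletes up to their page sizes" means in practice: the list of files to delete is split into pages no larger than the store's page size, and one bulk call is issued per page. For S3, a single DeleteObjects request accepts at most 1,000 keys. The function `delete_in_pages` and the `bulk_delete` callable are hypothetical stand-ins, not the API added in the PR above.

```python
from typing import Callable, List, Sequence


def delete_in_pages(
    paths: Sequence[str],
    page_size: int,
    bulk_delete: Callable[[List[str]], List[str]],
) -> List[str]:
    """Delete paths in pages of at most page_size entries.

    bulk_delete is a hypothetical callable standing in for a store's bulk
    delete operation; it takes one page of paths and returns the paths in
    that page which could not be deleted.
    """
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    failures: List[str] = []
    # One bulk request per page, so no request exceeds the store's limit.
    for start in range(0, len(paths), page_size):
        page = list(paths[start:start + page_size])
        failures.extend(bulk_delete(page))
    return failures
```

With 2,501 paths and a page size of 1,000, this issues three requests of 1,000, 1,000 and 501 paths, instead of 2,501 single-object deletes.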

Re: Very strange (AI generated) issues

2025-01-31 Thread Steve Loughran

What about extending the issue templates? Because of a growing problem with worthless LLM-generated issues, GitHub MAY terminate any account doing this to our project [ ] I am a human being and am not creating AI generated issues. [ ] I accept that if I am posting AI-generated issues, my GitHub ac

Re: Proposal: Parquet footer size in Iceberg metadata

2025-01-30 Thread Steve Loughran

Knowing the footer offset would be really useful if passed down to whatever is implementing the input stream, along with the actual file size. This can be used for prefetching the footer, as well as caching it (Azure ABFS, Google GCS connectors): right now they guess that about 1MB is all they nee
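
A minimal sketch of why the footer size plus the file size is enough: a Parquet file ends with the serialized footer, then a 4-byte little-endian footer length, then the 4-byte magic `PAR1`. Given both sizes, a reader can compute the exact byte range to fetch in one request rather than guessing. The function names here are hypothetical, and the sketch assumes the footer length is already known (for example, from table metadata as proposed).

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian footer length + 4-byte "PAR1" magic


def footer_range(file_size: int, footer_len: int) -> Tuple[int, int]:
    """Return (offset, length) covering the footer plus the 8-byte tail."""
    length = footer_len + TAIL_LEN
    if length > file_size:
        raise ValueError("footer longer than file")
    return file_size - length, length


def parse_tail(tail: bytes) -> int:
    """Decode the footer length from the last 8 bytes of a Parquet file."""
    if len(tail) != TAIL_LEN or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file tail")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len
```

Without the footer length, a reader must either read the last 8 bytes first and then issue a second request, or speculatively read a larger tail (the ~1MB guess) and hope it covers the footer.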

Re: missing files in an Iceberg table

2025-01-30 Thread Steve Loughran

Are these people using S3 versioned buckets? If so, until actually purged, they are just hiding under tombstone markers. Our little cloud-storage support-call library, cloudstore, has something to list and recover these https://github.com/steveloughran/cloudstore https://github.com/steveloughran/clou

Re: Very strange (AI generated) issues

2025-01-29 Thread Steve Loughran

Are these issues being manually created? Maybe add a new checkbox: [ ] I am not participating in any AI training/experiment and if it turns out that I am, I agree to compensate developers for the time wasted. Or have something to specifically handle new posters, or at least automatically flag th

Re: There is no easy way to secure Iceberg data. How can we improve?

2025-01-03 Thread Steve Loughran

Actually, there is a way for the catalog to return S3 objects without granting access to the entire bucket: AWS presigning: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html This offers time-bounded access to an object. The catalog will need to generate and return the pres

Re: There is no easy way to secure Iceberg data. How can we improve?

2025-01-02 Thread Steve Loughran

If the data is stored in S3 and someone has unrestricted access to a single store containing all the data (the default without S3 Access Grants, Cloudera Ranger extensions or some other access control mechanism that grants access to clients without sharing credentials), then it's effectively impossib

Re: Storing catalog directly on object store

2024-12-06 Thread Steve Loughran

I am not expressing any opinion on the product whatsoever. What I will note is that I have spent 8 weeks full time this year dealing with AWS Java SDK problems in the more foundational parts of the SDK. https://github.com/steveloughran/engineering-proposals/blob/trunk/refactoring-s3a.md#aws-sdk-v

Re: Storing catalog directly on object store

2024-11-27 Thread Steve Loughran

There's a PR up from Amazon to add this to the S3A connector https://github.com/apache/hadoop/pull/7011 targeting a 3.4.2 release early next year, though they've not updated the PR as requested yet. 1. It doesn't give you the same semantics as a POSIX create-no-overwrite call; you only get t

Re: [DISCUSS] Variant Spec Location

2024-08-28 Thread Steve Loughran

> I think Parquet is a better place for the variant spec than Arrow. Parquet is upstream of nearly every project (other than ORC) log4j is that, but it doesn't mean that it is the right place. What is key is: what does it mean for Parquet to have a variant type in there? Does it actually make se

Re: Welcome Péter, Amogh and Eduard to the Apache Iceberg PMC

2024-08-14 Thread Steve Loughran

congratulations all. On Tue, 13 Aug 2024 at 21:25, Russell Spitzer wrote: > Hi Y'all, > > It is my pleasure to let everyone know that the Iceberg PMC has voted to > have several talented individuals join us. > > So without further ado, please welcome Péter Váry, Amogh Jahagirdar and > Eduard Tud

Re: [DISCUSS] Filesystem in PyIceberg

2024-08-13 Thread Steve Loughran

On Tue, 13 Aug 2024 at 03:50, Xuanwo wrote: > Hi, André > > Thanks a lot for starting this thread. > > List operations on storage services are expensive and slow. That's why > Iceberg is designed to store metadata in files and avoid using list > operations in FileIO. However, `orphan file removal

Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0

2024-07-30 Thread Steve Loughran

On Thu, 18 Jul 2024 at 00:02, Ryan Blue wrote: > Hey everyone, > > There has been some recent discussion about improving > HadoopTableOperations and the catalog based on those tables, but we've > discouraged using file system only table (or "hadoop" tables) for years now > because of major proble

Re: Building with JDK 21

2024-07-11 Thread Steve Loughran

A move to Java 11 means it is time to move to Hadoop 3.3.x as the minimum release; anything 17+ means Hadoop 3.4.x, which before long will make Java 11 its minimum version. That is: - cut the hadoop2 version/profile, which is really Java 7. - be prepared to move to 3.4.x if some Java 11/17 incompa


16 matches
