What about extending the issue templates?
Because of a growing problem with worthless LLM-generated issues, GitHub
MAY terminate any account doing this to our project.
[ ] I am a human being and am not creating AI-generated issues.
[ ] I accept that if I am posting AI-generated issues, my github ac
Knowing the footer offset would be really useful if passed down to whatever
is implementing the input stream, along with the actual file size.
This can be used for prefetching the footer, as well as caching it (Azure
ABFS, google GCS connectors): right now they guess that about 1MB is all
they nee
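
A rough sketch of how a stream implementation could use those two values; the function name, its parameters and the 1 MiB fallback constant are all hypothetical and not taken from any connector. Given the file size and, if the caller knows it, the footer offset, it picks the byte range to prefetch or cache, and falls back to the "guess the last ~1MB" approach when no offset is passed down.

```python
from typing import Optional, Tuple

# Assumption: mirrors the "about 1MB" guess described above.
DEFAULT_FOOTER_GUESS = 1024 * 1024


def footer_prefetch_range(
    file_size: int,
    footer_offset: Optional[int] = None,
    guess: int = DEFAULT_FOOTER_GUESS,
) -> Tuple[int, int]:
    """Return (start, length) of the byte range to prefetch or cache.

    Hypothetical helper: with a known footer offset the range is exact,
    otherwise the last `guess` bytes of the file are used.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if footer_offset is not None:
        if not 0 <= footer_offset <= file_size:
            raise ValueError("footer_offset is outside the file")
        start = footer_offset
    else:
        # No hint passed down: guess, clamped to the start of the file.
        start = max(0, file_size - guess)
    return start, file_size - start
```

With the offset known, only the footer bytes are fetched; without it, small files are read whole and large files get the fixed-size tail read.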
Are these people using S3 versioned buckets?
If so, until actually purged, they are just hiding under tombstone markers.
Our little cloud-storage support-call library, cloudstore, has something to
list and recover these:
https://github.com/steveloughran/cloudstore
https://github.com/steveloughran/clou
Are these issues being manually created?
Maybe add a new checkbox:
[ ] I am not participating in any AI training/experiment and if it turns
out that I am, I agree to compensate developers for the time wasted.
Or have something to specifically handle new posters, or at least
automatically flag th
Actually, there is a way for the catalog to return S3 objects without
granting access to the entire bucket: AWS presigning.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
This offers time-bounded access to an object.
catalog will need to generate and return the pres
If the data is stored in S3, then if someone has unrestricted access to a
single store containing all the data (the default without S3 Access Grants,
Cloudera Ranger extensions or some other access control mechanism to grant
access to clients without sharing credentials), then it's effectively
impossib
I am not expressing any opinion on the product whatsoever.
What I will note is that I have spent 8 weeks full time this year dealing
with AWS Java SDK problems in the more foundational parts of the SDK.
https://github.com/steveloughran/engineering-proposals/blob/trunk/refactoring-s3a.md#aws-sdk-v
There's a PR up from Amazon to add this to the s3a connector:
https://github.com/apache/hadoop/pull/7011
targeting a 3.4.2 release early next year, though they've not updated the
PR as requested yet.
1. It doesn't give you the same semantics as a POSIX create-no-overwrite
call: you only get t
> I think Parquet is a better place for the variant spec than Arrow.
Parquet is upstream of nearly every project (other than ORC)
log4j is that, but it doesn't mean that it is the right place.
What is key is: what does it mean for parquet to have a variant type in
there? Does it actually make se
congratulations all.
On Tue, 13 Aug 2024 at 21:25, Russell Spitzer wrote:
> Hi Y'all,
>
> It is my pleasure to let everyone know that the Iceberg PMC has voted to
> have several talented individuals join us.
>
> So without further ado, please welcome Péter Váry, Amogh Jahagirdar and
> Eduard Tud
On Tue, 13 Aug 2024 at 03:50, Xuanwo wrote:
> Hi, André
>
> Thanks a lot for starting this thread.
>
> List operations on storage services are expensive and slow. That's why
> Iceberg is designed to store metadata in files and avoid using list
> operations in FileIO. However, `orphan file removal
On Thu, 18 Jul 2024 at 00:02, Ryan Blue wrote:
> Hey everyone,
>
> There has been some recent discussion about improving
> HadoopTableOperations and the catalog based on those tables, but we've
> discouraged using file system only table (or "hadoop" tables) for years now
> because of major proble
A move to Java 11 means it is time to move to Hadoop 3.3.x as the minimum
release; anything 17+ means Hadoop 3.4.x, which before long will make
Java 11 its minimum version.
That is:
- cut the hadoop2 version/profile which is really java7.
- be prepared to move to 3.4.x if some java11/17 incompa