Hello everyone,
I'm pleased to announce the release of Apache Iceberg Go v0.3.0!
Apache Iceberg is an open table format for huge analytic datasets.
Iceberg delivers high query performance for tables with tens of
petabytes of data, along with atomic commits, concurrent writes, and
SQL-compatible t
I agree with Russell here. The goal is to clarify how to run a meetup that
meets our requirements, rather than approving them individually. I like
Max's addition to make anyone starting one aware of the brand guidelines.
I also like Danica's suggestions so that we state that we expect meetups to
g
> whether Spark 3.5 can perform some basic queries or provide file merging
capabilities in the current or next version of V3?
ConradJam, that should have worked for a while now. I don't think we have
the exact Spark smoke test coverage as you described. But I can see that
Spark "testRemoveDanglin
Sorry, I didn't come back to this after I initially read it. I think it's
fine to make this change because we can definitely have identity transform
partition fields that don't match after a rename. If I remember correctly,
the reason for not making this public was just to ensure partition field
na
+1
It is good to have consistency within the RowDelta API. And I think it is a
good idea in general to use "remove" to refer to removing a file from
metadata, rather than "delete" because you can add or remove delete files.
On Thu, May 29, 2025 at 11:46 AM Russell Spitzer
wrote:
> Ryan pointed
My vote is +1 (non-binding)
Thank you everyone!
I'll close the vote now as it passes with 4 non-binding +1 votes, 3
binding +1 votes, and no 0 or -1 votes.
Binding votes:
Fokko
Eduard
Amogh
Non-binding votes:
Kevin
Leon
JB
Matt
I'll start the release steps now. Thanks again!
--Matt
On Thu, M
I received feedback from Alkis regarding their Parquet optimization work.
Their internal testing shows promising results for reducing metadata size
and improving parsing performance. They plan to formalize a proposal for
these Parquet enhancements in the near future.
Meanwhile, I'm putting togethe
Hi Peter, thanks for sharing the context around the Flink streaming use
case and the side note on concurrent writes. Apologies for the delay; I just
got back from vacation. Yeah, I agree, having the index at the partition
level is a better approach if we plan to use caching. As a distributed
cache
Hi everyone,
Like Russell’s recent note, I’m starting a thread to connect those of us
that are interested in the idea of changing Iceberg’s metadata in v4 so
that in most cases committing a change only requires writing one additional
metadata file.
*Idea: One-file commits*
The current Iceberg me
Thanks for kicking this thread off, Ryan. I'm interested in helping out
here! I've been working on a proposal in this area and it would be great to
collaborate with different folks and exchange ideas here, since I think a
lot of people are interested in solving this problem.
Thanks,
Amogh Jahagirdar
Ryan pointed out to me that when I added the "deleteFile" method I was not
following the convention already used within the RowDelta operation, and
instead had copied the OverwriteFiles API. To fix this, I think it would be
great to change the API to "removeRows" to match the other APIs in the c
We definitely should not abdicate our responsibilities to the trademark; I
just want to shift away from the pre-clearance model we have used so far.
I know I always try to help folks out if I see something which I think may
be inappropriate
On Thu, May 29, 2025 at 7:50 AM Rich Bowen wrote:
I'm not seeing any strong feelings on this so I'm going to go ahead and
merge. If anyone else sees issues we can always address this in a follow up.
On Wed, May 21, 2025 at 6:07 PM Steven Wu wrote:
> It seems that the PR has made two valid arguments to support the change of
> public scope:
> * ide
On 2025/05/23 19:24:28 Russell Spitzer wrote:
> Hey Y'all
>
> Basically I would like to get the PMC out of the meetup approval business
...
> Please let me know what you think,
(Board hat)
A critical role of a PMC is being stewards of the project's brands/marks. So
while it's not necessary
+1 (binding)
Checked signatures, checksums, and licenses, and did some checks against
the REST catalog.
Thanks Kevin for running this release!
Kind regards,
Fokko
On Wed, May 28, 2025 at 12:39 PM Renjie Liu wrote:
> Hi:
>
> We still need two PMC votes for this release!
>
> Please help to test and
Hi Y'all
As discussed in the last community sync, we are beginning to gather up
folks who are interested in various efforts for Iceberg V4. To that end,
I'd like to use this thread as a gathering point for folks interested in
the metadata file format shift to Parquet. I wrote a quick abstract to
d
I also noticed that some of the tables shared by v2 and v3 didn't
mention v3. I've updated the headers to include v3 for clarity.
Please let me know if this change requires a separate vote thread:
https://github.com/apache/iceberg/pull/13181
- Ajantha
On Wed, May 28, 2025 at 10:27 PM Ajantha Bhat
I would like to know whether Spark 3.5 can perform some basic queries or
provide file merging capabilities in the current or next version of V3?
On Thu, May 29, 2025 at 06:19 Steven Wu wrote:
> JB, please set the milestone to the multi-arg transform PRs/issues.
>
> Peter/Max, ack on the changes targeted for 1
I’m also super excited about this idea
On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
> Thanks for kicking this thread off Ryan, I'm interested in helping out
> here! I've been working on a proposal in this area and it would be great to
> collaborate with different fol
Fewer commit conflicts, meaning the tables representing column families are
updated independently, rather than having to serialize commits to a single
table. Perhaps with a wide table solution the commit logic could be enhanced to
support things like concurrent overwrites to independent column fa
This will be great for users. Metadata can self-adapt: start with a single
compacted file, and as the table grows in size, the metadata can adapt to a
tree or linked structure.
On Thu, May 29, 2025 at 3:44 PM Russell Spitzer
wrote:
> I’m also super excited about this idea
>
> On Thu, May 29, 2025 at 3:
>
> BTW, does it make sense to take metadata json file into consideration as
> well? Currently it is just a large json string containing all snapshots.
> Since it is also on the critical path of a commit, I'm not sure if we can
> explore incremental semantics on it together with manifest list files
Count me in!
Do we plan to store these files in columnar format as well?
On Fri, May 30, 2025, 04:00 Prashant Singh wrote:
> I am also super excited about the idea! I would love to contribute.
>
> On Thu, May 29, 2025 at 6:54 PM Yufei Gu wrote:
>
>> BTW, does it make sense to take metadata json
IMO, the main drawback for the view solution is the complexity of
maintaining consistency across tables if we want to use features like time
travel, incremental scan, branch & tag, encryption, etc.
On Fri, May 30, 2025 at 12:55 PM Bryan Keller wrote:
> Fewer commit conflicts meaning the tables r
I look forward to when Iceberg can move on a bit from its name and handle
slightly faster data. I'm interested in following along as well, if I can!
Do we plan to store these files in columnar format as well?
>
Is that the other thread?
https://lists.apache.org/thread/phdo75zmt8j9r44ngd7vdhtxqq63yxsp
Tha
Thank you so much for driving this release!
It will be really helpful in getting this critical table corruption bug fix
out to Iceberg users: https://github.com/apache/iceberg/pull/12818 (Merged)
Best,
Prashant Singh
On Thu, May 29, 2025 at 1:02 PM Steven Wu wrote:
> > whether Spark 3.5 can p
Bryan, interesting approach to split horizontally across multiple tables.
A few potential downsides:
* Operational overhead: the tables need to be managed consistently and
probably in some coordinated way
* Complex reads
* Correctness (during the join) may be fragile to enforce. It is robust to
enforce the
Hi everyone,
We have been investigating a wide table format internally for a similar use
case, i.e. we have wide ML tables with features generated by different
pipelines and teams but want a unified view of the data. We are comparing that
against separate tables joined together using a shuffle-
I am also super excited about the idea! I would love to contribute.
On Thu, May 29, 2025 at 6:54 PM Yufei Gu wrote:
> BTW, does it make sense to take metadata json file into consideration as
>> well? Currently it is just a large json string containing all snapshots.
>> Since it is also on the c
This is a long-awaited discussion!
BTW, does it make sense to take the metadata JSON file into consideration as
well? Currently it is just a large JSON string containing all snapshots.
Since it is also on the critical path of a commit, I'm not sure if we can
explore incremental semantics on it togethe
I am interested in working on this proposal.
I would assume it is to use `InternalData` with the format as
`parquet`. But the challenge will be the test cases: the core module cannot
write the Parquet metadata due to a circular dependency. We need to abstract
out the test cases in the core module and
I am interested in these problems too. Looking forward to collaborating on
this feature development.
- Ajantha
On Fri, May 30, 2025 at 7:07 AM Gang Wu wrote:
> This is a long-awaited discussion!
>
> BTW, does it make sense to take metadata json file into consideration as
> well? Currently it is