s) but for other additions of things like puffin it seemed like
> there was an official vote.
>
>
>
> Should the spec be considered finalized as a V1 version now? Was there a
> vote held? Will there be one? Are there any blockers to adoption?
>
>
>
> Thanks,
>
> Micah
>
--
Ryan Blue
Tabular
tion in. Some of the discussions/feedback on
>>> the API PR slightly changed the API from the initial proposed API that
>>> probably more closely resembled Netflix's implementation. Getting an
>>> implementation going on the finalized APIs could give some good feedback
internal
>>> platform
>>>
>>> · ran a few manual steps in Spark 3.3
>>>
>>>
>>>
>>> Just FYI that the release notes will usually be available once voting on
>>> the RC passed and artifacts are publicly available.
>>>
>>>
>>>
>>> Thanks
>>>
>>> Eduard
>>>
>>>
>>>
>>> On Tue, Mar 14, 2023 at 5:19 AM Jack Ye wrote:
>>>
>>> Hi Everyone,
>>>
>>> I propose that we release the following RC as the official Apache
>>> Iceberg 1.2.0 release.
>>>
>>> The commit ID is e340ad5be04e902398c576f431810c3dfa4fe717
>>> * This corresponds to the tag: apache-iceberg-1.2.0-rc1
>>> * https://github.com/apache/iceberg/commits/apache-iceberg-1.2.0-rc1
>>> *
>>> https://github.com/apache/iceberg/tree/e340ad5be04e902398c576f431810c3dfa4fe717
>>>
>>> The release tarball, signature, and checksums are here:
>>> *
>>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.2.0-rc1
>>>
>>> You can find the KEYS file here:
>>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>>
>>> Convenience binary artifacts are staged on Nexus. The Maven repository
>>> URL is:
>>> *
>>> https://repository.apache.org/content/repositories/orgapacheiceberg-1121/
>>>
>>> Please download, verify, and test.
>>>
>>> Please vote in the next 72 hours.
>>>
>>> [ ] +1 Release this as Apache Iceberg 1.2.0
>>> [ ] +0
>>> [ ] -1 Do not release this because...
>>>
>>
--
Ryan Blue
Tabular
/ Solicitation / Recruiting
>
> The Apache Iceberg community is a space for everyone to operate free of
> influence. The development lists, slack workspace, and github should not
> be used to market products or services. Solicitation or overt promotion
> should not be performed in common channels or through direct messages.
>
> For questions regarding any of the guidelines above, please contact a PMC
> member
>
>
--
Ryan Blue
Tabular
>> This release can be downloaded from:
>> https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.2.0/apache-iceberg-1.2.0.tar.gz
>>
>> Java artifacts are available from Maven Central.
>>
>> Thanks to everyone for contributing!
>>
>> Best,
>> Jack Ye
>>
>
--
Ryan Blue
Tabular
a/browse/INFRA-24400 to change our behavior
>> back to the old standard.
>>
>> I'd like to make sure folks are generally in favor of changing the
>> default back, please respond to this thread if you are in support of
>> going back to "Only requires approval first time" or if you don't believe
>> this is a good idea please respond as well.
>>
>>
>> Thanks for your time,
>> Russ
>
>
--
Ryan Blue
Tabular
>
>
>
> I've created a milestone <https://github.com/apache/iceberg/milestone/31> to
> track issues related to the patch release.
>
>
>
> If there are any issues that have been identified that should be included,
> please reply to this email so that we can discuss their inclusion.
>
>
>
> If anyone is interested in volunteering to be release manager, let me know.
>
>
> Thanks,
>
> Dan
>
--
Ryan Blue
Tabular
obably leave it
out, but if people are interested I wouldn't oppose having it go into 1.2.1.
On Mon, Apr 3, 2023 at 10:32 AM Ryan Blue wrote:
> +1 for including #7273. Thanks for pointing that out, Amogh!
>
> I also agree with Dan about the changes in #7153. That isn't fixing a
ums are here:
>>>> *
>>>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.2.1-rc2
>>>>
>>>> You can find the KEYS file here:
>>>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>>>
>>>> Convenience binary artifacts are staged on Nexus. The Maven repository
>>>> URL is:
>>>> *
>>>> https://repository.apache.org/content/repositories/orgapacheiceberg-1131/
>>>>
>>>> Please download, verify, and test.
>>>>
>>>> Please vote in the next 72 hours.
>>>>
>>>> [ ] +1 Release this as Apache Iceberg 1.2.1
>>>> [ ] +0
>>>> [ ] -1 Do not release this because...
>>>>
>>>
--
Ryan Blue
Tabular
>>>
>>> Here are three possible dates for the Sync:
>>>
>>> 1. 18.04.23 16:00 UTC
>>>
>>> 2. 19.04.23 16:00 UTC
>>>
>>> 3. 20.04.23 16:00 UTC
>>>
>>> For those who want to join the meeting, it would be great if you could
>>> answer this email with the dates that you are available. I will then
>>> create an online meeting for the date where most people can join.
>>>
>>> I'm looking forward to talking to you.
>>>
>>> Best wishes,
>>>
>>> Jan Kaul
>>>
>>>
--
Ryan Blue
Tabular
remains healthy, with a reasonable increase in both opened and
closed pull requests, as well as a steady number of unique contributors. The
Python implementation has been bringing a lot of new contributors.
Iceberg was featured in 14 talks at Subsurface, as well as in a panel.
--
Ryan Blue
Hi everyone!
I want to congratulate 3 new PMC members, Fokko Driesprong, Steven Wu, and
Yufei Gu. Thanks for all your contributions!
I was going to wait a little longer to announce, but since they're in our
board report it's already out.
Ryan
--
Ryan Blue
Tabular
also looks like the 2.4
>> branch on the Spark Github repository is stale, so I don't expect any
>> further releases.
>>
>> Before creating a PR I would like to check on the mail-list if anyone has
>> any objections. If so, please let us know.
>>
>> Thanks,
>> Fokko Driesprong
>>
>
--
Ryan Blue
Tabular
1 for dropping Spark 2.4 support and we can clean up doc as well such
>>> as https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>>
>>> Thanks,
>>> Steve Zhang
>>>
>>>
>>>
>>> On Apr 13, 2023, at 12:53 PM, Jack Ye wrote:
>>>
>>> +1 for dropping 2.4 support
>>>
>>>
>>>
>>
>> --
>> Edgar R
>> Data Warehouse Infrastructure
>>
>
--
Ryan Blue
Tabular
> What does everybody think about Spark 3.1 support after we add Spark 3.4
> support? Our initial plan was to release jars for the last 3 versions. Are
> there any blockers for dropping 3.1?
>
> - Anton
--
Ryan Blue
Tabular
park 3.5 Hadoop 2 will be
> dropped <https://lists.apache.org/thread/vr6bx2bmkgo4mhdspjm9g29h2c3lmrrz>.
> I'll create a PR for removing Spark 2.4 shortly because I see a consensus
> for removing that.
>
> Kind regards,
> Fokko
>
> On Wed, Apr 19, 2023 at 19:02, Anto
LinkedIn is still on Spark 3.1. I am guessing a number of other companies
>> could be in the same boat. I feel the argument for Spark 2.4 is different
>> from that of Spark 3.1 and it would be great if we can continue to support
>> 3.1 for some time.
>>
>> On Wed, Apr
of expectations our users should have. Do we promise that all bug fixes
>> discovered in newer Spark versions will be cherry-picked to all older Spark
>> versions? I am not sure that’s true at this point.
>>
>> - Anton
>>
>>
>> On Apr 21, 2023, at 10:29 AM, Ry
Would we also drop support for JDK 8?
On Fri, Apr 21, 2023 at 4:58 PM Anton Okolnychyi
wrote:
> Following up on the discussion in the Spark 2.4 thread, shall we move to
> JDK 11 for releases as Spark 2.4 support has been dropped?
>
> - Anton
--
Ryan Blue
Tabular
> Sorry, I wasn’t clear that I also imply dropping JDK 8 (unless there is a
> good reason to keep it?).
>
> - Anton
>
> On Apr 21, 2023, at 4:59 PM, Ryan Blue wrote:
>
> Would we also drop support for JDK 8?
>
> On Fri, Apr 21, 2023 at 4:58 PM Anton Okolnychyi <
> a
nly
> to provide folks a place to collaborate, without requiring authors to
> cherry-pick all applicable changes, like we agreed initially.
>
> - Anton
>
> On Apr 21, 2023, at 3:58 PM, Ryan Blue wrote:
>
> Good question about backports. Walaa and Edgar, are you backporting f
Maven, 1 for JDK8 and 1
> for JDK11?
>
> Jack
>
> On Fri, Apr 21, 2023 at 5:17 PM Ryan Blue wrote:
>
>> Looks like Hive isn't quite done migrating to Java 11:
>> https://issues.apache.org/jira/browse/HIVE-22415
>>
>> I'm not sure whether that
rying to change what was agreed before. Just for my understanding. Let's
>> say the latest Spark version is 3.3. Today, we don't require any backport
>> to 3.2 and 3.1, correct?
>>
>> On Fri, Apr 21, 2023 at 5:19 PM Ryan Blue wrote:
>>
>>> I still agree w
grade to Iceberg 1.2+ if 3.1
> support is still available although deprecated.
>
>
--
Ryan Blue
Tabular
g table? In Hive, what we
> normally do is to run a query "create table x like y location z". But we
> can't do this for the Iceberg table.
>
> If this is a feature that is missing, should we collaborate to build a
> similar feature?
>
> Thanks
>
>
>
--
Ryan Blue
Tabular
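A workaround that often comes up for Spark (a sketch, not from this thread; table and location names are placeholders): an empty CTAS copies the schema of an existing Iceberg table, though not its partition spec or table properties.

    CREATE TABLE db.x
    USING iceberg
    LOCATION 'hdfs://namenode/path/z'
    AS SELECT * FROM db.y WHERE 1 = 0;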
>>> that much point.
>>>
>>> On the Hive front, as you can see from that ticket it's been open for
>>> 4(!) years and hasn't received much action recently. I think it's one of
>>> the reasons AWS EMR still defaults to Java 8. It would be reall
, but I didn't see any queries that compose the two.
>
>
>
> Is my understanding correct that there isn't existing SQL for time travel
> on a specific branch (I assume in this case users need to query the
> underlying Iceberg metadata to determine a snapshot of interest)?
>
>
>
> Thanks,
>
> Micah
>
>
>
> [1] https://iceberg.apache.org/docs/latest/spark-queries/
>
--
Ryan Blue
Tabular
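For context, a sketch of what Spark SQL does support separately as of roughly Iceberg 1.2/1.3 (assuming the Iceberg runtime and a configured catalog; the branch name, timestamp, and snapshot id below are placeholders): branch reads and time travel each work on their own, and the refs and snapshots metadata tables can be used to look up a snapshot on a branch for an explicit VERSION AS OF.

    -- read the current state of a branch
    SELECT * FROM db.tbl VERSION AS OF 'audit';
    SELECT * FROM db.tbl.branch_audit;

    -- time travel on the main branch
    SELECT * FROM db.tbl TIMESTAMP AS OF '2023-04-01 00:00:00';

    -- find the snapshot a branch points to, then travel to it explicitly
    SELECT snapshot_id FROM db.tbl.refs WHERE name = 'audit';
    SELECT * FROM db.tbl VERSION AS OF 1234567890123456789;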
we are interested in is SparkSQL. Since you mentioned it
> is an easy fix, would you please share how that should be implemented such
> that anyone (maybe myself) interested in this can explore the solution?
>
> Thanks both again.
>
> On Tue, Apr 25, 2023 at 4:07 PM Ryan Blue wrote:
bably not worth the effort.
>
> Do you think it pays to add a note for implementers in the specification
> that the "snapshot-log" (assuming I got the correct field) is what is used
> in reference implementations for time-travel (apologies if this is already
> covered and I miss
on
> the FileScanTask.
>
> There might be many other ways to implement this and I'd love to hear what
> people think and would be great to find a way that would help us out on
> Impala.
>
> Cheers,
> Gabor
>
>
>
>
--
Ryan Blue
Tabular
>> Iceberg tables.
>> You can see various examples at
>> https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test
>>
>> - Zoltan
>>
>> On Wed, Apr 26, 2023 at 4:10 AM Ryan Blue wrote:
g 1.3.0 around mid May during
>>> the sync. The primary goal is to support Spark 3.4 and Flink 1.17. There
>>> are also a few other notable changes that went in or are very close. Do we
>>> see any blockers that we should track?
>>>
>>> - Anton
>>
>>
--
Ryan Blue
Tabular
d get approval for the design.
>>
>> I will wait another week or two for some more people to take a look at
>> the document
>>
>> before jumping into the implementation.
>>
>> Thanks,
>> Ajantha.
>>
>>
>>
>> On Sat, Nov 26, 2
>>>>
>>>> SELECT
>>>>
>>>> e.data_file.partition,
>>>>
>>>> MAX(s.committed_at) AS last_modified_time
>>>>
>>>> FROM db.table.snapshots s
>>>>
>>>> JOIN db.table.entries e
>>>
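The query above is cut off in this excerpt; a self-contained sketch in the same spirit (not the author's original text; db.tbl is a placeholder), deriving a last-modified time per partition from the entries and snapshots metadata tables, could look like this:

    SELECT
      e.data_file.partition AS partition,
      MAX(s.committed_at)   AS last_modified_time
    FROM db.tbl.entries e
    JOIN db.tbl.snapshots s
      ON e.snapshot_id = s.snapshot_id
    WHERE e.status = 1      -- entry status: 0 = EXISTING, 1 = ADDED, 2 = DELETED
    GROUP BY e.data_file.partition;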
Hi everyone,
I want to congratulate Amogh and Eduard, who were just added as Iceberg
committers and Szehon, who was just added to the PMC. Thanks for all your
contributions!
Ryan
--
Ryan Blue
Isn't serializability achieved via
> pessimistic concurrency control? I would like to understand how Iceberg
> implements the serializable isolation level and how it is different from
> snapshot isolation?
>
> Thanks
>
--
Ryan Blue
Tabular
that allows for people freedom to choose whatever they want, but reserving
>> a prefix for well understood engines in the Iceberg community (e.g. a
>> prefix of "iceberg-engine." could be reserved as denoting engines that the
>> community has officially agreed on naming
to table?
>> Also it seems Iceberg allows all writers to write into a snapshot and use OCC
>> to decide if one needs to retry because it was late. In this case how is it
>> serializable at all? Isn't serializability achieved via
>> pessimistic concurrency control? I would like to understand how Iceberg
>> implements the serializable isolation level and how it is different from
>> snapshot isolation?
>>
>> Thanks
>>
>>
>>
>>
--
Ryan Blue
Tabular
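Some context that may help here (an illustrative note, not from the thread): Iceberg writers use optimistic concurrency, and the isolation level only changes which conflicts cause a commit-time validation failure and retry; roughly, serializable also rejects data added by a concurrent commit that would have matched the operation's condition, while snapshot isolation only rejects conflicting changes to the files being rewritten. The level is configured per row-level operation through table properties, e.g. (placeholder table name):

    ALTER TABLE db.tbl SET TBLPROPERTIES (
      'write.delete.isolation-level' = 'serializable',
      'write.update.isolation-level' = 'serializable',
      'write.merge.isolation-level'  = 'serializable'
    );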
ar improvement (see improvement 3).
>>
>> I would appreciate feedback from the community about this doc, and I can
>> organize some meetings to discuss our thoughts about this topic afterwards.
>>
>> Doc link:
>> https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit#
>>
>> Best,
>> Jack Ye
>>
>>
>>
--
Ryan Blue
Tabular
spatchAdapter.java:94)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
> at
> org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
> at
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
> at
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
> at
> worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
> at
> worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
>
--
Ryan Blue
Tabular
>>
>> Thanks,
>> Peter
>>
>> [1]
>> https://github.com/apache/iceberg/blob/f536c840350bd5628d7c514d2a4719404c9b8ed1/api/src/main/java/org/apache/iceberg/Scan.java#L71-L78
>>
>
--
Ryan Blue
Tabular
nted so not sure where the performance issue is.
>>
>>
>>
>> On Mon, May 15, 2023 at 7:55 AM Mayur Srivastava <
>> mayur.srivast...@twosigma.com> wrote:
>>
>> Thanks Ryan.
>>
>> For most partition stats, I’m ok with compaction and keeping fe
>>>>
>>>> Hi folks,
>>>>
>>>> Would it be appropriate for us to consider changing the default table
>>>> format version for new tables from v1 to v2?
>>>>
>>>> I don’t think defaulting to v2 tables means all readers have to support
>>>> delete files. DELETE, UPDATE, MERGE operations will only produce delete
>>>> files if configured explicitly.
>>>>
>>>> The primary reason I am starting this thread is to avoid our
>>>> workarounds in v1 spec evolution, and snapshot ID inheritance. The latter
>>>> is critical for the performance of rewriting manifests.
>>>>
>>>> Any thoughts?
>>>>
>>>> - Anton
>>>
>>>
>>
--
Ryan Blue
Tabular
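For readers following along, the format version in question is the 'format-version' table property, which currently has to be requested explicitly (a sketch with placeholder table names):

    -- create a new v2 table
    CREATE TABLE db.new_tbl (id bigint, data string)
    USING iceberg
    TBLPROPERTIES ('format-version' = '2');

    -- upgrade an existing v1 table in place
    ALTER TABLE db.existing_tbl SET TBLPROPERTIES ('format-version' = '2');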
<https://docs.commonroom.io/get-started/integrations/github#required-permissions>
>>>>>> permissions for the apache/iceberg and apache/iceberg-docs for the GitHub
>>>>>> integration. Here's how the information would be used:
>>>>>>
>>>>>>- Triage issues and PRs
>>>>>>- Learn ways to improve developer/contributor experience in the
>>>>>>community
>>>>>>- Understand which PRs and issues are not getting attention and
>>>>>>why
>>>>>>- Set alerts and notifications for the Developer Relations team
>>>>>>to follow up on issues to help drive changes in Iceberg
>>>>>>- Metrics reporting to showcase Iceberg usage to drive further
>>>>>>adoption and interest in Iceberg
>>>>>>- Gaining a better understanding of the ways people use Iceberg
>>>>>>and the features they are interested in
>>>>>>- Showcase the diversity of contributions the Iceberg project
>>>>>>
>>>>>> Is everyone okay with me setting this up so I can help the community
>>>>>> with things like roadmap updates and making sure we follow up on reviews?
>>>>>>
>>>>>>
>>>>>>
--
Ryan Blue
Tabular
ame pain point? How do
> you solve this? I would love to understand if there is a solution to this
> otherwise we can brainstorm if there is a way to solve this.
>
> Thanks!
>
> Pucheng
>
--
Ryan Blue
Tabular
ypes of the
>>>> transforms. This can look a bit weird with field types changed.
>>>>
>>>> public static Schema transform(Schema schema, Map<Integer, Transform<?, ?>> idToTransforms)
>>>>
>>>> =
>>>> This is how everything is put together for RowDataComparator.
>>>>
>>>> Schema projected = TypeUtil.select(schema, sortFieldIds); //
>>>> sortFieldIds set is calculated from SortOrder
>>>> Map<Integer, Transform<?, ?>> idToTransforms = //
>>>> calculated from SortOrder
>>>> Schema sortSchema = TypeUtil.transform(projected, idToTransforms);
>>>>
>>>> StructLike leftSortKey =
>>>> structTransformation.wrap(structProjection.wrap(rowDataWrapper.wrap(leftRowData)))
>>>> StructLike rightSortKey =
>>>> structTransformation.wrap(structProjection.wrap(rowDataWrapper.wrap(rightRowData)))
>>>>
>>>> Comparators.forType(sortSchema).compare(leftSortKey, rightSortKey)
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo/
>>>>
>>>
--
Ryan Blue
Tabular
>>>>> I haven't thought it completely through, but it crossed my mind that a
>>>>> ‘Soft’-mode of ExpireSnapshot may be useful, where we can delete data
>>>>> files
>>>>> but just mark snapshot’s metadata files as expired without physically
>
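For contrast with the proposed 'soft' mode, a sketch of the existing expiration path via the Spark procedure (catalog and table names are placeholders); today this removes the expired snapshots' metadata and any files that become unreachable as a result:

    CALL my_catalog.system.expire_snapshots(
      table       => 'db.tbl',
      older_than  => TIMESTAMP '2023-06-01 00:00:00',
      retain_last => 10
    );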
` can be useful as a standalone class.
>
> On Fri, Jun 2, 2023 at 4:04 PM Ryan Blue wrote:
>
>> This all sounds pretty reasonable to me, although I'd use `StructType`
>> rather than `Schema` in most places so this is more reusable. I definitely
>> agree about re
for tables
- Improved compatibility
The community is also continuing to build a view specification, expand REST
catalog support, and add encryption to the table spec.
Community Health:
The community continues to be healthy, with most metrics steady this
quarter.
--
Ryan Blue
Tabular
(talks/meetups at CommunityOverCode conf, community
> events, ...).
>
> Thanks,
> Regards
> JB
>
> On Wed, Jun 14, 2023 at 3:51 AM Ryan Blue wrote:
> >
> > Hi everyone,
> >
> > Here’s our draft for the June board report. Please comment on this
> thre
e above is a partial introduction of the proposal. For the full
> document, please refer to:
> https://docs.google.com/document/d/1Sobv8XbvsyPzHi1YWy_jSet1Wy7smXKDKeQrNZSFYCg
>
>
> Thank you very much for the valuable suggestions from @Steven and @Junjie
> Chen.
>
> Thanks,
> Liwei Li
>
>
--
Ryan Blue
Tabular
>
> Please download, verify, and test.
>
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as PyIceberg 0.4.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Please consider this email a +1 from my side:
>
>
>- Ran some basic table scans
> - Including tables with positional deletes
>- Checked to see if everything still works when PyArrow is not
>installed
>- Set some table properties
>
> Kind regards,
>
> Fokko
>
--
Ryan Blue
Tabular
>>>> In the recent past, https://github.com/apache/iceberg/pull/6838/ was a
>>>> PR to allow the write distribution mode to be specified in SQLConf. This
>>>> was merged.
>>>> Cheng Pan asks if there is any guidance on when we should allow configs
>>>> to be specified in SQLConf.
>>>> Thanks,
>>>> Wing Yew
>>>>
>>>> ps. The above open PRs could use reviews by committers.
>>>>
>>>>
--
Ryan Blue
Tabular
2 hours.
>
> [ ] +1 Release this as PyIceberg 0.4.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Please consider this email a +1 from my side:
>
>
>- Ran some basic table scans
> - Including tables with positional deletes
>- Checked to see if everything still works when PyArrow is not
>installed
>- Set some table properties
>
> Kind regards,
>
> Fokko
>
--
Ryan Blue
Tabular
te the how-to-release guide to make
> sure that this won't happen again.
>
> Kind regards,
> Fokko
>
> On Tue, Jun 27, 2023 at 23:04, Ryan Blue wrote:
>
>> Any idea why there are rc1 artifacts here?
>> https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.4.0
w configs
> to be specified in SQLConf.
> > Thanks,
> > Wing Yew
> >
> > ps. The above open PRs could use reviews by committers.
> >
>
>
--
Ryan Blue
Tabular
0. It was started based on the issue found by
> Xiangyang (@ConeyLiu) :
> https://github.com/apache/iceberg/pull/7931#pullrequestreview-1507935277.
> >
> > Do people have any other bug fixes that should be included? Also let me
> know, if anyone wants to be a release manager? If not, I can give it a
> shot as well.
> >
> > Thanks,
> > Szehon
>
--
Ryan Blue
Tabular
apache/iceberg/pull/7790,
>>> to allow the write mode (copy-on-write/merge-on-read) to be specified in
>>> SQLConf. The use case is explained in the PR.
>>> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
>>> to allow locality to be specified in SQLConf.
>>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was
>>> a PR to allow the write distribution mode to be specified in SQLConf. This
>>> was merged.
>>> >> Cheng Pan asks if there is any guidance on when we should allow
>>> configs to be specified in SQLConf.
>>> >> Thanks,
>>> >> Wing Yew
>>> >>
>>> >> ps. The above open PRs could use reviews by committers.
>>> >>
>>> >
>>>
>>>
--
Ryan Blue
Tabular
sions about this:
> https://github.com/apache/iceberg/issues/6736.
> I have submitted a PR for this:
> https://github.com/apache/iceberg/pull/8023. Hoping to get more advice from
> you.
>
> Thanks,
> Xianyang
>
>
>
--
Ryan Blue
Tabular
reat if you could share your opinions on the topic. Maybe
> this could also be a point for the community sync later today.
>
> Hope you're all doing well. Best wishes,
>
> Jan
>
>
--
Ryan Blue
Tabular
t-forward".
>
> May I know why this is not allowed? Or is this theoretically supported but
> the implementation is just not there yet?
>
> Best,
> Pucheng
>
--
Ryan Blue
Tabular
"txnAppId" which
>> allow it to drop duplicates before writing, like the following.
>>
>> def writeToDeltaLakeTableIdempotent(batch_df, batch_id):
>> batch_df.write.format(...).option("txnVersion",
>> batch_id).option("txnAppId", app_id).save(...) # location 1
>>
>> Does something similar exist for Iceberg? If not, do you see issues
>> with the `foreach` and `merge into .. when not matched ..` approach at production
>> scale.
>>
>> I have posted a question on SO regarding this as well:
>>
>> https://stackoverflow.com/questions/76726225/spark-structured-streaming-apache-iceberg-how-appends-can-be-idempotent
>>
>> Thanks!
>> Nirav
>>
>>
--
Ryan Blue
Tabular
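A pattern that is often suggested for this (an illustrative sketch, not from the thread) is to run a MERGE per micro-batch inside foreachBatch, keyed on a unique id so that a replayed batch does not create duplicates; here 'updates' is assumed to be a temp view registered from the micro-batch DataFrame and 'id' a unique key column:

    MERGE INTO db.target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;

Unlike Delta's txnAppId/txnVersion option this is not exactly-once by itself; it relies on the merge key to make replays idempotent.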
working on the rust implementation slightly
> > favor a separate repository but would be okay with using the existing
> > repository.
> >
> >
> > It would be great if you could share your opinions on the topic. Maybe
> > this could also be a point for the community sync later today.
> >
> > Hope you're all doing well. Best wishes,
> >
> > Jan
> >
> >
>
--
Ryan Blue
Tabular
Hi,
>>
>> I’m trying to access the slack workspace for Apache Iceberg but I think
>> the link is broken.
>>
>> Can I be added please?
>>
>> Cheers,
>>
>> Bruno Murino
>>
>>
>>
--
Ryan Blue
Tabular
ixes over 1.3.0, including:
>>>> * Fix Spark RewritePositionDeleteFiles failure for certain partition
>>>> types (#8059)
>>>> * Fix Spark RewriteDataFiles concurrency edge-case on commit timeouts
>>>> (#7933)
>>>> * Table Metadata parser now accepts null current-snapshot-id,
>>>> properties, snapshots fields (#8064)
>>>> * FlinkCatalog creation no longer creates the default database (#8039)
>>>> * Fix loading certain V1 table branch snapshots using snapshot
>>>> references (#7621)
>>>> * Fix Spark partition-level DELETE operations for WAP branches (#7900)
>>>> * Fix HiveCatalog deleting metadata on failures in checking lock status
>>>> (#7931)
>>>>
>>>> Please download, verify, and test.
>>>>
>>>> Please vote in the next 72 hours. (Weekends excluded)
>>>>
>>>> [ ] +1 Release this as Apache Iceberg 1.3.1
>>>> [ ] +0
>>>> [ ] -1 Do not release this because...
>>>>
>>>> Only PMC members have binding votes, but other community members are
>>>> encouraged to cast
>>>> non-binding votes. This vote will pass if there are 3 binding +1 votes
>>>> and more binding
>>>> +1 votes than -1 votes.
>>>>
>>>> Thanks
>>>> Szehon
>>>>
>>>
--
Ryan Blue
Tabular
in
> SparkConfParser is to use the option if set, else use the session conf if
> set, else use the table property. This applies across the board.
> - Wing Yew
>
>
>
>
>
>
> On Sun, Jul 16, 2023 at 4:48 PM Ryan Blue wrote:
>
>> Yes, I agree that there is value
en a particular release will no longer
> be supported or receive updates.
> >
> > What do you think about setting up an EOL policy? We could go for a
> vote-based approach or have a fixed lifecycle for each release. Either way,
> this could help our users plan their upgrades and keep their systems
> updated more effectively.
> >
> > Looking forward to hearing your thoughts!
> >
> > Best,
> >
> > Yufei
>
--
Ryan Blue
Tabular
and prs can have more reviews and
>>>> attractions.
>>>>- *Easier sharing of resources*. It would be easier to share
>>>>resources for integration tests.
>>>>
>>>> Cons
>>>>
>>>>- *Increases complexity of project structure*. The project
>>>>structure would be more complex when coupling different languages and
>>>>toolchain setup.
>>>>- *Longer build/ci time. *Unnecessary ci checks may be triggered
>>>>for small prs in different languages.
>>>>
>>>>
>>>>
>>>> Multi Repo
>>>>
>>>>
>>>>
>>>> Pros
>>>>
>>>>- *Simplifies project structure*. Different language may have
>>>>toolchains and project setup, one repo for one language makes project
>>>>structure easier to understand and follow.
>>>>- *Independent versioning and releases*. Different language may
>>>>have different versioning and releases process. It’s also possible in
>>>>monorepo, but I guess it would be easier in standalone multi repo.
>>>>- *Improved build/ci time*. No unnecessary ci checks will be
>>>>triggered.
>>>>
>>>> Cons
>>>>
>>>>- *Difficult to track the overall progress. *Multi repos makes it
>>>>harder to track what’s happening in different teams.
>>>>- *Difficult to share common resources.* It may be more difficult to
>>>>share resources and do integration tests across different languages.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Welcome to share your ideas and thoughts in this discussion!
>>>>
>>>>
>>>>
>>>> References
>>>>
>>>>
>>>>
>>>>1.
>>>>
>>>> https://www.coforge.com/blog/mono-repo-vs.-multi-repo-in-git-unravelling-the-key-differences
>>>>
>>>>
--
Ryan Blue
Tabular
igrate specific valuable
>>>>conversations to Discourse once it is done.
>>>>3. Another idea, would be that we could also use the Discourse
>>>>forum as one of the inputs to create some sort of chatbot experience,
>>>>either in Slack or neste
entifier fields and a nested field cannot be used as an
>> identifier field if it is nested in an optional struct, to avoid null
>> values in identifiers." and I propose "Float and double fields cannot be
>> used as identifier fields."
>>
>>
>>
>> - What do people think of these two proposed changes?
>>
>> - What can I do next?
>>
>>
>>
>> The spec mentions v3
>> <https://github.com/apache/iceberg/blob/9df8ddb05428cf3d7145bc5cf4a130de36dbb96a/format/spec.md#version-3>;
>> is there a plan for a v3 release yet? I saw a conversation about enabling
>> v2 by default, so I assume v3 is a ways off yet.
>>
>> --
>>
>> Jacob Marble
>>
>> 🇺🇸 🇺🇦
>>
>
>
> --
> Jacob Marble
> 🇺🇸 🇺🇦
>
--
Ryan Blue
Tabular
lues in primary keys? If so, what is the purpose of it?
On Thu, Aug 24, 2023 at 5:40 PM Jacob Marble
wrote:
> On Thu, Aug 24, 2023 at 5:28 PM Ryan Blue wrote:
>
>> I think it's a good idea to start adding timestamp types with nanosecond
>> precision. I've heard this a fe
Jacob, could you model this with a derived field? Or could you require the
tags and use an "unknown" value?
On Mon, Aug 28, 2023 at 11:18 AM Jacob Marble
wrote:
> On Fri, Aug 25, 2023 at 3:23 PM Ryan Blue wrote:
>
>> I don't think that we should introduce nanosec
e data files and the metadata files may store in different
> locations,
> // so it has to call dropTableData to force delete the data file.
> CatalogUtil.dropTableData(ops.io(), lastMetadata);
> }
> return fs.delete(tablePath, true /* recursive */);
> }
> } catch (IOException e) {
> throw new RuntimeIOException(e, "Failed to delete file: %s", tablePath);
> }
> }
>
>
> Thanks,
> Manu
>
>
--
Ryan Blue
Tabular
>> write.spark.fanout.enabled - False
>> write.distribution-mode - None
>> but I have left it to defaults as I assume writer will override those
>> settings.
>>
>> so does the "fanout-enabled" option have an effect when used with foreachBatch?
>> (I'm new to Spark streaming as well)
>>
>> thanks
>>
>
--
Ryan Blue
Tabular
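For reference, the table-level default mentioned above is controlled by a table property, and the per-write "fanout-enabled" option can override it for a single write; whether a given streaming write path picks it up depends on the Spark/Iceberg versions in use, so treat this as a sketch (placeholder table name):

    ALTER TABLE db.tbl SET TBLPROPERTIES ('write.spark.fanout.enabled' = 'true');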
.
> 2. The manifest file name will be the name specified during the "first
> write" (the "second write" is manifest copy during appendManifest
> operation). An example will be "stage-%d-task-%d-manifest-%s" which is the
> name used during snapshot creation, but since the last param is UUID, it
> should be fine.
>
> Would like to hear from you, thanks!
>
--
Ryan Blue
Tabular
>
> Is it OK if I set it True during snapshot table creation and set it to
> false when finished?
>
> On Thu, Aug 31, 2023 at 9:44 AM Ryan Blue wrote:
>
>> This isn't something that we can set to `true` because it is a
>> forward-incompatible change. That's w
ly need to.
> https://iceberg.apache.org/spec/#default-values
>
> On Tue, Aug 29, 2023 at 9:52 AM Ryan Blue wrote:
>
>> Jacob, could you model this with a derived field? Or could you
>> require the tags and use an "unknown" value?
>>
>> On Mon, Aug
t thread-safe, this part seems
> missing, slack discussion:
> https://apache-iceberg.slack.com/archives/C03LG1D563F/p1693410897758349?thread_ts=1693355747.700759&cid=C03LG1D563F
> )
>
>>
--
Ryan Blue
Tabular
lower than normal and is
not
expected to fluctuate. We will take a look and see what the difference is.
--
Ryan Blue
Tabular
partition. I will have to run
> compaction regardless it seems.
>
> Best
> Nirav
>
>
>
>
>
> On Thu, Aug 31, 2023 at 8:59 AM Ryan Blue wrote:
>
>> We generally don't recommend fanout writers because they create lots of
>> small data files. It also isn
>
> 2) I can partition by ingestion_time and during querying I can just use a
> big enough time range to accommodate late data. However, it's not a
> consumer-friendly query.
>
>
> Is there any better suggestion ?
>
>
> Best,
>
> Nirav
>
--
Ryan Blue
Tabular
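A sketch of the hidden-partitioning approach that usually comes up for this case (placeholder schema, not from the thread): partitioning on a transform of event_time keeps queries on the event-time column partition-pruned without exposing a separate partition column to consumers, while late data simply lands in older partitions.

    CREATE TABLE db.events (
      event_time     timestamp,
      ingestion_time timestamp,
      payload        string)
    USING iceberg
    PARTITIONED BY (days(event_time));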
S
>>>>> gpg --import KEYS
>>>>>
>>>>> svn checkout
>>>>> https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc1/
>>>>> /tmp/pyiceberg/
>>>>>
>>>>> for name in $(ls /tmp/pyiceberg/pyiceberg-*.whl
>>>>> /tmp/pyiceberg/pyiceberg-*.tar.gz)
>>>>> do
>>>>> gpg --verify ${name}.asc ${name}
>>>>> done
>>>>>
>>>>> cd /tmp/pyiceberg/
>>>>> for name in $(ls /tmp/pyiceberg/pyiceberg-*.whl.asc.sha512
>>>>> /tmp/pyiceberg/pyiceberg-*.tar.gz.asc.sha512)
>>>>> do
>>>>> shasum -a 512 --check ${name}
>>>>> done
>>>>>
>>>>> tar xzf pyiceberg-0.5.0.tar.gz
>>>>> cd pyiceberg-0.5.0
>>>>>
>>>>> ./dev/check-license
>>>>>
>>>>> Please download, verify, and test.
>>>>>
>>>>> Please vote in the next 72 hours.
>>>>> [ ] +1 Release this as PyIceberg 0.5.0
>>>>> [ ] +0
>>>>> [ ] -1 Do not release this because...
>>>>>
>>>>> Consider this my +1 (binding), I've tested the license, and checksums
>>>>> and ran example notebooks against the 0.5.0 rc1
>>>>> <https://github.com/tabular-io/docker-spark-iceberg/pull/92>.
>>>>>
>>>>> Cheers, Fokko
>>>>>
>>>>
--
Ryan Blue
Tabular
tps://github.com/apache/iceberg/pull/8267>
>- Support for adding columns
><https://github.com/apache/iceberg/pull/8174>
>- Optimize concurrency <https://github.com/apache/iceberg/pull/8104>
> (follow
>up on the Support serverless environments)
>- Bump Pydantic to v2 <https://github.com/apache/iceberg/pull/7782>
> (improved
>performance of the JSON (de)serialization)
>- A lot of bugfixes!
>
> The commit ID is f798b06246e67131d413dfceece5ccaf269e01fe
>
>
>
>- This corresponds to the tag: pyiceberg-0.5.0rc3
>(37fa779b0957644590a03754a733a5b3e3f589d0)
>- https://github.com/apache/iceberg/releases/tag/pyiceberg-0.5.0rc3
>-
>
> https://github.com/apache/iceberg/tree/f798b06246e67131d413dfceece5ccaf269e01fe
>
>
> The release tarball, signature, and checksums are here:
>
>
>
>- https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc3/
>
>
> You can find the KEYS file here:
>
>
>
>- https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
>
> Convenience binary artifacts are staged on pypi:
>
>
> https://pypi.org/project/pyiceberg/0.5.0rc3/
>
>
> And can be installed using: pip3 install pyiceberg==0.5.0rc3
>
>
> Please download, verify, and test.
>
>
> Please vote in the next 72 hours.
>
>
> [ ] +1 Release this as PyIceberg 0.5.0
>
> [ ] +0
>
> [ ] -1 Do not release this because...
>
>
> Cheers, Fokko
>
>
> Xuanwo
>
>
--
Ryan Blue
Tabular
facts are staged on pypi:
> >
> >
> > https://pypi.org/project/pyiceberg/0.5.0rc3/
> >
> >
> > And can be installed using: pip3 install pyiceberg==0.5.0rc3
> >
> >
> > Please download, verify, and test.
> >
> >
> > Please vote in the next 72 hours.
> >
> >
> > [ ] +1 Release this as PyIceberg 0.5.0
> >
> > [ ] +0
> >
> > [ ] -1 Do not release this because...
> >
> >
> > Cheers, Fokko
> >
> >
>
--
Ryan Blue
Tabular
?
> > >> Creating an issue?
> > >>
> > >> - Anton
> > >>
> > >> On Apr 24, 2023, at 10:01 AM, Edgar Rodriguez <
> > >> edgar.rodrig...@airbnb.com.INVALID> wrote:
> > >>
> > >> Hi all,
> > >>
>
ed the 18
> month maintenance mark in Spark.
>
> - Anton
--
Ryan Blue
Tabular
y place in time" you mean I can derive
> event_time partitions that got affected (added, updated) based on wall
> clock? How exactly do I do that? I know there are history, manifest, and
> snapshot tables but haven't dug into them to see if it's possible to derive
> what pa
Based on your past experience, how do companies move forward to
> maintain Iceberg with removed Spark versions? Do those companies own their
> enhancements or fixes internally?
>
> Thanks!
>
> On Wed, Sep 20, 2023 at 4:00 PM Ryan Blue wrote:
>
>> +1
>>
>> O
t?usp=sharing
>
>
>
> Welcome to comment, and looking forward to hear your advice.
>
--
Ryan Blue
Tabular
that ourselves, right?
>
> Thanks
>
> On Wed, Sep 20, 2023 at 4:57 PM Ryan Blue wrote:
>
>> Pucheng, you can continue using older releases of Iceberg or you may
>> already have your own fork. The older versions will continue to work and we
>> can still do p
lopment in Iceberg. It was released in October, 2021 and passed
> the 18 month maintenance mark in Spark.
> >
> > - Anton
>
--
Ryan Blue
Tabular
nticipate a bit to inform our users/community,
> for instance having a clear table about the supported layers (a bit
> like on https://karaf.apache.org/download.html or
> https://kafka.apache.org/downloads).
>
> Thanks !
> Regards
> JB
>
> On Thu, Sep 21, 2023 at 5:40
> >>>> > Summit name. If it works for everyone, I will send a message to the
> >>>> > Apache Publicity & Marketing to get their OK for the event.
> >>>> > 2. create two committees:
> >>>> > 2.1. the Sponsoring Committee gathering companies/organizations
> >>>> > wanting to sponsor the event
> >>>> > 2.2. the Program Committee gathers folks from the Iceberg
> community
> >>>> > (PMC/committers/contributors) to select talks.
> >>>> >
> >>>> > My company (Dremio) will “host” the event - i.e., provide funding, a
> >>>> > conference platform, sponsor logistics, speaker training, slide
> >>>> > design, etc..
> >>>> >
> >>>> > In terms of dates, as CommunityOverCode Con NA will be in October, I
> >>>> > think January 2024 would work: it gives us time to organize
> smoothly,
> >>>> > promote the event, and not in a rush.
> >>>> >
> >>>> > I propose:
> >>>> > 1. to create the #summit channel on Iceberg Slack.
> >>>> > 2. I will share a preparation document with a plan proposal.
> >>>> >
> >>>> > Thoughts ?
> >>>> >
> >>>> > Regards
> >>>> > JB
>
--
Ryan Blue
Tabular
same transaction as the MERGE so that reprocessing is minimized. Does
>>Iceberg support storing this as table metadata? I do not see any related
>>information in the Iceberg Table Spec.
>>2. Use the dataframe API or Spark SQL for the incremental read and
>>
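On the question of storing this as table metadata: Iceberg table properties are free-form key/value pairs, so state such as a last-processed snapshot id can be kept there, but a property update is its own commit and is not atomic with the MERGE. A sketch with a hypothetical property name and an example id:

    ALTER TABLE db.target SET TBLPROPERTIES (
      'etl.last-processed-snapshot-id' = '1234567890123456789'
    );

    SHOW TBLPROPERTIES db.target;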
as it's a community event.
> As few of us helped on summits in the past, experience and how it went
> are valuable.
>
> Special thank to you, Brian and Ed about the comments.
>
> I propose to work on the doc: anyone from the community can edit the doc.
>
> Thanks,
> Regards
>
per
> interesting) and connect dots with other Apache projects. I also think it’s
> good timing to grow/expand our community.
> I will also work on the document thanks to your comment, as anyone can do
> :)
>
> Regards
> JB
>
> On Wed, Sep 27, 2023 at 00:47, Ryan Blue wrote
; > >> > You can find the KEYS file here:
> > >> > * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
> > >> >
> > >> > Convenience binary artifacts are staged on Nexus. The Maven
> repository URL is:
> > >> > *
> https://repository.apache.org/content/repositories/orgapacheiceberg-1145/
> > >> >
> > >> > Please download, verify, and test.
> > >> >
> > >> > Please vote in the next 72 hours. (Weekends excluded)
> > >> >
> > >> > [ ] +1 Release this as Apache Iceberg 1.4.0
> > >> > [ ] +0
> > >> > [ ] -1 Do not release this because...
> > >> >
> > >> > Only PMC members have binding votes, but other community members
> are encouraged to cast non-binding votes. This vote will pass if there are
> 3 binding +1 votes and more binding +1 votes than -1 votes.
> > >> >
> > >> > - Anton
> >
> >
>
--
Ryan Blue
Tabular
j3q9w2fqnxq2llbn
>>> >>>
>>> >>> Since we just did the PyIceberg 0.5.0 release, I think it is a good
>>> moment to migrate PyIceberg to iceberg-python as well:
>>> https://github.com/apache/iceberg-python/pull/2 I went over the PRs
>>> that are ready to merge and got them in. If there is anything missing,
>>> please let me know.
>>> >>>
>>> >>> I would suggest merging the PR and leaving the source code in the
>>> main repository for another week or so to make sure that we didn't miss
>>> anything.
>>> >>>
>>> >>> Since PyIceberg now also hosts the docs on the Github pages of the
>>> Iceberg repository, moving PyIceberg will also free up the Github pages for
>>> the migration of the docs back into the main repository.
>>> >>>
>>> >>> Let me know if there are any concerns.
>>> >>>
>>> >>> Kind regards,
>>> >>> Fokko Driesprong
>>>
>>
--
Ryan Blue
Tabular
-1.4.0-rc2
>>>>> > > *
>>>>> https://github.com/apache/iceberg/tree/10367c380098c2e06a49521a33681ac7f6c64b2c
>>>>> > >
>>>>> > > The release tarball, signature, and checksums are here:
>>>>> > > *
>>>>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.4.0-rc2
>>>>> > >
>>>>> > > You can find the KEYS file here:
>>>>> > > * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>>>> > >
>>>>> > > Convenience binary artifacts are staged on Nexus. The Maven
>>>>> repository URL is:
>>>>> > > *
>>>>> https://repository.apache.org/content/repositories/orgapacheiceberg-1146/
>>>>> > >
>>>>> > > Please download, verify, and test.
>>>>> > >
>>>>> > > Please vote in the next 72 hours. (Weekends excluded)
>>>>> > >
>>>>> > > [ ] +1 Release this as Apache Iceberg 1.4.0
>>>>> > > [ ] +0
>>>>> > > [ ] -1 Do not release this because...
>>>>> > >
>>>>> > > Only PMC members have binding votes, but other community members
>>>>> are encouraged to cast non-binding votes. This vote will pass if there are
>>>>> 3 binding +1 votes and more binding +1 votes than -1 votes.
>>>>> > >
>>>>> > > - Anton
>>>>> > >
>>>>> >
>>>>>
>>>>
--
Ryan Blue
Tabular