Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-09 Thread Kevin Liu
+1 on sync meeting for iceberg rust. I want to get involved and catch up on
the recent developments. For reference, here's the doc we've been using for
the pyiceberg sync
https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U

Best,
Kevin

On Wed, Oct 9, 2024 at 5:30 AM Xuanwo  wrote:

> Hi,
>
> I'm starting this thread to explore the idea of hosting an Iceberg Rust
> Sync Meeting. In this meeting, we will discuss recent major changes,
> pending PR reviews, and features in development. It will offer a space for
> Iceberg Rust contributors to connect and become familiar with each other,
> helping us identify and remove contribution barriers to the best of our
> ability.
>
> Details about this meeting:
>
> I suggest hosting our meeting at the same time of day, but one week
> earlier than the Iceberg Sync Meeting. For example, if the Iceberg Sync
> Meeting is scheduled for Thursday, October 24, 2024, from 00:00 to 01:00
> GMT+8, the Iceberg Rust Sync Meeting would take place one week before, on
> Thursday, October 17, 2024, from 00:00 to 01:00 GMT+8.
>
> I also suggest using the same Google Meet code (if possible) so we don't
> get confused.
>
> These meetings will not be recorded, but I will take notes in a Google
> Doc, similar to what we do in the Iceberg Sync Meeting.
>
> What are your thoughts? I'm open to other options as well.
>
> Xuanwo
>
> https://xuanwo.io/
>


Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-09 Thread rdb...@gmail.com
+1

Thanks for shepherding this, Russell!

On Tue, Oct 8, 2024 at 7:07 PM Russell Spitzer 
wrote:

> Hi Y'all!
>
> I think we are more or less in agreement on adding Row Lineage to the spec
> apart from a few details which may change a bit during implementation.
> Because of this, I'd like to call for an overall vote on whether or not
> Row-Lineage as described in PR 11130
> can be added to the spec.
>
> I'll note this is basically giving a thumbs up for reviewers and
> implementers to go ahead with the pull-request and acknowledging that you
> support the direction this proposal is going. I do think we'll probably dig
> a few things out when we write the reference implementation, but I think in
> general we have defined the required behaviors we want to see.
>
> Please vote in the next 72 hours.
>
> [ ] +1, commit the proposed spec changes
> [ ] -0
> [ ] -1, do not make these changes because . . .
>
>
> Thanks everyone,
>
> Russ
>


Re: [Discuss] Iceberg community maintaining the docker images

2024-10-09 Thread rdb...@gmail.com
I think it's important for a project to remain focused on its core purpose,
and I've always advocated for Iceberg to remain a library that is easy to
plug into other projects. I think that should be the guide here as well.
Aren't projects like Spark and Trino responsible for producing easy-to-use
Docker images of those environments? Why would the Iceberg project build
and maintain them?

I would prefer not to be distracted by these things, unless we need them
for cases like supporting testing and validation of things that are part of
the core purpose of the project.

On Tue, Oct 8, 2024 at 6:08 AM Ajantha Bhat  wrote:

> Hello everyone,
>
> Now that the test fixtures are in [1], we can create a runtime JAR for the
> REST catalog adapter [2] from the TCK.
> Following that, we can build and maintain the Docker image based on it [3].
>
> I also envision the Iceberg community maintaining some quick-start Docker
> images, such as spark-iceberg-rest and trino-iceberg-rest, among others.
>
> I've looked into other Apache projects, and it seems that Apache Infra can
> assist us with this process, as we have the option to publish Iceberg
> Docker images under the Apache Docker Hub account.
>
> [image: image.png]
>
> I am more than willing to maintain this code; please find the related PRs
> in [2] & [3].
>
> Any suggestions on the same? Contributions are welcome if we agree to
> maintain it.
>
> [1] https://github.com/apache/iceberg/pull/10908
> [2] https://github.com/apache/iceberg/pull/11279
> [3] https://github.com/apache/iceberg/pull/11283
>
> - Ajantha
>


Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-09 Thread Marc Cenac
I support the idea of updating the docs to replace the Hadoop catalog
example, but I'm wondering why not use a REST Catalog example instead?  I
saw Ajantha proposed adding Docker images for a REST Catalog adapter [1], so
we could potentially use this with a JDBC Catalog backed by a SQLite file as
a convenient quickstart example that shows a REST Catalog configuration.
I'm thinking the REST Catalog would be preferred to the JDBC catalog as a
best practice, since it's technology-agnostic (on the server side) and the
protocol allows for more advanced functionality (e.g. multi-table commits,
credential vending, etc.).

[1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
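
For context, a REST-catalog quickstart along these lines might be as small as
the following Spark configuration sketch. Everything here is illustrative:
the catalog name `demo`, the local endpoint, and the warehouse path are
assumptions, and it presumes a REST catalog service (such as the proposed
adapter image) is already running locally.

```properties
# Sketch only: assumes a REST catalog service is listening on localhost:8181.
spark.sql.catalog.demo            org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.type       rest
spark.sql.catalog.demo.uri        http://localhost:8181
spark.sql.catalog.demo.warehouse  /tmp/warehouse
```

The trade-off against the JDBC option is exactly the one debated below: this
needs a running service, but exercises the catalog protocol the docs would be
recommending for production.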

On Tue, Oct 8, 2024 at 1:18 PM Kevin Liu  wrote:

> Hi all,
>
> I wanted to bring up a suggestion regarding our current documentation. The
> existing examples for Iceberg often use the Hadoop catalog, as seen in:
>
>- Adding a Catalog - Spark Quickstart [1]
>- Adding Catalogs - Spark Getting Started [2]
>
> Since we generally advise against using Hadoop catalogs in production
> environments, I believe it would be beneficial to replace these examples
> with ones that use the JDBC catalog. The JDBC catalog, configured with a
> local SQLite database file, offers similar convenience but aligns better
> with production best practices.
>
> I've created an issue [3] and a PR [4] to address this. Please take a
> look, and I'd love to hear your thoughts on whether this is a direction we
> want to pursue.
>
> Best,
> Kevin Liu
>
> [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
> [2]
> https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
> [3] https://github.com/apache/iceberg/issues/11284
> [4] https://github.com/apache/iceberg/pull/11285
>
>


[DISCUSS] Iceberg Rust Sync Meeting

2024-10-09 Thread Xuanwo
Hi,

I'm starting this thread to explore the idea of hosting an Iceberg Rust Sync 
Meeting. In this meeting, we will discuss recent major changes, pending PR 
reviews, and features in development. It will offer a space for Iceberg Rust 
contributors to connect and become familiar with each other, helping us 
identify and remove contribution barriers to the best of our ability.

Details about this meeting:

I suggest hosting our meeting at the same time of day, but one week earlier 
than the Iceberg Sync Meeting. For example, if the Iceberg Sync Meeting is 
scheduled for Thursday, October 24, 2024, from 00:00 to 01:00 GMT+8, the 
Iceberg Rust Sync Meeting would take place one week before, on Thursday, 
October 17, 2024, from 00:00 to 01:00 GMT+8.

I also suggest using the same Google Meet code (if possible) so we don't get 
confused.

These meetings will not be recorded, but I will take notes in a Google Doc, 
similar to what we do in the Iceberg Sync Meeting.

What are your thoughts? I'm open to other options as well.

Xuanwo

https://xuanwo.io/


Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-09 Thread Renjie Liu
I would also vote for the JDBC catalog, ideally using SQLite as the backend
since it doesn't require setting up other databases.
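
A JDBC/SQLite quickstart along these lines could look roughly like the
following Spark configuration sketch. The catalog name and paths are
illustrative, and it assumes the SQLite JDBC driver is available on the
classpath alongside the Iceberg runtime JAR.

```properties
# Sketch only: JDBC catalog backed by a local SQLite file, no extra services.
spark.sql.catalog.demo            org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.type       jdbc
spark.sql.catalog.demo.uri        jdbc:sqlite:file:/tmp/iceberg_catalog.db
spark.sql.catalog.demo.warehouse  /tmp/warehouse
```

This keeps the quickstart self-contained: the catalog state lives in a single
local file, with no server to run.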

On Thu, Oct 10, 2024 at 8:42 AM Manu Zhang  wrote:

> I'd vote for JDBC catalog as it's simple for a quick-start guide. Setting
> up a REST service with a Docker image could be cumbersome.
> We can have another page for REST Catalog.
>
> Regards,
> Manu
>
> On Thu, Oct 10, 2024 at 2:50 AM Marc Cenac
>  wrote:
>
>> I support the idea of updating the docs to replace the Hadoop catalog
>> example, but I'm wondering why not use a REST Catalog example instead?  I
>> saw Ajantha proposed adding Docker images for a REST Catalog adapter [1], so
>> we could potentially use this with a JDBC Catalog backed by a SQLite file as
>> a convenient quickstart example that shows a REST Catalog configuration.
>> I'm thinking the REST Catalog would be preferred to the JDBC catalog as a
>> best practice, since it's technology-agnostic (on the server side) and the
>> protocol allows for more advanced functionality (e.g. multi-table commits,
>> credential vending, etc.).
>>
>> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>
>> On Tue, Oct 8, 2024 at 1:18 PM Kevin Liu  wrote:
>>
>>> Hi all,
>>>
>>> I wanted to bring up a suggestion regarding our current documentation.
>>> The existing examples for Iceberg often use the Hadoop catalog, as seen in:
>>>
>>>- Adding a Catalog - Spark Quickstart [1]
>>>- Adding Catalogs - Spark Getting Started [2]
>>>
>>> Since we generally advise against using Hadoop catalogs in production
>>> environments, I believe it would be beneficial to replace these examples
>>> with ones that use the JDBC catalog. The JDBC catalog, configured with a
>>> local SQLite database file, offers similar convenience but aligns better
>>> with production best practices.
>>>
>>> I've created an issue [3] and a PR [4] to address this. Please take a
>>> look, and I'd love to hear your thoughts on whether this is a direction we
>>> want to pursue.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
>>> [2]
>>> https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
>>> [3] https://github.com/apache/iceberg/issues/11284
>>> [4] https://github.com/apache/iceberg/pull/11285
>>>
>>>


Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-09 Thread Renjie Liu
+1 for sync meeting for iceberg rust.

> These meetings will not be recorded.


I think we have meeting records for catalog meetings and community sync, so
we should also record this?

For time, I would suggest moving it one hour ahead, e.g. 23:00 to 00:00
GMT+8, so that it's a little more friendly to people in Asia?

On Wed, Oct 9, 2024 at 10:50 PM Kevin Liu  wrote:

> +1 on sync meeting for iceberg rust. I want to get involved and catch up
> on the recent developments. For reference, here's the doc we've been using
> for the pyiceberg sync
> https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U
>
> Best,
> Kevin
>
> On Wed, Oct 9, 2024 at 5:30 AM Xuanwo  wrote:
>
>> Hi,
>>
>> I'm starting this thread to explore the idea of hosting an Iceberg Rust
>> Sync Meeting. In this meeting, we will discuss recent major changes,
>> pending PR reviews, and features in development. It will offer a space for
>> Iceberg Rust contributors to connect and become familiar with each other,
>> helping us identify and remove contribution barriers to the best of our
>> ability.
>>
>> Details about this meeting:
>>
>> I suggest hosting our meeting at the same time of day, but one week
>> earlier than the Iceberg Sync Meeting. For example, if the Iceberg Sync
>> Meeting is scheduled for Thursday, October 24, 2024, from 00:00 to 01:00
>> GMT+8, the Iceberg Rust Sync Meeting would take place one week before, on
>> Thursday, October 17, 2024, from 00:00 to 01:00 GMT+8.
>>
>> I also suggest using the same Google Meet code (if possible) so we don't
>> get confused.
>>
>> These meetings will not be recorded, but I will take notes in a Google
>> Doc, similar to what we do in the Iceberg Sync Meeting.
>>
>> What are your thoughts? I'm open to other options as well.
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>>
>


Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-09 Thread Manu Zhang
I'd vote for JDBC catalog as it's simple for a quick-start guide. Setting
up a REST service with a Docker image could be cumbersome.
We can have another page for REST Catalog.

Regards,
Manu

On Thu, Oct 10, 2024 at 2:50 AM Marc Cenac 
wrote:

> I support the idea of updating the docs to replace the Hadoop catalog
> example, but I'm wondering why not use a REST Catalog example instead?  I
> saw Ajantha proposed adding Docker images for a REST Catalog adapter [1], so
> we could potentially use this with a JDBC Catalog backed by a SQLite file as
> a convenient quickstart example that shows a REST Catalog configuration.
> I'm thinking the REST Catalog would be preferred to the JDBC catalog as a
> best practice, since it's technology-agnostic (on the server side) and the
> protocol allows for more advanced functionality (e.g. multi-table commits,
> credential vending, etc.).
>
> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>
> On Tue, Oct 8, 2024 at 1:18 PM Kevin Liu  wrote:
>
>> Hi all,
>>
>> I wanted to bring up a suggestion regarding our current documentation.
>> The existing examples for Iceberg often use the Hadoop catalog, as seen in:
>>
>>- Adding a Catalog - Spark Quickstart [1]
>>- Adding Catalogs - Spark Getting Started [2]
>>
>> Since we generally advise against using Hadoop catalogs in production
>> environments, I believe it would be beneficial to replace these examples
>> with ones that use the JDBC catalog. The JDBC catalog, configured with a
>> local SQLite database file, offers similar convenience but aligns better
>> with production best practices.
>>
>> I've created an issue [3] and a PR [4] to address this. Please take a
>> look, and I'd love to hear your thoughts on whether this is a direction we
>> want to pursue.
>>
>> Best,
>> Kevin Liu
>>
>> [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
>> [2]
>> https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
>> [3] https://github.com/apache/iceberg/issues/11284
>> [4] https://github.com/apache/iceberg/pull/11285
>>
>>


Re: Iceberg View Spec Improvements

2024-10-09 Thread Walaa Eldin Moustafa
Thanks Ryan and everyone who left feedback on the doc. Let me clarify a few
things.

"Improving the spec" also includes making the implicit assumptions
explicitly stated in the spec.

Explicitly stating the assumptions is discussed under the "Portable table
identifiers" section in the doc. I am on board with that direction.

I think this section encodes the suggestions shared by Steven and Russell as
well as the suggestion shared by you, and a couple more actually to ensure
it is comprehensive/unambiguous. I will reiterate the assumptions below. If
folks think we could go with those assumptions, I can create a PR to
reflect them on the spec.

* Engines must share the same default catalog names, ensuring that
partially specified SQL identifiers with catalog omitted are resolved to
the same fully specified SQL identifier across all engines.
* Engines must share the same default namespaces, ensuring that SQL
identifiers without catalog and namespace are resolved to the same fully
specified SQL identifier across all engines.
* All engines must resolve a fully specified SQL identifier to the same
storage table in the same storage catalog.
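
As a concrete (hypothetical) illustration of why these assumptions matter,
consider a view whose body was authored in one engine. The catalog, namespace,
and table names below are invented for the example:

```sql
-- View SQL stored in the view metadata; identifiers are SQL identifiers.
CREATE VIEW analytics.daily AS
SELECT id, ts FROM prod.db.events;   -- fully specified: catalog "prod"

-- Under the assumptions above, every engine replaying this SQL must resolve
-- "prod" to the same storage catalog and "prod.db.events" to the same
-- storage table; otherwise the view silently changes meaning across engines.
-- Likewise, an unqualified "SELECT ... FROM events" only stays portable if
-- all engines share the same default catalog and namespace.
```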

Thanks,
Walaa.


Re: [DISCUSS] Defining a concept of "externally owned" tables in the REST spec

2024-10-09 Thread Dennis Huo
Summarizing discussion from today's Iceberg Catalog Community Sync, here
were some of the key points:

   - General agreement on the need for some flavors of mechanisms for
   catalog federation in-line with this proposal
   - We should come up with a more fitting name for the endpoint other than
   "notifications"
  - Some debate over whether to just add behaviors to updateTable or
  registerTable endpoints; ultimately agreed that the behavior of these
  tables is intended to be fundamentally different, and want to avoid
   accidentally dangerous implementations, so it's better to have a
   different endpoint
   - The idea of "Notifications" in itself is too general for this
   purpose; we might want something in the future that is more in line
   with much more generalized notifications, and we don't want a conflict
  - This endpoint focuses on the semantic of "force-update" without a
  standard Iceberg commit protocol
   - The endpoint should potentially be a "bulk endpoint" since the use
   case is more likely to want to reflect batches at a time
  - Some debate over whether this is strictly necessary, and whether
  there would be any implicit atomicity expectations
  - For this use case the goal is explicitly *not* to perform a
   heavyweight commit protocol, so a bulk API is just an optimization to
   avoid making a bunch of individual calls; some or all of the requests
   in the bulk request could succeed or fail
   - The receiving side should not have structured failure modes relating
   to out-of-sync state -- e.g. the caller should not be depending on response
   state to determine consistency on the sending side
  - This was debated with pros/cons of sending meaningful response
  errors
  - Pro: Useful for the caller to receive some amount of feedback to
  know whether the force-update made it through, whether there are other
  issues preventing syncing, etc
   - Con: This is likely a slippery slope of scope creep that still
   fundamentally only partially addresses failure modes; instead, the
   overall system must be designed for idempotency of declared updated
   state, and if consistency is desired, the caller must not rely only on
   responses to reconcile state anyway
   - We want to separate out the discussion of the relative merits of a
   push vs pull model of federation, so the merits of pull/polling/readthrough
   don't preclude adding this push-based endpoint
  - In-depth discussion of relative pros/cons, but agreed that one
  doesn't necessarily preclude the other, and this push endpoint targets a
  particular use case
   - Keep the notion of "external tables" only "implicit" instead of having
   to plumb a new table type everywhere (for now?)
  - We could document the intended behavior of tables that come into
  existence from this endpoint having a different "ownership" semantic than
   those created by createTable/registerTable, but the REST spec itself doesn't
  necessarily need to expose any specific syntax/properties/etc about these
  tables
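
To make the "bulk force-update without a commit protocol" semantic concrete,
a request body for such an endpoint might look roughly like the following.
This shape is entirely hypothetical — the endpoint name, field names, and
structure are exactly what is still under discussion:

```json
{
  "updates": [
    {
      "identifier": {"namespace": ["db"], "name": "events"},
      "metadata-location": "s3://bucket/db/events/metadata/00007-abc.metadata.json"
    },
    {
      "identifier": {"namespace": ["db"], "name": "clicks"},
      "metadata-location": "s3://bucket/db/clicks/metadata/00012-def.metadata.json"
    }
  ]
}
```

Per the discussion above, each entry would be force-applied independently
(some may succeed and some fail), and callers must treat the declared state
as idempotent rather than relying on per-entry responses for consistency.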

Thanks everyone for the inputs to the discussion! Please feel free to chime
in here if I missed anything or got anything wrong from today's discussion.


On Fri, Sep 20, 2024 at 9:05 PM Dennis Huo  wrote:

> Thanks for the input, Christian!
>
> I agree a comprehensive solution would likely require some notion of
> pull-based approaches (and even federated read-through on-demand). I see
> some pros/cons to both push and pull approaches, and it seems in part to
> relate to:
>
>- Whether only the "reflecting catalog" is an Iceberg REST server, or
>only the "owning catalog" is an Iceberg REST server, or both
>- Whether it's "easier" to put the complexity of
>connection/credential/state management in the "owning catalog" or in the
>"reflecting catalog"
>
> Though the "push" approach glosses over some of the potential complexity
> on the "owning catalog" side, it does seem like a more minimal starting
> point that doesn't require any additional state or data model within the
> Iceberg REST server, but can still be useful as a building block even where
> integrations aren't necessarily formally defined via the REST spec. For
> example, for a single-tenant internal deployment of an Iceberg REST server
> whose goal is to reflect a subset of a large legacy Hive metastore (which
> is the "owning" catalog in this case) where engines are using the Iceberg
> HiveCatalog, it may not be easy to retrofit a shim to expose a compatible
> "/changes" endpoint, but might be possible to add a
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
> with some one-off code that pushes to an Iceberg REST endpoint.
>
> You raise a good point about failure scenarios though; this flavor of
> federation doesn't provide strong con

[PROPOSAL] Partially Loading Metadata - LoadTable V2

2024-10-09 Thread Haizhou Zhao
Hello Dev List,


I want to bring this proposal to discussion:


https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit#heading=h.uad1lm906wz4



It proposes a new LoadTable API (branded LoadTableV2 at the moment) on REST
spec that allows partially loading table metadata. The motivation is to
stabilize and optimize Spark write workloads, especially on Iceberg tables
with big metadata (e.g. due to huge list of snapshot/metadata log,
complicated schema, etc.). We want to leverage this proposal to reduce
operational and monetary cost of Iceberg & REST catalog usage, and achieve
higher commit frequencies (DDL & DML included) on top of Iceberg tables
through the REST catalog.
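
As a rough sketch of the idea (the endpoint shape and the `snapshots`
parameter below are hypothetical — the actual API is what the linked doc
proposes to discuss):

```
GET /v1/{prefix}/namespaces/db/tables/events?snapshots=refs

Hypothetical semantics: return only the metadata reachable from branch and
tag refs, omitting the full snapshot history and metadata log, so that a
writer committing against a single branch avoids downloading and parsing
very large table metadata on every load.
```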



Looking forward to hearing feedback and discussions.


Thank you,

Haizhou


Re: Iceberg View Spec Improvements

2024-10-09 Thread rdb...@gmail.com
+1 for Steven's comment. There is already an implicit assumption that the
catalog names are consistent across engines. The best practice is to not
reference identifiers across catalogs, but there isn't much we can do about
the assumption here without rewriting SQL to fully qualify identifiers.

On Tue, Oct 8, 2024 at 11:16 PM Walaa Eldin Moustafa 
wrote:

> Hi Steven,
>
> Assumption 1 in "Portable SQL table identifiers" states:
>
> *All engines resolve a fully specified SQL identifier x.y.z to the same
> storage table identifier b.c in the same catalog a.*
>
> I think this assumption encodes the 4th assumption you shared. Assuming
> "x.y.z" resolves to "b.c" in storage catalog "a" across all engines is
> true, the following is also true:
>
> 1- When resolving to the same storage table identifier, the same catalog
> name must be used -- This is encoded by the fact that SQL catalog part is
> common as "x". (this addresses the first part of the 4th assumption you
> shared).
>
> 2- The mapping from "x.y.z" to "b.c" in storage catalog "a" can be done by
> federation, but that is an implementation detail. As long as we maintain
> the constraint that "x.y.x" is resolved consistently across engines to the
> same storage table, implementing this by federation or something else is
> irrelevant. (this addresses the second part of the 4th assumption you
> shared).
>
> Further, the Assumption 1 above is more comprehensive since it prescribes
> how to go about all of the SQL catalog part, namespace and table name, not
> only the catalog part.
>
> Thanks,
> Walaa.
>
> On Tue, Oct 8, 2024 at 9:41 PM Steven Wu  wrote:
>
>> Walaa, it doesn't seem to me that the doc captured Russell's idea. There
>> could be a new assumption
>> 4. If the catalog name is part of the table identifier, it should be
>> consistent across engines. Catalog federation can achieve the
>> normalization/standardization of the catalog names
>>
>>
>>
>> On Tue, Oct 8, 2024 at 6:17 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Just opened Comment access to the doc. Link here again for convenience
>>> [1].
>>>
>>> [1]
>>> https://docs.google.com/document/d/1e5orD_sBv0VlNNLZRgUtalVUllGuztnAGTtqo8J0UG8/edit
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>> On Tue, Oct 8, 2024 at 10:42 AM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
 Thanks Steven! I think this fits in the framework of "portable table
 identifiers" in the doc. I have stated the assumptions that should be added
 to the Iceberg spec in that case (in the doc they are more abstract/generic
 than the version you shared). Would be great to provide your feedback on
 the assumptions in the doc.

 Thanks,
 Walaa.


 On Tue, Oct 8, 2024 at 9:40 AM Steven Wu  wrote:

> I'd like to follow up on Russell's suggestion of using a federated
> catalog for resolving the catalog name/alias problem. I think Russell's
> idea
> is that the federated catalog standardizes the catalog names (for
> referencing). That could solve the problem.
>
> There are two cases:
> (1) single catalog: there is no need to include catalog name in the
> table identifier.
> (2) multiple catalogs (backends): the view and storage table should be
> defined in a federated catalog. the references to source tables should
> include the source catalog names, which are standardized by the federated
> catalog.
>
> Thanks,
> Steven
>
>
> On Mon, Oct 7, 2024 at 11:16 PM Walaa Eldin Moustafa <
> wa.moust...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> As part of our discussions on the Materialized View (MV) spec, the
>> topic of "SQL table identifiers" has been a constant source of 
>> complexity.
>> After several iterations, the community has agreed not to use SQL table
>> identifiers in the table-side representation of MVs. However, that still
>> does not preclude referencing SQL table identifiers in views since they 
>> are
>> integral to view definitions. Therefore, it’s crucial to properly design
>> this aspect of the spec in order to improve the view spec as well as
>> unblock the progress on the MV spec.
>>
>> I’ve outlined the current gaps in the view spec along with some
>> proposed ways to address them in this document [1]. It would be great to
>> get your feedback so we can simplify future discussions around views and
>> materialized views.
>>
>> Looking forward to hearing your thoughts.
>>
>> [1]
>> https://docs.google.com/document/d/1e5orD_sBv0VlNNLZRgUtalVUllGuztnAGTtqo8J0UG8/edit
>>
>> Thanks,
>> Walaa
>>
>>


[Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-09 Thread Ajantha Bhat
Hi everyone,
Since 1.7.0 is still a few weeks away,
how about releasing version 1.6.2 with just the Avro version update?
The current Avro version in 1.6.1 (1.11.3) has a recently reported CVE:
CVE-2024-47561 [2].
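
Concretely, the change would be a one-line bump in the version catalog
referenced in [1]. The target version shown here is an assumption — it should
be whichever Avro release contains the fix for CVE-2024-47561:

```toml
# gradle/libs.versions.toml (sketch)
avro = "1.11.4"   # was 1.11.3; 1.11.4 assumed to carry the CVE fix
```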

I'm happy to coordinate and be the release manager for this.
[1]
https://github.com/apache/iceberg/blob/8e9d59d299be42b0bca9461457cd1e95dbaad086/gradle/libs.versions.toml#L28
[2] https://lists.apache.org/thread/c2v7mhqnmq0jmbwxqq3r5jbj1xg43h5x

- Ajantha


[DISCUSS] [PyIceberg] Use of asserts to "programming the negative space"

2024-10-09 Thread André Luis Anastácio
Hello Everyone,

I would like to open a discussion about using "assert" in some functions to 
promote a more defensive programming approach, ensuring that certain 
assumptions in our code are always validated.

The intention here is to propose a recommendation, not a strict rule. What are 
your thoughts on this?
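
To make the suggestion concrete, here is a minimal sketch of the style being
proposed. The function and field names are invented for illustration — this
is not existing PyIceberg code:

```python
def project_row(row: dict, fields: list) -> dict:
    """Return a copy of `row` restricted to `fields`.

    Asserts here "program the negative space": they state conditions that
    must never occur, rather than handling them as expected errors.
    """
    # Preconditions: callers must uphold these; a violation is a bug.
    assert fields, "fields must not be empty"
    assert all(f in row for f in fields), "row is missing a projected field"

    projected = {f: row[f] for f in fields}

    # Postcondition: the result contains exactly the requested fields.
    assert set(projected) == set(fields)
    return projected
```

One caveat worth stating in any recommendation: Python strips `assert`
statements when run with `python -O`, so they suit internal invariants only —
validation of user-facing input should still raise explicit exceptions.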

In the Java implementation repository, we have some code that follows this 
approach in Scala [1]. I'm not very familiar with Scala, so I'm not sure 
if this is a common pattern, but I believe we could improve the quality of our 
Python code by adopting a similar approach.
You can find a reference discussing this approach here:
https://ratfactor.com/cards/tiger-style

[1] 
https://github.com/search?q=repo%3Aapache%2Ficeberg+assert++language%3AScala&type=code&l=Scala

Best regards,

André Anastácio