Re: Spec change for multi-arg transform

2024-01-28 Thread YE
Thanks for Micah and Ryan's reply. As Szehon already pointed out, this change is to allow the creation of *new* multi-arg transforms. I remember there was a discussion in the Google doc about whether to target this as a `V3` spec change; it turns out that we may support this as long as we make sure old writer

Re: Spec change for multi-arg transform

2024-01-28 Thread YE
I have finished the `bucketV2` spec change. WDYT, Ryan? And BTW, since supporting it in the V1 spec might introduce additional overhead, I am aiming to support this in V2 by enabling a specific table property. On Mon, Jan 29, 2024 at 11:27 YE wrote: > Thanks for Micah and Ryan's reply. > > As Szehon al

[VOTE] Release Apache Iceberg 1.2.0 RC1

2023-03-13 Thread Jack Ye
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.2.0 release. The commit ID is e340ad5be04e902398c576f431810c3dfa4fe717 * This corresponds to the tag: apache-iceberg-1.2.0-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-1.2.0-rc1 * https://gi

Re: [VOTE] Release Apache Iceberg 1.2.0 RC1

2023-03-19 Thread Jack Ye
+1 (binding) - validated signature, license, checksum, RAT - ran unit and AWS integration tests using Java 8 and 11 - ran manual tests with EMR 6.10 Spark 3.3 for read write operations, Glue integrations, branching and tagging related features Best, Jack Ye On Sun, Mar 19, 2023 at 8:06 AM

Re: [DISCUSS] Iceberg Community Guidelines

2023-03-19 Thread Jack Ye
Ye On Sun, Mar 19, 2023 at 10:21 AM Ryan Blue wrote: > +1 for the guidelines. > > I also like the idea of a jobs channel! > > On Fri, Mar 17, 2023 at 7:18 PM wrote: > >> Looks good to me , although for recruiting I think maybe we should have a >> dedicated jo

Re: [RESULT][VOTE] Release Apache Iceberg 1.2.0 RC1

2023-03-20 Thread Jack Ye
The vote result is: +1: 4 binding, 9 non-binding +0: 0 binding, 0 non-binding -1: 0 binding, 0 non-binding Therefore, the release candidate is passed! Best, Jack Ye On Mon, Mar 20, 2023 at 10:10 AM Russell Spitzer wrote: > +1 (binding) > - Validated sig, license, checksum, rat > -

[ANNOUNCE] Apache Iceberg release 1.2.0

2023-03-22 Thread Jack Ye
ution. This release can be downloaded from: https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-1.2.0/apache-iceberg-1.2.0.tar.gz Java artifacts are available from Maven Central. Thanks to everyone for contributing! Best, Jack Ye

Re: [Discuss] Allow all users who have Committed to the project to run CI without Approval

2023-03-29 Thread Jack Ye
+1 for "Only requires approval first time", thank you for submitting the ticket Russell! Best, Jack Ye On Wed, Mar 29, 2023 at 8:13 AM YoungXinLer-邮箱 <524022...@qq.com.invalid> wrote: > +1 for "Only requires approval first time" > > >

Re: C++/Rust SDK sync

2023-04-10 Thread Jack Ye
will loop in related people and see what time slots work for them. I will reply with our preferred time slots and a list of people with emails to invite later this week. Cheers, Jack Ye On Fri, Apr 7, 2023 at 5:47 AM Driesprong, Fokko wrote: > Hi Jan, > > Thanks for raising this, and

Re: Welcome new PMC members!

2023-04-12 Thread Jack Ye
Congratulations to everyone! Best, Jack Ye On Wed, Apr 12, 2023 at 10:13 AM Steve Zhang wrote: > Congratulations everyone! > > Thanks, > Steve Zhang > > > > On Apr 11, 2023, at 9:46 PM, Eduard Tudenhoefner > wrote: > > Congrats to everyone! > > On We

Re: [DISCUSS] Dropping Spark 2.4 support

2023-04-13 Thread Jack Ye
+1 for dropping 2.4 support Best, Jack Ye On Thu, Apr 13, 2023 at 10:59 AM Fokko Driesprong wrote: > Hi all, > > I'm working on moving to Hadoop 3.x > <https://github.com/apache/iceberg/pull/7114>, and one thing is that it > seems to be incompatible with Spark 2.4.

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-22 Thread Jack Ye
Would it be an option to use the --release flag to control the release target version, and publish two versions of the library to Maven, one for JDK 8 and one for JDK 11? Jack On Fri, Apr 21, 2023 at 5:17 PM Ryan Blue wrote: > Looks like Hive isn't quite done migrating to Java 11: > https://issues.apache.o

Re: [DISCUSS] Spark 3.1 support?

2023-04-22 Thread Jack Ye
Here was the original lifecycle of engine version support guideline we came up with: https://iceberg.apache.org/multi-engine-support/#current-engine-version-lifecycle-status I think we can at least mark 3.1 support as deprecated, which matches the situation here that "People who are still interest

Re: Seeking Input on Handling Ambiguity in Generating Changelogs

2023-04-23 Thread Jack Ye
enough, or might eventually correct itself. If we follow this logic, whether to throw an exception could be controlled by a config, just as in CDC we have upsert mode as a specific mode to turn on. Otherwise, people developing a change data feed based on this might be blocked by such an error until the tabl

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-24 Thread Jack Ye
gt;> 8, then publishing some artifacts for JDK 11 would still mean only using >>> JDK 8 features. The source version is what we care about more, so if we >>> can't change it then we can't really do anything else. >>> >>> On Sat, Apr 22, 2023 a

Improve Change Data Capture Use Case for Iceberg

2023-04-26 Thread Jack Ye
meetings to discuss our thoughts about this topic afterwards. Doc link: https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit# Best, Jack Ye

Re: Sequence number for ContentFiles

2023-04-26 Thread Jack Ye
+1 for using file sequence number. This work has been discussed for a long time but never got picked up; it would be great if someone could drive it to completion. -Jack On Wed, Apr 26, 2023 at 12:03 PM Anton Okolnychyi wrote: > Will a clock skew cause any issues w.r.t. relying on the snapshot commi

Re: SQL Syntax for Time Travel on a Branch?

2023-04-26 Thread Jack Ye
We probably want to document these two different behaviors, and what we think is the correct expected behavior, on the website. The question about time travel in a branch has come up quite often since the related feature was publicly released. If some users really want the ancestor-based behavior, it is t
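To make the two behaviors concrete, here is a minimal Java sketch on a simplified, hypothetical snapshot model (the `Snapshot` record and both method names are illustrative, not Iceberg's API). `asOf` walks parent pointers from the branch head, which is one reading of the ancestor-based behavior mentioned above; `asOfTableHistory` instead picks the latest snapshot anywhere in the table history, which can land on a snapshot that is not on the branch.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

public class BranchTimeTravel {
  // Hypothetical, simplified snapshot: id, parent id (null for the first snapshot), commit time.
  record Snapshot(long id, Long parentId, long timestampMillis) {}

  // Ancestor-based lookup: start at the branch head and follow parent pointers until
  // reaching a snapshot committed at or before asOfMillis.
  static Optional<Snapshot> asOf(Map<Long, Snapshot> snapshots, long branchHeadId, long asOfMillis) {
    Snapshot current = snapshots.get(branchHeadId);
    while (current != null) {
      if (current.timestampMillis() <= asOfMillis) {
        return Optional.of(current);
      }
      current = current.parentId() == null ? null : snapshots.get(current.parentId());
    }
    return Optional.empty();
  }

  // For contrast (hypothetical): ignore ancestry and take the latest snapshot in the
  // whole table that was committed at or before asOfMillis.
  static Optional<Snapshot> asOfTableHistory(Map<Long, Snapshot> snapshots, long asOfMillis) {
    return snapshots.values().stream()
        .filter(s -> s.timestampMillis() <= asOfMillis)
        .max(Comparator.comparingLong(Snapshot::timestampMillis));
  }

  public static void main(String[] args) {
    // main: 1 -> 2 -> 4; branch: 1 -> 3 (committed between 2 and 4)
    Map<Long, Snapshot> snaps = Map.of(
        1L, new Snapshot(1, null, 100),
        2L, new Snapshot(2, 1L, 200),
        3L, new Snapshot(3, 1L, 250),
        4L, new Snapshot(4, 2L, 300));
    // On the branch (head 3) at t=220, snapshot 2 is not an ancestor, so snapshot 1 is chosen.
    System.out.println(asOf(snaps, 3, 220));
    // The table-history lookup at t=220 picks snapshot 2, which is not on the branch.
    System.out.println(asOfTableHistory(snaps, 220));
  }
}
```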

Re: Seeking Input on Handling Ambiguity in Generating Changelogs

2023-04-26 Thread Jack Ye
ult be like following two rows after the dedupe? Can you share any docs > or implementations for the dedupe process? > (1, 'a', DELETE) > (1, 'd', INSERT) > > Best, > > Yufei > > > On Sun, Apr 23, 2023 at 12:00 AM Jack Ye wrote: > >> When t
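One way to read the dedupe in Yufei's question is as computing net changes per identifier: keep the first pre-image and the last post-image, and drop pairs that cancel out. The sketch below assumes the input rows for each id are already ordered by commit, with the DELETE of an update preceding its INSERT; `NetChanges` and `Change` are hypothetical names, and this is not the implementation discussed in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class NetChanges {
  enum Op { DELETE, INSERT }

  // Hypothetical changelog row: identifier column `id`, payload `data`, change type `op`.
  record Change(int id, String data, Op op) {}

  // The first change for an id, if it is a DELETE, is the pre-image (the row existed
  // before the window); the last change, if it is an INSERT, is the post-image (the row
  // still exists afterwards). An identical pre-image and post-image is a carry-over.
  static List<Change> net(List<Change> ordered) {
    Map<Integer, Change> first = new LinkedHashMap<>();
    Map<Integer, Change> last = new LinkedHashMap<>();
    for (Change c : ordered) {
      first.putIfAbsent(c.id(), c);
      last.put(c.id(), c);
    }
    List<Change> result = new ArrayList<>();
    for (Map.Entry<Integer, Change> e : first.entrySet()) {
      Change pre = e.getValue().op() == Op.DELETE ? e.getValue() : null;
      Change lastChange = last.get(e.getKey());
      Change post = lastChange.op() == Op.INSERT ? lastChange : null;
      if (pre != null && post != null && Objects.equals(pre.data(), post.data())) {
        continue; // unchanged row: drop the carry-over pair
      }
      if (pre != null) {
        result.add(pre);
      }
      if (post != null) {
        result.add(post);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Row 1 is updated a -> b -> c -> d within the window being read.
    List<Change> log = List.of(
        new Change(1, "a", Op.DELETE), new Change(1, "b", Op.INSERT),
        new Change(1, "b", Op.DELETE), new Change(1, "c", Op.INSERT),
        new Change(1, "c", Op.DELETE), new Change(1, "d", Op.INSERT));
    System.out.println(net(log));
    // prints [Change[id=1, data=a, op=DELETE], Change[id=1, data=d, op=INSERT]]
  }
}
```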

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-26 Thread Jack Ye
g both JDK8 and JDK11 given the complexity we need to >> add. Our build is already complicated. >> >> Jack’s idea of using JDK11 with --release flag may be worth exploring. >> >> - Anton >> >> On Apr 24, 2023, at 10:11 AM, Jack Ye wrote: >> >>

Re: Java 1.3.0 around mid May?

2023-04-28 Thread Jack Ye
Looking at the milestone https://github.com/apache/iceberg/milestone/26, most of the PRs are making good progress except for https://github.com/apache/iceberg/issues/7449. Given that many people are looking forward to using Spark 3.4 and Flink 1.17, it's probably worth having a quick release.

Re: Welcome new committers and PMC!

2023-05-09 Thread Jack Ye
Congratulations everyone! Well deserved! Best, Jack Ye On Fri, May 5, 2023 at 7:35 PM Jun H. wrote: > Congratulations, Amogh, Eduard, and Szehon! > > On May 5, 2023, at 12:37 AM, Mingliang Liu wrote: > > > Congrats! All well deserved. > > On Thu, May 4, 2023 at 11:5

Re: Iceberg transaction support with spark sql

2023-05-26 Thread Jack Ye
You can find an example here: https://iceberg.apache.org/docs/latest/branching/#audit-branch -Jack On Fri, May 26, 2023 at 1:55 AM Gaurav Agarwal wrote: > Any examples of these scenarios, WAP or branch and merge support? > > On Fri, May 26, 2023, 3:14 AM Russell Spitzer > wrote: > >> We also have

Re: 👋 Intro and question for the community

2023-05-30 Thread Jack Ye
Seems like a valuable and interesting product to use! Are there any restrictions on the Apache side on using such a product integration? Is it a free product for us to use? Best, Jack Ye On Tue, May 30, 2023 at 9:23 AM Brian Olsen wrote: > Great question! > > I asked the same questions to Co

Re: 👋 Intro and question for the community

2023-05-30 Thread Jack Ye
age but I > would mostly be using the one in Tabular. So the PMC one would be a shared > way to manage a view of the community for PMC work vs DevRel work. > > Does that make sense? > > On Tue, May 30, 2023 at 11:57 AM Jack Ye wrote: > >> Seems like a valuable and intere

Additional indexes for data files

2023-06-06 Thread Jack Ye
r should we extend Puffin to support these additional data-file-level indexes? I definitely want to do a more concrete design around this topic, but would first like to hear some general ideas on this subject. Best, Jack Ye

Re: How to perform read/write operation for iceberg table present AWS Glue Catalog

2023-06-20 Thread Jack Ye
g connection: https://trino.io/docs/current/connector/iceberg.html#glue-catalog In addition, you can use PyIceberg to connect Glue Catalog tables to Python engines and libraries: https://py.iceberg.apache.org/configuration/#glue-catalog Please let me know if you have any questions. Best, Jack Ye On

Re: How to perform read/write operation for iceberg table present AWS Glue Catalog

2023-06-21 Thread Jack Ye
> Please provide the access for it. > > > > Thanks and Regards > > Anush > > *From:* Jack Ye > *Sent:* Tuesday, June 20, 2023 10:55 PM > *To:* dev@iceb

Re: [VOTE] Release PyIceberg 0.4.0 RC2

2023-06-27 Thread Jack Ye
+1 (binding) Verified checksum, signature, license, test, test-s3. Ran basic checks for Glue catalog, also verified the row filter issue is fixed: [image: Screenshot 2023-06-27 at 10.55.47 PM.png] Best, Jack Ye On Tue, Jun 27, 2023 at 10:25 PM Jean-Baptiste Onofré wrote: > +1 (non bind

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-19 Thread Jack Ye
+1. Let me know when the meeting is! -Jack On Wed, Jul 19, 2023 at 8:54 AM Russell Spitzer wrote: > I would love to be involved if possible. I'm a bit short on time though > but can definitely contribute async time to planning. > > On Wed, Jul 19, 2023 at 9:35 AM Jean-Baptiste Onofré > wrote:

Discuss support for EXTERNAL/MANAGED semantics

2023-11-20 Thread Jack Ye
scussion remains vendor-neutral. Best, Jack Ye

Re: Invitation to contribute to OneTable

2023-12-05 Thread Jack Ye
me to envision a long-term roadmap that does not eventually make it a table format, with connectors and data maintenance features built directly against this internal model, which kind of feels like what the commercial entity OneHouse is trying to do right now, but maybe I am wrong. What do you thin

Re: Discuss support for EXTERNAL/MANAGED semantics

2023-12-05 Thread Jack Ye
ed to connect to external systems for read purposes. I think this would be quite a powerful feature to add to the community. Please let me know what you think! Best, Jack Ye On Tue, Dec 5, 2023 at 4:02 PM Ryan Blue wrote: > Thanks for including a clear rationale for the proposal. I

Re: Invitation to contribute to OneTable

2023-12-06 Thread Jack Ye
think it's debatable whether advertising other projects is helpful >> or wanted here, but I'd rather not add to the noise either way. >> >> Ryan >> >> On Tue, Dec 5, 2023 at 8:36 PM Jack Ye wrote: >> >>> I recently did an analysis of the OneT

Re: [DISCUSS] JUnit5 and parameterized testing

2023-12-07 Thread Jack Ye
Looking at the referenced open issue on the JUnit 5 side, it seems like the community is at least actively working on a @ParameterizedContainer solution. In that case I would +1 option 2, since we would have an easy path to move to the official class-level annotation when it is ready. -Jack On Thu,

Re: [DISCUSS] Run GC with Catalog or Tables

2023-12-07 Thread Jack Ye
Running GC across the entire catalog has always been something I wanted to explore, because of one particular benefit: people typically use one S3 bucket for all tables in a catalog, so you can JOIN the union of the all-files metadata tables of every table against the S3 inventory list
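The catalog-wide GC idea boils down to an anti-join between the inventory and everything any table references. A minimal sketch, assuming the referenced paths of every table and the S3 inventory have already been loaded into memory; a real job would run this as a distributed JOIN and would also need to normalize path schemes such as `s3://` vs `s3a://`. `CatalogOrphanFiles`, `InventoryEntry`, and the age cutoff parameter are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CatalogOrphanFiles {
  // Hypothetical S3 inventory row: object path plus last-modified time.
  record InventoryEntry(String path, long lastModifiedMillis) {}

  // Anti-join: inventory objects that no table in the catalog references and that are
  // older than the cutoff (newer files may belong to commits still in flight).
  static List<String> findOrphans(List<Set<String>> referencedPerTable,
                                  List<InventoryEntry> inventory,
                                  long olderThanMillis) {
    Set<String> referenced = referencedPerTable.stream()
        .flatMap(Set::stream)
        .collect(Collectors.toSet());
    return inventory.stream()
        .filter(e -> e.lastModifiedMillis() < olderThanMillis)
        .map(InventoryEntry::path)
        .filter(p -> !referenced.contains(p))
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Set<String>> referenced = List.of(
        Set.of("s3://bucket/db/t1/data/a.parquet"),
        Set.of("s3://bucket/db/t2/data/b.parquet"));
    List<InventoryEntry> inventory = List.of(
        new InventoryEntry("s3://bucket/db/t1/data/a.parquet", 1_000L),
        new InventoryEntry("s3://bucket/db/t2/data/b.parquet", 1_000L),
        new InventoryEntry("s3://bucket/db/t2/data/stale.parquet", 1_000L),
        new InventoryEntry("s3://bucket/db/t2/data/in-flight.parquet", 9_000L));
    // Only stale.parquet is both unreferenced and older than the cutoff.
    System.out.println(findOrphans(referenced, inventory, 5_000L));
  }
}
```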

Re: [DISCUSS] Run GC with Catalog or Tables

2023-12-07 Thread Jack Ye
snapshot expiration relatively lightweight, and a batch execution could be less risky. -Jack On Thu, Dec 7, 2023 at 10:50 AM Jack Ye wrote: > Running GC across the entire catalog has always been something I want to > explore, because of one particular benefit: > > People typically use o

Isolation Analysis for Iceberg Multi-Table Transaction

2023-12-08 Thread Jack Ye
! Best, Jack Ye

Re: Proposal for REST APIs for Iceberg table scans

2023-12-11 Thread Jack Ye
Hi Ryan, thanks for the feedback! I was a part of this design discussion internally and can provide more details. One reason for separating the CreateScan operation was to make the API asynchronous and thus keep HTTP communications short. Consider the case where we only have the GetScanTasks API, and

Re: Proposal for RESTful Data Operations

2023-12-11 Thread Jack Ye
> The proposal is to roll back rewrite commits, but that's already possible with the much simpler API that exists today. Based on my understanding of the proposal, I think it's more about the possibility of enabling other approaches that do not require a full rollback. It's just that currently we implemented

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
tc). >> >> Looking forward to discussing this tomorrow in the community sync >> <https://iceberg.apache.org/community/#iceberg-community-events>! >> >> Kind regards, >> Fokko >> >> >> >> Op ma 11 dec 2023 om 19:05 schreef Jack Ye : >&

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
After looking around, it seems that, compared to OpenAPI, the AsyncAPI specification (https://www.asyncapi.com/) could be a better option to describe streaming APIs. That might be one potential option; just putting it out here. -Jack On Wed, Dec 13, 2023 at 11:52 AM Jack Ye wrote: > The current propo

Re: Proposal for RESTful Data Operations

2023-12-13 Thread Jack Ye
esides rolling back a commit? >> And why does that require 5 extra routes and metadata writes from the REST >> service? >> >> On Mon, Dec 11, 2023 at 11:27 AM Jack Ye wrote: >> >>> > The proposal is to roll back rewrite commits, but that's already >>

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
sumption on the service and creating responses that are too > large or take too long, but we can get around that by responding with a > code that instructs the client to use the CreateScan API like 413 > (Payload too large). I think that would allow simple clients to function > for all but r
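The fallback described in the quoted reply can be sketched as client-side control flow: try a single planning call, and switch to the multi-step flow when the service answers 413. The `ScanService` interface, its method names, and the shard-based fetch are hypothetical stand-ins for the proposed CreateScan/GetScanTasks endpoints, not the draft spec.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanPlanningClient {
  // Hypothetical result of a one-shot planning call: HTTP status plus tasks (on 200).
  record PlanResponse(int status, List<String> tasks) {}

  // Hypothetical service surface; method names are illustrative only.
  interface ScanService {
    PlanResponse planInline(String table);     // all tasks in one response, or 413
    List<String> createScan(String table);     // starts planning, returns shard ids
    List<String> getScanTasks(String shardId); // fetches the tasks for one shard
  }

  static List<String> planScan(ScanService service, String table) {
    PlanResponse inline = service.planInline(table);
    if (inline.status() == 200) {
      return inline.tasks();
    }
    if (inline.status() != 413) {
      throw new IllegalStateException("Unexpected status " + inline.status());
    }
    // 413 Payload Too Large: fall back to the CreateScan + GetScanTasks flow.
    List<String> tasks = new ArrayList<>();
    for (String shardId : service.createScan(table)) {
      tasks.addAll(service.getScanTasks(shardId));
    }
    return tasks;
  }

  public static void main(String[] args) {
    // Fake service: "small" plans inline, anything else is too large.
    ScanService fake = new ScanService() {
      public PlanResponse planInline(String table) {
        return table.equals("small")
            ? new PlanResponse(200, List.of("task-0"))
            : new PlanResponse(413, List.of());
      }
      public List<String> createScan(String table) {
        return List.of("shard-a", "shard-b");
      }
      public List<String> getScanTasks(String shardId) {
        return List.of(shardId + "/task-0", shardId + "/task-1");
      }
    };
    System.out.println(planScan(fake, "small"));
    System.out.println(planScan(fake, "big"));
  }
}
```

A simple client only needs the first branch; the 413 path keeps large scans working without forcing every client to implement the asynchronous flow up front.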

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
://www.ietf.org/archive/id/draft-ietf-httpbis-safe-method-w-body-02.html>, > but I guess it isn't finalized yet? > > Just read them like you would a POST request that doesn't actually create > anything. > > On Wed, Dec 13, 2023 at 3:45 PM Jack Ye wrote: > >>

Pagination for List APIs in the REST spec

2023-12-14 Thread Jack Ye
enforce that the response returned for the paginated GET should be deterministic. Any thoughts on this? Best, Jack Ye

Re: [Discuss] Spark 3.2 support?

2023-12-14 Thread Jack Ye
+1 -Jack On Wed, Dec 13, 2023 at 11:25 PM Eduard Tudenhoefner wrote: > +1 on removing Spark 3.2 > > On Wed, Dec 13, 2023 at 8:01 PM Jean-Baptiste Onofré > wrote: > >> +1 >> >> Regards >> JB >> >> On Wed, Dec 13, 2023 at 7:10 PM Ajantha Bhat >> wrote: >> > >> > Hi All, >> > >> > In our recent

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Jack Ye
and offset would be >>>> preferred. If skipping someplace in the middle of the namespaces is >>>> required then I would suggest modelling those as first class query >>>> parameters (e.g. "startAfterNamespace") >>>> >>>> Cheers, &
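The `startAfterNamespace` suggestion in the quoted reply pairs naturally with the determinism requirement raised in this thread. A minimal sketch, assuming names are served in sorted order and the continuation token is simply the last name returned (a real service would likely make the token opaque); `NamespacePagination` and `Page` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Optional;
import java.util.TreeSet;

public class NamespacePagination {
  // One page of results plus an optional continuation token for the next request.
  record Page(List<String> items, Optional<String> nextToken) {}

  // Serves names in sorted order; the token means "start after this name". Because the
  // order is deterministic and the cursor only moves forward, a name that exists for the
  // whole listing is returned exactly once, even if other names are created or dropped
  // between requests (names created behind the cursor are simply not seen).
  static Page listPage(NavigableSet<String> allNames, Optional<String> pageToken, int pageSize) {
    NavigableSet<String> candidates = pageToken.isPresent()
        ? allNames.tailSet(pageToken.get(), false)
        : allNames;
    List<String> items = new ArrayList<>();
    for (String name : candidates) {
      if (items.size() == pageSize) {
        // At least one more name remains, so hand back a token.
        return new Page(items, Optional.of(items.get(items.size() - 1)));
      }
      items.add(name);
    }
    return new Page(items, Optional.empty());
  }

  public static void main(String[] args) {
    TreeSet<String> names = new TreeSet<>(List.of("ns_c", "ns_a", "ns_e", "ns_b", "ns_d"));
    Optional<String> token = Optional.empty();
    do {
      Page page = listPage(names, token, 2);
      System.out.println(page.items());
      token = page.nextToken();
    } while (token.isPresent());
  }
}
```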

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Jack Ye
;>>>>>> list namespaces) I think it is OK to have looser requirements. >>>>>>> >>>>>>> If we want to enforce that level of atomicity, we probably want to >>>>>>>> introduce another time travel query parameter (e.g. &g

Re: Proposal for REST APIs for Iceberg table scans

2023-12-19 Thread Jack Ye
e internally but send it as part of the shard ID. >> >> Shape of shard payload >> >> I think we have 2 general options depending on how strict we want to be. >> >>1. Require a standard shard definition >>2. Allow arbitrary JSON and leave it to the service

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Jack Ye
able model. For example, we could add a "policy" field that contains sub-fields like the table's basic access permission (READ/WRITE/ADMIN), authorized columns, data filters, etc. I am not sure if Iceberg needs its own policy spec though; that might go a bit too far. Any thoughts? Be

Re: [PROPOSAL] Improvement on our PR flows

2024-01-03 Thread Jack Ye
+1, sounds like a good idea to clean up stale PRs. -Jack On Wed, Jan 3, 2024 at 9:52 AM Russell Spitzer wrote: > I definitely need something to keep emailing me, so I support this. > > On Wed, Jan 3, 2024 at 7:52 AM Jean-Baptiste Onofré > wrote: > >> Hi guys, >> >> We have several examples whe

Re: [DISCUSS] Iceberg community summit

2024-01-12 Thread Jack Ye
Thanks for continuing the effort! I definitely would like to volunteer if possible! Best, Jack Ye On Fri, Jan 12, 2024 at 9:49 AM Ryan Blue wrote: > Hi everyone, > > We've been having discussions about how to put together an Iceberg > conference or summit for this year and

Re: [ANNOUNCE] New committer: Honah J.

2024-01-12 Thread Jack Ye
Congratulations! Thanks for all the work in Python! Best, Jack Ye On Fri, Jan 12, 2024 at 1:11 PM Fokko Driesprong wrote: > On behalf of the Iceberg PMC, I'm happy to announce that Honah has > accepted an invitation to become a committer on Apache (Py)Iceberg. > Welcome, and than

Re: Proposed PyIceberg logo art

2024-01-16 Thread Jack Ye
with the related licensing information. A related question: do we also plan to create logos for the other projects, iceberg-rust and iceberg-go? Best, Jack Ye On Tue, Jan 16, 2024 at 1:09 AM Jean-Baptiste Onofré wrote: > That's a good point. That said, I don't see it as blocker as th

Re: Process for creating new Proposals

2024-01-17 Thread Jack Ye
+1, and we should add a new template for that in https://github.com/apache/iceberg/tree/main/.github/ISSUE_TEMPLATE. Best, Jack Ye On Wed, Jan 17, 2024 at 10:12 AM Yufei Gu wrote: > +1 Thanks Jan! > Yufei > > > On Wed, Jan 17, 2024 at 3:40 AM Brian Olsen > wrote: > &g

Re: Gravitino an Iceberg REST catalog service

2024-01-25 Thread Jack Ye
implementation in TestRESTCatalog. Should we think about moving those components out of the tests into an iceberg-rest-server module for this use case, and merging with the implementation that Gravitino has? Best, Jack Ye On Thu, Jan 25, 2024 at 10:47 AM Yufei Gu wrote: > Thanks Justin for the shar

Re: Gravitino an Iceberg REST catalog service

2024-01-25 Thread Jack Ye
s` for their intended purpose. It > has opinions about how to run the actual HTTP service and people that agree > can use it. Other people could use `CatalogHandlers` to build on a > different foundation. > > On Thu, Jan 25, 2024 at 11:13 AM Jack Ye wrote: > >> Really cool

Partition column order in rewrite manifests

2024-01-30 Thread Jack Ye
column, which is not the first partition column, but actually the column that is read most frequently across queries. Translated to code, this means we can benefit from something like: `SparkActions.get().rewriteManifests(table).sort("b", "a").execute()` Any thoughts? Best, Jack Ye
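The benefit of sorting by `b` first can be illustrated with a toy model. This sketch is hypothetical and does not call Iceberg's actions API: it sorts data-file entries by a chosen partition-column order, chunks them into manifests, and counts how many manifests a planner could not skip for a filter on `b` using per-manifest partition bounds (Iceberg manifest lists record such bounds as partition field summaries). `ManifestClustering`, `FileEntry`, and the chunk size are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ManifestClustering {
  // Hypothetical manifest entry: a data file with two partition values, a and b.
  record FileEntry(String path, int a, int b) {}

  // Sorts entries by the requested partition-column order, then chunks them into
  // manifests of at most entriesPerManifest entries each.
  static List<List<FileEntry>> rewrite(List<FileEntry> entries, Comparator<FileEntry> order,
                                       int entriesPerManifest) {
    List<FileEntry> sorted = new ArrayList<>(entries);
    sorted.sort(order);
    List<List<FileEntry>> manifests = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i += entriesPerManifest) {
      manifests.add(List.copyOf(sorted.subList(i, Math.min(i + entriesPerManifest, sorted.size()))));
    }
    return manifests;
  }

  // Counts manifests whose [min(b), max(b)] range contains the value, i.e. manifests that
  // cannot be skipped for the filter b = value using per-manifest bounds alone.
  static long manifestsOverlappingB(List<List<FileEntry>> manifests, int value) {
    return manifests.stream()
        .filter(m -> m.stream().mapToInt(FileEntry::b).min().getAsInt() <= value
            && m.stream().mapToInt(FileEntry::b).max().getAsInt() >= value)
        .count();
  }

  public static void main(String[] args) {
    List<FileEntry> entries = new ArrayList<>();
    for (int a = 0; a < 4; a++) {
      for (int b = 0; b < 4; b++) {
        entries.add(new FileEntry("file-" + a + "-" + b, a, b));
      }
    }
    Comparator<FileEntry> byAThenB = Comparator.comparingInt(FileEntry::a).thenComparingInt(FileEntry::b);
    Comparator<FileEntry> byBThenA = Comparator.comparingInt(FileEntry::b).thenComparingInt(FileEntry::a);
    // With 4 entries per manifest, a filter on b = 2 touches 4 manifests vs 1.
    System.out.println(manifestsOverlappingB(rewrite(entries, byAThenB, 4), 2));
    System.out.println(manifestsOverlappingB(rewrite(entries, byBThenA, 4), 2));
  }
}
```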

Re: [PROPOSAL] Create user mailing list ?

2024-01-30 Thread Jack Ye
g list for that purpose. Best, Jack Ye On Tue, Jan 30, 2024 at 8:16 AM Jean-Baptiste Onofré wrote: > AFAIR, some ASF projects are using slackbot to receive users requests > from the mailing list and can send messages to the mailing list. > > Let me do a quick research and get back to

Re: [DISCUSS] Release new Iceberg docs site in the main repository

2024-01-30 Thread Jack Ye
Sorry for the late vote, +1 and thanks for the great work! -Jack On Tue, Jan 30, 2024 at 7:22 AM Eduard Tudenhoefner wrote: > +1, thanks for working on this Brian. > > On Tue, Jan 30, 2024 at 12:02 AM Ryan Blue wrote: > >> It looks like we have lazy consensus, so we'll go ahead with the >> swi

Re: Gravitino an Iceberg REST catalog service

2024-01-30 Thread Jack Ye
for unintended purposes and > it would become a distraction. > > What do you think about using the tests Jar for this? > > On Thu, Jan 25, 2024 at 12:48 PM Jack Ye wrote: > >> Yes, sorry I did not make it clear, I also agree it is not the right >> direction to invest

Re: Partition column order in rewrite manifests

2024-01-30 Thread Jack Ye
Jan 31, 2024 at 7:56 AM wrote: >>> >>>> Sounds like a reasonable thing to add? Maybe we could check cardinality >>>> to pick out the default order as well? >>>> Sent from my iPhone >>>> >>>> On Jan 30, 2024, at 3:50 PM, Jack Ye

Re: Proposal for REST APIs for Iceberg table scans

2024-01-30 Thread Jack Ye
gt; >>> *From: *Renjie Liu >>> *Reply-To: *"dev@iceberg.apache.org" >>> *Date: *Thursday, December 21, 2023 at 10:35 PM >>> *To: *"dev@iceberg.apache.org" >>> *Subject: *RE: [EXTERNAL] Proposal for REST APIs for Iceberg table scans >>> >>

Re: [DISCUSS] iceberg-rust 0.2.0 release

2024-01-31 Thread Jack Ye
Excited about the progress in Rust! +1 for releasing 0.2.0 -Jack On Wed, Jan 31, 2024 at 8:26 AM Ryan Blue wrote: > Thanks, Renjie! It's great to see all of the progress in Rust. I agree > with getting the code released and I'm looking forward to testing it out! > > On Wed, Jan 31, 2024 at 12:3

Re: Proposal for REST APIs for Iceberg table scans

2024-01-31 Thread Jack Ye
're overloading the endpoint to perform two >> distinctly different operations: distribute a plan and scan a plan. >> >> Changing the task-type then changes the behavior and the result. I feel >> it would be more straightforward to separate the distribute and sc

Re: Partition column order in rewrite manifests

2024-01-31 Thread Jack Ye
hema()); > Column partitionColumn = df.col("data_file.partition"); > Dataset transformedDF = manifestTransformFunction(df, > partitionColumn, numManifests); > return writeFunc.apply(transformedDF).collectAsList(); > }); > } > ``

Re: [DISCUSS] Change iceberg-rust CI Settings to only require approval for new github users

2024-01-31 Thread Jack Ye
+1! And I think we should also do that for iceberg-go -Jack On Wed, Jan 31, 2024, 5:42 PM Renjie Liu wrote: > +1 for this. > > It greatly improves the contributor's experience, especially for new and > non experienced contributors. > > On Thu, Feb 1, 2024 at 1:51 AM Fokko Driesprong wrote: > >

Re: Re: Partition column order in rewrite manifests

2024-02-01 Thread Jack Ye
Just created https://github.com/apache/iceberg/issues/9615 to track this. -Jack On Thu, Feb 1, 2024 at 11:43 AM Zach Dischner wrote: > That is a great idea! Would let anyone fine tune the sorting and > colocation behaviors they want for the metadata tree >

Re: [Discuss] Change iceberg-python and iceberg-go CI Settings to only require approval for first time contributors

2024-02-01 Thread Jack Ye
+1 -Jack On Thu, Feb 1, 2024 at 5:35 PM Honah J. wrote: > Hello everyone > > Inspired by our recent discussion regarding iceberg-rust's CI setting, I > am starting this thread to gather feedback on changing the CI settings for > iceberg-python and iceberg-go to only require approvals for new >

Re: [VOTE] Release Apache PyIceberg 0.6.0rc4

2024-02-09 Thread Jack Ye
+1 (binding) Checked license, signature, checksum Ran test and test-s3 in python 3.11 Ran manual testing with Glue catalog Thanks for the work, and happy Chinese New Year! Best, Jack Ye On Fri, Feb 9, 2024 at 4:34 PM Ryan Blue wrote: > +1 (binding) > > - Checked licenses,

Support permission concepts in REST spec

2024-02-13 Thread Jack Ye
e REST spec and related engine integration could increase enterprise adoption and help our vision of standardizing access through the REST interface. Would appreciate any thoughts in this domain! And if we have some general interest in this direction, I can put up a more detailed design doc. Best, Jack Ye

Deprecate DynamodbCatalog

2024-02-14 Thread Jack Ye
implementation and is maintaining their own catalog. Please comment if you have any production dependency or have any concern about deprecating this catalog, and we can discuss the path forward. Best, Jack Ye

Re: [Discuss] add a new task-type to file scan task JSON serialization

2024-02-14 Thread Jack Ye
"? "base" is very specific to Java abstract classes. In fact, StaticDataTask is not really scanning a file anyway, so maybe we should just call these file-scan-task, data-task, etc.? Best, Jack Ye On Wed, Feb 14, 2024 at 4:01 PM Ryan Blue wrote: > Thanks, Steven! Looks like
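A discriminator such as `task-type` lets readers dispatch on a stable string instead of on a Java class name. A minimal sketch using the `file-scan-task` and `data-task` values suggested above; the task records and their single payload field are hypothetical placeholders, not the actual JSON layout.

```java
public class TaskTypeDispatch {
  // Hypothetical task shapes; payloads are placeholders, not the real JSON fields.
  interface ScanTask {}
  record FileScanTask(String dataFilePath) implements ScanTask {}
  record DataTask(String rowsJson) implements ScanTask {}

  // Writer side: choose the value written to the "task-type" field.
  static String taskType(ScanTask task) {
    if (task instanceof FileScanTask) {
      return "file-scan-task";
    }
    if (task instanceof DataTask) {
      return "data-task";
    }
    throw new IllegalArgumentException("Unknown task class: " + task.getClass());
  }

  // Reader side: dispatch on the discriminator and reject values it does not know.
  static ScanTask parse(String taskType, String payload) {
    switch (taskType) {
      case "file-scan-task":
        return new FileScanTask(payload);
      case "data-task":
        return new DataTask(payload);
      default:
        throw new IllegalArgumentException("Unsupported task-type: " + taskType);
    }
  }

  public static void main(String[] args) {
    ScanTask original = new DataTask("[{\"id\": 1}]");
    ScanTask roundTripped = parse(taskType(original), "[{\"id\": 1}]");
    System.out.println(taskType(original) + " " + original.equals(roundTripped));
  }
}
```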

Re: [Discuss] add a new task-type to file scan task JSON serialization

2024-02-14 Thread Jack Ye
sed by Flink for checkpoint state. > so it is not purely a REST API thing. > > @Jack, Ryan also had the same suggestion in the PR comment. I have updated > the naming > > On Wed, Feb 14, 2024 at 4:08 PM Jack Ye wrote: > >> > It would fail if the FileScanTask i

Re: Deprecate DynamodbCatalog

2024-02-15 Thread Jack Ye
will revive this thread to double-check when we plan to remove it after two releases. Best, Jack Ye On Thu, Feb 15, 2024 at 2:33 AM Jean-Baptiste Onofré wrote: > Hi Jack, > > If we have a very low number of users and propose they move to another > catalog, that makes sense to m

Materialized view integration with REST spec

2024-02-16 Thread Jack Ye
t I have overlooked. I would greatly appreciate any thoughts about this! Best, Jack Ye

Re: [VOTE] Release Apache PyIceberg 0.6.0rc6

2024-02-19 Thread Jack Ye
+1 (binding) Checked signature, checksum, license Ran unit and integ tests with Python 3.11 Ran manual tests with Glue catalog Best, Jack Ye On Mon, Feb 19, 2024 at 4:35 AM Fokko Driesprong wrote: > +1 (binding) > > I've checked signatures and checksums, checked the licenses

Re: [VOTE] Release Apache Iceberg Rust 0.2.0 RC1

2024-02-19 Thread Jack Ye
+1 (binding) Verified checksum, signature, license, note, ASF header Ran build and test Checked no unexpected binary files Best, Jack Ye On Mon, Feb 19, 2024 at 2:33 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > I checked: > - checksum and signature are correct > -

Re: Materialized view integration with REST spec

2024-02-19 Thread Jack Ye
we should at >>> least consider directly describing the full metadata of the storage table >>> in Iceberg view, instead of pointing to a JSON file. >> >> >> Do you mean we need to add components like >> `LoadMaterializedViewResponse`, if so, I would +1 f

Re: [ANNOUNCE] Release Apache Iceberg Rust 0.2.0

2024-02-20 Thread Jack Ye
Congratulations on the first release! -Jack On Tue, Feb 20, 2024 at 2:32 AM Driesprong, Fokko wrote: > Hi all, > > The Apache Iceberg Rust community is pleased to announce that Apache > Iceberg Rust 0.2.0 has been released! > > Iceberg is a data access layer that allows users to easily and effi

Re: Materialized view integration with REST spec

2024-02-20 Thread Jack Ye
single thread instead of multiple threads going on at the same time. If we think this format is not effective, I propose that we create a new mv channel in the Iceberg Slack workspace, where interested people can join and discuss all these points directly. What do you think? Best, Jack Ye On Mon, Feb 19

Re: Table Portability Proposal

2024-02-20 Thread Jack Ye
ation. Has this option been considered? I quickly scanned through the linked doc and it does not seem to be discussed, but I might have missed it. Best, Jack Ye On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré wrote: > Hi Ryan > > Ah ok, I thought that an Iceberg release is &

Re: Table Schema History Pruning

2024-02-20 Thread Jack Ye
metadata size. +1 for reopening the issue to discuss further. We should probably also make the title more specific than "The metadata file is too large". Best, Jack Ye On Tue, Feb 20, 2024 at 9:55 AM Sung Yun (BLOOMBERG/ 120 PARK) < syu...@bloomberg.net> wrote: > Hi Barron

Re: Support permission concepts in REST spec

2024-02-20 Thread Jack Ye
de a more detailed doc for us to review. Best, Jack Ye On Fri, Feb 16, 2024 at 10:29 AM Micah Kornfield wrote: > Hi Jack, > I think this is an interesting idea but I think there are some practical > concerns (I posted them inline). > > - general access patterns, like read-only

Re: Proposal for RESTful Data Operations

2024-02-20 Thread Jack Ye
endFiles. >> If you have a chance to review, I would appreciate any additional feedback >> you may have. >> >> https://github.com/apache/iceberg/pull/9292 >> >> Best, >> >> Drew >> >> On Fri, Jan 12, 2024 at 3:40 PM Drew wrote: >> &

Re: [DISCUSS] spec: remove the file scan task JSON serialization section from table spec

2024-02-21 Thread Jack Ye
Were there any prior discussions on the dev list about adding it to the spec? Could you help link those conversations? Thanks, Jack Ye On Wed, Feb 21, 2024 at 1:05 PM Steven Wu wrote: > > In the recent PR review [1], Ryan and emkornfield has raised a question > why file scan task JSON seri

Re: Proposal for RESTful Data Operations

2024-02-21 Thread Jack Ye
uages simpler, but that’s not enough (yet) to convince > me that it is a good idea. > > I think there’s a strong case for append, but for deletes I don’t think we > need or want to go there. I certainly would not want to add delete > endpoints that would be misused. > > Ry

Re: [DISCUSS] spec: remove the file scan task JSON serialization section from table spec

2024-02-21 Thread Jack Ye
that Flink relies on the format to evolve in >> compatible ways across versions. I think that means that we don't make any >> guarantees about how it evolves and it can be safely removed since it is >> not a contract that we are committed to maintaining. >> >>

Re: Materialized view integration with REST spec

2024-02-21 Thread Jack Ye
Thanks everyone for the help in organizing the thoughts! I have moved the summary of everyone's comments here also to the doc that Jan linked under question 0. We can continue to have more discussions there and cast votes! Best, Jack Ye On Wed, Feb 21, 2024 at 12:14 PM Jan Kaul

Re: [VOTE] Release Apache Iceberg 1.5.0 RC3

2024-02-22 Thread Jack Ye
+1 (binding) Checked license, signature, checksum, build, test with Java17 Ran manual test with EMR 7.0 Spark 3.5 and Glue. Best, Jack Ye On Thu, Feb 22, 2024 at 7:58 PM Daniel Weeks wrote: > +1 (binding) > > Verified sigs/sums/license/build/test (Java 17) > > I also did manu

Re: [VOTE] Release Apache Iceberg 1.5.0 RC3

2024-02-23 Thread Jack Ye
with REST/JDBC >> catalog. >> >> Thanks, >> >> Amogh Jahagirdar >> >> On Thu, Feb 22, 2024 at 9:10 PM Jack Ye wrote: >> >>> +1 (binding) >>> >>> Checked license, signature, checksum, build, test with Java17 >>>

Re: Support permission concepts in REST spec

2024-02-26 Thread Jack Ye
l if the chain > of trust includes the ability to enforce constraints. Plus, constraints may > need to be known during job execution, not just at commit time, so it is > better to send them when loading a table. > > Ryan > > On Tue, Feb 20, 2024 at 1:44 PM Jack Ye wrote: &g

Re: Proposal for RESTful Data Operations

2024-02-26 Thread Jack Ye
e to find out what you’d change or what > specific features of an LSM approach you’re looking for that we can’t do > today. Maybe we should set up a time to talk with a group that wants to > work on this area? > > Ryan > > On Wed, Feb 21, 2024 at 3:55 PM Jack Ye wrote: > >&

Re: Deprecate DynamodbCatalog

2024-02-27 Thread Jack Ye
: >>> >>>> +1 for deprecation and removal by 2.0 version. >>>> >>>> - Ajantha >>>> >>>> On Sat, Feb 17, 2024 at 4:23 AM Daniel Weeks wrote: >>>> >>>>> +1 as well for deprecation >>>>> >&g

Re: Materialized view integration with REST spec

2024-02-28 Thread Jack Ye
Thanks Ryan for the help to trace back to the root question! Just a clarification question regarding your reply before I reply further: what exactly does the option "a combination of the two (i.e. commits are combined)" mean? How is that different from "a new metadata type"? -Jack On Wed, Feb

Re: Materialized view integration with REST spec

2024-02-28 Thread Jack Ye
ch, all the arguments you listed for the new metadata type still hold true, and it also "reuses existing metadata definitions" and can "fall back to simple views". -Jack On Wed, Feb 28, 2024 at 5:05 PM Jack Ye wrote: > Thanks Ryan for the help to trace back to the root questio

Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
alog. The same approach can be used to talk to HMS/Glue/JDBC/... while users will only interact with the RESTCatalog as the entry point. I think this can provide a good path forward overall for the catalog consolidation story, interested to know what others think. Best, Jack Ye

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
d cut away from file based metadata >> could bifurcate access to Iceberg data. There are also aspects to the spec >> that reference the metadata paths (like metadata log, though it's >> optional), but would likely need to be addressed. >> >> -Dan >> &

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
ementation doesn't follow the Iceberg spec for commit >> requirements, it's not compliant with the spec. There's no exemption that >> says if you're using REST you don't need to follow the spec. Why do you >> think that's the case? >> >
