Re: [VOTE] Release Apache Iceberg 1.2.0 RC1

2023-03-15 Thread Szehon Ho
Hi, One note, on this release, I ran some simple spark-SQL using a local Spark, like "insert into table select 1". I find that any of these operations now spawns 200 executors and takes a while to finish. |== Physical Plan ==\nAppendData org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strat

Re: [Discuss] Allow all users who have Committed to the project to run CI without Approval

2023-03-29 Thread Szehon Ho
+1 Thanks Szehon On Wed, Mar 29, 2023 at 10:27 AM Eduard Tudenhoefner wrote: > +1 for "Only requires approval first time" > > On Wed, Mar 29, 2023 at 6:32 PM John Zhuge wrote: > >> +1 for "Only requires approval first time" >> >> On Wed, Mar 29, 2023 at 9:03 AM Ajantha Bhat >> wrote: >> >>> >

Re: [VOTE] Release Apache Iceberg 1.2.1 RC2

2023-04-06 Thread Szehon Ho
+1 (non-binding) Verified signature Verified checksum Verified License Built and ran tests Ran simple queries on spark 3.3. Thanks Dan for the release, Szehon On Thu, Apr 6, 2023 at 12:04 PM Daniel Weeks wrote: > Hi Everyone, > > I propose that we release the following RC as the official Apach

Re: Welcome new PMC members!

2023-04-12 Thread Szehon Ho
Nice, congratulations guys! Szehon On Wed, Apr 12, 2023 at 12:35 AM Gidon Gershinsky wrote: > Congrats Fokko, Steven, Yufei! > > Cheers, Gidon > > > On Wed, Apr 12, 2023 at 7:14 AM Ajantha Bhat > wrote: > >> Congratulations to all. >> >> On Wed, Apr 12, 2023 at 8:51 AM OpenInx wrote: >> >>> Co

Re: [Proposal] Partition stats in Iceberg

2023-05-02 Thread Szehon Ho
Yea I agree, I had a handy query for the last update time of a partition. SELECT e.data_file.partition, MAX(s.committed_at) AS last_modified_time FROM db.table.snapshots s JOIN db.table.entries e WHERE s.snapshot_id = e.snapshot_id GROUP BY e.data_file.partition It's a bit lengthy currentl
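To see what the query above computes, here is a toy sketch in Python that runs the same join against an in-memory SQLite database. The table layout is an assumption made for illustration only: `partition_key` stands in for `e.data_file.partition`, the function name is hypothetical, and the rows are made up. It does not reflect how Iceberg stores or exposes its metadata tables.

```python
import sqlite3


def last_modified_per_partition(snapshots, entries):
    """Return (partition_key, latest committed_at) pairs, sorted by partition.

    snapshots: iterable of (snapshot_id, committed_at) tuples
    entries:   iterable of (snapshot_id, partition_key) tuples
    Both are simplified stand-ins for the Iceberg metadata tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE snapshots (snapshot_id INTEGER, committed_at TEXT)")
    conn.execute("CREATE TABLE entries (snapshot_id INTEGER, partition_key TEXT)")
    conn.executemany("INSERT INTO snapshots VALUES (?, ?)", snapshots)
    conn.executemany("INSERT INTO entries VALUES (?, ?)", entries)
    # Same shape as the query in the mail: join each entry to its snapshot,
    # then keep the latest commit time seen for every partition.
    rows = conn.execute(
        """
        SELECT e.partition_key, MAX(s.committed_at) AS last_modified_time
        FROM snapshots s
        JOIN entries e ON s.snapshot_id = e.snapshot_id
        GROUP BY e.partition_key
        ORDER BY e.partition_key
        """
    ).fetchall()
    conn.close()
    return rows
```

ISO-8601 timestamp strings sort lexicographically, which is why `MAX` on a text column is enough in this toy version.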

Re: [Proposal] Partition stats in Iceberg

2023-05-02 Thread Szehon Ho
g forward to the work in the phase 2 implementation. > Let me know if I can help, thanks. > > On Tue, May 2, 2023 at 4:28 PM Szehon Ho wrote: > >> Yea I agree, I had a handy query for the last update time of partition. >> >> SELECT >> >> e.data_file.partition,

Re: tradeoffs between serializable vs snapshot isolation for single writer

2023-05-04 Thread Szehon Ho
Hi, I believe it only matters if you have conflicting commits. For single writer case, I think you are right and it should not matter, so you may save very slightly in performance by turning it to Snapshot Isolation. The checks are metadata checks though, so I would think it will not be a sig

Re: tradeoffs between serializable vs snapshot isolation for single writer

2023-05-04 Thread Szehon Ho
Whoops, I didn’t see Ryan answer already. > On May 4, 2023, at 3:18 PM, Szehon Ho wrote: > > Hi, > > I believe it only matters if you have conflicting commits. For single writer > case, I think you are right and it should not matter, so you may save very > sligh

Re: Welcome new committers and PMC!

2023-05-05 Thread Szehon Ho
Thanks all, really appreciate it, and congrats to Eduard and Amogh ! Szehon On Fri, May 5, 2023 at 12:37 AM Mingliang Liu wrote: > Congrats! All well deserved. > > On Thu, May 4, 2023 at 11:50 PM Eduard Tudenhoefner > wrote: > >> Thanks everyone, and also congrats to Amogh and Szehon! >> >> On

Re: [VOTE] Release Apache Iceberg 1.3.0 RC0

2023-05-24 Thread Szehon Ho
+1 (binding) 1. verify signatures 2. verify checksum 3. verify license documentation 4. build and run tests 5. Ran simple tests on Spark 3.4 - Create simple table and check metadata tables - Ran 'delete from' statement to generate position delete, and run rewrite_position_delete Thanks Szehon On

Re: [DISCUSS] Default format version for new tables?

2023-05-24 Thread Szehon Ho
Hi, I'm +1 to making v2 the default, say after this release. It seems most of the features brought up as concerns on Spark side in the thread Gabor linked have been implemented (like position delete lifecycle). But Anton's point is also good. Even if some delete file features are missing, V2 is

Re: Iceberg old partition gc

2023-06-02 Thread Szehon Ho
I think this violates Iceberg’s assumption of immutable snapshots. That would require modifying the old snapshot to no longer point to those gc’ed data files, else not sure how you can time-travel to read from that snapshot, if some of its files are deleted? That being said, I also had this thoug

Re: Iceberg old partition gc

2023-06-02 Thread Szehon Ho
t doing the delete. You can then recover > the snapshot if you happen to have accidentally TTL'd a partition. > > On Fri, Jun 2, 2023 at 8:51 AM Szehon Ho wrote: > >> I think this violates Iceberg’s assumption of immutable snapshots. That >> would require modifying the

Re: Iceberg old partition gc

2023-06-03 Thread Szehon Ho
den the > metadata system. tagging can extend the history with selective snapshots. > > It seems that you are saying that purging actions of old partitions are > creating new snapshots, which are taking up some space in the snapshot > history. But if snapshot expiration is time based (

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-21 Thread Szehon Ho
Hi, Yea, it's definitely an issue. Fwiw, I was looking at reviving the old effort in Spark to pass in configs dynamically in a Spark SQL statement, which is probably the cleanest solution. (https://github.com/apache/spark/pull/34072 was the old effort, and I made https://github.com/apache/spark/pul

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-26 Thread Szehon Ho
> it. > > Thanks for reviving the effort. > Manu > > Szehon Ho 于2023年6月22日 周四00:45写道: > >> Hi, >> >> Yea, its definitely an issue. >> >> Fwiw, I was looking at reviving the old effort in Spark to pass in >> configs dynamically in Spark SQL s

[DISCUSS] Apache Iceberg Release 1.3.1

2023-07-06 Thread Szehon Ho
Hi, I wanted to start a discussion on whether it's the right time for 1.3.1, a patch release of 1.3.0. It was started based on the issue found by Xiangyang (@ConeyLiu) : https://github.com/apache/iceberg/pull/7931#pullrequestreview-1507935277. Do people have any other bug fixes that should be inc

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-07 Thread Szehon Ho
Thu, Jul 6, 2023 at 9:02 PM Jean-Baptiste Onofré >> wrote: >> >>> Hi, >>> >>> It sounds good to me to have 1.3.1. >>> >>> Thanks ! >>> Regards >>> JB >>> >>> On Fri, Jul 7, 2023 at 12:53 AM Szehon Ho >&g

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-10 Thread Szehon Ho
that we can start backporting those bug fixes. > > Eduard > > On Fri, Jul 7, 2023 at 6:52 PM Szehon Ho wrote: > >> Thanks a lot Eduard! I think https://github.com/apache/iceberg/pull/7933 >> is also a good candidate as well. >> >> Thanks, >> Szehon

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-12 Thread Szehon Ho
eberg/milestones/Iceberg%201.3.1 Thanks Szehon On Mon, Jul 10, 2023 at 11:14 AM Szehon Ho wrote: > Thanks Eduard! Merged all your backport prs, I will commit the last one > probably tomorrow and then we can start the release. > > Thanks > Szehon > > On Sun, Jul 9, 2023 at 11

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-14 Thread Szehon Ho
it would be great to backport this to 1.3.x as > well. > > Kind regards, > Fokko > > On Wed, Jul 12, 2023 at 22:09, Szehon Ho wrote: > >> Hi guys >> >> Just an update on this. Another issue came up about the new 1.3.0 >> function rewrite_position_deletes (

[VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-17 Thread Szehon Ho
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.3.1 release. The commit ID is 62c34711c3f22e520db65c51255512f6cfe622c4 * This corresponds to the tag: apache-iceberg-1.3.1-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-1.3.1-rc1 * https://gi

Re: [VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-24 Thread Szehon Ho
ionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) >> at >> software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) >> ... 23 more >> >> Best, >> >> Yufei >> >

[PASSED][VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-24 Thread Szehon Ho
Szehon On Mon, Jul 24, 2023 at 2:21 PM Szehon Ho wrote: > +1 (binding) > > 1. Verify signatures > 2. Verify checksums > 3. Verify license documentation > 4. Built and ran tests, only failure is TestS3RestSigner > 5. Ran simple queries against Spark 3.4 > > Thanks > S

[ANNOUNCE] Apache Iceberg release 1.3.1

2023-07-25 Thread Szehon Ho
I'm pleased to announce the release of Apache Iceberg 1.3.1! Apache Iceberg is an open table format for huge analytic datasets. Iceberg delivers high query performance for tables with tens of petabytes of data, along with atomic commits, concurrent writes, and SQL-compatible table evolution. This

Re: Proposal to fix the docs - this time it'll be different

2023-07-27 Thread Szehon Ho
Hi, I'm ok with putting things back in the Iceberg repo; it gets more visibility on PRs. I guess it used to be a bit distracting, but now with more projects in Iceberg (pyiceberg, rust) we have to use tags anyway to filter through all the mails. Just wanted to +1 on Fokko/Ryan's suggestion to avoid vers

Table owned locations

2023-08-29 Thread Szehon Ho
Hi all, As you know, there is a recurring Iceberg issue where delete orphan file operations may inadvertently delete another table's data, if the tables are misconfigured to have the same location. A while back, Anton had a proposal for 'owned.locations' in: https://github.com/apache/iceberg/issues/4159

Re: Spec change for multi-arg transform

2024-01-28 Thread Szehon Ho
Hi, This would not be retrofitting existing partition transforms, but just allowing for the creation of new multi-arg transforms. Is the concern that some implementations are never expecting new transforms to be added? Old implementations would indeed not be able to read Iceberg tables created w

Re: Spec change for multi-arg transform

2024-01-30 Thread Szehon Ho
ference is that >>>> for step 2, we typically just build one reference implementation in the >>>> Java library. We do vote on the large spec updates, but in this case you >>>> haven't seen one since we haven't built the reference implementation yet. >>

Re: Spec change for multi-arg transform

2024-01-30 Thread Szehon Ho
Sorry I may have misunderstood the statement and maybe this is specific to multi-arg transform, in any case let's get a spec pr earlier in to discuss/specify behavior for V1-2 vs 3. Thanks Szehon On Tue, Jan 30, 2024 at 9:23 AM Szehon Ho wrote: > Thanks all for the discussion. >

Re: Materialized view integration with REST spec

2024-02-19 Thread Szehon Ho
Hi, Great to see more discussion on the MV spec. Actually, Jan's document "Iceberg Materialized View Spec" has been organized, with a "Design Questions" section to track these debates, and it would be nice to centr

Re: Materialized view integration with REST spec

2024-02-21 Thread Szehon Ho
f we think >>> this format is not effective, I propose that we create a new mv channel in >>> Iceberg Slack workspace, and people interested can join and discuss all >>> these points directly. What do we think? >>> >>> Best, >>> Jack Ye >

Re: Materialized view integration with REST spec

2024-02-22 Thread Szehon Ho
o keep these separate from discussions about single points >>>> so that they can be persisted in the document. >>> >>> >>> Not sure if it helpful, but I added voting chips Question 0, as maybe an >>> easier way to keep track of votes. If it is helpful

Re: Materialized view integration with REST spec

2024-02-29 Thread Szehon Ho
Hi Yes I mostly agree with the assessment. To clarify a few minor points. is a materialized view a view and a separate table, a combination of the > two (i.e. commits are combined), or a new metadata type? For 'new metadata type', I consider mostly Jack's initial proposal of a new Catalog MV o

Re: [VOTE] Release Apache Iceberg 1.5.0 RC4

2024-03-01 Thread Szehon Ho
+1 (binding) - Verified signature - Verified checksum - RAT check - Compiled - Manually ran basic queries on Spark 3.5 On Fri, Mar 1, 2024 at 6:13 AM Fokko Driesprong wrote: > +1 (binding) > > - Checked checksum and signature > - Ran a modified version of dbt-spark to take advantage of the view

Re: New committer: Bryan Keller

2024-03-05 Thread Szehon Ho
Congratulations Bryan, well deserved, great work on Iceberg ! On Tue, Mar 5, 2024 at 8:14 AM Jack Ye wrote: > Congrats Bryan! > > -Jack > > On Tue, Mar 5, 2024 at 7:33 AM Amogh Jahagirdar wrote: > >> Congratulations Bryan! Very well deserved, thank you for all your >> contributions! >> >> On Tu

Re: [VOTE] Release Apache Iceberg 1.5.0 RC6

2024-03-08 Thread Szehon Ho
+1 (binding) * Verified signature * Verified checksum * RAT check * built JDK 11 * Ran basic tests on Spark 3.5 Thanks Szehon On Fri, Mar 8, 2024 at 5:50 PM Amogh Jahagirdar wrote: > +1 non-binding > > Verified signatures,checksums,RAT checks, build, and tests with JDK11. I > also ran ad-hoc t

Re: New committer: Renjie Liu

2024-03-11 Thread Szehon Ho
Congratulations! On Mon, Mar 11, 2024 at 12:43 PM Jack Ye wrote: > Congratulations Renjie! > > Best, > Jack Ye > > On Mon, Mar 11, 2024, 8:24 AM Ryan Blue wrote: > >> Congratulations, Renjie! Thanks for all your contributions! >> >> On Mon, Mar 11, 2024 at 12:52 AM Eduard Tudenhoefner >> wrote

Re: Materialized view integration with REST spec

2024-03-22 Thread Szehon Ho
n 6: New MV spec with table and view metadata >>>>>>>>>>>>> >>>>>>>>>>>>> I originally excluded option 2 because I think it does not >>>>>>>>>>>>> align

Re: Materialized view integration with REST spec

2024-03-22 Thread Szehon Ho
s back? > > On Fri, Mar 22, 2024 at 10:35 AM Szehon Ho > wrote: > >> Hi >> >> My understanding was last time it was still unresolved, and the action >> item was on Jack and/or Jan to make a shorter document. I think the >> debate now has boiled down to Ryan

Re: [VOTE] Release Apache Iceberg 1.5.1 RC0

2024-04-22 Thread Szehon Ho
+1 (binding) * Verify signature * Verify checksum * Verify licenses * Build and run basic test with Spark 3.5 Thanks Szehon On Sun, Apr 21, 2024 at 11:45 PM Ajantha Bhat wrote: > +1 (non-binding) > > * validated checksum and signature > * checked license docs & ran RAT checks > * ran build and

Re: [Proposal] Add support for Materialized Views in Iceberg

2024-04-22 Thread Szehon Ho
+1 for the approach given it reduces the work. On this, as it exposes storage tables to the user catalog, I was mainly thinking we should have a common suffix/naming pattern for storage tables across catalogs. The Netflix approach sounds good to me. Hope we can continue the proposal, as there's still

[Discuss] Geospatial Support

2024-05-01 Thread Szehon Ho
Hi everyone, We have created a formal proposal for adding Geospatial support to Iceberg. Please read the following for details. - Github Proposal : https://github.com/apache/iceberg/issues/10260 - Proposal Doc: https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt2

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
Thanks Walaa for driving it forward, looking forward to thinking about implementation of Materialized Views. I see Jan's point, the PR spec change is similar but does not seem to be completely aligned with the Draft Spec in the design doc: https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
t by now. If we agree, we can continue the > discussion on the PR, else, we can create a doc. > > Thanks, > Walaa. > > > On Thu, May 9, 2024 at 4:39 PM Szehon Ho wrote: > >> Thanks Walaa for driving it forward, looking forward to thinking about >> implementation

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
ent/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABK7e3QB4 > [2] > https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1&disco=AAABIonvCGE > > Thanks, > Walaa. > > > On Thu, May 9, 2024 at 5:49 PM Szehon Ho wro

Re: Materialized Views: Next Steps

2024-05-10 Thread Szehon Ho
apache/spark/blob/2df494fd4e4e64b9357307fb0c5e8fc1b7491ac3/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewInfo.java#L45 > > Thanks, > Walaa. > > On Thu, May 9, 2024 at 11:30 PM Szehon Ho wrote: > >> Hi Walaa >> >> As there may be confus

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-21 Thread Szehon Ho
Hi Naveen, Yes, it sounds like it will help to disable metrics for those columns? Iirc, by default manifest entries have metrics at the 'truncate(16)' level for 100 columns, which as you see can be quite memory intensive. A potential improvement later also is to have the ability to remove counts by

Re: [Discuss] Geospatial Support

2024-05-29 Thread Szehon Ho
two > columns from different data providers. > > To address this we would like to propose including the option to specify > the SRS with only a SRID in phase 1. The query engine may choose to treat > it as an opaque identifier or make a look-up in the EPSG database if > supported. >

Re: [Discuss] Geospatial Support

2024-06-05 Thread Szehon Ho
not many libs >> can parse projjson. >> >> @Szehon Is there a way that we can support both SRID and PROJJSON in Geo >> Iceberg? >> >> It is also worth noting that, although there are many libs that can parse >> SRID and perform look-up in the EPSG database,

Re: [Discuss] Geospatial Support

2024-06-18 Thread Szehon Ho
ote: >> >>> > The min/max stats are discussed in the doc (Phase 2), depending on the >>> non-trivial encoding. >>> >>> Just want to add that min/max stats filtering could be supported by file >>> format natively. Adding geometry type to parquet spec >>

Re: Agenda Community Sync 19th June

2024-06-18 Thread Szehon Ho
Hi guys, The sync is Juneteenth (US federal holiday), so I think some folks on this side may miss, FYI PS (at least from my side) one highlight is the longstanding 1k column bug is finally fixed (at least partially) in https://github.com/apache/iceberg/pull/10020 Thanks Szehon On Tue, Jun 18, 2

Re: Making the NDV property required for theta sketch blobs in Puffin

2024-06-21 Thread Szehon Ho
It makes sense to me, normally changing optional -> required would probably require a version bump, but maybe it is ok here as it is a relatively new format, afaik adopted by Trino which already sets this field, but let's see if anyone disagrees. Thanks Szehon On Fri, Jun 21, 2024 at 3:35 PM huax

Re: Feedback Collection: Bylaws in Iceberg

2024-06-24 Thread Szehon Ho
Hi Also copying my previous response in private. Hi > Thanks Jack for taking the time for this doc. While the Iceberg community > and PMC so far has been one of the most collaborative, and I have > personally the utmost respect for those that laid the groundwork without > which we would not be h

Re: [Discuss] Geospatial Support

2024-06-26 Thread Szehon Ho
ored as a string, Iceberg cannot read it. This should be ok, as we only need this for XZ2 transform, where the user already passes in the info from CRS (up to user to make sure these align). Thanks Szehon On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho wrote: > Jia and I will sync with t

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Szehon Ho
Yes, I was chatting with Yufei about this; at first glance I agree this would be nice to have. I always thought that metadata tables are important enough to spec somewhere, and I think this is a nice place to do it. There seems to be some overlap with existing calls (i.e., you can get snapshots

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Szehon Ho
file removal without removing all the snapshot > information yet. > Please help me understand the reasoning behind these tradeoffs. > > Best > PF > > > > > On Thu, 4 Jul 2024 at 02:26, Szehon Ho <mailto:szehon.apa...@gmail.com>> wrote: >> Yes, I was ch

[DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-05 Thread Szehon Ho
Hi folks, I would like to discuss an idea for an optional extension of Iceberg's Snapshot metadata lifecycle. Thanks Piotr for replying on the other thread that this should be a fuller Iceberg format change. *Proposal Summary* Currently, ExpireSnapshots(long olderThan) purges metadata and delet

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread Szehon Ho
implementations. Also, the type >>> of metadata tracked can differ depending on the use case. For example, >>> while LakeChime retains partition and operation type metadata, it does not >>> track file-level metadata as there was no specific use case for that. >>

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-09 Thread Szehon Ho
t even want to know >> of. If one can expire a snapshot from the middle of the history, that would >> be nice, so users would see only S1/S2/S4. The only downside is that >> reading S2 is less performant than reading S3, but IMHO this could be >> acceptable for having onl

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Szehon Ho
Hi, Just FYI, good news, this change is merged on the Spark side : https://github.com/apache/spark/pull/46707 (it's the third effort!). In the next version of Spark, we will be able to pass read properties via SQL to a particular Iceberg table such as SELECT * FROM iceberg.db.table1 WITH (`locality`

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Szehon Ho
e work on supporting DELETE/UPDATE/MERGE in > the DataFrame API? > Thanks, > Wing Yew > > > On Tue, Jul 9, 2024 at 10:05 PM Szehon Ho wrote: > >> Hi, >> >> Just FYI, good news, this change is merged on the Spark side : >> https://github.com/apache/spark/p

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-11 Thread Szehon Ho
+1 Thanks Szehon On Thu, Jul 11, 2024 at 11:02 AM Daniel Weeks wrote: > +1 (binding) > > On Thu, Jul 11, 2024 at 10:54 AM Anurag Mantripragada > wrote: > >> +1 (non-binding) .Thanks Steve >> >> >> Anurag Mantripragada >> >> On Jul 11, 2024, at 10:27 AM, Yufei Gu wrote: >> >> +1 (binding) Than

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-16 Thread Szehon Ho
lumns). >>> >>> How long are we going to keep the expired snapshot references by >>> default? If it is months/years, it can have major implications on the query >>> performance of metadata tables (like snapshots, all_*). >>> >>> I assume it will also have

Re: [DISCUSS] DROP PARTITION in Spark

2024-07-17 Thread Szehon Ho
Hi Gabor, I'm neutral on this, but can be convinced. My initial thought is that there would be no way to have ADD PARTITION (I assume old Hive workloads would rely on this), and these are not ANSI SQL standard statements as Spark moves to that direction. The second point of guaranteeing a metad

Re: Building with JDK 21

2024-07-22 Thread Szehon Ho
Thanks Piotr for driving this, late +1 to add JDK 21 support and your plan for spotless. It seems ok to me too to bite the bullet and move to newer spotless (disabling spotless for JDK8 builds) post 1.6, but looks like the discussion happened and I'm fine either way. Thanks! Szehon On Mon, Jul 2

Re: Dropping JDK 8 support

2024-07-22 Thread Szehon Ho
+1 for dropping JDK 8 in Iceberg 2.0. I also wonder the same thing as Huaxin (sorry if I missed a previous thread on Iceberg 2.0 plan). Also as Huaxin has discovered in Spark 4.0 Support PR, looks like we may have to drop Java8 first in Spark 4.0 mod

Re: [VOTE] Drop Java 8 support in Iceberg 1.7.0

2024-07-26 Thread Szehon Ho
+1 (binding) Thanks Szehon On Fri, Jul 26, 2024 at 8:55 AM Steven Wu wrote: > +1 (binding) > > I would also suggest keeping the vote open for 7 days for a larger > decision like this. > > > On Fri, Jul 26, 2024 at 8:50 AM Ryan Blue > wrote: > >> +1 >> >> On Fri, Jul 26, 2024 at 8:42 AM Russell

Re: [DISCUSS] Guidelines for committing PRs

2024-07-29 Thread Szehon Ho
Hi, Also if I read it correctly, I think this proposal imposes the following workflows in "spec" folders : 1. Large and functional changes. These redirect to Iceberg improvement proposals, which ends in code-modification vote 2. bug-fixes or clarification, which is specified to require

Re: [DISCUSS] Guidelines for committing PRs

2024-07-29 Thread Szehon Ho
t 1:53 PM Szehon Ho wrote: > Hi, > > Also if I read it correctly, I think this proposal imposes the following > workflows in "spec" folders : > >1. Large and functional changes. These redirect to Iceberg >improvement proposals, which ends in code-modi

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Szehon Ho
Sorry I missed the sync this morning (sick), I'd like to push for geo too. I think on this front as per the last sync, Ryan recommended to wait for Parquet support to land, to avoid having two versions on Iceberg side (Iceberg-native vs Parquet-native). Parquet support is being actively worked on

Re: [DISCUSS] adoption of format version 3

2024-08-06 Thread Szehon Ho
10:19 PM Micah Kornfield < >>>>> emkornfi...@gmail.com> wrote: >>>>> >>>>>> It sounds like most of the opinions so far are waiting for the scope >>>>>> of work to finish before finalizing the specification. >>>>>>

Re: Welcome Péter, Amogh and Eduard to the Apache Iceberg PMC

2024-08-13 Thread Szehon Ho
Congratulations all, very well deserved! Thanks Szehon On Tue, Aug 13, 2024 at 10:25 PM Russell Spitzer wrote: > Hi Y'all, > > It is my pleasure to let everyone know that the Iceberg PMC has voted to > have several talented individuals join us. > > So without further ado, please welcome Péter V

Re: [Discuss] Geospatial Support

2024-08-20 Thread Szehon Ho
). Thanks, Szehon On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho wrote: > Hi > > It was great to meet in person with Snowflake engineers and we had a good > discussion on the paths forward. > > Meeting notes for Snowflake- Iceberg sync. > >- Iceberg proposed Geometry type d

Re: Welcoming Yan Yan as a new committer!

2021-03-24 Thread Szehon Ho
Nice, congratulations! > On 24 Mar 2021, at 11:37, Marton Bod wrote: > > Congratulations, well done! > > On Wed, 24 Mar 2021 at 11:32, Peter Vary wrote: > Congratulations Yan! > >> On Mar 24, 2021, at 05:43, Yufei Gu > > wrote: >> >> Congratulations, Yan! >> >>

Re: Welcoming Ryan Murray as a new committer!

2021-03-29 Thread Szehon Ho
That’s awesome, great work Ryan. Szehon > On 29 Mar 2021, at 18:08, Anton Okolnychyi > wrote: > > Hey folks, > > I’d like to welcome Ryan Murray as a new committer to the project! > > Thanks for all the hard work, Ryan! > > - Anton

Re: Welcoming Russell Spitzer as a new committer

2021-03-29 Thread Szehon Ho
Awesome, well-deserved, Russell! Szehon > On 29 Mar 2021, at 18:10, Holden Karau wrote: > > Congratulations Russel! > > On Mon, Mar 29, 2021 at 9:10 AM Anton Okolnychyi > wrote: > Hey folks, > > I’d like to welcome Russell Spitzer as a new committer to the project! > > Thanks for all your

Re: Spark configuration on hive catalog

2021-04-22 Thread Szehon Ho
Hi Huadong, nice to see you again :). The syntax in spark-sql is 'insert into .. …'; here you defined your db as a catalog? You just need to define one catalog and use it when referring to your table. > On 22 Apr 2021, at 07:34, Huadong Liu wrote: > > Hello Iceberg Dev, > > I am not sure

Re: Welcoming OpenInx as a new PMC member!

2021-06-29 Thread Szehon Ho
Congrats Zheng! > On 29 Jun 2021, at 14:02, Anton Okolnychyi > wrote: > > Well deserved! Congrats! > >> On 29 Jun 2021, at 13:56, Jack Ye > > wrote: >> >> Congratulations!!! >> >> On Tue, Jun 29, 2021 at 1:55 PM Ryan Murray > > wrote: >> C

Re: Welcoming Jack Ye as a new committer!

2021-07-05 Thread Szehon Ho
Congratulations Jack! > On 5 Jul 2021, at 16:53, Jun H. wrote: > > Congratulations! > > >> On Jul 5, 2021, at 4:14 PM, Russell Spitzer >> wrote: >> >>  >> Congratulations! >> >> On Mon, Jul 5, 2021 at 3:21 PM karuppayya > > wrote: >> Congratulations Jack!

Re: Iceberg 0.12.0 Release Plan

2021-07-19 Thread Szehon Ho
>3. #2284 Core: reassign the partition field IDs and reuse any existing >ID <https://github.com/apache/iceberg/pull/2284>s > > #2284 is in review. > > Ryan said he would take a look at #2308. > > @Szehon Ho, can you please confirm whether or not > you'r

Serializable isolation for insert overwrites?

2021-07-20 Thread Szehon Ho
Hi, Does anyone know if it's feasible to consider making Spark's "insert overwrite" implement serializable transaction, like delete, update, merge? Maybe at least for "overwrite by filter", then it can narrow down the conflict checks needed on the commitWithSerializableTransaction side. I don't h

Re: Serializable isolation for insert overwrites?

2021-07-20 Thread Szehon Ho
ck and I can point you in the > right direction. > > Ryan > > On Tue, Jul 20, 2021 at 4:20 PM Szehon Ho wrote: > >> Hi, >> >> Does anyone know if its feasible to consider making Spark's "insert >> overwrite" implement serializable transaction,

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-05 Thread Szehon Ho
+1 (non-binding) * Verify Signature Keys * Verify Checksum * dev/check-license * Build * Run tests (though some timeout failures, on Hive MR test..) Thanks Szehon On Thu, Aug 5, 2021 at 2:23 PM Daniel Weeks wrote: > +1 (binding) > > I verified sigs/sums, license, build, and test > > -Dan > > O

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
adoop > configurations using configs prefixed with > "spark.sql.catalog.(catalog-name).hadoop." > - one of my contributions to this release that has been asked about by > several customers internally > - tested using `spark.sql.catalog.(catalog-name).hadoop.fs.
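As a rough illustration of the prefix quoted above, the helper below builds the Spark property key for a Hadoop setting scoped to a single catalog. The function name is hypothetical (it is not part of Iceberg or Spark), and the catalog name and Hadoop key in the usage note are placeholders chosen for the example.

```python
def catalog_hadoop_conf_key(catalog_name, hadoop_key):
    """Build 'spark.sql.catalog.<catalog-name>.hadoop.<hadoop-key>'.

    Hypothetical helper for illustration: it only formats the key and does
    not talk to Spark or validate that the Hadoop property exists.
    """
    if not catalog_name or not hadoop_key:
        raise ValueError("catalog_name and hadoop_key must be non-empty")
    return f"spark.sql.catalog.{catalog_name}.hadoop.{hadoop_key}"
```

For example, `catalog_hadoop_conf_key("my_catalog", "fs.s3a.endpoint")` yields a key that could be passed as a `--conf` entry when starting Spark, assuming a catalog named `my_catalog` has been configured.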

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
Got it, I somehow thought changes were manually cherry-picked, thanks for clarification. Thanks Szehon > On 9 Aug 2021, at 13:34, Ryan Blue wrote: > > Szehon, I think that should make it because the RC will come from master. > > On Mon, Aug 9, 2021 at 12:56 PM Szehon Ho w

Re: Subject: [VOTE] Release Apache Iceberg 0.12.0 RC3

2021-08-10 Thread Szehon Ho
+1 (non binding) * Checked Signature Keys * Verified Checksum * Rat checks * Build and run tests, most functionality pass (also timeout errors on Hive-MR) Thanks Szehon On Tue, Aug 10, 2021 at 1:40 AM Ryan Murray wrote: > +1 (non-binding) > > * Verify Signature Keys > * Verify Checksum > * dev

Re: Iceberg python library sync

2021-08-12 Thread Szehon Ho
+1, would love to listen in as well Thanks, Szehon > On 12 Aug 2021, at 12:48, Arthur Wiedmer > wrote: > > Hi Jun, > > Please add me as well! > > Best, > Arthur > > > > On Thu, Aug 12, 2021 at 12:19 AM Jun H. > wrote: > Hi everyone, > > Since early this year,

Re: [DISCUSS] Iceberg roadmap

2021-09-10 Thread Szehon Ho
Hi I also missed the last sync, and wanted to add two things if possible. Thanks, Szehon Priority 2: - Core: Predicate pushdown for remaining Metadata tables [medium] - Core/Spark: Support serializable isolation for ReplacePartitions / Insert Overwrite [medium] On Fri, Sep 10, 2021 a

Re: Welcome new PMC members!

2021-11-18 Thread Szehon Ho
Awesome, congratulations Jack and Russell! > On 18 Nov 2021, at 09:30, Ryan Murray wrote: > > Congratulations both! Well deserved! > > On Thu, 18 Nov 2021, 09:19 Omar Al-Safi, > wrote: > Congrats both of you! > > On Thu, Nov 18, 2021 at 8:31 AM Eduard Tudenhoefner

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Hi, The manifest entries are one per data file or delete file, so it depends on how many data files/delete files your table has. Number of files is controlled mostly by the parallelism of the job that writes the table, though there are Iceberg RewriteDataFile utilities that can compact as well (as in y

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
of > `manifest_file` structs. > > Is there a general order-of-magnitude target number of `manifest_file` > structs? Presumably that would dictate when one would want to merge > manifest files and/or data files. > > Thanks again! > ggg > > > On Fri, Jan 7, 2022 at 1

Re: [VOTE] Release Apache Iceberg 0.13.0 RC2

2022-01-30 Thread Szehon Ho
+1 (non-binding) Verified signature Verified checksum Rat check Built and ran test, all succeed, after some temporary local HMS timeout Tested relevant jar with Spark 3.2, created various tables and ran queries Thanks Szehon On Fri, Jan 28, 2022 at 12:19 PM Russell Spitzer wrote: > +1 > All te

Re: Getting last modified timestamp/other stats per partition

2022-02-23 Thread Szehon Ho
Hi, Probably the metadata tables can help with this. For the size/num_rows of partitions, you can query the files table, https://iceberg.apache.org/docs/latest/spark-queries/#files. (Because Iceberg keeps stats for files, and not necessarily partitions). SELECT partition, sum(file_size_in_bytes),
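The idea of rolling per-file stats up to partition level can be sketched with a toy example in Python, using an in-memory SQLite table in place of the Iceberg files table. The flat `partition_key` column, the function name, and the sample rows are assumptions for illustration; only `file_size_in_bytes` comes from the query in the mail, and `record_count` is added here as the per-file row count column.

```python
import sqlite3


def partition_stats(files):
    """Aggregate per-file stats into per-partition totals.

    files: iterable of (partition_key, file_size_in_bytes, record_count).
    Returns (partition_key, total_bytes, total_rows, file_count) tuples,
    sorted by partition_key.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "partition_key TEXT, file_size_in_bytes INTEGER, record_count INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", files)
    # One output row per partition: sizes and row counts are summed,
    # and COUNT(*) gives the number of files in that partition.
    rows = conn.execute(
        """
        SELECT partition_key,
               SUM(file_size_in_bytes),
               SUM(record_count),
               COUNT(*)
        FROM files
        GROUP BY partition_key
        ORDER BY partition_key
        """
    ).fetchall()
    conn.close()
    return rows
```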

Re: Getting last modified timestamp/other stats per partition

2022-03-07 Thread Szehon Ho
; >> >> *From:* Mayur Srivastava >> *Sent:* Thursday, February 24, 2022 7:27 AM >> *To:* dev@iceberg.apache.org >> *Subject:* RE: Getting last modified timestamp/other stats per partition >> >> >> >> Thanks Szehon. I’ll give this a try. >> >&g

Re: Welcome Szehon Ho as a committer!

2022-03-11 Thread Szehon Ho
ufei Gu >>>> <mailto:flyrain...@gmail.com>> wrote: > >>>>> > >>>>> Congratulations Szehon! > >>>>> Best, > >>>>> > >>>>> Yufei > >>>>>

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-28 Thread Szehon Ho
Hi, For gpg verify KEYS I get: gpg: Can't check signature: No public key I imported latest keys and do see key for : uid Russell Spitzer (CODE SIGNING KEY) sub rsa4096 2022-05-26 [E] but maybe no public key? Maybe I am missing something obvious. Also wanted to ask, can we get this

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-29 Thread Szehon Ho
eEE15k0XH39/ZCYPikR8XEqs0YkO > wdFeyrBN22jtT48jMJ4IFw4odabqOqBn6Wazx3tBg0ZMTxn/i2H4tHpe78RIj/7Z > 7eLhkMY0meA64TMBCc0aS3ffCnJzetWOSpgjv9o= > =gy3b > -----END PGP PUBLIC KEY BLOCK----- > > > > On May 28, 2022, at 2:04 PM, Szehon Ho wrote: > > Hi > > For gpg verify

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-29 Thread Szehon Ho
On the other topic, the pr for 0.13 branch is merged: https://github.com/apache/iceberg/pull/4890, my preference will be to include this in new RC to solve the aforementioned issue : https://github.com/apache/iceberg/issues/4718. Thanks, Szehon On Sun, May 29, 2022 at 2:59 PM Szehon Ho wrote

Re: [VOTE] Release Apache Iceberg 0.13.2 RC1

2022-06-06 Thread Szehon Ho
+1 (non-binding) 1. Verified signatures 2. Verified checksums 3. RAT checks 4. Build and test 5. Tested with Spark 3.2, create a table and run a few queries Thanks Szehon On Mon, Jun 6, 2022 at 10:46 AM Daniel Weeks wrote: > +1 (binding) > > verified sigs/sums/license/build/tes
