Re: There is no easy way to secure Iceberg data. How can we improve?

2025-01-03 Thread Micah Kornfield
Hi Vladimir and JB, There have been some previous discussions on security [1]. > We can think about splitting table data into multiple files for > column-level security and masking. For example, instead of storing columns > [a, b, c] in the same Parquet file, we split them into three files: [a,

Re: [VOTE] Add Variant type to Iceberg Spec

2024-11-26 Thread Micah Kornfield
specifically > references the Parquet spec except for our reference link to it. > > I don't think there is anything that will happen in the spec that will > change what we would include in the Iceberg Spec (especially in this PR) > > On Fri, Nov 22, 2024 at 5:10 PM Micah Korn

Re: [VOTE] Add Variant type to Iceberg Spec

2024-11-22 Thread Micah Kornfield
My (non-binding) vote is -1 until the variant spec is formally adopted in Parquet. On Fri, Nov 22, 2024 at 2:51 PM Aihua Xu wrote: > Hi everyone, > > I've updated the Iceberg spec to include the new Variant type as part of > #10831 . The changes are

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Micah Kornfield
Would adding the ability to have a list of manifest lists solve this problem? This might be an incremental step to getting to "everything" is a manifest? For now I wanted to reuse the existing manifest-list and manifests fields. Regardless of the outcome, please let's not re-use a field in a w

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-19 Thread Micah Kornfield
asonable >>>>>>>>>>>>>> limitations, from >>>>>>>>>>>>>> an implementation point-of-view. But OTOH, as a user, this >>>>>>>>>>>>>> seems to be >>>>>>

Re: [VOTE] Deletion Vectors in V3

2024-10-31 Thread Micah Kornfield
+1 (non-binding) On Thu, Oct 31, 2024 at 4:05 PM Steve Zhang wrote: > +1 (non-binding) > > Thanks, > Steve Zhang > > > > On Oct 31, 2024, at 3:41 PM, rdb...@gmail.com wrote: > > +1 > > Thanks, Anton! > > On Wed, Oct 30, 2024 at 11:58 PM Fokko Driesprong > wrote: > >> +1 >> >> I had to read up a

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-31 Thread Micah Kornfield
I agree that equality deletes have their place in streaming. I think the ultimate decision here is how opinionated Iceberg wants to be on its use-cases. If it really wants to stick to its origins of "slow moving data", then removing equality deletes would be in line with this. I think the other h

Re: Spec changes for deletion vectors

2024-10-21 Thread Micah Kornfield
t;>> >>>>>>>> Imho, I will focus on 1 because it would be a great feature for the >>>>>>>> Iceberg community. >>>>>>>> >>>>>>>> Regards >>>>>>>> JB >>>>>>&g

Re: Spec changes for deletion vectors

2024-10-16 Thread Micah Kornfield
One small point > Theoretically we could end up with iceberg implementers who have bugs in > this part of the code and we wouldn’t even know it was an issue till > someone converted the table to delta. I guess we could mandate readers validate all fields here to make sure they are all consistent

Re: [Discuss] LZ4 compression in Puffin spec

2024-10-13 Thread Micah Kornfield
ormat. On Sat, Aug 31, 2024 at 12:11 PM Piotr Findeisen wrote: > Hi Micah, > > Good point. > Does unframed LZ4 provide a checksum of the content before compression? > > Best > Piotr > > > On Fri, 30 Aug 2024 at 23:34, Micah Kornfield > wrote: > >> The Ice

Re: Spec changes for deletion vectors

2024-10-11 Thread Micah Kornfield
I think it might be worth mentioning the current proposal makes some, mostly minor, design choices to try to be compatible with Delta Lake deletion vectors. I think there might be a general philosophical question on what compromises the community is willing to make for compatibility reasons. On T

Re: [VOTE] Table v3 spec: Add unknown and new type promotion

2024-09-30 Thread Micah Kornfield
I'm -0.0 as worded currently. I think there are some more aspects that should be defined for date->timestamp/timestamp_ns promotion (left comments on the PR). The addition of an Unknown type seems like a good addition. Thanks, Micah On Mon, Sep 30, 2024 at 2:32 PM Yufei Gu wrote: > +1(binding

Re: [DISCUSS] Define calendar used in specification?

2024-09-30 Thread Micah Kornfield
Thu, Sep 12, 2024 at 1:33 PM Micah Kornfield wrote: > The spec purposely avoids timestamp conversion. Iceberg returns values as >> they are passed from the engine and it is the engine's responsibility to do >> any date/time conversion. I don't think that we sh

Re: V3 Spec Changes

2024-09-27 Thread Micah Kornfield
For variant, the current plan on moving to Parquet is to mark the variant type as experimental. Would Iceberg depend on the experimental type or is V3 going to wait for a variant to be deemed non-experimental by the Parquet community? Thanks, Micah On Tue, Sep 24, 2024 at 9:52 AM Russell Spitzer

Re: [DISCUSS] Define calendar used in specification?

2024-09-12 Thread Micah Kornfield
i.e., force the user to choose when a value is >> encountered where it matters). See >> https://issues.apache.org/jira/browse/SPARK-46440. >> >> I'm not sure if any of this matters for Iceberg though. It may matter if >> any Iceberg implementation writes using the

[DISCUSS] Define calendar used in specification?

2024-09-11 Thread Micah Kornfield
At the moment, the specification is ambiguous on which calendar is used for temporal conversion/writing [1]. Reading the java code it appears it is using Java's OffsetDateTime which conforms to ISO8601 [2]. ISO8601 appears to explicitly disallow the Julian calendar (but only says proleptic gregori

Re: Time-based partitioning on long column type

2024-09-11 Thread Micah Kornfield
> > Maybe we could update the time-based partition functions to be applied to > a long column directly. It would treat that column like a timestamp in > milliseconds. Would that work? I need to think more about the implications > of doing that, but I don't think that we currently have an issue with

[RESULT][VOTE] Merge guidelines for committing PRs

2024-09-03 Thread Micah Kornfield
, Aug 30, 2024 at 5:07 AM Jean-Baptiste Onofré wrote: > -0 (non binding) > > I'm not convinced it would help much but worth to see :) > > Regards > JB > > > > On Wed, Aug 28, 2024 at 6:28 PM Micah Kornfield > wrote: > > > > I propose to merge

Re: Type promotion in v3

2024-08-30 Thread Micah Kornfield
>> introducing a way to specify the default unit somewhere, we decided the >> simplest solution was to assume milliseconds. But now this is moot because >> this would break lower/upper bounds so I'm proposing we drop this type >> promotion case. >> >> For i

Re: [Discuss] LZ4 compression in Puffin spec

2024-08-30 Thread Micah Kornfield
> > The Iceberg implementation was supposed to be based on aircompressor pure > Java implementation https://github.com/airlift/aircompressor/pull/142. > AFAICT, aircompressor started to favor (or be more OK with) native > implementations (because of Project Panama), so adding LZ4 framed > compressi

[VOTE] Merge guidelines for committing PRs

2024-08-28 Thread Micah Kornfield
I propose to merge https://github.com/apache/iceberg/pull/10780 as a starting place for describing community norms around merging/discussing PRs. We've discussed this [1] and gone through a bunch of revisions on the PR to what is a minimal starting point for describing the merge process. The vote

Re: [DISCUSS] Guidelines for committing PRs

2024-08-28 Thread Micah Kornfield
Anton > > пт, 16 серп. 2024 р. о 17:55 Walaa Eldin Moustafa > пише: > >> Thanks Micha. It is clearer now. I have left some comments. Let us >> continue on the PR. >> >> On Fri, Aug 16, 2024 at 5:39 PM Micah Kornfield >> wrote: >> >>> Hi Walaa,

Re: Type promotion in v3

2024-08-20 Thread Micah Kornfield
e gap or make partial progress until there's a better > option (like moving the parquet metadata with explicit type information at > the stats level). > > In terms of keeping complexity low, I'd lean more towards restricting > evolution (like with incompatible transforms) than try

Re: Type promotion in v3

2024-08-20 Thread Micah Kornfield
w we store data. Would we be able > to write hex-encoded strings? > > I'd argue that once a schema is going from "any type"->"string", >> something was fairly wrong with data modelling initially, providing more >> tools to help users fix these types of

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
Hi Xiangjin, Could you elaborate a bit more on how the Parquet manifest would fix the > type promotion problem? If the lower and upper bounds are still a map of > , I don't think we can perform column pruning on that, and the > type information of the stat column is still missing. I think the i

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
eage of > field change history? > > Thanks, > Gang > > On Tue, Aug 20, 2024 at 7:34 AM Micah Kornfield > wrote: > >> Hi Ryan, >> >> Thanks for the reply, responses inline >> >>> >>>- How do we keep track of the replaced column? Does

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-19 Thread Micah Kornfield
n overall size. Thanks, Micah On Fri, Aug 16, 2024 at 4:56 PM Walaa Eldin Moustafa wrote: > Thanks Micah, for the latter, I meant the type of denormalization of > repeating a 3-part name as opposed to using an ID. > > On Fri, Aug 16, 2024 at 4:52 PM Micah Kornfield > wrote: >

Re: [VOTE] Spec changes in preparation for v3

2024-08-19 Thread Micah Kornfield
+1 (non-binding) On Mon, Aug 19, 2024 at 4:33 PM Steve Zhang wrote: > +1 (non-binding) > > Thanks, > Steve Zhang > > > > On Aug 19, 2024, at 1:47 PM, John Zhuge wrote: > > +1 (non-binding) > > On Mon, Aug 19, 2024 at 1:34 PM Yufei Gu wrote: > >> +1 >> Yufei >> >> >> On Mon, Aug 19, 2024 at 1:1

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
uch of a functional difference besides needing more complex projection > > I also don’t agree with the expanded definition of type promotion. Type > promotion exposes a way to implicitly cast older data to the new type. That > doesn’t allow you to choose the string format you want for a date,

Re: Type promotion in v3

2024-08-19 Thread Micah Kornfield
I think continuing to define type promotion as something that happens implicitly from the reader perspective has a few issues: 1. It makes it difficult to reason about all additional features that might require stable types to interpret. Examples of existing filters: partition statistics file, e

Re: [DISCUSS] Guidelines for committing PRs

2024-08-16 Thread Micah Kornfield
Hi Walaa, > For the former, we could talk about avoiding conflict of interest as a way > of "maintaining trust". For the latter, we can state some examples that > clearly reflect conflict of interest with no ambiguity. For example, a > committer merging a large change that received minimal discuss

Re: [DISCUSS] Variant Spec Location

2024-08-16 Thread Micah Kornfield
so need to standardize many functions > > > related to it. > > > > > > A neutral place to maintain it is a great choice. > > > > > > - As Gang Wu said, a standalone project is good, just like > RoaringBitmap > > > [1]. > > > - As Ryan

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-16 Thread Micah Kornfield
ntifiers in the lineage side >> (but in this case lineage will be a set instead of just a map). >> >> Hence, my concerns with using catalog identifiers (as opposed to UUIDs) >> are: >> * The fundamental issue where the table spec depends on/refers to the >> view spec

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Micah Kornfield
request on the public Spark Dev list? > I would be glad to co-sign, I can also draft up a quick email if you don't > have time. > > On Thu, Aug 15, 2024 at 10:04 AM Micah Kornfield > wrote: > >> I agree that it would be beneficial to make a sub-project, the main >>

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-15 Thread Micah Kornfield
> >>> Also in terms of identifiers to use(UUID or catalog identifier) for the >>> refresh state >>> We will not be able to fetch the table/View using the UUID alone, for >>> example from Hive based catalog. >>> We do not have the direct mapping between UUI

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-15 Thread Micah Kornfield
gt;> We do not have the direct mapping between UUID and table/view. >> Which leaves us only with the catalog identifiers? >> >> Thanks & Regards >> Karuppayya >> >> >> On Thu, Aug 15, 2024 at 9:16 AM Micah Kornfield >> wrote: >> >>&g

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Micah Kornfield
>>>> >>>> On Thu, Aug 15, 2024, at 23:17, Gang Wu wrote: >>>> >>>> +1 on posting this discussion to dev@spark ML >>>> >>>> > I don't think there is anything that would stop us from moving to a >>>> joint proj

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-15 Thread Micah Kornfield
I think it might be worth restating perceived requirements and making sure there is alignment on them. If I am reading correctly, I think the following are perceived requirements: 1. An engine must be able to unambiguously detect that an underlying queried entity has changed or not via metadata to

Re: [DISCUSS] Variant Spec Location

2024-08-15 Thread Micah Kornfield
> > I agree that it would be beneficial to make a sub-project, the main > problem is political and not logistic. I've been asking for movement from > other relative projects for a month and we simply haven't gotten anywhere. I just wanted to double check that these issues were brought directly to

Re: [DISCUSS] Guidelines for committing PRs

2024-08-09 Thread Micah Kornfield
were to follow a flowchart to test >> whether a PR is mergeable, what would the flowchart look like? (Of course, >> we do not have to depict a flowchart, but I am just conveying the style of >> guidance that could potentially be more effective and clear). >> >> Thanks, >

Re: [DISCUSS] Guidelines for committing PRs

2024-08-09 Thread Micah Kornfield
there aren't other items I'll start a vote next week to merge the change. Thanks, Micah On Tue, Aug 6, 2024 at 12:18 PM Micah Kornfield wrote: > My only question is around "conflict of interest" while reviewing PRs. I >> think it needs further explanation and s

Re: [DISCUSS] Guidelines for committing PRs

2024-08-06 Thread Micah Kornfield
s some of the ASF guidelines but I don't mind that given that it > is fairly concise. My only question is around "conflict of interest" while > reviewing PRs. I think it needs further explanation and some concrete > examples. > > - Anton > > нд, 4 серп. 2024 р. о 15:

[RESULT][VOTE] Merge specification clarifications on reading/writing partition values

2024-08-05 Thread Micah Kornfield
>> >>>> +1 (non-binding) >>>> Thanks Micah ! >>>> >>>> Regards, >>>> Prashant >>>> >>>> On Fri, Aug 2, 2024 at 11:06 AM Micah Kornfield >>>> wrote: >>>> >>>>> I've op

Re: [DISCUSS] adoption of format version 3

2024-08-05 Thread Micah Kornfield
have a regular release cadence, it would still make > sense to group features like new types together because it makes the > versions easier to understand and limits the overall impact in the > implementations. > > Ryan > > On Fri, Aug 2, 2024 at 11:39 AM Micah Kornfield

Re: [DISCUSS] Guidelines for committing PRs

2024-08-04 Thread Micah Kornfield
e subjective (e.g., [1]) or > > implied (e.g., [2]). What remains (e.g., [3] or the way to proceed if > > a committer feels something is worthy of a proposal-level discussion) > > fits more in a process that organizes what qualifies as a proposal vs > > code change etc (e.g.,

Re: [DISCUSS] Guidelines for committing PRs

2024-08-02 Thread Micah Kornfield
nable. For votes it seems like most of the recent threads on the matter have decided to err on the side of votes for changes to specs, as it reduces the judgement call of reviewers. Thoughts? Thanks, Micah On Tue, Jul 30, 2024 at 11:08 AM Micah Kornfield wrote: > The problem I'm worried

Re: [DISCUSS] adoption of format version 3

2024-08-02 Thread Micah Kornfield
leased, which again > makes it potentially tied to the release cycle of at least the Java library. > > Curious what people think. > > Best, > Jack Ye > > [1] https://lists.apache.org/thread/v6x772v9sgo0xhpwmh4br756zhbgomtf > > On Wed, Jul 31, 2024 at 10:19 PM Micah K

[VOTE] Merge specification clarifications on reading/writing partition values

2024-08-02 Thread Micah Kornfield
I've opened a PR [1] to clarify that partition columns must always be written by implementations and that for identity transformed partition values, the metadata from the manifest file must be used. Please vote on merging this change. The vote will remain open for at least 72 hours. [] +1 [] +0

Re: [DISCUSS] Spec clarifications on reading/writing Identity partitioned columns

2024-07-31 Thread Micah Kornfield
y > this. > > Ryan > > On Thu, Jul 25, 2024 at 1:18 PM Russell Spitzer > wrote: > >> I have no problem with explicitly stating that writing identity source >> columns is optional on write. We should, of course, mandate surfacing the >> column on read :) >> >

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Micah Kornfield
It sounds like most of the opinions so far are waiting for the scope of work to finish before finalizing the specification. An alternative view: Would it make sense to start releasing the table specification on a regular cadence (e.g. quarterly, every 6 months or yearly)? I think the problem with

Re: [VOTE] Clarify "File System Tables" in the table spec

2024-07-31 Thread Micah Kornfield
+1 (non-binding) On Wed, Jul 31, 2024 at 5:12 PM Ryan Blue wrote: > As promised in the discussion thread, I've opened a PR to clarify the > "File System Tables" section and mark it deprecated since there appears to > be consensus for at least warning people that this is unsafe in most cases > an

Re: [DISCUSS] Guidelines for committing PRs

2024-07-30 Thread Micah Kornfield
tent here is stating how ASF >>> communities work and the only Iceberg-specific parts are the proposal >>> process and calling out that we vote on spec changes, I would probably just >>> have a description of how to handle proposals (which is already there) and >>>

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-26 Thread Micah Kornfield
>>>>>> some JSON parsing capabilities [1] for string fields. >>>>>> >>>>>> So until we have native support in Flink for something similar to >>>>>> Variant type, I expect that we need to map it to JSON strings in >>>>

[DISCUSS] Guidelines for committing PRs

2024-07-25 Thread Micah Kornfield
As part of the bylaws discussions that have been happening, we are trying to make small focused proposals to move things forward. As a first step towards this I created a proposal for guidelines on committing pull requests [1]. Feedback is appreciate

[DISCUSS] Spec clarifications on reading/writing Identity partitioned columns

2024-07-25 Thread Micah Kornfield
The Table specification doesn't mention anything about requirements for whether writing identity partitioned columns is necessary. Empirically, it appears that implementations always write the column data at least for parquet. For columnar formats, this is relatively cheap as it is trivially RLE

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-25 Thread Micah Kornfield
but I don't see exactly what > problem we are trying to solve here. > > Regards > JB > > On Tue, Jul 23, 2024 at 7:58 AM Micah Kornfield > wrote: > > > > My 2 cents on this topic. I think we are getting bogged down in > relatively minor details/bureaucratic

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-25 Thread Micah Kornfield
>> >> On Tue, Jul 23, 2024 at 7:45 AM Russell Spitzer < >> russell.spit...@gmail.com> wrote: >> >>> Micah has a great list there for me. I'm similarly not as interested in >>> the bureaucracy of the project and more interested in actually discus

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-23 Thread Micah Kornfield
Congrats everyone! On Tuesday, July 23, 2024, Sung Yun wrote: > Thank you very much! > > I am excited to see the project growing to new capacities as well, and to > be an active part of that journey. > > I will continue to work hard together with the community to take > (Py)Iceberg to its next s

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-22 Thread Micah Kornfield
My 2 cents on this topic. I think we are getting bogged down in relatively minor details/bureaucratic points. This is a reiteration of a previous recommendation on the topic, but in the interest of making progress here, I'd propose let's break this conversation down and focus on incremental definit

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-22 Thread Micah Kornfield
t;>>> * We are interested in also adopting the shredding spec from Spark and >>>>> would like to move it to whatever place we decided the Variant spec is >>>>> going to live. >>>>> >>>>> Let us know if missed anything and if you have a

Re: [RESULT][VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-19 Thread Micah Kornfield
. > > Thanks, > Dmitri. > > [1] https://github.com/apache/iceberg/pull/8982 > > On Fri, Jul 19, 2024 at 1:15 PM Micah Kornfield > wrote: > >> The vote passes with: >> >> 5 "+1 Binding votes" >> 3 "+1 Non-binding votes." >>

Re: [RESULT][VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-19 Thread Micah Kornfield
w `implementation notes` >>>> section) >>>> >>>> On Thu, Jul 18, 2024 at 3:54 PM Ryan Blue >>>> wrote: >>>> >>>>> +1 >>>>> >>>>> Thanks, Micah! >>>>> >>>>> On Tue,

Re: [DISCUSS] Merging specification clarifications

2024-07-15 Thread Micah Kornfield
's clearer to keep the same process even for >> "small" changes. >> I would recommend to use two vote threads (one per change) to avoid >> confusion and vote on one specific change. >> >> Thanks ! >> Regards >> JB >> >> On Fri, Jul 12, 20

[VOTE] Merge table spec clarifications on time travel and equality deletes

2024-07-15 Thread Micah Kornfield
I'd like to raise a vote on modifying the table specification with clarifications on time travel and equality deletes [1][2]. The PRs have links to prior mailing list discussions where there was apparent consensus that these were the expectations for functionality. Possible votes: [ ] +1 Merge the PRs [

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Micah Kornfield
I don't think this needs to hold up the PR but I think coming to a consensus on the exact set of types supported is worthwhile (and if the goal is to maintain the same set as specified by the Spark Variant type or if divergence is expected/allowed). From a fragmentation perspective it would be a s

[DISCUSS] Merging specification clarifications

2024-07-12 Thread Micah Kornfield
Hi, I have two open pull requests to clarify points on the specification [1][2]. I believe these both document current behavior and don't represent a specification change (and they were already discussed on the mailing list). But given the recent focus on spec update process, I wanted to ask if the

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Micah Kornfield
ly version >>> each API? When using tag, it means I have to offer capabilities per-tagged >>> group. However, I could for example just offer loadTable and nothing else >>> in a catalog, and that should still be Iceberg REST compliant. And I think >>> we need a versioning st

Re: [Discussion] Apache Iceberg Community Guideline - Initial Version

2024-06-26 Thread Micah Kornfield
Hi Jack, I think it would make sense to convert this to a PR, so it can be version tracked in the future (and that way it avoids another review if the intent is to transition to GitHub)? Thanks, Micah On Tue, Jun 25, 2024 at 9:07 AM Jack Ye wrote: > Hi everyone, > > Thanks for the feedback in th

Re: [DISCUSS] Describing REST Server capabilities

2024-06-24 Thread Micah Kornfield
I don't have strong opinions either way here, just thought it was worth raising some concerns over possible evolution here. Some responses inline, but if capabilities seem to meet the requirement at hand, then it does potentially seem the simplest mechanism. I think we also want to avoid relyanc

Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Micah Kornfield
> > The general idea behind a capability is that if e.g. a server supports > *views*, then that server must implement all endpoints grouped under that > capability. I haven't thought deeply about this, but is there a reason to be prescriptive about this by grouping endpoints in capabilities? Ano

Re: Call for Ryan Blue to Step Down as PMC Chair

2024-06-07 Thread Micah Kornfield
> > Many proposals are piling up and take months and years to get reviewed and > merged. I would suggest maybe starting another thread on this matter. This could be for a variety of reasons but it would be good to solve them as a community. As stated above by others, I'll reiterate that I don't

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-11 Thread Micah Kornfield
Hi Tyler, et. al., I think some sort of semi-structured type is a good idea. I think one important question is whether to support Variant, JSON or another representation of semi-structured data as the user facing data type. Please correct me if I'm wrong, but I think Variant is mostly a superset

Re: New committer: Renjie Liu

2024-03-09 Thread Micah Kornfield
Congrats On Saturday, March 9, 2024, Hussein Awala wrote: > Congrats Renjie! > > On Sat, Mar 9, 2024 at 8:55 PM Yufei Gu wrote: > >> Congratulations and thanks for the great work in rust iceberg, Renjie! >> >> Yufei >> >> >> On Sat, Mar 9, 2024 at 11:39 AM Steven Wu wrote: >> >>> Congrats, Ren

Re: Materialized view integration with REST spec

2024-02-21 Thread Micah Kornfield
rough >>>>>> SQL), materialized view APIs should be almost the same as regular view >>>>>> APIs >>>>>> (except for operations specific to materialized views like REFRESH >>>>>> command >>>>>> etc). Typically

Re: Materialized view integration with REST spec

2024-02-19 Thread Micah Kornfield
Hi Jack, > In my mind, the first key point we all need to agree upon to move this > design forward is*: Do we really want to go with the MV = view + storage > table design approach for Iceberg MV?* I think we want this to the extent that we do not want to redefine the same concept with differen

Re: Process for creating new Proposals

2024-02-19 Thread Micah Kornfield
g the requirements for a specification > feature. > > @JackYe I would prepare an Github Issue template. > > Kind regards, > Jan > On 05.02.24 04:25, Micah Kornfield wrote: > > A few follow-up questions, if these are too off-topic I can start another > thread. > > Can we c

Re: Support permission concepts in REST spec

2024-02-16 Thread Micah Kornfield
Hi Jack, I think this is an interesting idea but I think there are some practical concerns (I posted them inline). - general access patterns, like read-only, read-write, admin full access, > etc. Is this intended to be information only? I would hope the tokens and REST API vending to clients wou

Re: Process for creating new Proposals

2024-02-04 Thread Micah Kornfield
A few follow-up questions, if these are too off-topic I can start another thread. Can we clarify the scope of proposals? If these involve large changes, or new features in existing specifications or new specifications, would it make sense to advertise them on this mailing list at each part of the

Re: Spec change for multi-arg transform

2024-01-29 Thread Micah Kornfield
8258 (author made >>>> an end-to-end reference implementation there for a sample transform) >>>> 2. Google Doc Discussion: >>>> https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit#heading=h.si1nr6ftu79b >>>> 3. Aug

Re: Spec change for multi-arg transform

2024-01-27 Thread Micah Kornfield
I think this is a good idea but I have concerns about compatibility. IMO, I think changing the cardinality of input columns is a large enough change that trying to retrofit it into V1 or V2 of the specification will cause pain for implementations not relying on reference implementation. I As a se

Re: [ANNOUNCE] New committer: Honah J.

2024-01-12 Thread Micah Kornfield
Congrats! On Friday, January 12, 2024, Jack Ye wrote: > Congratulations! Thanks for all the work in python! > > Best, > Jack Ye > > On Fri, Jan 12, 2024 at 1:11 PM Fokko Driesprong wrote: > >> On behalf of the Iceberg PMC, I'm happy to announce that Honah has >> accepted an invitation to become

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Micah Kornfield
It would help to have articulable use cases to really invest in more > complexity in this area and I feel like we're drifting a little into the > speculative at this point. > > -Dan > > > > On Wed, Dec 20, 2023 at 3:27 PM Micah Kornfield > wrote: > >> I agr

Re: Proposal for REST APIs for Iceberg table scans

2023-12-20 Thread Micah Kornfield
> > Also +1 for having a more strict definition of the shard. Having arbitrary > JSON was basically what we experimented with a string shard ID, and we > ended up with something very similar to the manifest plan task you describe > in the serialized ID string. IIUC the proposal correctly, I'd act

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Micah Kornfield
think it would be enough for the server to determine it for >> now, since I don't see any usage to allow clients to set the expiration >> time in api. >> >> 2. Do servers need to expose the expiration time to clients? >> >> Personally I think it would be enough

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
uation tokens). On the other hand, using "asOf" can be complex to >>>>> implement and may be too powerful for the pagination use case (because it >>>>> allows to query the warehouse as of any point of time, not just now). >>>>> >>>

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
(This > is also the missing piece I forgot to mention in the start index approach > to ensure it works in distributed settings) > > -Jack > > On Tue, Dec 19, 2023, 9:51 AM Micah Kornfield > wrote: > >> I tried to cover these in more details at: >> https://docs.g

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
n >> the ListNamespacesResponse might allow for more backward compatibility. In >> that scenario, pagination would only take place for clients who know how to >> paginate and the ordering would not need to be deterministic. >> >> -Dan >> >> On Fri, D

Re: Pagination for List APIs in the REST spec

2023-12-15 Thread Micah Kornfield
Just to clarify and add a small suggestion: The behavior with no additional parameters requires the operations to happen as they do today for backwards compatibility (i.e either all responses are returned or a failure occurs). For new parameters, I'd suggest an opaque start token (instead of spec

Re: Spec Clarification: Partition Spec equality

2023-11-21 Thread Micah Kornfield
isn't allowed. > > For how this applies to tracking delete files, it is the partition spec ID > that should be checked. > > On Fri, Nov 3, 2023 at 10:35 AM Micah Kornfield > wrote: > >> Hello Iceberg Dev, >> The Iceberg specification for matching delete files wi

Re: MOR CDC view support

2023-11-21 Thread Micah Kornfield
Slightly side topic: Are slack channels archived anywhere for offline consumption (apologies if I missed it on the community page)? Thanks, Micah On Tue, Nov 21, 2023 at 6:07 AM Renjie Liu wrote: > Thanks for sharing. > > On Tue, Nov 21, 2023 at 21:52 Walaa Eldin Moustafa > wrote: > >> We met

Re: SQL Syntax for Time Travel on a Branch?

2023-11-03 Thread Micah Kornfield
he current metadata file to be the source of truth for all >> Iceberg operations. >> >> I think it's probably a good idea to note that this is the expected time >> travel behavior. >> >> On Wed, Apr 26, 2023 at 8:42 AM Micah Kornfield >> wrote: >>

Spec Clarification: Partition Spec equality

2023-11-03 Thread Micah Kornfield
Hello Iceberg Dev, The Iceberg specification for matching delete files with data files during scan planning states: "The data file’s partition (both spec and partition values) is equal to the delete file’s partition" Equality of partition specs appears slightly ambiguous (apologies if I missed t

Re: Nested column types and equality delete files

2023-11-03 Thread Micah Kornfield
ats it as distinct by default, but allows configuration to > treat it as not distinct: > > > https://stackoverflow.com/questions/8289100/create-unique-constraint-with-null-columns > > > On Sat, Oct 28, 2023 at 04:00 Micah Kornfield > wrote: > >> Iceberg spec h

Re: Nested column types and equality delete files

2023-10-27 Thread Micah Kornfield
berg.apache.org/spec/#identifier-field-ids> . I think > it would make sense if equality id fields share similar constraints. > > On Thu, Oct 26, 2023 at 4:24 AM Micah Kornfield > wrote: > >> Sorry I think I missed a question: >> >> Similarly, I think we could hand

Re: Nested column types and equality delete files

2023-10-25 Thread Micah Kornfield
hen compared to enumerating the leaf columns. Since the change is potentially backwards incompatible, we might not be able to get away with disallowing them? Thanks, Micah On Wed, Oct 25, 2023 at 1:22 PM Micah Kornfield wrote: > I think nesting in struct makes sense to support as this is

Re: Nested column types and equality delete files

2023-10-25 Thread Micah Kornfield
. Similarly, I think we could handle fields with > primitive or struct types but fields that contain lists or maps should not > be allowed. > > Does that sound reasonable to you? We could be more conservative and > disallow deletion by struct fields as well. > > Ryan > > On

Nested column types and equality delete files

2023-10-20 Thread Micah Kornfield
Hi Iceberg Dev, Are equality delete files intended to support nested columns of nested types (lists, structs and maps) or "children" of nested types? I couldn't find anything prohibiting it in the specification [1] (apologies if I missed it) but it seems like this adds a fair amount of complexi

Re: Additional indexes for data files

2023-06-07 Thread Micah Kornfield
Hi Jack, For NDV sketch, how much space does it typically take per file? One issue is it might increase manifest file size. I think one way of making it more palatable to add additional statistics is looking at the possibility of using a columnar format as an alternative to Avro, so that there i

Re: Current Status of View Specification

2023-05-09 Thread Micah Kornfield
people. It would be bonkers to see catalogs mean different things in > different places. > > Not only that, this is also how columns work. I don't know of anything > that tracks view columns by their context and ID. If your view is "select > a, b from table" and "a&

Re: Current Status of View Specification

2023-05-02 Thread Micah Kornfield
nd require users to ensure all query engines have their catalogs configured in a uniform way (this seems somewhat painful of an end-user experience). Thanks, Micah On Wed, Mar 15, 2023 at 10:56 PM Micah Kornfield wrote: > Thanks Ryan for the quick response. > > I think that names s
