Re: [DISCUSS] Python implementation

Ryan Blue Tue, 05 Mar 2019 16:11:56 -0800

We may want to do that eventually, but I think it isn't necessary right
now. I also don't know how we would determine what must be implemented to
form that minimum support threshold. Not having features means not being
able to read tables that use them. If there is no Parquet support, then you
can't read Parquet tables. Same thing with encryption. I don't see a
minimum being valuable, compared to a table that makes it clear what is
supported. We should probably make that table, though!


On Thu, Feb 28, 2019 at 3:39 PM Xabriel Collazo Mojica <xcoll...@adobe.com>
wrote:

> Regarding:
>
>
>
> Would every feature added to the Java version need to be mirrored in
> Python?
>
> I think that the spec should be used to coordinate across implementations,
> but that those implementations can have different features and degrees of
> support. It would be fine if python didn’t have support for the encryption
> structures until someone needs that support from Python and adds it.
> Otherwise, we’re asking too much of contributors: go fix this in another
> language that you may not know or be comfortable in.
>
>
>
> Should the spec then have a feature support matrix stating the minimum
> support needed? As in the customary MAY, SHOULD, MUST, etc. [1]?
>
>
>
> [1]: https://www.ietf.org/rfc/rfc2119.txt
>
>
>
>
>
> *Xabriel J Collazo Mojica*  |  Senior Software Engineer  |  Adobe  |
> xcoll...@adobe.com
>
>
>
> *From: *Ryan Blue <rb...@netflix.com.INVALID>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, "
> rb...@netflix.com" <rb...@netflix.com>
> *Date: *Thursday, February 28, 2019 at 3:23 PM
> *To: *Matt Cheah <mch...@palantir.com>
> *Cc: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
> *Subject: *Re: [DISCUSS] Python implementation
>
>
>
> The only difficulty I can think of is that we will need to remove the
> python directory from the source tarball when we build it. Shouldn't be a
> big problem.
>
>
>
> rb
>
>
>
> On Thu, Feb 28, 2019 at 2:08 PM Matt Cheah <mch...@palantir.com> wrote:
>
> I’m wondering how significant the burden of maintaining two release
> cycles from the same repository would be. I would imagine the burden would
> be less concentrated in one place if we had separate repositories, at
> least to start with. Then, when we have confidence in the readiness of the
> Python work, we can merge it into Iceberg proper and have the release
> publish both versions.
>
>
>
> -Matt Cheah
>
>
>
> *From: *Daniel Weeks <daniel.c.we...@gmail.com>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
> *Date: *Thursday, February 28, 2019 at 1:47 PM
> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, Ryan Blue <
> rb...@netflix.com>
> *Subject: *Re: [DISCUSS] Python implementation
>
>
>
> I agree with this approach.
>
>
>
> Since this is an entirely new implementation for Python, it makes more
> sense to take the initial version (pending any additional review/comments)
> and then continue to iterate from that point.  It would be very difficult
> to break it up into smaller commits and work through them incrementally,
> and doing so would not add much value (though going forward we should lean
> toward more incremental contributions).
>
>
>
> I do think that Matt brings up some good points. Initially I would lean
> toward keeping a single repo; if we find there are more contributions in
> other languages, we can reconsider separating the repos to keep them from
> impacting releases.
>
>
>
> Also, I want to call out a huge thanks to Ted for all the work they did to
> contribute to this and to Uwe for reviewing.
>
>
>
> -Dan
>
>
>
>
>
>
>
> On Thu, Feb 28, 2019, 12:26 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
> Hi everyone,
>
>
>
> One of our contributors, Ted, has done a lot of work on an initial python
> implementation and Uwe was kind enough to review it. Here's the PR:
>
>
>
> https://github.com/apache/incubator-iceberg/pull/54
>
>
>
> Because this is a brand-new implementation, the PR is huge: 157 new files.
> That makes it really tough to review in depth, and also really time
> consuming to update and maintain. What I suggest is committing the PR as-is
> now that it has passed a round of reviews. Then we can improve it in
> smaller pull requests.
>
>
>
> Are there any objections to this plan or other thoughts?
>
>
>
> I think that the python implementation would not be included in the first
> Apache Iceberg release. I would prefer to release the python implementation
> on a separate release cycle so that Java blockers don't prevent a Python
> bug fix and vice versa.
>
>
>
> rb
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix
