Hi Maciej, Hyukjin,

Did you find any time to discuss adding the types to the Python repository?
Would love to know what came out of it.

Cheers, Fokko

On Wed, Aug 5, 2020 at 10:14, Driesprong, Fokko <fo...@driesprong.frl> wrote:

> Mostly echoing stuff that we've discussed in
> https://github.com/apache/spark/pull/29180, but good to have this also on
> the dev-list.
>
> > So IMO maintaining this outside in a separate repo is going to be harder.
> That was why I asked.
>
> I agree with Felix: having this inside of the project would make it much
> easier to maintain. Keeping it within the ASF might also make it easier to
> port the .pyi files to the actual Spark repository.
>
> > FWIW, NumPy took this approach. They made a separate repo, and merged it
> into the main repo after it became stable.
>
> As Maciej pointed out:
>
> > As for the POC ‒ we have stubs, which have been maintained for over three
> years now and cover versions from 2.3 (though these are fairly limited) to,
> with some lag, current master.
>
> What would be required to mark it as stable?
>
> > I guess it all depends on how we envision the future of annotations
> (including, but not limited to, how conservative we want to be in the
> future), which is probably something that should be discussed here.
>
> I'm happy to motivate people to contribute type hints, and I believe it is
> a very accessible way to get more people involved in the Python codebase.
> Using the ASF model, we can ensure that committers/PMC sign off on the
> annotations.
>
> > Indeed, though the possible advantage is that, in theory, you can have a
> different release cycle than for the main repo (I am not sure if that's
> feasible in practice or if that was the intention).
>
> Personally, I don't think we need a different cycle if the type hints are
> part of the code itself.
>
> > If my understanding is correct, pyspark-stubs is still incomplete and
> does not annotate types in some other APIs (by using Any). Correct me if I
> am wrong, Maciej.
>
> For me, it is a bit like code coverage: you want it to be high so that most
> of the APIs are covered, but it will take some time to make it complete.
>
> For me, it feels a bit like a chicken-and-egg problem: because the type
> hints are in a separate repository, they will always lag behind. It is also
> harder to spot where the gaps are.
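>
> To make the point about Any concrete, here is a minimal sketch of what a
> stub (.pyi) entry can look like. The limit() signature mirrors the real
> DataFrame API; hypotheticalMethod() is a made-up placeholder that only
> illustrates an Any-based fallback and is not taken from pyspark-stubs:
>
>     # dataframe.pyi ‒ illustrative stub content only
>     from typing import Any
>
>     class DataFrame:
>         # Precise annotation: a type checker can verify both the argument
>         # and the result.
>         def limit(self, num: int) -> DataFrame: ...
>
>         # Coverage gap: the method is present, but everything is Any, so a
>         # checker will accept any usage without complaint.
>         def hypotheticalMethod(self, *args: Any, **kwargs: Any) -> Any: ...
>
> With stubs shipped next to the code, gaps like the second method would show
> up in review rather than only in a separate repository.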
>
> Cheers, Fokko
>
>
>
> On Wed, Aug 5, 2020 at 05:51, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Oh, I think I caused some confusion here.
>> Just for clarification, I wasn’t saying we must port this into a separate
>> repo now. I was saying it could be one of the options we can consider.
>>
>> For a bit more context:
>> This option was previously considered, roughly speaking, invalid, as it
>> might need an incubation process as a separate project.
>> After some investigation, I found that this is still a valid option and
>> that we can take this on as part of Apache Spark but in a separate repo.
>>
>> FWIW, NumPy took this approach. They made a separate repo
>> <https://github.com/numpy/numpy-stubs>, and merged it into the main repo
>> <https://github.com/numpy/numpy> after it became stable.
>>
>>
>> My only major concerns are:
>>
>>    - the possibility that we may have to fundamentally change the approach
>>    in pyspark-stubs <https://github.com/zero323/pyspark-stubs> ‒ not
>>    because how it was done is wrong, but because of how Python type
>>    hinting itself evolves.
>>    - If my understanding is correct, pyspark-stubs
>>    <https://github.com/zero323/pyspark-stubs> is still incomplete and
>>    does not annotate types in some other APIs (by using Any). Correct me if I
>>    am wrong, Maciej.
>>
>> I’ll have a short sync with him to understand this better and share the
>> outcome, since he probably knows the PySpark type-hints context best and I
>> know some of the context in the ASF and Apache Spark.
>>
>>
>>
>> On Wed, Aug 5, 2020 at 6:31 AM, Maciej Szymkiewicz <mszymkiew...@gmail.com>
>> wrote:
>>
>>> Indeed, though the possible advantage is that, in theory, you can have a
>>> different release cycle than for the main repo (I am not sure if that's
>>> feasible in practice or if that was the intention).
>>>
>>> I guess it all depends on how we envision the future of annotations
>>> (including, but not limited to, how conservative we want to be in the
>>> future), which is probably something that should be discussed here.
>>> On 8/4/20 11:06 PM, Felix Cheung wrote:
>>>
>>> So IMO maintaining this outside in a separate repo is going to be harder.
>>> That was why I asked.
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Maciej Szymkiewicz <mszymkiew...@gmail.com>
>>> *Sent:* Tuesday, August 4, 2020 12:59 PM
>>> *To:* Sean Owen
>>> *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau;
>>> Spark Dev List
>>> *Subject:* Re: [PySpark] Revisiting PySpark type annotations
>>>
>>>
>>> On 8/4/20 9:35 PM, Sean Owen wrote:
>>> > Yes, but the general argument you make here is: if you tie this
>>> > project to the main project, it will _have_ to be maintained by
>>> > everyone. That's good, but also, I think, exactly the downside we want
>>> > to avoid at this stage (I thought?). I understand that for some
>>> > undertakings it's just not feasible to start outside the main
>>> > project, but is there no proof of concept even possible before taking
>>> > this step -- which more or less implies it's going to be owned and
>>> > merged and have to be maintained in the main project?
>>>
>>>
>>> I think we have a somewhat different understanding here ‒ I believe we
>>> have reached the conclusion that maintaining annotations within the
>>> project is OK; we only differ on the specific form it should take.
>>>
>>> As for the POC ‒ we have stubs, which have been maintained for over three
>>> years now and cover versions from 2.3 (though these are fairly limited) to,
>>> with some lag, current master. There is some evidence they are used in
>>> the wild
>>> (
>>> https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D
>>> ),
>>> there are a few contributors
>>> (https://github.com/zero323/pyspark-stubs/graphs/contributors) and at
>>> least some use cases (https://stackoverflow.com/q/40163106/). So,
>>> subjectively speaking, it seems we're already beyond the POC stage.
>>>
>>> --
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> Keybase: https://keybase.io/zero323
>>> Gigs: https://www.codementor.io/@zero323
>>> PGP: A30CEF0C31A501EC
>>>
>>>
>>> --
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> Keybase: https://keybase.io/zero323
>>> Gigs: https://www.codementor.io/@zero323
>>> PGP: A30CEF0C31A501EC
>>>
>>>
