Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Hyukjin Kwon
Thanks Maciej and Fokko. On Fri, Aug 28, 2020 at 6:09 AM, Maciej wrote: > On my side, I'll try to identify any possible problems by the end of the > week or so (on a somewhat crude inspection there is nothing unexpected or > particularly hard to resolve, but sometimes problems occur when you try to > refine…

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
On my side, I'll try to identify any possible problems by the end of the week or so (on a somewhat crude inspection there is nothing unexpected or particularly hard to resolve, but sometimes problems occur when you try to refine things) and I'll post an update. Maybe we could take it from there? In g…

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
Oh, this is probably because of how annotations are handled. In general, stubs take preference over inline annotations and are considered the only source of type hints, unless the package is marked as partially typed (https://www.python.org/dev/peps/pep-0561/#id21). In such a case, however, it is all-or-nothing…
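
(Illustration only, not from the thread: a minimal sketch of the stub-precedence behaviour Maciej describes, using a hypothetical package mypkg. When a .pyi stub exists for a module, mypy reads only the stub, so anything missing from it is invisible to the checker; a stub-only distribution can opt into fallback by shipping a py.typed file containing the word "partial", per PEP 561.)

    # Hypothetical layout (not the actual PySpark tree):
    #   mypkg/greet.py    <- runtime module with inline annotations
    #   mypkg/greet.pyi   <- stub; when present, mypy ignores greet.py entirely

    # mypkg/greet.py
    def greet(name: str) -> str:
        return "hello " + name

    def farewell(name):              # exists at runtime, but see the stub below
        return "bye " + name

    # mypkg/greet.pyi -- only greet() is declared, so a type checker using this
    # stub considers farewell() absent from the module.
    def greet(name: str) -> str: ...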

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Driesprong, Fokko
Looking at it a second time, I think it is only mypy that's complaining:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ git diff
diff --git a/python/pyspark/accumulators.pyi b/python/pyspark/accumulators.pyi
index f60de25704..6eafe46a46 100644
--- a/python/pyspark/accumulators.pyi
+++ b/p…
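
(The diff body is cut off in the archive. Purely as a hypothetical illustration of the kind of file being edited, not Fokko's actual change, a stub such as accumulators.pyi declares signatures with ellipsis bodies, e.g.:)

    # Hypothetical fragment of an accumulators stub; types are illustrative only.
    from typing import Generic, TypeVar

    T = TypeVar("T")

    class AccumulatorParam(Generic[T]):
        def zero(self, value: T) -> T: ...
        def addInPlace(self, value1: T, value2: T) -> T: ...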

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
Well, technically speaking, annotations and the actual implementation are not the same thing. Many parts of the Spark API might require heavy overloads to either capture relationships between arguments (for example, in the case of ML) or to capture at least rudimentary relationships between inputs and outputs (i.e. udfs). Just…
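
(A hedged sketch of what such "heavy overloads" can look like in a stub; the udf signatures below are hypothetical and simplified, not the actual pyspark.sql.functions declarations.)

    # Hypothetical stub fragment: @overload ties the decorated callable's type to
    # the wrapper that is returned, covering both @udf and @udf(returnType=...).
    from typing import Callable, TypeVar, overload

    F = TypeVar("F", bound=Callable[..., object])

    @overload
    def udf(f: F) -> F: ...
    @overload
    def udf(*, returnType: str = ...) -> Callable[[F], F]: ...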

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
That doesn't sound right. Would it be a problem for you to provide a reproducible example? On 8/27/20 6:09 PM, Driesprong, Fokko wrote: > Today I've updated [SPARK-17333][PYSPARK] Enable mypy on the > repository, and while > doing so I've noticed that all…

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Driesprong, Fokko
Thanks for sharing, Hyukjin; however, I'm not sure if we're taking the right direction. Today I've updated [SPARK-17333][PYSPARK] Enable mypy on the repository, and while doing so I've noticed that all the methods that aren't in the pyi file are unable…
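
(As an assumed reproduction of the symptom, not Fokko's actual output: with a stub that omits a function, using it fails type checking even though it works at runtime. This reuses the hypothetical mypkg sketch from earlier in this digest.)

    # check.py
    from mypkg.greet import greet, farewell   # farewell is missing from greet.pyi

    print(greet("spark"))       # OK for mypy
    print(farewell("spark"))    # runs fine, but mypy reports (approximate wording):
    # error: Module "mypkg.greet" has no attribute "farewell"  [attr-defined]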

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Hyukjin Kwon
Okay, it took me a while because I had to check the options and feasibility we discussed here. TL;DR: I think we can just port the pyi files directly, as they are, into the PySpark main repository. I would like to share only the key points here, because it looks like I, Maciej, and the people here agree with this direction…
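
(For context on what porting the pyi files "as they are" means in PEP 561 terms; the exact paths below are an assumed layout for illustration, not a quote from the thread.)

    # Assumed in-repo layout after the port (illustration only):
    #   python/pyspark/py.typed            <- marker file so checkers use the package (PEP 561)
    #   python/pyspark/accumulators.py
    #   python/pyspark/accumulators.pyi    <- stub ported from the pyspark-stubs project
    #   python/pyspark/rdd.py
    #   python/pyspark/rdd.pyi
    #   ...
    # With the marker shipped alongside the stubs, installing pyspark gives mypy
    # the annotations without a separate stubs package.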