It doesn't break anything at all. You can take the stub files as-is and drop them into the PySpark root; as long as users are not interested in type checking, they have no runtime impact.
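To make that concrete, here is a minimal sketch of how a stub works; the module and function names below are purely illustrative and are not taken from the actual pyspark-stubs files. A stub is just a .pyi file sitting next to the corresponding .py module. The interpreter never imports it, so shipping it changes nothing at runtime, while mypy and PyCharm read it instead of the implementation:

    # dataframe_utils.pyi -- hypothetical stub placed next to dataframe_utils.py
    from typing import List, Optional

    from pyspark.sql import DataFrame, Row

    # Stub bodies are always "...": stubs carry signatures only, no logic.
    def select_columns(df: DataFrame, columns: List[str]) -> DataFrame: ...
    def first_row_or_none(df: DataFrame) -> Optional[Row]: ...

With the stub in place, mypy flags code that calls select_columns with, say, a bare string instead of a list of strings before the job is ever submitted, while running the script with python behaves exactly as it did without the stub.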
Surprisingly, the current MyPy build (mypy==0.511) reports only one incompatibility with Python 2 (dynamic metaclasses), which could be resolved without significant loss of function.

On 05/23/2017 12:08 PM, Reynold Xin wrote:
> Seems useful to do. Is there a way to do this so it doesn't break
> Python 2.x?
>
> On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz
> <mszymkiew...@gmail.com> wrote:
>
> Hi everyone,
>
> For the last few months I've been working on static type annotations
> for PySpark. For those of you who are not familiar with the idea,
> type hints were introduced by PEP 484
> (https://www.python.org/dev/peps/pep-0484/) and further extended by
> PEP 526 (https://www.python.org/dev/peps/pep-0526/), with the main
> goal of providing the information required for static analysis.
> Right now there are a few tools which support type hints, including
> Mypy (https://github.com/python/mypy) and PyCharm
> (https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
> Type hints can be added using function annotations
> (https://www.python.org/dev/peps/pep-3107/, Python 3 only),
> docstrings, or source-independent stub files
> (https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is
> optional, gradual and has no runtime impact.
>
> At this moment I've annotated the majority of the API, including
> most of pyspark.sql and pyspark.ml. The project is still rough
> around the edges, and may produce both false positives and false
> negatives, but I think it has become mature enough to be useful in
> practice.
>
> The current version is compatible only with Python 3, but it is
> possible, with some limitations, to backport it to Python 2 (though
> it is not on my todo list).
>
> There are a number of possible benefits for PySpark users and
> developers:
>
> * Static analysis can detect a number of common mistakes and prevent
>   runtime failures. Generic self is still fairly limited, so it is
>   more useful with DataFrames, Structured Streaming and ML than with
>   RDDs or DStreams.
> * Annotations can be used for documenting complex signatures
>   (https://git.io/v95JN), including dependencies between arguments
>   and return value (https://git.io/v95JA).
> * Detecting possible bugs in Spark (SPARK-20631).
> * Showing API inconsistencies.
>
> Roadmap
>
> * Update the project to reflect Spark 2.2.
> * Refine existing annotations.
>
> If there is enough interest, I am happy to contribute this back to
> Spark or submit it to Typeshed (https://github.com/python/typeshed -
> this would require formal ASF approval, and since Typeshed doesn't
> provide versioning, it is probably not the best option in our case).
>
> Further information:
>
> * https://github.com/zero323/pyspark-stubs - GitHub repository
> * https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
>   - interesting presentation by Marco Bonzanini
>
> --
> Best,
> Maciej

--
Maciej Szymkiewicz