Seems useful to do. Is there a way to do this so it doesn't break Python 2.x?
On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:
> Hi everyone,
>
> For the last few months I've been working on static type annotations for
> PySpark. For those of you who are not familiar with the idea, type hints
> were introduced by PEP 484 (https://www.python.org/dev/peps/pep-0484/)
> and further extended by PEP 526 (https://www.python.org/dev/peps/pep-0526/),
> with the main goal of providing the information required for static analysis.
> Right now there are a few tools that support type hints, including Mypy
> (https://github.com/python/mypy) and PyCharm
> (https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
> Type hints can be added using function annotations
> (https://www.python.org/dev/peps/pep-3107/, Python 3 only), docstrings, or
> source-independent stub files
> (https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is optional,
> gradual, and has no runtime impact.
>
> At this moment I've annotated the majority of the API, including the
> majority of pyspark.sql and pyspark.ml. The project is still rough around
> the edges and may produce both false positives and false negatives, but I
> think it has become mature enough to be useful in practice.
> The current version is compatible only with Python 3, but it is possible,
> with some limitations, to backport it to Python 2 (though it is not on my
> todo list).
>
> There are a number of possible benefits for PySpark users and developers:
>
> - Static analysis can detect a number of common mistakes to prevent
>   runtime failures. Generic self is still fairly limited, so it is more
>   useful with DataFrames, SS and ML than with RDDs or DStreams.
> - Annotations can be used for documenting complex signatures
>   (https://git.io/v95JN), including dependencies on arguments and value
>   (https://git.io/v95JA).
> - Detecting possible bugs in Spark (SPARK-20631).
> - Showing API inconsistencies.
>
> Roadmap
>
> - Update the project to reflect Spark 2.2.
> - Refine existing annotations.
>
> If there is enough interest, I am happy to contribute this back to
> Spark or submit it to Typeshed (https://github.com/python/typeshed - this
> would require formal ASF approval and, since Typeshed doesn't provide
> versioning, is probably not the best option in our case).
>
> Further information:
>
> - https://github.com/zero323/pyspark-stubs - GitHub repository
> - https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
>   - interesting presentation by Marco Bonzanini
>
> --
> Best,
> Maciej
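
On the Python 2.x question, here is a rough sketch rather than a description of how pyspark-stubs is actually built. PEP 484 suggests type comments as the syntax for Python 2.7 code, and stub files live outside the source, so neither approach changes what the interpreter executes. The function `qualified_name` below is hypothetical and not part of PySpark; it only illustrates the comment form next to the stub form.

```python
# Hypothetical helper, not part of PySpark. The signature is declared in a
# PEP 484 type comment, which is an ordinary comment to the interpreter,
# so the module imports and runs on both Python 2.7 and Python 3.
def qualified_name(table, column):
    # type: (str, str) -> str
    """Join a table name and a column name with a dot."""
    return "{}.{}".format(table, column)


# The same signature in a separate .pyi stub file uses Python 3 annotation
# syntax, but the stub is read only by the type checker and never executed:
#
#     def qualified_name(table: str, column: str) -> str: ...

# No annotations are attached at runtime. Python 2 functions have no
# __annotations__ attribute at all, hence the getattr default.
runtime_annotations = getattr(qualified_name, "__annotations__", {})

print(qualified_name("users", "id"))
print(runtime_annotations)
```

A checker that understands type comments would be expected to reject a call such as `qualified_name(1, "id")`, while the interpreter itself never looks at the comment.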