Seems useful to do. Is there a way to do this so it doesn't break Python
2.x?


On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:

> Hi everyone,
>
> For the last few months I've been working on static type annotations for
> PySpark. For those of you who are not familiar with the idea, type hints
> were introduced by PEP 484 (https://www.python.org/dev/peps/pep-0484/)
> and further extended by PEP 526 (https://www.python.org/dev/peps/pep-0526/)
> with the main goal of providing information required for static analysis.
> Right now there are a few tools which support type hints, including Mypy
> (https://github.com/python/mypy) and PyCharm
> (https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
> Type hints can be added using function annotations
> (https://www.python.org/dev/peps/pep-3107/, Python 3 only), docstrings, or
> source-independent stub files
> (https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is
> optional, gradual and has no runtime impact.
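>
> For example, a stub file places annotated signatures next to the
> implementation module, with function bodies elided as "...". Here is a
> minimal sketch, using a made-up helper function just to show the
> mechanics:
>
>     # utils.pyi -- shipped alongside utils.py, never executed
>     from typing import Optional
>
>     def parse_level(name: str, default: Optional[int] = ...) -> int: ...
>
> Checkers pick such stubs up automatically, while the .py sources stay
> untouched, which is what keeps the approach non-invasive.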
>
> So far I've annotated the majority of the API, including most of
> pyspark.sql and pyspark.ml. The project is still rough around the edges,
> and may produce both false positives and false negatives, but I think it
> has become mature enough to be useful in practice.
> The current version is compatible only with Python 3, but it is possible,
> with some limitations, to backport it to Python 2 (though that is not on
> my todo list).
>
> There are a number of possible benefits for PySpark users and developers:
>
>    - Static analysis can detect a number of common mistakes and prevent
>    runtime failures. Generic self is still fairly limited, so it is more
>    useful with DataFrames, Structured Streaming and ML than with RDDs or
>    DStreams (see the sketch after this list).
>    - Annotations can be used for documenting complex signatures
>    (https://git.io/v95JN), including return types that depend on argument
>    values (https://git.io/v95JA).
>    - Detecting possible bugs in Spark (SPARK-20631).
>    - Showing API inconsistencies.
>
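> To make the first two points concrete, here is a minimal, hypothetical
> sketch. The DataFrame.head example mirrors a real PySpark pattern
> (head() returns a single Row, head(n) returns a list of Rows), though
> the exact annotations in the stubs may differ:
>
>     # dataframe.pyi -- overloads express a return type that
>     # depends on the arguments
>     from typing import List, overload
>
>     from pyspark.sql.types import Row
>
>     class DataFrame:
>         @overload
>         def head(self) -> Row: ...
>         @overload
>         def head(self, n: int) -> List[Row]: ...
>
> With stubs like this in place, a checker such as mypy will flag, say,
> df.head().collect() as an error before the code ever reaches a cluster.
>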
> Roadmap
>
>    - Update the project to reflect Spark 2.2.
>    - Refine existing annotations.
>
> If there is enough interest I am happy to contribute this back to Spark
> or submit it to Typeshed (https://github.com/python/typeshed - this
> would require formal ASF approval, and since Typeshed doesn't provide
> versioning, it is probably not the best option in our case).
>
> Further information:
>
>    - https://github.com/zero323/pyspark-stubs - GitHub repository
>    - https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
>      - an interesting presentation by Marco Bonzanini
>
> --
> Best,
> Maciej
>
>
