As Nicholas already pointed out, there is no Python 2 conflict here. Moreover, even though I used a Python 3-specific feature, Python 2 users can benefit from the annotations as well in some circumstances (the already mentioned MyPy is one option, PyCharm another, and maybe some other tools as well, if not natively then, like Jupyter, through MyPy).
Nonetheless, there are many factors to consider here.

First and foremost is whether the project has enough manpower to spare to actually maintain manually curated annotations. Simple annotations can be generated automatically (statically with stubgen, or by reflection with MonkeyType), but these are fairly limited and sometimes truly monstrous. At the moment the PySpark annotations consist of ~5 KLOC. Some parts are close to trivial, others are rather complex and sometimes require additional definitions. Since standards and tools evolve, this is code that has to be actively maintained, which potentially means another stream of JIRA tickets to handle.

Additionally, if annotations are to be used, maintainers should set clear goals. Annotations can vary from dynamic Any -> Any, through detailed annotations including generics (that's where most of the annotations for PySpark are at this point), to in-depth constraints on values (simple dependent types). One can also choose between documenting factual relationships and documenting recommendations (in other words, rejecting in the type system some values that are allowed in practice). There is also a trade-off between completeness and the cost of maintenance. Finally, it should be decided whether annotations should cover only the public API (my choice) or internals as well, and whether they should be mandatory for the chosen API or optional.

Furthermore, there are some challenges when it comes to PySpark dependencies, many of which don't have their own annotations. And there is of course the matter of annotating Py4j interfaces.

Last but not least, there is the question of testing and acceptance. Ideally one would run the type checker of choice against the examples and the source, and accept the annotations if there is no conflict. In reality, however, the available tools have limitations and can reject correct code (generics are particularly problematic here). Not to mention regressions and backward-incompatible changes.
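As an aside, the range of annotation detail mentioned above (dynamic, generic, value-constrained) can be sketched roughly as follows. This is only an illustration: `Column`, `first_dynamic`, `first_generic`, `JoinType` and `describe_join` are hypothetical names made up for this example and are not taken from PySpark or from the existing annotations. `typing.Literal` requires Python 3.8 or later.

```python
from typing import Any, Generic, List, Literal, TypeVar

T = TypeVar("T")


class Column(Generic[T]):
    """Hypothetical typed container, NOT the PySpark Column class."""

    def __init__(self, values: List[T]) -> None:
        self.values = values


# 1. Dynamic: cheap to write, but tells a type checker almost nothing.
def first_dynamic(col: Any) -> Any:
    return col.values[0]


# 2. Generic: the element type of the input flows through to the result.
def first_generic(col: Column[T]) -> T:
    return col.values[0]


# 3. Constraint on values: only the listed strings are accepted by a checker.
JoinType = Literal["inner", "left", "right"]


def describe_join(how: JoinType) -> str:
    return "join type: " + how


print(first_generic(Column([1, 2, 3])))
print(describe_join("inner"))
# Annotations are not enforced at runtime, so this call still executes,
# even though a type checker would be expected to flag the argument.
print(describe_join("cross"))  # type: ignore[arg-type]
```

The last call is an example of an annotation acting as a recommendation rather than a statement of fact: the value is rejected in the type system although the code accepts it in practice.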
On the other hand, checking only internal consistency (the primary acceptance criterion used with the annotations-only project) can miss some obvious problems. There are possible solutions, but these don't come without a cost.

Now the question is what the possible advantages are of merging the annotations into the official repository versus keeping them outside. Keeping things in sync and tapping into the existing pool of contributors are the most obvious ones. Additionally, it means bringing some benefits of annotations to the end user even if they are not aware of, or not interested in, typing at all (see the PyCharm case).

On the other hand, if the user is aware of Python typing, there is little overhead in having a separate package. It is a lightweight dependency with no executable code, and it is not required on the worker nodes. There is also more room for experimentation without a strict release schedule.

Anyway... On my side, I can donate the existing annotations, help with the migration process, and provide some support during the transition period, if the decision to include annotations in the main repository is made. However, I don't have a strong opinion on whether such a transition is required or not.