wengh commented on code in PR #50684: URL: https://github.com/apache/spark/pull/50684#discussion_r2071054468
##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -356,17 +356,28 @@ For library that are used inside a method, it must be imported inside the method
     from pyspark import TaskContext

     context = TaskContext.get()
+Mutating State
+~~~~~~~~~~~~~~
+Some methods such as DataSourceReader.read() and DataSourceReader.partitions() must be stateless. Changes to the object state made in these methods are not guaranteed to be visible or invisible to future invocations.
+
+Other methods such as DataSource.schema() and DataSourceStreamReader.latestOffset() can be stateful. Changes to the object state made in these methods are visible to future invocations.
+
+Refer to the documentation of each method for more details.

Review Comment:
   most methods can change state so I guess we can just list the exceptions here

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
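
For context on why the proposed text says state changes in `read()` are "not guaranteed to be visible or invisible", here is a minimal sketch. It is a simulation only and does not use pyspark: `CountingReader` and `run_on_worker` are hypothetical names, and the pickle round trip is an assumption standing in for a reader object being serialized and run in another process.

```python
import pickle


class CountingReader:
    """Hypothetical stand-in for a reader object; not a pyspark class."""

    def __init__(self):
        self.rows_read = 0

    def read(self, partition):
        # Mutating state inside read() is the pattern the doc text warns about.
        for value in range(partition):
            self.rows_read += 1
            yield (value,)


def run_on_worker(reader, partition):
    # Assumption: model shipping the reader to another process by
    # round-tripping it through pickle, so read() runs on a copy.
    worker_copy = pickle.loads(pickle.dumps(reader))
    return list(worker_copy.read(partition))


reader = CountingReader()

# The rows come back, but the counter on the original object is untouched.
rows = run_on_worker(reader, 3)
count_after_worker_run = reader.rows_read

# Calling read() in the same process does change the original object.
local_rows = list(reader.read(2))
count_after_local_run = reader.rows_read
```

In this sketch the same mutation is lost in one call path and kept in the other, which is the ambiguity the "must be stateless" wording is trying to rule out.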