wengh commented on code in PR #50684:
URL: https://github.com/apache/spark/pull/50684#discussion_r2071064597


##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -356,17 +356,28 @@ For library that are used inside a method, it must be imported inside the method
         from pyspark import TaskContext
         context = TaskContext.get()
 
+Mutating State
+~~~~~~~~~~~~~~
+Some methods such as DataSourceReader.read() and DataSourceReader.partitions() must be stateless. Changes to the object state made in these methods are not guaranteed to be visible or invisible to future invocations.
+
+Other methods such as DataSource.schema() and DataSourceStreamReader.latestOffset() can be stateful. Changes to the object state made in these methods are visible to future invocations.
+
+Refer to the documentation of each method for more details.

Review Comment:
   The following methods should not mutate internal state. Changes to the
   object state made in these methods may or may not be visible to future
   calls.
   
   - DataSourceReader.partitions()
   - DataSourceReader.read()
   - DataSourceStreamReader.read()
   - SimpleDataSourceStreamReader.readBetweenOffsets()
   - All writer methods
   
   All other methods, such as DataSource.schema() and
   DataSourceStreamReader.latestOffset(), can be stateful. Changes to the
   object state made in these methods are visible to future calls.
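
To make the distinction concrete, here is a minimal plain-Python sketch. It assumes the reader object is serialized on the driver and that each task works on its own copy; `copy.deepcopy` stands in for that serialization step. `CountingReader` and `run_on_worker` are hypothetical names used only for illustration and are not part of pyspark.

```python
import copy


class CountingReader:
    """Hypothetical stand-in for a DataSourceReader; not a pyspark class."""

    def __init__(self):
        self.rows_read = 0

    def partitions(self):
        # Two fake partitions, identified by index.
        return [0, 1]

    def read(self, partition):
        # Mutating state here is the pattern the comment warns against.
        self.rows_read += 1
        yield (partition, self.rows_read)


def run_on_worker(reader, partition):
    """Simulate a task: it runs read() on a copy of the driver-side reader.

    Assumption: deepcopy approximates shipping a serialized reader to a worker.
    """
    worker_copy = copy.deepcopy(reader)
    return list(worker_copy.read(partition))


driver_reader = CountingReader()
results = [run_on_worker(driver_reader, p) for p in driver_reader.partitions()]

print(results)                  # [[(0, 1)], [(1, 1)]]
print(driver_reader.rows_read)  # 0
```

Under these assumptions every task starts from `rows_read == 0`, and the driver-side object never sees the increments, so a counter kept in `read()` is not something later calls can rely on.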



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

