I see there's a comment in the TaskInfo class that the index may not be the
same as the ID of the RDD partition the task is computing. Under what
circumstances *will* the ID be the same? If there are zero guarantees, any
suggestions on how to grab this info from the scheduler to populate a new
field?
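
For illustration, here is a minimal Scala sketch (the listener class name is made up) of pulling TaskInfo.index out of the scheduler through a SparkListener. Whether that index matches the RDD partition ID is exactly the open question above, so treat the printed value accordingly.

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener, for illustration only: it reports the index the
// scheduler records in TaskInfo for each finished task. Per the comment in
// TaskInfo, this is the task's index within its task set and is not
// guaranteed to equal the ID of the RDD partition the task computed.
class TaskIndexListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"stage=${taskEnd.stageId} taskId=${info.taskId} index=${info.index}")
  }
}

// Registering it on an existing SparkContext (sc is assumed to exist):
// sc.addSparkListener(new TaskIndexListener())
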
Hi
As the documentation says:
spark.python.worker.memory
Amount of memory to use per python worker process during aggregation, in
the same format as JVM memory strings (e.g. 512m, 2g). If the memory used
during aggregation goes above this amount, it will spill the data into
disks.
I search the con
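
For reference, a minimal sketch of setting this option programmatically; since it is an ordinary configuration key, it can also be passed to spark-submit as --conf spark.python.worker.memory=512m. The app name below is made up, and note the setting only affects PySpark worker processes, not JVM executor memory.

import org.apache.spark.SparkConf

// Illustration only: set the per-Python-worker aggregation memory threshold.
// Values above this amount cause the Python worker to spill to disk, as the
// quoted documentation describes; spark.executor.memory is unaffected.
val conf = new SparkConf()
  .setAppName("example")                      // hypothetical app name
  .set("spark.python.worker.memory", "512m")  // spill threshold per Python worker
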