Hi guys, I have had this question for a very long time. After diving into the source code (specifically the links below), I have a feeling that the lineage of an RDD (the transformations) is converted into bytecode and stored in memory or on disk. To put a related question another way: do we ever store JVM bytecode or Python bytecode in memory or on disk? This makes sense to me because if we need to reconstruct an RDD after a node failure, we have to go through the lineage and re-execute the respective transformations, so storing their bytecode would make sense. However, many people seem to disagree with me, so it would be great if someone could clarify.
https://github.com/apache/spark/blob/6ee40d2cc5f467c78be662c1639fc3d5b7f796cf/python/pyspark/rdd.py#L1452
https://github.com/apache/spark/blob/6ee40d2cc5f467c78be662c1639fc3d5b7f796cf/python/pyspark/rdd.py#L1471
https://github.com/apache/spark/blob/6ee40d2cc5f467c78be662c1639fc3d5b7f796cf/python/pyspark/rdd.py#L229
https://github.com/apache/spark/blob/master/python/pyspark/cloudpickle.py#L241
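
To make concrete what I mean by "storing Python bytecode", here is a minimal sketch using only the standard library. This is not Spark's code, and I am only assuming it is roughly analogous to what the cloudpickle link above does with a function's code object. The names `add_one`, `payload`, and `rebuilt` are made up for illustration.

```python
import marshal
import types


def add_one(x):
    # Stand-in for a function passed to a transformation such as map().
    return x + 1


# Serialize the compiled code object (which carries the bytecode) to bytes.
# These bytes could in principle be kept in memory, written to disk,
# or sent over the network.
payload = marshal.dumps(add_one.__code__)

# Later (or on another process running the same Python version),
# rebuild a callable from the stored bytes.
code = marshal.loads(payload)
rebuilt = types.FunctionType(code, globals(), "add_one")

# Re-apply the rebuilt function, as one would when recomputing a lost partition.
result = [rebuilt(x) for x in [1, 2, 3]]
print(result)
```

If something like this happens for every transformation in the lineage, then the bytecode really is being held somewhere as bytes, which is what my question is about. Note that `marshal` output is specific to a Python version, so this sketch assumes the same interpreter on both sides.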