I am trying to understand Spark architecture. For DataFrames that are created from Python objects, i.e. *created in driver memory, where are they stored?*
Take the following example:

```python
from pyspark.sql import Row
import datetime

courses = [
    {
        'course_id': 1,
        'course_title': 'Mastering Python',
        'course_published_dt': datetime.date(2021, 1, 14),
        'is_active': True,
        'last_updated_ts': datetime.datetime(2021, 2, 18, 16, 57, 25)
    }
]

courses_df = spark.createDataFrame([Row(**course) for course in courses])
```

Where is the DataFrame stored when I invoke this call?

```python
courses_df = spark.createDataFrame([Row(**course) for course in courses])
```

Does it:

1. Send the data to a random executor? Does this count as a shuffle?
2. Or does it stay on the driver node? That would not scale once the DataFrame grows large.

--
Regards,
Sreyan Chakravarty