Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found
the ContextCleaner class, which cleans shuffle data when the shuffle
dependency is GC-ed. Based on the source code, the shuffle dependency is
GC-ed only when the active job finishes, but I'm not sure. Could you explain
the life cycle of a shuffle dependency?
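For concreteness, here is a minimal sketch of the mechanism in question,
assuming local mode. The timing is nondeterministic because it depends on
both Python and JVM garbage collection, so treat it as illustrative rather
than a reliable way to force cleanup.

import gc
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
sc = spark.sparkContext

# reduceByKey introduces a ShuffleDependency; count() materializes it and
# writes shuffle files under spark.local.dir.
rdd = sc.parallelize(range(1000)) \
        .map(lambda x: (x % 10, x)) \
        .reduceByKey(lambda a, b: a + b)
rdd.count()

# ContextCleaner holds only a weak reference to the ShuffleDependency. Once
# the last reference to the RDD is dropped and the garbage collectors run,
# the cleaner asks the executors to delete the shuffle files. Spark also
# schedules a periodic JVM GC (spark.cleaner.periodicGC.interval, 30 minutes
# by default) so cleanup eventually happens even without memory pressure.
rdd = None
gc.collect()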
This is a user-list question, not a dev-list question. Moving this conversation
to the user list and BCC-ing the dev list.
Also, this statement

> We are not validating against table or column existence.

is not correct. When you call spark.sql(…), Spark will look up the table
references and fail if they do not exist.
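To make that concrete, here is a quick check you can run against a local
session (the table name here is made up). spark.sql() analyzes the query
eagerly, so the missing table fails before any action runs:

from pyspark.sql import SparkSession
# On Spark < 3.4, import AnalysisException from pyspark.sql.utils instead.
from pyspark.errors import AnalysisException

spark = SparkSession.builder.master("local[1]").getOrCreate()

try:
    # Analysis resolves the table reference here, before any job is submitted.
    spark.sql("SELECT * FROM table_that_does_not_exist")
except AnalysisException as err:
    print(f"Analysis failed as expected: {err}")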
Yes, you can validate the syntax of your PySpark SQL queries without
connecting to an actual dataset or running them on a cluster. PySpark
provides a way to check a query's syntax without executing it. Something
like below:
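(The code sample itself did not survive in this digest; only the pyspark
shell startup banner did. The following is a sketch of the kind of approach
that answer was likely pointing at. It reaches Spark's SQL parser through
the Py4J gateway, which is an internal, unstable API, and it only checks
syntax; as the correction above notes, spark.sql() itself goes further and
resolves table references.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

def is_valid_syntax(query: str) -> bool:
    """Return True if the query parses; no catalog lookup, no execution."""
    try:
        # Internal API: parsePlan() builds a logical plan from the SQL text
        # without resolving tables or columns and without running anything.
        spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
        return True
    except Exception:
        # Py4J surfaces the JVM ParseException; catching broadly keeps this
        # version-agnostic.
        return False

print(is_valid_syntax("SELECT id FROM users WHERE id > 1"))  # True
print(is_valid_syntax("SELEC id FROM users"))                # False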