Hi everyone, I have two Spark jobs inside a single Spark application that read from the same input file. They are executed in two threads.
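My current setup looks roughly like this (a minimal sketch; the file path and the job bodies are placeholders, not my real code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object SharedInputExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shared-input")
      .master("local[*]")
      .getOrCreate()

    // Read the input once and cache it; subsequent actions reuse
    // the in-memory copy instead of re-reading the file.
    val input = spark.read.textFile("input.txt") // placeholder path
    input.persist(StorageLevel.MEMORY_ONLY)

    // Spark's scheduler is thread-safe, so the two jobs can be
    // submitted concurrently from separate threads.
    val job1 = new Thread(() => {
      val words = input.rdd.flatMap(_.split("\\s+")).count()
      println(s"job 1: $words words")
    })
    val job2 = new Thread(() => {
      val lines = input.count()
      println(s"job 2: $lines lines")
    })
    job1.start(); job2.start()
    job1.join(); job2.join()

    input.unpersist()
    spark.stop()
  }
}
```

One caveat I am aware of: the cache is only populated by the first action that runs, so if both threads start before it is materialized, some partitions may still be computed from the file more than once.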
Right now, I cache the input file in memory before executing the two jobs. Is there another way to share the same input with only a single read? I know there is a technique called Multiple Query Optimization, but I don't know whether it is applicable to Spark (or Spark SQL). Thank you. Quang-Nhat