Dear all,

Here is a requirement I am thinking of implementing in Spark core. Please let me know if this is possible, and kindly provide your thoughts.
A user executes a query to fetch, say, one million records from a database. We let the user store the result as a DataFrame, partitioned across the cluster. Later, another user executes the same query from a different session. Is there any way to let the second user reuse the DataFrame created by the first user?

Could we have a master catalog (a DataFrame or RDD) that records which DataFrames are currently loaded and matches incoming queries from other users against it? That way we would have a wonderful system that never allows the same query to be executed and loaded into cluster memory twice. A rough sketch of what I have in mind follows.
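To make the idea concrete, here is a minimal sketch of such a registry, assuming all user sessions live inside one Spark application (e.g. created via spark.newSession(), as the Spark Thrift Server does), since cached DataFrames cannot outlive their SparkContext. The QueryCache object and its normalize helper are hypothetical names for illustration only:

```scala
import org.apache.spark.sql.DataFrame
import scala.collection.concurrent.TrieMap

// Hypothetical registry mapping a normalized query string to the cached
// DataFrame first produced for it. All sessions must belong to the same
// Spark application, because cached data is tied to the SparkContext.
object QueryCache {
  private val loaded = TrieMap.empty[String, DataFrame]

  // Crude normalization so trivially different spellings of the same
  // query still hit the cache; a real system would compare query plans.
  private def normalize(sql: String): String =
    sql.trim.toLowerCase.replaceAll("\\s+", " ")

  // Return the DataFrame already registered for this query if one exists;
  // otherwise run `load`, mark the result for caching, and register it.
  def getOrLoad(sql: String)(load: => DataFrame): DataFrame =
    loaded.getOrElseUpdate(normalize(sql), load.cache())
}
```

Two sessions in the same application could then share the load (the query and table name here are made up):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("shared-cache").getOrCreate()
val user2 = spark.newSession() // second user's session, same SparkContext

val q = "SELECT * FROM orders"
val df1 = QueryCache.getOrLoad(q)(spark.sql(q)) // first user: hits the database
val df2 = QueryCache.getOrLoad(q)(user2.sql(q)) // second user: reuses df1's cache
```

Spark's built-in global temporary views (df.createGlobalTempView("orders"), queried as global_temp.orders from any session) already cover part of this, but the matching of incoming queries against what is loaded would still need a layer like the one above. Across completely separate applications, something external (the Thrift Server, or an off-heap store) would be required.

Best,
Ravion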