Re: Dataframe caching

Muthu Jayakumar Fri, 20 Jan 2017 07:57:24 -0800

I think global temporary views may help in your case:

https://spark.apache.org/docs/latest/sql-programming-guide.html#global-temporary-view
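
A minimal sketch of how that could look in PySpark, assuming both users
run inside the same Spark application (global temporary views are not
visible across separate applications). The helper names publish and
lookup and the view name passed to them are hypothetical, and calling
cache() is my own addition so the rows stay in cluster memory once they
have been computed:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; PySpark is needed at run time
    from pyspark.sql import DataFrame, SparkSession


def publish(df: "DataFrame", view_name: str) -> None:
    """First user: keep the result in memory and expose it to other sessions."""
    # Mark the DataFrame for caching (materialized on the first action).
    df.cache()
    # Register it in the system database `global_temp`, which is shared by
    # all sessions of the same Spark application.
    df.createGlobalTempView(view_name)


def lookup(spark: "SparkSession", view_name: str) -> "DataFrame":
    """Second user: read the shared view instead of re-running the query."""
    # Global temporary views must be qualified with `global_temp`.
    return spark.sql(f"SELECT * FROM global_temp.{view_name}")
```

The first user would call publish(df, "customers") after loading the
data, and a second session obtained with spark.newSession() could then
call lookup(other_session, "customers"). The view lives until the
application stops or the view is dropped.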


Thanks,
Muthu

On Fri, Jan 20, 2017 at 6:27 AM, ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:

> Dear all,
>
> Here is a requirement I am thinking of implementing in Spark core. Please
> let me know if this is possible, and kindly provide your thoughts.
>
> A user executes a query to fetch 1 million records from, let's say, a
> database. We let the user store this as a dataframe, partitioned across
> the cluster.
>
> Another user executes the same query from another session. Is there
> any way that we can let the second user reuse the dataframe created by the
> first user?
>
> Can we have a master dataframe (or RDD) which stores the information about
> the current dataframes loaded and matches against any queries that are
> coming from other users?
>
> In this way, we will have a wonderful system which never allows the same
> query to be executed and loaded again into the cluster memory.
>
> Best, Ravion
>
