OK, let us look at this:
Temporary Views
- Metadata is stored on the driver; *data remains distributed across
  executors.*

Caching/Persisting
- *Data is stored in the executors' memory or disk.*

The statement *"created on driver memory"* refers to the metadata of
temporary views, not the actual data. The data itself is not loaded into
the driver unless explicitly collected.

In summary:
- Data is stored in the executors' memory or disk during normal operations.
- The driver only holds metadata unless you explicitly collect data to it.
- Temporary views and caching/persisting are different mechanisms with
  different memory implications.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


On Sun, 16 Feb 2025 at 13:29, Tim Robertson <timrobertson...@gmail.com>
wrote:

> Thanks Mich
>
> > created on driver memory
>
> That I hadn't anticipated. Are you sure?
> I understood that caching a table pegged the RDD partitions into the
> memory of the executors holding the partition.
>
>
> On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> yep. created on driver memory. watch for OOM if the size becomes too
>> large
>>
>> spark-submit --driver-memory 8G ...
>>
>> HTH
>>
>> Dr Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>> On Sun, 16 Feb 2025 at 09:16, Tim Robertson <timrobertson...@gmail.com>
>> wrote:
>>
>>> Answering my own question. Global temp views get created in the
>>> global_temp database, so can be accessed thusly.
>>>
>>> Thanks
>>>
>>> Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>> s.createOrReplaceGlobalTempView("occurrence_svampe");
>>> spark.catalog().cacheTable("global_temp.occurrence_svampe");
>>>
>>>
>>> On Sun, Feb 16, 2025 at 10:05 AM Tim Robertson <
>>> timrobertson...@gmail.com> wrote:
>>>
>>>> Hi folks
>>>>
>>>> Is it possible to cache a table for shared use across sessions with
>>>> Spark Connect?
>>>> I'd like to load a read-only table once that many sessions will then
>>>> query to improve performance.
>>>>
>>>> This is an example of the kind of thing that I have been trying, but
>>>> have not succeeded with.
>>>>
>>>> SparkSession spark =
>>>>     SparkSession.builder().remote("sc://localhost").getOrCreate();
>>>> Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>>>
>>>> // this works if it is not "global"
>>>> s.createOrReplaceGlobalTempView("occurrence_svampe");
>>>> spark.catalog().cacheTable("occurrence_svampe");
>>>>
>>>> // this fails with a table not found when a global view is used
>>>> spark
>>>>     .sql("SELECT * FROM occurrence_svampe")
>>>>     .write()
>>>>     .parquet("/tmp/export");
>>>>
>>>> Thank you
>>>> Tim
>>>>
>>>
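Putting the thread's answer together, a minimal Java sketch of the working
pattern follows. The `sc://localhost` URL and `/tmp` paths are the thread's
own placeholders; this assumes a running Spark Connect server with the
client libraries on the classpath, so it is an illustrative sketch rather
than a standalone-runnable program:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SharedCachedView {
    public static void main(String[] args) {
        // Session 1: load the parquet data, register it as a GLOBAL temp
        // view, and cache it. The view lives in the global_temp database,
        // so the cacheTable() call must use the qualified name.
        SparkSession spark =
            SparkSession.builder().remote("sc://localhost").getOrCreate();
        Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
        s.createOrReplaceGlobalTempView("occurrence_svampe");
        spark.catalog().cacheTable("global_temp.occurrence_svampe");

        // Any other session against the same server can now read the
        // cached view, again qualifying it with global_temp. Querying the
        // bare name "occurrence_svampe" fails with "table not found",
        // which is the error the thread started from.
        SparkSession other =
            SparkSession.builder().remote("sc://localhost").getOrCreate();
        other.sql("SELECT * FROM global_temp.occurrence_svampe")
             .write()
             .parquet("/tmp/export");
    }
}
```

Note that the cached partitions live in the executors' memory or disk, per
the discussion above; the driver holds only the view's metadata.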