Thanks Mich

> created on driver memory

That I hadn't anticipated. Are you sure? My understanding was that caching a
table pegs the RDD partitions into the memory of the executors holding those
partitions.

On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Yep, created in driver memory. Watch for OOM if the size becomes too large:
>
>   spark-submit --driver-memory 8G ...
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> View my LinkedIn profile:
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Sun, 16 Feb 2025 at 09:16, Tim Robertson <timrobertson...@gmail.com> wrote:
>
>> Answering my own question. Global temp views are created in the
>> global_temp database, so they can be accessed like this:
>>
>>   Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>   s.createOrReplaceGlobalTempView("occurrence_svampe");
>>   spark.catalog().cacheTable("global_temp.occurrence_svampe");
>>
>> Thanks
>>
>> On Sun, Feb 16, 2025 at 10:05 AM Tim Robertson <timrobertson...@gmail.com> wrote:
>>
>>> Hi folks
>>>
>>> Is it possible to cache a table for shared use across sessions with
>>> Spark Connect? I'd like to load a read-only table once, which many
>>> sessions will then query, to improve performance.
>>>
>>> This is an example of the kind of thing I have been trying, without
>>> success:
>>>
>>>   SparkSession spark =
>>>       SparkSession.builder().remote("sc://localhost").getOrCreate();
>>>   Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>>
>>>   // this works if the view is not "global"
>>>   s.createOrReplaceGlobalTempView("occurrence_svampe");
>>>   spark.catalog().cacheTable("occurrence_svampe");
>>>
>>>   // this fails with a "table not found" error when a global view is used
>>>   spark
>>>       .sql("SELECT * FROM occurrence_svampe")
>>>       .write()
>>>       .parquet("/tmp/export");
>>>
>>> Thank you
>>> Tim
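[Putting the thread's resolution together: the failure came from querying a global temp view without its database prefix. Global temp views live in the special global_temp database and must be referenced as global_temp.<name>. The sketch below combines the working steps from the thread into one Java program; the paths and the sc://localhost address are the examples used in the thread, not values that will exist on every setup. Note also that, per the follow-up discussion, cached table blocks are held in executor storage memory rather than on the driver, although a very wide collect back to the driver can still OOM it.]

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SharedCacheSketch {
    public static void main(String[] args) {
        // Connect to a Spark Connect server (example address from the thread)
        SparkSession spark = SparkSession.builder()
                .remote("sc://localhost")
                .getOrCreate();

        // Register the data as a global temp view; it is created in the
        // special global_temp database on the server side
        Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
        s.createOrReplaceGlobalTempView("occurrence_svampe");

        // Cache using the fully qualified name, including the
        // global_temp prefix; without it the catalog reports
        // "table not found"
        spark.catalog().cacheTable("global_temp.occurrence_svampe");

        // Queries must also use the global_temp prefix. Other sessions
        // against the same Spark application can query the same view.
        spark.sql("SELECT * FROM global_temp.occurrence_svampe")
                .write()
                .parquet("/tmp/export");
    }
}
```

One caveat worth stating: a global temp view is scoped to the Spark application, so this sharing only works for sessions served by the same Spark Connect server; it disappears when that application stops.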