Re: Spark connect: Table caching for global use?

2025-02-16 Thread Mich Talebzadeh
Ok, let us look at this:

- Temporary views: metadata is stored on the driver; *data remains distributed across executors.*
- Caching/persisting: *data is stored in the executors' memory or disk.*
- The statement *"created on driver memory"* refers to the metadata of temp

Re: Spark connect: Table caching for global use?

2025-02-16 Thread Tim Robertson
Thanks Mich

> created on driver memory

That I hadn't anticipated. Are you sure? I understood that caching a table pegged the RDD partitions into the memory of the executors holding the partition.

On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh wrote:
> yep. created on driver memory. watch

Re: Spark connect: Table caching for global use?

2025-02-16 Thread Mich Talebzadeh
Yep, created on driver memory. Watch for OOM if the size becomes too large:

    spark-submit --driver-memory 8G ...

HTH

Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: Spark connect: Table caching for global use?

2025-02-16 Thread Tim Robertson
Answering my own question. Global temp views get created in the global_temp database, so they can be accessed thusly. Thanks

    Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
    s.createOrReplaceGlobalTempView("occurrence_svampe");
    spark.catalog().cacheTable("global_temp.occurrence_svampe");

On S
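For readers who want to try the pattern above end to end, here is a minimal sketch, not the poster's actual setup. It assumes a classic (non-Connect) local SparkSession, uses a small generated table as a stand-in for the parquet data, and the class and method names (`GlobalTempViewCacheSketch`, `countFromOtherSession`) are hypothetical. It registers a global temp view, caches it, and then reads it from a second session created with `newSession()`; behaviour under Spark Connect may differ and is not verified here.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GlobalTempViewCacheSketch {

  // Registers and caches a global temp view in one session,
  // then counts its rows from a second session of the same application.
  static long countFromOtherSession(SparkSession spark) {
    // Stand-in data (assumption): five rows instead of the parquet files in the thread.
    Dataset<Row> s = spark.range(5).toDF("id");

    // Global temp views live in the reserved global_temp database.
    s.createOrReplaceGlobalTempView("occurrence_svampe");
    spark.catalog().cacheTable("global_temp.occurrence_svampe");

    // A new session shares the SparkContext and global temp views,
    // but has its own session-scoped temp views and SQL configuration.
    SparkSession other = spark.newSession();
    return other
        .sql("SELECT COUNT(*) FROM global_temp.occurrence_svampe")
        .first()
        .getLong(0);
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("global-temp-view-cache-sketch")
        .master("local[*]") // assumption: local mode, for illustration only
        .getOrCreate();
    try {
      System.out.println(countFromOtherSession(spark));
    } finally {
      spark.stop();
    }
  }
}
```

The view must be referenced with the `global_temp.` prefix from every session; a plain `occurrence_svampe` would be resolved against the session's own catalog and not found.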

Spark connect: Table caching for global use?

2025-02-16 Thread Tim Robertson
Hi folks

Is it possible to cache a table for shared use across sessions with Spark Connect? I'd like to load a read-only table once that many sessions will then query, to improve performance.

This is an example of the kind of thing that I have been trying, but have not succeeded with.

SparkSess