Thanks Mich

> created on driver memory

That I hadn't anticipated. Are you sure?
I understood that caching a table pegged the RDD partitions into the memory
of the executors holding the partition.




On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> yep. created on driver memory. watch for OOM if the size becomes too large
>
> spark-submit --driver-memory 8G ...
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>
>
> On Sun, 16 Feb 2025 at 09:16, Tim Robertson <timrobertson...@gmail.com>
> wrote:
>
>> Answering my own question. Global temp views get created in the
>> global_temp database, so they can be accessed as follows.
>>
>> Thanks
>>
>> Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>> s.createOrReplaceGlobalTempView("occurrence_svampe");
>> spark.catalog().cacheTable("global_temp.occurrence_svampe");
>>
>>
>> On Sun, Feb 16, 2025 at 10:05 AM Tim Robertson <timrobertson...@gmail.com>
>> wrote:
>>
>>> Hi folks
>>>
>>> Is it possible to cache a table for shared use across sessions with
>>> spark connect?
>>> I'd like to load a read only table once that many sessions will then
>>> query to improve performance.
>>>
>>> This is an example of the kind of thing that I have been trying, but
>>> have not succeeded with.
>>>
>>>   SparkSession spark =
>>> SparkSession.builder().remote("sc://localhost").getOrCreate();
>>>   Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>>
>>>   // this works if it is not "global"
>>>   s.createOrReplaceGlobalTempView("occurrence_svampe");
>>>   spark.catalog().cacheTable("occurrence_svampe");
>>>
>>>   // this fails with a table not found when a global view is used
>>>   spark
>>>       .sql("SELECT * FROM occurrence_svampe")
>>>       .write()
>>>       .parquet("/tmp/export");
>>>
>>> Thank you
>>> Tim
>>>
>>

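A minimal sketch of the cross-session sharing worked out in the thread above, assuming a second Java client connects to the same Spark Connect server; the view name comes from the thread, while the class name and output path are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SharedCacheReader {
    public static void main(String[] args) {
        // Connect to the same Spark Connect server as the session that
        // created and cached the global temp view.
        SparkSession spark = SparkSession.builder()
            .remote("sc://localhost")
            .getOrCreate();

        // Global temp views live in the global_temp database and are
        // visible to all sessions of the same Spark application, so the
        // cached table can be queried without re-reading the parquet files.
        Dataset<Row> rows =
            spark.sql("SELECT * FROM global_temp.occurrence_svampe");

        rows.write().parquet("/tmp/export_from_second_session");
    }
}
```

Note this requires a running Spark Connect server, so it is a sketch rather than a standalone program; the key point is the `global_temp.` qualifier on the view name, which the original failing attempt omitted.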