Hi folks

Is it possible to cache a table for shared use across sessions with Spark
Connect?
I'd like to load a read-only table once, which many sessions would then
query, in order to improve performance.

Here is an example of the kind of thing I have been trying, so far without
success:

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  SparkSession spark =
      SparkSession.builder().remote("sc://localhost").getOrCreate();
  Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");

  // this works if the view is a plain (non-global) temp view
  s.createOrReplaceGlobalTempView("occurrence_svampe");
  spark.catalog().cacheTable("occurrence_svampe");

  // this fails with "table not found" when the global view is used
  spark
      .sql("SELECT * FROM occurrence_svampe")
      .write()
      .parquet("/tmp/export");
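  
For context, this is roughly what I'd hope a second session could do once the
table is cached (just a sketch; the variable names are mine, and the
global_temp qualifier is my assumption of how a global temp view would be
addressed; I'm not sure how any of this interacts with Spark Connect
sessions):

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  // a second Spark Connect session against the same server, which would
  // ideally read from the cached data rather than re-scanning the parquet
  SparkSession other =
      SparkSession.builder().remote("sc://localhost").getOrCreate();

  // global temp views are registered under the global_temp database,
  // so I assume the qualified name is needed here
  Dataset<Row> fromCache =
      other.sql("SELECT * FROM global_temp.occurrence_svampe");
  fromCache.show();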

Thank you
Tim
