Awesome, thank you Michael for the detailed example!
I'll look into whether I can use this approach for my use case. If so, I
could avoid the overhead of repeatedly registering a temp table for one-off
queries, instead registering the table once and relying on the injected
strategy. Don't know how
registerTempTable is backed by an in-memory hash table that maps table name
(a string) to a logical query plan. Fragments of that logical query plan
may or may not be cached (but calling register alone will not result in any
materialization of results). In Spark 2.0 we renamed this function to
createOrReplaceTempView.
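The registry behaviour described above can be sketched in plain Scala. Everything here (`LogicalPlan`, the map, the flag) is an illustrative stand-in, not Spark's real internals; the point is only that registering is a map insert over a lazy plan, with no materialization:

```scala
import scala.collection.mutable

// Tracks whether any plan has actually been evaluated.
var evaluated = false

// Stand-in for a logical query plan: nothing runs until execute() is called.
final case class LogicalPlan(description: String) {
  def execute(): Seq[Int] = { evaluated = true; Seq(1, 2, 3) }
}

// The "in-memory hash table that maps table name to a logical query plan".
val tempTables = mutable.HashMap.empty[String, LogicalPlan]

def registerTempTable(name: String, plan: LogicalPlan): Unit =
  tempTables(name) = plan // just a map insert; nothing is materialized

registerTempTable("tmp", LogicalPlan("scan of df"))

println(tempTables.contains("tmp")) // true
println(evaluated)                  // false: registering ran nothing
```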
it would be great if we establish this.
I know in Hive these temporary tables "CREATE TEMPORARY TABLE ..." are
private to the session and are put in a hidden staging directory as below
/user/hive/warehouse/.hive-staging_hive_2016-07-10_22-58-47_319_5605745346163312826-10
and removed when the session ends.
Thanks for the link, I hadn't come across this.
According to
https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html
>
> and I quote
>
> "registerTempTable()
>
> registerTempTable() creates an in-memory table that is scoped to the
> cluster in which it was created."
A bit of a gray area here, I am afraid; I was trying to experiment with it.
According to
https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html
and I quote
"registerTempTable()
registerTempTable() creates an in-memory table that is scoped to the
cluster in which it was created."
Hi again Mich,
"But the thing is that I don't explicitly cache the tempTables ..".
>
> I believe tempTable is created in-memory and is already cached
>
That surprises me since there is a sqlContext.cacheTable method to
explicitly cache a table in memory. Or am I missing something? This could
explain it.
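The distinction being questioned here can be made concrete with a toy model (plain Scala, invented names, not Spark code): registering a table and caching it are two separate steps, mirroring the fact that sqlContext.cacheTable must be called explicitly.

```scala
import scala.collection.mutable

// Hypothetical catalog entry: registration alone never sets cached = true.
final case class TableEntry(plan: String, var cached: Boolean = false)

val catalog = mutable.HashMap.empty[String, TableEntry]

def registerTempTable(name: String, plan: String): Unit =
  catalog(name) = TableEntry(plan) // no caching implied

def cacheTable(name: String): Unit =
  catalog(name).cached = true // explicit opt-in, like sqlContext.cacheTable

registerTempTable("tmp", "scan of df")
println(catalog("tmp").cached) // false: registration alone does not cache

cacheTable("tmp")
println(catalog("tmp").cached) // true: only after an explicit cacheTable
```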
Well, I suppose one can drop the tempTable as below:
scala> df.registerTempTable("tmp")
scala> spark.sql("select count(1) from tmp").show
+--------+
|count(1)|
+--------+
|  904180|
+--------+
scala> spark.sql("drop table if exists tmp")
res22: org.apache.spark.sql.DataFrame = []
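For what it's worth, in Spark 2.x the direct API for this is spark.catalog.dropTempView("tmp"). What dropping does can be sketched in plain Scala (a toy model, not Spark code): it removes only the name-to-plan mapping, leaving the underlying DataFrame/data untouched.

```scala
import scala.collection.mutable

// Stand-in registry: the temp table name maps to a logical plan.
val registry = mutable.HashMap("tmp" -> "logical plan for df")

// Stands in for the data the plan reads; dropping never touches it.
val underlyingData = Seq(904180)

def dropTempTable(name: String): Unit = registry.remove(name)

dropTempTable("tmp")
println(registry.contains("tmp")) // false: the name is gone
println(underlyingData.nonEmpty)  // true: the data itself is unaffected
```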
Also your point
"B
Hi Mich,
Thank you again for your reply.
As I see you are caching the table already sorted
>
> val keyValRDDSorted = keyValRDD.sortByKey().cache
>
> and the next stage is you are creating multiple tempTables (different
> ranges) that cache a subset of rows already cached in RDD. The data stored
>
Hi Michael,
As I see you are caching the table already sorted
val keyValRDDSorted = keyValRDD.sortByKey().cache
and the next stage is you are creating multiple tempTables (different
ranges) that cache a subset of rows already cached in RDD. The data stored
in the tempTable is in Spark's in-memory columnar format.
Hi Mich,
Thank you for your quick reply!
What type of table is the underlying table? Is it Hbase, Hive ORC or what?
>
It is a custom datasource, but ultimately backed by HBase.
> By Key you mean a UNIQUE ID or something similar and then you do multiple
> scans on the tempTable which stores data using in-memory columnar format.
Hi Michael.
What type of table is the underlying table? Is it Hbase, Hive ORC or what?
By Key you mean a UNIQUE ID or something similar and then you do multiple
scans on the tempTable which stores data using in-memory columnar format.
That is the optimisation of tempTable storage as far as I know.
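A toy illustration of what "in-memory columnar format" means, with nothing Spark-specific assumed (this is not Spark's CachedBatch): rows are transposed into one array per column, so a scan that touches a single column reads one contiguous array instead of walking every row.

```scala
// Row-oriented input, as you would get from a key/value scan.
final case class Row(key: Int, value: String)

val rows = Seq(Row(1, "a"), Row(2, "b"), Row(3, "c"))

// Row-oriented -> column-oriented: one array per column.
val keyColumn: Array[Int]      = rows.map(_.key).toArray
val valueColumn: Array[String] = rows.map(_.value).toArray

// A count over one column never touches the other column's array.
val count = keyColumn.length
println(count) // 3
```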