My view is that temporary views created with *createOrReplaceTempView* (or its predecessor *registerTempTable*) live in driver memory. The DAG shows:
scala> val sales = spark.read.format("jdbc").options(
     | Map("url" -> _ORACLEserver,
     | "dbtable" -> "(SELECT * FROM sh.sales)",
     | "user" -> _username,
     | "password" -> _password)).load
sales: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(38,10), CUST_ID: decimal(38,10) ... 5 more fields]

scala> sales.createOrReplaceTempView("sales")

Execute CreateViewCommand

Details
== Physical Plan ==
Execute CreateViewCommand (1)
+- CreateViewCommand (2)
   +- LogicalRelation (3)

(1) Execute CreateViewCommand
Output: []

(2) CreateViewCommand
Arguments: `tmp`, false, true, LocalTempView

(3) LogicalRelation
Arguments: JDBCRelation((SELECT * FROM sh.sales)) [numPartitions=1], [PROD_ID#24, CUST_ID#25, TIME_ID#26, CHANNEL_ID#27, PROMO_ID#28, QUANTITY_SOLD#29, AMOUNT_SOLD#30], false

So behind the scenes you are still working on the DataFrame itself?

On Fri, 26 Mar 2021 at 13:54, Sean Owen <sro...@gmail.com> wrote:

> Views are simply bookkeeping about how the query is executed, like a
> DataFrame. There is no data or result to store; it's just how to run a
> query. The views exist on the driver. The query executes like any other,
> on the cluster.
>
> On Fri, Mar 26, 2021 at 3:38 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> As a first guess, where do you think this view is created in a
>> distributed environment?
>>
>> The whole purpose is fast access to this temporary storage (shared
>> among executors in this job), and that storage is only materialised
>> after an action is performed.
>>
>> scala> val sales = spark.read.format("jdbc").options(
>>      | Map("url" -> _ORACLEserver,
>>      | "dbtable" -> "(SELECT * FROM sh.sales)",
>>      | "user" -> _username,
>>      | "password" -> _password)).load
>> sales: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(38,10),
>> CUST_ID: decimal(38,10) ... 5 more fields]
>>
>> scala> sales.createOrReplaceTempView("sales")
>>
>> scala> spark.sql("select count(1) from sales").show
>> +--------+
>> |count(1)|
>> +--------+
>> |  918843|
>> +--------+
>>
>> HTH
>>
>> On Fri, 26 Mar 2021 at 06:55, Kushagra Deep <kushagra.d...@mobileum.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I just wanted to know: when we call 'createOrReplaceTempView' on a
>>> Spark dataset, where does the view reside? Does all the data come to
>>> the driver and the view is created there? Or do individual executors
>>> hold parts of the view (based on the data each executor has), so that
>>> when we query the view, the query runs on each part of the data in
>>> every executor?
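
To illustrate Sean's point in a self-contained way, here is a minimal sketch: the view is only driver-side bookkeeping over a logical plan, and nothing executes on the cluster until an action runs. It assumes a local SparkSession and substitutes an in-memory range for the JDBC source above, so the object name, app name, and data are stand-ins, not the original Oracle setup:

import org.apache.spark.sql.SparkSession

object TempViewSketch {
  def main(args: Array[String]): Unit = {
    // Local session as a stand-in for a real cluster; with a cluster
    // master, the same code runs the action on the executors.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TempViewSketch")
      .getOrCreate()

    // In-memory stand-in for the JDBC-backed sh.sales table.
    val sales = spark.range(0, 1000).toDF("prod_id")

    // This only registers a name -> logical plan entry in the driver's
    // session catalog; no data is read, moved, or cached here.
    sales.createOrReplaceTempView("sales")

    // The catalog entry is plain metadata held on the driver.
    spark.catalog.listTables().show()

    // Only an action materialises anything: the plan behind the view is
    // optimised and executed like any other query.
    spark.sql("select count(1) from sales").show()

    spark.stop()
  }
}

Note that createOrReplaceTempView itself launches no Spark job; only the count at the end does, and that query executes on the executors like any other, which matches the description of the view as pure bookkeeping until it is queried.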