Re: Spark SQL question

2023-01-28 Thread Bjørn Jørgensen
Hi Mich. This is a Spark user group mailing list where people can ask *any* questions about Spark. You know SQL and streaming, but I don't think it's necessary to start a reply with "*LOL*" to the question that's being asked. No questions are too stupid to be asked. Sat. 28 Jan 2023 at 09:22 …

Re: Spark SQL question

2023-01-28 Thread Mich Talebzadeh
LOL. First one: spark-sql> select 1 as `data.group` from abc group by data.group; 1 Time taken: 0.198 seconds, Fetched 1 row(s) means that you are assigning the alias data.group in the select and you are using that alias -> data.group in your group by statement. This is equivalent to spark-sql> select 1 as
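The behavior Mich describes is general SQL alias resolution: the backtick-quoted alias `data.group` is a single column name that happens to contain a dot, and GROUP BY resolves it as that alias rather than as column "group" of a table "data". A minimal sketch using Python's sqlite3 as a stand-in engine (not Spark; SQLite quotes identifiers with double quotes where Spark SQL uses backticks, and the table and data here are illustrative) shows the same effect:

```python
# Illustrative sketch: sqlite3 stands in for Spark SQL. SQLite uses
# double-quoted identifiers where Spark SQL uses backticks.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (x INTEGER)")
conn.executemany("INSERT INTO abc VALUES (?)", [(10,), (20,), (30,)])

# "data.group" is one alias containing a dot, not table "data", column
# "group". GROUP BY resolves it to the aliased constant 1, so all three
# rows collapse into a single group.
rows = conn.execute(
    'SELECT 1 AS "data.group" FROM abc GROUP BY "data.group"'
).fetchall()
print(rows)  # one row, since every input row shares the constant value
conn.close()
```

Grouping by a constant always yields exactly one group, which is why the quoted query returns a single row of 1.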

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Liquan Pei
gmail.com ] > *Sent:* 2014-09-30 18:34 > *To:* Haopu Wang > *Cc:* d...@spark.apache.org; user > *Subject:* Re: Spark SQL question: why build hashtable for both sides in > HashOuterJoin? > > Hi Haopu, > > How about full outer join? One hash table may not be efficient for this > case.

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Matei Zaharia
g > Cc: d...@spark.apache.org; user > Subject: Re: Spark SQL question: why build hashtable for both sides in > HashOuterJoin? > > Hi Haopu, > > How about full outer join? One hash table may not be efficient for this case. > > Liquan > > On Mon, Sep 29, 201

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-07 Thread Haopu Wang
...@spark.apache.org; user Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, How about full outer join? One hash table may not be efficient for this case. Liquan On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang wrote: Hi, Liquan, thanks for

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-30 Thread Liquan Pei
Sent:* 2014-09-30 12:31 > *To:* Haopu Wang > *Cc:* d...@spark.apache.org; user > *Subject:* Re: Spark SQL question: why build hashtable for both sides in > HashOuterJoin? > > > > Hi Haopu, > > > > My understanding is that the hashtable on both left and right side is used >

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Haopu Wang
anks again! From: Liquan Pei [mailto:liquan...@gmail.com] Sent: 2014-09-30 12:31 To: Haopu Wang Cc: d...@spark.apache.org; user Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, My understanding is that the hashtable on both left

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Liquan Pei
Hi Haopu, My understanding is that the hashtable on both the left and right side is used for including null values in the result in an efficient manner. If the hash table is only built on one side, let's say the left side, and we perform a left outer join, then for each row on the left side, a scan over the right side is n
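Liquan's point about null handling can be sketched in plain Python. This is an illustration of the general two-sided hash approach for a full outer join, not Spark's actual HashOuterJoin implementation, and all names here are hypothetical: with a hash table built on each side, matched rows are emitted by probing, and the unmatched rows of either side are found directly from their own table and padded with nulls, with no re-scan of the other side.

```python
# Sketch only: a two-sided hash join for FULL OUTER semantics. Not Spark's
# actual code; rows are dicts and None stands in for SQL NULL.
from collections import defaultdict

def full_outer_hash_join(left, right, key):
    """left/right are lists of dicts; key is the join column name."""
    left_index = defaultdict(list)
    right_index = defaultdict(list)
    for row in left:
        left_index[row[key]].append(row)
    for row in right:
        right_index[row[key]].append(row)

    result = []
    # Keys present on both sides: emit the cross product of matching rows.
    for k in left_index.keys() & right_index.keys():
        for l in left_index[k]:
            for r in right_index[k]:
                result.append((l, r))
    # Keys only on the left: pad the missing right side with None.
    for k in left_index.keys() - right_index.keys():
        for l in left_index[k]:
            result.append((l, None))
    # Keys only on the right: pad the missing left side with None.
    for k in right_index.keys() - left_index.keys():
        for r in right_index[k]:
            result.append((None, r))
    return result

left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
right = [{"id": 2, "w": "x"}, {"id": 3, "w": "y"}]
joined = full_outer_hash_join(left, right, "id")
```

With only one hash table (say on the right), unmatched right rows could only be discovered by tracking which entries were never probed or by scanning again, which is the inefficiency the thread discusses.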

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust
an I >> change the storage level? Because I have a big table there. >> >> >> >> Thanks! >> >> >> -- >> >> *From:* Cheng Lian [mailto:lian.cs@gmail.com] >> *Sent:* 2014-09-26

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust
rialized 1x Replicated". How can I > change the storage level? Because I have a big table there. > > > > Thanks! > > > -- > > *From:* Cheng Lian [mailto:lian.cs@gmail.com] > *Sent:* 2014-09-26 21:24 > *To:* Haopu Wang;

Re: Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?

2014-09-26 Thread Cheng Lian
Yes it is. The in-memory storage used with SchemaRDD also uses RDD.cache() under the hood. On 9/26/14 4:04 PM, Haopu Wang wrote: Hi, I'm querying a big table using Spark SQL. I see very long GC time in some stages. I wonder if I can improve it by tuning the storage parameter. The question