Does your SQL work if it does not run a regex on the strings and concatenate
them? I mean, if you just select the columns without any string operations?

On 24 September 2015 at 10:11, java8964 <java8...@hotmail.com> wrote:

> Try increasing the partition count; that will give each partition less
> data.
>
> Yong
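
To illustrate the suggestion above, here is a minimal sketch in plain Python (not Spark itself). It assumes records are assigned to partitions by hashing the grouping key, and it uses crc32 as a stand-in for the engine's real hash function. The user ids and record counts are made up for the illustration.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under hash partitioning."""
    sizes = Counter()
    for key in keys:
        # crc32 is only a stand-in for the engine's hash function (assumption).
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return sizes


# Hypothetical data: 10,000 distinct user ids with 10 records each.
keys = [f"user{i}" for i in range(10_000) for _ in range(10)]

# Size of the largest partition for two partition counts.
largest = {n: max(partition_sizes(keys, n).values()) for n in (200, 800)}

for n, size in largest.items():
    print(f"{n} partitions: largest partition holds {size} records")
```

With evenly spread keys like these, the largest partition shrinks as the partition count grows, which is the effect the suggestion relies on.
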
>
> ------------------------------
> Subject: Re: Java Heap Space Error
> From: yu...@useinsider.com
> Date: Thu, 24 Sep 2015 00:32:47 +0300
> CC: user@spark.apache.org
> To: java8...@hotmail.com
>
>
> Yes, it’s possible. I use S3 as the data source. My external tables are
> partitioned. The task below is 193/200. The job has 2 stages, and this is
> task 193 of 200 in the 2nd stage because of sql.shuffle.partitions.
>
> How can I avoid this situation? This is my query:
>
> select userid,
>   concat_ws(' ', collect_list(concat_ws(' ',
>     if(productname is not NULL, lower(productname), ''),
>     lower(regexp_replace(regexp_replace(
>       substr(productcategory, 2, length(productcategory)-2), '\"', ''), \",\", ' '))))) inputlist
> from landing
> where dt='2015-9' and userid != '' and userid is not null and userid is not NULL
>   and pagetype = 'productDetail'
> group by userid
>
>
> On 23 Sep 2015, at 23:55, java8964 <java8...@hotmail.com> wrote:
>
> Based on your description, your job shouldn't have any shuffle, as you
> just apply a regex and concatenation to the column. But there is one
> partition with 4.3M records to be read, versus fewer than 1M records for
> the other partitions.
>
> Is that possible? It depends on the source of your data.
>
> If there is a shuffle in your query (more than 2 stages generated by your
> query, and this is my guess of what is happening), then it simply means that
> one partition has far more data than the rest of the partitions.
>
> Yong
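
The skew described above can be sketched in plain Python (again, not Spark itself). The assumptions: rows are routed to shuffle partitions by hashing the grouping key, crc32 stands in for the real hash function, and "hot_user" is a made-up id that owns far more rows than any other. Because every row with the same key hashes to the same partition, one heavy key is enough to make one partition much larger than the rest.

```python
import statistics
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # mirrors the 200 tasks mentioned in the thread


def partition_of(key, num_partitions=NUM_PARTITIONS):
    # crc32 is only a stand-in for the engine's hash function (assumption).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical data: 2,000 ordinary users with 10 records each,
# plus one made-up "hot" user id with 50,000 records.
records = [f"user{i}" for i in range(2_000) for _ in range(10)]
records += ["hot_user"] * 50_000

# Records per partition, and the single heaviest key.
sizes = Counter(partition_of(key) for key in records)
largest = max(sizes.values())
median = statistics.median(sizes.values())
heaviest_key, heaviest_count = Counter(records).most_common(1)[0]

print(f"largest partition: {largest} records, median partition: {median}")
print(f"heaviest key: {heaviest_key} ({heaviest_count} records)")
```

Counting records per key, as the last two lines do, is one way to check whether a single userid is responsible for the oversized partition.
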
>
> ------------------------------
> From: yu...@useinsider.com
> Subject: Java Heap Space Error
> Date: Wed, 23 Sep 2015 23:07:17 +0300
> To: user@spark.apache.org
>
> What can cause the issue in the attached picture? I’m running an SQL
> query which runs a regex on strings and concatenates them. Because of this
> task, my job fails with a Java heap space error.
>
> <Screen Shot 2015-09-23 at 23.03.18.png>
>
>
>
