I am trying to run TeraSort in Spark on a 7-node cluster with only 10 GB of
data, and executors get lost with a "GC overhead limit exceeded" error.
This is what my cluster looks like -
- *Alive Workers:* 7
- *Cores in use:* 28 Total, 2 Used
- *Memory in use:* 56.0 GB Total, 1024.0 MB Used
This is on Spark 1.5.2.
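Not from the original thread, but a minimal sketch of the kind of configuration that usually helps here, assuming the default 1 GB executor heap (which matches the "1024.0 MB Used" above) is what is running out of memory; the key names are standard Spark configs, the values and master URL are placeholders to tune.

// Hypothetical settings for the TeraSort run on a 7-node / 56 GB standalone cluster.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("terasort")
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .set("spark.executor.memory", "6g")      // default 1g often GC-thrashes on a 10 GB sort
  .set("spark.cores.max", "28")            // let the job use all 28 cores instead of 2
val sc = new SparkContext(conf)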
I have a Spark standalone cluster with 6 workers. I left the cluster idle
for 3 days, and after 3 days I saw only 4 workers on the Spark master UI; 2
workers had died with the same exception -
The strange part is that the cluster was running stable for 2 days, but on
the third day 2 workers abruptly died.
> Worker nodes are configured wrongly, e.g. if SPARK_MASTER_IP is a hostname
> of the Master node but the workers are trying to connect to the IP of the
> master node. Check whether SPARK_MASTER_IP on the Worker nodes is exactly
> the same as what the Spark Master GUI shows.
>
> Thanks,
> Prabhu Joseph
>
Hey Rick,
Not sure on this, but a similar situation happened with me: when starting
spark-shell it was starting a new cluster instead of using the existing
cluster, and this new cluster was a single-node cluster. That's why jobs
were taking forever to complete from spark-shell and were running much
faster [...] my issue.
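A quick sanity check for that situation (just a sketch; it assumes a spark-shell session where sc is already defined): if the shell silently started its own local "cluster", the master URL will not be the standalone master's spark:// URL.

// Inside spark-shell: confirm which cluster the shell actually attached to.
sc.master                         // e.g. "spark://<master-host>:7077" vs "local[*]"
sc.getConf.get("spark.master")    // same information, read from the SparkConf
sc.defaultParallelism             // a single-node shell reports far fewer slots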
> In fact, a colleague pointed out that HIS (Cloudera) installation was
> defaulting to kryo for the spark-shell, which had an impact for some jobs.
> I couldn't find the document he was referring to as a source of this
> information, but the behavior sounds plausible at [...]
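If the serializer default is the concern, it can be pinned explicitly rather than relying on whatever the distribution ships. A hedged sketch (the registered classes are placeholders, not anything from the thread):

import org.apache.spark.SparkConf

// Explicitly choose Kryo instead of depending on a distribution default.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optional: registering frequently shuffled classes keeps Kryo output compact.
  .registerKryoClasses(Array(classOf[Array[String]], classOf[Array[Int]]))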
Hey Robert, you could use Zeppelin instead if you don't want to use beeline.
On Monday, September 28, 2015, Robert Grandl
wrote:
> Thanks Mark. Do you know how? In Spark standalone mode I use beeline to
> submit SQL scripts.
>
> In Spark/YARN, the only way I can see this will work is using s[...]
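One possible route (an assumption on my part, not something confirmed in the thread): read the SQL text in a regular Spark application and run it through a HiveContext, which works the same under YARN as in standalone mode and avoids beeline entirely. "queries.sql" is a hypothetical single-statement script file.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import scala.io.Source

val sc = new SparkContext(new SparkConf().setAppName("sql-script-runner"))
val hiveContext = new HiveContext(sc)

// Read the script text and execute it; assumes one statement per file.
val query = Source.fromFile("queries.sql").mkString
hiveContext.sql(query).show()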
Hi All,
I tried running the Spark word count and I have a couple of questions.
I am analyzing stage 0, i.e.
*sc.textFile -> flatMap -> map (word count example)*
1) In the *Stage logs* under the Application UI details, for every task I am
seeing a Shuffle Write of 2.7 KB. *Question:* how can I know where all [...]
> flatMap and map are narrow dependencies, meaning they can usually happen on
> the local node. I bet the shuffle is just sending out the textFile to a few
> nodes to distribute the partitions.
>
>
> --
> *From:* Kartik Mathur
> *Sent:* Thursday, October 1, 2015
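For reference, a sketch of the pipeline being discussed (paths are hypothetical and sc is assumed to come from spark-shell), with comments marking where the narrow dependencies end and where the Shuffle Write reported in the UI comes from:

// Word count as discussed above; comments note which steps are narrow and
// where the stage boundary / shuffle write appears.
val lines  = sc.textFile("hdfs:///tmp/words.txt")   // partitioning follows the InputFormat
val words  = lines.flatMap(_.split("\\s+"))         // narrow dependency: stays in stage 0
val pairs  = words.map(word => (word, 1))           // narrow dependency: still stage 0
val counts = pairs.reduceByKey(_ + _)               // wide dependency: stage 0 ends here and
                                                    // records its map output as "Shuffle Write";
                                                    // stage 1 starts with the matching "Shuffle Read"
counts.saveAsTextFile("hdfs:///tmp/wordcount-out")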
Hi,
I am trying to better understand shuffle in Spark.
Based on my understanding thus far:
*Shuffle Write*: writes an intermediate stage's output to local disk if
memory is not sufficient.
Example: if each worker has 200 MB of memory for intermediate results and
the results are 300 MB, then [...]
> Maybe you can share more of your context if still unclear.
> I just made assumptions to give clarity on a similar thing.
>
> Nicu
> --
> *From:* Kartik Mathur
> *Sent:* Thursday, October 1, 2015 10:25 PM
> *To:* Nicolae Marasoiu
> *Cc:* user
> [...] to read (those pertaining to its assigned partitions). So it needs to
> pick them up from remote nodes which do have replicas of that data.
>
> After blocks are read into memory, flatMap and map are local computations
> generating new RDDs, and in the end the result is sent to the driver.
> [...] the InputFormat dictates.*
>
> The shuffle can only be the part where a node opens an HDFS file, for
> instance, but the node does not have a local replica of the blocks which it
> needs to read (those pertaining to its assigned partitions). So it needs to
> pick them up from remote nodes which do have replicas of that data.
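A small way to see the same structure without the UI (a sketch assuming the word-count pipeline from the earlier thread and sc from spark-shell): toDebugString prints the lineage, and the ShuffledRDD entry marks exactly where the shuffle write/read boundary sits.

// Lineage of the word count; the indentation break at ShuffledRDD is the stage boundary.
val counts = sc.textFile("hdfs:///tmp/words.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

println(counts.toDebugString)
// Output looks roughly like (details vary by version and partition count):
//   (2) ShuffledRDD[4] at reduceByKey ...        <- shuffle read side (stage 1)
//    +-(2) MapPartitionsRDD[3] at map ...        <- narrow, shuffle write side (stage 0)
//       |  MapPartitionsRDD[2] at flatMap ...
//       |  hdfs:///tmp/words.txt MapPartitionsRDD[1] at textFile ...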
You can create log4j.properties under your SPARK_HOME/conf and set up these
properties:
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
Hi,
I have some nightly jobs which run every night but sometimes die because of
an unresponsive master. The Spark master logs say:
*Exception in thread "main" java.util.concurrent.TimeoutException: Futures
timed out after [...]*
Not seeing much else there; what could possibly cause an exception like
this?
Retrying what? I want to know why it died, and what can I do to prevent it?
On Wed, Oct 14, 2015 at 5:20 PM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:
> I fixed these timeout errors by retrying...
> On Oct 15, 2015 3:41 AM, "Kartik Mathur" wrote:
>
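If those really are driver/master RPC timeouts, one knob that sometimes helps (a hedged sketch, not a root-cause fix; the value is an assumption) is raising the general network timeout before the nightly job starts:

import org.apache.spark.SparkConf

// spark.network.timeout covers most internal ask/heartbeat timeouts; the default is 120s.
val conf = new SparkConf()
  .setAppName("nightly-job")
  .set("spark.network.timeout", "300s")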
That will depend on what your transformation is; your code snippet might
help.
On Tue, Oct 20, 2015 at 1:53 AM, shahid ashraf wrote:
> Hi
>
> Any idea why there is 50 GB of shuffle read and write for 3.3 GB of data?
>
> On Mon, Oct 19, 2015 at 11:58 PM, Kartik Mathur
> wrote:
>
Don't use groupBy; use reduceByKey instead. groupBy should always be
avoided, as it leads to a lot of shuffle reads/writes.
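A toy illustration of why this matters (the pair RDD below is a stand-in, not the data from the thread, and sc is assumed from spark-shell): reduceByKey combines values on the map side, so far less data crosses the shuffle than with groupBy/groupByKey.

// Hypothetical (key, count) pairs standing in for the aggregation being discussed.
val impressions = sc.parallelize(Seq(("ad1", 1), ("ad2", 1), ("ad1", 1)))

// groupByKey ships every (key, value) record across the network, then sums on the reduce side.
val viaGroup  = impressions.groupByKey().mapValues(_.sum)

// reduceByKey pre-aggregates partial sums per partition before the shuffle write.
val viaReduce = impressions.reduceByKey(_ + _)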
On Fri, Oct 23, 2015 at 11:39 AM, pratik khadloya
wrote:
> Sorry I sent the wrong join code snippet, the actual snippet is
>
> aggImpsDf.join(
>   aggRevenueDf,
>   aggImps[...]