I have a similar experience.
Using 32 machines, I can see that the number of tasks (partitions) assigned to
executors (machines) is not even. Moreover, the distribution changes every
stage (iteration).
I wonder why Spark needs to move partitions around anyway; shouldn't the
scheduler reduce network
Hi,
I wonder if it is possible to figure out the replication factor used in
GraphX partitioning from its log files.
--
Thanks,
-Khaled
This is an interesting discussion,
I have had some success running GraphX on large graphs with more than a
billion edges, using clusters of different sizes up to 64 machines. However,
performance goes down when I double the cluster size to 128 r3.xlarge
machines. Does anyone have exper
Hi all,
I have a problem running some algorithms on GraphX. Occasionally, it
stops running without any errors. The task state is FINISHED, but the
executors' state is KILLED. However, I can see that one job is not finished
yet. It took too much time (minutes) while every job/iteration was
typica
Hi,
I am using GraphX in standalone Spark 1.5.1 on a medium-size cluster (64+1).
I can execute PageRank with a large number of iterations on this cluster.
However, when I run SSSP, it always fails at iteration 23 or 24, which is
always after about 11 mins. Note that PageRank takes more than that
?
Thanks,
-Khaled
On Wed, Nov 4, 2015 at 7:21 AM, Adrian Tanase wrote:
> If some of the operations required involve shuffling and partitioning, it
> might mean that the data set is skewed toward specific partitions, which
> will create hot spotting on certain executors.
>
> -adrian
>
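Adrian's point about skew can be illustrated with a small, self-contained sketch (plain Python, no Spark; the dataset, key names, and partition count are made up for illustration):

```python
# Illustrative sketch: hash-partitioning a skewed key distribution sends most
# records to one partition, so the executor holding it becomes a hot spot.
import zlib

# Hypothetical skewed dataset: one "hot" key dominates.
records = ["hot_key"] * 90 + [f"key_{i}" for i in range(10)]

NUM_PARTITIONS = 4
counts = [0] * NUM_PARTITIONS
for key in records:
    # Stable hash (Python's built-in hash() is salted per process).
    part = zlib.crc32(key.encode()) % NUM_PARTITIONS
    counts[part] += 1

# One partition receives at least the 90 hot records, while the other
# partitions split the remaining 10 keys between them.
print(counts)
```

The same imbalance, at cluster scale, is why a few tasks in a stage can run far longer than the rest.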
Hi,
I wonder what exactly "write time" means.
I run GraphX workloads and noticed that the main bottleneck in most stages is
one or two tasks that take too long in "write time" and delay the whole job.
Enabling speculation helps a little, but I am still interested in knowing how
to fix this.
I use MEMORY_O
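For reference, the speculation mentioned above is controlled by standard Spark configuration keys; a minimal sketch (the master URL and application jar below are placeholders, and the quantile/multiplier values shown are the documented defaults):

```shell
# Hedged sketch: enable speculative re-execution of slow tasks.
spark-submit \
  --master spark://master:7077 \
  --conf spark.speculation=true \
  --conf spark.speculation.quantile=0.75 \
  --conf spark.speculation.multiplier=1.5 \
  my-graphx-app.jar
```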
Hi,
I'm using the most recent Spark version on a standalone setup of 16+1
machines.
While running GraphX workloads, I found that some executors are lazy: they
*rarely* participate in computation, which forces other executors to do
their work. This behavior is consistent across all iterations and
Hi all,
I observed some interesting behavior from GraphX while running SSSP. I use
standalone mode with 16+1 machines, each with 30 GB memory and 4 cores. The
dataset is 63 GB; however, the input for some stages is huge, about 16 TB!
The computation takes a very long time, so I stopped it.
For your info
Hi all,
I was trying to use GraphX to compute PageRank and found that the PageRank
value for several vertices is NaN.
I am using Spark 1.3. Any idea how to fix that?
--
Thanks,
-Khaled
Hi all,
I wonder if anyone has an explanation for this behavior.
Thank you,
-Khaled
-- Forwarded message --
From: Khaled Ammar
Date: Fri, Jul 24, 2015 at 9:35 AM
Subject: Performance questions regarding Spark 1.3 standalone mode
To: user@spark.apache.org
Hi all,
I have a
Hi all,
I have a standalone Spark cluster set up on EC2 machines. I did the setup
manually, without the ec2 scripts. I have two questions about Spark/GraphX
performance:
1) When I run the PageRank example, the storage tab does not show that all
RDDs are cached. Only one RDD is 100% cached, but the
Hi,
I am not a Spark expert, but I found that passing a small number of
partitions might help. Try the option "--numEPart=$partitions" where
partitions=3 (the number of workers) or at most 3*40 (the total number of
cores).
Thanks,
-Khaled
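The `--numEPart` option above is accepted by the GraphX Analytics example driver bundled with Spark; a hedged sketch of the invocation (the examples jar path and edge-list file are placeholders):

```shell
# Hedged sketch: run PageRank via the GraphX Analytics driver with an
# explicit edge-partition count.
partitions=3   # e.g., one partition per worker
spark-submit \
  --class org.apache.spark.examples.graphx.Analytics \
  spark-examples.jar \
  pagerank /path/to/edges.txt --numEPart=$partitions
```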
On Thu, Jul 9, 2015 at 11:37 AM, AshutoshRaghuvanshi <
ashutos
hours. There is
one that was taking 4+ hours, and its input is 400+ GB. I must be doing
something wrong; any comments?
--
Thanks,
-Khaled Ammar
www.khaledammar.com
Hi,
I'm very new to Spark and GraphX. I downloaded and configured Spark on a
cluster, which uses Hadoop 1.x. The master UI shows all workers. The
example command "run-example SparkPi" works fine and completes
successfully.
I'm interested in GraphX. Although the documentation says it is built-in
w