Hi Octavian,
Just out of curiosity, did you try persisting your RDD in a serialized format,
"MEMORY_AND_DISK_SER" or "MEMORY_ONLY_SER"?
i.e. changing your:
"rdd.persist(MEMORY_AND_DISK)"
to
"rdd.persist(MEMORY_ONLY_SER)"
Regards
On Wed, Jun 10, 2015 at 7:27 AM, Imran Rashid wrote:
> I agree
Dear all,
after some fiddling I have arrived at this solution:
/**
* Customized left outer join on common column.
*/
def leftOuterJoinWithRemovalOfEqualColumn(leftDF: DataFrame, rightDF:
DataFrame, commonColumnName: String): DataFrame = {
val joinedDF = leftDF.as('left).join(rightDF.as('right
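For reference, a complete minimal sketch of the same idea (the helper name,
aliases and column handling below are illustrative, not necessarily the poster's
final code):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Left outer join on a shared column, keeping only one copy of that column.
def leftOuterJoinWithoutDuplicateColumn(leftDF: DataFrame,
                                        rightDF: DataFrame,
                                        commonColumnName: String): DataFrame = {
  val joined = leftDF.as("l").join(
    rightDF.as("r"),
    col("l." + commonColumnName) === col("r." + commonColumnName),
    "left_outer")
  // Keep every column from the left side plus all right-side columns
  // except the duplicated join column.
  val keptColumns =
    leftDF.columns.map(c => col("l." + c)) ++
      rightDF.columns.filter(_ != commonColumnName).map(c => col("r." + c))
  joined.select(keptColumns: _*)
}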
Hi, all
I upgraded Spark to 1.4.1 and many applications failed. I find that the heap memory
is not full, but the CoarseGrainedExecutorBackend process takes more memory than
I expect, and it keeps increasing over time; once it exceeds the server's limit,
the worker dies.
An
Hi, I am trying to configure a history server for my application.
When I run locally (./run-example SparkPi), the event logs are being
created, and I can start the history server.
But when I am trying
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster
file:///opt/hadoop/s
Sent from my iPad
On 2014-9-24, at 8:13 AM, Steve Lewis wrote:
> When I experimented with using an InputFormat I had used in Hadoop for a
> long time, I found
> 1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated
> class, not org.apache.hadoop.mapreduce.lib.inp
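For reference, a minimal sketch of the two Spark entry points for the old and
new Hadoop input-format APIs (TextInputFormat stands in for a custom format;
the method name readBothWays is just an illustration):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def readBothWays(sc: SparkContext, path: String):
    (RDD[(LongWritable, Text)], RDD[(LongWritable, Text)]) = {
  // Old "mapred" API: for classes extending org.apache.hadoop.mapred.FileInputFormat.
  val oldApi = sc.hadoopFile[LongWritable, Text,
    org.apache.hadoop.mapred.TextInputFormat](path)

  // New "mapreduce" API: for classes extending
  // org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
  val newApi = sc.newAPIHadoopFile[LongWritable, Text,
    org.apache.hadoop.mapreduce.lib.input.TextInputFormat](path)

  (oldApi, newApi)
}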
Hi Akmal,
It might be on HDFS, since the path you provided to `spark.eventLog.dir`
(/opt/spark/spark-events) has no filesystem scheme and is resolved against the
default filesystem.
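A minimal sketch of making the location explicit with a scheme (both values
below are placeholders, not taken from your setup):

import org.apache.spark.SparkConf

// Without a scheme the directory is resolved against the default Hadoop
// filesystem (typically HDFS on a YARN cluster); a scheme removes the ambiguity.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///opt/spark/spark-events") // or "hdfs:///spark-events"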
-Andrew
2015-08-01 9:25 GMT-07:00 Akmal Abbasov :
> Hi, I am trying to configure a history server for my application.
> When I run locally (./run-example SparkPi), the event logs a
On Sat, Aug 1, 2015 at 9:25 AM, Akmal Abbasov
wrote:
> When I run locally (./run-example SparkPi), the event logs are being
> created, and I can start the history server.
> But when I am trying
> ./spark-submit --class org.apache.spark.examples.SparkPi --master
> yarn-cluster file:///opt/hadoop/sp
You should also take into account the amount of memory that you plan to use.
It's advised not to give too much memory to each executor; otherwise GC
overhead will go up.
Btw, why prime numbers?
--
Ruslan Dautkhanov
On Wed, Jul 29, 2015 at 3:31 AM, ponkin wrote:
> Hi Rahul,
>
> Where did you
Hello,
I am not an expert with Spark, but the error thrown by Spark seems to indicate
that there is not enough memory for launching the job. By default, Spark allocates
1 GB of memory; maybe you should increase it?
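For example, a minimal sketch of raising the executor memory via SparkConf
("4g" and the app name are placeholder values, not your actual settings):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-job") // placeholder
  .set("spark.executor.memory", "4g") // raise from the default
// Driver memory must be set before the driver JVM starts, so it is usually
// passed on the command line instead, e.g. spark-submit --driver-memory 4g.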
Best regards
Fabrice
On Sat, Aug 1, 2015 at 10:51 PM, Connor Zanin wrote:
> Hello,
>
> I am h
1. I believe that the default memory (per executor) is 512m (from the
documentation)
2. I have increased the memory used by Spark on the workers in my launch script
when submitting the job
(--executor-memory 124g)
3. The job completes successfully; it is the "road bumps" in the middle that I
am conce
Hi All!
How important would a significant performance improvement to TCP/IP itself be,
in terms of overall job performance? Which part would be most significantly
accelerated?
Would it be HDFS?
-- ttfn
Simon Edelhaus
California 2015
https://spark-summit.org/2015/events/making-sense-of-spark-performance/
On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus wrote:
> Hi All!
>
> How important would a significant performance improvement to TCP/IP itself
> be, in terms of
> overall job performance? Which part would be most
H
2% huh.
-- ttfn
Simon Edelhaus
California 2015
On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra
wrote:
> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus wrote:
>
>> Hi All!
>>
>> How important would be a signifi
If your network is bandwidth-bound, you may see that enabling jumbo frames
(MTU 9000) increases bandwidth by up to ~20%.
http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
"Enabling Jumbo Frames across the cluster improves bandwidth"
If the Spark workload is not network
Yes, I forgot to mention:
I chose a prime number as the modulo for the hash function because my keys are
usually strings and Spark calculates the particular partition using the key hash
(see HashPartitioner.scala). So, to avoid a big number of collisions (when many
keys end up in a few partitions) it is common to use
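A minimal sketch of that idea, using the built-in HashPartitioner with a prime
partition count (1009 is an arbitrary example, not the poster's actual value):

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

def partitionByPrime(pairs: RDD[(String, Int)]): RDD[(String, Int)] = {
  // HashPartitioner assigns a key to nonNegativeMod(key.hashCode, numPartitions).
  // A prime partition count helps spread string hash codes more evenly,
  // reducing the chance that many keys collide into a few partitions.
  pairs.partitionBy(new HashPartitioner(1009))
}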