My code is as follows:
System.out.println("Initialize points...");
JavaPairRDD<IntWritable, DoubleArrayWritable> data =
    sc.sequenceFile(inputFile, IntWritable.class,
        DoubleArrayWritable.class);
RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd =
    JavaPairR
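For context, here is a minimal, self-contained sketch of what such a read typically looks like in the Java API; the DoubleArrayWritable definition, class name, and path handling below are assumptions for illustration, not the poster's actual code.

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import scala.Tuple2;

public class SequenceFileReadSketch {

    // Assumed definition: a Writable holding an array of doubles (one point).
    public static class DoubleArrayWritable extends ArrayWritable {
        public DoubleArrayWritable() {
            super(DoubleWritable.class);
        }
    }

    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("SequenceFileReadSketch"));
        String inputFile = args[0];  // e.g. a tachyon://, hdfs:// or file:// path

        System.out.println("Initialize points...");
        JavaPairRDD<IntWritable, DoubleArrayWritable> data =
                sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);

        // JavaPairRDD.toRDD exposes the underlying Scala RDD of tuples.
        RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd = JavaPairRDD.toRDD(data);

        System.out.println("Loaded " + data.count() + " records in "
                + data.partitions().size() + " partitions");
        sc.stop();
    }
}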
Divya,
In my recent Spark tuning experience, the optimal executor-memory size
depends not only on your workload characteristics (e.g. the working set
size at each job stage) and input data size, but also on your total
available memory and the memory requirements of other components such as
the driver.
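To make that trade-off concrete, here is a sketch of how those sizes are usually set at submit time on a standalone cluster; the class name, jar path, and memory sizes are placeholders, not a recommendation for the setup above. Whatever is given to the executors has to leave room on the same machine for the driver (if co-located), the OS, and any off-heap store such as Tachyon.

# Illustrative spark-submit invocation; values are placeholders.
spark-submit \
  --class com.example.KMeansApp \
  --master spark://<master-host>:7077 \
  --executor-memory 15g \
  --driver-memory 4g \
  /path/to/app-with-dependencies.jar <input-path> <k> <iterations>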
Hi, Calvin, I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS
instance with 30GB of physical memory.
Spark caches data off-heap to Tachyon, and the input data is also stored in
Tachyon.
Tachyon is configured to use 15GB of memory and to use tiered storage.
The Tachyon underFS is /tmp.
The only configura
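For reference, here is a sketch of the tachyon-site.properties entries corresponding to the setup described above; the two keys shown existed in Tachyon 0.8, but treat the values and any tiered-store keys as illustrative and verify them against the docs for your version.

tachyon.worker.memory.size=15GB
tachyon.underfs.address=/tmp
# Additional SSD/HDD tiers are declared with the tachyon.worker.tieredstore.* keys.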
)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou wrote:
> BTW, the Tachyon worker log says the following:
>
>
>
> 2015-12-
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 15 more
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote:
> Dears, I keep getting the exception below when using Spark 1.6.0 on top of
> Tachyon
Hi, dears, the problem has been solved.
I mistakenly used tachyon.user.block.size.bytes instead of
tachyon.user.block.size.bytes.default. It works now. Sorry for the
confusion and thanks again to Gene!
Best Regards,
Jia
On Wed, Jan 27, 2016 at 4:59 AM, Jia Zou wrote:
> Hi, Gene,
>
> T
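For readers hitting the same issue, here is a sketch of one common way to hand that property to the Spark driver and executors as a JVM system property; the 128MB value is purely illustrative, and the property can equally be placed in tachyon-site.properties on the client classpath.

# in conf/spark-defaults.conf (illustrative value)
spark.driver.extraJavaOptions    -Dtachyon.user.block.size.bytes.default=128MB
spark.executor.extraJavaOptions  -Dtachyon.user.block.size.bytes.default=128MB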
-10-73-198-35:7077
/home/ubuntu/HiBench/src/sparkbench/target/sparkbench-5.0-SNAPSHOT-MR2-spark1.5-jar-with-dependencies.jar
tachyon://localhost:19998/Kmeans/Input/samples 10 5
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote:
> Dears, I keep getting the exception below when using Spark 1.6.0 on top
Dears, I keep getting the exception below when using Spark 1.6.0 on top of
Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.
Any suggestions will be appreciated, thanks!
=
Exception in thread "main" org.apache.spark.SparkException: Job aborted du
ng it to
> tachyon-site.properties.
>
> I hope that helps,
> Gene
>
> On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou wrote:
>
>> Dear all,
>>
>> First, an update: the local file system data partition size can be
>> tuned by:
>> sc.hadoopConfigurat
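This presumably refers to sc.hadoopConfiguration(); below is a sketch of the usual approach, assuming the input is read through a Hadoop input format from file:// paths. The 128MB value and the path are illustrative. As the same poster notes just after this, this knob only affects local file system reads, not data served from Tachyon.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalBlockSizeSketch {
    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("LocalBlockSizeSketch"));

        // Raise the local-filesystem block size before creating the input RDD;
        // Hadoop's default for fs.local.block.size is 32MB, which is why local
        // reads come in as 32MB partitions.
        sc.hadoopConfiguration().setLong("fs.local.block.size", 128L * 1024 * 1024);

        JavaRDD<String> lines = sc.textFile("file:///path/to/input");
        System.out.println("Partitions: " + lines.partitions().size());
        sc.stop();
    }
}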
thod can't work for
Tachyon data.
Do you have any suggestions? Thanks very much!
Best Regards,
Jia
---------- Forwarded message ----------
From: Jia Zou
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark"
Dear all!
When using Spark to re
I configured HDFS to cache the file in HDFS's centralized cache, as follows:
hdfs cacheadmin -addPool hibench
hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench
But I didn't see much performance impact, no matter how I configured
dfs.datanode.max.locked.memory.
Is it possible that Spa
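For reference, here is a sketch of the hdfs-site.xml entry being tuned above; the value is illustrative, must stay within the DataNode's memlock limit (ulimit -l), and the DataNode has to be restarted for the change to take effect.

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>4294967296</value> <!-- bytes; 4 GB here, purely illustrative -->
</property>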
Dear all!
When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB to reduce the
number of tasks?
Thank you very much!
Best Regards,
Jia
Dear all,
Can I configure Spark on multiple nodes without HDFS, so that output data
will be written to the local file system on each node?
I guess there is no such feature in Spark, but I just want to confirm.
Best Regards,
Jia
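For what it's worth, writing to per-node local disks does work with plain file:// URIs on a standalone cluster, with the caveat that each worker ends up holding only the partitions its own executors wrote, so the pieces have to be collected afterwards. A small sketch with illustrative paths:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalOutputSketch {
    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("LocalOutputSketch"));

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 2);

        // file:// keeps the output off HDFS; the directory must be writable
        // (and not already exist) on every worker node.
        numbers.saveAsTextFile("file:///tmp/spark-output");

        sc.stop();
    }
}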
t
> Spark.
>
> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou wrote:
>
>> Dear all,
>>
>> Is there a way to reuse executor JVM across different JobContexts? Thanks.
>>
>> Best Regards,
>> Jia
>>
>
>
Dear all,
Is there a way to reuse executor JVM across different JobContexts? Thanks.
Best Regards,
Jia
Dear all,
I am using Spark 1.5.2 and Tachyon 0.7.1 to run KMeans with
inputRDD.persist(StorageLevel.OFF_HEAP()).
I've set up tiered storage for Tachyon. Everything is fine when the working
set is smaller than the available memory. However, when the working set
exceeds the available memory, I keep getting errors like belo
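For readers reproducing this, here is a sketch of the setup being described; the Tachyon URL, base directory, and data are illustrative, and the spark.externalBlockStore.* keys shown are the Spark 1.5-era names for the off-heap (Tachyon) block store, which replaced the older spark.tachyonStore.* keys.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class OffHeapPersistSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("OffHeapPersistSketch")
                .set("spark.externalBlockStore.url", "tachyon://localhost:19998")
                .set("spark.externalBlockStore.baseDir", "/tmp_spark_tachyon");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<double[]> points = sc.parallelize(Arrays.asList(
                new double[]{1.0, 2.0}, new double[]{3.0, 4.0}));

        // Blocks evicted from Tachyon are recomputed from lineage when accessed again.
        points.persist(StorageLevel.OFF_HEAP());
        System.out.println("Cached " + points.count() + " points off-heap");

        sc.stop();
    }
}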
hat don't fit on disk and read them from there when
> they are needed.
> Actually, it's not necessary to set such a large driver memory in your case,
> because KMeans uses little driver memory if your k is not very large.
>
> Cheers
> Yanbo
>
> 2015-12-30 22:20 GMT
I am running Spark MLlib KMeans on one EC2 m3.2xlarge instance with 8 CPU
cores and 30GB of memory. Executor memory is set to 15GB, and driver memory is
set to 15GB.
The observation is that, when input data size is smaller than 15GB, the
performance is quite stable. However, when input data becomes l
Hi, Ted, it works, thanks a lot for your help!
--Jia
On Sat, Dec 12, 2015 at 3:01 PM, Ted Yu wrote:
> Have you tried adding the option below through
> spark.executor.extraJavaOptions?
>
> Cheers
>
> > On Dec 13, 2015, at 3:36 AM, Jia Zou wrote:
> >
> > M
My goal is to use hprof to profile where the bottleneck is.
Is there any way to do this without modifying and rebuilding the Spark
source code?
I've tried to add "
-Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/home/ubuntu/out.hprof"
to the spark-class script, but it can only profile
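Ted's suggestion above amounts to moving that same agent string out of spark-class and into the executor JVM options, for example in conf/spark-defaults.conf or via --conf on spark-submit; the output path below is illustrative and must be writable on every worker node.

# in conf/spark-defaults.conf (single line; output path is illustrative)
spark.executor.extraJavaOptions  -Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/tmp/executor-out.hprof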