My code is as follows:
System.out.println("Initialize points...");
JavaPairRDD<IntWritable, DoubleArrayWritable> data =
    sc.sequenceFile(inputFile, IntWritable.class,
        DoubleArrayWritable.class);
RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd =
    JavaPairR
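For context, here is a minimal, self-contained sketch of what such a read typically looks like in the Java API; the DoubleArrayWritable definition, class name, and path handling below are assumptions for illustration, not the poster's actual code.

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import scala.Tuple2;

public class SequenceFileReadSketch {

    // Assumed definition: a Writable holding an array of doubles (one point).
    public static class DoubleArrayWritable extends ArrayWritable {
        public DoubleArrayWritable() {
            super(DoubleWritable.class);
        }
    }

    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("SequenceFileReadSketch"));
        String inputFile = args[0];  // e.g. a tachyon://, hdfs:// or file:// path

        System.out.println("Initialize points...");
        JavaPairRDD<IntWritable, DoubleArrayWritable> data =
                sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);

        // JavaPairRDD.toRDD exposes the underlying Scala RDD of tuples.
        RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd = JavaPairRDD.toRDD(data);

        System.out.println("Loaded " + data.count() + " records in "
                + data.partitions().size() + " partitions");
        sc.stop();
    }
}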
Divya,
In my recent Spark tuning experience, the optimal executor-memory size
depends not only on your workload characteristics (e.g. the working set
size at each job stage) and input data size, but also on your total
available memory and the memory requirements of other components such as
the driver.
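To make that trade-off concrete, here is a sketch of how those sizes are usually set at submit time on a standalone cluster; the class name, jar path, and memory sizes are placeholders, not a recommendation for the setup above. Whatever is given to the executors has to leave room on the same machine for the driver (if co-located), the OS, and any off-heap store such as Tachyon.

# Illustrative spark-submit invocation; values are placeholders.
spark-submit \
  --class com.example.KMeansApp \
  --master spark://<master-host>:7077 \
  --executor-memory 15g \
  --driver-memory 4g \
  /path/to/app-with-dependencies.jar <input-path> <k> <iterations>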
Hi, Calvin, I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS
instance with 30GB of physical memory.
Spark caches data off-heap to Tachyon, and the input data is also stored in
Tachyon.
Tachyon is configured to use 15GB of memory and to use tiered storage.
The Tachyon underFS is /tmp.
The only configura
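For reference, here is a sketch of the tachyon-site.properties entries corresponding to the setup described above; the two keys shown existed in Tachyon 0.8, but treat the values and any tiered-store keys as illustrative and verify them against the docs for your version.

tachyon.worker.memory.size=15GB
tachyon.underfs.address=/tmp
# Additional SSD/HDD tiers are declared with the tachyon.worker.tieredstore.* keys.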
)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou wrote:
> BTW, the Tachyon worker log says the following:
>
>
>
> 2015-12-
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 15 more
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote:
> Dears, I keep getting the exception below when using Spark 1.6.0 on top of
> Tachyon
Hi, dears, the problem has been solved.
I mistakenly used tachyon.user.block.size.bytes instead of
tachyon.user.block.size.bytes.default. It works now. Sorry for the
confusion and thanks again to Gene!
Best Regards,
Jia
On Wed, Jan 27, 2016 at 4:59 AM, Jia Zou wrote:
> Hi, Gene,
>
> T
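For readers hitting the same issue, here is a sketch of one common way to hand that property to the Spark driver and executors as a JVM system property; the 128MB value is purely illustrative, and the property can equally be placed in tachyon-site.properties on the client classpath.

# in conf/spark-defaults.conf (illustrative value)
spark.driver.extraJavaOptions    -Dtachyon.user.block.size.bytes.default=128MB
spark.executor.extraJavaOptions  -Dtachyon.user.block.size.bytes.default=128MB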
-10-73-198-35:7077
/home/ubuntu/HiBench/src/sparkbench/target/sparkbench-5.0-SNAPSHOT-MR2-spark1.5-jar-with-dependencies.jar
tachyon://localhost:19998/Kmeans/Input/samples 10 5
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote:
> Dears, I keep getting the exception below when using Spark 1.6.0 on top
Dears, I keep getting the exception below when using Spark 1.6.0 on top of
Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.
Any suggestions will be appreciated, thanks!
=
Exception in thread "main" org.apache.spark.SparkException: Job aborted du
ng it to
> tachyon-site.properties.
>
> I hope that helps,
> Gene
>
> On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou wrote:
>
>> Dear all,
>>
>> First, an update: the local file system data partition size can be
>> tuned by:
>> sc.hadoopConfigurat
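This presumably refers to sc.hadoopConfiguration(); below is a sketch of the usual approach, assuming the input is read through a Hadoop input format from file:// paths. The 128MB value and the path are illustrative. As the same poster notes just after this, this knob only affects local file system reads, not data served from Tachyon.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalBlockSizeSketch {
    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("LocalBlockSizeSketch"));

        // Raise the local-filesystem block size before creating the input RDD;
        // Hadoop's default for fs.local.block.size is 32MB, which is why local
        // reads come in as 32MB partitions.
        sc.hadoopConfiguration().setLong("fs.local.block.size", 128L * 1024 * 1024);

        JavaRDD<String> lines = sc.textFile("file:///path/to/input");
        System.out.println("Partitions: " + lines.partitions().size());
        sc.stop();
    }
}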
thod can't work for
Tachyon data.
Do you have any suggestions? Thanks very much!
Best Regards,
Jia
---------- Forwarded message ----------
From: Jia Zou
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark"
Dear all!
When using Spark to re
I configured HDFS to cache the file in HDFS's centralized cache, as follows:
hdfs cacheadmin -addPool hibench
hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench
But I didn't see much performance impact, no matter how I configured
dfs.datanode.max.locked.memory.
Is it possible that Spa
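For reference, here is a sketch of the hdfs-site.xml entry being tuned above; the value is illustrative, must stay within the DataNode's memlock limit (ulimit -l), and the DataNode has to be restarted for the change to take effect.

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>4294967296</value> <!-- bytes; 4 GB here, purely illustrative -->
</property>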
Dear all!
When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB to reduce the
number of tasks?
Thank you very much!
Best Regards,
Jia
Dear all,
Can I configure Spark on multiple nodes without HDFS, so that output data
will be written to the local file system on each node?
I guess there is no such feature in Spark, but I just want to confirm.
Best Regards,
Jia
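For what it's worth, writing to per-node local disks does work with plain file:// URIs on a standalone cluster, with the caveat that each worker ends up holding only the partitions its own executors wrote, so the pieces have to be collected afterwards. A small sketch with illustrative paths:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalOutputSketch {
    public static void main(String[] args) {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("LocalOutputSketch"));

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 2);

        // file:// keeps the output off HDFS; the directory must be writable
        // (and not already exist) on every worker node.
        numbers.saveAsTextFile("file:///tmp/spark-output");

        sc.stop();
    }
}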
t
> Spark.
>
> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou wrote:
>
>> Dear all,
>>
>> Is there a way to reuse executor JVM across different JobContexts? Thanks.
>>
>> Best Regards,
>> Jia
>>
>
>
Dear all,
Is there a way to reuse executor JVM across different JobContexts? Thanks.
Best Regards,
Jia
Dear all,
I am using Spark 1.5.2 and Tachyon 0.7.1 to run KMeans with
inputRDD.persist(StorageLevel.OFF_HEAP()).
I've set up tiered storage for Tachyon. Everything is fine when the working
set is smaller than the available memory. However, when the working set
exceeds the available memory, I keep getting errors like belo
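For readers reproducing this, here is a sketch of the setup being described; the Tachyon URL, base directory, and data are illustrative, and the spark.externalBlockStore.* keys shown are the Spark 1.5-era names for the off-heap (Tachyon) block store, which replaced the older spark.tachyonStore.* keys.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class OffHeapPersistSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("OffHeapPersistSketch")
                .set("spark.externalBlockStore.url", "tachyon://localhost:19998")
                .set("spark.externalBlockStore.baseDir", "/tmp_spark_tachyon");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<double[]> points = sc.parallelize(Arrays.asList(
                new double[]{1.0, 2.0}, new double[]{3.0, 4.0}));

        // Blocks evicted from Tachyon are recomputed from lineage when accessed again.
        points.persist(StorageLevel.OFF_HEAP());
        System.out.println("Cached " + points.count() + " points off-heap");

        sc.stop();
    }
}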
hat don't fit on disk and read them from there when
> they are needed.
> Actually, it's not necessary to set such a large driver memory in your case,
> because KMeans uses little driver memory if your k is not very large.
>
> Cheers
> Yanbo
>
> 2015-12-30 22:20 GMT
I am running Spark MLlib KMeans on one EC2 m3.2xlarge instance with 8 CPU
cores and 30GB of memory. Executor memory is set to 15GB, and driver memory is
set to 15GB.
The observation is that, when input data size is smaller than 15GB, the
performance is quite stable. However, when input data becomes l
Hi, Ted, it works, thanks a lot for your help!
--Jia
On Sat, Dec 12, 2015 at 3:01 PM, Ted Yu wrote:
> Have you tried adding the option below through
> spark.executor.extraJavaOptions?
>
> Cheers
>
> > On Dec 13, 2015, at 3:36 AM, Jia Zou wrote:
> >
> > M
My goal is to use hprof to profile where the bottleneck is.
Is there any way to do this without modifying and rebuilding the Spark
source code?
I've tried to add "
-Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/home/ubuntu/out.hprof"
to the spark-class script, but it can only profile
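Ted's suggestion above amounts to moving that same agent string out of spark-class and into the executor JVM options, for example in conf/spark-defaults.conf or via --conf on spark-submit; the output path below is illustrative and must be writable on every worker node.

# in conf/spark-defaults.conf (single line; output path is illustrative)
spark.executor.extraJavaOptions  -Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/tmp/executor-out.hprof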