[GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-07 Thread Kyle Ellrott
I'm trying to set up a simple iterative message/update problem in GraphX (Spark 1.2.0), but I'm running into issues with the caching and re-calculation of data. I'm trying to follow the example found in the Pregel implementation of materializing and caching messages and graphs and then unpersisting
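GraphX's aggregateMessages is exposed through the Scala API, but the caching discipline the question refers to (the one GraphX's own Pregel loop uses) can be sketched on plain RDDs in Java: cache the new iteration's data, force it with an action, then unpersist the previous iteration. A minimal sketch under those assumptions; initialRanks, numIters, and the update function are hypothetical placeholders.

    import org.apache.spark.api.java.JavaRDD;

    JavaRDD<Double> ranks = initialRanks.cache();   // hypothetical starting RDD
    ranks.count();                                  // materialize iteration 0
    for (int i = 0; i < numIters; i++) {
        JavaRDD<Double> next = ranks.map(r -> 0.15 + 0.85 * r).cache();  // hypothetical update
        next.count();               // force materialization before dropping the parent
        ranks.unpersist(false);     // non-blocking, as in the Pregel implementation
        ranks = next;
    }

Without the count() before unpersist, the new RDD is still lazy, and every later action would recompute the whole lineage from scratch.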

No space left at worker node

2015-02-07 Thread ey-chih chow
Hi, I submitted a Spark job to an EC2 cluster using spark-submit. At a worker node, there is a 'no space left on device' exception, as follows: 15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file /root/spark/work/app-2015020
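On EC2 the root volume is typically small, so scratch data commonly fills it. One common mitigation, sketched here under that assumption (the mount point is a hypothetical example, and the worker's own work directory is governed separately by SPARK_WORKER_DIR and the spark.worker.cleanup.* settings), is to point Spark's scratch space at a larger volume:

    import org.apache.spark.SparkConf;

    // Shuffle output and spilled data will be written under this path.
    SparkConf conf = new SparkConf()
        .set("spark.local.dir", "/mnt/spark-scratch");   // hypothetical larger mount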

When using SparkFiles.get("GeoIP.dat"), got exception in thread "main" java.io.FileNotFoundException

2015-02-07 Thread Gmail
Hi there. Spark version: 1.2. /home/hadoop/spark/bin/spark-submit --class com.litb.bi.CSLog2ES --master yarn --executor-memory 1G --jars /mnt/external/kafka/target/spark-streaming-kafka_2.10-1.2.0.jar,/mnt/external/kafka/target/zkclient-0.3.jar,/mnt/external/kafka/target/metrics-core-2.2.0.jar,
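A frequent cause of this FileNotFoundException is calling SparkFiles.get() on the driver for a file that was never registered there, rather than inside tasks after the file has been shipped. A minimal sketch of the intended flow, assuming a JavaSparkContext sc and an input RDD lines; the HDFS path and the geoLookup() helper are hypothetical:

    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaRDD;

    sc.addFile("hdfs:///data/GeoIP.dat");               // or spark-submit --files
    JavaRDD<String> countries = lines.map(ip -> {
        String localPath = SparkFiles.get("GeoIP.dat"); // node-local copy, valid inside tasks
        return geoLookup(localPath, ip);                // hypothetical lookup helper
    });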

Custom streaming receiver slow on YARN

2015-02-07 Thread Jong Wook Kim
Hello, I have an issue where my streaming receiver is laggy on YARN. Can anyone reply to my question on Stack Overflow? http://stackoverflow.com/questions/28370362/spark-streaming-receiver-particularly-slow-on-yarn Thanks, Jong Wook

Re: Profiling in YourKit

2015-02-07 Thread Deep Pradhan
So, can I increase the number of threads by manually coding in the Spark code? On Sat, Feb 7, 2015 at 6:52 PM, Sean Owen wrote: > If you look at the threads, the other 30 are almost surely not Spark > worker threads. They're the JVM finalizer, GC threads, Jetty > listeners, etc. Nothing wrong wi

Re: Spark impersonation

2015-02-07 Thread Chester Chen
Sorry for the many typos, as I was typing from my cell phone. Hope you can still get the idea. On Sat, Feb 7, 2015 at 1:55 PM, Chester @work wrote: > > I just implemented this in our application. The impersonation is done > before the job is submitted. In Spark on YARN (we are using yarn-cluster mo

ERROR EndpointWriter: AssociationError

2015-02-07 Thread Lan
Hello, I'm new to Spark, and tried to set up a Spark cluster with one master VM (SparkV1) and one worker VM (SparkV4); the error is the same if I have 2 workers. They are connected without a problem now, but when I submit a job (as in https://spark.apache.org/docs/latest/quick-start.html) at the master: >s

Re: Spark impersonation

2015-02-07 Thread Chester @work
I just implemented this in our application. The impersonation is done before the job is submitted. In Spark on YARN (we are using yarn-cluster mode), it just takes the current user from UserGroupInformation and submits it to the YARN resource manager. If one uses kinit from the command line, the whole JVM
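A minimal sketch of that Hadoop-level impersonation, done before submission as described; targetUser and submitToYarn() are hypothetical placeholders, and the real (e.g. kinit'd) user must be allowed to proxy via the cluster's hadoop.proxyuser.* settings:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;

    void submitAs(String targetUser) throws Exception {
        UserGroupInformation real = UserGroupInformation.getCurrentUser();
        UserGroupInformation proxy =
            UserGroupInformation.createProxyUser(targetUser, real);
        proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
            submitToYarn();   // hypothetical: hand the application to the ResourceManager
            return null;
        });
    }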

Re: Can't access remote Hive table from spark

2015-02-07 Thread Zhan Zhang
Yes. You need to create xiaobogu under /user and grant the right permissions to xiaobogu. Thanks. Zhan Zhang On Feb 7, 2015, at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote: Hi Zhan Zhang, With the pre-built version 1.2.0 of Spark against the YARN cluster installed by Ambari 1.7.0,
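Programmatically, that amounts to the following sketch, run as the HDFS superuser (the equivalent of an hdfs dfs -mkdir plus -chown, and it assumes the Configuration picks up the cluster's fs.defaultFS):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    void createUserHome() throws java.io.IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path home = new Path("/user/xiaobogu");
        fs.mkdirs(home, new FsPermission((short) 0755));   // rwxr-xr-x
        fs.setOwner(home, "xiaobogu", "xiaobogu");         // user, group
    }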

Re: Spark impersonation

2015-02-07 Thread Sandy Ryza
https://issues.apache.org/jira/browse/SPARK-5493 currently tracks this. -Sandy On Mon, Feb 2, 2015 at 9:37 PM, Zhan Zhang wrote: > I think you can configure Hadoop/Hive to do impersonation. There is no > difference between a secure and an insecure Hadoop cluster when using kinit. > > Thanks. > > Zh

Re: Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi Ted, I’ve seen the code; I am using JavaKafkaWordCount.java, but I would like to reproduce in Java what I’ve done in Scala. Is it possible to do in Java the same thing the Scala code does? Principally the code below, or something like it: > val KafkaDStreams = (1 to numStreams) map {_ =

Re: Similar code in Java

2015-02-07 Thread Ted Yu
Can you take a look at: ./examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java ./external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java Cheers On Sat, Feb 7, 2015 at 9:45 AM, Eduardo Costa Alfaia wrote: > Hi Guys, > > Ho

Re: Getting error when submitting Spark with master as YARN

2015-02-07 Thread Sandy Ryza
Hi Sachin, In your YARN configuration, either yarn.nodemanager.resource.memory-mb is 1024 on your nodes or yarn.scheduler.maximum-allocation-mb is set to 1024. If you have more than 1024 MB on each node, you should bump these properties. Otherwise, you should request fewer resources by setting --
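For example, the "(1024+384 MB)" in the original error is the requested executor heap plus the default 384 MB YARN overhead; with the maximum allocation at 1024 MB, a 512 MB heap (896 MB total) fits under the limit. A sketch of the equivalent programmatic setting, the same as passing --executor-memory 512m to spark-submit:

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .setAppName("SparkTest")
        .set("spark.executor.memory", "512m");   // 512 + 384 MB overhead <= 1024 MB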

Getting error when submitting Spark with master as YARN

2015-02-07 Thread sachin Singh
Hi, when I try to execute my program with spark-submit --master yarn --class com.mytestpack.analysis.SparkTest sparktest-1.jar, I get the error below: java.lang.IllegalArgumentException: Required executor memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster!

Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi guys, how could I write in Java the Scala code below? val KafkaDStreams = (1 to numStreams) map { _ => KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2) } val unifiedStream =
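A minimal Java sketch of the same pipeline, assuming jssc, kafkaParams, topicMap, and numStreams are defined as in the Scala version; the Class arguments stand in for Scala's type parameters, and JavaStreamingContext.union replaces the Scala union of the mapped streams:

    import java.util.ArrayList;
    import java.util.List;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    List<JavaDStream<String>> streams = new ArrayList<>();
    for (int i = 0; i < numStreams; i++) {
        streams.add(
            KafkaUtils.createStream(jssc, String.class, String.class,
                    StringDecoder.class, StringDecoder.class,
                    kafkaParams, topicMap, StorageLevel.MEMORY_ONLY())
                .map(pair -> pair._2()));   // keep only the message value
    }
    JavaDStream<String> unifiedStream =
        jssc.union(streams.get(0), streams.subList(1, streams.size()));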

Re: Can't access remote Hive table from spark

2015-02-07 Thread Ted Yu
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x Looks like a permission issue. Can you give access to 'xiaobogu'? Cheers On Sat, Feb 7, 2015 at 8:15 AM, guxiaob

Re: Can't access remote Hive table from spark

2015-02-07 Thread guxiaobo1982
Hi Zhan Zhang, With the pre-built version 1.2.0 of Spark against the YARN cluster installed by Ambari 1.7.0, I get the following errors: [xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m

Re: Is the pre-built version of Spark 1.2.0 built with the --hive option?

2015-02-07 Thread Sean Owen
https://github.com/apache/spark/blob/master/dev/create-release/create-release.sh#L217 Yes, except the 'without hive' version. On Sat, Feb 7, 2015 at 3:45 PM, guxiaobo1982 wrote: > Hi, > > After various problems with the binaries I built myself, I want to try the > pre-built binary, but I want t

Is the pre-built version of Spark 1.2.0 built with the --hive option?

2015-02-07 Thread guxiaobo1982
Hi, after various problems with the binaries I built myself, I want to try the pre-built binary, but I want to know whether it is built with the --hive option. Thanks.

Re: Profiling in YourKit

2015-02-07 Thread Sean Owen
If you look at the threads, the other 30 are almost surely not Spark worker threads. They're the JVM finalizer, GC threads, Jetty listeners, etc. Nothing wrong with this. Your OS has hundreds of threads running now, most of which are idle, and up to 4 of which can be executing. In a one-machine cl

Re: Profiling in YourKit

2015-02-07 Thread Enno Shioji
> 1. You have 4 CPU cores and 34 threads (system-wide, you likely have many more, by the way). Think of it as having 4 espresso machines and 34 baristas. Does the fact that you have only 4 espresso machines mean you can only have 4 baristas? Of course not; there's plenty more work other than making esp

Profiling in YourKit

2015-02-07 Thread Deep Pradhan
Hi, I am using the YourKit tool to profile Spark jobs run on my single-node Spark cluster. When I look at the YourKit UI performance charts, the thread count always remains at: All threads: 34, Daemon threads: 32. Here are my questions: 1. My system can run only 4 threads simultaneously, and obvious