Re: How to use spark-submit

2014-05-11 Thread Sonal Goyal
Hi Stephen, I am using the Maven shade plugin for creating my uber jar. I have marked the Spark dependencies as provided. Best Regards, Sonal Nube Technologies On Mon, May 12, 2014 at 1:04 AM, Stephen Boesch wrote: > HI Sonal, > Y
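
As a hedged sketch of the setup described above: mark the Spark dependency as provided and bind the shade plugin to the package phase. The artifact, plugin version, and Spark version below are assumptions for illustration, not taken from the thread.

    <!-- Spark marked as provided so it is not bundled into the uber jar -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>0.9.1</version>
      <scope>provided</scope>
    </dependency>

    <!-- maven-shade-plugin builds the uber jar when you run mvn package -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.2</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>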

Driver process exits successfully but web UI shows FAILED

2014-05-11 Thread Cheney Sun
Hi, I'm running Spark 0.9.1 in standalone mode. I submitted one job and the driver ran successfully to the end; see the log message below: 2014-05-12 10:34:14,358 - [INFO] (Logging.scala:50) - Finished TID 254 in 19 ms on spark-host007 (progress: 62/63) 2014-05-12 10:34:14,359 - [INFO] (Logging

Re: about spark interactive shell

2014-05-11 Thread fengshen
can we do it? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/about-spark-interactive-shell-tp5575p5576.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

about spark interactive shell

2014-05-11 Thread fengshen
Hi all, I am now using Spark in production, but I notice the Spark driver holds the RDD and DAG state... and the executors will try to register with the driver. I think the driver should run on the cluster and the client should run on the gateway, similar to:

Re: Is there any problem on the spark mailing list?

2014-05-11 Thread ankurdave
I haven't been getting mail either. This was the last message I received: http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5491.html -- View this message in context: http://apache-spark-user-list.10015

build Shark (Hadoop CDH5) on Hadoop 2.0.0 CDH4

2014-05-11 Thread Sophia
I have built Shark with sbt, but this sbt exception turns up: [error] sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-client;2.0.0: not found. What can I do to build it correctly? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/build-shark-ha

streaming on hdfs can detect all new files, but the sum of all the rdd.count() calls does not equal the number detected

2014-05-11 Thread zzzzzqf12345
When I put 200 PNG files into HDFS, I found Spark Streaming could detect the 200 files, but the sum of rdd.count() is less than 200, always between 130 and 170; I don't know why... Is this a bug? PS: When I put the 200 files into HDFS before the streaming job runs, it gets the correct count and the right result. Here is
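
For reference, a minimal sketch of the kind of check being described (count each batch and keep a running total to compare against a plain batch count) might look like this. The input path, batch interval, and app name are invented, and textFileStream assumes text input rather than PNGs.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingCountCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingCountCheck")
        val ssc = new StreamingContext(conf, Seconds(10))   // 10s batches (assumption)

        var runningTotal = 0L
        val files = ssc.textFileStream("hdfs:///user/test/input")  // hypothetical directory
        files.foreachRDD { rdd =>
          val n = rdd.count()          // action, runs on the cluster
          runningTotal += n            // the foreachRDD body itself runs on the driver
          println(s"batch count = $n, running total = $runningTotal")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }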

Re: File present but file not found exception

2014-05-11 Thread Koert Kuipers
Are you running Spark on a cluster? If so, the executors will not be able to find a file on your local computer. On Thu, May 8, 2014 at 2:48 PM, Sai Prasanna wrote: > Hi Everyone, > > I think all are pretty busy, the response time in this group has slightly > increased. > > But anyways, this is
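
One common way around that, sketched here with made-up paths, is to ship the local file to the executors with SparkContext.addFile and then resolve the shipped copy with SparkFiles.get; this is a general illustration rather than a fix specific to the original job.

    import org.apache.spark.SparkFiles

    // Assumes an existing SparkContext named sc (e.g. the spark-shell one).
    // Copy a driver-local file out to every node that runs tasks for this job.
    sc.addFile("/home/me/lookup.txt")          // hypothetical path on the driver machine

    val result = sc.parallelize(1 to 10).mapPartitions { iter =>
      // Resolve the executor-local copy of the shipped file.
      val path = SparkFiles.get("lookup.txt")
      val lines = scala.io.Source.fromFile(path).getLines().toList
      iter.map(i => (i, lines.size))
    }
    println(result.count())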

Re: Test

2014-05-11 Thread Aaron Davidson
I didn't get the original message, only the reply. Ruh-roh. On Sun, May 11, 2014 at 8:09 AM, Azuryy wrote: > Got. > > But it doesn't indicate all can receive this test. > > The mailing list has been unstable recently. > > > Sent from my iPhone5s > > On May 10, 2014, at 13:31, Matei Zaharia wrote: > > *This m

Re: How to use spark-submit

2014-05-11 Thread Stephen Boesch
Hi Sonal, yes, I am working towards that same idea. How did you go about creating the non-Spark jar dependencies? The way I am doing it is a separate straw-man project that does not include Spark but has the external third-party jars included. Then running sbt compile:managedClasspath and rev

Re: is Mesos falling out of favor?

2014-05-11 Thread Tim St Clair
- Original Message - > From: "deric" > To: u...@spark.incubator.apache.org > Sent: Tuesday, May 6, 2014 11:42:58 AM > Subject: Re: is Mesos falling out of favor? Nope. > > I guess it's due to missing documentation and quite complicated setup. > Continuous integration would be nice!

Re: writing my own RDD

2014-05-11 Thread Koert Kuipers
will do On May 11, 2014 6:44 PM, "Aaron Davidson" wrote: > You got a good point there, those APIs should probably be marked as > @DeveloperAPI. Would you mind filing a JIRA for that ( > https://issues.apache.org/jira/browse/SPARK)? > > > On Sun, May 11, 2014 at 11:51 AM, Koert Kuipers wrote: > >

Re: Comprehensive Port Configuration reference?

2014-05-11 Thread Mark Baker
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger wrote: > In a nutshell, Spark opens up a couple of well-known ports. And, then the > workers and the shell open up dynamic ports for each job. These dynamic > ports make securing the Spark network difficult. Indeed. Judging by the frequency with

Re: writing my own RDD

2014-05-11 Thread Aaron Davidson
You got a good point there, those APIs should probably be marked as @DeveloperAPI. Would you mind filing a JIRA for that ( https://issues.apache.org/jira/browse/SPARK)? On Sun, May 11, 2014 at 11:51 AM, Koert Kuipers wrote: > resending... my email somehow never made it to the user list. > > > O

Re: How to use spark-submit

2014-05-11 Thread Stephen Boesch
Just discovered sbt-pack: that addresses (quite well) the last item for identifying and packaging the external jars. 2014-05-11 12:34 GMT-07:00 Stephen Boesch : > HI Sonal, > Yes I am working towards that same idea. How did you go about > creating the non-spark-jar dependencies ? The way I
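
For anyone following along, sbt-pack is enabled with a one-line plugin declaration in project/plugins.sbt; the version number below is a guess for that timeframe and should be checked against the sbt-pack releases. Running "sbt pack" then collects the project jar and its dependency jars under target/pack.

    // project/plugins.sbt (the version here is an assumption, not from the thread)
    addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.5.1")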

Re: java.lang.NoSuchMethodError on Java API

2014-05-11 Thread Madhu
No, you don't need to do anything special to get it to run in Eclipse. Just add the assembly jar to the build path, create a main method, add your code, and click the green "run" button. Can you post your code here? I can try it in my environment. - Madhu https://www.linkedin.com/in/msiddal
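
A minimal local-mode main of the kind described above might look like this; the object name and data are invented, and the assembly jar on the build path supplies the import.

    import org.apache.spark.SparkContext

    object EclipseSmokeTest {
      def main(args: Array[String]): Unit = {
        // local[2] runs Spark inside the IDE's JVM with two worker threads
        val sc = new SparkContext("local[2]", "eclipse-smoke-test")
        val sum = sc.parallelize(1 to 100).reduce(_ + _)
        println(s"sum = $sum")   // 5050
        sc.stop()
      }
    }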

Re: How to use spark-submit

2014-05-11 Thread Soumya Simanta
Will sbt-pack and the Maven solution work for the Scala REPL? I need the REPL because it saves a lot of time when I'm playing with large data sets: I load them once, cache them, and then try out things interactively before putting them in a standalone driver. I've got sbt working for my own drive

Re: Spark LIBLINEAR

2014-05-11 Thread Debasish Das
Hello Prof. Lin, Awesome news! I am curious whether you have any benchmarks comparing the C++ MPI and Scala Spark LIBLINEAR implementations... Is Spark LIBLINEAR Apache licensed, or are there any specific restrictions on using it? Except for using native BLAS libraries (which each user has to manage by pul

Re: can't get tests to pass anymore on master

2014-05-11 Thread Koert Kuipers
Resending because the list didn't seem to like my email before. On Wed, May 7, 2014 at 5:01 PM, Koert Kuipers wrote: > I used to be able to get all tests to pass. > > With Java 6 and sbt I get PermGen errors (no matter how high I make the > PermGen), so I have given up on that. > > With Java 7 I

Re: is Mesos falling out of favor?

2014-05-11 Thread Paco Nathan
That's FUD. Tracking the Mesos and Spark use cases, there are very large production deployments of these together. Some are rather private but others are being surfaced. IMHO, one of the most amazing case studies is from Christina Delimitrou http://youtu.be/YpmElyi94AA For a tutorial, use the foll

Re: java.lang.NoSuchMethodError on Java API

2014-05-11 Thread Alessandro De Carli
Madhu, thank you! I have now switched to Eclipse and imported the assembly jar, and the IDE successfully finds the imports. But when I try to run my code I get "java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/PairFunction". Is there anything special to consider when I want to run my devel

Re: is Mesos falling out of favor?

2014-05-11 Thread Gary Malouf
For what it is worth, our team here at MediaCrossing has been using the Spark/Mesos combination since last summer with much success (low operations overhead, high developer performance). IMO, Hadoop is overcomplicated from both a development and operations perspective so

Re: Is there any problem on the spark mailing list?

2014-05-11 Thread Chris Fregly
btw, you can see all "missing" messages from May 7th (the start of the outage) here: http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/browser The last message I received in my inbox was this one: Cheney Sun, Re: master attempted to re-register the worker and then took all workers as unregis

Re: writing my own RDD

2014-05-11 Thread Koert Kuipers
resending... my email somehow never made it to the user list. On Fri, May 9, 2014 at 2:11 PM, Koert Kuipers wrote: > In writing my own RDD I ran into a few issues with respect to stuff being > private in Spark. > > In compute I would like to return an iterator that respects task killing > (as H
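
For readers who have not subclassed RDD before, the two methods involved are getPartitions and compute. A stripped-down sketch, with invented names and without the task-killing support the thread is asking about, looks roughly like this:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // A partition that just remembers which range of integers it should produce.
    class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

    class SimpleRangeRDD(sc: SparkContext, numParts: Int, perPart: Int)
        extends RDD[Int](sc, Nil) {   // Nil: no parent RDDs, so no dependencies

      override protected def getPartitions: Array[Partition] =
        (0 until numParts).map { i =>
          new RangePartition(i, i * perPart, (i + 1) * perPart): Partition
        }.toArray

      // compute is called once per partition, on an executor.
      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        val p = split.asInstanceOf[RangePartition]
        (p.start until p.end).iterator
      }
    }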

Re: why is Spark 0.9.1 (context creation?) so slow on my OSX laptop?

2014-05-11 Thread Madhu
Svend, I built it on my iMac and it was about the same speed as Windows 7, a RHEL 6 VM on Windows 7, and Linux on EC2. Spark is pleasantly easy to build on all of these platforms, which is wonderful. How long does it take to start spark-shell? Maybe it's a JVM memory setting problem on your laptop?

Re: Turn BLAS on MacOSX

2014-05-11 Thread DB Tsai
Hi Debasish, the Dependencies section of https://github.com/apache/spark/blob/master/docs/mllib-guide.md talks about the native BLAS dependencies issue. For netlib, which Breeze uses internally, if the native library isn't found, the jblas implementation will be used. Here is more detail

Re: Is there any problem on the spark mailing list?

2014-05-11 Thread lukas nalezenec
There was an outage: https://blogs.apache.org/infra/entry/mail_outage On Fri, May 9, 2014 at 1:27 PM, wxhsdp wrote: > I think so, fewer questions and answers these three days > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-any-problem-o

Re: details about event log

2014-05-11 Thread Andrew Or
Hi wxhsdp, These times are computed from Java's System.currentTimeMillis(), which is "the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC." Thus, this quantity doesn't mean much by itself; it is only meaningful when you subtract it from another Sys
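
In other words, only the difference between two such timestamps is meaningful. A toy illustration with invented values:

    // Two hypothetical event-log timestamps, both epoch milliseconds.
    val launchTime = 1399861654358L
    val finishTime = 1399861654377L
    val durationMs = finishTime - launchTime   // 19 ms, the only meaningful quantity here
    println(s"task ran for $durationMs ms")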

Re: Test

2014-05-11 Thread Azuryy
Got. But it doesn't indicate that everyone can receive this test. The mailing list has been unstable recently. Sent from my iPhone5s > On May 10, 2014, at 13:31, Matei Zaharia wrote: > > This message has no content.

Re: java.lang.NoSuchMethodError on Java API

2014-05-11 Thread Madhu
Alessandro, I'm using Eclipse; IntelliJ settings will be similar. I created a standard project, without Maven. For me, the easiest way was to add this jar to my Eclipse project build path: /assembly/target/scala-2.10/spark-assembly-x.x.x-hadoop1.0.4.jar It works for either the Java or the Scala plugin.

Re: Fwd: Is there a way to load a large file from HDFS faster into Spark

2014-05-11 Thread Soumya Simanta
Yep. I figured that out. I uncompressed the file and it looks much faster now. Thanks. On Sun, May 11, 2014 at 8:14 AM, Mayur Rustagi wrote: > .gz files are not splittable hence harder to process. Easiest is to move > to a splittable compression like lzo and break file into multiple blocks to >

Re: Fwd: Is there a way to load a large file from HDFS faster into Spark

2014-05-11 Thread Mayur Rustagi
.gz files are not splittable, hence harder to process. Easiest is to move to a splittable compression like LZO and break the file into multiple blocks to be read and for subsequent processing. On 11 May 2014 09:01, "Soumya Simanta" wrote: > > > I've a Spark cluster with 3 worker nodes. > > >- *Wor
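
As a rough illustration of the point (the path and partition count are invented, and an existing SparkContext named sc is assumed): a gzipped text file comes in as a single partition, so repartitioning after the read is one way to spread the follow-on work across the cluster, at the cost of a shuffle.

    // A .gz file cannot be split, so textFile yields one partition for the whole file.
    val raw = sc.textFile("hdfs:///data/big-file.gz")

    // Redistribute before any heavy processing; 48 is an arbitrary choice.
    val spread = raw.repartition(48)
    println(spread.count())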

Re: Error starting EC2 cluster

2014-05-11 Thread wxhsdp
Your ssh connection refusal is due to not waiting long enough; the remote machine is not ready at that point. I set the wait time to 500 seconds and it works. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-starting-EC2-cluster-tp5332p5501.html Sent from the
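
If memory serves, the spark-ec2 script of that era exposed this as a wait option; the exact flag should be verified with ./spark-ec2 --help on your version. A hypothetical invocation:

    # Flag name and key pair are assumptions; check ./spark-ec2 --help
    ./spark-ec2 -k my-keypair -i my-keypair.pem --wait=500 launch my-spark-cluster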

Re: How can adding a random count() change the behavior of my program?

2014-05-11 Thread Walrus theCat
Nick, I have encountered strange things like this before (usually when programming with mutable structures and side effects), and for me, the answer was that, until .count (or .first, or similar) is called, your variable 'a' refers to a set of instructions that only get executed to form the objec
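
A tiny illustration of that point, with made-up data and an existing SparkContext named sc assumed: the map below is only a description of work, and nothing runs until an action such as count() or first() is called.

    val a = sc.parallelize(1 to 5).map { x =>
      println(s"processing $x")   // printed by the workers only once an action runs
      x * 2
    }
    // Nothing has executed yet; 'a' is just a recipe (a lineage of transformations).
    a.count()   // the action forces the map to actually run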

Re: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)

2014-05-11 Thread Francis . Hu
I have just resolved the problem by running the master and worker daemons individually on the machines where they live. If I execute the script sbin/start-all.sh, the problem always exists. From: Francis.Hu [mailto:francis...@reachjunction.com] Sent: Tuesday, May 06, 2014 10:31 To: user@spark.apache.org

Re: Is there anything that I need to modify?

2014-05-11 Thread Arpit Tak
Try setting the hostname-to-IP mapping in /etc/hosts; it's not able to resolve the IP to a hostname. Try this ... localhost 192.168.10.220 CHBM220 On Wed, May 7, 2014 at 12:50 PM, Sophia wrote: > [root@CHBM220 spark-0.9.1]# > > SPARK_JAR=.assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2
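
For reference, an /etc/hosts entry lists the IP address first and then the hostname; using the values from the thread, the intended mapping is presumably along these lines:

    127.0.0.1        localhost
    192.168.10.220   CHBM220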

Re: Spark LIBLINEAR

2014-05-11 Thread DB Tsai
Dear Prof. Lin, Interesting! We have an implementation of L-BFGS in Spark, and it has already been merged upstream. We read your paper comparing TRON and OWL-QN for logistic regression with L1 (http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf), but it seems that it's not in the distributed setup. Wi

Spark LIBLINEAR

2014-05-11 Thread Chieh-Yen
Dear all, Recently we released a distributed extension of LIBLINEAR at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/distributed-liblinear/ Currently, TRON for logistic regression and L2-loss SVM is supported. We provided both MPI and Spark implementations. This is very preliminary so your comme

java.lang.NoSuchMethodError on Java API

2014-05-11 Thread Alessandro De Carli
Dear All, I'm new to the whole Spark framework, but already fell in love with it :). For a research project at the University of Zurich I'm trying to implement a Matrix Centroid Decomposition in Spark. I'm using the Java API. My problem occurs when I try to call a JavaPairRDD.reduce: """ java.lan

Re: How to read a multipart s3 file?

2014-05-11 Thread Nicholas Chammas
On Tue, May 6, 2014 at 10:07 PM, kamatsuoka wrote: > I was using s3n:// but I got frustrated by how > slow it is at writing files. > I'm curious: How slow is slow? How long does it take you, for example, to save a 1GB file to S3 using s3n vs s3?