Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2

2014-10-04 Thread Irina Fedulova
I've finally resolved my issue! It turned out not to be related to driver-master-worker connectivity settings. The problem was caused by an MLlib jar version mismatch: I was using the build.sbt from the AMPCamp example, which referenced mllib v0.9.0, but I was running it on Spark 0.9.2.

Re: Spark Streaming writing to HDFS

2014-10-04 Thread Sean Owen
Are you importing the '.mapred.' version of TextOutputFormat instead of the new API '.mapreduce.' version? On Sat, Oct 4, 2014 at 1:08 AM, Abraham Jacob wrote: > Hi All, > > > Would really appreciate if someone in the community can help me with this. I > have a simple Java spark streaming applica

Re: scala Vector vs mllib Vector

2014-10-04 Thread Dean Wampler
Briefly, MLlib's Vector and the concrete subclasses DenseVector and SparseVector wrap Java arrays, which are mutable and maximize memory efficiency. To update one of these vectors, you mutate the elements of the underlying array. That's great for performance, but dangerous in multithreaded programs
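
The trade-off described above can be illustrated with a minimal sketch in plain Java. The class ArrayVector below is a hypothetical stand-in written for this example only; it is not MLlib's Vector API. It wraps an array without copying, so every reference to the same vector sees an in-place update, while an explicit copy does not.

```java
import java.util.Arrays;

public class ArrayBackedVectorSketch {

    // Hypothetical array-backed vector for illustration; NOT the MLlib class.
    static final class ArrayVector {
        private final double[] values;

        // Wraps the given array directly (no defensive copy), which is the
        // memory-efficient but mutable design discussed in the thread.
        ArrayVector(double[] values) {
            this.values = values;
        }

        double get(int i) {
            return values[i];
        }

        // In-place update: cheap, but visible through every alias.
        void set(int i, double v) {
            values[i] = v;
        }

        // Independent copy with its own backing array.
        ArrayVector copy() {
            return new ArrayVector(Arrays.copyOf(values, values.length));
        }
    }

    public static void main(String[] args) {
        double[] backing = {1.0, 2.0, 3.0};
        ArrayVector w = new ArrayVector(backing);
        ArrayVector alias = w;           // second reference to the same vector
        ArrayVector snapshot = w.copy(); // separate backing array

        w.set(0, 10.0);

        System.out.println(alias.get(0));    // sees the update
        System.out.println(snapshot.get(0)); // keeps the old value
        System.out.println(backing[0]);      // the wrapped array changed too
    }
}
```

If two threads held the same ArrayVector and called set concurrently, they would race on the shared array; taking a copy first, or confining each vector to a single thread, avoids that.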

Re: spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-04 Thread Nick Pentreath
forgot to copy user list On Sat, Oct 4, 2014 at 3:12 PM, Nick Pentreath wrote: > what version did you put in the pom.xml? > > it does seem to be in Maven central: > http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6-hadoop2%7Cpom > > > org.apache.hbase > hbase

Re: scala Vector vs mllib Vector

2014-10-04 Thread ll
thanks dean. thanks for the answer with great clarity! i'm working on an algorithm that has a weight vector W(w0, w1, .., wN). the elements of this weight vector are adjusted/updated frequently - every iteration of the algorithm. how would you recommend to implement this vector? what is the

Re: scala Vector vs mllib Vector

2014-10-04 Thread Dean Wampler
Spark isolates each task, so I would use the MLlib vector. I didn't mention this, but it also integrates with Breeze, a Scala mathematics library that you might find useful.

Re: My task is finished successfully, however, I find some exceptions in webpage.

2014-10-04 Thread Tim Chou
Can anyone help me? I find that if I don't use an HDFS file as the input, these exceptions don't occur. I searched online and found nothing. How do I debug a Spark program? Thanks, Tim 2014-10-03 17:46 GMT-05:00 Tim Chou : > Hi All, > > Sorry to disturb you. > > I have built a spark cluster bas

Re: Spark Streaming writing to HDFS

2014-10-04 Thread Abraham Jacob
Hi Sean/All, I am importing among various other things the newer mapreduce version - import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; import org.apache.spark.Spa

Impala comparisons

2014-10-04 Thread Debasish Das
Hi, We write the output of models and other information as parquet files and later we let data APIs run SQL queries on the columnar data... SparkSQL is used to dump the data in parquet format and now we are considering whether using SparkSQL or Impala to read it back... I came across this benchm

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-04 Thread Rohit Rai
Hi Tian, We have published a build against Hadoop 2.0 with version *1.1.0-CTP-U2-H2* Let us know how your testing goes. Regards, Rohit *Founder & CEO, **Tuplejump, Inc.* www.tuplejump.com *The Data Engineering Platform* On Sat, Oct 4, 2014 at 3:49 AM, tian zhang

Re: Worker with no Executor (YARN client-mode)

2014-10-04 Thread Sandy Ryza
Hey Jon, Since you're running on YARN, the Worker shouldn't be involved. Are you able to go to the YARN ResourceManager web UI and click on "nodes" in the top left? Does that node show up in the list? If you click on it, what's shown under "Total Pmem allocated for Container"? It also might be

org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread anny9699
Hi, I use the breeze.stats.distributions.Bernoulli in my code, however met this problem java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator I read the posts about this problem before, and if I added org.apache.commons commons-math3 3.3 run

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/JW1q5UX9S1/breeze+spark&subj=Build+error+when+using+spark+with+breeze On Sat, Oct 4, 2014 at 12:59 PM, anny9699 wrote: > Hi, > > I use the breeze.stats.distributions.Bernoulli in my code, however met this > problem > java.lang.NoClassDefFoundError: > org/

Using FunSuite to test Spark throws NullPointerException

2014-10-04 Thread Mario Pastorelli
I would like to use FunSuite to test my Spark jobs by extending FunSuite with a new function, called |localTest|, that runs a test with a default SparkContext: |class SparkFunSuite extends FunSuite { def localTest(name:

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread 陈韵竹
Hi Ted, So according to previous posts, the problem should be solved by changing the spark-1.1.0 core pom file? Thanks! On Sat, Oct 4, 2014 at 1:06 PM, Ted Yu wrote: > Cycling bits: > http://search-hadoop.com/m/JW1q5UX9S1/breeze+spark&subj=Build+error+when+using+spark+with+breeze > > On Sat, O

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
See the last comment in that thread from Xiangrui: bq. include breeze in the dependency set of your project. Do not rely on transitive dependencies Cheers On Sat, Oct 4, 2014 at 1:48 PM, 陈韵竹 wrote: > Hi Ted, > > So according to previous posts, the problem should be solved by changing > the spa

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread 陈韵竹
Hi Ted, I did include explicitly breeze in my pom.xml org.scalanlp breeze_${scala.binary.version} 0.9 But this error message still appears. Thanks! On Sat, Oct 4, 2014 at 2:03 PM, Ted Yu wrote: > See the last comment in that thread from Xiangrui: > > bq. incl

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
breeze jar doesn't contain RandomGenerator class. Have you tried with commons-math3 3.1.1 in your pom.xml ? Let us know if you still encounter problems. Cheers On Sat, Oct 4, 2014 at 2:11 PM, 陈韵竹 wrote: > Hi Ted, > > I did include explicitly breeze in my pom.xml > > > > org.scalanl

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread anny9699
Hi Ted, I tried including org.apache.commons commons-math3 3.3 in my pom file and adding this jar to my classpath. However this error still appears as Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator at breeze.s

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
The commons-math3 dependency should have brought down the following jar: ~/.m2/repository/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar $ jar tvf commons-math3-3.1.1.jar | grep RandomGenerator 200 Wed Jan 09 17:13:38 UTC 2013 org/apache/commons/math3/random/NormalizedRandomGene

Dstream Transformations

2014-10-04 Thread Jahagirdar, Madhu
In my Spark Streaming program I have created Kafka utils to receive data and store data in Elasticsearch and in Flume. The storing function is applied on the same DStream. My question: what is the behavior of Spark if, after storing data in Elasticsearch, the worker node dies before storing in Flume? Doe

Asynchronous Broadcast from driver to workers, is it possible?

2014-10-04 Thread Peng Cheng
While Spark already offers support for asynchronous reduce (collect data from workers, while not interrupting execution of a parallel transformation) through accumulator, I have made little progress on making this process reciprocal, namely, to broadcast data from driver to workers to be used by al

mllib sparse vector/matrix vs. graphx graph

2014-10-04 Thread ll
hi. i am working on an algorithm that has a graph data structure. it looks like there are 2 ways to implement this with spark. option 1: use graphx, which already provides Vertices and Edges to build out the graph pretty nicely. option 2: use mllib sparse vector / matrix to build out the graph. th

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread anny9699
Thanks Ted, this is working now! Previously I added another commons-math3 jar to my classpath and that one didn't work. The one included by Maven seems to work. Thanks a lot!
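
For readers who hit the same NoClassDefFoundError, a small diagnostic along the following lines may help. This is only a sketch, not something posted in the thread: the class name ClasspathCheck and the helper locate are made up for this example. It uses the standard Class.forName and ProtectionDomain calls to report whether a class can be loaded and, if so, from which jar or directory. It is only meaningful when run with the same classpath as the failing application.

```java
import java.security.CodeSource;

public class ClasspathCheck {

    /**
     * Returns the location (jar or directory URL) a class is loaded from,
     * a placeholder string for classes without a code source (typically
     * core JDK classes), or null if the class cannot be loaded at all.
     */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source; probably a core JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but broken (e.g. a missing dependency).
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in this thread; override via first argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.math3.random.RandomGenerator";
        String where = locate(name);
        if (where == null) {
            System.out.println(name + " is NOT loadable from this classpath");
        } else {
            System.out.println(name + " is loaded from " + where);
        }
    }
}
```

If two different commons-math3 jars are on the classpath, the printed location shows which one actually wins, which is the kind of conflict described in the message above.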