Kafka Streaming - Error Could not compute split

2014-06-22 Thread Kanwaldeep
We are using Spark 1.0.0 deployed on a Spark standalone cluster and I'm getting the following exception. With the previous version I saw this error occur along with OutOfMemory errors, which I'm not seeing with Spark 1.0. Any suggestions? Job aborted due to stage failure: Task 3748.0:20 failed 4 ti

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-22 Thread anoldbrain
I used Java Decompiler to check the included "org.apache.commons.codec.binary.Base64" .class file (in the spark-assembly jar), and for both "encodeBase64" and "decodeBase64" there is only a (byte[]) version and no encodeBase64/decodeBase64(String). I have encountered the reported issue. This confl
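For context, a minimal Scala sketch of the mismatch (the string-returning variants were added in commons-codec 1.4; the byte-array variants date back to 1.3):

    import org.apache.commons.codec.binary.Base64

    val input = "hello".getBytes("UTF-8")

    // Available since commons-codec 1.3: returns Array[Byte].
    val encodedBytes: Array[Byte] = Base64.encodeBase64(input)

    // Added in commons-codec 1.4: returns a String directly. If an older
    // 1.3 Base64 class wins on the classpath, this call compiles fine but
    // fails at runtime with NoSuchMethodError.
    val encodedString: String = Base64.encodeBase64String(input)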

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-22 Thread Sean Owen
No, this is just standard Maven informational license info in META-INF. It is not going to affect runtime behavior or how classes are loaded. On Mon, Jun 23, 2014 at 6:30 AM, anoldbrain wrote: > I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from > official 1.0.0 binary releas

Re: hi

2014-06-22 Thread Akhil Das
Open the web UI in your browser, find the Spark URL in the top-left corner of the page, and use that when starting your spark-shell instead of localhost:7077. Thanks Best Regards On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek wrote: > Hi > Can someone help me with the following error that

Re: Persistent Local Node variables

2014-06-22 Thread Daedalus
Will using mapPartitions and creating a new RDD of ParsedData objects avoid multiple parsing?
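A minimal sketch of that approach, assuming a hypothetical ParsedData record type and CSV-style input: parse once per partition inside mapPartitions, then persist the parsed RDD so later actions reuse the parsed objects instead of re-parsing the text.

    import org.apache.spark.storage.StorageLevel

    case class ParsedData(fields: Array[String])    // placeholder record type

    val raw = sc.textFile("hdfs:///path/to/input")  // illustrative path

    // Each partition's lines are parsed exactly once when the RDD is
    // first materialized.
    val parsed = raw.mapPartitions(_.map(line => ParsedData(line.split(","))))

    // Keep the parsed objects in executor memory for later jobs.
    parsed.persist(StorageLevel.MEMORY_ONLY)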

Re: MLLib sample data format

2014-06-22 Thread Justin Yip
I see. That's good. Thanks. Justin On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks wrote: > Oh, and the movie lens one is userid::movieid::rating > > - Evan > > On Jun 22, 2014, at 3:35 PM, Justin Yip wrote: > > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/ap

Persistent Local Node variables

2014-06-22 Thread Daedalus
TL;DR: I want to run a pre-processing step on the data from each partition (such as parsing) and retain the parsed objects on each node for future processing calls, to avoid repeated parsing. More detail: I have a server and two nodes in my cluster, with data partitioned using HDFS. I am trying

Re: hi

2014-06-22 Thread Sourav Chandra
Please check the Spark master URL and set that URL when launching spark-shell. You can get it from the terminal where the Spark master is running, or from the cluster UI at http://<master-host>:8080. Thanks, Sourav On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek wrote: > Hi > Can someone help me with the fo

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-22 Thread anoldbrain
I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from the official 1.0.0 binary release for CDH4, and found one "commons-codec" entry: From: 'The Apache Software Foundation' (http://jakarta.apache.org) - Codec (http://jakarta.apache.org/commons/codec/) commons-codec:commons-codec:ja

hi

2014-06-22 Thread rapelly kartheek
Hi, can someone help me with the following error that I faced while setting up a single-node Spark cluster? karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 sbin/spark-shell bash: sbin/spark-shell: No such file or directory karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MA
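In the 1.0.0 layout spark-shell lives under bin/, not sbin/, which is why bash cannot find it. A likely fix, using the master URL from the web UI as the replies suggest (the host below is a placeholder):

    karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://<master-host>:7077 bin/spark-shell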

Re: Shark vs Impala

2014-06-22 Thread Matei Zaharia
In this benchmark, the problem wasn’t that Shark could not run without enough memory; Shark spills some of the data to disk and can run just fine. The issue was that the in-memory form of the RDDs was larger than the cluster’s memory, although the raw Parquet / ORC files did fit in memory, so Cl

Re: MLLib sample data format

2014-06-22 Thread Evan Sparks
Oh, and the MovieLens one is userid::movieid::rating - Evan > On Jun 22, 2014, at 3:35 PM, Justin Yip wrote: > > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/apache/spark/tree/master/data/mllib. But I cannot find any > explanation for these files? Does a
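A hedged Scala sketch of reading that userid::movieid::rating layout (the file path is illustrative; Rating is MLlib's own class from org.apache.spark.mllib.recommendation):

    import org.apache.spark.mllib.recommendation.Rating

    // Each line looks like: 196::242::3.0
    val ratings = sc.textFile("data/mllib/sample_movielens_data.txt").map { line =>
      val Array(user, movie, rating) = line.split("::")
      Rating(user.toInt, movie.toInt, rating.toDouble)
    }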

Re: MLLib sample data format

2014-06-22 Thread Evan Sparks
These files follow the libsvm format: each line is a record, the first column is a label, and after that the fields are offset:value pairs, where offset is the index into the feature vector and value is the value of that input feature. This is a fairly efficient representation for sparse b
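To make that concrete: a line such as "1 3:0.5 7:1.2" means label 1 with feature 3 set to 0.5 and feature 7 set to 1.2. MLlib can load files in this format directly; a small sketch (the path is illustrative):

    import org.apache.spark.mllib.util.MLUtils

    // Returns an RDD[LabeledPoint] backed by sparse feature vectors.
    val examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    examples.take(1).foreach(println)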

Re: MLLib sample data format

2014-06-22 Thread Justin Yip
Hi Shuo, Yes. I was reading the guide as well as the sample code. For example, http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm references sc.textFile("mllib/data/ridge-data/lpsa.data"), but nowhere in the GitHub repository can I find that file. Thanks. Justin
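For reference, the linear-methods page cited above parses that file roughly like this (a sketch of the documented format: a comma separates the label from a space-separated list of features):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
    val parsed = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }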

Re: MLLib sample data format

2014-06-22 Thread Shuo Xiang
Hi, you might find http://spark.apache.org/docs/latest/mllib-guide.html helpful. On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip wrote: > Hello, > > I am looking into a couple of MLLib data files in > https://github.com/apache/spark/tree/master/data/mllib. But I cannot find > any explanation for th

MLLib sample data format

2014-06-22 Thread Justin Yip
Hello, I am looking into a couple of MLLib data files in https://github.com/apache/spark/tree/master/data/mllib, but I cannot find any explanation for these files. Does anyone know if they are documented? Thanks. Justin

Re: Shark vs Impala

2014-06-22 Thread Debasish Das
600s for Spark vs 5s for Redshift... the numbers look very different from the AMPLab benchmark: https://amplab.cs.berkeley.edu/benchmark/ Is it SSDs or something similar that's helping Redshift, or is the whole dataset in memory when you run the query? Could you publish the query? Also after spark-s

Re: Shark vs Impala

2014-06-22 Thread Toby Douglass
I've just benchmarked Spark and Impala: same data (in S3), same query, same cluster. Impala has a long load time, since it cannot load directly from S3; I have to create a Hive table on S3, then insert from that into an Impala table. This takes a long time. Spark took about 600s for the query, Imp

Re: Shark vs Impala

2014-06-22 Thread Bertrand Dechoux
For the second question, I would say it is mainly because the projects do not have the same aim. Impala does have a "cost-based optimizer and predicate propagation capability," which is natural because it interprets pseudo-SQL queries. In the realm of relational databases, it is often not a good idea

Shark vs Impala

2014-06-22 Thread Flavio Pompermaier
Hi folks, I was looking at the benchmark provided by Cloudera at http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/. Is it true that Shark cannot execute some queries if you don't have enough memory? And is it true/reliable that Impala

InputStreamsSuite test failed

2014-06-22 Thread crazymb
Hello, I am new to Scala and Spark. Yesterday I compiled Spark from the 1.0.0 source code and ran the tests, and one test case failed. For example, running this command in the shell: sbt/sbt "testOnly org.apache.spark.streaming.InputStreamsSuite" the test case test("socket input stream") would

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-22 Thread Peng Cheng
Right, problem solved, in a most disgraceful manner: just add a package relocation in the Maven Shade config. The downside is that it is not compatible with my IDE (IntelliJ IDEA); it will cause: Error:scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.: objec
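For anyone hitting the same conflict, a hedged sketch of what such a relocation looks like in a Maven Shade plugin configuration (the shaded pattern name is arbitrary):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rename our bundled commons-codec so it cannot collide with
                 the older copy inside the spark-assembly jar. -->
            <pattern>org.apache.commons.codec</pattern>
            <shadedPattern>myshaded.org.apache.commons.codec</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </plugin>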

Re: Using Spark

2014-06-22 Thread Ricky Thomas
Awesome, thanks On Sunday, June 22, 2014, Matei Zaharia wrote: > Alright, added you. > > On Jun 20, 2014, at 2:52 PM, Ricky Thomas > wrote: > > Hi, > > Would like to add ourselves to the user list if possible please? > > Company: truedash > url: truedash.io > > Automatic pulling of all your dat