We are using Spark 1.0.0 deployed on a Spark Standalone cluster and I'm getting
the following exception. With previous versions I've seen this error occur
along with OutOfMemory errors, which I'm not seeing with Spark 1.0.
Any suggestions?
Job aborted due to stage failure: Task 3748.0:20 failed 4 ti
I used Java Decompiler to check the included
"org.apache.commons.codec.binary.Base64" .class file (in the spark-assembly jar
file), and for both "encodeBase64" and "decodeBase64" there is only the
(byte[]) version and no encodeBase64/decodeBase64(String).
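For reference, a quick runtime check (a sketch to run in spark-shell; note that
decodeBase64(String) and encodeBase64String(byte[]) were only added in
commons-codec 1.4, so their absence suggests an older 1.3 jar is winning on the
classpath):

  val clazz = Class.forName("org.apache.commons.codec.binary.Base64")
  // print which jar this class was actually loaded from
  println(clazz.getProtectionDomain.getCodeSource.getLocation)
  // list the encode/decode overloads that are really available
  clazz.getMethods.map(_.toString).filter(_.contains("Base64")).foreach(println)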
I have encountered the reported issue. This confl
No, this is just standard Maven informational license info in
META-INF. It is not going to affect runtime behavior or how classes
are loaded.
On Mon, Jun 23, 2014 at 6:30 AM, anoldbrain wrote:
> I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from
> official 1.0.0 binary releas
Open the web UI in your browser, note the Spark URL shown in the top-left corner
of the page, and use that URL when starting your spark-shell instead of
localhost:7077.
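If you prefer setting it in code rather than via MASTER, something like this
should also work (a sketch; the host name below is a placeholder, copy the exact
spark:// URL from the web UI):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("shell-test")
    .setMaster("spark://your-master-host:7077")  // placeholder, use the URL from the web UI
  val sc = new SparkContext(conf)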
Thanks
Best Regards
On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek
wrote:
> Hi
> Can someone help me with the following error that
Will using mapPartitions and creating a new RDD of ParsedData objects avoid
multiple parsing?
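Something along these lines is what I have in mind (a sketch only; ParsedData,
parseLine and the HDFS path are placeholders):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.storage.StorageLevel

  case class ParsedData(fields: Array[String])
  def parseLine(line: String): ParsedData = ParsedData(line.split(","))

  val raw: RDD[String] = sc.textFile("hdfs:///path/to/data")
  // parse once per partition; any per-partition setup can go before the .map
  val parsed: RDD[ParsedData] = raw.mapPartitions(iter => iter.map(parseLine))
  parsed.persist(StorageLevel.MEMORY_ONLY)
  // later actions reuse the cached parsed objects instead of re-parsing the text
  println(parsed.filter(_.fields.nonEmpty).count())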
I see. That's good. Thanks.
Justin
On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks wrote:
> Oh, and the movie lens one is userid::movieid::rating
>
> - Evan
>
> On Jun 22, 2014, at 3:35 PM, Justin Yip wrote:
>
> Hello,
>
> I am looking into a couple of MLLib data files in
> https://github.com/ap
*TL;DR:* I want to run a pre-processing step on the data from each partition
(such as parsing) and retain the parsed objects on each node for future
processing calls, to avoid repeated parsing.
/More detail:/
I have a server and two nodes in my cluster, with the data partitioned using
HDFS.
I am trying
Please check what the Spark master URL is, and set that URL when launching
spark-shell.
You can get it from the terminal where the Spark master is running or from the
cluster UI at http://:8080
Thanks,
Sourav
On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek
wrote:
> Hi
> Can someone help me with the fo
I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from the
official 1.0.0 binary release for CDH4, and found one "commons-codec" entry:
From: 'The Apache Software Foundation' (http://jakarta.apache.org)
- Codec (http://jakarta.apache.org/commons/codec/)
commons-codec:commons-codec:ja
Hi
Can someone help me with the following error that I faced while setting
up a single-node Spark framework?
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077
sbin/spark-shell
bash: sbin/spark-shell: No such file or directory
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MA
In this benchmark, the problem wasn’t that Shark could not run without enough
memory; Shark spills some of the data to disk and can run just fine. The issue
was that the in-memory form of the RDDs was larger than the cluster’s memory,
although the raw Parquet / ORC files did fit in memory, so Cl
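(For what it's worth, the analogous behavior in plain Spark, not Shark itself,
looks roughly like the sketch below, where the path is a placeholder: with
MEMORY_AND_DISK, partitions whose deserialized form does not fit in memory
simply spill to disk instead of failing.)

  import org.apache.spark.storage.StorageLevel

  val rdd = sc.textFile("hdfs:///warehouse/big_table")  // placeholder path
  rdd.persist(StorageLevel.MEMORY_AND_DISK)  // overflow partitions spill to disk
  println(rdd.count())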
Oh, and the movie lens one is userid::movieid::rating
- Evan
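In case it helps, a sketch of parsing that format into MLlib Rating objects
(the file path is a placeholder):

  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val ratings = sc.textFile("path/to/movielens_ratings.txt").map { line =>
    val Array(user, movie, rating) = line.split("::")
    Rating(user.toInt, movie.toInt, rating.toDouble)
  }
  // e.g. feed the parsed ratings into ALS (rank = 10, iterations = 10, lambda = 0.01)
  val model = ALS.train(ratings, 10, 10, 0.01)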
> On Jun 22, 2014, at 3:35 PM, Justin Yip wrote:
>
> Hello,
>
> I am looking into a couple of MLLib data files in
> https://github.com/apache/spark/tree/master/data/mllib. But I cannot find any
> explanation for these files? Does a
These files follow the libsvm format where each line is a record, the first
column is a label, and then after that the fields are offset:value where offset
is the offset into the feature vector, and value is the value of the input
feature.
This is a fairly efficient representation for sparse b
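A minimal sketch of loading one of those files with MLUtils (the path below is
just an example of a libsvm-format file in the repo):

  import org.apache.spark.mllib.util.MLUtils

  val examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
  examples.take(1).foreach { lp =>
    // each record is a LabeledPoint: a label plus a sparse feature vector
    println("label=" + lp.label + ", features=" + lp.features)
  }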
Hi Shuo,
Yes. I was reading the guide as well as the sample code.
For example, in
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm,
nowhere in the GitHub repository can I find the file: sc.textFile(
"mllib/data/ridge-data/lpsa.data").
Thanks.
Justi
Hi, you might find http://spark.apache.org/docs/latest/mllib-guide.html
helpful.
On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip wrote:
> Hello,
>
> I am looking into a couple of MLLib data files in
> https://github.com/apache/spark/tree/master/data/mllib. But I cannot find
> any explanation for th
Hello,
I am looking into a couple of the MLlib data files in
https://github.com/apache/spark/tree/master/data/mllib, but I cannot find
any explanation for these files. Does anyone know if they are documented?
Thanks.
Justin
600s for Spark vs 5s for Redshift... the numbers look much different from
the AMPLab benchmark:
https://amplab.cs.berkeley.edu/benchmark/
Is it SSDs or something else that's helping Redshift, or is the whole dataset
in memory when you run the query? Could you publish the query?
Also after spark-s
I've just benchmarked Spark and Impala: same data (in S3), same query,
same cluster.
Impala has a long load time, since it cannot load directly from S3. I have
to create a Hive table on S3, then insert from that into an Impala table.
This takes a long time; Spark took about 600s for the query, Imp
For the second question, I would say it is mainly because the projects do not
have the same aim. Impala does have a "cost-based optimizer and predicate
propagation capability", which is natural because it is interpreting
pseudo-SQL queries. In the realm of relational databases, it is often not a
good idea
Hi folks,
I was looking at the benchmark provided by Cloudera at
http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
.
Is it true that Shark cannot execute some queries if you don't have enough
memory?
And is it true/reliable that Impala
Hello, I am new to Scala and Spark. Yesterday I compiled Spark from the 1.0.0
source code and ran the tests, and one test case failed.
For example, running this command in a shell: sbt/sbt "testOnly
org.apache.spark.streaming.InputStreamsSuite"
the test case test("socket input stream") would
Right, problem solved, in a most disgraceful manner: just add a package
relocation in the Maven Shade config.
The downside is that it is not compatible with my IDE (IntelliJ IDEA); it will
cause:
Error:scala.reflect.internal.MissingRequirementError: object scala.runtime
in compiler mirror not found.: objec
Awesome, thanks
On Sunday, June 22, 2014, Matei Zaharia wrote:
> Alright, added you.
>
> On Jun 20, 2014, at 2:52 PM, Ricky Thomas wrote:
>
> Hi,
>
> Would like to add ourselves to the user list if possible please?
>
> Company: truedash
> url: truedash.io
>
> Automatic pulling of all your dat