How did you recompile and deploy Spark to your cluster? It sounds like
a problem with the assembly not being deployed correctly, rather than
with your app.
On Tue, Oct 14, 2014 at 10:35 PM, Tamas Sandor wrote:
> Hi,
>
> I'm a rookie in Spark, but I hope someone can help me out. I'm writing an app
> that
Examine the output (replace $YARN_APP_ID below with the "application
identifier" printed by the previous command). Note: YARN_APP_LOGS_DIR is
usually /tmp/logs or $HADOOP_HOME/logs/userlogs, depending on the Hadoop
version.
$ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_01/stdout
Hi Jimmy,
Did you try my patch?
The problem on my side was that hadoop.tmp.dir (in Hadoop's core-site.xml)
was not handled properly by Spark when it is set to multiple partitions/disks,
i.e.:
hadoop.tmp.dir
file:/d1/yarn/local,file:/d2/yarn/local,file:/d3/yarn/local,file:/d4/yarn/local,
Hi,
we are Spark users and we use some of Spark's test classes for our own
application unit tests, namely LocalSparkContext and SharedSparkContext. But
these classes are not included in the spark-core library. That is a reasonable
choice, as it's not a good idea to include test classes in the runtime ja
Hi,
We really would like to use Spark but we can’t because we have a secure HDFS
environment (Cloudera).
I understood https://issues.apache.org/jira/browse/SPARK-2541 contains a patch.
Can one of the committers please take a look?
Thanks!
Erik.
—
Erik van Oosten
http://www.day-to-day-stu
Hi, I'm pretty new to both Big Data and Spark. I've just started POC work on
Spark, and my team and I are evaluating it against other in-memory computing
tools such as GridGain, BigMemory, Aerospike, and some others, specifically
to solve two sets of problems.
1) Data Storage: Our current application ru
Hi,
How large is the dataset you're saving into S3?
Saving to S3 is actually done in two steps:
1) writing temporary files
2) committing them to the proper directory
Step 2 can be slow because S3 does not have a quick atomic "move"
operation; you have to copy (server-side, but it still takes time) and the
Did you manage to solve this issue?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-in-cluster-and-errors-tp16249p16479.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Is this the Spark URI (spark://host1:7077) that you are seeing in your
cluster's web UI (http://master-host:8080), at the top left of the page?
Thanks
Best Regards
On Wed, Oct 15, 2014 at 12:18 PM, Theodore Si wrote:
> Can anyone help me, please?
>
> On 10/14/2014 9:58 PM, Theodore Si wrote:
>
> Hi al
Hi,
I have been working with Spark for a few weeks, and I do not yet understand
how I should organize my dev and production environments.
Currently I am using the IPython Notebook; I usually write test scripts on
my Mac with some very small data. Then, when I am ready, I launch my script
on servers
I don't know why JavaSchemaRDD.baseSchemaRDD is private[sql]. I also found
that DataTypeConversions is protected[sql].
Finally I found this workaround:
jrdd.registerTempTable("transform_tmp")
jrdd.sqlContext.sql("select * from transform_tmp")
Could anyone tell me: is it
which means the details are not persisted, and hence after any failure the
workers and master wouldn't restart the daemons normally... right?
On Wed, Oct 15, 2014 at 12:17 PM, Prashant Sharma [via Apache Spark User
List] wrote:
> [Removing dev lists]
>
> You are absolutely correct about that.
>
> Prashant
So if you need those features, you can go ahead and set up either the
filesystem or the ZooKeeper recovery option. Please take a look at:
http://spark.apache.org/docs/latest/spark-standalone.html.
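For reference, a rough sketch of the ZooKeeper option from that page (the
ZooKeeper hosts and directory are hypothetical); it goes into conf/spark-env.sh
on each master:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"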
Prashant Sharma
On Wed, Oct 15, 2014 at 3:25 PM, Chitturi Padma <
learnings.chitt...@gmail.com> wrote:
> which mean
I just ran the same code and it is running perfectly fine on my machine.
These are the things on my end:
- Spark version: 1.1.0
- Gave full path to the negative and positive files
- Set twitter auth credentials in the environment.
And here's the code:
import org.apache.spark.SparkContext
> impor
Besides the host1 question, what can also happen is that you give the worker
more memory than is available (to be safe, try a value 1 GB below the
available memory, for example).
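For instance (a sketch; the value itself is hypothetical), in
conf/spark-env.sh on each worker:
export SPARK_WORKER_MEMORY=6g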
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Initial-job-has-not-accepted-any-re
What results do you want?
If your pair is like (a, b), where "a" is the key and "b" is the value, you
can try
rdd1 = rdd1.flatMap(lambda l: l)
and then use cogroup.
Best
Gen
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-operation-like-cogrop
Hi,
I am using 1.1.0. I did set my twitter credentials and I am using the full
path. I did not paste this in the public post. I am running on a cluster
and getting the exception. Are you running in local or standalone mode?
Thanks
On Oct 15, 2014 3:20 AM, "Akhil Das" wrote:
> I just ran the sam
I ran it in both local and standalone mode, and it worked for me. It does
throw a bind exception, which is normal since we are using both a
SparkContext and a StreamingContext.
Thanks
Best Regards
On Wed, Oct 15, 2014 at 5:25 PM, S Krishna wrote:
> Hi,
>
> I am using 1.1.0. I did set my twitter credentials
How did you resolve it?
On Tue, Jul 15, 2014 at 3:50 AM, SK wrote:
> The problem is resolved. Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/jsonRDD-NoSuchMethodError-tp9688p9742.html
> Sent from the Apache Spark User List mailing list ar
We use Spark 1.1 and HBase 0.98.1-cdh5.1.0, and need to read and write an
HBase table in a Spark program.
I notice there are
spark.driver.extraClassPath
spark.executor.extraClassPath
properties to manage extra classpath entries, as well as the deprecated
SPARK_CLASSPATH.
The problem is what classpath or jar
Hi,
The following query in the Spark SQL 1.1.0 CLI doesn't work:
SET hive.metastore.warehouse.dir=/home/spark/hive/warehouse;
create table test as
select v1.*, v2.card_type, v2.card_upgrade_time_black,
v2.card_upgrade_time_gold
from customer v1 left join customer_loyalty v2
on v1.account_id = v2.ac
Hi,
I have a Spark standalone example application which is working fine.
I'm now trying to integrate this application into a J2EE application, deployed
on JBoss 7.1.1 and accessed via a web service. The JBoss server is installed on
my local machine (Windows 7) and the Spark master is remote (Lin
In order to share an HBase connection pool, we create an object:
object Util {
  val HBaseConf = HBaseConfiguration.create
  val Connection = HConnectionManager.createConnection(HBaseConf)
}
which would be shared among tasks on the same executor, e.g.
val result = rdd.map(line => {
  val table
It is wonderful to see some ideas.
Now the questions:
1) What is a track segment?
Ans) It is the line segment connecting two adjacent points when all points
are arranged by time. Say a vehicle moves (t1, p1) -> (t2, p2) -> (t3, p3).
Then the segments are (p1, p2), (p2, p3) when the time ordering is (t1
+user@hbase
2014-10-15 20:48 GMT+08:00 Fengyun RAO :
> We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an
> HBase table in Spark program.
>
> I notice there are:
> spark.driver.extraClassPath
> spark.executor.extraClassPath properties to manage extra ClassPath, over
> even
Ok,
I understand.
But in both cases the data are in the same processing node.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/A-question-about-streaming-throughput-tp16416p16501.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Have you tried the following?
val result = rdd.map(line => {
  val table = Util.Connection.getTable("user")
  ...
  Util.Connection.close()
})
On Wed, Oct 15, 2014 at 6:09 AM, Fengyun RAO wrote:
> In order to share an HBase connection pool, we create an object
>
> object Util {
> val HBaseConf =
Pardon me - there was a typo in my previous email.
Calling table.close() is the recommended approach.
HConnectionManager does reference counting. When all references to the
underlying connection are gone, the connection will be released.
Cheers
On Wed, Oct 15, 2014 at 7:13 AM, Ted Yu wrote:
> Have you
I am writing to HBase. The following are my options:
export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
spark-submit \
--jars
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib/hbase
This is still happening to me on Mesos. Any workarounds?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Worker-crashing-and-Master-not-seeing-recovered-worker-tp2312p16506.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
It looks like you're creating the StreamingContext and SparkContext
separately from the same conf. Instead, how about passing the
SparkContext to the StreamingContext constructor? It seems like better
practice, and it's my guess at the cause of the problem.
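A minimal sketch of what that looks like (the batch interval is arbitrary):
import org.apache.spark.streaming.{Seconds, StreamingContext}
// reuse the existing SparkContext rather than building a second context from the same conf
val ssc = new StreamingContext(sc, Seconds(2))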
On Tue, Oct 14, 2014 at 9:13 PM, SK wrote:
> Hi,
>
>
Hi there... are there any other matrix operations in addition to multiply(),
like addition or dot product?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/matrix-operations-tp16508.html
Sent from the Apache Spark User List mailing list archive at Nabble.co
Hi... it looks like RowMatrix.multiply() takes a local Matrix as a parameter
and returns the result as a distributed RowMatrix.
How do you perform this series of multiplications if A, B, C, and D are all
RowMatrix?
((A x B) x C) x D
thanks!
--
View this message in context:
http://apache-sp
Hi Yin,
pqt_rdt_snappy has 76 columns. These two Parquet tables were created via Hive
0.12 from existing Avro data using CREATE TABLE followed by an INSERT
OVERWRITE. These are partitioned tables: pqt_rdt_snappy has one partition
while pqt_segcust_snappy has two partitions. For pqt_segcust_sn
I guess I was a little light on the details in my haste. I'm using Spark on
YARN, and this is in the driver process in yarn-client mode (most notably
spark-shell). I've had to manually add a bunch of JARs that I had thought it
would just pick up like everything else does:
export
SPARK_SUBMIT
From this line: "Removing executor app-20141015142644-0125/0 because it is
EXITED", I would guess that you need to examine the executor log to see why
the executor actually exited. My guess would be that the executor cannot
connect back to your driver. But check the log from the executor. It should
Hi,
I am trying to persist the files generated as a result of Naive Bayes
training with MLlib. These comprise the model, a label index (my own
class), and a term dictionary (my own class). I need to save them to an HDFS
location and then deserialize them when needed for prediction.
How can I do the same wi
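One approach that is sometimes used (a sketch, not an official MLlib 1.x API;
it assumes NaiveBayesModel and your own label-index and dictionary classes are
Serializable):
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayesModel

// write the model to HDFS as a one-element object file
def saveModel(sc: SparkContext, model: NaiveBayesModel, path: String): Unit =
  sc.parallelize(Seq(model), 1).saveAsObjectFile(path)

// read it back when needed for prediction
def loadModel(sc: SparkContext, path: String): NaiveBayesModel =
  sc.objectFile[NaiveBayesModel](path).first()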
Hi,
We are currently working on distributed matrix operations. Two RowMatrices
cannot currently be multiplied together; neither can they be added. The
functionality will be added soon.
You can of course achieve this yourself by using IndexedRowMatrix and doing
one join per operation you reques
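For example, element-wise addition of two matrices with matching dimensions
could be sketched roughly like this (untested, just to illustrate the
join-per-operation idea):
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// pair up rows by index, then add them element-wise
def add(a: IndexedRowMatrix, b: IndexedRowMatrix): IndexedRowMatrix = {
  val rows = a.rows.map(r => (r.index, r.vector))
    .join(b.rows.map(r => (r.index, r.vector)))
    .map { case (i, (va, vb)) =>
      IndexedRow(i, Vectors.dense(va.toArray.zip(vb.toArray).map { case (x, y) => x + y }))
    }
  new IndexedRowMatrix(rows)
}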
Hi,
I compiled Spark 1.1.0 with CDH 4.6, but when I try to bring the spark-sql
CLI up, it gives an error:
==
[atangri@pit-uat-hdputil1 bin]$ ./spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
M
I see that the Hive 0.10.0 metastore SQL does not have a VERSION table, but
Spark is looking for it.
Has anyone else faced this issue, or any ideas on how to fix it?
Thanks,
Anurag Tangri
On Wed, Oct 15, 2014 at 10:51 AM, Anurag Tangri wrote:
> Hi,
> I compiled spark 1.1.0 with CDH 4.6 but when I try to
Hi Anurag,
Spark SQL (from the Spark standard distribution / sources) currently
requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not
gonna work.
CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can
talk to the Hive 0.13.1 that is also bundled with CDH, so if that'
Hi Marcelo,
Exactly. I found it a few minutes ago.
I ran the MySQL Hive 0.12 schema SQL on my Hive 0.10 metastore, which created
the missing tables, and it seems to be working now.
Not sure whether everything else in CDH 4.6/Hive 0.10 would still be working
or not, though.
Looks like we cannot use Spark SQL in a clean way
Hi there, I'm running Spark on EC2, and am running into an error there that
I don't get locally. Here's the error:
11335 [handle-read-write-executor-3] ERROR
org.apache.spark.network.SendingConnection - Exception while reading
SendingConnection to ConnectionManagerId([IP HERE])
java.nio.channels.
You are right. Creating the StreamingContext from the SparkContext instead of
SparkConf helped. Thanks for the help.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Sentiment-Analysis-of-Twitter-streams-tp16410p16520.html
Sent from the Apache
Hi,
My setup: Tomcat (running a web app which initializes a SparkContext) and a
dedicated Spark cluster (1 master, 2 workers, 1 VM each).
I am able to properly start this setup, where the SparkContext correctly
initializes the connection with the master. I am able to execute tasks and
perform the required calculation
Hi All,
I figured out what the problem was. Thank you Sean for pointing me in the
right direction. All the jibber jabber about an empty DStream / RDD was just
pure nonsense. I guess the sequence of events (the fact that Spark Streaming
started crashing just after I implemented the reduceByke
Hi,
As a result of a reduction operation, the resultant value "score" is a
DStream[Int]. How can I get the plain Int value?
I tried score[0] and score._1, but neither worked, and I can't find a
getValue() in the DStream API.
thanks
--
View this message in context:
http://apache-spark-user
Hi,
I am evaluating Spark Streaming with Kafka, and I found that Spark Streaming
is slower than Spark. It took more time to process the same amount of data;
as per the Spark console it can process 2300 records per second.
Is my assumption correct? Spark Streaming has to do a lot of this along
Hi Greg,
I'm not sure exactly what it is that you're trying to achieve, but I'm
pretty sure those variables are not supposed to be set by users. You
should take a look at the documentation for
"spark.driver.extraClassPath" and "spark.driver.extraLibraryPath", and
the equivalent options for executo
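For instance (a sketch; the jar path is borrowed from the HBase thread above
and may differ on your cluster), in conf/spark-defaults.conf:
spark.driver.extraClassPath   /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
spark.executor.extraClassPath /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar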
Hi,
I want to check the DEBUG log of the Spark executors on YARN (using
yarn-cluster mode), but neither of the following works:
1. yarn daemonlog setlevel DEBUG YarnChild.class
2. setting log4j.properties in the spark/conf folder on the client node.
So how can I set the log level of the Spark executors in the YARN containers to
Hi Eric,
Check the "Debugging Your Application" section at:
http://spark.apache.org/docs/latest/running-on-yarn.html
Long story short: upload your log4j.properties using the "--files"
argument of spark-submit.
(Mental note: we could make the log level configurable via a system property...)
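A hedged example of that (the class name, jar, and path are placeholders):
spark-submit --master yarn-cluster \
  --files /path/to/log4j.properties \
  --class com.example.MyApp my-app.jar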
On
Hi Xiangrui,
I am using yarn-cluster mode. The current Hadoop cluster is configured to
only accept "yarn-cluster" mode and not allow "yarn-client" mode, and I have
no privilege to change that.
Without initializing with "k-means||", the job finished in 10 minutes. With
"k-means", it just hangs there f
Hi,
I am testing Spark Streaming (local mode, with Kafka). The code is as
follows:
public class LocalStreamTest2 {
public static void main(String[] args) {
JavaSparkContext sc = new JavaSparkContext("local[4]", "Local Stream Test");
JavaStreamingContext ssc = new JavaStreamingContext(sc, new D
Hi -
Has anybody figured out how to integrate a Play application with Spark and run
it on a Spark cluster using the spark-submit script? I have seen some blogs
about creating a simple Play app and running it locally on a dev machine with
the sbt run command. However, those steps don't work for Spark-su
Hi
Can anyone share a project as a sample? I tried them a couple of days ago but
couldn't make it work. It looks like it's due to some Kafka dependency issue.
I'm using sbt-assembly.
Thanks
Gary
I have a Spark application which is running Spark Streaming and Spark
SQL.
I observed that the size of the shuffle files under the "spark.local.dir"
folder keeps increasing and never decreases. Eventually it will run into an
out-of-disk-space error.
The question is: when will Spark delete these shuffle files?
In the ap
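One configuration knob sometimes suggested for long-running jobs (a sketch;
the TTL value is arbitrary, and data older than it can no longer be
recomputed, so whether it is appropriate depends on your workload):
val conf = new org.apache.spark.SparkConf()
  .set("spark.cleaner.ttl", "3600")  // seconds; periodically cleans old metadata and shuffle data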
Anybody with good hands-on experience with Spark, please do reply. It would
help us a lot!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Concepts-tp16477p16536.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I would like to reiterate that I don't have Hive installed on the Hadoop
cluster.
I have some queries on the following comment from Cheng Lian-2:
"The Thrift server is used to interact with existing Hive data, and thus
needs Hive Metastore to access Hive catalog. In your case, you need to build
Spark
I was tipped off by an expert that the "Unsupported language features in
query" error I had was due to the fact that Spark SQL does not support
dynamic partitions, and that I can do saveAsParquetFile() for each partition
instead. My inefficient implementation is to:
//1. run the query without DISTRIBUTE B
Thanks, Ted.
Util.Connection.close() should be called only once, so it can NOT be in a
map function:
val result = rdd.map(line => {
  val table = Util.Connection.getTable("user")
  ...
  Util.Connection.close()
})
As you mentioned:
Calling table.close() is the recommended approach.
HConnectionMana
I have a Spark cluster on Mesos, and when I run long-running GraphX processing
I receive a lot of the following two errors, and one by one my slaves stop
doing any work for the process until it is idle. Any idea what is happening?
First type of error message:
INFO SendingConnection: Initiating connec
I may have misunderstood your point.
val result = rdd.map(line => {
  val table = Util.Connection.getTable("user")
  ...
  table.close()
})
Did you mean this is enough, and there’s no need to call
Util.Connection.close(),
or HConnectionManager.deleteAllConnections()?
Where is the documentation th
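If I read the earlier reply correctly, the pattern would be roughly this (a
sketch based on the snippet above; the shared connection stays open):
val result = rdd.map { line =>
  val table = Util.Connection.getTable("user")
  // ... read or write the row(s) for this line ...
  table.close()  // closes the HTable; the reference-counted HConnection remains shared
  line           // placeholder: return whatever the transformation actually produces
}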
Indeed it was a problem on the executor side… I have to figure out how to fix
it now ;-)
Thanks!
Mehdi
From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Wednesday, October 15, 2014 18:32
To: Mehdi Singer
Cc: user@spark.apache.org
Subject: Re: Problem executing Spark via JBoss applicatio