Hey Rachana, could you provide the full jstack outputs? Maybe it's the same as
https://issues.apache.org/jira/browse/SPARK-11104
Best Regards,
Shixiong Zhu
2016-01-04 12:56 GMT-08:00 Rachana Srivastava <
rachana.srivast...@markmonitor.com>:
> Hello All,
>
>
>
> I am runni
Just replace `localhost` with a host name that can be accessed by Yarn
containers.
Best Regards,
Shixiong Zhu
2015-12-22 0:11 GMT-08:00 prasadreddy :
> How do we achieve this on yarn-cluster mode
>
> Please advice.
>
> Thanks
> Prasad
>
>
>
> --
> View this me
Looks like you need to add a "driver" option to your code, such as
sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:oracle:thin:@:1521:xxx",
"driver" -> "oracle.jdbc.driver.OracleDriver",
"dbtable&q
You are right. "checkpointInterval" is only for data checkpointing.
"metadata checkpoint" is done for each batch. Feel free to send a PR to add
the missing doc.
Best Regards,
Shixiong Zhu
2015-12-18 8:26 GMT-08:00 Lan Jiang :
> Need some clarification about the documentat
Looks like you have a reference to some Akka class. Could you post your code?
Best Regards,
Shixiong Zhu
2015-12-17 23:43 GMT-08:00 Pankaj Narang :
> I am encountering below error. Can somebody guide ?
>
> Something similar is one this link
> https://github.com/elastic/elasticsearch-h
Could you check the Scala version of your Kafka?
Best Regards,
Shixiong Zhu
2015-12-18 2:31 GMT-08:00 Christos Mantas :
> Thank you, Luciano, Shixiong.
>
> I thought the "_2.11" part referred to the Kafka version - an unfortunate
> coincidence.
>
> Indeed
>
> s
What's the Scala version of your Spark? Is it 2.10?
Best Regards,
Shixiong Zhu
2015-12-17 10:10 GMT-08:00 Christos Mantas :
> Hello,
>
> I am trying to set up a simple example with Spark Streaming (Python) and
> Kafka on a single machine deployment.
> My Kafka broker/server
l#comment-14506806
Best Regards,
Shixiong Zhu
2015-12-17 4:39 GMT-08:00 Bartłomiej Alberski :
> I prepared simple example helping in reproducing problem:
>
> https://github.com/alberskib/spark-streaming-broadcast-issue
>
> I think that in that way it will be easier for you to under
It doesn't guarantee that. E.g.,
scala> sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2).filter(_ >
2.0).zipWithUniqueId().collect().foreach(println)
(3.0,1)
(4.0,3)
It only guarantees "unique".
Best Regards,
Shixiong Zhu
2015-12-13 10:18 GMT-08:00 Sourav Mazumder :
> H
Could you send a PR to fix it? Thanks!
Best Regards,
Shixiong Zhu
2015-12-08 13:31 GMT-08:00 Richard Marscher :
> Alright I was able to work through the problem.
>
> So the owning thread was one from the executor task launch worker, which
> at least in local mode runs the task and
Which version are you using? Could you post these thread names here?
Best Regards,
Shixiong Zhu
2015-12-07 14:30 GMT-08:00 Richard Marscher :
> Hi,
>
> I've been running benchmarks against Spark in local mode in a long running
> process. I'm seeing threads leaking each
Hey Eyal, I just checked the Couchbase Spark connector jar. The target
version of some of its classes is Java 8 (52.0). You can create a ticket at
https://issues.couchbase.com/projects/SPARKC
Best Regards,
Shixiong Zhu
2015-11-26 9:03 GMT-08:00 Ted Yu :
> StoreMode is from Couchbase connec
In addition, if you have more than two text files, you can just put them
into a Seq and use "reduce(_ ++ _)".
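For example, a minimal sketch (the paths are hypothetical and sc is assumed
to be an existing SparkContext):

val paths = Seq("hdfs:///data/a.txt", "hdfs:///data/b.txt", "hdfs:///data/c.txt")
// "++" on RDDs is an alias for union, so this combines all files into one RDD
val combined = paths.map(sc.textFile(_)).reduce(_ ++ _)
combined.count()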
Best Regards,
Shixiong Zhu
2015-11-11 10:21 GMT-08:00 Jakob Odersky :
> Hey Jeff,
> Do you mean reading from multiple text files? In that case, as a
> workaround,
's hard for us to find similar
issues in the PR build.
Best Regards,
Shixiong Zhu
2015-11-09 18:47 GMT-08:00 Ted Yu :
> Created https://github.com/apache/spark/pull/9585
>
> Cheers
>
> On Mon, Nov 9, 2015 at 6:39 PM, Josh Rosen
> wrote:
>
>> When we remove this
You should use `SparkConf.set` rather than `SparkConf.setExecutorEnv`. For
driver configurations, you need to set them before starting your
application; you can pass them with the `--conf` argument when running
`spark-submit`.
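A minimal sketch of both cases (the app name and values are only examples):

import org.apache.spark.{SparkConf, SparkContext}

// Executor/application settings: use SparkConf.set before creating the SparkContext.
val conf = new SparkConf()
  .setAppName("MyApp")                      // hypothetical name
  .set("spark.executor.memory", "4g")       // hypothetical value
val sc = new SparkContext(conf)

// Driver-side settings must be in place before the driver JVM starts, e.g.:
//   spark-submit --conf spark.driver.memory=4g --class com.example.MyApp myapp.jar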
Best Regards,
Shixiong Zhu
2015-11-04 15:55 GMT-08:00 William Li :
> Hi
"trackStateByKey" is about to be added in 1.6 to resolve the performance
issue of "updateStateByKey". You can take a look at
https://issues.apache.org/jira/browse/SPARK-2629 and
https://github.com/apache/spark/pull/9256
javap in the Scala 2.10 REPL doesn't support Java 7 or Java 8. This was fixed
in Scala 2.11. See https://issues.scala-lang.org/browse/SI-4936
Best Regards,
Shixiong Zhu
2015-10-15 4:19 GMT+08:00 Robert Dodier :
> Hi,
>
> I am working with Spark 1.5.1 (official release), with Oracle Java
Thanks for reporting it Terry. I submitted a PR to fix it:
https://github.com/apache/spark/pull/9132
Best Regards,
Shixiong Zhu
2015-10-15 2:39 GMT+08:00 Reynold Xin :
> +dev list
>
> On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo wrote:
>
>> All,
>>
>> Does anyo
In addition, you cannot turn off JobListener and SQLListener now...
Best Regards,
Shixiong Zhu
2015-10-13 11:59 GMT+08:00 Shixiong Zhu :
> Is your query very complicated? Could you provide the output of `explain`
> your query that consumes an excessive amount of memory? If this is a
Is your query very complicated? Could you provide the output of `explain`
your query that consumes an excessive amount of memory? If this is a small
query, there may be a bug that leaks memory in SQLListener.
Best Regards,
Shixiong Zhu
2015-10-13 11:44 GMT+08:00 Nicholas Pritchard
Could you print the content of RDD to check if there are multiple values
for a key in a batch?
Best Regards,
Shixiong Zhu
2015-10-12 18:25 GMT+08:00 Sathiskumar :
> I'm running a Spark Streaming application for every 10 seconds, its job is
> to
> consume data from kafka, transfor
You don't need to care about this sleep. It runs in a separate thread and
usually won't affect the performance of your application.
Best Regards,
Shixiong Zhu
2015-10-09 6:03 GMT+08:00 yael aharon :
> Hello,
> I am working on improving the performance of our Spark on Yar
Each ReceiverInputDStream will create one Receiver. If you only use
one ReceiverInputDStream, there will be only one Receiver in the cluster.
But if you create multiple ReceiverInputDStreams, there will be multiple
Receivers.
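A minimal sketch, assuming a socket source (hostnames and ports are
hypothetical) and that ssc is an existing StreamingContext:

// One ReceiverInputDStream -> one Receiver running in the cluster.
val single = ssc.socketTextStream("host1", 9999)

// Several ReceiverInputDStreams -> several Receivers; union them to process as one stream.
val streams = (1 to 3).map(i => ssc.socketTextStream("host" + i, 9999))
val unioned = ssc.union(streams)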
Best Regards,
Shixiong Zhu
2015-10-12 23:47 GMT+08:00 Something
Could you show how you set the configurations? You need to set these
configurations before creating the SparkContext and SQLContext.
Moreover, the history server doesn't support the SQL UI, so
"spark.eventLog.enabled=true" doesn't help with that for now.
Best Regards,
Shixiong Zhu
2015-
Which mode are you using? For standalone, it's
org.apache.spark.deploy.worker.Worker. For Yarn and Mesos, Spark just
submits its request to them and they will schedule processes for Spark.
Best Regards,
Shixiong Zhu
2015-10-12 20:12 GMT+08:00 Muhammad Haseeb Javed <11besemja...@seec
Do you have the full stack trace? Could you check if it's the same as
https://issues.apache.org/jira/browse/SPARK-10422
Best Regards,
Shixiong Zhu
2015-10-01 17:05 GMT+08:00 Eyad Sibai :
> Hi
>
> I am trying to call .persist() on a dataframe but once I execute the next
>
Do you have the log? It looks like some exceptions in your code caused the
SparkContext to stop.
Best Regards,
Shixiong Zhu
2015-09-30 17:30 GMT+08:00 tranan :
> Hello All,
>
> I have several Spark Streaming applications running on Standalone mode in
> Spark 1.5. Spark is currently set up
Do you have the log file? It may be because of wrong settings.
Best Regards,
Shixiong Zhu
2015-10-01 7:32 GMT+08:00 markluk :
> I setup a new Spark cluster. My worker node is dying with the following
> exception.
>
> Caused by: java.util.concurrent.TimeoutException: Futures tim
Right, you can use SparkContext and SQLContext in multiple threads. They
are thread safe.
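A minimal sketch of submitting jobs from multiple threads, assuming df is an
existing DataFrame and the column names are hypothetical:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val futures = Seq("colA", "colB", "colC").map { c =>
  // Each future runs on its own thread and submits its own Spark job.
  Future { df.groupBy(c).count().collect() }
}
Await.result(Future.sequence(futures), 10.minutes)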
Best Regards,
Shixiong Zhu
2015-10-01 4:57 GMT+08:00 :
> Hi all,
>
> I have a process where I do some calculations on each one of the columns
> of a dataframe.
> Intrinsecally, I run acr
I mean JavaSparkContext.setLogLevel. You can use it like this:
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
Durations.seconds(2));
jssc.sparkContext().setLogLevel(...);
Best Regards,
Shixiong Zhu
2015-09-29 22:07 GMT+08:00 Ashish Soni :
> I am using Java Streaming cont
"count" Spark jobs will run in parallel.
Moreover, "spark.streaming.concurrentJobs" is an internal configuration and
it may be changed in the future.
Best Regards,
Shixiong Zhu
2015-09-26 3:34 GMT+08:00 Atul Kulkarni :
> Can someone please help either by explaining or pointing
have enough space.
Best Regards,
Shixiong Zhu
2015-09-29 1:04 GMT+08:00 swetha :
>
> Hi,
>
> I see a lot of data getting filled locally as shown below from my streaming
> job. I have my checkpoint set to hdfs. But, I still see the following data
> filling my local nodes. Any idea if
Which version are you using? Could you take a look at the new Streaming UI
in 1.4.0?
Best Regards,
Shixiong Zhu
2015-09-29 7:52 GMT+08:00 Siva :
> Hi,
>
> Could someone recommend the monitoring tools for spark streaming?
>
> By extending StreamingListener we can dump the delay i
You can use JavaSparkContext.setLogLevel to set the log level in your code.
Best Regards,
Shixiong Zhu
2015-09-28 22:55 GMT+08:00 Ashish Soni :
> I am not running it using spark submit , i am running locally inside
> Eclipse IDE , how i set this using JAVA Code
>
> Ashish
>
>
Looks like you return "Some(null)" from "compute". If you don't want to
create an RDD for a batch, it should return None. If you want to return an
empty RDD, it should return "Some(sc.emptyRDD)".
Best Regards,
Shixiong Zhu
2015-09-15 2:51 GMT+08:00 Juan Rodríguez Hortalá
compute: this will run in the executor and the location is not
guaranteed. E.g.,
DStream.foreachRDD(rdd => rdd.foreach { v =>
println(v)
})
"println(v)" is called in the executor.
Best Regards,
Shixiong Zhu
2015-09-17 3:47 GMT+08:00 Renyi Xiong :
> Hi,
>
> I want to d
Looks like you have an incompatible hbase-default.xml somewhere on the
classpath. You can use the following code to find the location of "hbase-default.xml":
println(Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml"))
Best Regards,
Shixiong Zhu
2015-09-21
You can change "spark.sql.broadcastTimeout" to increase the timeout. The
default value is 300 seconds.
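For example (assuming sqlContext is an existing SQLContext; the value is
just an example):

// Raise the broadcast-join timeout from the default 300 seconds to 20 minutes.
sqlContext.setConf("spark.sql.broadcastTimeout", "1200")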
Best Regards,
Shixiong Zhu
2015-09-24 15:16 GMT+08:00 Eyad Sibai :
> I am trying to join two tables using dataframes using python 3.4 and I am
> getting the following error
>
BTW, did you already update your standalone cluster to use the same Spark
package?
Best Regards,
Shixiong Zhu
2015-09-07 0:02 GMT+08:00 Shixiong Zhu :
> Looks there are some circular references in SQL making the immutable List
> serialization fail in 2.11.
>
> In 2.11, Scala immutab
tream(i)
i1.readObject()
Could you provide the "explain" output? It would be helpful to find the
circular references.
Best Regards,
Shixiong Zhu
2015-09-05 0:26 GMT+08:00 Jeff Jones :
> We are using Scala 2.11 for a driver program that is running Spark SQL
> queries in a sta
The folder is in "/tmp" by default. Could you use "df -h" to check the free
space of /tmp?
Best Regards,
Shixiong Zhu
2015-09-05 9:50 GMT+08:00 shenyan zhen :
> Has anyone seen this error? Not sure which dir the program was trying to
> write to.
>
> I am runni
That's two jobs. `SparkPlan.executeTake` will call `runJob` twice in this
case.
Best Regards,
Shixiong Zhu
2015-08-25 14:01 GMT+08:00 Cheng, Hao :
> O, Sorry, I miss reading your reply!
>
>
>
> I know the minimum tasks will be 2 for scanning, but Jeff is talking about
>
Hao,
I can reproduce it using the master branch. I'm curious why you cannot
reproduce it. Did you check if the input HadoopRDD did have two partitions?
My test code is
val df = sqlContext.read.json("examples/src/main/resources/people.json")
df.show()
Best Regards,
Shixiong Zhu
main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L185
Best Regards,
Shixiong Zhu
2015-08-25 8:11 GMT+08:00 Jeff Zhang :
> Hi Cheng,
>
> I know that sqlContext.read will trigger one spark job to infer the
> schema. What I mean is DataFrame#show cost 2 spark jobs. So overall
file. Could you convert your
data to String using "map" and use "saveAsTextFile" or other "save" methods?
Best Regards,
Shixiong Zhu
2015-08-14 11:02 GMT+08:00 kale <805654...@qq.com>:
>
>
>
>
Oh, I see. That's the total time of executing a query in Spark. Then the
difference is reasonable, considering Spark has much more work to do, e.g.,
launching tasks in executors.
Best Regards,
Shixiong Zhu
2015-07-26 16:16 GMT+08:00 Louis Hust :
> Look at the given url:
>
> Code c
Could you clarify how you measure the Spark time cost? Is it the total time
of running the query? If so, it's possible because the overhead of
Spark dominates for small queries.
Best Regards,
Shixiong Zhu
2015-07-26 15:56 GMT+08:00 Jerrick Hoang :
> how big is the dataset? how compli
directly. E.g.,
val r1 = context.wholeTextFiles(...)
val r2 = r1.flatMap(s => ...)
r2.persist(StorageLevel.MEMORY_ONLY)
val r3 = r2.filter(...)...
r3.saveAsTextFile(...)
val r4 = r2.map(...)...
r4.saveAsTextFile(...)
See
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
Best R
MemoryStore.ensureFreeSpace for details.
Best Regards,
Shixiong Zhu
2015-07-09 19:17 GMT+08:00 Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com>:
> Hi ,
>
> Just would like to clarify few doubts I have how BlockManager behaves .
> This is mostly in regards to Spark Streaming C
DStream must be Serializable because of metadata checkpointing. But you can
use KryoSerializer for data checkpointing: data checkpointing goes through
RDD.checkpoint, whose serializer is controlled by spark.serializer.
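A minimal sketch of switching the serializer (the app name is hypothetical):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("StreamingApp")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Metadata checkpointing of the DStream graph still uses Java serialization,
// but data (RDD) checkpointing will now go through Kryo.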
Best Regards,
Shixiong Zhu
2015-07-08 3:43 GMT+08:00 Chen Song :
> In Spark Streaming, wh
Before running your script, could you confirm that "
/data/software/spark-1.3.1-bin-2.4.0/applications/pss.am.core-1.0-SNAPSHOT-shaded.jar"
exists? You might have forgotten to build this jar.
Best Regards,
Shixiong Zhu
2015-07-06 18:14 GMT+08:00 bit1...@163.com :
> Hi,
> I have follow
You can set "spark.ui.enabled" to "false" to disable the Web UI.
Best Regards,
Shixiong Zhu
2015-07-06 17:05 GMT+08:00 :
> Hello there,
>
>I heard that there is some way to shutdown Spark WEB UI, is there a
> configuration to s
communication
between driver and executors? Because this is ongoing work, there is no
blog post yet. But you can find more details in this umbrella JIRA:
https://issues.apache.org/jira/browse/SPARK-5293
Best Regards,
Shixiong Zhu
2015-06-10 20:33 GMT+08:00 huangzheng <1106944...@qq.com>:
>
You should not call `jssc.stop(true);` inside a StreamingListener. It will
cause a deadlock: `jssc.stop` won't return until `listenerBus` exits, but
since your `StreamingListener` callback is blocked inside `jssc.stop`,
`listenerBus` can never exit.
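A minimal sketch of the pattern, assuming jssc is your JavaStreamingContext
and using a CountDownLatch that the listener only counts down:

import java.util.concurrent.CountDownLatch

val stopSignal = new CountDownLatch(1)   // the listener callback only calls stopSignal.countDown()
new Thread(new Runnable {
  override def run(): Unit = {
    stopSignal.await()
    jssc.stop(true)                      // safe here: this is not the listener bus thread
  }
}).start()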
Best Regards,
Shixiong Zhu
2015-06-04 0:39 GMT+08:00 dgoldenberg :
Could you set "spark.shuffle.io.preferDirectBufs" to false to turn off the
off-heap allocation of netty?
Best Regards,
Shixiong Zhu
2015-06-03 11:58 GMT+08:00 Ji ZHANG :
> Hi,
>
> Thanks for you information. I'll give spark1.4 a try when it's released.
>
>
xception in thread "Spark Context Cleaner"
java.lang.NoClassDefFoundError: 0
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
Best Regards,
Shixiong Zhu
2015-06-03 0:08 GMT+08:00 Ryan Will
How about other jobs? Is it an executor log, or a driver log? Could you
post other logs near this error, please? Thank you.
Best Regards,
Shixiong Zhu
2015-06-02 17:11 GMT+08:00 Anders Arpteg :
> Just compiled Spark 1.4.0-rc3 for Yarn 2.2 and tried running a job that
> worked fine for Spa
completed". So I
agree that we can add `ssc.stop` as the shutdown hook. But stopGracefully
should be false.
Best Regards,
Shixiong Zhu
2015-05-20 21:59 GMT-07:00 Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com>:
> Thanks Tathagata for making this change..
>
> Diby
Could you provide the full driver log? It looks like a bug. Thank you!
Best Regards,
Shixiong Zhu
2015-05-13 14:02 GMT-07:00 Giovanni Paolo Gibilisco :
> Hi,
> I'm trying to run an application that uses a Hive context to perform some
> queries over JSON files.
> The code of t
SPARK-5522 is really cool. Didn't notice it.
Best Regards,
Shixiong Zhu
2015-05-07 11:36 GMT-07:00 Marcelo Vanzin :
> That shouldn't be true in 1.3 (see SPARK-5522).
>
> On Thu, May 7, 2015 at 11:33 AM, Shixiong Zhu wrote:
>
>> The history server may need several
The history server may need several hours to start if you have a lot of
event logs. Is it stuck, or still replaying logs?
Best Regards,
Shixiong Zhu
2015-05-07 11:03 GMT-07:00 Marcelo Vanzin :
> Can you get a jstack for the process? Maybe it's stuck somewhere.
>
> On Thu, May 7,
You are using Scala 2.11 with 2.10 libraries. You can change
"org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
to
"org.apache.spark" %% "spark-streaming" % "1.3.1"
And sbt will use the corresponding libraries according to your S
"spark.history.fs.logDirectory" is for the history server. For Spark
applications, they should use "spark.eventLog.dir". Since you commented out
"spark.eventLog.dir", it will be "/tmp/spark-events". And this folder does
not exits.
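A sketch of where each setting goes (the directory is hypothetical and must
already exist):

# In the application's spark-defaults.conf (or passed via --conf):
spark.eventLog.enabled        true
spark.eventLog.dir            hdfs:///shared/spark-logs

# In the history server's configuration:
spark.history.fs.logDirectory hdfs:///shared/spark-logs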
Best Regards,
Shixiong Zh
http://spark.apache.org/docs/latest/running-on-yarn.html
Best Regards,
Shixiong Zhu
2015-04-30 1:00 GMT-07:00 xiaohe lan :
> Hi Madhvi,
>
> If I only install spark on one node, and use spark-submit to run an
> application, which are the Worker nodes? Any where are the executors ?
>
> Than
The configuration key should be "spark.akka.askTimeout" for this timeout.
The time unit is seconds.
Best Regards,
Shixiong(Ryan) Zhu
2015-04-26 15:15 GMT-07:00 Deepak Gopalakrishnan :
> Hello,
>
>
> Just to add a bit more context :
>
> I have done that in the code, but I cannot see it change fro
skEnd(e: SparkListenerTaskEnd) = println(">>>>
> onTaskEnd");
> });
>
> sc.parallelize(List(1, 2, 3)).map(i => throw new
> SparkException("test")).collect();
> }
> }
>
> I'm running it from Eclipse on local[*].
>
>
The problem is the code you use to test:
sc.parallelize(List(1, 2, 3)).map(throw new
SparkException("test")).collect();
is like the following example:
def foo: Int => Nothing = {
throw new SparkException("test")
}
sc.parallelize(List(1, 2, 3)).map(foo).collect();
So actually the Spark jobs do
I just checked the code that creates OutputCommitCoordinator. Could you
reproduce this issue? If so, could you provide details about how to
reproduce it?
Best Regards,
Shixiong(Ryan) Zhu
2015-04-16 13:27 GMT+08:00 Canoe :
> 13119 Exception in thread "main" akka.actor.ActorNotFound: Actor not
Forgot this one: I cannot find any issue with creating
OutputCommitCoordinator. The order of creating OutputCommitCoordinator looks
right.
Best Regards,
Shixiong(Ryan) Zhu
2015-04-17 16:57 GMT+08:00 Shixiong Zhu :
> I just checked the codes about creating OutputCommitCoordinator. Could
0 ms strings gets printed on console.
> No output is getting printed.
> And timeinterval between two strings of form ( time:ms)is very less
> than Streaming Duration set in program.
>
> On Wed, Apr 15, 2015 at 5:11 AM, Shixiong Zhu wrote:
>
>> Could you s
Could you see something like this in the console?
---
Time: 142905487 ms
---
Best Regards,
Shixiong(Ryan) Zhu
2015-04-15 2:11 GMT+08:00 Shushant Arora :
> Hi
>
> I am running a spark streaming application but o
Thanks for the log. It's really helpful. I created a JIRA to explain why it
will happen: https://issues.apache.org/jira/browse/SPARK-6640
However, does this error always happen in your environment?
Best Regards,
Shixiong Zhu
2015-03-31 22:36 GMT+08:00 sparkdi :
> This is the whole out
Could you paste the whole stack trace here?
Best Regards,
Shixiong Zhu
2015-03-31 2:26 GMT+08:00 sparkdi :
> I have the same problem, i.e. exception with the same call stack when I
> start
> either pyspark or spark-shell. I use spark-1.3.0-bin-hadoop2.4 on ubuntu
> 14.10.
>
LGTM. Could you open a JIRA and send a PR? Thanks.
Best Regards,
Shixiong Zhu
2015-03-28 7:14 GMT+08:00 Manoj Samel :
> I looked @ the 1.3.0 code and figured where this can be added
>
> In org.apache.spark.deploy.yarn ApplicationMaster.scala:282 is
>
>
There is no configuration for it now.
Best Regards,
Shixiong Zhu
2015-03-26 7:13 GMT+08:00 Manoj Samel :
> There may be firewall rules limiting the ports between host running spark
> and the hadoop cluster. In that case, not all ports are allowed.
>
> Can it be a range of ports
It's a random port to avoid port conflicts, since multiple AMs can run on
the same machine. Why do you need a fixed port?
Best Regards,
Shixiong Zhu
2015-03-26 6:49 GMT+08:00 Manoj Samel :
> Spark 1.3, Hadoop 2.5, Kerbeors
>
> When running spark-shell in yarn client mode, it s
ost of our cases are the second one, we set
"spark.scheduler.executorTaskBlacklistTime" to 3 to solve such "No
space left on device" errors. So if a task runs unsuccessfully in some
executor, it won't be scheduled to the same executor within 30 seconds.
Best Regards,
Shixi
.
Best Regards,
Shixiong Zhu
2015-03-13 9:37 GMT+08:00 Soila Pertet Kavulya :
> Does Spark support skewed joins similar to Pig which distributes large
> keys over multiple partitions? I tried using the RangePartitioner but
> I am still experiencing failures because some keys are too large
RDD is not thread-safe. You should not use it in multiple threads.
Best Regards,
Shixiong Zhu
2015-02-27 23:14 GMT+08:00 rok :
> I'm seeing this java.util.NoSuchElementException: key not found: exception
> pop up sometimes when I run operations on an RDD from multiple threads in
RDD.foreach runs in the executors. You should use `collect` to fetch the data
to the driver. E.g.,
myRdd.collect().foreach { node =>
  mp(node) = 1
}
Best Regards,
Shixiong Zhu
2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan :
> Thanks, but it still doesn't s
Could you clarify why you need a 10G akka frame size?
Best Regards,
Shixiong Zhu
2015-02-05 9:20 GMT+08:00 Shixiong Zhu :
> The unit of "spark.akka.frameSize" is MB. The max value is 2047.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-02-05 1:16 GMT+08:00 sahanbull :
>
The unit of "spark.akka.frameSize" is MB. The max value is 2047.
Best Regards,
Shixiong Zhu
2015-02-05 1:16 GMT+08:00 sahanbull :
> I am trying to run a spark application with
>
> -Dspark.executor.memory=30g -Dspark.kryoserializer.buffer.max.mb=2000
> -Dspark.akka.frame
It's a bug that is fixed by https://github.com/apache/spark/pull/4258,
which has not been merged yet.
Best Regards,
Shixiong Zhu
2015-02-02 10:08 GMT+08:00 Arun Lists :
> Here is the relevant snippet of code in my main program:
>
> ===
>
It's because you submitted the job from Windows to a Hadoop cluster running
on Linux. Spark does not support that yet. See
https://issues.apache.org/jira/browse/SPARK-1825
Best Regards,
Shixiong Zhu
2015-01-28 17:35 GMT+08:00 Marco :
> I've created a spark app, which runs fine
The official distribution has the same issue. I opened a ticket:
https://issues.apache.org/jira/browse/SPARK-5172
Best Regards,
Shixiong Zhu
2015-01-08 15:51 GMT+08:00 Shixiong Zhu :
> I have not used CDH5.3.0. But looks
> spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar contain
`--jars` accepts a comma-separated list of jars. See the usage text for
`--jars`:
--jars JARS Comma-separated list of local jars to include on the driver and
executor classpaths.
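For example (class name, jar names, and paths are hypothetical):

spark-submit --class com.example.MyApp \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  myapp.jar

Note there must be no spaces around the commas.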
Best Regards,
Shixiong Zhu
2015-01-08 19:23 GMT+08:00 Guillermo Ortiz :
> I'm trying to execute Spark from
"spark-examples" jar. For
me, I will addd "-Dhbase.profile=hadoop2" to the build instruction so that
the "examples" project will use a haoop2-compatible hbase.
Best Regards,
Shixiong Zhu
2015-01-08 0:30 GMT+08:00 Antony Mayi :
> thanks, I found the issue, I was incl
call `map(_.toList)` to convert `CompactBuffer` to `List`
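For example, a small sketch (assuming sc is an existing SparkContext); here
_.toList is applied to each grouped value via mapValues:

val grouped = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3))).groupByKey()
val withLists = grouped.mapValues(_.toList)   // CompactBuffer -> List
withLists.collect().foreach(println)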
Best Regards,
Shixiong Zhu
2015-01-04 12:08 GMT+08:00 Sanjay Subramanian <
sanjaysubraman...@yahoo.com.invalid>:
> hi
> Take a look at the code here I wrote
>
> https://raw.githubusercontent.com/sanjaysubramanian/msf
The Iterable from cogroup is a CompactBuffer, which is already materialized;
it's not a lazy Iterable. So for now Spark cannot handle skewed data where
some key has too many values to fit into memory.
I encountered the following issue when enabling dynamicAllocation. You may
want to take a look at it.
https://issues.apache.org/jira/browse/SPARK-4951
Best Regards,
Shixiong Zhu
2014-12-28 2:07 GMT+08:00 Tsuyoshi OZAWA :
> Hi Anders,
>
> I faced the same issue as you mentioned. Yes,
Congrats!
A little question about this release: which commit is this release based
on? v1.2.0 and v1.2.0-rc2 point to different commits in
https://github.com/apache/spark/releases
Best Regards,
Shixiong Zhu
2014-12-19 16:52 GMT+08:00 Patrick Wendell :
>
> I'm happy to a
@Rui do you mean the spark-core jar in the Maven central repo
is incompatible with the same version of the official pre-built Spark
binary? That's really weird. I thought they were built from the same code.
Best Regards,
Shixiong Zhu
2014-12-18 17:22 GMT+08:00 Sean Owen :
>
>
Could you post the stack trace?
Best Regards,
Shixiong Zhu
2014-12-16 23:21 GMT+08:00 richiesgr :
>
> Hi
>
> This time I need expert.
> On 1.1.1 and only in cluster (standalone or EC2)
> when I use this code :
>
> countersPublishers.foreachRDD(rdd => {
Just pointing out a bug in your code: you should not use `mapPartitions` like
that. For details, I recommend the section "setup() and cleanup()" in Sean
Owen's post:
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
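For illustration only, a sketch of the setup()/cleanup() pattern with
mapPartitions (rdd, createConnection, and transform are hypothetical):

rdd.mapPartitions { iter =>
  val conn = createConnection()                        // setup: once per partition, on the executor
  val results = iter.map(record => conn.transform(record)).toList  // materialize before closing
  conn.close()                                         // cleanup
  results.iterator
}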
Best Regards,
Shixiong Zhu
2014-
Good catch. `Join` should use `Iterator`, too. I opened a JIRA here:
https://issues.apache.org/jira/browse/SPARK-4824
Best Regards,
Shixiong Zhu
2014-12-10 21:35 GMT+08:00 Johannes Simon :
> Hi!
>
> Using an iterator solved the problem! I've been chewing on this for days,
> s
Best Regards,
Shixiong Zhu
2014-12-10 20:13 GMT+08:00 Johannes Simon :
> Hi!
>
> I have been using spark a lot recently and it's been running really well
> and fast, but now when I increase the data size, it's starting to run into
> problems:
> I have an RDD in the
send it back to the
client.
spark-submit will return 1 when Yarn reports the ApplicationMaster failed.
Best Regards,
Shixiong Zhu
2014-12-06 1:59 GMT+08:00 LinQili :
> You mean the localhost:4040 or the application master web ui?
>
> Sent from my iPhone
>
> On Dec 5, 2014, at 17:2
What's the status of this application in the yarn web UI?
Best Regards,
Shixiong Zhu
2014-12-05 17:22 GMT+08:00 LinQili :
> I tried anather test code:
> def main(args: Array[String]) {
> if (args.length != 1) {
> Util.printLog("ERROR", "Args error - a
Don't set `spark.akka.frameSize` to 1. The max value of
`spark.akka.frameSize` is 2047. The unit is MB.
Best Regards,
Shixiong Zhu
2014-12-01 0:51 GMT+08:00 Yanbo :
>
> Try to use spark-shell --conf spark.akka.frameSize=1
>
> 在 2014年12月1日,上午12:25,Brian Dolan 写道
Created a JIRA to track it: https://issues.apache.org/jira/browse/SPARK-4664
Best Regards,
Shixiong Zhu
2014-12-01 13:22 GMT+08:00 Shixiong Zhu :
> Sorry. Should be not greater than 2048. 2047 is the greatest value.
>
> Best Regards,
> Shixiong Zhu
>
> 2014-12-01 13:20 GMT+