Hi,
Recently upgraded from 2.1.1 to 2.2.0, and my Streaming job seems to have
broken. The submitted application is unable to connect to the cluster, even
though everything is up and running.
Below is my stack trace.
Spark Master: spark://192.168.10.207:7077
Job Arguments:
-appName orange_watch -directory /u01/watch/stream/
Hi all,
I am running Spark 2.0 on Mesos 1.1 and am trying to split my job across
several nodes. I set the number of executors using the formula
(spark.cores.max / spark.executor.cores). The behavior I saw was that Spark
will try to fill up one Mesos node with as many executors as it can, then i
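For illustration, a minimal sketch of the configuration being described; the core and memory numbers and the Mesos master URL below are invented, and only spark.cores.max / spark.executor.cores come from the message:

import org.apache.spark.sql.SparkSession

// Hypothetical sizing: 24 total cores with 4 cores per executor allows at most
// 6 executors. With the coarse-grained Mesos scheduler, several of them can
// still land on the same node if that node offers enough resources.
val spark = SparkSession.builder()
  .appName("mesos-executor-spread")                 // hypothetical app name
  .master("mesos://zk://mesos-master:2181/mesos")   // hypothetical master URL
  .config("spark.cores.max", "24")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "4g")
  .getOrCreate()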
Dear all,
I have a HiveThriftServer2 server running, and most of our Spark SQL queries
go there for execution. From the YARN GUI, I can see the application ID and
the attempt ID of the Thrift server. But with the REST API described on the
page (https://spark.apache.org/docs/latest/monitoring.html#re
I've written a SparkListener to record metrics for validation (it's a bit out
of date). Are you just looking to have graphing/alerting set up on the
Spark metrics?
On Tue, Dec 5, 2017 at 1:53 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:
> You can also get the metrics from the Spark app
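A minimal sketch of the kind of metrics-recording SparkListener mentioned above; which metrics are logged and where they go (stdout here) are assumptions for illustration:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs a couple of task-level metrics as each task completes. A real listener
// would forward these to whatever graphing/alerting backend is in use.
class TaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"runTimeMs=${m.executorRunTime} recordsRead=${m.inputMetrics.recordsRead}")
    }
  }
}

// Register right after the SparkContext is created:
// sc.addSparkListener(new TaskMetricsListener)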
@Richard I don't see any error in the executor log but let me run again to
make sure.
@Gerard Thanks much! But would your answer about .collect() change depending
on running the spark app in client vs cluster mode?
Thanks!
On Tue, Dec 5, 2017 at 1:54 PM, Gerard Maas wrote:
> The general answer t
This GitBook explains Spark components in detail:
'Mastering Apache Spark 2'
https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details
2017-12-04 12:48 GMT+09:00 Manuel Sopena Ballesteros <
manuel...@garvan.org.au>:
> Dear Spark community,
>
>
>
> Is there any resource (book
The general answer to your initial question is that "it depends". If the
operation in the rdd.foreach() closure can be parallelized, then you don't
need to collect first. If it needs some local context (e.g. a socket
connection), then you need to do rdd.collect first to bring the data
locally, whic
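To make the two options concrete, a short sketch in the spirit of the answer above; `dstream` is assumed to be the stream from the original question, and the socket endpoint and the use of foreachPartition are illustrative assumptions:

import java.io.PrintWriter
import java.net.Socket

dstream.foreachRDD { rdd =>
  // Two alternatives inside foreachRDD (pick one):

  // Option 1: the work parallelizes -- run it on the executors, no collect().
  // Each partition opens (and closes) its own connection.
  rdd.foreachPartition { records =>
    val socket = new Socket("sink-host", 9999)   // hypothetical endpoint
    val out = new PrintWriter(socket.getOutputStream, true)
    records.foreach(r => out.println(r.toString))
    out.close()
    socket.close()
  }

  // Option 2: the side effect needs driver-local context (one shared
  // connection, a local file, ...) -- bring the data to the driver first.
  val local = rdd.collect()
  local.foreach(r => println(r))                 // placeholder for the local action
}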
You can also get the metrics from the Spark application events log file.
See https://www.slideshare.net/JayeshThakrar/apache-bigdata2017sparkprofiling
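For completeness, the application events log referred to above has to be enabled; a minimal sketch, where the application name and log directory are only examples:

import org.apache.spark.sql.SparkSession

// Write the application event log so metrics can be mined from it afterwards.
// The directory is an example and must be writable by the application.
val spark = SparkSession.builder()
  .appName("event-log-example")                        // example app name
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///spark-logs")  // example location
  .getOrCreate()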
From: "Qiao, Richard"
Date: Monday, December 4, 2017 at 6:09 PM
To: Nick Dimiduk , "user@spark.apache.org"
Subject: Re: Access to Applications
In the 2nd case, is there any producer error thrown in the executor's log?
Best Regards
Richard
From: kant kodali
Date: Tuesday, December 5, 2017 at 4:38 PM
To: "Qiao, Richard"
Cc: "user @spark"
Subject: Re: Do I need to do .collect inside forEachRDD
It reads from Kafka and outputs to Kafka, so I
On Tue, Dec 5, 2017 at 12:43 PM, bsikander wrote:
> 2) If I use context.addSparkListener, I can customize the listener but then
> I miss the onApplicationStart event. Also, I don't know Spark's logic for
> changing the application state from WAITING -> RUNNING.
I'm not sure I follow you her
It reads from Kafka and outputs to Kafka, so I check the output from Kafka.
On Tue, Dec 5, 2017 at 1:26 PM, Qiao, Richard
wrote:
> Where do you check the output result for both cases?
>
> Sent from my iPhone
>
> > On Dec 5, 2017, at 15:36, kant kodali wrote:
> >
> > Hi All,
> >
> > I have a simple
Where do you check the output result for both cases?
Sent from my iPhone
> On Dec 5, 2017, at 15:36, kant kodali wrote:
>
> Hi All,
>
> I have a simple stateless transformation using DStreams (stuck with the old
> API for one of the applications). The pseudo code is roughly like this
>
> dstream
Thank you for the reply.
I am not a Spark expert but I was reading through the code and I thought
that the state was changed from SUBMITTED to RUNNING only after executors
(CoarseGrainedExecutorBackend) were registered.
https://github.com/apache/spark/commit/015f7ef503d5544f79512b626749a1f0c48
SparkLauncher operates at a different layer than Spark applications.
It doesn't know about executors or the driver or anything, just whether
the Spark application was started or not. So it doesn't work for your
case.
The best option is to install a SparkListener and
monitor events. But t
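A minimal sketch of such a lifecycle listener, purely to illustrate the callbacks involved; the println output is an assumption:

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd,
  SparkListenerApplicationStart, SparkListenerExecutorAdded}

// Tracks coarse application lifecycle events. Deciding that "enough executors
// have registered" means RUNNING is application-level logic, not Spark's.
class LifecycleListener extends SparkListener {
  override def onApplicationStart(start: SparkListenerApplicationStart): Unit =
    println(s"application started: ${start.appName} at ${start.time}")

  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    println(s"executor registered: ${added.executorId} on ${added.executorInfo.executorHost}")

  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
    println(s"application ended at ${end.time}")
}

// Register via sc.addSparkListener(new LifecycleListener), or through the
// spark.extraListeners property so onApplicationStart is not missed.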
Hi All,
I have a simple stateless transformation using DStreams (stuck with the old
API for one of the applications). The pseudo code is roughly like this:
dstream.map(...).reduce(...).foreachRDD(rdd -> {
    rdd.collect().forEach(...); // Is this necessary? It executes fine, but is a bit slow
});
I understand
Hi, All.
Today, Apache Spark has started using Apache ORC 1.4 as a `native` ORC
implementation.
SPARK-20728 Make OrcFileFormat configurable between `sql/hive` and
`sql/core`.
-
https://github.com/apache/spark/commit/326f1d6728a7734c228d8bfaa69442a1c7b92e9b
Thank you so much for all your support for
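As an illustration of the switch that SPARK-20728 introduces, a minimal sketch, assuming an active SparkSession `spark` (e.g. in spark-shell) and an example output path:

// "native" selects the new sql/core reader backed by Apache ORC 1.4;
// "hive" keeps the previous sql/hive implementation.
spark.conf.set("spark.sql.orc.impl", "native")

spark.range(10).write.mode("overwrite").orc("/tmp/orc_native_test")  // example path
spark.read.orc("/tmp/orc_native_test").show()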
Try with `SparkSession.builder().enableHiveSupport` ?
On Tue, Dec 5, 2017 at 3:22 PM, 163 wrote:
> Hi,
> How can I persist a database/table created in a Spark application?
>
> object TestPersistentDB {
> def main(args:Array[String]): Unit = {
> val spark = SparkSession
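Following the enableHiveSupport suggestion above, a minimal sketch; the application name, warehouse path, and table names are examples only:

import org.apache.spark.sql.SparkSession

// With Hive support enabled, databases and tables created with CREATE DATABASE /
// saveAsTable are recorded in the Hive metastore and survive the application.
val spark = SparkSession.builder()
  .appName("persistent-tables")                              // example name
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // example path
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS testdb")
spark.range(100).write.mode("overwrite").saveAsTable("testdb.numbers")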
Hi,
I believe Spark writes datetime fields as INT96. What are the implications
of https://issues.apache.org/jira/browse/SPARK-10364 (Support Parquet
logical type TIMESTAMP_MILLIS), which is part of 2.2.0?
I am having issues reading Spark-generated Parquet dates using Apache
Drill (Drill suppo
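A minimal sketch of the round trip in question; the path is an example, and the configuration key shown is my reading of what SPARK-10364 added, so it should be checked against the 2.2.0 documentation before relying on it:

import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-timestamps").getOrCreate()
import spark.implicits._

// Assumed flag from SPARK-10364: when enabled, timestamps are written as
// INT64 TIMESTAMP_MILLIS instead of the legacy INT96 encoding. Verify the
// exact key and default in the 2.2.0 configuration before depending on it.
spark.conf.set("spark.sql.parquet.int64AsTimestampMillis", "true")

Seq(Timestamp.valueOf("2017-12-05 13:53:00")).toDF("event_time")
  .write.mode("overwrite").parquet("/tmp/ts_millis_test")   // example path

spark.read.parquet("/tmp/ts_millis_test").printSchema()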
When you pick a book, make sure it covers the version of Spark you want to
deploy. There are a lot of books out there that focus heavily on Spark 1.x. Spark
2.x generalizes the DataFrame API, introduces Tungsten, etc. Not all of it may be
relevant to a pure "sys admin" learning path, but it is good to know