s by means of alternative software (e.g.
> grafana) - currently it's hardly possible to know the actual number of
> jobs, stages, tasks and their names and IDs in advance to register all the
> corresponding metrics statically.
>
>
> Kind Regards,
> Sergey
>
>
>
I remember there was a PR about doing a similar thing (
https://github.com/apache/spark/pull/18406). From my understanding, this
seems like a quite specific requirement; it may require code changes to
support your needs.
Thanks
Saisai
Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM:
> Hello Spark User
We are happy to announce the availability of Spark 2.3.2!
Apache Spark 2.3.2 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend that all 2.3.x users
upgrade to this stable release.
To download Spark 2.3.2, head over to the download page:
http://spar
In Spark on YARN, error code 13 means the SparkContext didn't initialize in
time. You can check the YARN application log for more information.
By the way, did you just write a plain Python script without creating a
SparkContext/SparkSession?
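For illustration, a minimal sketch (in Scala; the PySpark equivalent is analogous) of an application that creates the session up front, which is what lets YARN see the SparkContext initialize in time; the app name and workload here are placeholders:

import org.apache.spark.sql.SparkSession

object MinimalApp {
  def main(args: Array[String]): Unit = {
    // A plain script that never creates a SparkContext/SparkSession will make
    // the ApplicationMaster give up waiting and exit with code 13.
    val spark = SparkSession.builder()
      .appName("minimal-app")
      .getOrCreate()

    spark.range(10).count()   // placeholder work
    spark.stop()
  }
}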
Aakash Basu wrote on Fri, Jun 8, 2018 at 4:15 PM:
> Hi,
>
> I'm trying to run
atch (if it is delayed), which will
lead to unexpected results.
thomas lavocat wrote on Tue, Jun 5, 2018 at 7:48 PM:
>
> On 05/06/2018 13:44, Saisai Shao wrote:
>
> You need to read the code, this is an undocumented configuration.
>
> I'm on it right now, but Spark is a big piece of software.
>
> On 05/06/2018 11:24, Saisai Shao wrote:
>
> spark.streaming.concurrentJobs is a driver side internal configuration,
> this means that how many streaming jobs can be submitted concurrently in
> one batch. Usually this should not be configured by user, unless you're
> fami
spark.streaming.concurrentJobs is a driver-side internal configuration;
it controls how many streaming jobs can be submitted concurrently in
one batch. Usually this should not be configured by the user unless you're
familiar with Spark Streaming internals and know the implications of this
configuration.
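For illustration, a minimal sketch of where this internal setting would be applied; the app name, source, and batch interval are placeholders, not from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    // Undocumented/internal: allows up to 2 output jobs of one batch to be
    // submitted concurrently. Only change it if you understand the ordering
    // implications described above.
    val conf = new SparkConf()
      .setAppName("concurrent-jobs-sketch")
      .set("spark.streaming.concurrentJobs", "2")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.socketTextStream("localhost", 9999).print()   // placeholder source
    ssc.start()
    ssc.awaitTermination()
  }
}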
No, DStream is built on RDDs underneath, so it will not leverage any Spark SQL
related features. I think you should use Structured Streaming instead, which
is based on Spark SQL.
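For illustration, a minimal Structured Streaming sketch (the source, host, and port are placeholders); because the stream is a DataFrame/Dataset, it goes through the Spark SQL engine, unlike a DStream:

import org.apache.spark.sql.SparkSession

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("structured-streaming-sketch").getOrCreate()
    import spark.implicits._

    // Streaming word count over a socket source.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}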
Khaled Zaouk wrote on Wed, May 2, 2018 at 4:51 PM:
> Hi,
>
> I have a question regarding the execution engine of Spark Streaming
> (DStrea
Maybe you can try Livy (http://livy.incubator.apache.org/).
Thanks
Jerry
2018-04-11 15:46 GMT+08:00 杜斌 :
> Hi,
>
> Is there any way to submit some code segment to the existing SparkContext?
> Just like a web backend, send some user code to the Spark to run, but the
> initial SparkContext takes t
>
> In yarn mode, only two executors are assigned to process the tasks; since
> one executor can process only one task, they need 6 min in total.
>
This is not true. You should set --executor-cores/--num-executors to
increase the task parallelism per executor. To be fair, a Spark application
should ha
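For illustration, a sketch of the equivalent settings in code (the values are placeholders); 3 executors with 2 cores each can run up to 6 tasks at once:

import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    // Same effect as --num-executors 3 --executor-cores 2 on spark-submit.
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")
      .set("spark.executor.instances", "3")
      .set("spark.executor.cores", "2")

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100, 6).map(_ * 2).count())
    sc.stop()
  }
}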
er)
>
> // Filter out providers for which
> // spark.security.credentials.{service}.enabled is false.
> providers
>   .filter { p => isServiceEnabled(p.serviceName) }
>   .map { p => (p.serviceName, p) }
>   .toMap
> }
>
>
> If you could give me a tip, that would be great.
I think you can build your own Accumulo credential provider outside of Spark,
similar to HadoopDelegationTokenProvider. Spark already provides an
interface, "ServiceCredentialProvider", for users to plug in a customized
credential provider.
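For illustration, a rough sketch of such a plug-in. The trait lives under org.apache.spark.deploy.yarn.security in Spark 2.x, but the exact package and signature vary across versions, and the Accumulo token call is left as a placeholder; the provider is registered through a META-INF/services file named after the trait:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

class AccumuloCredentialProvider extends ServiceCredentialProvider {
  // Name used by spark.security.credentials.<serviceName>.enabled.
  override def serviceName: String = "accumulo"

  override def credentialsRequired(hadoopConf: Configuration): Boolean = true

  override def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // Placeholder: obtain an Accumulo delegation token via Accumulo's client
    // API and add it to `creds`, e.g. creds.addToken(alias, token).
    // Return the next renewal time in milliseconds, or None.
    None
  }
}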
Thanks
Jerry
2018-03-23 14:29 GMT+08:00 Jorge Machado :
> Hi G
AFAIK, there's no large-scale testing of Hadoop 3.0 in the community, so it
is not clear whether it is supported or not (or has some issues). I think
in the download page "Pre-Built for Apache Hadoop 2.7 and later" mostly
means that it supports Hadoop 2.7+ (2.8...), but not 3.0 (IIUC).
Thanks
Jerry
I guess you're using the Capacity Scheduler with DefaultResourceCalculator,
which doesn't count CPU cores in resource calculation, so the "1" you saw
is actually meaningless. If you want CPU to be counted as a resource as
well, you should choose DominantResourceCalculator.
Thanks
Jerry
On Sat, Sep 9, 2017 at
I think spark.yarn.am.port is not used any more, so you don't need to
consider it.
If you're running Spark on YARN, the YARN RM port used to submit
applications should also be reachable through the firewall, as well as the
HDFS ports used to upload resources. Also, on the Spark side, executors will be connect
You could set "spark.jars.packages" in `conf` field of session post API (
https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md#post-sessions).
This is equal to --package in spark-submit.
BTW you'd better ask livy question in u...@livy.incubator.apache.org.
Thanks
Jerry
On Thu, A
Can you please post the specific problem you met?
Thanks
Jerry
On Sat, Aug 19, 2017 at 1:49 AM, Anshuman Kumar
wrote:
> Hello,
>
> I have recently installed Spark 2.2.0, and am trying to use it for some big
> data processing. Spark is installed on a server that I access from a remote
> computer.
Please see the reason in this thread (
https://github.com/apache/spark/pull/14340). It would be better to use
Structured Streaming instead.
So I would like to -1 this patch. I think it's been a mistake to support
> dstream in Python -- yes it satisfies a checkbox and Spark could claim
> there's suppo
Spark running with the standalone cluster manager currently doesn't support
accessing secure Hadoop. Basically the problem is that standalone-mode
Spark doesn't have the facility to distribute delegation tokens.
Currently only Spark on YARN and local mode support secure Hadoop.
Thanks
Jerry
On F
Current Spark doesn't support impersonating different users at run time.
Spark's proxy user is application level, which means that when it is set
through --proxy-user, the whole application runs as that user.
On Thu, May 4, 2017 at 5:13 PM, matd wrote:
> Hi folks,
>
> I have a Spark a
AFAIK, the off-heap memory setting is not enabled automatically; there are
two configurations that control Tungsten off-heap memory usage:
1. spark.memory.offHeap.enabled
2. spark.memory.offHeap.size
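For illustration, a minimal sketch enabling both (the size value is a placeholder); enabling the flag without a positive size has no effect:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object OffHeapSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("offheap-sketch")
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "2g")   // must be > 0 when enabled

    val spark = SparkSession.builder().config(conf).getOrCreate()
    spark.range(1000000).selectExpr("sum(id)").show()
    spark.stop()
  }
}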
On Sat, Apr 22, 2017 at 7:44 PM, geoHeil wrote:
> Hi,
> I wonder when to enable sp
> is a bug or expected behavior?
>
> On 14.04.2017 13:22, Saisai Shao wrote:
>
> AFAIK, for the first one the custom filter should work, but for the
> latter it is not supported.
>
> On Fri, Apr 14, 2017 at 6:17 PM, Sergey Grigorev
> wrote:
>
>> GET requ
/master:6066/v1/submissions/status/driver-20170414025324-
> <http://master:6066/v1/submissions/status/driver-20170414025324-> *return
> successful result. But if I open the spark master web ui then it requests
> username and password.
>
>
> On 14.04.2017 12:46, Saisai Shao w
Hi,
What specifically are you referring to by "Spark API endpoint"?
Filters only work with the Spark live and history web UI.
On Fri, Apr 14, 2017 at 5:18 PM, Sergey wrote:
> Hello all,
>
> I've added my own spark.ui.filters to enable basic authentication for access
> to the Spark web UI. It works fi
urity.auth.login.LoginException: Unable to obtain
> password from user
>
>
> On Fri, Mar 31, 2017 at 9:08 AM, Saisai Shao
> wrote:
>
>> Hi Bill,
>>
>> The exception is from executor side. From the gist you provided, looks
>> like the issue is that you only c
Hi Bill,
The exception is from the executor side. From the gist you provided, it looks
like the issue is that you only configured the Java options on the driver
side; I think you should also configure them on the executor side. You could refer to here (
https://github.com/hortonworks-spark/skc#running-on-a-kerberos-
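For illustration, a sketch of setting the same JVM option on both sides (the JAAS file name is a placeholder; in the Kerberos case it would typically be shipped with --files):

import org.apache.spark.SparkConf

object ExecutorJavaOptionsSketch {
  def main(args: Array[String]): Unit = {
    // The same -D option must be present for the driver and the executors,
    // otherwise only the driver JVM can find the login configuration.
    val conf = new SparkConf()
      .setAppName("jaas-options-sketch")
      .set("spark.driver.extraJavaOptions",
        "-Djava.security.auth.login.config=./jaas.conf")
      .set("spark.executor.extraJavaOptions",
        "-Djava.security.auth.login.config=./jaas.conf")

    conf.getAll.foreach { case (k, v) => println(s"$k=$v") }
  }
}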
It's quite obvious your HDFS URL is not complete; please look at the
exception, your HDFS URI doesn't have a host or port. Normally it would be OK
if HDFS is your default FS.
I think the problem is that you're running on HDI, where the default FS is wasb,
so a short name without host:port will lead to
IIUC, your scenario is quite like what ReliableKafkaReceiver currently
does. You can only send an ack to the upstream source after the WAL is
persisted; otherwise, because data processing and data receiving are
asynchronous, there's still a chance data could be lost if you send the ack
out before the WAL write.
I don't think using ManualClock is the right way to fix your problem here in
Spark Streaming.
ManualClock in Spark is mainly used for unit tests; you advance the time
manually to make the unit tests work. That usage looks different from the
scenario you mentioned.
Thanks
Jerry
On Tue, Fe
I think it should be. These configurations don't depend on which specific
cluster manager the user chooses.
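For illustration, a sketch of the speculation settings (the values are placeholders); they are plain scheduler settings, so they work the same on standalone, YARN, and Mesos:

import org.apache.spark.{SparkConf, SparkContext}

object SpeculationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("speculation-sketch")
      .set("spark.speculation", "true")
      .set("spark.speculation.interval", "200ms")   // how often to check for stragglers
      .set("spark.speculation.multiplier", "1.5")   // how much slower than the median
      .set("spark.speculation.quantile", "0.75")    // fraction of tasks done before checking

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 1000, 100).map(_ + 1).count()
    sc.stop()
  }
}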
On Tue, Feb 28, 2017 at 4:42 AM, satishl wrote:
> Are spark.speculation and related settings supported on standalone mode?
>
>
>
> --
> View this message in context: http://apache-spark-user-list
ri
wrote:
> Thanks a lot the information!
>
> Is there any reason why EventLoggingListener ignore this event?
>
> *Thanks,*
>
>
> *Parag*
>
> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao
> wrote:
>
>> AFAIK, Spark's EventLoggingListener ignores Bl
AFAIK, Spark's EventLoggingListener ignores the BlockUpdate event, so it will
not be written to the event log; I think that's why you cannot get that info
in the history server.
On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari
wrote:
> Hi,
>
> I am running spark shell in spark version 2.0.2. Here is my p
IIUC Spark isn't strongly bound to HDFS; it uses a common FileSystem layer
which supports different FS implementations, and HDFS is just one option. You
could also use S3 as a backend FS; from Spark's point of view, the different
FS implementations are transparent.
On Sun, Feb 12, 2017 at 5:32 PM, ayan guh
Hi Mich,
1. Each user could create a Livy session (batch or interactive), one
session is backed by one Spark application, and the resource quota is the
same as normal spark application (configured by
spark.executor.cores/memory,. etc), and this will be passed to yarn if
running on Yarn. This is ba
From my understanding, this memory overhead should include
"spark.memory.offHeap.size", which means the off-heap memory size should not
be larger than the overhead memory size when running on YARN.
On Thu, Nov 24, 2016 at 3:01 AM, Koert Kuipers wrote:
> in YarnAllocator i see that memoryOverhead is
You might take a look at this project (https://github.com/vegas-viz/Vegas),
it has Spark integration.
Thanks
Saisai
On Mon, Nov 21, 2016 at 10:23 AM, wenli.o...@alibaba-inc.com <
wenli.o...@alibaba-inc.com> wrote:
> Hi anyone,
>
> is there any easy way for me to do data visualization in spark us
2016 at 8:06 AM Li Li wrote:
>
> which log file should I
>
> On Thu, Oct 20, 2016 at 10:02 PM, Saisai Shao
> wrote:
> > Looks like ApplicationMaster is killed by SIGTERM.
> >
> > 16/10/20 18:12:04 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
> > 16/10/20
Looks like the ApplicationMaster was killed by SIGTERM.
16/10/20 18:12:04 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
16/10/20 18:12:04 INFO yarn.ApplicationMaster: Final app status:
This container may have been killed by the YARN NodeManager or another
process; you'd better check the YARN logs to dig out more.
Not sure why your code searches for the Logging class under org/apache/spark;
it should be "org/apache/spark/internal/Logging", and it changed a long
time ago.
On Sun, Oct 16, 2016 at 3:25 AM, Brad Cox wrote:
> I'm experimenting with Spark 2.0.1 for the first time and hitting a
> problem right out o
I think security has nothing to do with which API you use, Spark SQL or the
RDD API.
I'm assuming you're running on a YARN cluster (that is the only cluster
manager that currently supports Kerberos).
First you need to get a Kerberos TGT in your local spark-submit process;
after being authenticated by Kerberos, S
You should specify the cluster manager (--master) and deploy mode
(--deploy-mode) in the spark-submit arguments; specifying these through
SparkConf is too late to switch to yarn cluster mode.
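If you are launching from code rather than a shell, the SparkLauncher API is the place to set these instead of SparkConf; a hedged sketch (paths and class names are placeholders):

import org.apache.spark.launcher.SparkLauncher

object LauncherSketch {
  def main(args: Array[String]): Unit = {
    // Master and deploy mode are fixed at launch time, not from inside the app.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setConf("spark.executor.memory", "2g")
      .startApplication()

    while (!handle.getState.isFinal) Thread.sleep(1000)
    println(s"final state: ${handle.getState}")
  }
}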
On Fri, Oct 7, 2016 at 5:20 PM, Aditya
wrote:
> Hi Saurav,
>
> Please share spark-submit command which you us
dalone?
>
> Why are there 2 ways to get information, REST API and this Sink?
>
>
> Best regards, Vladimir.
>
>
>
>
>
>
> On Mon, Sep 12, 2016 at 3:53 PM, Vladimir Tretyakov <
> vladimir.tretya...@sematext.com> wrote:
>
>> Hello Saisai Shao,
Here is the YARN RM REST API for your reference (
http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html).
You can use these APIs to query applications running on YARN.
On Sun, Sep 11, 2016 at 11:25 PM, Jacek Laskowski wrote:
> Hi Vladimir,
>
> You'd have to tal
loud.com
>
>
> *From:* Sun Rui
> *Date:* 2016-08-24 22:17
> *To:* Saisai Shao
> *CC:* tony@tendcloud.com; user
> *Subject:* Re: Can we redirect Spark shuffle spill data to HDFS or
> Alluxio?
> Yes, I also tried FUSE before, it is not stable and I don’t recommend it
ead of network I/O and replica
> of HDFS files.
>
> On Aug 24, 2016, at 21:02, Saisai Shao wrote:
>
> Spark Shuffle uses Java File related API to create local dirs and R/W
> data, so it can only be worked with OS supported FS. It doesn't leverage
> Hadoop FileSystem API, so wr
Spark shuffle uses the Java File API to create local dirs and read/write data,
so it only works with OS-supported file systems. It doesn't leverage the
Hadoop FileSystem API, so writing to a Hadoop-compatible FS does not work.
Also, it is not suitable to write temporary shuffle data into a distributed
FS; this
This looks like the Spark application has run into an abnormal state. From
the stack it means the driver could not send requests to the AM; can you
please check whether the AM is reachable and whether there are any other
exceptions besides this one.
From my past tests, Spark's dynamic allocation may run into some corner
The implementations of the Python API and Scala API for RDDs are slightly
different, so the difference in the RDD lineage you printed is expected.
On Tue, Aug 16, 2016 at 10:58 AM, DEEPAK SHARMA wrote:
> Hi All,
>
>
> Below is the small piece of code in scala and python REPL in Apache
> Spark.Howev
1. Standalone mode doesn't support accessing kerberized Hadoop, simply
because it lacks the mechanism to distribute delegation tokens via the
cluster manager.
2. For the HBase token fetching failure, I think you have to do kinit to
generate a TGT before starting the Spark application (
http://hbase.apache.org/0
I guess you're referring to the Spark assembly uber jar. In Spark 2.0,
there's no uber jar; instead there's a jars folder which contains all the jars
required at run time. For the end user it is transparent; the way to
submit a Spark application is still the same.
On Wed, Aug 3, 2016 at 4:51 PM, Mic
Using the dominant resource calculator instead of the default resource
calculator will give you the expected vcores. Basically, by default YARN does
not honor CPU cores as a resource, so you will always see vcores = 1 no
matter what number of cores you set in Spark.
On Wed, Aug 3, 2016 at 12:11 PM, sa
>
> java.lang.NoClassDefFoundError: spray/json/JsonReader
>
> at
> com.memsql.spark.pushdown.MemSQLPhysicalRDD$.fromAbstractQueryTree(MemSQLPhysicalRDD.scala:95)
>
> at
> com.memsql.spark.pushdown.MemSQLPushdownStrategy.apply(MemSQLPushdownStrategy.scala:49)
>
Looks
Several useful information can be found here (
https://issues.apache.org/jira/browse/YARN-1842), though personally I
haven't met this problem before.
Thanks
Saisai
On Tue, Jul 26, 2016 at 2:21 PM, Yu Wei wrote:
> Hi guys,
>
>
> When I tried to shut down spark application via "yarn application -
I think both 6066 and 7077 work. 6066 uses the REST way to
submit an application, while 7077 is the legacy way. From the user's
perspective, it should be transparent and there is no need to worry about the difference.
- *URL:* spark://hw12100.local:7077
- *REST URL:* spark://hw12100.local:6066 (clu
The error stack is thrown from your code:
Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
[Ljava.lang.String;)
at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
I think you should debug the
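For illustration, a minimal sketch of how a MatchError like this typically arises: the line at LogAggregator.scala:29 probably destructures an Array[String] with a pattern that doesn't cover the actual number of fields. The field values here are placeholders:

object MatchErrorSketch {
  def main(args: Array[String]): Unit = {
    val fields: Array[String] = "a\tb".split("\t")

    // This shape fails: the pattern only covers 3-element arrays, so a
    // 2-element array throws scala.MatchError at runtime.
    // val Array(x, y, z) = fields

    // Safer: match every shape you expect and handle the rest explicitly.
    fields match {
      case Array(x, y, z) => println(s"$x $y $z")
      case Array(x, y)    => println(s"$x $y")
      case other          => println(s"unexpected record: ${other.mkString(",")}")
    }
  }
}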
DStream.print() will collect some of the data to the driver and display it;
please see the implementation of DStream.print().
RDD.take() will also collect some of the data to the driver.
Normally the behavior should be consistent between cluster and local mode;
please find out the root cause of this problem, like
Configuring the local dirs to point to HDFS does not work. Local dirs are
mainly used for data spill and shuffle data persistence, so it is not suitable
to use HDFS. If you hit a capacity problem, you could configure multiple dirs
located on different mounted disks.
On Wed, Jul 6, 2016 at 9:05 AM, Sri wrote:
I think you cannot use the SQL client in cluster mode; the same is true for
spark-shell/pyspark, which have a REPL. All of these applications can only be
started with client deploy mode.
On Thu, Jun 30, 2016 at 12:46 PM, Mich Talebzadeh wrote:
> Hi,
>
> When you use spark-shell or for that matter spark-sql, you a
It means several jars are missing in the YARN container environment. If you
want to submit your application through some way other than
spark-submit, you have to take care of all the environment setup yourself.
Since we don't know the implementation of your Java web service, it is hard
to provide
spark.yarn.jar (none) The location of the Spark jar file, in case
overriding the default location is desired. By default, Spark on YARN will
use a Spark jar installed locally, but the Spark jar can also be in a
world-readable location on HDFS. This allows YARN to cache it on nodes so
that it doesn'
Hi Community,
In Spark 2.0.0 we upgraded to Jersey 2 (
https://issues.apache.org/jira/browse/SPARK-12154) instead of Jersey 1.9,
while Hadoop as a whole still sticks to the old version. This will
bring in some issues when the YARN timeline service is enabled (
https://issues.apache.org/jira/bro
It works fine in my local test; I'm using the latest master, so maybe this bug
is already fixed.
On Wed, Jun 1, 2016 at 7:29 AM, Michael Armbrust
wrote:
> Version of Spark? What is the exception?
>
> On Tue, May 31, 2016 at 4:17 PM, Tim Gautier
> wrote:
>
>> How should I go about mapping from say a Da
From my understanding, we should copy the file into another folder and move
it to the source folder after the copy is finished; otherwise we will read
half-copied data or hit the issue you mentioned above.
On Wed, May 18, 2016 at 8:32 PM, Ted Yu wrote:
> The following should handle the situation y
I think it is already fixed if your problem is exactly the same as what
mentioned in this JIRA (https://issues.apache.org/jira/browse/SPARK-14423).
Thanks
Jerry
On Wed, May 18, 2016 at 2:46 AM, satish saley
wrote:
> Hello,
> I am executing a simple code with yarn-cluster
>
> --master
> yarn-clu
> .mode(SaveMode.Overwrite)
From my understanding, mode is not supported in a continuous query.
def mode(saveMode: SaveMode): DataFrameWriter = {
  // mode() is used for non-continuous queries
  // outputMode() is used for continuous queries
  assertNotStreaming("mode() can only be called on non-co
It is not supported now; currently only the file stream source is supported.
Thanks
Jerry
On Wed, May 18, 2016 at 10:14 AM, Todd wrote:
> Hi,
> I am wondering whether structured streaming supports Kafka as data source.
> I brief the source code(meanly related with DataSourceRegister trait), and
> didn't fi
, May 10, 2016 at 4:17 PM, 朱旻 wrote:
>
>
> it was a product sold by huawei . name is FusionInsight. it says spark was
> 1.3 with hadoop 2.7.1
>
> where can i find the code or config file which define the files to be
> uploaded?
>
>
> At 2016-05-10 16:06:05, "S
What version of Spark are you using? From my understanding, there's
no code in yarn#client that uploads "__hadoop_conf__" into the distributed cache.
On Tue, May 10, 2016 at 3:51 PM, 朱旻 wrote:
> hi all:
> I found a problem using spark .
> WHEN I use spark-submit to launch a task. it works
>
y, it will be distributed evenly across the executors; this is also a
target for tuning. Normally it depends on several conditions like receiver
distribution and partition distribution.
>
> The issue raises if the amount of streaming data does not fit into these 4
> caches? Will the job crash?
>
06 PM, Ashok Kumar wrote:
> hi,
>
> so if i have 10gb of streaming data coming in does it require 10gb of
> memory in each node?
>
> also in that case why do we need using
>
> dstream.cache()
>
> thanks
>
>
> On Monday, 9 May 2016, 9:58, Saisai Shao wrote
>
>
>
> At 2016-05-09 15:14:47, "Saisai Shao" wrote:
>
> For window related operators, Spark Streaming will cache the data into
> memory within this window, in your case your window size is up to 24 hours,
> which means data has to be in Executor's memory f
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 9 May 2016 at 08:14, Saisai Shao wrote:
>
>> For window related operators, Spark Streaming will cache the data into
>> memory within this window,
For window-related operators, Spark Streaming will cache the data in
memory within the window; in your case your window size is up to 24 hours,
which means data has to stay in executor memory for more than a day. This
may introduce several problems when memory is not enough.
On Mon, May 9, 2016
Writing an RDD-based application using PySpark brings in additional
overhead: Spark runs on the JVM whereas your Python code runs
in the Python runtime, so data has to be communicated between the JVM world and
the Python world, which requires additional serialization/deserialization and IPC.
Also oth
I guess the problem is that Py4J automatically translates the Python int
into a Java int or long according to the value of the data. If the value is
small it translates to a Java int, otherwise to a Java long.
But in the Java code the parameter must be a long type, so that's the excep
Hi Deepak,
I don't think supervise works with YARN; it is a standalone- and
Mesos-specific feature.
Thanks
Saisai
On Tue, Apr 5, 2016 at 3:23 PM, Deepak Sharma wrote:
> Hi Rafael
> If you are using yarn as the engine , you can always use RM UI to see the
> application progress.
>
> Than
spark.jars.ivy, spark.jars.packages, and spark.jars.excludes are the
configurations you can use.
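For illustration, a sketch of the three settings (the coordinates and paths are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object DependencySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dependency-sketch")
      // Maven coordinates to resolve and ship with the application.
      .set("spark.jars.packages", "org.apache.commons:commons-csv:1.5")
      // Transitive dependencies to exclude, as groupId:artifactId.
      .set("spark.jars.excludes", "commons-logging:commons-logging")
      // Optional alternate Ivy directory used for resolution.
      .set("spark.jars.ivy", "/tmp/.ivy2")

    val sc = new SparkContext(conf)
    sc.stop()
  }
}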
Thanks
Saisai
On Sun, Apr 3, 2016 at 1:59 AM, Russell Jurney
wrote:
> Thanks, Andy!
>
> On Mon, Mar 28, 2016 at 8:44 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> Hi Russell
>>
>> I us
Fri, Apr 1, 2016, 7:25 PM Saisai Shao wrote:
>
>> Hi Michael, shuffle data (mapper output) have to be materialized into
>> disk finally, no matter how large memory you have, it is the design purpose
>> of Spark. In you scenario, since you have a big memory, shuffle spill
Hi Michael, shuffle data (mapper output) has to be materialized to disk
eventually, no matter how much memory you have; that is a design choice of
Spark. In your scenario, since you have a lot of memory, shuffle spill should
not happen frequently, and most of the disk IO you see might be the final shuffle
fi
There's a JIRA (https://issues.apache.org/jira/browse/SPARK-14151) about
it, please take a look.
Thanks
Saisai
On Sat, Apr 2, 2016 at 6:48 AM, Walid Lezzar wrote:
> Hi,
>
> I looked into the spark code at how spark report metrics using the
> MetricsSystem class. I've seen that the spark Metrics
> WeChat: zhitao_yan
> QQ: 4707059
> Address: Room 602, Aviation Service Building, Building 2, No. 39 Dongzhimenwai Street, Dongcheng District, Beijing
> Postal code: 100027
>
> --------
> TalkingData.com <http://talkingdata.com/> - Let data speak
>
>
> *From:* Saisai Shao
> *Date:* 2016-03-22 18:03
> *To:* tony@tendcloud.com
> *CC:* user
> *Subject:* Re: Is there a way to s
I'm afraid submitting an application through the YARN REST API is currently
not supported by Spark. However, the YARN AMRMClient is functionally
equivalent to the REST API; I'm not sure which specific features you are referring to.
Thanks
Saisai
On Tue, Mar 22, 2016 at 5:27 PM, tony@tendcloud.com <
tony@tendcloud.
I guess in local mode you're using the local FS instead of HDFS; here the
exception is mainly thrown from HDFS when running on YARN. I think it would be
better to check the status and configuration of HDFS to see whether it is
normal or not.
Thanks
Saisai
On Tue, Mar 22, 2016 at 5:46 PM, Soni spark
wrote:
> H
If you want to avoid failures of existing jobs while restarting the NM, you
could enable work-preserving restart for the NM; in this case, restarting the
NM will not affect the running containers (containers can still run). That
could alleviate the NM restart problem.
Thanks
Saisai
On Wed, Mar 16, 2016 at 6:30 PM, Alex D
You cannot directly invoke a Spark application by using yarn#client as you
mentioned; it is deprecated and not supported. You have to use
spark-submit to submit a Spark application to YARN.
Also, the specific problem here is that you're invoking yarn#client to run the
Spark app in yarn-client mode
Currently the configuration is part of the checkpoint data, and when
recovering from a failure, Spark Streaming will fetch the configuration from
the checkpoint data, so even if you change the configuration file, the
recovered Spark Streaming application will not use it. So from my
understanding, currently there's
quot;--conf" was lost
> when I copied it to mail.
>
> -- Forwarded message --
> From: Jy Chen
> Date: 2016-03-10 10:09 GMT+08:00
> Subject: Re: Dynamic allocation doesn't work on YARN
> To: Saisai Shao , user@spark.apache.org
>
>
> Hi,
> My
Would you please send out your dynamic allocation configuration so we
can understand the problem better.
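For reference, a sketch of a typical dynamic allocation setup (the executor counts are placeholders); on YARN it also needs the external shuffle service enabled on the NodeManagers:

import org.apache.spark.{SparkConf, SparkContext}

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-sketch")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.initialExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 1000, 50).map(_ * 2).count()
    sc.stop()
  }
}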
On Wed, Mar 9, 2016 at 4:29 PM, Jy Chen wrote:
> Hello everyone:
>
> I'm trying the dynamic allocation in Spark on YARN. I have followed
> configuration steps and started the shuffle service.
>
> Now it c
I think the first step is to publish your in-house-built Hadoop-related
jars to your local Maven or Ivy repo, and then change the Spark build
profiles, e.g. -Phadoop-2.x (you could use 2.7, or you may have to change the
pom file if you hit jar conflicts) -Dhadoop.version=3.0.0-SNAPSHOT, to build
against
If it is due to a heartbeat problem and the driver explicitly killed the
executors, there should be some driver logs mentioning it, so you
could check the driver log. Container (executor) logs are also
useful; if the container was killed, there will be some signal-related
logs, like (SIG
You don't have to specify the storage level for the direct Kafka API, since it
doesn't require storing the input data ahead of time. Only the receiver-based
approach can specify the storage level.
Thanks
Saisai
On Wed, Mar 2, 2016 at 7:08 PM, Vinti Maheshwari
wrote:
> Hi All,
>
> I wanted to set *St
You could set the configuration "auto.offset.reset" through the
"kafkaParams" parameter, which is provided in some of the other overloaded
createStream APIs.
By default Kafka will pick data from the latest offset unless you explicitly
set it; this is the behavior of Kafka, not Spark.
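For illustration, a sketch against the old Kafka 0.8 receiver-based API (spark-streaming-kafka); the ZooKeeper quorum, group id, and topic are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object OffsetResetSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("offset-reset-sketch"), Seconds(10))

    // "smallest" tells the 0.8 consumer to start from the earliest available
    // offset when there is no committed offset (the default is "largest").
    val kafkaParams = Map(
      "zookeeper.connect" -> "zk1:2181",
      "group.id"          -> "my-group",
      "auto.offset.reset" -> "smallest")

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}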
Thanks
Saisai
On Mon, Feb
IIUC, if for example you want to set the environment variable FOO=bar on the
executor side, you could use "spark.executorEnv.FOO=bar" in the conf file; the
AM will pick up this configuration and set the environment variable when
launching the container. Just list all the envs you want to set on the
executor side like spark.executorEnv.xx
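For illustration, a minimal sketch (the variable name and value are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorEnvSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("executor-env-sketch")
      // Becomes the environment variable FOO=bar in every executor container.
      .set("spark.executorEnv.FOO", "bar")

    val sc = new SparkContext(conf)
    // Quick check from the executors.
    sc.parallelize(Seq(1), 1)
      .map(_ => sys.env.getOrElse("FOO", "unset"))
      .collect()
      .foreach(println)
    sc.stop()
  }
}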
Hi Divya,
Would you please provide the full stack trace of the exception? From my
understanding --executor-cores should work; we could understand the problem
better with the full stack trace.
Performance relies on many different aspects; I'd recommend you
check the Spark web UI to understand the application runt
g sparkcontext manually in your application
> still works then I'll investigate more on my side. It just before I dig
> more I wanted to know if it was still supported.
>
> Nir
>
> On Thu, Jan 28, 2016 at 7:47 PM, Saisai Shao
> wrote:
>
>> I think I met th
I think I met this problem before; it might be due to some race
conditions during the exit period. The approach you mentioned is still valid; this
problem only occurs when stopping the application.
Thanks
Saisai
On Fri, Jan 29, 2016 at 10:22 AM, Nirav Patel wrote:
> Hi, we were using spark 1.3.1 a
You should also check the available YARN resources; overall, the number of
containers that can be allocated is restricted by the YARN resources. I guess
here your YARN cluster can only allocate 3 containers, so even if you set
the initial number to 10, it cannot be satisfied.
On Wed, Jan 27, 2
Hi Todd,
There are two levels of locality-based scheduling when you run Spark on YARN
with dynamic allocation enabled:
1. Container allocation is based on the locality ratio of pending tasks;
this is YARN specific and only works with dynamic allocation enabled.
2. Task scheduling is locality aware,
Is there any possibility that this file is still being written by another
application, so that what Spark Streaming processed is an incomplete file?
On Tue, Jan 26, 2016 at 5:30 AM, Shixiong(Ryan) Zhu wrote:
> Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/", or
> write into it directly? `textFile
You could try increasing the driver memory with "--driver-memory"; it looks
like the OOM came from the driver side, so the simple solution is to increase
the driver's memory.
On Tue, Jan 19, 2016 at 1:15 PM, Julio Antonio Soto wrote:
> Hi,
>
> I'm having trouble when uploadig spark jobs in yarn-cluster