Yes, it is working now. Thank you very much.
Best Regards,
Eduardus Hardika Sandy Atmaja
From: Russell Spitzer
Sent: Monday, June 28, 2021 11:22 PM
To: Eduardus Hardika Sandy Atmaja
Cc: user
Subject: Re: Request for FP-Growth source code
Sorry, wrong repository.
request.
Best Regards,
Eduardus Hardika Sandy Atmaja
problems with
LinearRegressionSGD and saying that it is slower than L-BFGS, but I am not
sure what they mean. Shouldn’t SGD be better? Is there any plan to make
those functions available again in the new DataFrame-based API?
Thank you,
Sandy
Hi,
Is SVD or PCA in Spark ML (i.e. spark.ml parity with the mllib
RowMatrix.computeSVD API) slated for any upcoming release?
Many thanks for any guidance!
-Sandy
Hi Arun,
A Java API was actually recently added to the library. It will be
available in the next release.
-Sandy
On Thu, Dec 10, 2015 at 12:16 AM, Arun Verma
wrote:
> Thank you for your reply. It is a Scala and Python library. Does a similar
> library exist for Java?
>
> On Wed, De
Hi Ross,
This is most likely occurring because YARN is killing containers for
exceeding physical memory limits. You can make this less likely to happen
by bumping spark.yarn.executor.memoryOverhead to something higher than 10%
of your spark.executor.memory.
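For example (numbers here are purely illustrative, not a recommendation):

  spark-submit --master yarn \
    --executor-memory 8g \
    --conf spark.yarn.executor.memoryOverhead=1536 \
    ...

This asks YARN for roughly 8g + 1.5g per executor container instead of the
default 8g + ~800m.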
-Sandy
On Thu, Nov 19, 2015 at 8:14
record,
because it avoids the unnecessary overhead of creating Java objects. As
you've pointed out, this is at the expense of making the code more verbose
when caching.
-Sandy
On Fri, Nov 13, 2015 at 10:29 AM, jeff saremi
wrote:
> So we tried reading a sequencefile in Spark and realized that
Hi Nisrina,
The resources you specify are shared by all jobs that run inside the
application.
-Sandy
On Wed, Nov 4, 2015 at 9:24 AM, Nisrina Luthfiyati <
nisrina.luthfiy...@gmail.com> wrote:
> Hi all,
>
> I'm running some spark jobs in java on top of YARN by submitting o
/03/how-to-tune-your-apache-spark-jobs-part-2/
has a more detailed explanation of why this happens.
-Sandy
On Sat, Oct 31, 2015 at 4:29 AM, Jörn Franke wrote:
> Maybe Hortonworks support can help you much better.
>
> Otherwise you may want to change the yarn scheduler configuration
using the -Pyarn flag.
-Sandy
On Thu, Oct 22, 2015 at 9:04 AM, Deenar Toraskar
wrote:
> Hi I have got the prebuilt version of Spark 1.5 for Hadoop 2.6 (
> http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz)
> working with CDH 5.4.0 in local mode on
stage including the InputSplits
gets submitted, Spark will try to request an appropriate number of
executors.
The memory in the YARN resource requests is --executor-memory + what's set
for spark.yarn.executor.memoryOverhead, which defaults to 10% of
--executor-memory.
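As an illustrative example: with --executor-memory 4g and the default
overhead, each container request works out to roughly 4096 MB + ~410 MB, and
YARN may round that up further depending on its minimum allocation settings.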
-Sandy
On Wed, Sep 23, 2015
ty to give
the executors some additional headroom above the heap space.
-Sandy
On Mon, Sep 21, 2015 at 5:43 PM, Saisai Shao wrote:
> I think you need to increase the memory size of executor through command
> arguments "--executor-memory", or configuration "s
YARN will never kill processes for being unresponsive.
It may kill processes for occupying more memory than it allows. To get
around this, you can either bump spark.yarn.executor.memoryOverhead or turn
off the memory checks entirely by setting yarn.nodemanager.pmem-check-enabled to false.
-Sandy
On Tue, Sep
Java 7.
FWIW I was just able to get it to work by increasing MaxPermSize to 256m.
-Sandy
On Wed, Sep 9, 2015 at 11:37 AM, Reynold Xin wrote:
> Java 7 / 8?
>
> On Wed, Sep 9, 2015 at 10:10 AM, Sandy Ryza
> wrote:
>
>> I just upgraded the spark-timeseries
>> <htt
6064
2:163428 21112648
3: 12638 14459192
4: 12638 13455904
5: 105397642528
Not sure whether this is suspicious. Any ideas?
-Sandy
Those settings seem reasonable to me.
Are you observing performance that's worse than you would expect?
-Sandy
On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov
wrote:
> Hi Sandy
>
> Thank you for your reply
> Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
>
Hi Alex,
If they're both configured correctly, there's no reason that Spark
Standalone should provide a performance or memory improvement over Spark on
YARN.
-Sandy
On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov
wrote:
> Hi Everyone
>
> We are trying the latest aws emr-
completes less quickly, have you checked to see whether YARN is
killing any containers? It could be that the job completes more slowly
because, without the memory overhead, YARN kills containers while it's
running, so it needs to run some tasks multiple times.
-Sandy
On Sat, Aug 29, 2015 at 6:
se task metrics.
-Sandy
On Thu, Aug 20, 2015 at 8:54 AM, Umesh Kacha wrote:
> Hi where do I see GC time in UI? I have set spark.yarn.executor.memoryOverhead
> as 3500 which seems to be good enough I believe. So you mean only GC could
> be the reason behind timeout I checked Yarn logs I
may be killing your executors for using too much off-heap space. You can
see whether this is happening by looking in the Spark AM or YARN
NodeManager logs.
-Sandy
On Thu, Aug 20, 2015 at 7:39 AM, Umesh Kacha wrote:
> Hi thanks much for the response. Yes I tried default settings too 0.2 it
>
What version of Spark are you using? Have you set any shuffle configs?
On Wed, Aug 19, 2015 at 11:46 AM, unk1102 wrote:
> I have one Spark job which seems to run fine but after one hour or so
> executor start getting lost because of time out something like the
> following
> error
>
> cluster.ya
-allocation
.
-Sandy
On Sat, Aug 15, 2015 at 6:40 AM, Mohit Anchlia
wrote:
> I am running on Yarn and do have a question on how spark runs executors on
> different data nodes. Is that primarily decided based on number of
> receivers?
>
> What do I need to do to ensure that mul
Hi Eric,
This is likely because you are putting the parameter after the primary
resource (latest_msmtdt_by_gridid_and_source.py), which makes it a
parameter to your application instead of a parameter to Spark.
-Sandy
On Wed, Aug 12, 2015 at 4:40 AM, Eric Bless
wrote:
> Previously I
Hi Jem,
Do they fail with any particular exception? Does YARN just never end up
giving them resources? Does an application master start? If so, what are
in its logs? If not, anything suspicious in the YARN ResourceManager logs?
-Sandy
On Fri, Aug 7, 2015 at 1:48 AM, Jem Tucker wrote:
>
Hi Mike,
Spark is rack-aware in its task scheduling. Currently Spark doesn't honor
any locality preferences when scheduling executors, but this is being
addressed in SPARK-4352, after which executor-scheduling will be rack-aware
as well.
-Sandy
On Sat, Jul 18, 2015 at 6:25 PM, Mike Fra
Can you try setting the spark.yarn.jar property to make sure it points to
the jar you're thinking of?
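For example (the HDFS path below is just a placeholder for wherever your
assembly jar actually lives):

  spark-submit --master yarn \
    --conf spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly.jar \
    ...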
-Sandy
On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja wrote:
> Yes, it's a YARN cluster and using spark-submit to run. I have SPARK_HOME
> set to the directory above and using
Hi Jonathan,
This is a problem that has come up for us as well, because we'd like
dynamic allocation to be turned on by default in some setups, but not break
existing users with these properties. I'm hoping to figure out a way to
reconcile these by Spark 1.5.
-Sandy
On Wed, Jul 15,
which happens to be in a
local directory that YARN gives it. Based on its title, if YARN-882 were
resolved, it would do nothing to limit the amount of on-disk cache space
Spark could use.
-Sandy
On Mon, Jul 13, 2015 at 6:57 AM, Peter Rudenko
wrote:
> Hi Andrew, here's what i found. M
To add to this, conceptually, it makes no sense to launch something in
yarn-cluster mode by creating a SparkContext on the client - the whole
point of yarn-cluster mode is that the SparkContext runs on the cluster,
not on the client.
On Thu, Jul 9, 2015 at 2:35 PM, Marcelo Vanzin wrote:
> You ca
Strange. Does the application show up at all in the YARN web UI?
Does application_1436314873375_0030
show up at all in the YARN ResourceManager logs?
-Sandy
On Wed, Jul 8, 2015 at 3:32 PM, Juan Gordon wrote:
> Hello Sandy,
>
> Yes I'm sure that YARN has enough resources, i
Hi JG,
One way this can occur is that YARN doesn't have enough resources to run
your job. Have you verified that it does? Are you able to submit using
the same command from a node on the cluster?
-Sandy
On Wed, Jul 8, 2015 at 3:19 PM, jegordon wrote:
> I'm trying to submit a s
The scheduler configurations are helpful as well, but not useful without
the information outlined above.
-Sandy
On Fri, Jun 26, 2015 at 10:34 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> These are my YARN queue configurations
>
> Queue State: RUNNING  Used Capacity: 206.7%  Absolute Used Capacity: 3.1
How many nodes do you have, how much space is allocated to each node for
YARN, how big are the executors you're requesting, and what else is running
on the cluster?
On Thu, Jun 25, 2015 at 3:57 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> I run Spark App on Spark 1.3.1 over YARN.
>
> When i request --num-executor
stems need random access to your data, you'd want
to consider a system like HBase or Cassandra, though these are likely to
suffer a little bit on performance and incur higher operational overhead.
-Sandy
On Tue, Jun 23, 2015 at 11:21 PM, Sonal Goyal wrote:
> When you deploy spark ove
Hi Arun,
You can achieve this by
setting spark.scheduler.maxRegisteredResourcesWaitingTime to some really
high number and spark.scheduler.minRegisteredResourcesRatio to 1.0.
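For example (values are illustrative; the waiting time is given in
milliseconds here):

  spark-submit --master yarn \
    --conf spark.scheduler.maxRegisteredResourcesWaitingTime=3600000 \
    --conf spark.scheduler.minRegisteredResourcesRatio=1.0 \
    ...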
-Sandy
On Wed, Jun 24, 2015 at 2:21 AM, Steve Loughran
wrote:
>
> On 24 Jun 2015, at 05:55, canan chen
Oops, that link was for Oryx 1. Here's the repo for Oryx 2:
https://github.com/OryxProject/oryx
On Sat, Jun 20, 2015 at 10:20 AM, Sandy Ryza
wrote:
> Hi Debasish,
>
> The Oryx project (https://github.com/cloudera/oryx), which is Apache 2
> licensed, contains a model server that
Hi Debasish,
The Oryx project (https://github.com/cloudera/oryx), which is Apache 2
licensed, contains a model server that can serve models built with MLlib.
-Sandy
On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl
wrote:
> Is velox NOT open source?
>
>
> On Saturday, June 20, 2015,
This looks really awesome.
On Tue, Jun 16, 2015 at 10:27 AM, Huang, Jie wrote:
> Hi All
>
> We are happy to announce Performance portal for Apache Spark
> http://01org.github.io/sparkscore/ !
>
> The Performance Portal for Apache Spark provides performance data on the
> Spark upstream to the com
Hi Matt,
If you place your jars on HDFS in a public location, YARN will cache them
on each node after the first download. You can also use the
spark.executor.extraClassPath config to point to them.
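For example (paths are placeholders; the --jars entry assumes the jar is on
HDFS, while the extraClassPath entry assumes the jar is already present at
that local path on every node):

  spark-submit --master yarn \
    --jars hdfs:///shared/libs/my-deps.jar \
    --conf spark.executor.extraClassPath=/opt/libs/my-deps.jar \
    ...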
-Sandy
On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt wrote:
> Hi folks,
>
> I’m l
Hi Patrick,
I'm noticing that you're using Spark 1.3.1. We fixed a bug in dynamic
allocation in 1.4 that permitted requesting negative numbers of executors.
Any chance you'd be able to try with the newer version and see if the
problem persists?
-Sandy
On Fri, Jun 12, 2015 at 7
On YARN, there is no concept of a Spark Worker. Multiple executors will be
run per node without any effort required by the user, as long as all the
executors fit within each node's resource limits.
-Sandy
On Wed, Jun 10, 2015 at 3:24 PM, Evo Eftimov wrote:
> Yes i think it is ONE wo
That might work, but there might also be other steps that are required.
-Sandy
On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa wrote:
> Thanks! It is working fine now with spark-submit. Just out of curiosity,
> how would you use org.apache.spark.deploy.yarn.Client? Adding that
> spark_ya
spark-submit is the recommended way of launching Spark applications on
YARN, because it takes care of submitting the right jars as well as setting
up the classpath and environment variables appropriately.
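A minimal sketch (class name, jar, and cluster details are placeholders):

  spark-submit --master yarn-cluster \
    --class com.example.MyStreamingApp \
    my-streaming-app.jar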
-Sandy
On Thu, Jun 4, 2015 at 10:30 AM, Saiph Kappa wrote:
> No, I am not. I run it w
Hi Saiph,
Are you launching using spark-submit?
-Sandy
On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa wrote:
> Hi,
>
> I've been running my spark streaming application in standalone mode
> without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0)
reduceByKey with parallelism = 10. If there are
fewer slots to run tasks than tasks, the tasks will just be run serially.
-Sandy
On Tue, Jun 2, 2015 at 11:24 AM, Shushant Arora
wrote:
> So in Spark, after acquiring executors from the ClusterManager, are tasks
> scheduled on executors
needs to be requested before Spark knows what tasks it will
run. Although dynamic allocation improves that last part.
-Sandy
On Tue, Jun 2, 2015 at 9:55 AM, Shushant Arora
wrote:
> Is it possible in JavaSparkContext ?
>
> JavaSparkContext jsc = new JavaSparkContext(conf);
>
Hi Shushant,
Spark currently makes no effort to request executors based on data locality
(although it does try to schedule tasks within executors based on data
locality). We're working on adding this capability at SPARK-4352
<https://issues.apache.org/jira/browse/SPARK-4352>.
-Sa
Hi Corey,
As of this PR https://github.com/apache/spark/pull/5297/files, this can be
controlled with spark.yarn.submit.waitAppCompletion.
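For example, to have the client return as soon as the application is
submitted rather than waiting for it to finish, something like:

  spark-submit --master yarn-cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    ...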
-Sandy
On Thu, May 28, 2015 at 11:48 AM, Corey Nolet wrote:
> I am submitting jobs to my yarn cluster via the yarn-cluster mode and I'm
> notic
Awesome!
It's documented here:
https://spark.apache.org/docs/latest/submitting-applications.html
-Sandy
On Mon, May 18, 2015 at 8:03 PM, xiaohe lan wrote:
> Hi Sandy,
>
> Thanks for your information. Yes, spark-submit --master yarn
> --num-executors 5 --executor-cores 4
>
*All
On Mon, May 18, 2015 at 9:07 AM, Sandy Ryza wrote:
> Hi Xiaohe,
>
> The all Spark options must go before the jar or they won't take effect.
>
> -Sandy
>
> On Sun, May 17, 2015 at 8:59 AM, xiaohe lan
> wrote:
>
>> Sorry, them both are assigned task
Hi Xiaohe,
The all Spark options must go before the jar or they won't take effect.
-Sandy
On Sun, May 17, 2015 at 8:59 AM, xiaohe lan wrote:
> Sorry, both of them are assigned tasks actually.
>
> Aggregated Metrics by Executor
> Executor ID  Address  Task Time  Total Tasks  Fail
-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
-Sandy
On Thu, Apr 30, 2015 at 5:03 PM, java8964 wrote:
> Really not expert here, but try the following ideas:
>
> 1) I assume you are using yarn, then this blog is very good
/
-Sandy
On Tue, Apr 28, 2015 at 7:12 PM, bit1...@163.com wrote:
> Hi,guys,
> I have the following computation with 3 workers:
> spark-sql --master yarn --executor-memory 3g --executor-cores 2
> --driver-memory 1g -e 'select count(*) from table'
>
> The resources used are s
The setting to increase is spark.yarn.executor.memoryOverhead
On Wed, Apr 15, 2015 at 6:35 AM, Brahma Reddy Battula <
brahmareddy.batt...@huawei.com> wrote:
> Hello Sean Owen,
>
> Thanks for your reply. I'll increase overhead memory and check it.
>
>
> By the way, any difference between 1.1 and 1.
me file, a better option
would be to pass the file in with the --files option when you spark-submit,
which will cache the file between executors on the same node.
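For example (file name and path are placeholders):

  spark-submit --master yarn \
    --files /local/path/to/lookup.txt \
    ...

and then open it by its bare name (lookup.txt) from your tasks, since YARN
localizes it into each container's working directory.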
-Sandy
On Tue, Apr 14, 2015 at 1:39 AM, Horsmann, Tobias <
tobias.horsm...@uni-due.de> wrote:
> Hi,
>
> I am trying to
Hi Riya,
As far as I know, that is correct, unless Mesos fine-grained mode handles
this in some mysterious way.
-Sandy
On Mon, Apr 13, 2015 at 2:09 PM, rcharaya wrote:
> I want to use Rack locality feature of Apache Spark in my application.
>
> Is YARN the only resource manager which
Hi Deepak,
I'm going to shamelessly plug my blog post on tuning Spark:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
It talks about tuning executor size as well as how the number of tasks for
a stage is calculated.
-Sandy
On Thu, Apr 9, 2015 at 9:21 AM,
pr 1, 2015 at 7:08 PM, twinkle sachdeva wrote:
> Hi,
>
> Thanks Sandy.
>
>
> Another way to look at this: would we want our long running
> application to die?
>
> So let's say, we create a window of around 10 batches, and we are using
> incremen
so the records
corresponding to a particular partition at the end of the first job can end
up split across multiple partitions in the second job.
-Sandy
On Wed, Apr 1, 2015 at 9:09 PM, kjsingh wrote:
> Hi,
>
> We are running an hourly job using Spark 1.2 on Yarn. It saves an RDD of
> Tuple2.
That's a good question, Twinkle.
One solution could be to allow a maximum number of failures within any
given time span. E.g. a max failures per hour property.
-Sandy
On Tue, Mar 31, 2015 at 11:52 PM, twinkle sachdeva <
twinkle.sachd...@gmail.com> wrote:
> Hi,
>
> In spark
Hi Matt,
I'm not sure whether we have documented compatibility guidelines here.
However, a strong goal is to keep the external shuffle service compatible
so that many versions of Spark can run against the same shuffle service.
-Sandy
On Wed, Mar 25, 2015 at 6:44 PM, Matt Cheah wrote:
Creating a SparkContext and setting master as yarn-cluster unfortunately
will not work.
SPARK-4924 added APIs for doing this in Spark, but won't be included until
1.4.
-Sandy
On Tue, Mar 17, 2015 at 3:19 AM, Akhil Das
wrote:
> Create SparkContext set master as yarn-cluster then run
Hi Sachin,
It appears that the application master is failing. To figure out what's
wrong you need to get the logs for the application master.
-Sandy
On Wed, Mar 25, 2015 at 7:05 AM, Sachin Singh
wrote:
> OS I am using Linux,
> when I will run simply as master yarn, its r
I checked and apparently it hasn't been released yet. It will be available
in the upcoming CDH 5.4 release.
-Sandy
On Mon, Mar 23, 2015 at 1:32 PM, Nitin kak wrote:
> I know there was an effort for this, do you know which version of Cloudera
> distribution we could find that?
>
>
Hi Yuichiro,
The way to avoid this is to boost spark.yarn.executor.memoryOverhead until
the executors have enough off-heap memory to avoid going over their limits.
-Sandy
On Tue, Mar 24, 2015 at 11:49 AM, Yuichiro Sakamoto wrote:
> Hello.
>
> We use ALS(Collaborative filtering) of Sp
Ah, yes, I believe this is because only properties prefixed with "spark"
get passed on. The purpose of the "--conf" option is to allow passing
Spark properties to the SparkConf, not to add general key-value pairs to
the JVM system properties.
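For example (key names invented for illustration):

  spark-submit --conf spark.myapp.mode=batch ...   # ends up in the SparkConf
  spark-submit --conf myapp.mode=batch ...         # not passed on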
-Sandy
On Tue, Mar 24, 2015 at
Steve, that's correct, but the problem only shows up when different
versions of the YARN jars are included on the classpath.
-Sandy
On Tue, Mar 24, 2015 at 6:29 AM, Steve Loughran
wrote:
>
> > On 24 Mar 2015, at 02:10, Marcelo Vanzin wrote:
> >
> > This happens most
, and
the on-disk version can be compressed as well.
-Sandy
On Mon, Mar 23, 2015 at 5:29 PM, Bijay Pathak
wrote:
> Hello,
>
> I am running TeraSort <https://github.com/ehiggs/spark-terasort> on
> 100GB of data. The final metrics I am getting on Shuffle Spill are:
>
> Shuf
Hi Emre,
The --conf property is meant to work with yarn-cluster mode.
System.getProperty("key") isn't guaranteed to work, but new SparkConf().get("key")
should. Does it not?
-Sandy
On Mon, Mar 23, 2015 at 8:39 AM, Emre Sevinc wrote:
> Hello,
>
> According
The former is deprecated. However, the latter is functionally equivalent
to it. Both launch an app in what is now called "yarn-cluster" mode.
Oozie now also has a native Spark action, though I'm not familiar on the
specifics.
-Sandy
On Mon, Mar 23, 2015 at 1:01 PM, Nitin kak
The mode is not deprecated, but the name "yarn-standalone" is now
deprecated. It's now referred to as "yarn-cluster".
-Sandy
On Mon, Mar 23, 2015 at 11:49 AM, nitinkak001 wrote:
> Is yarn-standalone mode deprecated in Spark now. The reason I am asking is
> becau
>
> On Sat, Feb 21, 2015 at 12:05 AM, Sandy Ryza
> wrote:
>
>> Are you using the capacity scheduler or fifo scheduler without multi
>> resource scheduling by any chance?
>>
>> On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg
>> wrote:
>>
>>>
ontainermanager.ContainerManagerImpl:
> Event EventType: FINISH_APPLICATION sent to absent application
> application_1422406067005_0053
>
> On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza
> wrote:
>
>> It seems unlikely to me that it would be a 2.2 issue, though not entire
That's all correct.
-Sandy
On Fri, Feb 20, 2015 at 1:23 PM, Kelvin Chu <2dot7kel...@gmail.com> wrote:
> Hi Sandy,
>
> I appreciate your clear explanation. Let me try again. It's the best way
> to confirm I understand.
>
> spark.executor.memory + spark.yarn.ex
spark.storage.memoryFraction (default 0.6)
and spark.shuffle.memoryFraction (default 0.2), and the rest is for basic
Spark bookkeeping and anything the user does inside UDFs.
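As a rough illustrative example: with a 10 GB executor heap and those
defaults, on the order of 6 GB is available for cached RDDs, about 2 GB for
shuffle buffers, and the remaining ~2 GB for Spark's bookkeeping and user
code (the exact split also depends on the safety fractions).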
-Sandy
On Fri, Feb 20, 2015 at 11:44 AM, Kelvin Chu <2dot7kel...@gmail.com> wrote:
> Hi Sandy,
>
> I am also doing memory tunin
If that's the error you're hitting, the fix is to boost
spark.yarn.executor.memoryOverhead, which will put some extra room in
between the executor heap sizes and the amount of memory requested for them
from YARN.
-Sandy
On Fri, Feb 20, 2015 at 9:40 AM, lbierman wrote:
> A bit mo
Are you specifying the executor memory, cores, or number of executors
anywhere? If not, you won't be taking advantage of the full resources on
the cluster.
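For example (sizes are illustrative and should be tuned to your nodes):

  spark-submit --master yarn \
    --num-executors 10 \
    --executor-cores 4 \
    --executor-memory 8g \
    ...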
-Sandy
On Fri, Feb 20, 2015 at 2:41 AM, Sean Owen wrote:
> None of this really points to the problem. These indicate that worker
Hi Koert,
You should be using "-Phadoop-2.3" instead of "-Phadoop2.3".
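So the invocation would look something like this (the Hadoop version string
is just a placeholder for your actual CDH version):

  mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.5.0-cdh5.3.0 -DskipTests clean package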
-Sandy
On Wed, Feb 18, 2015 at 10:51 AM, Koert Kuipers wrote:
> does anyone have the right maven invocation for cdh5 with yarn?
> i tried:
> $ mvn -Phadoop2.3 -Dhadoop.version=2.5.0-cdh5
It seems unlikely to me that it would be a 2.2 issue, though not entirely
impossible. Are you able to find any of the container logs? Is the
NodeManager launching containers and reporting some exit code?
-Sandy
On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg wrote:
> No, not submitting f
What version of Java are you using? Core NLP dropped support for Java 7 in
its 3.5.0 release.
Also, the correct command line option is --jars, not --addJars.
On Thu, Feb 12, 2015 at 12:03 PM, Deborah Siegel
wrote:
> Hi Abe,
> I'm new to Spark as well, so someone else could answer better. A few
at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>> at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>> at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkS
bel column that I can feed
into a regression.
So far all the paths I've gone down have led me to internal APIs or
convoluted casting in and out of RDD[Row] and DataFrame. Is there a simple
way of accomplishing this?
any assistance (lookin' at you Xiangrui) much appreciated,
Sandy
Hi Anders,
I just tried this out and was able to successfully acquire executors. Any
strange log messages or additional color you can provide on your setup?
Does yarn-client mode work?
-Sandy
On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg wrote:
> Hi,
>
> Compiled the latest master
Hi Arun,
The limit for the YARN user on the cluster nodes should be all that
matters. What version of Spark are you using? If you can turn on
sort-based shuffle it should solve this problem.
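If you're on a version where it isn't already the default, sort-based
shuffle can be enabled with something like:

  spark-submit --conf spark.shuffle.manager=sort ...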
-Sandy
On Tue, Feb 10, 2015 at 1:16 PM, Arun Luthra wrote:
> Hi,
>
> I'm running Spar
new StreamingContext(sparkConf, Seconds(bucketSecs))
>
> val sc = new SparkContext()
>
> On Tue, Feb 10, 2015 at 1:02 PM, Sandy Ryza
> wrote:
>
>> Is the SparkContext you're using the same one that the StreamingContext
>> wraps? If not, I don't think using
Is the SparkContext you're using the same one that the StreamingContext
wraps? If not, I don't think using two is supported.
-Sandy
On Tue, Feb 10, 2015 at 9:58 AM, Jon Gregg wrote:
> I'm still getting an error. Here's my code, which works successfully when
>
when
yarn.scheduler.maximum-allocation-mb is exceeded. The reason it doesn't
just use a smaller amount of memory is that it could be surprising to
the user to find out they're silently getting less memory than they
requested. Also, I don't think YARN exposes this up front so Spark has no
way t
I wouldn't be concerned by those ResourceManager log messages. What would
be concerning would be if the NodeManager reported that it was killing
containers for exceeding resource limits.
-Sandy
On Wed, Feb 4, 2015 at 10:19 AM, Michael Albert
wrote:
> Greetings!
>
> Thanks t
https://issues.apache.org/jira/browse/SPARK-5493 currently tracks this.
-Sandy
On Mon, Feb 2, 2015 at 9:37 PM, Zhan Zhang wrote:
> I think you can configure hadoop/hive to do impersonation. There is no
> difference between secure or insecure hadoop cluster by using kinit.
>
--executor-memory and --driver-memory when you launch your Spark job.
-Sandy
On Sat, Feb 7, 2015 at 10:04 AM, sachin Singh
wrote:
> Hi,
> when I am trying to execute my program as
> spark-submit --master yarn --class com.mytestpack.analysis.SparkTest
> sparktest-1.jar
>
>
ark.rdd.RDD[String]".
>
> Leaving it as an RDD and then constantly joining I think will be too slow
> for a streaming job.
>
> On Thu, Feb 5, 2015 at 8:06 PM, Sandy Ryza
> wrote:
>
>> Hi Jon,
>>
>> You'll need to put the file on HDFS (or whatever distribu
:8020/tmp/sparkTest/ file22.bin
> parameters
>
> This is what I executed with different values in num-executors and
> executor-memory.
> Do you think there are too many executors for those HDDs? Could
> that be the reason each executor takes more time?
>
> 2015-02-06 9:36
That's definitely surprising to me that you would be hitting a lot of GC
for this scenario. Are you setting --executor-cores and
--executor-memory? What are you setting them to?
-Sandy
On Thu, Feb 5, 2015 at 10:17 AM, Guillermo Ortiz
wrote:
> Any idea why if I use more containers I g
Hi Jon,
You'll need to put the file on HDFS (or whatever distributed filesystem
you're running on) and load it from there.
-Sandy
On Thu, Feb 5, 2015 at 3:18 PM, YaoPau wrote:
> I have a file "badFullIPs.csv" of bad IP addresses used for filtering. In
> yarn-client
Hi Guillermo,
What exactly do you mean by "each iteration"? Are you caching data in
memory?
-Sandy
On Wed, Feb 4, 2015 at 5:02 AM, Guillermo Ortiz
wrote:
> I execute a job in Spark where I'm processing a file of 80Gb in HDFS.
> I have 5 slaves:
> (32cores /256Gb / 7p
Also, do you see any lines in the YARN NodeManager logs where it says that
it's killing a container?
-Sandy
On Wed, Feb 4, 2015 at 8:56 AM, Imran Rashid wrote:
> Hi Michael,
>
> judging from the logs, it seems that those tasks are just working a really
> long time. If you
Hi Tomer,
Are you able to look in your NodeManager logs to see if the NodeManagers
are killing any executors for exceeding memory limits? If you observe
this, you can solve the problem by bumping up
spark.yarn.executor.memoryOverhead.
-Sandy
On Sun, Feb 1, 2015 at 5:28 AM, Tomer Benyamini
Filed https://issues.apache.org/jira/browse/SPARK-5500 for this.
-Sandy
On Fri, Jan 30, 2015 at 11:59 AM, Aaron Davidson wrote:
> Ah, this is in particular an issue due to sort-based shuffle (it was not
> the case for hash-based shuffle, which would immediately serialize each
> reco
* If you plan to directly cache Hadoop writable objects, you should
first copy them using
* a `map` function.
This should probably say "directly caching *or directly shuffling*". To
sort directly from a sequence file, the records need to be cloned first.
-Sandy
On Fri, Ja
ase memory,
> the more jobs you can run.
>
> This is of course assuming you could over subscribe a node in terms of cpu
> cores if you have memory available.
>
> YMMV
>
> HTH
> -Mike
>
> On Jan 30, 2015, at 7:10 AM, Sandy Ryza wrote:
>
> My answer was based off t