Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
I've discovered that one of the anomalies I encountered was due to an 
(embarrassing? humorous?) user error.  See the user list thread "Failed 
RC-10 yarn-cluster job for FS closed error when cleaning up staging 
directory" for my discussion.  With the user error corrected, the FS 
closed exception only prevents deletion of the staging directory, but 
does not affect completion with "SUCCESS."  The FS closed exception still 
needs some investigation, at least by me.


I tried the patch reported by SPARK-1898, but it didn't fix the problem 
without fixing the user error.  I did not attempt to test my fix without 
the patch, so I can't pass judgment on the patch.


Although this is merely a pseudocluster-based test -- I can't 
reconfigure our cluster with RC-10 -- I'll now change my vote to...


+1.

Thanks all who helped.
Kevin



On 05/21/2014 09:18 PM, Tom Graves wrote:

I don't think Kevin's issue would be with an API change in YarnClientImpl since 
in both cases he says he is using Hadoop 2.3.0.  I'll take a look at his post 
in the user list.

Tom




On Wednesday, May 21, 2014 7:01 PM, Colin McCabe  wrote:
  



Hi Kevin,

Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it
fixes your issue?

Running in YARN cluster mode, I had a similar issue where Spark was able to
create a Driver and an Executor via YARN, but then it stopped making any
progress.

Note: I was using a pre-release version of CDH5.1.0, not 2.3 like you were
using.

best,
Colin



On Wed, May 21, 2014 at 3:34 PM, Kevin Markey wrote:


0

Abstaining because I'm not sure if my failures are due to Spark,
configuration, or other factors...

Compiled and deployed RC10 for YARN, Hadoop 2.3 per Spark 1.0.0 Yarn
documentation.  No problems.
Rebuilt applications against RC10 and Hadoop 2.3.0 (plain vanilla Apache
release).
Updated scripts for various applications.
Application had successfully compiled and run against Spark 0.9.1 and
Hadoop 2.3.0.
Ran in "yarn-cluster" mode.
Application ran to conclusion except that it ultimately failed because of
an exception when Spark tried to clean up the staging directory.  Also,
where before Yarn would report the running program as "RUNNING", it only
reported this application as "ACCEPTED".  It appeared to run two containers
when the first instance never reported that it was RUNNING.

I will post a separate note to the USER list about the specifics.

Thanks
Kevin Markey



On 05/21/2014 10:58 AM, Mark Hamstra wrote:


+1


On Tue, May 20, 2014 at 11:09 PM, Henry Saputra 
wrote:

   Signature and hash for source looks good

No external executable package with source - good
Compiled with git and maven - good
Ran examples and sample programs locally and standalone -good

+1

- Henry



On Tue, May 20, 2014 at 1:13 PM, Tathagata Das
 wrote:


Please vote on releasing the following candidate as Apache Spark version 1.0.0!


This has a few bug fixes on top of rc9:
SPARK-1875: https://github.com/apache/spark/pull/824
SPARK-1876: https://github.com/apache/spark/pull/819
SPARK-1878: https://github.com/apache/spark/pull/822
SPARK-1879: https://github.com/apache/spark/pull/823

The tag to be voted on is v1.0.0-rc10 (commit d8070234):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d807023479ce10aec28ef3c1ab646ddefc2e663c


The release files, including signatures, digests, etc. can be found at:

http://people.apache.org/~tdas/spark-1.0.0-rc10/

The release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1018/

The documentation corresponding to this release can be found at:

http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/

The full list of changes in this release can be found at:

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=CHANGES.txt;h=d21f0ace6326e099360975002797eb7cba9d5273;hb=d807023479ce10aec28ef3c1ab646ddefc2e663c


Please vote on releasing this package as Apache Spark 1.0.0!

The vote is open until Friday, May 23, at 20:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are
a few API changes in this release. Here are links to the associated
upgrade guides - user facing changes have been kept as small as
possible.

Changes to ML vector specification:

http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/mllib-guide.html#from-09-to-10


Changes to the Java API:

http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark


Changes to the streaming API:

http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/strea

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-22 Thread Xiangrui Meng
Hi DB,

I found it is a little hard to implement the solution I mentioned:

> Do not send the primary jar and secondary jars to executors'
> distributed cache. Instead, add them to "spark.jars" in SparkSubmit
> and serve them via HTTP by calling sc.addJar in SparkContext.

If you look at the ApplicationMaster code, which is the entry point in
yarn-cluster mode, it actually creates a thread for the user class
first and waits for the user class to create a SparkContext. That means the
user class has to be on the classpath at that time. I think we need to
add the primary jar and secondary jars twice: once to the system
classpath, and then to the executor classloader.

Best,
Xiangrui
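
To illustrate the second half of that idea -- loading jars through an extra
classloader layered on top of the system classpath -- here is a minimal Scala
sketch; the jar paths and class name are placeholders, not actual Spark internals:

import java.io.File
import java.net.{URL, URLClassLoader}
import scala.util.Try

// Placeholder jar locations; in a real job these would come from --jars / spark.jars.
val primaryJar = new File("/path/to/app.jar")
val secondaryJars = Seq(new File("/path/to/dep1.jar"), new File("/path/to/dep2.jar"))

val urls: Array[URL] = (primaryJar +: secondaryJars).map(_.toURI.toURL).toArray

// A child loader whose parent is the system classloader: classes already on the
// system classpath stay visible, and classes in the secondary jars become loadable
// without going through reflection in the primary jar.
val executorLoader = new URLClassLoader(urls, ClassLoader.getSystemClassLoader)

// Placeholder class name, wrapped so the sketch does not fail if it is absent.
val maybeClass = Try(executorLoader.loadClass("com.example.SomeSecondaryClass"))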

On Wed, May 21, 2014 at 3:50 PM, DB Tsai  wrote:
> @Xiangrui
> How about we send the primary jar and secondary jars into distributed cache
> without adding them into the system classloader of executors. Then we add
> them using a custom classloader so we don't need to call secondary jars
> through reflection in the primary jar. This will be consistent with what we do in
> standalone mode, and also solves the scalability issue of jar distribution.
>
> @Koert
> Yes, that's why I suggest we can either ignore the parent classloader of the
> custom classloader to solve this, as you say. In this case, we need to add
> all the classpath entries of the system loader into our custom one (which doesn't
> have a parent) so we will not miss the default Java classes. This is how
> Tomcat works.
>
> @Patrick
> I agree that we should have the fix by Xiangrui first, since it solves most
> of the use cases. I don't know when people will use dynamic addJar on YARN
> since it's most useful for interactive environments.
>
>
> Sincerely,
>
> DB Tsai
> ---
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, May 21, 2014 at 2:57 PM, Koert Kuipers  wrote:
>
>> db tsai, I do not think userClassPathFirst is working, unless the classes
>> you load don't reference any classes already loaded by the parent
>> classloader (a mostly hypothetical situation)... I filed a JIRA for this
>> here:
>> https://issues.apache.org/jira/browse/SPARK-1863
>>
>>
>>
>> On Tue, May 20, 2014 at 1:04 AM, DB Tsai  wrote:
>>
>> > In 1.0, there is a new option for users to choose which classloader has
>> > higher priority via spark.files.userClassPathFirst. I decided to submit the
>> > PR for 0.9 first. We use this patch in our lab and we can use those jars
>> > added by sc.addJar without reflection.
>> >
>> > https://github.com/apache/spark/pull/834
>> >
>> > Can anyone comment if it's a good approach?
>> >
>> > Thanks.
>> >
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > ---
>> > My Blog: https://www.dbtsai.com
>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >
>> >
>> > On Mon, May 19, 2014 at 7:42 PM, DB Tsai  wrote:
>> >
>> > > Good summary! We fixed it in branch 0.9 since our production is still
>> in
>> > > 0.9. I'm porting it to 1.0 now, and hopefully will submit PR for 1.0
>> > > tonight.
>> > >
>> > >
>> > > Sincerely,
>> > >
>> > > DB Tsai
>> > > ---
>> > > My Blog: https://www.dbtsai.com
>> > > LinkedIn: https://www.linkedin.com/in/dbtsai
>> > >
>> > >
>> > > On Mon, May 19, 2014 at 7:38 PM, Sandy Ryza wrote:
>> > >
>> > >> It just hit me why this problem is showing up on YARN and not on
>> > >> standalone.
>> > >>
>> > >> The relevant difference between YARN and standalone is that, on YARN,
>> > the
>> > >> app jar is loaded by the system classloader instead of Spark's custom
>> > URL
>> > >> classloader.
>> > >>
>> > >> On YARN, the system classloader knows about [the classes in the spark
>> > >> jars,
>> > >> the classes in the primary app jar].   The custom classloader knows
>> > about
>> > >> [the classes in secondary app jars] and has the system classloader as
>> > its
>> > >> parent.
>> > >>
>> > >> A few relevant facts (mostly redundant with what Sean pointed out):
>> > >> * Every class has a classloader that loaded it.
>> > >> * When an object of class B is instantiated inside of class A, the
>> > >> classloader used for loading B is the classloader that was used for
>> > >> loading
>> > >> A.
>> > >> * When a classloader fails to load a class, it lets its parent
>> > classloader
>> > >> try.  If its parent succeeds, its parent becomes the "classloader that
>> > >> loaded it".
>> > >>
>> > >> So suppose class B is in a secondary app jar and class A is in the
>> > primary
>> > >> app jar:
>> > >> 1. The custom classloader will try to load class A.
>> > >> 2. It will fail, because it only knows about the secondary jars.
>> > >> 3. It will delegate to its parent, the system classloader.
>> > >> 4. The system classloader will succeed, because it knows about the
>> > primary
>> > >> app jar.
>> > >> 5. A's classloader will be the system classloader.
>> > >> 6. A tries to instantiate an instance of class 

Contributions to MLlib

2014-05-22 Thread MEETHU MATHEW
Hi,


I would like to make some contributions towards MLlib. I have a few concerns 
regarding the same:

1. Is there any reason for implementing the algorithms supported by MLlib in 
Scala?
2. Will you accept the contributions if they are done in Python or Java?

Thanks,
Meethu M

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
Sure.  Should I create a Jira as well?

I saw there's already a broader ticket regarding the ambiguous use of
SPARK_HOME [1]  (cc: Patrick as owner of that ticket)

I don't know if it would be more relevant to remove the use of SPARK_HOME
when using Mesos and have the assembly as the only way forward, or whether
that's too radical a change that might break some existing systems.

From a real-world ops perspective, the assembly should be the way to go. I
don't see installing and configuring Spark distros on a mesos master as a
way to have the mesos executor in place.

-kr, Gerard.

[1] https://issues.apache.org/jira/browse/SPARK-1110


On Thu, May 22, 2014 at 6:19 AM, Andrew Ash  wrote:

> Hi Gerard,
>
> I agree that your second option seems preferred.  You shouldn't have to
> specify a SPARK_HOME if the executor is going to use the spark.executor.uri
> instead.  Can you send in a pull request that includes your proposed
> changes?
>
> Andrew
>
>
> On Wed, May 21, 2014 at 10:19 AM, Gerard Maas 
> wrote:
>
> > Spark dev's,
> >
> > I was looking into a question asked on the user list where a
> > ClassNotFoundException was thrown when running a job on Mesos. Curious
> > issue with serialization on Mesos: more details here [1]:
> >
> > When trying to run that simple example on my Mesos installation, I faced
> > another issue: I got an error that "SPARK_HOME" was not set. I found that
> > curious b/c a local spark installation should not be required to run a
> job
> > on Mesos. All that's needed is the executor package, being the
> > assembly.tar.gz on a reachable location (HDFS/S3/HTTP).
> >
> > I went looking into the code and indeed there's a check on SPARK_HOME [2]
> > regardless of the presence of the assembly but it's actually only used if
> > the assembly is not provided (which is a kind-of best-effort recovery
> > strategy).
> >
> > Current flow:
> >
> > if (!SPARK_HOME) fail("No SPARK_HOME")
> > else if (assembly) { use assembly }
> > else { try use SPARK_HOME to build spark_executor }
> >
> > Should be:
> > sparkExecutor =  if (assembly) { assembly }
> >  else if (SPARK_HOME) { try use SPARK_HOME to build
> > spark_executor }
> >  else { fail("No executor found. Please provide
> > spark.executor.uri (preferred) or spark.home") }
> >
> > What do you think?
> >
> > -kr, Gerard.
> >
> >
> > [1]
> >
> >
> http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-with-Spark-Mesos-spark-shell-works-fine-td6165.html
> >
> > [2]
> >
> >
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L89
> >
>
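
For what that proposed ordering could look like in code, here is a minimal Scala
sketch -- illustrative only, not the actual MesosSchedulerBackend logic;
spark.executor.uri and spark.home are the settings mentioned above, everything
else (names, paths) is a placeholder:

// Resolve the executor from the assembly URI first, fall back to SPARK_HOME,
// and fail only when neither is available.
def resolveExecutor(conf: Map[String, String], sparkHome: Option[String]): String =
  conf.get("spark.executor.uri") match {
    case Some(uri) => uri  // preferred: pre-built assembly reachable via HDFS/S3/HTTP
    case None =>
      sparkHome
        .map(home => home + "/sbin/spark-executor")  // best-effort fallback; path is illustrative
        .getOrElse(throw new IllegalStateException(
          "No executor found. Please provide spark.executor.uri (preferred) or spark.home"))
  }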


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey

I retested several different cases...

1. FS closed exception shows up ONLY in RC-10, not in Spark 0.9.1, with 
both Hadoop 2.2 and 2.3.

2. SPARK-1898 has no effect for my use cases.
3. The failure to report that the underlying application is "RUNNING" 
and that it has succeeded is due ONLY to my user error.


The FS closed exception only affects the cleanup of the staging 
directory, not the final success or failure.  I've not yet tested the 
effect of changing my application's initialization, use, or closing of 
FileSystem.


Thanks again.
Kevin

On 05/22/2014 01:32 AM, Kevin Markey wrote:
I've discovered that one of the anomalies I encountered was due to a 
(embarrassing? humorous?) user error.  See the user list thread 
"Failed RC-10 yarn-cluster job for FS closed error when cleaning up 
staging directory" for my discussion.  With the user error corrected, 
the FS closed exception only prevents deletion of the staging 
directory, but does not affect completion with "SUCCESS." The FS 
closed exception still needs some investigation at least by me.


I tried the patch reported by SPARK-1898, but it didn't fix the 
problem without fixing the user error.  I did not attempt to test my 
fix without the patch, so I can't pass judgment on the patch.


Although this is merely a pseudocluster based test -- I can't 
reconfigure our cluster with RC-10 -- I'll now change my vote to...


+1.

Thanks all who helped.
Kevin



On 05/21/2014 09:18 PM, Tom Graves wrote:
I don't think Kevin's issue would be with an api change in 
YarnClientImpl since in both cases he says he is using hadoop 2.3.0.  
I'll take a look at his post in the user list.


Tom




On Wednesday, May 21, 2014 7:01 PM, Colin McCabe 
 wrote:



Hi Kevin,

Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it
fixes your issue?

Running in YARN cluster mode, I had a similar issue where Spark was able to
create a Driver and an Executor via YARN, but then it stopped making any
progress.

Note: I was using a pre-release version of CDH5.1.0, not 2.3 like you were
using.

best,
Colin



On Wed, May 21, 2014 at 3:34 PM, Kevin Markey 
wrote:



0

Abstaining because I'm not sure if my failures are due to Spark,
configuration, or other factors...

Compiled and deployed RC10 for YARN, Hadoop 2.3 per Spark 1.0.0 Yarn
documentation.  No problems.
Rebuilt applications against RC10 and Hadoop 2.3.0 (plain vanilla Apache
release).
Updated scripts for various applications.
Application had successfully compiled and run against Spark 0.9.1 and
Hadoop 2.3.0.
Ran in "yarn-cluster" mode.
Application ran to conclusion except that it ultimately failed because of
an exception when Spark tried to clean up the staging directory.  Also,
where before Yarn would report the running program as "RUNNING", it only
reported this application as "ACCEPTED".  It appeared to run two containers
when the first instance never reported that it was RUNNING.

I will post a separate note to the USER list about the specifics.

Thanks
Kevin Markey



On 05/21/2014 10:58 AM, Mark Hamstra wrote:


+1


On Tue, May 20, 2014 at 11:09 PM, Henry Saputra wrote:

   Signature and hash for source looks good

No external executable package with source - good
Compiled with git and maven - good
Ran examples and sample programs locally and standalone -good

+1

- Henry



On Tue, May 20, 2014 at 1:13 PM, Tathagata Das
 wrote:

Please vote on releasing the following candidate as Apache Spark version 1.0.0!


This has a few bug fixes on top of rc9:
SPARK-1875: https://github.com/apache/spark/pull/824
SPARK-1876: https://github.com/apache/spark/pull/819
SPARK-1878: https://github.com/apache/spark/pull/822
SPARK-1879: https://github.com/apache/spark/pull/823

The tag to be voted on is v1.0.0-rc10 (commit d8070234):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d807023479ce10aec28ef3c1ab646ddefc2e663c


The release files, including signatures, digests, etc. can be found at:

http://people.apache.org/~tdas/spark-1.0.0-rc10/

The release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1018/ 



The documentation corresponding to this release can be found at:

http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/

The full list of changes in this release can be found at:

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=CHANGES.txt;h=d21f0ace6326e099360975002797eb7cba9d5273;hb=d807023479ce10aec28ef3c1ab646ddefc2e663c


Please vote on releasing this package as Apache Spark 1.0.0!

The vote is open until Friday, May 23, at 20:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

=

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Marcelo Vanzin
Hi Kevin,

On Thu, May 22, 2014 at 9:49 AM, Kevin Markey  wrote:
> The FS closed exception only effects the cleanup of the staging directory,
> not the final success or failure.  I've not yet tested the effect of
> changing my application's initialization, use, or closing of FileSystem.

Without going and reading more of the Spark code, if your app is
explicitly close()'ing the FileSystem instance, it may be causing the
exception. If Spark is caching the FileSystem instance, your app is
probably closing that same instance (which it got from the HDFS
library's internal cache).

It would be nice if you could test that theory; it might be worth
knowing that's the case so that we can tell people not to do that.

-- 
Marcelo
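
To make that failure mode concrete, here is a small Scala sketch of the pattern
being described; the assertion and comments describe standard Hadoop FileSystem
caching behavior, and nothing here is Spark-specific code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val conf = new Configuration()

// Both calls go through Hadoop's global FileSystem cache and return the *same* object.
val fsA = FileSystem.get(conf)
val fsB = FileSystem.get(conf)
println(fsA eq fsB)  // true: a shared, cached instance

// If application code closes "its" FileSystem, every other holder of that cached
// instance is affected; on HDFS, later use of fsB fails with
// "java.io.IOException: Filesystem closed", which matches the staging-directory
// cleanup error described above.
fsA.close()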


Re: Contributions to MLlib

2014-05-22 Thread Xiangrui Meng
Hi Meethu,

Thanks for asking! Scala is the native language in Spark. Implementing
algorithms in Scala can utilize the full power of Spark Core. Also,
Scala's syntax is very concise. Implementing ML algorithms using
different languages would increase the maintenance cost. However,
there is still much work to be done in the Python/Java land. For
example, we currently do not support distributed matrices or decision
trees in Python, and those interfaces may not be friendly for Java
users. If you would like to contribute to MLlib in Python or Java, that
would be a good place to start. Thanks!

Best,
Xiangrui

On Thu, May 22, 2014 at 3:06 AM, MEETHU MATHEW  wrote:
> Hi,
>
>
> I would like to do some contributions towards the MLlib .I've a few concerns 
> regarding the same.
>
> 1. Is there any reason for implementing the algorithms supported  by MLlib in 
> Scala
> 2. Will you accept if  the contributions are done in Python or Java
>
> Thanks,
> Meethu M


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
The FileSystem cache is something that has caused a lot of pain over the
years.  Unfortunately we (in Hadoop core) can't change the way it works now
because there are too many users depending on the current behavior.

Basically, the idea is that when you request a FileSystem with certain
options with FileSystem#get, you might get a reference to an FS object that
already exists, from our FS cache singleton.  Unfortunately, this
also means that someone else can change the working directory on you or
close the FS underneath you.  The FS is basically shared mutable state, and
you don't know whom you're sharing with.

It might be better for Spark to call FileSystem#newInstance, which bypasses
the FileSystem cache and always creates a new object.  If Spark can hang on
to the FS for a while, it can get the benefits of caching without the
downsides.  In HDFS, multiple FS instances can also share things like the
socket cache between them.
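
A minimal Scala sketch of that suggestion (illustrative only; FileSystem.newInstance
and close are the standard Hadoop APIs being referred to):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val conf = new Configuration()

// Bypasses the global cache: this instance is private to the caller, so closing it
// cannot break anyone who obtained a FileSystem via FileSystem.get(conf).
val privateFs = FileSystem.newInstance(conf)
try {
  // ... use privateFs for this component's I/O ...
} finally {
  privateFs.close()  // safe: no one else holds a reference to this instance
}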

best,
Colin


On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin wrote:

> Hi Kevin,
>
> On Thu, May 22, 2014 at 9:49 AM, Kevin Markey 
> wrote:
> > The FS closed exception only effects the cleanup of the staging
> directory,
> > not the final success or failure.  I've not yet tested the effect of
> > changing my application's initialization, use, or closing of FileSystem.
>
> Without going and reading more of the Spark code, if your app is
> explicitly close()'ing the FileSystem instance, it may be causing the
> exception. If Spark is caching the FileSystem instance, your app is
> probably closing that same instance (which it got from the HDFS
> library's internal cache).
>
> It would be nice if you could test that theory; it might be worth
> knowing that's the case so that we can tell people not to do that.
>
> --
> Marcelo
>


Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Andrew Ash
Fixing the immediate issue of requiring SPARK_HOME to be set when it's not
actually used is a separate ticket in my mind from a larger cleanup of what
SPARK_HOME means across the cluster.

I think you should file a new ticket for just this particular issue.


On Thu, May 22, 2014 at 11:03 AM, Gerard Maas  wrote:

> Sure.  Should I create a Jira as well?
>
> I saw there's already a broader ticket regarding the ambiguous use of
> SPARK_HOME [1]  (cc: Patrick as owner of that ticket)
>
> I don't know if it would be more relevant to remove the use of SPARK_HOME
> when using mesos and have the assembly as the only way forward, or whether
> that's a too radical change that might break some existing systems.
>
> From a real-world ops perspective, the assembly should be the way to go. I
> don't see installing and configuring Spark distros on a mesos master as a
> way to have the mesos executor in place.
>
> -kr, Gerard.
>
> [1] https://issues.apache.org/jira/browse/SPARK-1110
>
>
> On Thu, May 22, 2014 at 6:19 AM, Andrew Ash  wrote:
>
>> Hi Gerard,
>>
>> I agree that your second option seems preferred.  You shouldn't have to
>> specify a SPARK_HOME if the executor is going to use the
>> spark.executor.uri
>> instead.  Can you send in a pull request that includes your proposed
>> changes?
>>
>> Andrew
>>
>>
>> On Wed, May 21, 2014 at 10:19 AM, Gerard Maas 
>> wrote:
>>
>> > Spark dev's,
>> >
>> > I was looking into a question asked on the user list where a
>> > ClassNotFoundException was thrown when running a job on Mesos. Curious
>> > issue with serialization on Mesos: more details here [1]:
>> >
>> > When trying to run that simple example on my Mesos installation, I faced
>> > another issue: I got an error that "SPARK_HOME" was not set. I found
>> that
>> > curious b/c a local spark installation should not be required to run a
>> job
>> > on Mesos. All that's needed is the executor package, being the
>> > assembly.tar.gz on a reachable location (HDFS/S3/HTTP).
>> >
>> > I went looking into the code and indeed there's a check on SPARK_HOME
>> [2]
>> > regardless of the presence of the assembly but it's actually only used
>> if
>> > the assembly is not provided (which is a kind-of best-effort recovery
>> > strategy).
>> >
>> > Current flow:
>> >
>> > if (!SPARK_HOME) fail("No SPARK_HOME")
>> > else if (assembly) { use assembly) }
>> > else { try use SPARK_HOME to build spark_executor }
>> >
>> > Should be:
>> > sparkExecutor =  if (assembly) {assembly}
>> >  else if (SPARK_HOME) {try use SPARK_HOME to build
>> > spark_executor}
>> >  else { fail("No executor found. Please provide
>> > spark.executor.uri (preferred) or spark.home")
>> >
>> > What do you think?
>> >
>> > -kr, Gerard.
>> >
>> >
>> > [1]
>> >
>> >
>> http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-with-Spark-Mesos-spark-shell-works-fine-td6165.html
>> >
>> > [2]
>> >
>> >
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L89
>> >
>>
>
>


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Aaron Davidson
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
was fixed: https://issues.apache.org/jira/browse/SPARK-1676

Prior to this fix, each Spark task created and cached its own FileSystems
due to a bug in how the FS cache handles UGIs. The big problem that arose
was that these FileSystems were never closed, so they just kept piling up.
There were two solutions we considered, with the following effects: (1)
Share the FS cache among all tasks and (2) Each task effectively gets its
own FS cache, and closes all of its FSes after the task completes.

We chose solution (1) for 3 reasons:
 - It does not rely on the behavior of a bug in HDFS.
 - It is the most performant option.
 - It is most consistent with the semantics of the (albeit broken) FS cache.

Since this behavior was changed in 1.0, it could be considered a
regression. We should consider the exact behavior we want out of the FS
cache. For Spark's purposes, it seems fine to cache FileSystems across
tasks, as Spark does not close FileSystems. The issue that comes up is that
user code which uses FileSystem.get() but then closes the FileSystem can
screw up Spark processes which were using that FileSystem. The workaround
for users would be to use FileSystem.newInstance() if they want full
control over the lifecycle of their FileSystems.


On Thu, May 22, 2014 at 12:06 PM, Colin McCabe wrote:

> The FileSystem cache is something that has caused a lot of pain over the
> years.  Unfortunately we (in Hadoop core) can't change the way it works now
> because there are too many users depending on the current behavior.
>
> Basically, the idea is that when you request a FileSystem with certain
> options with FileSystem#get, you might get a reference to an FS object that
> already exists, from our FS cache cache singleton.  Unfortunately, this
> also means that someone else can change the working directory on you or
> close the FS underneath you.  The FS is basically shared mutable state, and
> you don't know whom you're sharing with.
>
> It might be better for Spark to call FileSystem#newInstance, which bypasses
> the FileSystem cache and always creates a new object.  If Spark can hang on
> to the FS for a while, it can get the benefits of caching without the
> downsides.  In HDFS, multiple FS instances can also share things like the
> socket cache between them.
>
> best,
> Colin
>
>
> On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin  >wrote:
>
> > Hi Kevin,
> >
> > On Thu, May 22, 2014 at 9:49 AM, Kevin Markey 
> > wrote:
> > > The FS closed exception only effects the cleanup of the staging
> > directory,
> > > not the final success or failure.  I've not yet tested the effect of
> > > changing my application's initialization, use, or closing of
> FileSystem.
> >
> > Without going and reading more of the Spark code, if your app is
> > explicitly close()'ing the FileSystem instance, it may be causing the
> > exception. If Spark is caching the FileSystem instance, your app is
> > probably closing that same instance (which it got from the HDFS
> > library's internal cache).
> >
> > It would be nice if you could test that theory; it might be worth
> > knowing that's the case so that we can tell people not to do that.
> >
> > --
> > Marcelo
> >
>


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey

Thank you, all!  This is quite helpful.

We have been arguing about how to handle this issue across a growing 
application.  Unfortunately, the Hadoop FileSystem Javadoc should say 
all this but doesn't!


Kevin

On 05/22/2014 01:48 PM, Aaron Davidson wrote:

In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
was fixed: https://issues.apache.org/jira/browse/SPARK-1676

Prior to this fix, each Spark task created and cached its own FileSystems
due to a bug in how the FS cache handles UGIs. The big problem that arose
was that these FileSystems were never closed, so they just kept piling up.
There were two solutions we considered, with the following effects: (1)
Share the FS cache among all tasks and (2) Each task effectively gets its
own FS cache, and closes all of its FSes after the task completes.

We chose solution (1) for 3 reasons:
  - It does not rely on the behavior of a bug in HDFS.
  - It is the most performant option.
  - It is most consistent with the semantics of the (albeit broken) FS cache.

Since this behavior was changed in 1.0, it could be considered a
regression. We should consider the exact behavior we want out of the FS
cache. For Spark's purposes, it seems fine to cache FileSystems across
tasks, as Spark does not close FileSystems. The issue that comes up is that
user code which uses FileSystem.get() but then closes the FileSystem can
screw up Spark processes which were using that FileSystem. The workaround
for users would be to use FileSystem.newInstance() if they want full
control over the lifecycle of their FileSystems.


On Thu, May 22, 2014 at 12:06 PM, Colin McCabe wrote:


The FileSystem cache is something that has caused a lot of pain over the
years.  Unfortunately we (in Hadoop core) can't change the way it works now
because there are too many users depending on the current behavior.

Basically, the idea is that when you request a FileSystem with certain
options with FileSystem#get, you might get a reference to an FS object that
already exists, from our FS cache cache singleton.  Unfortunately, this
also means that someone else can change the working directory on you or
close the FS underneath you.  The FS is basically shared mutable state, and
you don't know whom you're sharing with.

It might be better for Spark to call FileSystem#newInstance, which bypasses
the FileSystem cache and always creates a new object.  If Spark can hang on
to the FS for a while, it can get the benefits of caching without the
downsides.  In HDFS, multiple FS instances can also share things like the
socket cache between them.

best,
Colin


On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin 
wrote:
Hi Kevin,

On Thu, May 22, 2014 at 9:49 AM, Kevin Markey 
wrote:

The FS closed exception only effects the cleanup of the staging

directory,

not the final success or failure.  I've not yet tested the effect of
changing my application's initialization, use, or closing of

FileSystem.

Without going and reading more of the Spark code, if your app is
explicitly close()'ing the FileSystem instance, it may be causing the
exception. If Spark is caching the FileSystem instance, your app is
probably closing that same instance (which it got from the HDFS
library's internal cache).

It would be nice if you could test that theory; it might be worth
knowing that's the case so that we can tell people not to do that.

--
Marcelo





Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Hey all,

On further testing, I came across a bug that breaks execution of
pyspark scripts on YARN.
https://issues.apache.org/jira/browse/SPARK-1900
This is a blocker and worth cutting a new RC.

We also found a fix for a known issue that prevents additional jar
files from being specified through spark-submit on YARN.
https://issues.apache.org/jira/browse/SPARK-1870
This has been fixed and will be in the next RC.

We are canceling this vote for now. We will post RC11 shortly. Thanks
everyone for testing!

TD

On Thu, May 22, 2014 at 1:25 PM, Kevin Markey  wrote:
> Thank you, all!  This is quite helpful.
>
> We have been arguing how to handle this issue across a growing application.
> Unfortunately the Hadoop FileSystem java doc should say all this but
> doesn't!
>
> Kevin
>
>
> On 05/22/2014 01:48 PM, Aaron Davidson wrote:
>>
>> In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
>> and we just recently resumed using it in 1.0 (and in 0.9.2) when this
>> issue
>> was fixed: https://issues.apache.org/jira/browse/SPARK-1676
>>
>> Prior to this fix, each Spark task created and cached its own FileSystems
>> due to a bug in how the FS cache handles UGIs. The big problem that arose
>> was that these FileSystems were never closed, so they just kept piling up.
>> There were two solutions we considered, with the following effects: (1)
>> Share the FS cache among all tasks and (2) Each task effectively gets its
>> own FS cache, and closes all of its FSes after the task completes.
>>
>> We chose solution (1) for 3 reasons:
>>   - It does not rely on the behavior of a bug in HDFS.
>>   - It is the most performant option.
>>   - It is most consistent with the semantics of the (albeit broken) FS
>> cache.
>>
>> Since this behavior was changed in 1.0, it could be considered a
>> regression. We should consider the exact behavior we want out of the FS
>> cache. For Spark's purposes, it seems fine to cache FileSystems across
>> tasks, as Spark does not close FileSystems. The issue that comes up is
>> that
>> user code which uses FileSystem.get() but then closes the FileSystem can
>> screw up Spark processes which were using that FileSystem. The workaround
>> for users would be to use FileSystem.newInstance() if they want full
>> control over the lifecycle of their FileSystems.
>>
>>
>> On Thu, May 22, 2014 at 12:06 PM, Colin McCabe
>> wrote:
>>
>>> The FileSystem cache is something that has caused a lot of pain over the
>>> years.  Unfortunately we (in Hadoop core) can't change the way it works
>>> now
>>> because there are too many users depending on the current behavior.
>>>
>>> Basically, the idea is that when you request a FileSystem with certain
>>> options with FileSystem#get, you might get a reference to an FS object
>>> that
>>> already exists, from our FS cache cache singleton.  Unfortunately, this
>>> also means that someone else can change the working directory on you or
>>> close the FS underneath you.  The FS is basically shared mutable state,
>>> and
>>> you don't know whom you're sharing with.
>>>
>>> It might be better for Spark to call FileSystem#newInstance, which
>>> bypasses
>>> the FileSystem cache and always creates a new object.  If Spark can hang
>>> on
>>> to the FS for a while, it can get the benefits of caching without the
>>> downsides.  In HDFS, multiple FS instances can also share things like the
>>> socket cache between them.
>>>
>>> best,
>>> Colin
>>>
>>>
>>> On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin >>>
 wrote:
 Hi Kevin,

 On Thu, May 22, 2014 at 9:49 AM, Kevin Markey 
 wrote:
>
> The FS closed exception only effects the cleanup of the staging

 directory,
>
> not the final success or failure.  I've not yet tested the effect of
> changing my application's initialization, use, or closing of
>>>
>>> FileSystem.

 Without going and reading more of the Spark code, if your app is
 explicitly close()'ing the FileSystem instance, it may be causing the
 exception. If Spark is caching the FileSystem instance, your app is
 probably closing that same instance (which it got from the HDFS
 library's internal cache).

 It would be nice if you could test that theory; it might be worth
 knowing that's the case so that we can tell people not to do that.

 --
 Marcelo

>


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
On Thu, May 22, 2014 at 12:48 PM, Aaron Davidson  wrote:

> In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
> and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
> was fixed: https://issues.apache.org/jira/browse/SPARK-1676
>

Interesting...


> Prior to this fix, each Spark task created and cached its own FileSystems
> due to a bug in how the FS cache handles UGIs. The big problem that arose
> was that these FileSystems were never closed, so they just kept piling up.
> There were two solutions we considered, with the following effects: (1)
> Share the FS cache among all tasks and (2) Each task effectively gets its
> own FS cache, and closes all of its FSes after the task completes.
>

Since the FS cache is in hadoop-common-project, it's not so much a bug in
HDFS as a bug in Hadoop.  So even if you're using, say, Lustre, you'll
still get the same issues with org.apache.hadoop.fs.FileSystem and its
global cache.

> We chose solution (1) for 3 reasons:
>  - It does not rely on the behavior of a bug in HDFS.
>  - It is the most performant option.
>  - It is most consistent with the semantics of the (albeit broken) FS
> cache.
>
> Since this behavior was changed in 1.0, it could be considered a
> regression. We should consider the exact behavior we want out of the FS
> cache. For Spark's purposes, it seems fine to cache FileSystems across
> tasks, as Spark does not close FileSystems. The issue that comes up is that
> user code which uses FileSystem.get() but then closes the FileSystem can
> screw up Spark processes which were using that FileSystem. The workaround
> for users would be to use FileSystem.newInstance() if they want full
> control over the lifecycle of their FileSystems.
>

The current solution seems reasonable, as long as Spark processes:
1. don't change the current working directory (doing so isn't thread-safe
and will affect all other users of that FS object)
2. don't close the FileSystem object

Another solution would be to use newInstance and build your own FS cache,
essentially.  I don't think it would be that much code.  This might be
nicer because you could implement things like closing FileSystem objects
that haven't been used in a while.
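
A rough sketch of what such a cache could look like, entirely illustrative (this is
not existing Spark code); it relies only on FileSystem.newInstance, and eviction of
idle instances is left out:

import java.net.URI
import scala.collection.mutable

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Process-wide cache keyed by filesystem URI, independent of Hadoop's global cache.
object PrivateFsCache {
  private val cache = mutable.Map[URI, FileSystem]()

  def get(uri: URI, conf: Configuration): FileSystem = cache.synchronized {
    cache.getOrElseUpdate(uri, FileSystem.newInstance(uri, conf))
  }

  // Close everything we created; safe because these instances are never shared with
  // code that obtained a FileSystem via FileSystem.get().
  def closeAll(): Unit = cache.synchronized {
    cache.values.foreach(_.close())
    cache.clear()
  }
}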

cheers,
Colin



> On Thu, May 22, 2014 at 12:06 PM, Colin McCabe  >wrote:
>
> > The FileSystem cache is something that has caused a lot of pain over the
> > years.  Unfortunately we (in Hadoop core) can't change the way it works
> now
> > because there are too many users depending on the current behavior.
> >
> > Basically, the idea is that when you request a FileSystem with certain
> > options with FileSystem#get, you might get a reference to an FS object
> that
> > already exists, from our FS cache cache singleton.  Unfortunately, this
> > also means that someone else can change the working directory on you or
> > close the FS underneath you.  The FS is basically shared mutable state,
> and
> > you don't know whom you're sharing with.
> >
> > It might be better for Spark to call FileSystem#newInstance, which
> bypasses
> > the FileSystem cache and always creates a new object.  If Spark can hang
> on
> > to the FS for a while, it can get the benefits of caching without the
> > downsides.  In HDFS, multiple FS instances can also share things like the
> > socket cache between them.
> >
> > best,
> > Colin
> >
> >
> > On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin  > >wrote:
> >
> > > Hi Kevin,
> > >
> > > On Thu, May 22, 2014 at 9:49 AM, Kevin Markey  >
> > > wrote:
> > > > The FS closed exception only effects the cleanup of the staging
> > > directory,
> > > > not the final success or failure.  I've not yet tested the effect of
> > > > changing my application's initialization, use, or closing of
> > FileSystem.
> > >
> > > Without going and reading more of the Spark code, if your app is
> > > explicitly close()'ing the FileSystem instance, it may be causing the
> > > exception. If Spark is caching the FileSystem instance, your app is
> > > probably closing that same instance (which it got from the HDFS
> > > library's internal cache).
> > >
> > > It would be nice if you could test that theory; it might be worth
> > > knowing that's the case so that we can tell people not to do that.
> > >
> > > --
> > > Marcelo
> > >
> >
>


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Henry Saputra
Looks like SPARK-1900 is a blocker for YARN and might as well add
SPARK-1870 while at it.

TD or Patrick, could you kindly send [CANCEL] prefixed in the subject
email out for the RC10 Vote to help people follow the active VOTE
threads? The VOTE emails are getting a bit hard to follow.


- Henry


On Thu, May 22, 2014 at 2:05 PM, Tathagata Das
 wrote:
> Hey all,
>
> On further testing, I came across a bug that breaks execution of
> pyspark scripts on YARN.
> https://issues.apache.org/jira/browse/SPARK-1900
> This is a blocker and worth cutting a new RC.
>
> We also found a fix for a known issue that prevents additional jar
> files to be specified through spark-submit on YARN.
> https://issues.apache.org/jira/browse/SPARK-1870
> The has been fixed and will be in the next RC.
>
> We are canceling this vote for now. We will post RC11 shortly. Thanks
> everyone for testing!
>
> TD
>


Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Right! Doing that.

TD

On Thu, May 22, 2014 at 3:07 PM, Henry Saputra  wrote:
> Looks like SPARK-1900 is a blocker for YARN and might as well add
> SPARK-1870 while at it.
>
> TD or Patrick, could you kindly send [CANCEL] prefixed in the subject
> email out for the RC10 Vote to help people follow the active VOTE
> threads? The VOTE emails are getting a bit hard to follow.
>
>
> - Henry
>
>
> On Thu, May 22, 2014 at 2:05 PM, Tathagata Das
>  wrote:
>> Hey all,
>>
>> On further testing, I came across a bug that breaks execution of
>> pyspark scripts on YARN.
>> https://issues.apache.org/jira/browse/SPARK-1900
>> This is a blocker and worth cutting a new RC.
>>
>> We also found a fix for a known issue that prevents additional jar
>> files to be specified through spark-submit on YARN.
>> https://issues.apache.org/jira/browse/SPARK-1870
>> The has been fixed and will be in the next RC.
>>
>> We are canceling this vote for now. We will post RC11 shortly. Thanks
>> everyone for testing!
>>
>> TD
>>


[CANCEL][VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Hey all,

We are canceling the vote on RC10 because of a blocker bug in pyspark on Yarn.
https://issues.apache.org/jira/browse/SPARK-1900

Thanks everyone for testing! We will post RC11 soon.

TD


Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
ack


On Thu, May 22, 2014 at 9:26 PM, Andrew Ash  wrote:

> Fixing the immediate issue of requiring SPARK_HOME to be set when it's not
> actually used is a separate ticket in my mind from a larger cleanup of what
> SPARK_HOME means across the cluster.
>
> I think you should file a new ticket for just this particular issue.
>
>
> On Thu, May 22, 2014 at 11:03 AM, Gerard Maas wrote:
>
>> Sure.  Should I create a Jira as well?
>>
>> I saw there's already a broader ticket regarding the ambiguous use of
>> SPARK_HOME [1]  (cc: Patrick as owner of that ticket)
>>
>> I don't know if it would be more relevant to remove the use of SPARK_HOME
>> when using mesos and have the assembly as the only way forward, or whether
>> that's a too radical change that might break some existing systems.
>>
>> From a real-world ops perspective, the assembly should be the way to go.
>> I don't see installing and configuring Spark distros on a mesos master as a
>> way to have the mesos executor in place.
>>
>> -kr, Gerard.
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-1110
>>
>>
>> On Thu, May 22, 2014 at 6:19 AM, Andrew Ash  wrote:
>>
>>> Hi Gerard,
>>>
>>> I agree that your second option seems preferred.  You shouldn't have to
>>> specify a SPARK_HOME if the executor is going to use the
>>> spark.executor.uri
>>> instead.  Can you send in a pull request that includes your proposed
>>> changes?
>>>
>>> Andrew
>>>
>>>
>>> On Wed, May 21, 2014 at 10:19 AM, Gerard Maas 
>>> wrote:
>>>
>>> > Spark dev's,
>>> >
>>> > I was looking into a question asked on the user list where a
>>> > ClassNotFoundException was thrown when running a job on Mesos. Curious
>>> > issue with serialization on Mesos: more details here [1]:
>>> >
>>> > When trying to run that simple example on my Mesos installation, I
>>> faced
>>> > another issue: I got an error that "SPARK_HOME" was not set. I found
>>> that
>>> > curious b/c a local spark installation should not be required to run a
>>> job
>>> > on Mesos. All that's needed is the executor package, being the
>>> > assembly.tar.gz on a reachable location (HDFS/S3/HTTP).
>>> >
>>> > I went looking into the code and indeed there's a check on SPARK_HOME
>>> [2]
>>> > regardless of the presence of the assembly but it's actually only used
>>> if
>>> > the assembly is not provided (which is a kind-of best-effort recovery
>>> > strategy).
>>> >
>>> > Current flow:
>>> >
>>> > if (!SPARK_HOME) fail("No SPARK_HOME")
>>> > else if (assembly) { use assembly) }
>>> > else { try use SPARK_HOME to build spark_executor }
>>> >
>>> > Should be:
>>> > sparkExecutor =  if (assembly) {assembly}
>>> >  else if (SPARK_HOME) {try use SPARK_HOME to build
>>> > spark_executor}
>>> >  else { fail("No executor found. Please provide
>>> > spark.executor.uri (preferred) or spark.home")
>>> >
>>> > What do you think?
>>> >
>>> > -kr, Gerard.
>>> >
>>> >
>>> > [1]
>>> >
>>> >
>>> http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-with-Spark-Mesos-spark-shell-works-fine-td6165.html
>>> >
>>> > [2]
>>> >
>>> >
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L89
>>> >
>>>
>>
>>
>


java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-22 Thread prabeesh k
Hi,

I am trying to apply an inner join in Shark using 64MB and 27MB files. I am
able to run the following queries on Mesos:


   - "SELECT * FROM geoLocation1 "



   - """ SELECT * FROM geoLocation1  WHERE  country =  '"US"' """


But while trying an inner join as

 "SELECT * FROM geoLocation1 g1 INNER JOIN geoBlocks1 g2 ON (g1.locId =
g2.locId)"



I am getting the following error:


Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task 1.0:7 failed 4 times (most recent failure: Exception failure:
java.lang.OutOfMemoryError: Java heap space)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
 at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
 at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Please help me to resolve this.

Thanks in adv

regards,
prabeesh


Re: java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-22 Thread Akhil Das
Hi Prabeesh,

Do an export _JAVA_OPTIONS="-Xmx10g" before starting Shark. Also, you can
do a ps aux | grep shark and see how much memory it has been allocated;
most likely it is 512MB, in which case increase the limit.

Thanks
Best Regards


On Fri, May 23, 2014 at 10:22 AM, prabeesh k  wrote:

>
> Hi,
>
> I am trying to apply  inner join in shark using 64MB and 27MB files. I am
> able to run the following queris on Mesos
>
>
>- "SELECT * FROM geoLocation1 "
>
>
>
>- """ SELECT * FROM geoLocation1  WHERE  country =  '"US"' """
>
>
> But while trying inner join as
>
>  "SELECT * FROM geoLocation1 g1 INNER JOIN geoBlocks1 g2 ON (g1.locId =
> g2.locId)"
>
>
>
> I am getting following error as follows.
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task 1.0:7 failed 4 times (most recent failure: Exception failure:
> java.lang.OutOfMemoryError: Java heap space)
>  at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>  at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>  at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>  at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>  at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>  at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>  at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
> Please help me to resolve this.
>
> Thanks in adv
>
> regards,
> prabeesh
>