Re: Update Public Documentation - SparkSession instead of SparkContext

2017-02-15 Thread Reynold Xin
There is an existing pull request to update it:
https://github.com/apache/spark/pull/16856

But it is a little bit tricky.



On Wed, Feb 15, 2017 at 7:44 AM, Chetan Khatri 
wrote:

> Hello Spark Dev Team,
>
> My team and I were confused about why your public documentation has not been
> updated to use SparkSession, given that SparkSession is the recommended entry
> point and best practice instead of creating a SparkContext directly.
>
> Thanks.
>
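
For context, the change under discussion is moving documentation examples from
the pre-2.0 `new SparkContext(conf)` pattern to the Spark 2.x unified entry
point. A minimal Scala sketch of that entry point (the app name and master
setting are illustrative, not taken from the docs):

import org.apache.spark.sql.SparkSession

// Spark 2.x style: build a SparkSession instead of constructing a SparkContext
val spark = SparkSession.builder()
  .appName("example")
  .master("local[2]")
  .getOrCreate()

// The underlying SparkContext is still reachable when RDD APIs are needed
val sc = spark.sparkContext
val squares = sc.parallelize(1 to 10).map(x => x * x).collect()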


Re: Update Public Documentation - SparkSession instead of SparkContext

2017-02-15 Thread Chetan Khatri
Sorry, the context I am referring to is the URL below:
http://spark.apache.org/docs/2.0.1/programming-guide.html



On Wed, Feb 15, 2017 at 1:12 PM, Sean Owen  wrote:

> When asking a question like this, please actually link to what you are
> referring to. Some of it is intended.
>
>
> On Wed, Feb 15, 2017, 06:44 Chetan Khatri 
> wrote:
>
>> Hello Spark Dev Team,
>>
>> My team and I were confused about why your public documentation has not been
>> updated to use SparkSession, given that SparkSession is the recommended entry
>> point and best practice instead of creating a SparkContext directly.
>>
>> Thanks.
>>
>


Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Sam Elamin
Hey folks

This one is mainly aimed at the Databricks folks. I have been trying to
replicate the CloudTrail demo Michael did at Spark Summit. The code for it can
be found here.


My question is: how did you get the results to be displayed and updated
continuously in real time?

I am also using Databricks to duplicate it, but I noticed the code link
mentions:

"If you count the number of rows in the table, you should find the value
increasing over time. Run the following every few minutes."

This leads me to believe that the version of Databricks that Michael was
using for the demo is still not released, or at least the functionality to
display those changes in real time isn't.

Is this the case, or am I completely wrong?

Can I display the results of a structured streaming query in real time using
the Databricks "display" function?


Regards
Sam
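
Outside of Databricks' display(), one way to reproduce what the quoted notebook
text describes -- a table whose row count grows while the stream runs -- is a
memory-sink query that is re-counted periodically. A rough Scala sketch; the
input path and schema below are placeholders, not the demo's actual source:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().appName("streaming-count").getOrCreate()

// Placeholder schema and path standing in for the CloudTrail JSON logs
val schema = new StructType()
  .add("eventName", StringType)
  .add("eventTime", StringType)
val events = spark.readStream.schema(schema).json("/tmp/cloudtrail-logs/")

// Continuously append the stream into an in-memory table named "events"
val query = events.writeStream
  .format("memory")
  .queryName("events")
  .outputMode("append")
  .start()

// Re-run this batch count every few minutes and watch it grow, which is
// what the quoted notebook instruction describes
spark.sql("SELECT count(*) FROM events").show()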


File JIRAs for all flaky test failures

2017-02-15 Thread Kay Ousterhout
Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more
common than not now that the tests need to be re-run at least once on PRs
before they pass.  This is both annoying and problematic because it makes
it harder to tell when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
fails on a PR (for a reason unrelated to the PR).  Just provide a quick
description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
"Tests failed because 250m timeout expired", a link to the failed build,
and include the "Tests" component.  If there's already a JIRA for the
issue, just comment with a link to the latest failure.  I know folks don't
always have time to track down why a test failed, but this is at least
helpful to someone else who, later on, is trying to diagnose when the issue
started to find the problematic code / test.

If this seems like too high overhead, feel free to suggest alternative ways
to make the tests less flaky!

-Kay


Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Nicholas Chammas
I don't think this is the right place for questions about Databricks. I'm
pretty sure they have their own website with a forum for questions about
their product.

Maybe this? https://forums.databricks.com/

On Wed, Feb 15, 2017 at 2:34 PM Sam Elamin  wrote:

> Hey folks
>
> This one is mainly aimed at the Databricks folks. I have been trying to
> replicate the CloudTrail demo Michael did at Spark Summit. The code for it
> can be found here.
>
> My question is: how did you get the results to be displayed and updated
> continuously in real time?
>
> I am also using Databricks to duplicate it, but I noticed the code link
> mentions:
>
> "If you count the number of rows in the table, you should find the value
> increasing over time. Run the following every few minutes."
>
> This leads me to believe that the version of Databricks that Michael was
> using for the demo is still not released, or at least the functionality to
> display those changes in real time isn't.
>
> Is this the case, or am I completely wrong?
>
> Can I display the results of a structured streaming query in real time
> using the Databricks "display" function?
>
>
> Regards
> Sam
>


Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Sam Elamin
Fair enough, you're absolutely right.

Thanks for pointing me in the right direction
On Wed, 15 Feb 2017 at 20:13, Nicholas Chammas 
wrote:

> I don't think this is the right place for questions about Databricks. I'm
> pretty sure they have their own website with a forum for questions about
> their product.
>
> Maybe this? https://forums.databricks.com/
>
> On Wed, Feb 15, 2017 at 2:34 PM Sam Elamin 
> wrote:
>
> Hey folks
>
> This one is mainly aimed at the Databricks folks. I have been trying to
> replicate the CloudTrail demo Michael did at Spark Summit. The code for it
> can be found here.
>
> My question is: how did you get the results to be displayed and updated
> continuously in real time?
>
> I am also using Databricks to duplicate it, but I noticed the code link
> mentions:
>
> "If you count the number of rows in the table, you should find the value
> increasing over time. Run the following every few minutes."
>
> This leads me to believe that the version of Databricks that Michael was
> using for the demo is still not released, or at least the functionality to
> display those changes in real time isn't.
>
> Is this the case, or am I completely wrong?
>
> Can I display the results of a structured streaming query in real time
> using the Databricks "display" function?
>
>
> Regards
> Sam
>
>


Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Chris Fregly
Just be warned: the last time I asked a question about a non-working
Databricks keynote demo from Spark Summit on the forum mentioned here, they
deleted my question! And I’m a major contributor to those forums!!

Oftentimes, those on-stage demos don’t actually work until many months after
they’re presented on stage - especially the proprietary demos involving
dbutils() and display().

Chris Fregly
Research Scientist @ PipelineIO
Founder @ Advanced Spark and TensorFlow Meetup
San Francisco - Chicago - Washington DC - London

On Feb 15, 2017, 12:14 PM -0800, Nicholas Chammas wrote:
> I don't think this is the right place for questions about Databricks. I'm 
> pretty sure they have their own website with a forum for questions about 
> their product.
>
> Maybe this? https://forums.databricks.com/
>
> > On Wed, Feb 15, 2017 at 2:34 PM Sam Elamin  wrote:
> > > Hey folks
> > >
> > > This one is mainly aimed at the Databricks folks. I have been trying to
> > > replicate the CloudTrail demo Michael did at Spark Summit. The code for
> > > it can be found here.
> > >
> > > My question is: how did you get the results to be displayed and updated
> > > continuously in real time?
> > >
> > > I am also using Databricks to duplicate it, but I noticed the code link
> > > mentions:
> > >
> > > "If you count the number of rows in the table, you should find the value
> > > increasing over time. Run the following every few minutes."
> > >
> > > This leads me to believe that the version of Databricks that Michael was
> > > using for the demo is still not released, or at least the functionality
> > > to display those changes in real time isn't.
> > >
> > > Is this the case, or am I completely wrong?
> > >
> > > Can I display the results of a structured streaming query in real time
> > > using the Databricks "display" function?
> > >
> > >
> > > Regards
> > > Sam


Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
I was working on something to address this a while ago
(https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of testing
locally made things a lot more complicated to fix for each of the unit tests.
Should we resurface this JIRA again? I wholeheartedly agree with the flakiness
assessment of the unit tests.

[SPARK-9487] Use the same num. worker threads in Scala 
...
issues.apache.org
In Python we use `local[4]` for unit tests, while in Scala/Java we use 
`local[2]` and `local` for some unit tests in SQL, MLLib, and other components. 
If the ...





From: Kay Ousterhout 
Sent: Wednesday, February 15, 2017 12:10 PM
To: dev@spark.apache.org
Subject: File JIRAs for all flaky test failures

Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more common 
than not now that the tests need to be re-run at least once on PRs before they 
pass.  This is both annoying and problematic because it makes it harder to tell 
when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins fails 
on a PR (for a reason unrelated to the PR).  Just provide a quick description 
of the failure -- e.g., "Flaky test: DagSchedulerSuite" or "Tests failed 
because 250m timeout expired", a link to the failed build, and include the 
"Tests" component.  If there's already a JIRA for the issue, just comment with 
a link to the latest failure.  I know folks don't always have time to track 
down why a test failed, but this is at least helpful to someone else who, later
on, is trying to diagnose when the issue started to find the problematic code / 
test.

If this seems like too high overhead, feel free to suggest alternative ways to 
make the tests less flaky!

-Kay


Re: File JIRAs for all flaky test failures

2017-02-15 Thread Armin Braun
I think one thing that is contributing to this a lot too is the general
issue of the tests taking up a lot of file descriptors (10k+ if I run them
on a standard Debian machine).
There are a few suites that contribute to this in particular, like
`org.apache.spark.ExecutorAllocationManagerSuite` which, like a few others,
appears to consume a lot of fds.

Wouldn't it make sense to open JIRAs about those and actively try to reduce
the resource consumption of these tests?
It seems to me these can cause a lot of unpredictable behavior (making the
reason for flaky tests hard to identify, especially when there are timeouts
etc. involved), and they make it prohibitively expensive for many to test
locally, IMO.
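
To put rough numbers on that claim before filing such JIRAs, a small Scala
helper a suite could print on Linux -- it just counts the entries in
/proc/self/fd for the current test JVM (the helper name is made up here):

import java.io.File

// Count the open file descriptors of the current JVM (Linux only);
// returns -1 if /proc/self/fd is not available.
def openFdCount(): Int =
  Option(new File("/proc/self/fd").listFiles()).map(_.length).getOrElse(-1)

println(s"open fds in this test JVM: ${openFdCount()}")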

On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal 
wrote:

> I was working on something to address this a while ago
> (https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of
> testing locally made things a lot more complicated to fix for each of the
> unit tests. Should we resurface this JIRA again? I wholeheartedly agree
> with the flakiness assessment of the unit tests.
> [SPARK-9487] Use the same num. worker threads in Scala ...
> 
> issues.apache.org
> In Python we use `local[4]` for unit tests, while in Scala/Java we use
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other
> components. If the ...
>
>
>
> --
> *From:* Kay Ousterhout 
> *Sent:* Wednesday, February 15, 2017 12:10 PM
> *To:* dev@spark.apache.org
> *Subject:* File JIRAs for all flaky test failures
>
> Hi all,
>
> I've noticed the Spark tests getting increasingly flaky -- it seems more
> common than not now that the tests need to be re-run at least once on PRs
> before they pass.  This is both annoying and problematic because it makes
> it harder to tell when a PR is introducing new flakiness.
>
> To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
> fails on a PR (for a reason unrelated to the PR).  Just provide a quick
> description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
> "Tests failed because 250m timeout expired", a link to the failed build,
> and include the "Tests" component.  If there's already a JIRA for the
> issue, just comment with a link to the latest failure.  I know folks don't
> always have time to track down why a test failed, but this is at least
> helpful to someone else who, later on, is trying to diagnose when the issue
> started to find the problematic code / test.
>
> If this seems like too high overhead, feel free to suggest alternative
> ways to make the tests less flaky!
>
> -Kay
>


Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
I would recommend we just open JIRAs for unit tests based on module
(core/ml/sql, etc.) and fix them one module at a time; this at least keeps the
number of unit tests needing fixing down to a manageable number.



From: Armin Braun 
Sent: Wednesday, February 15, 2017 12:48 PM
To: Saikat Kanjilal
Cc: Kay Ousterhout; dev@spark.apache.org
Subject: Re: File JIRAs for all flaky test failures

I think one thing that is contributing to this a lot too is the general issue 
of the tests taking up a lot of file descriptors (10k+ if I run them on a 
standard Debian machine).
There are a few suites that contribute to this in particular, like
`org.apache.spark.ExecutorAllocationManagerSuite` which, like a few others, 
appears to consume a lot of fds.

Wouldn't it make sense to open JIRAs about those and actively try to reduce the 
resource consumption of these tests?
It seems to me these can cause a lot of unpredictable behavior (making the reason
for flaky tests hard to identify, especially when there are timeouts etc.
involved), and they make it prohibitively expensive for many to test locally, IMO.

On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

I was working on something to address this a while ago
(https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of testing
locally made things a lot more complicated to fix for each of the unit tests.
Should we resurface this JIRA again? I wholeheartedly agree with the flakiness
assessment of the unit tests.

[SPARK-9487] Use the same num. worker threads in Scala 
...
issues.apache.org
In Python we use `local[4]` for unit tests, while in Scala/Java we use 
`local[2]` and `local` for some unit tests in SQL, MLLib, and other components. 
If the ...





From: Kay Ousterhout <kayousterh...@gmail.com>
Sent: Wednesday, February 15, 2017 12:10 PM
To: dev@spark.apache.org
Subject: File JIRAs for all flaky test failures

Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more common 
than not now that the tests need to be re-run at least once on PRs before they 
pass.  This is both annoying and problematic because it makes it harder to tell 
when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins fails 
on a PR (for a reason unrelated to the PR).  Just provide a quick description 
of the failure -- e.g., "Flaky test: DagSchedulerSuite" or "Tests failed 
because 250m timeout expired", a link to the failed build, and include the 
"Tests" component.  If there's already a JIRA for the issue, just comment with 
a link to the latest failure.  I know folks don't always have time to track 
down why a test failed, but this is at least helpful to someone else who, later
on, is trying to diagnose when the issue started to find the problematic code / 
test.

If this seems like too high overhead, feel free to suggest alternative ways to 
make the tests less flaky!

-Kay



Re: File JIRAs for all flaky test failures

2017-02-15 Thread shane knapp
it's not an open-file limit -- i have the jenkins workers set up with a soft
file limit of 100k, and a hard limit of 200k.

On Wed, Feb 15, 2017 at 12:48 PM, Armin Braun  wrote:

> I think one thing that is contributing to this a lot too is the general
> issue of the tests taking up a lot of file descriptors (10k+ if I run them
> on a standard Debian machine).
> There are a few suites that contribute to this in particular, like
> `org.apache.spark.ExecutorAllocationManagerSuite` which, like a few
> others, appears to consume a lot of fds.
>
> Wouldn't it make sense to open JIRAs about those and actively try to
> reduce the resource consumption of these tests?
> It seems to me these can cause a lot of unpredictable behavior (making the
> reason for flaky tests hard to identify, especially when there are timeouts
> etc. involved), and they make it prohibitively expensive for many to test
> locally, IMO.
>
> On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal 
> wrote:
>
>> I was working on something to address this a while ago
>> (https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of
>> testing locally made things a lot more complicated to fix for each of the
>> unit tests. Should we resurface this JIRA again? I wholeheartedly agree
>> with the flakiness assessment of the unit tests.
>> [SPARK-9487] Use the same num. worker threads in Scala ...
>> 
>> issues.apache.org
>> In Python we use `local[4]` for unit tests, while in Scala/Java we use
>> `local[2]` and `local` for some unit tests in SQL, MLLib, and other
>> components. If the ...
>>
>>
>>
>> --
>> *From:* Kay Ousterhout 
>> *Sent:* Wednesday, February 15, 2017 12:10 PM
>> *To:* dev@spark.apache.org
>> *Subject:* File JIRAs for all flaky test failures
>>
>> Hi all,
>>
>> I've noticed the Spark tests getting increasingly flaky -- it seems more
>> common than not now that the tests need to be re-run at least once on PRs
>> before they pass.  This is both annoying and problematic because it makes
>> it harder to tell when a PR is introducing new flakiness.
>>
>> To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
>> fails on a PR (for a reason unrelated to the PR).  Just provide a quick
>> description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
>> "Tests failed because 250m timeout expired", a link to the failed build,
>> and include the "Tests" component.  If there's already a JIRA for the
>> issue, just comment with a link to the latest failure.  I know folks don't
>> always have time to track down why a test failed, but this is at least
>> helpful to someone else who, later on, is trying to diagnose when the issue
>> started to find the problematic code / test.
>>
>> If this seems like too high overhead, feel free to suggest alternative
>> ways to make the tests less flaky!
>>
>> -Kay
>>
>
>


Does anyone here run the TheApacheSpark Youtube channel?

2017-02-15 Thread Sean Owen
I just saw https://www.youtube.com/user/TheApacheSpark and wondered who
'owns' it. If it's a quasi-official channel, can we list it on
http://spark.apache.org/community.html? But then, how does one add videos?

If it's the Spark Summit video account, as it seems to be at the moment, it
shouldn't be called "Apache Spark", right?


Need Help: getting java.lang.OutOfMemory Error : GC overhead limit exceeded (TransportChannelHandler)

2017-02-15 Thread naresh gundla
Hi,

I am running a Spark application and getting out-of-memory errors in the YARN
NodeManager logs, and the container gets killed. Please find the error details
below.
Has anyone faced this issue?

*Enabled spark dynamic allocation and yarn shuffle*

 2017-02-15 14:50:48,047 WARN io.netty.util.concurrent.DefaultPromise: An
exception was thrown by
org.apache.spark.network.server.TransportRequestHandler$2.operationComplete()
java.lang.OutOfMemoryError: GC overhead limit exceeded
2017-02-15 15:21:09,506 ERROR
org.apache.spark.network.server.TransportRequestHandler: Error opening
block StreamChunkId{streamId=1374579274227, chunkIndex=241} for request
from /10.154.16.83:50042
java.lang.IllegalStateException: Received out-of-order chunk index 241
(expected 114)
at
org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:81)
at
org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:121)
at
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:100)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)

2017-02-15 14:50:14,692 WARN
org.apache.spark.network.server.TransportChannelHandler: Exception in
connection from /10.154.16.74:58547
java.lang.OutOfMemoryError: GC overhead limit exceeded

Thanks
Naresh


Re: Need Help: getting java.lang.OutOfMemory Error : GC overhead limit exceeded (TransportChannelHandler)

2017-02-15 Thread Ryan Blue
Naresh,

We've configured our Spark JVMs to shut down if there is an
OutOfMemoryError. Otherwise, the error will bring down a random thread and
cause trouble like the IllegalStateException you hit. It is best to let
Spark recover by replacing the executor or failing the job.

rb
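
A sketch of the kind of setting Ryan describes, assuming it is applied through
Spark configuration (the exact flag depends on the JDK: -XX:+ExitOnOutOfMemoryError
needs JDK 8u92+, while older JVMs can use -XX:OnOutOfMemoryError="kill -9 %p"):

import org.apache.spark.SparkConf

// Ask the executor (and driver) JVMs to exit immediately on OOM instead of
// limping along with a dead thread; Spark/YARN can then replace them.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:+ExitOnOutOfMemoryError")
  .set("spark.driver.extraJavaOptions", "-XX:+ExitOnOutOfMemoryError")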

On Wed, Feb 15, 2017 at 1:58 PM, naresh gundla 
wrote:

> Hi,
>
> I am running a Spark application and getting out-of-memory errors in the YARN
> NodeManager logs, and the container gets killed. Please find the error details
> below.
> Has anyone faced this issue?
>
> *Enabled spark dynamic allocation and yarn shuffle*
>
>  2017-02-15 14:50:48,047 WARN io.netty.util.concurrent.DefaultPromise: An
> exception was thrown by org.apache.spark.network.server.
> TransportRequestHandler$2.operationComplete()
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2017-02-15 15:21:09,506 ERROR 
> org.apache.spark.network.server.TransportRequestHandler:
> Error opening block StreamChunkId{streamId=1374579274227, chunkIndex=241}
> for request from /10.154.16.83:50042
> java.lang.IllegalStateException: Received out-of-order chunk index 241
> (expected 114)
> at org.apache.spark.network.server.OneForOneStreamManager.
> getChunk(OneForOneStreamManager.java:81)
> at org.apache.spark.network.server.TransportRequestHandler.
> processFetchRequest(TransportRequestHandler.java:121)
> at org.apache.spark.network.server.TransportRequestHandler.handle(
> TransportRequestHandler.java:100)
> at org.apache.spark.network.server.TransportChannelHandler.
> channelRead0(TransportChannelHandler.java:104)
> at org.apache.spark.network.server.TransportChannelHandler.
> channelRead0(TransportChannelHandler.java:51)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(
> SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:319)
> at io.netty.handler.timeout.IdleStateHandler.channelRead(
> IdleStateHandler.java:254)
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:319)
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
> MessageToMessageDecoder.java:103)
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:319)
> at org.apache.spark.network.util.TransportFrameDecoder.
> channelRead(TransportFrameDecoder.java:86)
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:319)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(
> DefaultChannelPipeline.java:787)
> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(
> AbstractNioByteChannel.java:130)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(
> NioEventLoop.java:511)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(
> NioEventLoop.java:468)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(
> NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.
> run(SingleThreadEventExecutor.java:116)
> at java.lang.Thread.run(Thread.java:745)
>
> 2017-02-15 14:50:14,692 WARN 
> org.apache.spark.network.server.TransportChannelHandler:
> Exception in connection from /10.154.16.74:58547
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> Thanks
> Naresh
>
>


-- 
Ryan Blue
Software Engineer
Netflix


Re: File JIRAs for all flaky test failures

2017-02-15 Thread Josh Rosen
A useful tool for investigating test flakiness is my Jenkins Test Explorer
service, running at https://spark-tests.appspot.com/

This has some useful timeline views for debugging flaky builds. For
instance, at
https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.6 (may
be slow to load) you can see this chart: https://i.imgur.com/j8LV3pX.png.
Here, each column represents a test run and each row represents a test
which failed at least once over the displayed time period.

In that linked example screenshot you'll notice that a few columns have
grey squares indicating that tests were skipped but lack any red squares to
indicate test failures. This usually indicates that the build failed due to
a problem other than an individual test failure. For example, I clicked
into one of those builds and found that one test suite failed in test setup
because the previous suite had not properly cleaned up its SparkContext
(I'll file a JIRA for this).

You can click through the interface to drill down to reports on individual
builds, tests, suites, etc. As an example of an individual test's detail
page,
https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
shows
the patterns of flakiness in a streaming checkpoint test.

Finally, there's an experimental "interesting new test failures" report
which tries to surface tests which have started failing very recently:
https://spark-tests.appspot.com/failed-tests/new. Specifically, entries in
this feed are test failures which a) occurred in the last week, b) were not
part of a build which had 20 or more failed tests, and c) were not observed
to fail in during the previous week (i.e. no failures from [2 weeks ago, 1
week ago)), and d) which represent the first time that the test failed this
week (i.e. a test case will appear at most once in the results list). I've
also exposed this as an RSS feed at
https://spark-tests.appspot.com/rss/failed-tests/new.
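
Restated as code, those four criteria look roughly like the following Scala
sketch (the Failure record type is made up here; the dashboard's actual storage
isn't shown in this thread):

import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical record for one observed test failure
case class Failure(testName: String, buildUrl: String, at: Instant, failuresInBuild: Int)

def interestingNewFailures(all: Seq[Failure], now: Instant): Seq[Failure] = {
  val weekAgo = now.minus(7, ChronoUnit.DAYS)
  val twoWeeksAgo = now.minus(14, ChronoUnit.DAYS)
  // tests that already failed in [2 weeks ago, 1 week ago)
  val lastWeek = all
    .filter(f => f.at.isAfter(twoWeeksAgo) && f.at.isBefore(weekAgo))
    .map(_.testName).toSet
  all.filter(_.at.isAfter(weekAgo))                 // (a) failed in the last week
    .filter(_.failuresInBuild < 20)                 // (b) not part of a build-wide breakage
    .filterNot(f => lastWeek.contains(f.testName))  // (c) did not fail the previous week
    .groupBy(_.testName)
    .values
    .map(_.minBy(_.at.toEpochMilli))                // (d) first failure per test this week
    .toSeq
}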


On Wed, Feb 15, 2017 at 12:51 PM Saikat Kanjilal 
wrote:

I would recommend we just open JIRAs for unit tests based on module
(core/ml/sql, etc.) and fix them one module at a time; this at least keeps
the number of unit tests needing fixing down to a manageable number.


--
*From:* Armin Braun 
*Sent:* Wednesday, February 15, 2017 12:48 PM
*To:* Saikat Kanjilal
*Cc:* Kay Ousterhout; dev@spark.apache.org
*Subject:* Re: File JIRAs for all flaky test failures

I think one thing that is contributing to this a lot too is the general
issue of the tests taking up a lot of file descriptors (10k+ if I run them
on a standard Debian machine).
There are a few suites that contribute to this in particular, like
`org.apache.spark.ExecutorAllocationManagerSuite` which, like a few others,
appears to consume a lot of fds.

Wouldn't it make sense to open JIRAs about those and actively try to reduce
the resource consumption of these tests?
It seems to me these can cause a lot of unpredictable behavior (making the
reason for flaky tests hard to identify, especially when there are timeouts
etc. involved), and they make it prohibitively expensive for many to test
locally, IMO.

On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal 
wrote:

I was working on something to address this a while ago
(https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of
testing locally made things a lot more complicated to fix for each of the
unit tests. Should we resurface this JIRA again? I wholeheartedly agree
with the flakiness assessment of the unit tests.
[SPARK-9487] Use the same num. worker threads in Scala ...

issues.apache.org
In Python we use `local[4]` for unit tests, while in Scala/Java we use
`local[2]` and `local` for some unit tests in SQL, MLLib, and other
components. If the ...



--
*From:* Kay Ousterhout 
*Sent:* Wednesday, February 15, 2017 12:10 PM
*To:* dev@spark.apache.org
*Subject:* File JIRAs for all flaky test failures

Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more
common than not now that the tests need to be re-run at least once on PRs
before they pass.  This is both annoying and problematic because it makes
it harder to tell when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
fails on a PR (for a reason unrelated to the PR).  Just provide a quick
description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
"Tests failed because 250m timeout expired", a link to the failed build,
and include the "Tests" component.  If there's already a JIRA for the
issue, just comment with a link to the latest failure.  I know folks don't
always have time to track down why a test failed, but this is at least
helpful to someone else who, later on, is trying to diagnose when the issue
started to find the problematic code / test

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
The issue was not a lack of tooling. I used the URL you describe below to drill
down to the exact test failure/stack trace. The problem was that my builds would
work like a charm locally but fail with these errors on Jenkins; that was the
whole challenge in fixing the unit tests. It was rare (if ever) that I could
replicate test failures locally.

Sent from my iPhone

On Feb 15, 2017, at 5:40 PM, Josh Rosen <joshro...@databricks.com> wrote:

A useful tool for investigating test flakiness is my Jenkins Test Explorer 
service, running at https://spark-tests.appspot.com/

This has some useful timeline views for debugging flaky builds. For instance, 
at https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.6 (may 
be slow to load) you can see this chart: https://i.imgur.com/j8LV3pX.png. Here, 
each column represents a test run and each row represents a test which failed 
at least once over the displayed time period.

In that linked example screenshot you'll notice that a few columns have grey 
squares indicating that tests were skipped but lack any red squares to indicate 
test failures. This usually indicates that the build failed due to a problem 
other than an individual test failure. For example, I clicked into one of those 
builds and found that one test suite failed in test setup because the previous 
suite had not properly cleaned up its SparkContext (I'll file a JIRA for this).

You can click through the interface to drill down to reports on individual 
builds, tests, suites, etc. As an example of an individual test's detail page, 
https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
 shows the patterns of flakiness in a streaming checkpoint test.

Finally, there's an experimental "interesting new test failures" report which 
tries to surface tests which have started failing very recently: 
https://spark-tests.appspot.com/failed-tests/new. Specifically, entries in this 
feed are test failures which a) occurred in the last week, b) were not part of 
a build which had 20 or more failed tests, and c) were not observed to fail in 
during the previous week (i.e. no failures from [2 weeks ago, 1 week ago)), and 
d) which represent the first time that the test failed this week (i.e. a test 
case will appear at most once in the results list). I've also exposed this as 
an RSS feed at https://spark-tests.appspot.com/rss/failed-tests/new.


On Wed, Feb 15, 2017 at 12:51 PM Saikat Kanjilal <sxk1...@hotmail.com> wrote:

I would recommend we just open JIRAs for unit tests based on module
(core/ml/sql, etc.) and fix them one module at a time; this at least keeps the
number of unit tests needing fixing down to a manageable number.



From: Armin Braun <m...@obrown.io>
Sent: Wednesday, February 15, 2017 12:48 PM
To: Saikat Kanjilal
Cc: Kay Ousterhout; dev@spark.apache.org
Subject: Re: File JIRAs for all flaky test failures

I think one thing that is contributing to this a lot too is the general issue 
of the tests taking up a lot of file descriptors (10k+ if I run them on a 
standard Debian machine).
There are a few suites that contribute to this in particular, like
`org.apache.spark.ExecutorAllocationManagerSuite` which, like a few others, 
appears to consume a lot of fds.

Wouldn't it make sense to open JIRAs about those and actively try to reduce the 
resource consumption of these tests?
It seems to me these can cause a lot of unpredictable behavior (making the reason
for flaky tests hard to identify, especially when there are timeouts etc.
involved), and they make it prohibitively expensive for many to test locally, IMO.

On Wed, Feb 15, 2017 at 9:24 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

I was working on something to address this a while ago
(https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of testing
locally made things a lot more complicated to fix for each of the unit tests.
Should we resurface this JIRA again? I wholeheartedly agree with the flakiness
assessment of the unit tests.

[SPARK-9487] Use the same num. worker threads in Scala 
...
issues.apache.org
In Python we use `local[4]` for unit tests, while in Scala/Java we use 
`local[2]` and `local` for some unit tests in SQL, MLLib, and other components. 
If the ...





From: Kay Ousterhout <kayousterh...@gmail.com>
Sent: Wednesday, February 15, 2017 12:10 PM
To: dev@spark.apache.org
Subject: File JIRAs for all flaky test failures

Hi all,

I've noticed the Spark tests getting increasingly flaky -- it seems more common 
than not now that the tests need to be re-run at least once on PRs before they 
pass.  This is both annoying and problematic

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-15 Thread Joseph Bradley
Congrats and welcome!

On Mon, Feb 13, 2017 at 6:54 PM, Takuya UESHIN 
wrote:

> Thank you very much everyone!
> I really look forward to working with you!
>
>
> On Tue, Feb 14, 2017 at 9:47 AM, Yanbo Liang  wrote:
>
>> Congratulations!
>>
>> On Mon, Feb 13, 2017 at 3:29 PM, Kazuaki Ishizaki 
>> wrote:
>>
>>> Congrats!
>>>
>>> Kazuaki Ishizaki
>>>
>>>
>>>
>>> From: Reynold Xin
>>> To: "dev@spark.apache.org"
>>> Date: 2017/02/14 04:18
>>> Subject: welcoming Takuya Ueshin as a new Apache Spark committer
>>> --
>>>
>>>
>>>
>>> Hi all,
>>>
>>> Takuya-san has recently been elected an Apache Spark committer. He's
>>> been active in the SQL area and writes very small, surgical patches that
>>> are high quality. Please join me in congratulating Takuya-san!
>>>
>>>
>>>
>>>
>>
>
>
> --
> Takuya UESHIN
> Tokyo, Japan
>
> http://twitter.com/ueshin
>



-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.



Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Hello All,

What would be the best approaches to monitor Spark performance? Are there any
tools for Spark job performance monitoring?

Thanks.


Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-15 Thread Liang-Chi Hsieh

Congratulations!


Takuya UESHIN wrote
> Thank you very much everyone!
> I really look forward to working with you!
>
>
> On Tue, Feb 14, 2017 at 9:47 AM, Yanbo Liang <ybliang8@...> wrote:
>
>> Congratulations!
>>
>> On Mon, Feb 13, 2017 at 3:29 PM, Kazuaki Ishizaki <ISHIZAKI@...> wrote:
>>
>>> Congrats!
>>>
>>> Kazuaki Ishizaki
>>>
>>> From: Reynold Xin <rxin@...>
>>> To: "dev@spark.apache.org"
>>> Date: 2017/02/14 04:18
>>> Subject: welcoming Takuya Ueshin as a new Apache Spark committer
>>> --
>>>
>>> Hi all,
>>>
>>> Takuya-san has recently been elected an Apache Spark committer. He's been
>>> active in the SQL area and writes very small, surgical patches that are
>>> high quality. Please join me in congratulating Takuya-san!
>>>
>>
>
> --
> Takuya UESHIN
> Tokyo, Japan
>
> http://twitter.com/ueshin





-
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 



Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Georg Heiler
I know of the following tools:
https://sites.google.com/site/sparkbigdebug/home
https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
https://github.com/SparkMonitor/varOne
https://github.com/groupon/sparklint
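
Besides those tools, a lightweight do-it-yourself option is a custom
SparkListener that logs task metrics; a minimal sketch (the class and package
names here are made up):

import org.apache.spark.SparkConf
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs per-task run time, GC time and shuffle read volume as tasks finish
class TaskTimingListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} run=${m.executorRunTime} ms " +
        s"gc=${m.jvmGCTime} ms shuffleRead=${m.shuffleReadMetrics.totalBytesRead} B")
    }
  }
}

// Register it with spark.extraListeners (fully qualified class name required)
// or programmatically via sc.addSparkListener(new TaskTimingListener)
val conf = new SparkConf().set("spark.extraListeners", "com.example.TaskTimingListener")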


Chetan Khatri wrote on Thu, Feb 16, 2017
at 06:15:

> Hello All,
>
> What would be the best approaches to monitor Spark performance? Are there
> any tools for Spark job performance monitoring?
>
> Thanks.
>


Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Thank you Georg

On Thu, Feb 16, 2017 at 12:30 PM, Georg Heiler 
wrote:

> I know of the following tools:
> https://sites.google.com/site/sparkbigdebug/home
> https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
> https://github.com/SparkMonitor/varOne
> https://github.com/groupon/sparklint
>
> Chetan Khatri wrote on Thu, Feb 16, 2017
> at 06:15:
>
>> Hello All,
>>
>> What would be the best approaches to monitor Spark performance? Are there
>> any tools for Spark job performance monitoring?
>>
>> Thanks.
>>
>