Re: Update Public Documentation - SparkSession instead of SparkContext

2017-02-15 Thread Reynold Xin
There is an existing pull request to update it: https://github.com/apache/spark/pull/16856 But it is a little bit tricky. On Wed, Feb 15, 2017 at 7:44 AM, Chetan Khatri wrote: > Hello Spark Dev Team, > > I was working with my team having most of the confusion that why your > public documentat

Re: Update Public Documentation - SparkSession instead of SparkContext

2017-02-15 Thread Chetan Khatri
Sorry, the context I am referring to is the URL below: http://spark.apache.org/docs/2.0.1/programming-guide.html On Wed, Feb 15, 2017 at 1:12 PM, Sean Owen wrote: > When asking a question like this, please actually link to what you are > referring to. Some is intended. > > > On Wed, Feb 15, 2017,

Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Sam Elamin
Hey folks, this one is mainly aimed at the Databricks folks. I have been trying to replicate the CloudTrail demo Michael did at Spark Summit. The code for it can be found here

File JIRAs for all flaky test failures

2017-02-15 Thread Kay Ousterhout
Hi all, I've noticed the Spark tests getting increasingly flaky -- it seems more common than not now that the tests need to be re-run at least once on PRs before they pass. This is both annoying and problematic because it makes it harder to tell when a PR is introducing new flakiness. To try to

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Nicholas Chammas
I don't think this is the right place for questions about Databricks. I'm pretty sure they have their own website with a forum for questions about their product. Maybe this? https://forums.databricks.com/ On Wed, Feb 15, 2017 at 2:34 PM Sam Elamin wrote: > Hey folks > > This one is mainly aimed

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Sam Elamin
Fair enough, you're absolutely right. Thanks for pointing me in the right direction. On Wed, 15 Feb 2017 at 20:13, Nicholas Chammas wrote: > I don't think this is the right place for questions about Databricks. I'm > pretty sure they have their own website with a forum for questions about > their pro

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-15 Thread Chris Fregly
Just be warned: the last time I asked a question about a non-working Databricks Keynote Demo from Spark Summit on the forum mentioned here, they deleted my question! And I’m a major contributor to those forums!! Oftentimes, those on-stage demos don’t actually work until many months after the

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
I was working on something to address this a while ago (https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty in testing locally made things a lot more complicated to fix for each of the unit tests. Should we resurface this JIRA again? I would wholeheartedly agree with the flakine

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Armin Braun
I think one thing that is contributing to this a lot too is the general issue of the tests taking up a lot of file descriptors (10k+ if I run them on a standard Debian machine). There are a few suites that contribute to this in particular like `org.apache.spark.ExecutorAllocationManagerSuite` which,
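Editorial note: the descriptor count mentioned above can be checked for any process. The following is a minimal Python sketch, not something from the thread; `count_open_fds` is a hypothetical helper name. It assumes a Linux-style `/proc` filesystem and otherwise falls back to probing a bounded range of descriptor numbers. It measures the Python process that runs it, not a Spark test JVM.

```python
import os


def count_open_fds():
    """Return the number of file descriptors open in the current process."""
    proc_dir = "/proc/self/fd"
    if os.path.isdir(proc_dir):
        # Each directory entry is one open descriptor. Listing the directory
        # temporarily opens one descriptor of its own, so subtract it.
        return len(os.listdir(proc_dir)) - 1
    # Fallback for systems without /proc: probe a bounded range of numbers.
    count = 0
    for fd in range(4096):
        try:
            os.fstat(fd)
        except OSError:
            continue
        count += 1
    return count


if __name__ == "__main__":
    print("open file descriptors:", count_open_fds())
```

To inspect a different process, such as a running test JVM, the same idea applies to `/proc/<pid>/fd` on Linux, given sufficient permissions.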

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
I would recommend we just open JIRAs for unit tests based on module (core/ml/sql etc.) and fix this one module at a time; this at least keeps the number of unit tests needing fixing down to a manageable number. From: Armin Braun Sent: Wednesday, February 15,

Re: File JIRAs for all flaky test failures

2017-02-15 Thread shane knapp
it's not an open-file limit -- i have the jenkins workers set up w/a soft file limit of 100k, and a hard limit of 200k. On Wed, Feb 15, 2017 at 12:48 PM, Armin Braun wrote: > I think one thing that is contributing to this a lot too is the general > issue of the tests taking up a lot of file desc
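Editorial note: the soft and hard open-file limits referred to above can be read from inside a process. The sketch below uses Python's standard `resource` module (Unix only); `nofile_limits` and `describe` are hypothetical helper names, and the values printed depend entirely on the machine it runs on, so no output is shown.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={describe(soft)} hard={describe(hard)}")
```

The soft limit is the one actually enforced on the process; an unprivileged process may raise its own soft limit up to, but not beyond, the hard limit.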

Does anyone here run the TheApacheSpark Youtube channel?

2017-02-15 Thread Sean Owen
I just saw https://www.youtube.com/user/TheApacheSpark and wondered who 'owns' it. If it's a quasi-official channel, can we list it on http://spark.apache.org/community.html? But then, how does one add videos? If it's the Spark Summit video account, as it seems to be at the moment, it shouldn't be

Need Help: getting java.lang.OutOfMemory Error : GC overhead limit exceeded (TransportChannelHandler)

2017-02-15 Thread naresh gundla
Hi, I am running a Spark application and getting out-of-memory errors in the YARN NodeManager logs, and the containers get killed. Please find the error details below. Has anyone faced this issue? *Enabled spark dynamic allocation and yarn shuffle* 2017-02-15 14:50:48,047 WARN io.netty.util.co

Re: Need Help: getting java.lang.OutOfMemory Error : GC overhead limit exceeded (TransportChannelHandler)

2017-02-15 Thread Ryan Blue
Naresh, We've configured our Spark JVMs to shut down if there is an OutOfMemoryError. Otherwise, the error will bring down a random thread and cause trouble like the IllegalStateException you hit. It is best to let Spark recover by replacing the executor or failing the job. rb On Wed, Feb 15, 201

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Josh Rosen
A useful tool for investigating test flakiness is my Jenkins Test Explorer service, running at https://spark-tests.appspot.com/ This has some useful timeline views for debugging flaky builds. For instance, at https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.6 (may be slow to l

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
The issue was not a lack of tooling. I used the URL you describe below to drill down to the exact test failure/stack trace. The problem was that my builds would work like a charm locally but fail with these errors on Jenkins; this was the whole challenge in fixing the unit tests, it w

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-15 Thread Joseph Bradley
Congrats and welcome! On Mon, Feb 13, 2017 at 6:54 PM, Takuya UESHIN wrote: > Thank you very much everyone! > I really look forward to working with you! > > > On Tue, Feb 14, 2017 at 9:47 AM, Yanbo Liang wrote: > >> Congratulations! >> >> On Mon, Feb 13, 2017 at 3:29 PM, Kazuaki Ishizaki >> wr

Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Hello All, What would be the best approaches to monitor Spark performance? Are there any tools for Spark job performance monitoring? Thanks.

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-15 Thread Liang-Chi Hsieh
Congratulations! Takuya UESHIN wrote > Thank you very much everyone! > I really look forward to working with you! > > > On Tue, Feb 14, 2017 at 9:47 AM, Yanbo Liang < > ybliang8@ > > wrote: > >> Congratulations! >> >> On Mon, Feb 13, 2017 at 3:29 PM, Kazuaki Ishizaki < > ISHIZAKI@.ibm > >

Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Georg Heiler
I know of the following tools: https://sites.google.com/site/sparkbigdebug/home https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark https://github.com/SparkMonitor/varOne https://github.com/groupon/sparklint Chetan Khatri wrote on Thu

Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Thank you Georg On Thu, Feb 16, 2017 at 12:30 PM, Georg Heiler wrote: > I know of the following tools > https://sites.google.com/site/sparkbigdebug/home https:// > engineering.linkedin.com/blog/2016/04/dr-elephant-open- > source-self-serve-performance-tuning-hadoop-spark https:// > github.com/Sp