Re: unit testing in spark

2017-04-11 Thread Elliot West
Jörn, I'm interested in your point on coverage. Coverage has been a useful tool for highlighting areas of the codebase that are a source of potential risk. However, generally speaking, I've found that traditional coverage tools do not provide useful information when applied to distributed data pro

Re: unit testing in spark

2017-04-11 Thread Steve Loughran
(sorry, sent an empty reply by accident) Unit testing is one of the easiest ways to isolate problems in an internal class, things you can get wrong. But: time spent writing unit tests is time *not* spent writing integration tests. Which biases me towards integration tests. What I do find is go

Re: unit testing in spark

2017-04-10 Thread Jörn Franke
I think in the end you need to check the coverage of your application. If your application is well covered at the job or pipeline level (this depends, however, on how you implement these tests), then it can be fine. In the end it really depends on the data and what kind of transformation you implement.

Re: unit testing in spark

2017-04-10 Thread Gokula Krishnan D
Hello Shiv, Unit testing really helps when you follow the TDD approach. It's also a safe way to code a program locally, and you can make use of those test cases during the build process with any of the continuous integration tools (Bamboo, Jenkins). If so, you can ensure that artifacts are be

Re: unit testing in spark

2017-04-05 Thread Shiva Ramagopal
Hi, I've been following this thread for a while. I'm trying to bring a test strategy into my team to test a number of data pipelines before production. I have watched Lars' presentation and find it great. However, I'm debating whether unit tests are worth the effort if there are good job-level an

Re: unit testing in spark

2016-12-11 Thread Juan Rodríguez Hortalá
Hi all, I would also like to participate in that. Greetings, Juan On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton < michael.strat...@komodohealth.com> wrote: > That sounds great, please include me so I can get involved. > > On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni > wrote: > >> M

Re: unit testing in spark

2016-12-09 Thread Michael Stratton
That sounds great, please include me so I can get involved. On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni wrote: > Me too as I spent most of my time writing unit/integ tests pls advise > on where I can start > Kr > > On 9 Dec 2016 12:15 am, "Miguel Morales" wrote: > >> I would be interes

Re: unit testing in spark

2016-12-09 Thread Marco Mistroni
Me too, as I spent most of my time writing unit/integ tests; pls advise on where I can start. Kr On 9 Dec 2016 12:15 am, "Miguel Morales" wrote: > I would be interested in contributing. Ive created my own library for > this as well. In my blog post I talk about testing with Spark in RSpec >

Re: unit testing in spark

2016-12-08 Thread Miguel Morales
Sure, I'd love to participate. Being new to Scala, things like dependency injection are still a bit iffy. Would love to exchange ideas. Sent from my iPhone > On Dec 8, 2016, at 4:29 PM, Holden Karau wrote: > > Maybe diverging a bit from the original question - but would it maybe make > sense

Re: unit testing in spark

2016-12-08 Thread Holden Karau
Maybe diverging a bit from the original question - but would it maybe make sense for those of us that all care about testing to try and do a hangout at some point so that we can exchange ideas? On Thu, Dec 8, 2016 at 4:15 PM, Miguel Morales wrote: > I would be interested in contributing. Ive cr

Re: unit testing in spark

2016-12-08 Thread Miguel Morales
I would be interested in contributing. I've created my own library for this as well. In my blog post I talk about testing with Spark in RSpec style: https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941 Sent from my iPhone > On Dec 8, 2016, at 4:09 PM, Holden Ka

Re: unit testing in spark

2016-12-08 Thread Holden Karau
There are also libraries designed to simplify testing Spark in the various platforms, spark-testing-base for Scala/Java/Python (& video https://www.youtube.com/watch?v=f69gSGSLGrY), sscheck (scala focused property ba

Re: unit testing in spark

2016-12-08 Thread Lars Albertsson
I wrote some advice in a previous post on the list: http://markmail.org/message/bbs5acrnksjxsrrs It does not mention python, but the strategy advice is the same. Just replace JUnit/Scalatest with pytest, unittest, or your favourite python test framework. I recently held a presentation on the sub
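As a sketch of that suggestion, a pytest-style test can exercise pipeline logic that has been factored out of the Spark job into a plain function. The `sum_by_key` function and its data are illustrative (not from the thread); keeping the aggregation logic in ordinary Python means no SparkContext is needed to test it:

```python
# Illustrative pytest-style test: the aggregation a Spark job would perform
# with reduceByKey is factored into a plain function so it can be tested
# without a cluster or a SparkContext.
from collections import defaultdict


def sum_by_key(pairs):
    """Sum values per key, mirroring what rdd.reduceByKey(operator.add) would do."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def test_sum_by_key():
    assert sum_by_key([("a", 1), ("b", 2), ("a", 3)]) == {"a": 4, "b": 2}


def test_empty_input():
    assert sum_by_key([]) == {}


if __name__ == "__main__":
    test_sum_by_key()
    test_empty_input()
    print("ok")
```

Running this file directly (or via `pytest`) exercises the same logic the job applies at scale; only the job-level test then needs a local Spark session.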

Re: unit testing in spark

2016-12-08 Thread ndjido
Hi Pseudo, Just use unittest https://docs.python.org/2/library/unittest.html . > On 8 Dec 2016, at 19:14, pseudo oduesp wrote: > > Can someone tell me how I can do unit testing on PySpark? > (book, tutorial ...)
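A minimal unittest sketch along those lines. The `parse_record` function here is hypothetical (not from the thread); the point is that logic destined for `rdd.map()` can be written as a plain function and tested without a SparkContext:

```python
import unittest


# Hypothetical transformation that would normally run inside rdd.map():
# keeping it a plain function makes it testable without a SparkContext.
def parse_record(line):
    """Parse a comma-separated 'name,age' line into a (name, age) tuple."""
    name, age = line.split(",")
    return (name.strip(), int(age))


class ParseRecordTest(unittest.TestCase):
    def test_parses_valid_line(self):
        self.assertEqual(parse_record("alice, 31"), ("alice", 31))

    def test_rejects_non_numeric_age(self):
        with self.assertRaises(ValueError):
            parse_record("bob, not-a-number")


if __name__ == "__main__":
    # exit=False so the process keeps running after the test report.
    unittest.main(argv=["parse_record_test"], exit=False)
```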

unit testing in spark

2016-12-08 Thread pseudo oduesp
Can someone tell me how I can do unit testing on PySpark? (book, tutorial ...)

Re: Scala: Perform Unit Testing in spark

2016-04-06 Thread Shishir Anshuman
I placed the *tests* jars in the *lib* folder, and now it's working. On Wed, Apr 6, 2016 at 7:34 PM, Lars Albertsson wrote: > Hi, > > I wrote a longish mail on Spark testing strategy last month, which you > may find useful: > http://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser > >

Re: Scala: Perform Unit Testing in spark

2016-04-06 Thread Lars Albertsson
Hi, I wrote a longish mail on Spark testing strategy last month, which you may find useful: http://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser Let me know if you have follow up questions or want assistance. Regards, Lars Albertsson Data engineering consultant www.mapflat.c

Re: Scala: Perform Unit Testing in spark

2016-04-02 Thread Ted Yu
I think you should specify dependencies in this way: *"org.apache.spark" % "spark-core_2.10" % "1.6.0"* % "tests" Please refer to http://www.scalatest.org/user_guide/using_scalatest_with_sbt On Fri, Apr 1, 2016 at 3:33 PM, Shishir Anshuman wrote: > When I added *"org.apache.spark" % "spark-cor
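For reference, the usual sbt way to pull in a test-jar is with a classifier rather than a bare `% "tests"` configuration. A build.sbt sketch (versions follow the thread and are long outdated; adjust for your Spark and Scala versions):

```scala
// build.sbt fragment (sketch): spark-core plus its test-jar, which
// contains SparkFunSuite, and ScalaTest for writing the suites.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-core" % "1.6.0" % "test" classifier "tests",
  "org.scalatest"    %% "scalatest"  % "2.2.6" % "test"
)
```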

Re: Scala: Perform Unit Testing in spark

2016-04-01 Thread Shishir Anshuman
When I added *"org.apache.spark" % "spark-core_2.10" % "1.6.0"*, it should include spark-core_2.10-1.6.1-tests.jar. Why do I need to use the jar file explicitly? And how do I use the jars for compiling with *sbt* and running the tests on Spark? On Sat, Apr 2, 2016 at 3:46 AM, Ted Yu wrote: >

Re: Scala: Perform Unit Testing in spark

2016-04-01 Thread Ted Yu
You need to include the following jars: jar tvf ./core/target/spark-core_2.10-1.6.1-tests.jar | grep SparkFunSuite 1787 Thu Mar 03 09:06:14 PST 2016 org/apache/spark/SparkFunSuite$$anonfun$withFixture$1.class 1780 Thu Mar 03 09:06:14 PST 2016 org/apache/spark/SparkFunSuite$$anonfun$withFixture

Re: Scala: Perform Unit Testing in spark

2016-04-01 Thread Holden Karau
You can also look at spark-testing-base, which works with both ScalaTest and JUnit, and see if it works for your use case. On Friday, April 1, 2016, Ted Yu wrote: > Assuming your code is written in Scala, I would suggest using ScalaTest. > > Please take a look at the XXSuite.scala files under mlli

Re: Scala: Perform Unit Testing in spark

2016-04-01 Thread Ted Yu
Assuming your code is written in Scala, I would suggest using ScalaTest. Please take a look at the XXSuite.scala files under mllib/ On Fri, Apr 1, 2016 at 1:31 PM, Shishir Anshuman wrote: > Hello, > > I have code written in Scala using MLlib. I want to perform unit testing > on it. I can't decide

Scala: Perform Unit Testing in spark

2016-04-01 Thread Shishir Anshuman
Hello, I have code written in Scala using MLlib. I want to perform unit testing on it. I can't decide between JUnit 4 and ScalaTest. I am new to Spark. Please guide me on how to proceed with the testing. Thank you.