Re: unit testing for spark code

2021-03-22 Thread Attila Zsolt Piros
Hi! Let me draw your attention to Holden's *spark-testing-base* project. The documentation is at https://github.com/holdenk/spark-testing-base/wiki. As I usually write tests for Spark internal features, I haven't needed to test at such a high level, but I am interested in your experiences. Best regar

Re: unit testing for spark code

2021-03-22 Thread Nicholas Gustafson
I've found pytest works well if you're using PySpark. Though if you have a lot of tests, running them all can be pretty slow. On Mon, Mar 22, 2021 at 6:32 AM Amit Sharma wrote: > Hi, can we write unit tests for spark code. Is there any specific > framework? > > > Thanks > Amit >

Re: unit testing for spark code

2021-03-22 Thread Mich Talebzadeh
Coding in Scala or Python? Are you using any IDE (IntelliJ, PyCharm)?

Re: Unit testing Spark/Scala code with Mockito

2020-05-20 Thread ZHANG Wei
AFAICT, it depends on the testing goal: unit test, integration test, or E2E test. A unit test mostly tests an individual class or its methods. Mockito can help mock and verify dependent instances or methods. For integration tests, some Spark testing helper methods can set up the environment, such as
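
A minimal sketch of the mock-and-verify idea described above. The thread is about Mockito on Scala; this uses Python's standard `unittest.mock` as a stand-in, and `ReportJob`, `source`, and `sink` are hypothetical names invented for illustration.

```python
from unittest import mock


class ReportJob:
    """Hypothetical class under test; its collaborators are injected."""

    def __init__(self, source, sink):
        self.source = source
        self.sink = sink

    def run(self):
        rows = self.source.load()
        total = sum(r["amount"] for r in rows)
        self.sink.save({"total": total})
        return total


def test_report_job_sums_and_saves():
    # Stub the dependency instead of reading real data.
    source = mock.Mock()
    source.load.return_value = [{"amount": 2}, {"amount": 3}]
    sink = mock.Mock()

    assert ReportJob(source, sink).run() == 5

    # Verify the interactions, much as Mockito's verify() would.
    source.load.assert_called_once_with()
    sink.save.assert_called_once_with({"total": 5})
```

The class never touches Spark here, which is what keeps this a unit test rather than an integration test.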

Re: Unit testing Spark/Scala code with Mockito

2020-05-20 Thread Mich Talebzadeh
On a second note, with regard to Spark reads and writes: as I understand it, unit tests are not meant to test database connections. That should be done in integration tests, which check that all the parts work together. Unit tests are just meant to test the functional logic, and not Spark's ability to read from
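
One way to act on this split, sketched in Python under the assumption that the job can be written as pure logic plus a thin I/O shell. `high_value_customers`, `run_job`, `read_orders`, and `write_result` are hypothetical names; in a real job the two callables would wrap Spark or JDBC reads and writes, which would be covered by integration tests instead.

```python
def high_value_customers(orders, threshold):
    """Pure logic: total per customer, keep those at or above the threshold."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount
    return sorted(c for c, t in totals.items() if t >= threshold)


def run_job(read_orders, write_result, threshold=100):
    """Thin I/O shell: the reader and writer are passed in by the caller."""
    write_result(high_value_customers(read_orders(), threshold))


def test_high_value_customers():
    orders = [("ann", 60), ("bob", 20), ("ann", 50)]
    assert high_value_customers(orders, 100) == ["ann"]


def test_run_job_with_in_memory_io():
    # No database or Spark session: the test supplies in-memory I/O.
    written = []
    run_job(lambda: [("ann", 120), ("bob", 99)], written.append)
    assert written == [["ann"]]
```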

Re: unit testing in spark

2017-04-11 Thread Elliot West
Jörn, I'm interested in your point on coverage. Coverage has been a useful tool for highlighting areas in the codebase that pose a source of potential risk. However, generally speaking, I've found that traditional coverage tools do not provide useful information when applied to distributed data pro

Re: unit testing in spark

2017-04-11 Thread Steve Loughran
(sorry sent an empty reply by accident) Unit testing is one of the easiest ways to isolate problems in an internal class, things you can get wrong. But: time spent writing unit tests is time *not* spent writing integration tests. Which biases me towards the integration. What I do find is go

Re: unit testing in spark

2017-04-10 Thread Jörn Franke
I think in the end you need to check the coverage of your application. If your application is well covered on the job or pipeline level (depends however on how you implement these tests) then it can be fine. In the end it really depends on the data and what kind of transformation you implement.

Re: unit testing in spark

2017-04-10 Thread Gokula Krishnan D
Hello Shiv, unit testing really helps when you follow a TDD approach. It's a safe way to develop a program locally, and you can also run those test cases during the build process with any continuous integration tool (Bamboo, Jenkins). That way you can ensure that artifacts are be

Re: unit testing in spark

2017-04-05 Thread Shiva Ramagopal
Hi, I've been following this thread for a while. I'm trying to bring in a test strategy in my team to test a number of data pipelines before production. I have watched Lars' presentation and find it great. However I'm debating whether unit tests are worth the effort if there are good job-level an

Re: unit testing in spark

2016-12-11 Thread Juan Rodríguez Hortalá
Hi all, I would also like to participate in that. Greetings, Juan On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton < michael.strat...@komodohealth.com> wrote: > That sounds great, please include me so I can get involved. > > On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni > wrote: > >> M

Re: unit testing in spark

2016-12-09 Thread Michael Stratton
That sounds great, please include me so I can get involved. On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni wrote: > Me too as I spent most of my time writing unit/integ tests pls advise > on where I can start > Kr > > On 9 Dec 2016 12:15 am, "Miguel Morales" wrote: > >> I would be interes

Re: unit testing in spark

2016-12-09 Thread Marco Mistroni
Me too as I spent most of my time writing unit/integ tests pls advise on where I can start Kr On 9 Dec 2016 12:15 am, "Miguel Morales" wrote: > I would be interested in contributing. Ive created my own library for > this as well. In my blog post I talk about testing with Spark in RSpec >

Re: unit testing in spark

2016-12-08 Thread Miguel Morales
Sure I'd love to participate. Being new at Scala things like dependency injection are still a bit iffy. Would love to exchange ideas. Sent from my iPhone > On Dec 8, 2016, at 4:29 PM, Holden Karau wrote: > > Maybe diverging a bit from the original question - but would it maybe make > sense

Re: unit testing in spark

2016-12-08 Thread Holden Karau
Maybe diverging a bit from the original question - but would it maybe make sense for those of us that all care about testing to try and do a hangout at some point so that we can exchange ideas? On Thu, Dec 8, 2016 at 4:15 PM, Miguel Morales wrote: > I would be interested in contributing. Ive cr

Re: unit testing in spark

2016-12-08 Thread Miguel Morales
I would be interested in contributing. I've created my own library for this as well. In my blog post I talk about testing with Spark in RSpec style: https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941 Sent from my iPhone > On Dec 8, 2016, at 4:09 PM, Holden Ka

Re: unit testing in spark

2016-12-08 Thread Holden Karau
There are also libraries designed to simplify testing Spark in the various platforms, spark-testing-base for Scala/Java/Python (& video https://www.youtube.com/watch?v=f69gSGSLGrY), sscheck (scala focused property ba

Re: unit testing in spark

2016-12-08 Thread Lars Albertsson
I wrote some advice in a previous post on the list: http://markmail.org/message/bbs5acrnksjxsrrs It does not mention python, but the strategy advice is the same. Just replace JUnit/Scalatest with pytest, unittest, or your favourite python test framework. I recently held a presentation on the sub

Re: unit testing in spark

2016-12-08 Thread ndjido
Hi Pseudo, just use unittest: https://docs.python.org/2/library/unittest.html > On 8 Dec 2016, at 19:14, pseudo oduesp wrote: > > someone can tell me how i can make unit test on pyspark ? > (book, tutorial ...)
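
A minimal sketch of what "just use unittest" can look like, assuming the per-record logic of the job lives in a plain function. `tokenize` is a hypothetical example of something a PySpark job might pass to `rdd.flatMap`; it is not from the thread.

```python
import unittest


def tokenize(line):
    """Hypothetical per-record function: split on whitespace and lowercase."""
    return [w.lower() for w in line.split()]


class TokenizeTest(unittest.TestCase):
    def test_splits_and_lowercases(self):
        self.assertEqual(tokenize("Hello  Spark World"), ["hello", "spark", "world"])

    def test_empty_line(self):
        self.assertEqual(tokenize(""), [])
```

Saved in a file such as `test_tokenize.py` (the file name is arbitrary), this runs with `python -m unittest test_tokenize`.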

Re: Unit testing framework for Spark Jobs?

2016-05-21 Thread Lars Albertsson
Not that I can share, unfortunately. It is on my backlog to create a repository with examples, but I am currently a bit overloaded, so don't hold your breath. :-/ If you want to be notified when it happens, please follow me on Twitter or Google+. See web site below for links. Regards, Lars Alber

Re: Unit testing framework for Spark Jobs?

2016-05-18 Thread Todd Nist
Perhaps these may be of some use: https://github.com/mkuthan/example-spark http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/ https://github.com/holdenk/spark-testing-base On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote: > Hi Lars, > > Do you have any examples for the methods

Re: Unit testing framework for Spark Jobs?

2016-05-18 Thread swetha kasireddy
Hi Lars, Do you have any examples for the methods that you described for Spark batch and Streaming? Thanks! On Wed, Mar 30, 2016 at 2:41 AM, Lars Albertsson wrote: > Thanks! > > It is on my backlog to write a couple of blog posts on the topic, and > eventually some example code, but I am curre

Re: Unit testing framework for Spark Jobs?

2016-03-30 Thread Lars Albertsson
Thanks! It is on my backlog to write a couple of blog posts on the topic, and eventually some example code, but I am currently busy with clients. Thanks for the pointer to Eventually - I was unaware. Fast exit on exception would be a useful addition, indeed. Lars Albertsson Data engineering cons
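
For readers unfamiliar with the idea, here is a rough Python sketch of an "eventually"-style helper: retry an assertion until it passes or a timeout expires. It is an illustration written for this page, not ScalaTest's implementation, and reading "fast exit on exception" as "let non-assertion errors propagate at once" is an assumption. The names `eventually` and `demo` are hypothetical.

```python
import threading
import time


def eventually(check, timeout=5.0, interval=0.05):
    """Call `check` until it stops raising AssertionError or `timeout` elapses.

    Only AssertionError is retried. Any other exception propagates
    immediately, so a crashed job fails the test without waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return check()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


def demo():
    """Simulate job output that arrives asynchronously after a short delay."""
    results = []
    threading.Timer(0.1, results.append, args=["done"]).start()

    def output_has_arrived():
        assert results == ["done"]

    eventually(output_has_arrived, timeout=2.0)
    return results
```

Polling like this avoids fixed `sleep` calls in tests of streaming or otherwise asynchronous jobs.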

Re: Unit testing framework for Spark Jobs?

2016-03-28 Thread Steve Loughran
This is a good summary. Have you thought of publishing it at a URL for others to refer to? > On 18 Mar 2016, at 07:05, Lars Albertsson wrote: > > I would recommend against writing unit tests for Spark programs, and > instead focus on integration tests of jobs or pipelines of several >

Re: Unit testing framework for Spark Jobs?

2016-03-24 Thread Shiva Ramagopal
Hi Lars, Very pragmatic ideas around testing of Spark applications end-to-end! -Shiva On Fri, Mar 18, 2016 at 12:35 PM, Lars Albertsson wrote: > I would recommend against writing unit tests for Spark programs, and > instead focus on integration tests of jobs or pipelines of several > jobs. You

Re: Unit testing framework for Spark Jobs?

2016-03-19 Thread Vikas Kawadia

Re: Unit testing framework for Spark Jobs?

2016-03-19 Thread Lars Albertsson
I would recommend against writing unit tests for Spark programs, and instead focus on integration tests of jobs or pipelines of several jobs. You can still use a unit test framework to execute them. Perhaps this is what you meant. You can use any of the popular unit test frameworks to drive your t

Re: Unit testing framework for Spark Jobs?

2016-03-02 Thread radoburansky
I am sure you have googled this: https://github.com/holdenk/spark-testing-base On Wed, Mar 2, 2016 at 6:54 PM, SRK [via Apache Spark User List] < ml-node+s1001560n2638...@n3.nabble.com> wrote: > Hi, > > What is a good unit testing framework for Spark batch/streaming jobs? I > have core spark, spa

Re: Unit testing framework for Spark Jobs?

2016-03-02 Thread Ricardo Paiva
I use plain old JUnit. Spark batch example:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.junit.AfterClass
import org.junit.Assert.assertEquals
import org.junit.BeforeClass
import org.junit.Test

object TestMyCode {

Re: Unit testing framework for Spark Jobs?

2016-03-02 Thread Silvio Fiorito
Please check out the following for some good resources: https://github.com/holdenk/spark-testing-base https://spark-summit.org/east-2016/events/beyond-collect-and-parallelize-for-tests/ On 3/2/16, 12:54 PM, "SRK" wrote: >Hi, > >What is a good unit testing framework for Spark batch/streami

Re: Unit testing framework for Spark Jobs?

2016-03-02 Thread Yin Yang
Cycling prior bits: http://search-hadoop.com/m/q3RTto4sby1Cd2rt&subj=Re+Unit+test+with+sqlContext On Wed, Mar 2, 2016 at 9:54 AM, SRK wrote: > Hi, > > What is a good unit testing framework for Spark batch/streaming jobs? I > have > core spark, spark sql with dataframes and streaming api getting

Re: Unit Testing

2015-08-13 Thread Burak Yavuz
I would recommend this spark package for your unit testing needs ( http://spark-packages.org/package/holdenk/spark-testing-base). Best, Burak On Thu, Aug 13, 2015 at 5:51 AM, jay vyas wrote: > yes there certainly is, so long as eclipse has the right plugins and so on > to run scala programs. Y

Re: Unit Testing

2015-08-13 Thread jay vyas
Yes, there certainly is, so long as Eclipse has the right plugins and so on to run Scala programs. You're really asking two questions: (1) can I use a modern IDE to develop Spark apps, and (2) can we easily unit test Spark Streaming apps. The answer is yes to both... Regarding your IDE: I like t

Re: Unit testing with HiveContext

2015-04-09 Thread Daniel Siegmann
Thanks Ted, using TestHive as my context worked. It still left a metastore directory and Derby log in my current working directory though; I manually added a shutdown hook to delete them and all was well. On Wed, Apr 8, 2015 at 4:33 PM, Ted Yu wrote: > Please take a look at > sql/hive/src/main/s
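
The message does not show the shutdown hook itself. As a rough analogue in Python (the original was on the JVM), `atexit` can play the same role. `register_cleanup` is a hypothetical helper, and the names `metastore_db` and `derby.log` in the usage note are assumed defaults that should be checked against what your test run actually leaves behind.

```python
import atexit
import os
import shutil


def register_cleanup(paths):
    """Delete the given files and directories when the interpreter exits.

    Returns the cleanup function so a test can also call it directly.
    """
    def _cleanup():
        for path in paths:
            if os.path.isdir(path):
                shutil.rmtree(path, ignore_errors=True)
            elif os.path.exists(path):
                os.remove(path)

    atexit.register(_cleanup)
    return _cleanup
```

A test module would call something like `register_cleanup(["metastore_db", "derby.log"])` once, before the first Hive-backed context is created.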

Re: Unit testing with HiveContext

2015-04-08 Thread Ted Yu
Please take a look at sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala :

protected def configure(): Unit = {
  warehousePath.delete()
  metastorePath.delete()
  setConf("javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=$metastorePath;create=true")

Re: Unit testing and Spark Streaming

2014-12-12 Thread Jay Vyas
https://github.com/jayunit100/SparkStreamingCassandraDemo On this note, I've built a framework which is mostly "pure" so that functional unit tests can be run composing mock data for Twitter statuses, with just regular junit... That might be relevant also. I think at some point we should come

Re: Unit testing and Spark Streaming

2014-12-12 Thread Emre Sevinc
On Fri, Dec 12, 2014 at 2:17 PM, Eric Loots wrote: > How can the log level in test mode be reduced (or extended when needed) ? Hello Eric, The following might be helpful for reducing the log messages during unit testing: http://stackoverflow.com/a/2736/236007 -- Emre Sevinç https://be.lin

Re: Unit testing jar request

2014-11-12 Thread nightwolf
+1 I agree we need this too. Looks like there is already an issue for it here; https://spark-project.atlassian.net/browse/SPARK-750

Re: Unit Testing (JUnit) with Spark

2014-10-29 Thread touchdown
Add these to your dependencies: "io.netty" % "netty" % "3.6.6.Final", and exclude("io.netty", "netty-all") at the end of the Spark and Hadoop dependencies. Reference: https://spark-project.atlassian.net/browse/SPARK-1138 I am using Spark 1.1, so the Akka issue is already fixed.

Re: Unit testing: Mocking out Spark classes

2014-10-16 Thread Daniel Siegmann
Mocking these things is difficult; executing your unit tests in a local Spark context is preferred, as recommended in the programming guide. I know this may not technically be a unit test, but it is hopefully close enough. Y

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread soumick86
Few lines of my error logs look like 2014-07-29 16:32:16,326 ERROR [ActorSystemImpl] Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-6] shutting down ActorSystem [spark] java.lang.VerifyError: (class: org/jboss/netty/channel/socket/nio/NioWorkerPool, method: createWorker si

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread Daniel Siegmann
Sonal's suggestion of looking at the JavaAPISuite is a good idea. Just a few things to note. Pay special attention to what's being done in the setUp and tearDown methods, because that's where the magic is happening. To unit test against Spark, pretty much all you need to do is create a context run
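
The setUp/tearDown pattern described above, sketched with Python's `unittest` and PySpark rather than JUnit. It assumes the `pyspark` package and a Java runtime are available and skips the test when `pyspark` is missing. The class names, the `local[2]` master, and the app name are illustrative choices, not taken from JavaAPISuite.

```python
import unittest


class LocalSparkTestCase(unittest.TestCase):
    """Creates a local SparkContext before each test and stops it afterwards."""

    def setUp(self):
        try:
            from pyspark import SparkContext  # needs the pyspark package
        except ImportError:
            self.skipTest("pyspark is not installed")
        self.sc = SparkContext("local[2]", "unit-test")

    def tearDown(self):
        # Stop the context so the next test can create a fresh one.
        sc = getattr(self, "sc", None)
        if sc is not None:
            sc.stop()


class WordCountTest(LocalSparkTestCase):
    def test_counts_words(self):
        rdd = self.sc.parallelize(["a b", "b"])
        counts = dict(
            rdd.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda x, y: x + y)
               .collect()
        )
        self.assertEqual(counts, {"a": 1, "b": 2})
```

Creating the context per test keeps tests isolated at the cost of startup time; moving the same code into `setUpClass`/`tearDownClass` shares one context across a class.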

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread Sonal Goyal
You can take a look at https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/JavaAPISuite.java and model your junits based on it. Best Regards, Sonal Nube Technologies On Tue, Jul 29, 2014 at 10:10 PM, K

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread Kostiantyn Kudriavtsev
Hi, try this one: http://simpletoad.blogspot.com/2014/07/runing-spark-unit-test-on-windows-7.html It's more about fixing a Windows-specific issue, but the code snippet gives the general idea: just run the ETL and check the output with Assert(s). On Jul 29, 2014, at 6:29 PM, soumick86 wrote: > Is there any example

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread jay vyas
I've been working some on building spark blueprints, and recently tried to generalize one for easy blueprints of spark apps. https://github.com/jayunit100/SparkBlueprint.git It runs the spark app's main method in a unit test, and builds in SBT. You can easily try it out and improve on it. Obvio

Re: unit testing with spark

2014-02-27 Thread Ameet Kini
Turns out that my race condition was caused by sbt running suites in parallel. I had to disable it like so:

parallelExecution in Test := false

This is mentioned in the sbt docs: http://www.scala-sbt.org/0.13.1/docs/Detailed-Topics/Testing#disable-parallel-execution-of-tests With suites running se