Mocking Spark classes such as SparkContext and Broadcast is difficult; running your tests against a local Spark context is preferable, as the programming guide recommends <http://spark.apache.org/docs/latest/programming-guide.html#unit-testing>. Strictly speaking this is not a unit test, but it is hopefully close enough.
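As a rough sketch of what such a test could look like (not taken from your code): the object name, the secondElements and runLocally helpers, the "local[2]" master and the Int key standing in for HubClassificationData are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LocalSparkTestSketch {

  // Hypothetical stand-in for the logic under test: keep the second
  // element of each pair, as transformerFunction does in the question.
  def secondElements(input: RDD[(Int, String)]): RDD[String] =
    input.mapPartitions(iter => iter.map(_._2))

  // Runs the logic against a real SparkContext in local mode.
  def runLocally(data: Seq[(Int, String)]): Seq[String] = {
    val conf = new SparkConf()
      .setMaster("local[2]")            // assumption: two local threads
      .setAppName("local-test-sketch")  // assumption: arbitrary app name
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(data)       // load the test data
      secondElements(input).collect().toSeq  // pull results back to verify
    } finally {
      sc.stop()  // always stop the context so later tests can create one
    }
  }

  def main(args: Array[String]): Unit = {
    val result = runLocally(Seq((1, "a"), (2, "b"), (3, "c")))
    assert(result == Seq("a", "b", "c"))
  }
}
```

Stopping the context in a finally block matters because only one SparkContext is normally allowed per JVM; in a ScalaTest suite the same setup and teardown would typically live in before/after hooks.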
You can load your test data using SparkContext.parallelize and retrieve the data (for verification) using RDD.collect.

On Thu, Oct 16, 2014 at 9:07 AM, Saket Kumar <saket.ku...@bgch.co.uk> wrote:

> Hello all,
>
> I am trying to unit test my classes involved my Spark job. I am trying to
> mock out the Spark classes (like SparkContext and Broadcast) so that I can
> unit test my classes in isolation. However I have realised that these are
> classes instead of traits. My first question is why?
>
> It is quite hard to mock out classes using ScalaTest+ScalaMock as the
> classes which need to be mocked out need to be annotated with
> org.scalamock.annotation.mock as per
> http://www.scalatest.org/user_guide/testing_with_mock_objects#generatedMocks.
> I cannot do that in my case as I am trying to mock out the spark classes.
>
> Am I missing something? Is there a better way to do this?
>
> val sparkContext = mock[SparkInteraction]
> val trainingDatasetLoader = mock[DatasetLoader]
> val broadcastTrainingDatasetLoader = mock[Broadcast[DatasetLoader]]
> def transformerFunction(source: Iterator[(HubClassificationData,
> String)]): Iterator[String] = {
>   source.map(_._2)
> }
> val classificationResultsRDD = mock[RDD[String]]
> val classificationResults = Array("","","")
> val inputRDD = mock[RDD[(HubClassificationData, String)]]
>
> inSequence{
>   inAnyOrder{
>     (sparkContext.broadcast[DatasetLoader]
> _).expects(trainingDatasetLoader).returns(broadcastTrainingDatasetLoader)
>   }
> }
>
> val sparkInvoker = new SparkJobInvoker(sparkContext,
> trainingDatasetLoader)
>
> when(inputRDD.mapPartitions(transformerFunction)).thenReturn(classificationResultsRDD)
> sparkInvoker.invoke(inputRDD)
>
> Thanks,
> Saket

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io