Hi,
I have looked through some of the test examples and the brief
documentation on unit testing at
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing, but
I still don't have a good understanding of how to write unit tests for Spark
jobs. Previously, I wrote unit tests with the specs2 framework and got them
to work in Scalding. I tried to use specs2 with Spark, but could not find
any simple examples to follow. I am open to specs2 or FunSuite, whichever
works best with Spark. I would like some additional guidance, or some simple
sample code using specs2 or FunSuite. My code is provided below.
I have the following code in src/main/scala/GetInfo.scala. It reads a JSON
file (one record per line), extracts some fields, and writes the latest
timestamp for each ID. It takes the input file (args(0)) and the output
path (args(1)) as arguments.
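import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD operations such as sortByKey (needed on Spark 1.x)
import org.json4s._
import org.json4s.jackson.JsonMethods._ // parse; assumes the json4s Jackson backend

object GetInfo {
  def main(args: Array[String]) {
    val inp_file = args(0)
    val conf = new SparkConf().setAppName("GetInfo")
    val sc = new SparkContext(conf)
    val res = sc.textFile(inp_file)
      .map(line => parse(line))
      .map(json => {
        implicit lazy val formats = org.json4s.DefaultFormats
        val aid = (json \ "d" \ "TypeID").extract[Int]
        val ts = (json \ "d" \ "TimeStamp").extract[Long]
        val gid = (json \ "d" \ "ID").extract[String]
        (aid, ts, gid)
      })
      .groupBy(tup => tup._3)               // group records by ID
      .sortByKey(true)                      // ascending by ID
      .map(g => (g._1, g._2.map(_._2).max)) // latest timestamp per ID
    res.map(tuple => "%s, %d".format(tuple._1, tuple._2)).saveAsTextFile(args(1))
  }
}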
I would like to test the above code. My unit test is in src/test/scala. The
code I have so far for the unit test appears below:
import org.apache.spark._
import org.specs2.mutable._

class GetInfoTest extends Specification with java.io.Serializable {
  // One JSON record per line; the keys match those read by GetInfo.
  val data = List(
    """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
    """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
    """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
    """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
  )
  // Latest timestamp per ID, sorted by ID.
  val expected_out = List(
    ("ID1", 5678L),
    ("ID2", 2468L)
  )
  "A GetInfo job" should {
    // ***** How do I pass "data" defined above as the input, and the output
    // location, which GetInfo expects as arguments? ******
    val sc = new SparkContext("local", "GetInfo")
    // *** How do I get the output? ***
    // Assuming out_buffer (a placeholder, not defined yet) holds the output,
    // I want to match it against the expected output.
    "match expected output" in {
      (out_buffer == expected_out) must beTrue
    }
  }
}
I would like some help with the tasks marked with "****" in the unit test
code above. If specs2 is not the right way to go, I am also open to
FunSuite. I would like to know how to pass the input while calling my
program from the unit test and get the output.
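To make the expected result concrete, here is a minimal sketch of the same
per-ID maximum on plain Scala collections, with no Spark involved. The names
ExpectedOutputSketch and latestPerId are made up for illustration and are
not part of GetInfo, and the input tuples assume the JSON has already been
parsed into (TypeID, TimeStamp, ID).

```scala
// Hypothetical helper, not part of GetInfo: the per-ID latest timestamp
// computed on an in-memory Seq instead of an RDD.
object ExpectedOutputSketch {
  // Each record is (TypeID, TimeStamp, ID), the tuple shape built in GetInfo.
  def latestPerId(records: Seq[(Int, Long, String)]): Seq[(String, Long)] =
    records
      .groupBy(_._3)                                       // group by ID
      .map { case (id, recs) => (id, recs.map(_._2).max) } // latest timestamp per ID
      .toSeq
      .sortBy(_._1)                                        // ascending by ID

  def main(args: Array[String]): Unit = {
    // Same values as the test data, already parsed into tuples.
    val records = Seq(
      (10, 1234L, "ID1"),
      (11, 5678L, "ID1"),
      (10, 1357L, "ID2"),
      (11, 2468L, "ID2")
    )
    val result = latestPerId(records)
    assert(result == Seq(("ID1", 5678L), ("ID2", 2468L)))
    // Same line format that GetInfo writes with saveAsTextFile.
    result.foreach { case (id, ts) => println("%s, %d".format(id, ts)) }
  }
}
```

This is the result I expect the Spark job to produce for the test input; what
I am missing is how to feed that input to the job and read its output back
inside the test.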
Thanks for your help.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/guidance-on-simple-unit-testing-with-Spark-tp7604.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.