Good catch! If you'd like, you can send a pull request changing the files in docs/ to do this (see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark). Otherwise, maybe open an issue on https://issues.apache.org/jira/browse/SPARK so we can track it.

Matei

> On Oct 29, 2014, at 3:16 PM, Michael Albert <[email protected]> wrote:
>
> Greetings!
>
> This might be a documentation issue as opposed to a coding issue, in that
> perhaps the correct answer is "don't do that", but as this is not obvious, I
> am writing.
>
> The following code produces output most would not expect:
>
> package misc
>
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object DemoBug extends App {
>   val conf = new SparkConf()
>   val sc = new SparkContext(conf)
>
>   val rdd = sc.parallelize(List("A","B","C","D"))
>   val str1 = "A"
>
>   val rslt1 = rdd.filter(x => { x != "A" }).count
>   val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count
>
>   println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
> }
>
> This produces the output:
> DemoBug: rslt1 = 3 rslt2 = 0
>
> Compiled with sbt:
> libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
> Run on an EC2 EMR instance with a recent image (hadoop 2.4.0, spark 1.1.0)
>
> If instead there is a proper "main()", it works as expected.
>
> Thank you.
>
> Sincerely,
> Mike
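
For reference, the variant with an explicit main() that the quoted message reports as working might look like the sketch below. The object name DemoFixed and the helper counts() are made up for illustration and are not from the original message. A likely explanation, which is an assumption rather than something stated in this thread, is that in Scala 2.10 the body of an object extending App is deferred (via DelayedInit) until main() runs, so str1 is a field of DemoBug that is still null on executors where that main() was never invoked. Inside a method, str1 is a local val and is captured by value in the closure.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Sketch: same logic as DemoBug, but with an explicit main() instead of `extends App`.
object DemoFixed {

  // Hypothetical helper (not in the original message) so the logic can be
  // run against any SparkContext.
  def counts(sc: SparkContext): (Long, Long) = {
    val rdd = sc.parallelize(List("A", "B", "C", "D"))
    // Local val: its value is captured directly by the closure below,
    // rather than being read from a field of the enclosing object.
    val str1 = "A"

    val rslt1 = rdd.filter(x => x != "A").count()
    val rslt2 = rdd.filter(x => str1 != null && x != "A").count()
    (rslt1, rslt2)
  }

  def main(args: Array[String]): Unit = {
    // As in the original, master and app name are assumed to be supplied
    // externally (for example by spark-submit).
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    try {
      val (rslt1, rslt2) = counts(sc)
      println("DemoFixed: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
    } finally {
      sc.stop()
    }
  }
}
```

With this layout both counts should come out as 3, matching the "works as expected" observation in the quoted message.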
