Coincidentally, I ran into something equally puzzling yesterday: a value was
bizarrely null when it couldn't have been, in a Spark program that extends
App. I also changed it to use main() and it works fine, so there is
definitely some issue here. If nobody files a JIRA before I get home, I'll
do it.
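
For reference, this is roughly what the main()-based variant looks like. It
is just the code from the quoted message below moved into an explicit main()
method (the sc.stop() at the end is my addition, not part of the original);
with this change both counts should come out as 3, as expected.

package misc

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object DemoBug {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(List("A", "B", "C", "D"))
    val str1 = "A"

    // str1 is now a local variable of main(), so the closure captures its
    // value directly rather than a field of the singleton object (which
    // appears to be what goes wrong with extends App).
    val rslt1 = rdd.filter(x => x != "A").count
    val rslt2 = rdd.filter(x => str1 != null && x != "A").count

    println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)

    sc.stop()
  }
}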
On Oct 29, 2014 11:20 PM, "Michael Albert" <m_albert...@yahoo.com.invalid>
wrote:

> Greetings!
>
> This might be a documentation issue as opposed to a coding issue, in that
> perhaps the correct answer is "don't do that", but as this is not obvious,
> I am writing.
>
> The following code produces output most would not expect:
>
> package misc
>
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object DemoBug extends App {
>     val conf = new SparkConf()
>     val sc = new SparkContext(conf)
>
>     val rdd = sc.parallelize(List("A","B","C","D"))
>     val str1 = "A"
>
>     val rslt1 = rdd.filter(x => { x != "A" }).count
>     val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count
>
>     println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
> }
>
> This produces the output:
> DemoBug: rslt1 = 3 rslt2 = 0
>
> Compiled with sbt:
> libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
> Run on an EC2 EMR instance with a recent image (Hadoop 2.4.0, Spark 1.1.0)
>
> If instead there is a proper "main()", it works as expected.
>
> Thank you.
>
> Sincerely,
>  Mike
>
