Re: cache changes precision

2014-07-24 Thread Ron Gonzalez
Cool, I'll take a look and give it a try! Thanks, Ron > On Jul 24, 2014, at 10:35 PM, Andrew Ash wrote: > > Hi Ron, > > I think you're encountering the issue where caching data from Hadoop ends up > with many duplicate values instead of what you expect. Try adding a .clone

Re: cache changes precision

2014-07-24 Thread Andrew Ash
Hi Ron, I think you're encountering the issue where caching data from Hadoop ends up with many duplicate values instead of what you expect. Try adding a .clone() to the datum() call. The issue is that Hadoop returns the same object many times but with its contents changed. This is an optimizat
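The object-reuse effect described above can be illustrated without Spark or Hadoop. The sketch below is a plain-Scala simulation, not Spark or Avro code: `ReusedRecord`, `readAll`, and `copy()` are hypothetical names. `readAll` plays the role of a Hadoop record reader that hands back one shared mutable object, and `copy()` stands in for the suggested `.clone()` of `datum()`.

```scala
// Hypothetical stand-in for a Hadoop record wrapper: a single mutable
// object whose contents are overwritten for every record read.
final class ReusedRecord(var value: Double) {
  // Return an independent copy so later overwrites do not affect it.
  def copy(): ReusedRecord = new ReusedRecord(value)
  override def toString: String = s"ReusedRecord($value)"
}

// Hypothetical reader: yields the SAME instance for every record,
// mutating it in place (mimicking Hadoop's reuse optimization).
def readAll(data: Seq[Double]): Iterator[ReusedRecord] = {
  val shared = new ReusedRecord(0.0)
  data.iterator.map { d =>
    shared.value = d
    shared
  }
}

val input = Seq(1.5, 2.25, 3.125)

// Keeping the references directly (analogous to caching without copying):
// every element points at the shared object, so all show the last value.
val cachedRefs: Seq[ReusedRecord] = readAll(input).toVector

// Copying each record as it is read keeps the values distinct.
val cachedCopies: Seq[ReusedRecord] = readAll(input).map(_.copy()).toVector

println(cachedRefs.map(_.value))   // Vector(3.125, 3.125, 3.125)
println(cachedCopies.map(_.value)) // Vector(1.5, 2.25, 3.125)
```

In a real Spark job the same idea applies: copy each record in a `map` before calling `cache()`. The Spark programming guide gives this advice for Hadoop `Writable` objects read through `SparkContext`'s Hadoop input methods.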

cache changes precision

2014-07-24 Thread Ron Gonzalez
Hi,

I'm doing the following:

def main(args: Array[String]) = {
    val sparkConf = new SparkConf().setAppName("AvroTest").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = new Configuration()
    val job = new Job(conf)
    val path = new Path("/tmp/a.avro");
    va