I should have noted that I understand the notation 0E-18 (exponential notation, I think) and that in the normal case it is no different from 0; I just wanted to make sure there wasn't something tricky going on, since the representation seemed to be changing.
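For example, a quick sanity check in a plain Scala REPL (no Spark involved; `zero` and `zeroE18` are just throwaway names here) shows the two compare equal by numeric value, even though the underlying java.math.BigDecimal.equals also distinguishes scale:

scala> val zero = BigDecimal("0")
zero: scala.math.BigDecimal = 0

scala> val zeroE18 = BigDecimal("0E-18")
zeroE18: scala.math.BigDecimal = 0E-18

scala> zeroE18 == zero    // scala.math.BigDecimal compares by numeric value
res0: Boolean = true

scala> zeroE18.underlying.compareTo(zero.underlying)  // 0 means numerically equal
res1: Int = 0

scala> zeroE18.underlying == zero.underlying  // java.math.BigDecimal.equals also checks scale
res2: Boolean = false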
Michael, that's a fair point. I keep operating under the assumption of some
guaranteed precision from BigDecimal, but I realize there is probably some
math happening that produces results that can't be represented perfectly.
Thanks guys. I'm good now.

On Mon, Oct 24, 2016 at 8:57 PM Jakob Odersky <ja...@odersky.com> wrote:

> Yes, thanks for elaborating, Michael.
> The other thing that I wanted to highlight was that in this specific
> case the value is actually exactly zero (0E-18 = 0 * 10^(-18) = 0).
>
> On Mon, Oct 24, 2016 at 8:50 PM, Michael Matsko <m...@gwmail.gwu.edu> wrote:
> > Efe,
> >
> > I think Jakob's point is that there is no problem. When you deal with
> > real numbers, you don't get exact representations of numbers. There is
> > always some slop in the representations; things don't ever cancel out
> > exactly. Testing reals for equality to zero will almost never work.
> >
> > Look at Goldberg's paper
> > https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf
> > for a quick intro.
> >
> > Mike
> >
> > On Oct 24, 2016, at 10:36 PM, Efe Selcuk <efema...@gmail.com> wrote:
> >
> > Okay, so this isn't contributing to any kind of imprecision. I suppose
> > I need to go digging further then. Thanks for the quick help.
> >
> > On Mon, Oct 24, 2016 at 7:34 PM Jakob Odersky <ja...@odersky.com> wrote:
> >>
> >> What you're seeing is merely a strange representation; 0E-18 is zero.
> >> The E-18 reflects the scale that Spark uses to store the decimal.
> >>
> >> On Mon, Oct 24, 2016 at 7:32 PM, Jakob Odersky <ja...@odersky.com> wrote:
> >> > An even smaller example that demonstrates the same behaviour:
> >> >
> >> > Seq(Data(BigDecimal(0))).toDS.head
> >> >
> >> > On Mon, Oct 24, 2016 at 7:03 PM, Efe Selcuk <efema...@gmail.com> wrote:
> >> >> I'm trying to track down what seems to be a very slight imprecision
> >> >> in our Spark application; two of our columns, which should net out
> >> >> to exactly zero, are coming up with very small non-zero fractions.
> >> >> The only thing that I've found out of place is that a case class
> >> >> entry into a Dataset, generated with BigDecimal("0"), will end up
> >> >> as 0E-18 after it goes through Spark, and I don't know if there's
> >> >> any appreciable difference between that and the actual 0 value that
> >> >> BigDecimal normally produces. Here's a contrived example:
> >> >>
> >> >> scala> case class Data(num: BigDecimal)
> >> >> defined class Data
> >> >>
> >> >> scala> val x = Data(0)
> >> >> x: Data = Data(0)
> >> >>
> >> >> scala> x.num
> >> >> res9: BigDecimal = 0
> >> >>
> >> >> scala> val y = Seq(x, x.copy()).toDS.reduce( (a, b) => a.copy(a.num +
> >> >> b.num) )
> >> >> y: Data = Data(0E-18)
> >> >>
> >> >> scala> y.num
> >> >> res12: BigDecimal = 0E-18
> >> >>
> >> >> scala> BigDecimal("1") - 1
> >> >> res15: scala.math.BigDecimal = 0
> >> >>
> >> >> Am I looking at anything valuable?
> >> >>
> >> >> Efe
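For readers finding this thread in the archives: Michael's point about slop in real-number representations is easy to reproduce in a plain Scala REPL, independent of Spark or BigDecimal. This is standard IEEE-754 Double behaviour, and it motivates the usual tolerance-based comparison instead of testing for exact equality:

scala> 0.1 + 0.2 == 0.3    // exact equality fails for Doubles
res0: Boolean = false

scala> 0.1 + 0.2           // the binary representation carries rounding error
res1: Double = 0.30000000000000004

scala> math.abs((0.1 + 0.2) - 0.3) < 1e-9   // compare within a tolerance instead
res2: Boolean = true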