I should have noted that I understand the 0E-18 notation (exponential
notation) and that in the normal case it is no different from 0; I just
wanted to make sure that there wasn't something tricky going on, since
the representation was seemingly changing.
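
For anyone who finds this thread later, here's a quick REPL check of that
(a minimal sketch using plain scala.math.BigDecimal, nothing Spark-specific):

    val a = BigDecimal("0E-18")
    val b = BigDecimal(0)
    a.scale     // 18: eighteen fractional digits are tracked
    b.scale     // 0
    a == b      // true: Scala's BigDecimal compares numerically
    a.signum    // 0: the value itself is exactly zero
    // java.math.BigDecimal#equals is scale-sensitive, which is why the two
    // can look different even though compareTo says they are equal:
    a.underlying equals b.underlying      // false
    a.underlying compareTo b.underlying   // 0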

Michael, that's a fair point. I keep operating under the assumption that
BigDecimal guarantees exact results, but I realize there is probably some
arithmetic happening that produces results that can't be represented
exactly. (A quick illustration of that is at the bottom of this mail.)
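
If it helps anyone else, here's roughly where the E-18 comes from, as far
as I can tell (a sketch; my understanding is that Spark maps a Scala
BigDecimal to its default DecimalType(38, 18), i.e. 18 fractional digits
of scale):

    val zero = BigDecimal(0).setScale(18)  // mimic Spark's assumed default
                                           // scale of 18
    zero.toString                   // "0E-18": zero carrying an 18-digit scale
    (zero + zero).toString          // "0E-18": addition keeps the larger
                                    // scale of its operands
    (zero + zero) == BigDecimal(0)  // true: still exactly zero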

Thanks, guys. I'm good now.

On Mon, Oct 24, 2016 at 8:57 PM Jakob Odersky <ja...@odersky.com> wrote:

> Yes, thanks for elaborating, Michael.
> The other thing that I wanted to highlight was that in this specific
> case the value is actually exactly zero (0E-18 = 0*10^(-18) = 0).
>
> On Mon, Oct 24, 2016 at 8:50 PM, Michael Matsko <m...@gwmail.gwu.edu>
> wrote:
> > Efe,
> >
> > I think Jakob's point is that there is no problem.  When you deal with
> > real numbers, you don't get exact representations of numbers.  There is
> > always some slop in the representations; things don't ever cancel out
> > exactly.  Testing reals for equality to zero will almost never work.
> >
> > Look at Goldberg's paper
> > https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf
> > for a quick intro.
> >
> > Mike
> >
> > On Oct 24, 2016, at 10:36 PM, Efe Selcuk <efema...@gmail.com> wrote:
> >
> > Okay, so this isn't contributing to any kind of imprecision. I suppose I
> > need to go digging further then. Thanks for the quick help.
> >
> > On Mon, Oct 24, 2016 at 7:34 PM Jakob Odersky <ja...@odersky.com> wrote:
> >>
> >> What you're seeing is merely a strange representation: 0E-18 is zero.
> >> The E-18 reflects the scale (the number of fractional digits) that
> >> Spark uses to store the decimal.
> >>
> >> On Mon, Oct 24, 2016 at 7:32 PM, Jakob Odersky <ja...@odersky.com> wrote:
> >> > An even smaller example that demonstrates the same behaviour:
> >> >
> >> >     Seq(Data(BigDecimal(0))).toDS.head
> >> >
> >> > On Mon, Oct 24, 2016 at 7:03 PM, Efe Selcuk <efema...@gmail.com> wrote:
> >> >> I’m trying to track down what seems to be a very slight imprecision
> >> >> in our Spark application; two of our columns, which should be netting
> >> >> out to exactly zero, are coming up with very small non-zero fractions.
> >> >> The only thing that I’ve found out of place is that a case class entry
> >> >> into a Dataset we’ve generated with BigDecimal(“0”) will end up as
> >> >> 0E-18 after it goes through Spark, and I don’t know if there’s any
> >> >> appreciable difference between that and the actual 0 value, which can
> >> >> be generated with BigDecimal.
> >> >> Here’s a contrived example:
> >> >>
> >> >> scala> case class Data(num: BigDecimal)
> >> >> defined class Data
> >> >>
> >> >> scala> val x = Data(0)
> >> >> x: Data = Data(0)
> >> >>
> >> >> scala> x.num
> >> >> res9: BigDecimal = 0
> >> >>
> >> >> scala> val y = Seq(x, x.copy()).toDS.reduce( (a,b) => a.copy(a.num + b.num))
> >> >> y: Data = Data(0E-18)
> >> >>
> >> >> scala> y.num
> >> >> res12: BigDecimal = 0E-18
> >> >>
> >> >> scala> BigDecimal("1") - 1
> >> >> res15: scala.math.BigDecimal = 0
> >> >>
> >> >> Am I looking at anything valuable?
> >> >>
> >> >> Efe
>
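
One more note for the archive, on Michael's floating-point point above (a
minimal sketch with an arbitrary tolerance; BigDecimal itself is exact,
but the same advice applies once doubles enter the picture):

    val x = 0.1 + 0.2        // 0.30000000000000004: 0.1 and 0.2 have no
                             // exact binary representation
    x == 0.3                 // false: exact equality on doubles is fragile
    val eps = 1e-9           // tolerance; pick one appropriate to the domain
    math.abs(x - 0.3) < eps  // true: compare within a tolerance instead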
