I'm climbing under the hood in there for SPARK-3250, and I see this:

    override def sample(items: Iterator[T]): Iterator[T] = {
      items.filter { item =>
        val x = rng.nextDouble()
        (x >= lb && x < ub) ^ complement
      }
    }

Assuming rng.nextDouble() is uniform on [0, 1), the clause (x >= lb && x < ub) accepts an item with probability ub-lb, the same as (x < ub-lb). The second form is faster (one comparison instead of two) and requires only one parameter (the sampling fraction). Any caller asking for BernoulliSampler(a, b) can equally well ask for BernoulliSampler(b-a). Is there some angle I'm missing?
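
For anyone who wants to poke at the comparison, here is a minimal Scala sketch, for illustration only. The names rangeSample and fractionSample are hypothetical stand-ins, not Spark's BernoulliSampler; the complement flag is left out, and scala.util.Random is assumed in place of whatever rng the real class uses. It only shows that both predicates keep about the same fraction of items; the individual items kept will generally differ.

```scala
import scala.util.Random

// Hypothetical stand-ins for the two predicates; not Spark's BernoulliSampler.
object SamplerSketch {
  // Range form from the quoted code: keep an item when x falls in [lb, ub).
  def rangeSample[T](items: Iterator[T], lb: Double, ub: Double, seed: Long): Iterator[T] = {
    val rng = new Random(seed)
    items.filter { _ =>
      val x = rng.nextDouble()
      x >= lb && x < ub
    }
  }

  // Fraction form proposed above: keep an item when x < fraction.
  def fractionSample[T](items: Iterator[T], fraction: Double, seed: Long): Iterator[T] = {
    val rng = new Random(seed)
    items.filter { _ => rng.nextDouble() < fraction }
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val (lb, ub) = (0.25, 0.75)
    val keptByRange = rangeSample(Iterator.range(0, n), lb, ub, seed = 42L).size
    val keptByFraction = fractionSample(Iterator.range(0, n), ub - lb, seed = 42L).size
    // Both rates should land near ub - lb = 0.5, up to sampling noise.
    println(f"range form kept ${keptByRange.toDouble / n}%.3f, fraction form kept ${keptByFraction.toDouble / n}%.3f")
  }
}
```

Both helpers draw exactly one random number per item, so any speed difference comes down to the extra comparison in the range form.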