First of all, I think we all agree that the data source v2 API should at least
support InternalRow and ColumnarBatch. With this assumption, the current
API has two problems:
*First problem*: We use mixin traits to add support for different data
formats.
The mixin traits define the API to return a DataReader
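To make the mixin-trait shape concrete, here is a minimal sketch; the trait and method names below are illustrative stand-ins, not the actual DataSourceV2 interfaces:

```scala
// Hedged sketch of the mixin-trait pattern: one trait per supported
// data format, each defining its own reader-creation method.
// All names here are illustrative, not the real DataSourceV2 API.
trait DataReader[T] { def next(): Boolean; def get(): T }

trait ReadSupport // base marker for a readable source

trait SupportsInternalRowRead extends ReadSupport {
  def createInternalRowReader(): DataReader[String] // String stands in for InternalRow
}

trait SupportsColumnarBatchRead extends ReadSupport {
  def createColumnarBatchReader(): DataReader[Array[Double]] // stands in for ColumnarBatch
}
```

A source mixes in one trait per format it supports, which is what makes Spark's dispatch depend on runtime type checks.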
I agree we should reuse as much as possible. For PySpark, I think the
obvious choices already made, Breeze and numpy arrays, make a lot of
sense. I’m not sure about the other language bindings and would defer to
others.
I was under the impression that UDTs were gone and (probably?) not coming
back
Yes, I’m referring to that deviance method. It fails whenever y is 0. I think
R’s deviance calculation logic checks whether y is 0 and assigns 1 to y in
such cases.
There are a few deviances, like nullDeviance, residualDeviance, and deviance,
that the GLM regression summary object has.
You might want to check th
Are you referring to this?
override def deviance(y: Double, mu: Double, weight: Double): Double = {
  2.0 * weight * (y * math.log(y / mu) - (y - mu))
}
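For what it’s worth, guarding the y == 0 case the way Spark’s ylogy helper reportedly does would sidestep the log(0); this is a hedged sketch, not the exact Spark source:

```scala
// Hedged sketch: y * log(y / mu) is defined as 0 when y == 0, which is
// its limit as y -> 0. This mirrors what Spark's ylogy helper is said
// to do; the exact Spark implementation may differ.
def ylogy(y: Double, mu: Double): Double =
  if (y == 0.0) 0.0 else y * math.log(y / mu)

def poissonDeviance(y: Double, mu: Double, weight: Double): Double =
  2.0 * weight * (ylogy(y, mu) - (y - mu))
```

With y = 0, mu = 2, and weight = 1 this gives 2 * (0 - (0 - 2)) = 4.0 instead of NaN.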
Not sure how R handles this, but my guess is they may add a small
number, e.g. 0.5, to the numerator and denominator. If you can c
The fundamental difficulty seems to be that there's a spurious "round-trip"
in the API. Spark inspects the source to determine what type it's going to
provide, picks an appropriate method according to that type, and then calls
that method on the source to finally get what it wants. Pushing this out
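The round-trip described above can be sketched like this (trait and method names are illustrative, not the actual planner code):

```scala
// Hedged sketch of the "round-trip": Spark inspects the source's runtime
// type, picks the matching create method, then calls back into the
// source. All names are illustrative, not actual Spark internals.
trait ReadSupport
trait SupportsRowRead extends ReadSupport {
  def createRowReader(): Iterator[String] // String stands in for InternalRow
}
trait SupportsColumnarRead extends ReadSupport {
  def createColumnarReader(): Iterator[Array[Double]] // stands in for ColumnarBatch
}

def planScan(source: ReadSupport): Iterator[Any] = source match {
  case s: SupportsColumnarRead => s.createColumnarReader() // inspect, then dispatch
  case s: SupportsRowRead      => s.createRowReader()
  case _ => throw new IllegalArgumentException("unsupported source")
}
```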
Thanks for the thoughts! We've gone back and forth quite a bit about local
linear algebra support in Spark. For reference, there have been some
discussions here:
https://issues.apache.org/jira/browse/SPARK-6442
https://issues.apache.org/jira/browse/SPARK-16365
https://issues.apache.org/jira/brows
As instructed offline, I opened a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-24020
I will create a pull request soon.
On 4/17/2018 at 6:21 PM, Petar Zecevic wrote:
Hello everybody
We (at University of Zagreb and University of Washington) have
implemented an optimization of
GeneralizedLinearRegression.ylogy seems to handle this case; can you be
more specific about where the log(0) happens? That’s what should be fixed,
right? If so, then a JIRA and PR are the right way to proceed.
On Wed, Apr 18, 2018 at 2:37 PM svattig wrote:
> In Spark 2.3, When Poisson Model(with
Wenchen, can you explain a bit more clearly why this is necessary? The
pseudo-code you used doesn’t clearly demonstrate why. Why couldn’t this be
handled with inheritance from an abstract Factory class? Why define
all of the createXDataReader methods, but make the DataFormat a field in
the fac
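For concreteness, one shape the abstract-factory alternative could take; every name here (DataFormat, ReaderFactory, createReader) is hypothetical, standing in for whatever the real API would use:

```scala
// Hedged sketch: a single abstract factory carrying a format field,
// instead of one mixin trait per format. All names are hypothetical.
sealed trait DataFormat
case object RowFormat extends DataFormat
case object ColumnarFormat extends DataFormat

trait DataReader[T] { def next(): Boolean; def get(): T }

abstract class ReaderFactory {
  def format: DataFormat            // tells Spark what this source emits
  def createReader(): DataReader[_] // one creation method, not one per format
}
```

The trade-off being probed: Spark would read `format` and cast the result, rather than type-checking for mixin traits.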
In Spark 2.3, when a Poisson model (with labelCol having a few counts as 0s)
is fit, the deviance calculations are broken as a result of log(0). I think
this is the same case as in Spark 2.2.
But the new toString method in Spark 2.3's
GeneralizedLinearRegressionTrainingSummary class is throwing an error at