So all the coefficients are the same except for CRUZADAS? How are you
fitting the model in R (glm)? Can you try setting a zero penalty for
both lambda (regParam) and alpha (elasticNetParam):
.setRegParam(0)
.setElasticNetParam(0)
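For context, a minimal sketch of what that looks like on the estimator from the original post. The column names "TARGET" and "Union" come from the thread; `train` and `fitAndPrint` are hypothetical names, and `train` is assumed to already contain the assembled "Union" vector column. Both parameters default to 0.0 in Spark ML, so this only makes the unpenalized fit explicit.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.sql.DataFrame

// Explicitly unpenalized logistic regression (column names taken from the thread).
val unpenalizedLr = new LogisticRegression()
  .setLabelCol("TARGET")
  .setFeaturesCol("Union")
  .setRegParam(0.0)        // lambda: overall penalty strength
  .setElasticNetParam(0.0) // alpha: L1/L2 mix, has no effect while regParam is 0

// `train` is a hypothetical DataFrame holding the "TARGET" and "Union" columns.
def fitAndPrint(train: DataFrame): LogisticRegressionModel = {
  val model = unpenalizedLr.fit(train)
  println(s"intercept = ${model.intercept}")
  model.coefficients.toArray.zipWithIndex.foreach { case (c, i) =>
    println(s"coef[$i] = $c")
  }
  model
}
```

The printed values can then be compared position by position with the glm output from R.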
Cheers,
S
On 24.10.17 at 13:19, Alexis Peña wrote:
Thanks for your answer. The "Cruzadas" features are binary (0/1), so
the chi-squared statistic should work with 2x2 tables.
I fit the model in SAS and R, and in both the coefficients have estimates
(not significant). Two features of this kind have estimates in Spark:
CRUZADAS    4907    0,247624087
CRUZADAS    5304    -0,161424508
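
To make the 2x2 case concrete, here is a small sketch of the Pearson chi-squared statistic for one binary feature against a binary label. The function name `chiSq2x2` and the counts in the usage lines are made up for illustration; they are not taken from the data in this thread.

```scala
// Pearson chi-squared statistic for a 2x2 contingency table:
//              label=0   label=1
// feature=0       a         b
// feature=1       c         d
def chiSq2x2(a: Long, b: Long, c: Long, d: Long): Double = {
  val n = (a + b + c + d).toDouble
  val rowTotals = Array((a + b).toDouble, (c + d).toDouble)
  val colTotals = Array((a + c).toDouble, (b + d).toDouble)
  val observed = Array(Array(a, b), Array(c, d))
  val cells = for (i <- 0 to 1; j <- 0 to 1) yield {
    val expected = rowTotals(i) * colTotals(j) / n
    val diff = observed(i)(j) - expected
    diff * diff / expected
  }
  cells.sum
}

// Hypothetical counts, for illustration only.
println(chiSq2x2(10, 20, 30, 40)) // weak association
println(chiSq2x2(10, 10, 20, 20)) // feature and label independent
```

Note that if the feature is constant in the sample (one row of the table is all zeros), the expected counts in that row are zero and the statistic is undefined (NaN in this sketch).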
Thanks
*From: *Weichen Xu <weichen...@databricks.com>
*Date: *Tuesday, 24 October 2017, 07:23
*To: *Alexis Peña <alexis.p...@exalitica.com>
*CC: *"user @spark" <user@spark.apache.org>
*Subject: *Re: Zero Coefficient in logistic regression
Yes, the chi-squared statistic is only used for categorical features. It
does not look appropriate here.
Thanks!
On Tue, Oct 24, 2017 at 5:13 PM, Simon Dirmeier <simon.dirme...@web.de> wrote:
Hey,
As far as I know, feature selection using a chi-squared
statistic can only be done on categorical features, not on
possibly continuous ones.
Furthermore, since your logistic model doesn't use any
regularization, you should be fine there. So I'd check the
ChiSqSelector and possibly replace it with another feature
selection method.
There is, however, always the chance that your response does not
depend on your covariates, so you'd estimate a zero coefficient.
Cheers,
Simon
On 24.10.17 at 04:56, Alexis Peña wrote:
Hi Guys,
We are fitting a Logistic model using the following code.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{ChiSqSelector, VectorAssembler}

val Chisqselector = new ChiSqSelector()
  .setNumTopFeatures(10)
  .setFeaturesCol("VECTOR_1")
  .setLabelCol("TARGET")
  .setOutputCol("selectedFeatures")
val assembler = new VectorAssembler()
  .setInputCols(Array("FEATURES", "selectedFeatures", "PROM_MESES_DIST",
    "RECENCIA", "TEMP_MIN", "TEMP_MAX", "PRECIPITACIONES"))
  .setOutputCol("Union")
val lr = new LogisticRegression()
  .setLabelCol("TARGET")
  .setFeaturesCol("Union")
val pipeline = new Pipeline().setStages(Array(Chisqselector, assembler, lr))
Do you know why the coefficients for the following features are estimated
as zero? Is this caused by the ChiSqSelector or by the logistic model?
Thanks in advance!!
CODIGO      PARAMETRO        COEFICIENTES_MUESTREO_BALANCEADO
PROPIAS     CV_UM            0,276866756
PROPIAS     CV_U3M           -0,241851427
PROPIAS     CV_U6M           -0,568312819
PROPIAS     CV_U12M          0,134706601
PROPIAS     M_UM             5,47E-06
PROPIAS     M_U3M            -7,10E-06
PROPIAS     M_U6M            1,73E-05
PROPIAS     M_U12M           -5,41E-06
PROPIAS     CP_UM            -0,050750105
PROPIAS     CP_U3M           0,125483162
PROPIAS     CP_U6M           -0,353906788
PROPIAS     CP_U12M          0,159538155
PROPIAS     TUM              -0,020217902
PROPIAS     TU3M             0,002101906
PROPIAS     TU6M             -0,005481915
PROPIAS     TU12M            0,003443081
CRUZADAS    2303             0
CRUZADAS    3901             0
CRUZADAS    3905             0
CRUZADAS    3907             0
CRUZADAS    3909             0
CRUZADAS    4102             0
CRUZADAS    4307             0
CRUZADAS    4501             0
CRUZADAS    4907             0,247624087
CRUZADAS    5304             -0,161424508
LP          PROM_MESES_DIST  -0,680356554
PROPIAS     RECENCIA         -0,00289069
EXTERNAS    TEMP_MIN         0,006488683
EXTERNAS    TEMP_MAX         -0,013497441
EXTERNAS    PRECIPITACIONES  -0,007607086
INTERCEPTO                   2,401593191
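
One way to narrow down the question above is to look at the fitted pipeline stages directly. The selector only decides which columns of VECTOR_1 are passed on; the coefficient values themselves come from the LogisticRegression stage. The sketch below assumes `fitted` is the result of calling `pipeline.fit(...)` on the three-stage pipeline from the post; `zeroPositions` and `inspect` are hypothetical helper names.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.ChiSqSelectorModel

// Positions whose fitted coefficient is exactly zero.
def zeroPositions(coefficients: Array[Double]): Array[Int] =
  coefficients.zipWithIndex.collect { case (c, i) if c == 0.0 => i }

// Assumes the stage order from the post: selector (0), assembler (1), lr (2).
def inspect(fitted: PipelineModel): Unit = {
  val selector = fitted.stages(0).asInstanceOf[ChiSqSelectorModel]
  val lrModel = fitted.stages(2).asInstanceOf[LogisticRegressionModel]

  // Indices into VECTOR_1 of the features the selector kept.
  println(s"selected indices: ${selector.selectedFeatures.mkString(", ")}")

  // Positions in the assembled "Union" vector with a zero coefficient.
  val zeros = zeroPositions(lrModel.coefficients.toArray)
  println(s"zero coefficients at: ${zeros.mkString(", ")}")
}
```

If the zero positions line up with the selected CRUZADAS columns, the next thing to check is what those columns look like in the training sample, since the selector itself never assigns coefficient values.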