Re: [MLlib] Logistic Regression and standardization

2018-04-28 Thread Valeriy Avanesov
Hi Joseph, I've just tried that out. MLlib indeed returns different models. I see no problem here, then. How is Filipp's issue possible? Best, Valeriy. On 04/27/2018 10:00 PM, Valeriy Avanesov wrote: Hi all, maybe I'm missing something, but from what was discussed here…

Re: [MLlib] Logistic Regression and standardization

2018-04-27 Thread Valeriy Avanesov
…standardization will get a different result. But when comparing results between R's glmnet and MLlib, if we set the same parameters for regularization/standardization/…, then we should get the same result. If not, then maybe there's a bug. In that case you can paste your testing code and I can help…
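A minimal sketch of how the two parametrizations line up, assuming a hypothetical training DataFrame `train` with "label" and "features" columns (glmnet's internal lambda scaling differs slightly, so exact numeric agreement should not be over-read):

  import org.apache.spark.ml.classification.LogisticRegression

  // glmnet(x, y, family = "binomial", alpha = 0.5, lambda = 0.1, standardize = TRUE)
  // should roughly correspond to:
  val lr = new LogisticRegression()
    .setElasticNetParam(0.5)   // glmnet's alpha: 0 = ridge (L2), 1 = lasso (L1)
    .setRegParam(0.1)          // glmnet's lambda
    .setStandardization(true)  // glmnet's standardize
    .setFitIntercept(true)
  val model = lr.fit(train)    // `train` is a placeholder DataFrame
  println(model.coefficients)
  println(model.intercept)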

Re: [MLlib] Logistic Regression and standardization

2018-04-20 Thread Valeriy Avanesov
Hi all. Filipp, do you use L1/L2/elastic-net penalization? I believe standardization matters in this case. Best, Valeriy. On 04/17/2018 11:40 AM, Weichen Xu wrote: Not a bug. When disabling standardization, MLlib LR will still standardize the features, but it will scale the coefficients…
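A quick sketch of the behavior described above (again assuming a hypothetical `train` DataFrame): with regParam = 0 both settings should converge to the same solution, while with a nonzero L1/L2 penalty they generally differ, since the penalty is effectively applied in the standardized feature space:

  import org.apache.spark.ml.classification.LogisticRegression

  // Identical models except for the standardization flag.
  val withStd = new LogisticRegression()
    .setRegParam(0.3).setElasticNetParam(1.0)  // L1 penalty
    .setStandardization(true)
  val withoutStd = new LogisticRegression()
    .setRegParam(0.3).setElasticNetParam(1.0)
    .setStandardization(false)

  // With regParam > 0 these coefficients are expected to differ.
  println(withStd.fit(train).coefficients)
  println(withoutStd.fit(train).coefficients)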

Re: [MLlib] Gaussian Process regression in MLlib

2018-03-12 Thread Valeriy Avanesov
…https://issues.apache.org/jira/browse/SPARK-23437 All concerned are welcome to discuss. Best, Valeriy. On Sat, Feb 3, 2018 at 9:24 PM, Valeriy Avanesov <acop...@gmail.com> wrote: Hi, no, I don't think we should actually compute the n × n matrix, let alone invert it. However, vari…
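As textbook background for the scaling point above (a sketch of the standard argument, not necessarily the ticket's exact proposal): exact GP inference requires solving against the full n × n kernel matrix,

  \[ (K_{nn} + \sigma^2 I)^{-1} y \quad\Rightarrow\quad O(n^3)\ \text{time},\ O(n^2)\ \text{memory}, \]

whereas sparse (inducing-point) approximations use the Nyström low-rank form

  \[ K_{nn} \approx K_{nm} K_{mm}^{-1} K_{mn}, \qquad m \ll n, \]

reducing training to O(nm²) time and O(nm) memory; variational formulations additionally choose the m inducing points by optimizing a lower bound on the marginal likelihood.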

Re: [MLlib] Gaussian Process regression in MLlib

2018-02-03 Thread Valeriy Avanesov
…s, S. On 01.02.18 at 20:01, Valeriy Avanesov wrote: Hi all, it came as a surprise to me that there is no implementation of Gaussian Process regression in Spark MLlib. The approach is widely known, employed, and scalable (in its sparse versions). Is there a good reason for that? Has it been discussed before? If…

[MLlib] Gaussian Process regression in MLlib

2018-02-01 Thread Valeriy Avanesov
Hi all, it came as a surprise to me that there is no implementation of Gaussian Process regression in Spark MLlib. The approach is widely known, employed, and scalable (in its sparse versions). Is there a good reason for that? Has it been discussed before? If there is a need for this approach to be part of MLlib…
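For reference, the standard model being proposed (textbook definitions, stated here only as background): a GP prior on the regression function yields a closed-form posterior predictive,

  \[ f \sim \mathcal{GP}(0, k), \qquad y_i = f(x_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \]
  \[ \mathbb{E}[f(x_*) \mid y] = k_*^\top (K_{nn} + \sigma^2 I)^{-1} y, \qquad \mathrm{Var}[f(x_*) \mid y] = k(x_*, x_*) - k_*^\top (K_{nn} + \sigma^2 I)^{-1} k_*, \]

where K_{nn} = [k(x_i, x_j)]_{ij} and k_* = [k(x_i, x_*)]_i; the (K_{nn} + \sigma^2 I)^{-1} term is exactly the n × n inverse that the later messages in this thread discuss avoiding.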