Re: Flink ML - NaN Handling

2017-02-13 Thread Till Rohrmann
Hi Stavros, your idea to add an imputer is really good. Please open a JIRA issue for it. You're right that failing fast is usually the better behaviour in case of an undefined value such as NaN or infinity. Thus, I think it makes sense to define, for the different components, their value range an
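To illustrate the fail-fast behaviour for undefined values, here is a minimal sketch in Python. It assumes the data is a list of rows of floats. `check_finite` is a hypothetical name used only for illustration and is not a Flink ML or scikit-learn API.

```python
import math
from typing import Sequence


def check_finite(rows: Sequence[Sequence[float]]) -> None:
    """Fail fast: raise ValueError on the first NaN or infinite entry."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # math.isfinite is False for NaN, +inf and -inf
            if not math.isfinite(value):
                raise ValueError(
                    f"non-finite value {value!r} at row {i}, column {j}"
                )


# Clean input passes silently; any NaN or infinity stops processing immediately.
check_finite([[1.0, 2.0], [3.0, 4.0]])
```

Raising on the first bad entry, rather than silently propagating NaN through later computations, keeps the error close to its source.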

Re: Flink ML - NaN Handling

2017-02-12 Thread Stavros Kontopoulos
Btw, I think we should add an Imputer for preparing the dataset if we follow scikit-learn, as described in the "Imputation of Missing Values" paragraph here: http://scikit-learn.org/stable/modules/preprocessing.html. What do you think? Should I add it as an issue on JIRA? The question for NaN also holds for g
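A mean imputer in the spirit of that scikit-learn section could look roughly like the sketch below. It assumes missing values are encoded as NaN and that rows are lists of floats. `fit_column_means` and `impute` are hypothetical helper names, not Flink ML API, and raising on an all-missing column is an assumption of this sketch rather than scikit-learn's exact behaviour.

```python
import math
from typing import List


def fit_column_means(rows: List[List[float]]) -> List[float]:
    """Compute per-column means, ignoring NaN entries (the missing values)."""
    if not rows:
        raise ValueError("cannot fit on an empty dataset")
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        if not observed:
            # Assumption of this sketch: an all-missing column is an error
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return means


def impute(rows: List[List[float]], means: List[float]) -> List[List[float]]:
    """Replace each NaN with the fitted mean of its column; other values are kept."""
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows]


data = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
means = fit_column_means(data)   # [2.0, 6.0]
print(impute(data, means))       # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Splitting fitting from imputing follows the fit/transform pattern, so means learned on training data can be reused on later data. Note that only NaN is treated as missing here; infinities pass through unchanged.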

Re: Flink ML - NaN Handling

2017-02-12 Thread Stavros Kontopoulos
Ok, cool. Thanks, Till. On Sun, Feb 12, 2017 at 4:59 PM, Till Rohrmann wrote: > Hi Stavros, > > so far we've stuck mainly to scikit-learn in terms of semantics. Thus, I > would recommend following scikit-learn's approach to handling NaNs. > > Cheers, > Till > > On Fri, Feb 10, 2017 at 11:48 PM, Stavr

Re: Flink ML - NaN Handling

2017-02-12 Thread Till Rohrmann
Hi Stavros, so far we've stuck mainly to scikit-learn in terms of semantics. Thus, I would recommend following scikit-learn's approach to handling NaNs. Cheers, Till On Fri, Feb 10, 2017 at 11:48 PM, Stavros Kontopoulos <st.kontopou...@gmail.com> wrote: > Hello guys, > > Is there a story for t

Flink ML - NaN Handling

2017-02-10 Thread Stavros Kontopoulos
Hello guys, is there a story for this (it might have been discussed earlier)? I see differences between scikit-learn and numpy. Do we standardize on scikit-learn? PS: I am working on the preprocessing stuff. Best, Stavros