Re: Quick request: prolific PR openers, review your open PRs

2017-01-08 Thread Kazuaki Ishizaki
Sure, I updated status of some PRs. Regards, Kazuaki Ishizaki From: Sean Owen To: dev Date: 2017/01/04 21:37 Subject: Quick request: prolific PR openers, review your open PRs Just saw that there are many people with >= 8 open PRs. Some are legitimately in flight but many ar

Re: Quick request: prolific PR openers, review your open PRs

2017-01-08 Thread Takeshi Yamamuro
Yea, I'll update soon, thanks On Sun, Jan 8, 2017 at 10:01 PM, Kazuaki Ishizaki wrote: > Sure, I updated status of some PRs. > > Regards, > Kazuaki Ishizaki > > > > From: Sean Owen > To: dev > Date: 2017/01/04 21:37 > Subject: Quick request: prolific PR openers, rev

Re: Quick request: prolific PR openers, review your open PRs

2017-01-08 Thread Jacek Laskowski
+1 What an excellent way to offload some of your chores! I have so much to learn from you, Sean! (Now since Sean seems to have a bit more time I'm gonna send a few PRs hoping he spares some time to find merits in them :)) Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering

protected val mapStatuses is ConcurrentHashMap in both MapOutputTrackerMaster and MapOutputTrackerWorker?

2017-01-08 Thread Jacek Laskowski
Hi, Just noticed that both MapOutputTrackerMaster [1] and MapOutputTrackerWorker [2] use Java's ConcurrentHashMap for mapStatuses [3] which makes this abstract mapStatuses attribute less abstract. I think it was the outcome of some refactoring that led to a small duplication (and makes the distinc

A note about MLlib's StandardScaler

2017-01-08 Thread Gilad Barkan
Hi, It seems that the output of MLlib's *StandardScaler*(*withMean*=True, *withStd*=True) is not as expected. The above configuration is expected to do the following transformation: X -> Y = (X-Mean)/Std (Eq. 1) This transformation (a.k.a. Standardization) should result in a "standardized" vecto

Re: A note about MLlib's StandardScaler

2017-01-08 Thread Holden Karau
Hi Gilad, Spark uses the sample standard variance inside of the StandardScaler (see https://spark.apache.org/docs/2.0.2/api/scala/index.html#org.apache.spark.mllib.feature.StandardScaler ) which I think would explain the results you are seeing. I believe the scalers are intended to
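
The difference between sample and population statistics mentioned above can be illustrated with a small sketch. It is written in plain Python rather than against Spark, and the helper `standardize` and its `ddof` parameter are hypothetical names chosen for illustration, not part of MLlib. It assumes that "sample" means dividing the squared deviations by n - 1 and "population" means dividing by n.

```python
import math
import statistics


def standardize(values, ddof):
    """Return (x - mean) / std for each x (hypothetical helper, not MLlib).

    ddof=1 divides the squared deviations by n - 1 (sample std);
    ddof=0 divides by n (population std).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    std = math.sqrt(variance)
    return [(x - mean) / std for x in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Scaled with the sample std (n - 1 in the denominator).
sample_scaled = standardize(data, ddof=1)
# Scaled with the population std (n in the denominator).
population_scaled = standardize(data, ddof=0)

# Measured with the population formula, the sample-scaled data
# has std sqrt((n - 1) / n), not 1.
print(statistics.pstdev(sample_scaled))
# Measured with the sample formula, the same data has std 1.
print(statistics.stdev(sample_scaled))
# Population-scaled data has population std 1.
print(statistics.pstdev(population_scaled))
```

With n = 5, checking a sample-scaled vector with the population formula gives sqrt(4/5), roughly 0.894, instead of 1. A gap of this kind is what one would expect to see if the scaler and the check use different denominators.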

Re: handling of empty partitions

2017-01-08 Thread Liang-Chi Hsieh
Hi Georg, Can you describe your question more clearly? Actually, the example code you posted on stackoverflow doesn't crash as you said in the post. geoHeil wrote > I am working on building a custom ML pipeline-model / estimator to impute > missing values, e.g. I want to fill with last good kno

Re: handling of empty partitions

2017-01-08 Thread Holden Karau
Hi Georg, Thanks for the question along with the code (as well as posting to stack overflow). In general, if a question is well suited for stackoverflow it's probably better suited to the user@ list instead of the dev@ list, so I've cc'd the user@ list for you. As far as handling empty partitions wh

Re: A note about MLlib's StandardScaler

2017-01-08 Thread Liang-Chi Hsieh
Actually I think it is possible that a user/developer needs the standardized features with population mean and std in some cases. It would be better if StandardScaler could offer the option to do that. Holden Karau wrote > Hi Gilad, > > Spark uses the sample standard variance inside of the Stan

Re: handling of empty partitions

2017-01-08 Thread geoHeil
Thanks a lot, Holden. @Liang-Chi Hsieh did you try to run https://gist.github.com/geoHeil/6a23d18ccec085d486165089f9f430f2 ? For me it crashes at either line 51 or line 58. Holden described the problem pretty well. Is it clear to you now? Cheers, Georg Holden Karau [via Apache Spark Developers