Could the function MLUtils.loadLibSVMFile be modified to support zero-based-index data?

2014-07-07 Thread Lizhengbing (bing, BIPA)
1) I downloaded the imdb data from http://komarix.org/ac/ds/Blanc__Mel.txt.bz2 and used this data to test LBFGS. When I ran the examples referencing http://spark.apache.org/docs/latest/mllib-optimization.html, an error occurred. 4/07/07 08:37:27 ERROR Executor: Exception in task ID 2 java.lang.ArrayIndex
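
Assuming the exception above comes from the zero-based feature indices named in the subject line (MLUtils.loadLibSVMFile expects one-based indices), one possible workaround is to shift every index up by one before loading the file. The sketch below is plain Python and does not touch Spark; the helper name to_one_based is hypothetical and not part of MLlib.

```python
def to_one_based(line: str) -> str:
    """Shift every feature index in one LIBSVM-format line up by 1.

    Hypothetical helper (not an MLlib API). Expects the usual
    "label index:value index:value ..." layout with integer indices.
    """
    parts = line.split()
    if not parts:
        return line  # leave blank lines untouched
    label, features = parts[0], parts[1:]
    shifted = []
    for item in features:
        # Split only on the first colon so the value is passed through as-is.
        index, value = item.split(":", 1)
        shifted.append(f"{int(index) + 1}:{value}")
    return " ".join([label] + shifted)


print(to_one_based("1 0:0.5 3:1.2"))  # prints "1 1:0.5 4:1.2"
```

Applying this to each line of the decompressed data file and writing the result to a new file would give input in the one-based form that loadLibSVMFile reads, without changing the loader itself.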

Re: Contributing to MLlib on GLM

2014-07-07 Thread Gang Bai
Poisson and Gamma regressions for modeling count data are definitely important in spark.mllib.regression. So don’t worry. Let’s change the updater to SquaredL2Updater as we discussed in the PR. Then we can ask Jenkins to run the test. On Jul 8, 2014, at 3:00 AM, xwei wrote: > Hi Gang, > > No

Re: Invalid link for Spark 1.0.0 in Official Web Site

2014-07-07 Thread Reynold Xin
Thanks for reporting this. I just fixed it. On Fri, Jul 4, 2014 at 11:14 AM, Kousuke Saruta wrote: > Hi, > > I found there is an invalid link in > . > The link for the release notes of Spark 1.0.0 points to > http://spark.apache.org/releases/spark-release-1.0

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-07 Thread Tom Graves
+1. Ran some Spark on YARN jobs on a Hadoop 2.4 cluster with authentication on. Tom On Friday, July 4, 2014 2:39 PM, Patrick Wendell wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.1! The tag to be voted on is v1.0.1-rc1 (commit 7d1043c): https://git-wip

Re: Contributing to MLlib on GLM

2014-07-07 Thread xwei
Hi Gang, No admin is looking at our patch :( Do you have any suggestions for getting our patch noticed by the admins? Best regards, Xiaokai On Mon, Jun 30, 2014 at 8:18 PM, Gang Bai [via Apache Spark Developers List] wrote: > Thanks Xiaokai, > > I’ve created a pull request to merge featur

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-07 Thread Xiangrui Meng
+1 Ran mllib examples. On Sun, Jul 6, 2014 at 1:21 PM, Matei Zaharia wrote: > +1 > > Tested on Mac OS X. > > Matei > > On Jul 6, 2014, at 1:54 AM, Andrew Or wrote: > >> +1, verified that the UI bug is in fact fixed in >> https://github.com/apache/spark/pull/1255. >> >> >> 2014-07-05 20:01 GMT-0

Re: Constraint Solver for Spark

2014-07-07 Thread Xiangrui Meng
Hey Deb, If your goal is to solve the subproblems in ALS, exploring sparsity doesn't give you much benefit because the data is small and dense. Porting either ECOS's or PDCO's implementation but using dense representation should be sufficient. Feel free to open a JIRA and we can move our discussio

Re: PLSA

2014-07-07 Thread Denis Turdakov
Hi, Deb. Thanks for your idea to use ALS for PLSA training. I discussed it with our engineers and it seems it's better to use EM. We have the following points: 1. We have some doubts that ALS is applicable to the problem. By its definition, PLSA is a matrix decomposition with respect to Kullback–