from:"Xin Liu"

Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread Xin Liu

Hi, I have tried a few models in Mllib to train a LogisticRegression model. However, I consistently get much better results, in terms of AUC, using other libraries such as statsmodel (which gives results similar to R). For illustration purposes, I used a small data set (I have also tried much bigger data) ht

Re: Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-22 Thread Xin Liu

egressionWithLBFGS regularizes the intercept, while in the ml version the intercept is excluded from regularization. As a result, if lambda is zero, the model should be the same. On Wed, May 20, 2015 at 3:42 PM, Xin Liu wrote: > Hi, > I have tried a few
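The point in this reply, that the two variants differ only in whether the L2 penalty covers the intercept and therefore agree when lambda is zero, can be checked with a small plain-Python sketch. This is not Spark code and not how either library is implemented; the function names, the toy data, and the `0.5 * lam` scaling of the penalty are all assumptions made for illustration.

```python
import math

def log_loss(w, b, X, y):
    """Mean negative log-likelihood of a logistic model (weights w, intercept b)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        # Loss is log(1 + exp(-z)) for label 1 and log(1 + exp(z)) for label 0.
        margin = z if yi == 1 else -z
        total += math.log1p(math.exp(-margin))
    return total / len(X)

def objective(w, b, X, y, lam, penalize_intercept):
    """Log-loss plus an L2 penalty that optionally includes the intercept."""
    penalty = sum(wj * wj for wj in w)
    if penalize_intercept:
        penalty += b * b
    return log_loss(w, b, X, y) + 0.5 * lam * penalty

# Toy data and parameters, made up for illustration only.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [2.0, 1.2]]
y = [1, 0, 0, 1]
w, b = [0.4, -0.2], 1.5

# With lam = 0 the penalty term vanishes, so both objectives coincide.
same_at_zero = objective(w, b, X, y, 0.0, True) == objective(w, b, X, y, 0.0, False)

# With lam > 0 the objectives differ by 0.5 * lam * b**2.
gap_at_lam = objective(w, b, X, y, 0.1, True) - objective(w, b, X, y, 0.1, False)

print(same_at_zero, gap_at_lam)
```

Because the objectives are identical at lambda = 0, any remaining gap between libraries in that setting would have to come from something else, such as convergence settings or feature scaling.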

Parquet Multiple Output

2015-06-12 Thread Xin Liu

Hi, I have a scenario where I'd like to store an RDD in Parquet format across many files that correspond to days, such as 2015/01/01, 2015/02/02, etc. So far I have used this method http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job to store text files

SizeEstimator

2018-02-26 Thread Xin Liu

Hi folks, We have a situation where the shuffled data is protobuf-based and SizeEstimator is taking a lot of time. We have tried overriding SizeEstimator to return a constant value, which speeds things up a lot. My question: what is the side effect of disabling SizeEstimator? Is it just spark do

Re: SizeEstimator

2018-02-26 Thread Xin Liu

Thanks! Our protobuf object is fairly complex. Even O(N) takes a lot of time. On Mon, Feb 26, 2018 at 6:33 PM, 叶先进 wrote: > Hi Xin Liu, > Could you provide a concrete use case if possible (code to reproduce > protobuf object and comparisons between protobuf and normal ob

Re: SizeEstimator

2018-02-26 Thread Xin Liu

ss free memory spilling may become more expensive. If the walk is your bottleneck and not GC then I would recommend JOL and guessing to better predict memory. On Mon, Feb 26, 2018, 4:47 PM Xin Liu wrote: > Hi folks, > We have a situation w


6 matches
