Hi,
I have tried a few models in MLlib to train a LogisticRegression model.
However, I consistently get much better results, in terms of AUC, using
other libraries such as statsmodels (which gives results similar to R). For
illustration purposes, I used a small dataset (I have also tried much bigger data)
ht
egressionWithLBFGS regularizes the intercept while in the ml
> version, the intercept is excluded from regularization. As a result,
> if lambda is zero, the model should be the same.
>
>
>
> On Wed, May 20, 2015 at 3:42 PM, Xin Liu wrote:
>
> Hi,
>
> I have tried a few
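
The point in the reply above (one implementation penalizes the intercept, the other does not, and the two should agree when lambda is zero) can be illustrated with a minimal sketch. It uses plain gradient descent in pure Python, not Spark's LBFGS optimizer, and the toy data, the function name fit_logistic, and its parameters are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lam, regularize_intercept, steps=5000, lr=0.1):
    """Fit y ~ sigmoid(w * x + b) by gradient descent (hypothetical helper).

    Objective: mean log-loss + (lam / 2) * w**2, plus (lam / 2) * b**2
    only when regularize_intercept is True.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n + lam * w
        grad_b = sum(residuals) / n
        if regularize_intercept:
            grad_b += lam * b  # L2 penalty applied to the intercept as well
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy, non-separable data with a clearly non-zero intercept (made up).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.0, 0.1):
    penalized = fit_logistic(xs, ys, lam, regularize_intercept=True)
    unpenalized = fit_logistic(xs, ys, lam, regularize_intercept=False)
    print("lambda =", lam)
    print("  intercept penalized:    ", penalized)
    print("  intercept not penalized:", unpenalized)
```

With lam set to 0.0 the penalty terms vanish, so both variants perform the same updates; with a positive lam the fitted intercepts diverge. A gap in AUC that persists at lambda zero would therefore point to something other than intercept regularization.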
Hi,
I have a scenario where I'd like to store an RDD in Parquet format in
many files, each corresponding to a day, such as 2015/01/01, 2015/02/02, etc.
So far I have used this method
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
to store text files
Hi folks,
We have a situation where the shuffled data is protobuf-based and
SizeEstimator is taking a lot of time.
We have tried overriding SizeEstimator to return a constant value, which
speeds things up a lot.
My question: what are the side effects of disabling SizeEstimator? Is it
just Spark do
Thanks!
Our protobuf object is fairly complex. Even O(N) takes a lot of time.
On Mon, Feb 26, 2018 at 6:33 PM, 叶先进 wrote:
> Hi Xin Liu,
>
> Could you provide a concrete use case if possible (code to reproduce
> protobuf object and comparisons between protobuf and normal ob
ss free
> memory spilling may become more expensive.
>
>
> If the walk is your bottleneck and not GC, then I would recommend JOL and
> guessing to better predict memory.
>
> On Mon, Feb 26, 2018, 4:47 PM Xin Liu wrote:
>
>> Hi folks,
>>
>> We have a situation w