> Sincerely,
>
> DB Tsai
> ---
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Sep 3, 2014 at 9:28 PM, Jiusheng Chen wrote:
>
>> Thanks DB and Xiangrui. Glad to know you guys are actively working on it.
> On Sep 3, 2014 7:34 PM, Xiangrui Meng wrote:
>
>> +DB & David (They implemented OWLQN on Spark today.)
>> On Sep 3, 2014 7:18 PM, "Jiusheng Chen" wrote:
>>
>>> Hi Xiangrui,
>>>
>>> A side question about MLlib.
>>> It looks like the current LBFGS doesn't support L1 regularization.
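
For context, a minimal sketch of L1-regularized minimization with Breeze's
OWLQN (the optimizer mentioned above). It assumes a Breeze version whose
OWLQN takes key and vector type parameters and offers a (maxIter, m, l1reg)
convenience constructor; the objective is a toy quadratic, not MLlib's
actual loss:

    import breeze.linalg.DenseVector
    import breeze.optimize.{DiffFunction, OWLQN}

    // Smooth part only: f(x) = ||x - b||^2 with gradient 2(x - b).
    // OWLQN applies the L1 penalty itself, so it is not part of f.
    val b = DenseVector(1.0, -2.0, 3.0)
    val f = new DiffFunction[DenseVector[Double]] {
      def calculate(x: DenseVector[Double]) = {
        val diff = x - b
        (diff dot diff, diff * 2.0) // (value, gradient)
      }
    }

    // maxIter = 100, LBFGS history m = 10, L1 strength = 0.5 (all arbitrary).
    val owlqn = new OWLQN[Int, DenseVector[Double]](100, 10, 0.5)
    val xOpt = owlqn.minimize(f, DenseVector.zeros[Double](3))
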
>> >> Assuming that your data is very sparse, I would recommend
>> >> RDD.repartition. But if it is not the case and you don't want to
>> >> shuffle the data, you can try a CombineInputFormat and then parse the
>> >> lines into labeled points. Coalescing
How about increasing the HDFS block size? The current value is 128M; we
could make it 512M or bigger.
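
Rather than rewriting the files with a larger block size, the same effect
can be had at read time by packing several blocks into each input split. A
rough sketch, assuming an existing SparkContext `sc`, Hadoop's
CombineTextInputFormat, and a hypothetical input path; 512 MB mirrors the
figure suggested above:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

    // Cap each combined split at 512 MB, so roughly four default 128 MB
    // blocks land in one partition instead of one partition per block.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize",
      512L * 1024 * 1024)

    val lines = sc.newAPIHadoopFile(
      "hdfs:///path/to/training/data", // hypothetical path
      classOf[CombineTextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      conf
    ).map { case (_, text) => text.toString } // copy out of the reused Text
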
On Tue, Aug 12, 2014 at 11:46 AM, ZHENG, Xu-dong wrote:
> Hi all,
>
> We are trying to use Spark MLlib to train on super large data (100M features
> and 5B rows). The input data in HDFS has ~26K partitions
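
To make the trade-off mentioned above concrete, a sketch of both ways to
shrink the ~26K input partitions; the target of 1024 partitions is
arbitrary, and `parseLabeledPoint` stands in for whatever parser the data
format needs:

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def shrink(rawLines: RDD[String],
               parseLabeledPoint: String => LabeledPoint) = {
      // repartition always shuffles, but spreads records evenly.
      val balanced = rawLines.map(parseLabeledPoint).repartition(1024)
      // coalesce(n, shuffle = false) merges co-located partitions without
      // a shuffle; cheaper, but merged partitions inherit any input skew.
      val merged = rawLines.map(parseLabeledPoint)
        .coalesce(1024, shuffle = false)
      (balanced, merged)
    }
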
It seems MLlib doesn't currently support weighted training: all training samples
have equal importance. Weighted training can be very useful to reduce data
size and speed up training.
Do you have plans to support it in the future? The data format would be something
like:
label:weight index1:value1 index2:value2 ...
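
MLlib's LabeledPoint has no weight field, but a weight can be carried
alongside each example and folded into the loss, since weighting a sample by
w is mathematically the same as duplicating it w times. A sketch of parsing
the proposed format; WeightedPoint and the 0-based indices are assumptions
for illustration, not an existing MLlib API:

    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Hypothetical container: a labeled example plus its weight.
    case class WeightedPoint(label: Double, weight: Double, features: Vector)

    // Parse "label:weight index1:value1 index2:value2 ..." lines.
    def parse(line: String, numFeatures: Int): WeightedPoint = {
      val tokens = line.trim.split("\\s+")
      val Array(label, weight) = tokens.head.split(":").map(_.toDouble)
      val (indices, values) = tokens.tail.map { t =>
        val Array(i, v) = t.split(":")
        (i.toInt, v.toDouble) // indices assumed 0-based
      }.unzip
      WeightedPoint(label, weight, Vectors.sparse(numFeatures, indices, values))
    }

    // During optimization each example would then contribute
    // weight * gradient(label, features) instead of the plain gradient.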