On Apr 12, 2013, at 11:55pm, Ted Dunning wrote:

> The first thing to try is feature hashing to reduce your feature vector size. 
>  

Unfortunately LibLinear takes feature indices directly (assumes they're 
sequential ints from 0..n-1), so I don't think feature hashing will help here.

If I constructed a minimal perfect hash function then I could skip storing the 
mapping from feature to index, but that's not what's taking most of the memory; 
it's the n x m array of weights used by LibLinear.
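For reference, the flavor of feature hashing I'm picturing (a minimal sketch; the class name and bucket count are made up for illustration): hash raw feature strings straight into a fixed range [0, numBuckets), which already yields the sequential int indices LibLinear expects, at the cost of collisions adding their counts together.

```java
import java.util.*;

// Minimal sketch of the plain hashing trick (illustrative names throughout):
// hashing raw feature strings into a fixed bucket range directly produces
// sequential int indices in [0, numBuckets), so no feature-to-index map is
// needed -- colliding features simply pool their counts.
public class FeatureHasher {
    private final int numBuckets;

    public FeatureHasher(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    // Map a raw feature string to a bucket index in [0, numBuckets).
    public int index(String feature) {
        return Math.floorMod(feature.hashCode(), numBuckets);
    }

    // Build a sparse count vector keyed by dense bucket indices.
    public Map<Integer, Double> vectorize(List<String> features) {
        Map<Integer, Double> v = new HashMap<>();
        for (String f : features) {
            v.merge(index(f), 1.0, Double::sum);
        }
        return v;
    }
}
```

(Though as noted above, the feature-to-index mapping isn't what's eating most of our memory.)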

> With multiple probes and possibly with random weights you might be able to 
> drop the size by 10x. 

More details here would be great, sometime when you're not trying to type on an 
iPhone :)
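In the meantime, here's my best guess at what you mean — something like the signed, multi-probe variant of the hashing trick (hash-kernel style), where each feature lands in several buckets with pseudo-random +/-1 signs so collisions tend to cancel in expectation. Everything below is my assumption, not a description of what you had in mind:

```java
import java.util.*;

// My guess at "multiple probes with random weights" (all names illustrative):
// each feature is hashed into numProbes buckets, each with a pseudo-random
// +/-1 sign derived from the hash, so colliding contributions tend to cancel
// in expectation rather than pile up.
public class MultiProbeHasher {
    private final int numBuckets;
    private final int numProbes;

    public MultiProbeHasher(int numBuckets, int numProbes) {
        this.numBuckets = numBuckets;
        this.numProbes = numProbes;
    }

    // Add one feature's contribution into a dense vector of size numBuckets.
    public void add(double[] vector, String feature, double value) {
        for (int probe = 0; probe < numProbes; probe++) {
            // Salt the feature name so each probe gets a distinct hash.
            int h = (feature + "#" + probe).hashCode();
            int bucket = Math.floorMod(h, numBuckets);
            // The hash's top bit supplies the pseudo-random sign.
            double sign = (h >>> 31) == 0 ? 1.0 : -1.0;
            vector[bucket] += sign * value / numProbes;
        }
    }
}
```

If that's roughly the idea, I can see how a much smaller bucket count would shrink the weight array — I'd just like to understand the 10x claim.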

-- Ken

PS - My initial naive idea was to remove any row where all of the weights were 
below a threshold that I calculated from the distribution of all weights. 
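A sketch of that pruning pass (how the threshold gets computed is elided here; the class name is just for illustration) — a row survives only if at least one of its per-label weights clears the threshold, and survivors move into a sparse map:

```java
import java.util.*;

// Sketch of the naive pruning idea: drop every feature row whose weights are
// all below a threshold (i.e. whose largest absolute weight is under it), and
// keep only surviving rows in a sparse map. Threshold selection from the
// weight distribution is left out; names are illustrative.
public class WeightPruner {
    // weights[feature][label] -> sparse map from feature index to its row.
    public static Map<Integer, double[]> prune(double[][] weights, double threshold) {
        Map<Integer, double[]> kept = new HashMap<>();
        for (int f = 0; f < weights.length; f++) {
            double maxAbs = 0.0;
            for (double w : weights[f]) {
                maxAbs = Math.max(maxAbs, Math.abs(w));
            }
            if (maxAbs >= threshold) {
                kept.put(f, weights[f]);
            }
        }
        return kept;
    }
}
```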


> 
> Sent from my iPhone
> 
> On Apr 12, 2013, at 18:30, Ken Krugler <[email protected]> wrote:
> 
>> Hi all,
>> 
>> We're (ab)using LibLinear (linear SVM) as a multi-class classifier, with 
>> 200+ labels and 400K features.
>> 
>> This results in a model that's > 800MB, which is a bit unwieldy. 
>> Unfortunately LibLinear uses a full dense array of weights (nothing sparse), 
>> since it's a direct port of the C version.
>> 
>> I could do feature reduction (removing rows from the matrix) with Mahout 
>> prior to training the model, but I'd prefer to shrink the in-memory n x m 
>> array of weights.
>> 
>> Any suggestions for approaches to take?
>> 
>> Thanks,
>> 
>> -- Ken
>> 
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 
>> 
>> 
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




