It's probably worth noting that the factory methods in MLlib create an object
of type org.apache.spark.mllib.linalg.Vector, which stores its data in a format
similar to that of Breeze vectors.
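
For what it's worth, here is a minimal sketch of building such a vector from the
sample row in the original question. It assumes spark-mllib is on the classpath
(for example inside spark-shell). The names `line`, `userId`, `classIds` and
`numClasses` are illustrative only, and the vector size is taken from this one
row, which is an assumption made to keep the example short.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// One input row from the thread: a user id followed by product class ids.
val line = "2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 " +
  "1671 18806 183576 3286 51715 57671 57476"

val fields = line.trim.split("\\s+")
val userId = fields.head.toLong

// Sparse indices should be unique and in increasing order.
val classIds = fields.tail.map(_.toInt).distinct.sorted

// Assumption: size derived from this single row for illustration.
// On a real dataset use the largest class id over all rows, plus one.
val numClasses = classIds.max + 1

// Every class id present in the row gets the value 1.0; all others are implicitly 0.
val sv: Vector =
  Vectors.sparse(numClasses, classIds, Array.fill(classIds.length)(1.0))
```

The result is typed as the MLlib Vector rather than a Breeze vector, so it can
be passed to MLlib code without a cast.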

Chris

On Sep 15, 2014, at 3:24 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Or you can use the factory method `Vectors.sparse`:
> 
> val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
> 
> where numProducts should be the largest product id plus one.
> 
> Best,
> Xiangrui
> 
> On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore <cdg...@cdgore.com> wrote:
>> Hi Sameer,
>> 
>> MLlib uses Breeze’s vector format under the hood.  You can use that.
>> http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
>> 
>> For example:
>> 
>> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
>> 
>> val numClasses = classes.distinct.count.toInt
>> 
>> val userWithClassesAsSparseVector = rows.map(x => (x.userID,
>>   new BSV[Double](x.classIDs.sortWith(_ < _),
>>     Array.fill(x.classIDs.length)(1.0),
>>     numClasses).asInstanceOf[BV[Double]]))
>> 
>> Chris
>> 
>> On Sep 15, 2014, at 11:28 AM, Sameer Tilak <ssti...@live.com> wrote:
>> 
>> Hi All,
>> I have transformed the data into the following format: the first column is
>> the user id, and all the other columns are class ids. For each user, only the
>> class ids that appear in the row have value 1; all others are 0. I need to
>> create a sparse vector from this. Is there an API for creating a sparse
>> vector that directly supports this format?
>> 
>> User id    Product class ids
>> 
>> 2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
>> 183576 3286 51715 57671 57476
>> 
>> 

