No, here's an example:

COL1  COL2
a     one
b     two
a     two
c     three


StringIndexer.setInputCol(COL1).setOutputCol(SI1) ->

(0 -> a, 1 -> b, 2 -> c)
SI1
0
1
0
2

StringIndexer.setInputCol(COL2).setOutputCol(SI2) ->
(0 -> one, 1 -> two, 2 -> three)
SI2
0
1
1
2

VectorAssembler.setInputCols(SI1, SI2).setOutputCol(features) ->
features
[0, 0]
[1, 1]
[0, 1]
[2, 2]
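
To make the indexing steps above concrete, here is a minimal plain-Python sketch (no Spark involved). The helper name index_column is hypothetical, and it assigns indices in order of first appearance so that it reproduces the mappings in this example. As far as I know, the real StringIndexer orders labels by frequency, so actual index values may differ.

```python
def index_column(values):
    """Map each distinct string to an index, in order of first appearance.

    Assumption: this ordering mirrors the example above; Spark's
    StringIndexer orders labels by frequency instead.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping, [mapping[value] for value in values]


col1 = ["a", "b", "a", "c"]
col2 = ["one", "two", "two", "three"]

map1, si1 = index_column(col1)
map2, si2 = index_column(col2)

# Combine the two index columns row by row, which is roughly what
# VectorAssembler does with SI1 and SI2.
features = [list(row) for row in zip(si1, si2)]

for row in features:
    print(row)
```

Every distinct string keeps its own index, so no two values are ever merged.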


HashingTF.setNumFeatures(2).setInputCol(COL1).setOutputCol(HT1)

bucket1  bucket2
a,a,b    c

HT1
3 // hash collision: a and b fall into the same bucket
3
3
1
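
The collision can be sketched in plain Python as well. This is only an illustration under stated assumptions: zlib.crc32 stands in for whatever hash function Spark uses, and the helper names bucket and hashing_tf are hypothetical, so the actual bucket assignments in Spark will differ. The point is that three distinct values hashed into two buckets must collide somewhere.

```python
import zlib


def bucket(value, num_features):
    """Hash a string into one of num_features buckets.

    Assumption: CRC32 is a stand-in hash, not the one Spark uses.
    """
    return zlib.crc32(value.encode("utf-8")) % num_features


def hashing_tf(terms, num_features):
    """Count how many terms fall into each bucket."""
    vector = [0] * num_features
    for term in terms:
        vector[bucket(term, num_features)] += 1
    return vector


col1 = ["a", "b", "a", "c"]
num_features = 2

# Bucket chosen for each distinct value.
buckets = {value: bucket(value, num_features) for value in sorted(set(col1))}

# One term-frequency vector per row, treating each row as a one-term document.
row_vectors = [hashing_tf([value], num_features) for value in col1]

# With 3 distinct values and 2 buckets, at least two values share a bucket.
has_collision = len(set(buckets.values())) < len(buckets)
print(buckets, row_vectors, has_collision)
```

Unlike the index mapping, the hashed representation cannot tell colliding values apart, which is why the two approaches are not equivalent.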

Thanks,
Peter Rudenko
On 2015-08-07 09:55, praveen S wrote:

Is StringIndexer + VectorAssembler equivalent to HashingTF while converting the document for analysis?


