No, here's an example:

COL1  COL2
a     one
b     two
a     two
c     three

StringIndexer.setInputCol(COL1).setOutputCol(SI1) -> (0 -> a, 1 -> b, 2 -> c)

SI1
0
1
0
2

StringIndexer.setInputCol(COL2).setOutputCol(SI2) -> (0 -> one, 1 -> two, 2 -> three)

SI2
0
1
1
2

VectorAssembler.setInputCols(SI1, SI2).setOutputCol(features) ->

features
00
11
01
22

HashingTF.setNumFeatures(2).setInputCol(COL1).setOutputCol(HT1)

bucket1  bucket2
a,a,b    c

HT1
3  // hash collision
3
3
1

Thanks,
Peter Rudenko

On 2015-08-07 09:55, praveen S wrote:
Is StringIndexer + VectorAssembler equivalent to HashingTF while converting the document for analysis?
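
To make the contrast in the example above concrete, here is a minimal plain-Python sketch (not Spark code). It assumes a frequency-ordered index with alphabetical tie-breaking for the indexing step and uses zlib.crc32 as a stand-in hash; both choices are assumptions made for illustration, and the helper names string_index and hash_bucket are hypothetical. Spark's own hash function and bucket assignment will differ.

```python
import zlib
from collections import Counter

col1 = ["a", "b", "a", "c"]


def string_index(values):
    """Give every distinct value its own index (most frequent first)."""
    counts = Counter(values)
    # Assumption: ties are broken alphabetically so the result is deterministic.
    labels = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: i for i, label in enumerate(labels)}


def hash_bucket(value, num_buckets):
    """Map a value to one of num_buckets buckets (stand-in hash, not Spark's)."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


index = string_index(col1)
si1 = [index[v] for v in col1]

# With only 2 buckets and 3 distinct values, at least two values must collide.
buckets = {v: hash_bucket(v, 2) for v in sorted(set(col1))}

print(index)    # {'a': 0, 'b': 1, 'c': 2}
print(si1)      # [0, 1, 0, 2]
print(buckets)  # which values share a bucket depends on the hash
```

The indexed column keeps a, b and c distinguishable, while the hashed column cannot: once two values land in the same bucket, the original value is no longer recoverable from the feature.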