Does StringIndexer
<https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala>
keep the entire label-to-index mapping in the memory of the driver machine?
It seems to, unless I am missing something.
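
To make the concern concrete, here is a minimal sketch in plain Python of what I understand fitting to amount to. This is not Spark's actual code: the function names `fit_string_index` and `transform` are hypothetical, and the alphabetical tie-breaking is an assumption. The point is that the whole label-to-index dictionary lives in a single process, so its size grows with the number of distinct labels.

```python
from collections import Counter

def fit_string_index(values):
    """Build a label -> index mapping, most frequent label first.

    Hypothetical helper for illustration only. Ties are broken
    alphabetically here, which is an assumption.
    """
    counts = Counter(values)  # one entry per distinct label
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    # The full mapping is held in one process's memory.
    return {label: float(i) for i, label in enumerate(ordered)}

def transform(values, mapping):
    """Replace each label with its index using the in-memory mapping."""
    return [mapping[v] for v in values]

column = ["a", "b", "c", "a", "a", "c"]
mapping = fit_string_index(column)
indexed = transform(column, mapping)
```

With a few thousand distinct labels this is harmless, but with hundreds of millions of distinct labels per column the dictionary itself becomes the bottleneck.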

What if the data to be indexed is huge, the columns are high cardinality
(many distinct categories), and more than one such column needs to be
indexed? In that case the mappings would not fit in memory.

Thanks.

Regards,
Shahab
