Does StringIndexer <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala> keep all of the label-to-index mappings in the memory of the driver machine? It seems so, unless I am missing something.
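To make the concern concrete, here is a minimal pure-Python imitation of what fitting a StringIndexer conceptually produces. It is not Spark code. It assumes the default frequency-descending ordering, with ties broken alphabetically. The function names `fit_string_indexer` and `transform` are hypothetical. The point is that the fitted result is one local list of labels, and applying it needs one in-memory label-to-index map whose size grows with the number of distinct labels.

```python
from collections import Counter


def fit_string_indexer(values):
    """Hypothetical stand-in for fitting: labels by descending frequency.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    counts = Counter(values)
    # Sort by (-count, label) so the most frequent label gets index 0.
    return sorted(counts, key=lambda label: (-counts[label], label))


def transform(values, labels):
    # The whole label -> index map is a single in-memory dict, so its
    # size grows linearly with the number of distinct labels.
    label_to_index = {label: float(i) for i, label in enumerate(labels)}
    return [label_to_index[v] for v in values]


column = ["b", "a", "c", "a", "b", "a"]
labels = fit_string_indexer(column)
print(labels)                      # ['a', 'b', 'c']
print(transform(column, labels))   # [1.0, 0.0, 2.0, 0.0, 1.0, 0.0]
```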
What if the data to be indexed is huge, the columns to be indexed have high cardinality (i.e. lots of categories), and more than one such column needs to be indexed? The mappings would then not fit in memory.
Thanks.
Regards,
Shahab
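A back-of-envelope estimate shows how quickly this adds up with several high-cardinality columns. This is a rough sketch under stated assumptions. The helper `estimate_index_bytes` is hypothetical, and the 64-byte per-entry overhead (object headers plus hash-map bookkeeping) is an assumed figure, not a measured JVM number. The column counts in the example are illustrative only.

```python
def estimate_index_bytes(distinct_counts, avg_label_bytes, per_entry_overhead=64):
    """Rough total size of in-memory label -> index maps for several columns.

    per_entry_overhead is an assumed figure; real overhead depends on the
    JVM, string encoding, and map implementation.
    """
    return sum(n * (avg_label_bytes + per_entry_overhead) for n in distinct_counts)


# Hypothetical: three columns with 50 million distinct labels each, ~20-byte labels.
total = estimate_index_bytes([50_000_000] * 3, avg_label_bytes=20)
print(f"{total / 2**30:.1f} GiB")  # 11.7 GiB
```

Under these assumptions the maps alone need on the order of 12 GB, which is more than many driver heaps are configured with.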