Thanks for the pointer! A linked issue from the one you shared also appears to be relevant.
SPARK-8418 <https://issues.apache.org/jira/browse/SPARK-8418>: "Add single- and multi-value support to ML Transformers"

On Tue, Aug 23, 2016 at 10:41 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:

> It's not impossible that a Transformer could output multiple columns -
> it's simply because none of the current ones do. It's true that it might be
> a relatively less common use case in general.
>
> But take StringIndexer for example. It turns strings (categorical
> features) into ints (0-based indexes). It could (should) accept multiple
> input columns for efficiency (see
> https://issues.apache.org/jira/browse/SPARK-11215). This is a case where
> multiple output columns would be required.
>
> N
>
>
> On Tue, 23 Aug 2016 at 16:15 Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>> If you create your own Spark 2.x ML Transformer, there are multiple
>> mix-ins (is that the correct term?) that you can use to define its behavior
>> which are in ml/param/shared.py
>> <https://github.com/apache/spark/blob/master/python/pyspark/ml/param/shared.py>
>> .
>>
>> Among them are the following mix-ins:
>>
>> - HasInputCol
>> - HasInputCols
>> - HasOutputCol
>>
>> What’s *not* available is a HasOutputCols mix-in, and I assume that is
>> intentional.
>>
>> Is there a design reason why Transformers should not be able to define
>> multiple output columns?
>>
>> I’m guessing if you are an ML beginner who thinks they need a Transformer
>> with multiple output columns, you’ve misunderstood something. 😅
>>
>> Nick
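
For anyone following along, here is a rough sketch of what a multi-output-column mix-in could look like for the StringIndexer case discussed in the thread. It is plain Python, not PySpark's actual Params machinery, and HasOutputCols, MultiStringIndexer, and the set_input_cols/set_output_cols setters are hypothetical names made up for illustration. Giving index 0 to the most frequent label, with ties broken alphabetically, is also just an assumption of this sketch.

```python
from collections import Counter


class HasInputCols:
    """Mix-in holding a list of input column names (illustrative only)."""

    def set_input_cols(self, cols):
        self.input_cols = list(cols)
        return self


class HasOutputCols:
    """Hypothetical mix-in: the multi-column counterpart of HasOutputCol."""

    def set_output_cols(self, cols):
        self.output_cols = list(cols)
        return self


class MultiStringIndexer(HasInputCols, HasOutputCols):
    """Toy indexer: maps each input column's strings to 0-based ints."""

    def transform(self, rows):
        if len(self.input_cols) != len(self.output_cols):
            raise ValueError("input_cols and output_cols must have the same length")
        # Build one label -> index mapping per input column.
        # Assumption: most frequent label gets index 0, ties broken alphabetically.
        mappings = {}
        for col in self.input_cols:
            counts = Counter(row[col] for row in rows)
            ordered = sorted(counts, key=lambda label: (-counts[label], label))
            mappings[col] = {label: i for i, label in enumerate(ordered)}
        # Emit a copy of each row with one new output column per input column.
        out = []
        for row in rows:
            new_row = dict(row)
            for in_col, out_col in zip(self.input_cols, self.output_cols):
                new_row[out_col] = mappings[in_col][row[in_col]]
            out.append(new_row)
        return out


# Rows are plain dicts standing in for DataFrame rows.
rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
]
indexer = (
    MultiStringIndexer()
    .set_input_cols(["color", "size"])
    .set_output_cols(["color_idx", "size_idx"])
)
indexed = indexer.transform(rows)
```

The point is only that one pass over the data can index several columns at once, which is why a single output column is not enough for this kind of transformer.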