It's not that a Transformer couldn't output multiple columns. There is no HasOutputCols mix-in simply because none of the current Transformers produce more than one output column. It's true that this is probably a relatively uncommon use case in general.
But take StringIndexer for example. It turns strings (categorical features) into ints (0-based indexes). It could (should) accept multiple input columns for efficiency (see https://issues.apache.org/jira/browse/SPARK-11215). This is a case where multiple output columns would be required.

N

On Tue, 23 Aug 2016 at 16:15 Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> If you create your own Spark 2.x ML Transformer, there are multiple
> mix-ins (is that the correct term?) that you can use to define its behavior
> which are in ml/param/shared.py
> <https://github.com/apache/spark/blob/master/python/pyspark/ml/param/shared.py>
> .
>
> Among them are the following mix-ins:
>
> - HasInputCol
> - HasInputCols
> - HasOutputCol
>
> What’s *not* available is a HasOutputCols mix-in, and I assume that is
> intentional.
>
> Is there a design reason why Transformers should not be able to define
> multiple output columns?
>
> I’m guessing if you are an ML beginner who thinks they need a Transformer
> with multiple output columns, you’ve misunderstood something. 😅
>
> Nick
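
To make the StringIndexer case from the reply concrete, here is a minimal pure-Python sketch of indexing several string columns in one pass, which naturally yields one output column per input column. It does not use the Spark API: `fit_string_indexers`, `transform`, and the `*_idx` column names are hypothetical, and the ordering rule (most frequent label gets index 0, ties broken alphabetically) is an assumption of this sketch.

```python
from collections import Counter


def fit_string_indexers(rows, input_cols):
    """Count labels for every input column in a single pass over the rows."""
    counts = {col: Counter() for col in input_cols}
    for row in rows:
        for col in input_cols:
            counts[col][row[col]] += 1
    # Assumption: most frequent label gets index 0, ties broken alphabetically.
    return {
        col: {
            label: idx
            for idx, (label, _) in enumerate(
                sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
            )
        }
        for col, counter in counts.items()
    }


def transform(rows, mappings, output_cols):
    """Append one index column per input column (hypothetical helper)."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for in_col, out_col in output_cols.items():
            new_row[out_col] = mappings[in_col][row[in_col]]
        result.append(new_row)
    return result


rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
]
mappings = fit_string_indexers(rows, ["color", "size"])
indexed = transform(rows, mappings, {"color": "color_idx", "size": "size_idx"})
```

The point is only that the fitting step reads the data once for all columns, so the transform step has to write several output columns, which is what a HasOutputCols-style parameter would describe.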