My pipeline (i.e. a Spark 2.0 Pipeline) is mostly made of the built-in transformers and estimators that come with Spark. One transformer, however, is custom (i.e. I subclassed Transformer), and all it does is use a UDF to append a VectorUDT column to a DataFrame.
To speak in more concrete terms, my custom transformer takes two columns that contain people’s names and appends a column of features describing how similar those names are.

So I’m not sure where I stand as far as being able to persist this Pipeline, which includes my custom Transformer. It sounds like you’re saying I need to do the work of defining how to save and load this Transformer myself. Is that correct? Is there an example I can reference of how I might do that? Looking at this instance <https://github.com/apache/spark/blob/acaf2a81ad5238fd1bc81e7be2c328f40c07e755/python/pyspark/ml/classification.py#L1421-L1433> of _to_java() from a built-in Estimator, for example, doesn’t give me any clues as to how I’d do it for my custom Transformer.

Nick

On Fri, Aug 19, 2016 at 3:16 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> I don't think we've given a lot of thought to model persistence for custom
> Python models yet. If the Python model is wrapping a JVM model, using
> JavaMLWritable along with '_to_java' should work, provided your Java model
> is already saveable. On the other hand, if your model isn't wrapping a Java
> model, you shouldn't feel the need to shoehorn yourself into this approach.
> In either case, much of the persistence work is up to you; it's just a
> matter of whether you do it in the JVM or in Python.
>
> On Friday, August 19, 2016, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>> I understand persistence for PySpark ML pipelines is already present in
>> 2.0, and further improvements are being made for 2.1 (e.g. SPARK-13786
>> <https://issues.apache.org/jira/browse/SPARK-13786>).
>>
>> I’m having trouble, though, persisting a pipeline that includes a custom
>> Transformer (see SPARK-17025
>> <https://issues.apache.org/jira/browse/SPARK-17025>). It appears that
>> there is a magic _to_java() method that I need to implement.
>>
>> Is the intention that developers implementing custom Transformers would
>> also specify how they should be persisted, or are there ideas about how
>> to make this automatic? I searched on JIRA, but I’m not sure whether I
>> missed an issue that already addresses this problem.
>>
>> Nick
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>