I don't think we've given much thought to model persistence for custom Python models yet. If the Python model is wrapping a JVM model, using JavaMLWritable along with '_to_java' should work, provided your Java model is already saveable. On the other hand, if your model isn't wrapping a Java model, you shouldn't feel the need to shoehorn yourself into this approach. In either case much of the persistence work is up to you; it's just a matter of whether you do it in the JVM or in Python.
On Friday, August 19, 2016, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> I understand persistence for PySpark ML pipelines is already present in
> 2.0, and further improvements are being made for 2.1 (e.g. SPARK-13786
> <https://issues.apache.org/jira/browse/SPARK-13786>).
>
> I'm having trouble, though, persisting a pipeline that includes a custom
> Transformer (see SPARK-17025
> <https://issues.apache.org/jira/browse/SPARK-17025>). It appears that
> there is a magic _to_java() method that I need to implement.
>
> Is the intention that developers implementing custom Transformers would
> also specify how it should be persisted, or are there ideas about how to
> make this automatic? I searched on JIRA but I'm not sure if I missed an
> issue that already addresses this problem.
>
> Nick

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau