Hi Sean and Jatin,
Could you point to some examples of load() methods that use the model version 
(or the available columns) rather than the Spark version?
I only see cases where we use the Spark version (e.g. 
https://github.com/apache/spark/blob/c04ad17ccf14a07ffdb2bf637124492a341075f2/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L1239)

Logically, I think it would be better to use a separate versioning mechanism 
for models than to use the Spark version; IMHO it is more reliable that way.
Since we sometimes patch released versions of Spark by backporting fixes, 
depending on a specific Spark version in the code seems especially fragile.
In addition, models don't change as frequently as the Spark version, and an 
explicit versioning mechanism makes it clearer how often the saved model 
structure has changed over time. Having said that, I think you could implement 
it either way without issues if the code is written carefully. If I had to 
choose, though, I would prefer a separate versioning mechanism for models.
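
To make the backporting concern concrete, here is a minimal sketch in plain 
Python (no Spark dependency). Everything in it is hypothetical: the 
"formatVersion" field, the "old"/"new" layout names, and the assumption that 
the layout changed in 3.0 are for illustration only and are not taken from 
MLlib.

```python
def parse_major_minor(spark_version):
    """Return (major, minor) from a version string such as '2.4.1'."""
    major, minor = spark_version.split(".")[:2]
    return int(major), int(minor)


def layout_from_spark_version(metadata):
    # Assumption for this sketch: the saved layout changed in Spark 3.0.
    if parse_major_minor(metadata["sparkVersion"]) >= (3, 0):
        return "new"
    return "old"


# Hypothetical mapping from an explicit model format version to a layout.
LAYOUT_BY_FORMAT_VERSION = {1: "old", 2: "new"}


def layout_from_format_version(metadata):
    version = metadata["formatVersion"]  # hypothetical metadata field
    if version not in LAYOUT_BY_FORMAT_VERSION:
        raise ValueError(f"Unsupported model format version: {version}")
    return LAYOUT_BY_FORMAT_VERSION[version]


# A patched 2.4.x build that received the layout change as a backport.
patched = {"sparkVersion": "2.4.1", "formatVersion": 2}

print(layout_from_spark_version(patched))   # guesses from the Spark version
print(layout_from_format_version(patched))  # reads the explicit version
```

In this scenario the Spark-version check picks the old layout even though the 
patched build wrote the new one, while the explicit format version still 
identifies the layout correctly.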
Thank you, Ilya



-----Original Message-----
From: Sean Owen <sro...@gmail.com> 
Sent: Wednesday, January 16, 2019 10:12 AM
To: dev <dev@spark.apache.org>
Cc: Jatin Puri <purija...@gmail.com>
Subject: How to implement model versions in MLlib?

I know some implementations of model save/load in MLlib use an explicit 
versioning mechanism (1.0, 2.0, 3.0). I've also seen that some just decide 
based on the version of Spark that wrote the model.

Is one or the other preferred?

See 
https://github.com/apache/spark/pull/23549#discussion_r248318392
for example. In cases like this, is it simpler still to just select all the 
values written in the model and decide what to do based on the presence or 
absence of columns? That seems a little more robust. It wouldn't be much of an 
option, though, if the contents or meaning of the columns had changed.
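
As a rough illustration of the column-presence idea, here is a minimal sketch 
in plain Python, with a dict standing in for one row of saved model data. The 
column names "coefficients" and "coefficientMatrix" and the two layouts are 
assumptions made for this example, not a description of any particular MLlib 
model.

```python
def load_coefficients(row):
    """Pick a loading path from the columns present in the saved row.

    `row` maps column name -> value. Both column names are hypothetical.
    """
    if "coefficientMatrix" in row:
        # Assumed newer layout: coefficients stored as a matrix (list of rows).
        return row["coefficientMatrix"]
    if "coefficients" in row:
        # Assumed older layout: a single vector, wrapped as a one-row matrix.
        return [row["coefficients"]]
    raise ValueError(f"Unrecognized model layout, columns: {sorted(row)}")


old_row = {"intercept": 0.5, "coefficients": [1.0, 2.0]}
new_row = {"interceptVector": [0.5], "coefficientMatrix": [[1.0, 2.0]]}

print(load_coefficients(old_row))
print(load_coefficients(new_row))
```

This only distinguishes layouts that differ in which columns exist. If a 
column kept its name but changed meaning, the check could not tell the two 
layouts apart.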

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org