Hi Sean and Jatin,

Could you point to some examples of load() methods that branch on the model version (or on the columns available) rather than on the Spark version? I see only cases where we use the Spark version (e.g. https://github.com/apache/spark/blob/c04ad17ccf14a07ffdb2bf637124492a341075f2/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L1239).
Logically, I think it would be better to use a separate versioning mechanism for models than to rely on the Spark version; IMHO it is more reliable that way. We sometimes patch Spark versions by merging fixes back, so depending on a specific Spark version in the code seems less reliable. In addition, models don't change as frequently as the Spark version, and an explicit versioning mechanism makes it clearer how often the saved model structure has changed over time.

Having said that, I think you could implement it either way without issues if the code is written carefully. But if I had to choose, I would prefer a separate versioning mechanism for models.

Thank you, Ilya

-----Original Message-----
From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, January 16, 2019 10:12 AM
To: dev <dev@spark.apache.org>
Cc: Jatin Puri <purija...@gmail.com>
Subject: How to implement model versions in MLlib?

I know some implementations of model save/load in MLlib use an explicit version 1.0, 2.0, 3.0 mechanism. I've also seen that some just decide based on the version of Spark that wrote the model. Is one or the other preferred?

See https://github.com/apache/spark/pull/23549#discussion_r248318392 for example. In cases like this, is it simpler still to just select all the values written in the model and decide what to do based on the presence or absence of columns? That seems a little more robust. It wouldn't be so much an option if the contents or meaning of the columns had changed.
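
To make the three options in this thread concrete, here is a minimal sketch in plain Python with no Spark dependency. It is not how MLlib implements load(); the field name modelFormatVersion, the column name extraColumn, and the 3.0 cutoff are all hypothetical and chosen only for illustration. It shows a loader deciding between an "old" and a "new" saved layout by Spark version, by explicit model version, and by column presence.

```python
import re

# Hypothetical cutoff: assume the saved layout changed in Spark 3.0.
LAYOUT_CHANGED_IN = (3, 0)


def major_minor(spark_version):
    """Parse a string such as '2.4.1' or '3.0.0-SNAPSHOT' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    return int(match.group(1)), int(match.group(2))


def layout_from_spark_version(metadata):
    """Option 1: infer the layout from the Spark version that wrote the model."""
    version = major_minor(metadata["sparkVersion"])
    return "new" if version >= LAYOUT_CHANGED_IN else "old"


def layout_from_model_version(metadata):
    """Option 2: read an explicit model format version.

    'modelFormatVersion' is a hypothetical metadata field; files written
    before the field existed are treated as format 1.0.
    """
    fmt = metadata.get("modelFormatVersion", "1.0")
    major = int(fmt.split(".")[0])
    return "new" if major >= 2 else "old"


def layout_from_columns(columns):
    """Option 3: decide from the presence of a column ('extraColumn' is hypothetical)."""
    return "new" if "extraColumn" in columns else "old"


# A model written by a 2.4.1 build that had the layout change merged back.
patched_metadata = {"sparkVersion": "2.4.1", "modelFormatVersion": "2.0"}
patched_columns = ["coefficients", "intercept", "extraColumn"]

print(layout_from_spark_version(patched_metadata))  # old
print(layout_from_model_version(patched_metadata))  # new
print(layout_from_columns(patched_columns))         # new
```

In this sketch the Spark-version check misreads the backported build as the old layout, while the explicit version and the column check both pick the new one. As noted above, the column check only helps when columns are added or removed, not when the meaning of an existing column changes.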