I'm building an LDA Pipeline, currently with four stages: Tokenizer, StopWordsRemover, CountVectorizer, and LDA. I would like to add more stages, for example stemming and lemmatization, and also combined 1-grams and 2-grams (which I believe the default NGram class does not support, since each NGram instance produces only a single n). Is there a way to add these stages? In sklearn, you can create classes with fit() and transform() methods, and that is enough. Is the same true in Spark ML (or is there something similar)?
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-custom-steps-to-Pipeline-models-tp27522.html Sent from the Apache Spark User List mailing list archive at Nabble.com.