It looks like you're not registering the input param correctly. Below are examples from the Spark Java source that show how to build a custom transformer. Note that a Model is a Transformer.
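For what it's worth, the `null__inputCol` in your exception message suggests that `uid()` returned null at the moment the superclass created the `inputCol` param. In Java, instance field initializers run *after* the superclass constructor, so a `uid` field initialized with a `UUID` is still null when `UnaryTransformer`'s constructor builds its params. Here is a minimal, Spark-free mock (all class names here are hypothetical, just mirroring the construction order) that reproduces the symptom and sketches a lazy-init workaround:

```java
import java.util.UUID;

public class UidInitDemo {
    // Mock of the relevant behavior: the superclass constructor captures
    // uid() to name its params BEFORE any subclass field initializers run.
    static abstract class MockUnaryTransformer {
        final String inputColParamName;
        MockUnaryTransformer() {
            // Runs before the subclass's field initializers.
            inputColParamName = uid() + "__inputCol";
        }
        public abstract String uid();
    }

    // Broken: uid is a field initializer, so it is still null
    // when the superclass constructor calls uid().
    static class BrokenStemmer extends MockUnaryTransformer {
        private final String uid = "Stemmer_" + UUID.randomUUID();
        @Override public String uid() { return uid; }
    }

    // Workaround sketch: initialize lazily inside uid() so a stable,
    // non-null value exists even when called from the super constructor.
    static class FixedStemmer extends MockUnaryTransformer {
        private String uid;
        @Override public String uid() {
            if (uid == null) {
                uid = "Stemmer_" + UUID.randomUUID();
            }
            return uid;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BrokenStemmer().inputColParamName); // null__inputCol
        System.out.println(new FixedStemmer().inputColParamName);
    }
}
```

With the real `UnaryTransformer` the same idea applies: `uid()` must return a stable non-null value even when invoked during superclass construction, otherwise the param's recorded parent won't match your transformer's uid. The JavaDeveloperApiExample linked below shows the pattern Spark itself uses.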
Also, that chimpler wordpress naive bayes example is a bit dated. I tried to implement it a while ago, but didn't get very far given that the ML API has charged ahead in favor of pipelines and the new spark.ml package.

https://github.com/apache/spark/blob/branch-1.5/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java#L126
https://github.com/apache/spark/blob/branch-1.5/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java#L191

On Fri, Jan 1, 2016 at 11:38 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I am trying to write a trivial transformer I can use in my pipeline. I am
> using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala
> class as an example. This should be very easy; however, I do not understand
> Scala and I am having trouble debugging the following exception.
>
> Any help would be greatly appreciated.
>
> Happy New Year
>
> Andy
>
> java.lang.IllegalArgumentException: requirement failed: Param
> null__inputCol does not belong to
> Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
>     at scala.Predef$.require(Predef.scala:233)
>     at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
>     at org.apache.spark.ml.param.Params$class.set(params.scala:436)
>     at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>     at org.apache.spark.ml.param.Params$class.set(params.scala:422)
>     at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
>     at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
>     at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
>
> public class StemmerTest extends AbstractSparkTest {
>     @Test
>     public void test() {
>         Stemmer stemmer = new Stemmer()
>                 .setInputCol("raw")          // line 30
>                 .setOutputCol("filtered");
>     }
> }
>
> /**
>  * @see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  *
>  * @author andrewdavidson
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer>
>         implements Serializable {
>     static Logger logger = LoggerFactory.getLogger(Stemmer.class);
>     private static final long serialVersionUID = 1L;
>     private static final ArrayType inputType =
>             DataTypes.createArrayType(DataTypes.StringType, true);
>     private final String uid = Stemmer.class.getSimpleName() + "_" +
>             UUID.randomUUID().toString();
>
>     @Override
>     public String uid() {
>         return uid;
>     }
>
>     /*
>     override protected def validateInputType(inputType: DataType): Unit = {
>         require(inputType == StringType,
>             s"Input type must be string type but got $inputType.")
>     }
>     */
>     @Override
>     public void validateInputType(DataType inputTypeArg) {
>         String msg = "inputType must be " + inputType.simpleString() +
>                 " but got " + inputTypeArg.simpleString();
>         assert (inputType.equals(inputTypeArg)) : msg;
>     }
>
>     @Override
>     public Function1<List<String>, List<String>> createTransformFunc() {
>         // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
>         Function1<List<String>, List<String>> f =
>                 new AbstractFunction1<List<String>, List<String>>() {
>             public List<String> apply(List<String> words) {
>                 for (String word : words) {
>                     logger.error("AEDWIP input word: {}", word);
>                 }
>                 return words;
>             }
>         };
>
>         return f;
>     }
>
>     @Override
>     public DataType outputDataType() {
>         return DataTypes.createArrayType(DataTypes.StringType, true);
>     }
> }

--
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com