Hi, I am building a pipeline and I've read most of what I can find on the topic (the spark.ml library and the AMPcamp version of pipelines: http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html). I do not have structured data, as in the case of the new spark.ml library which uses SchemaRDD/DataFrames, so the second alternative seems the most convenient to me. I'm writing in Scala.
The problem is that I want to build a pipeline that can branch in (at least) two ways:

1. One of my steps outputs an Either type: the output is either an object containing statistics on why this step/data failed, or the expected output. So I would like to branch the pipeline so that it either skips the rest of the steps and continues into a reporting step (writing a report with the help of the statistics object), or continues to the next step in the pipeline as usual. In the generic case this could of course be two independent pipelines (like a first pipeline node that accepts multiple datatypes and passes the input to the correct pipeline in the following step).

2. The other way I would like to branch the pipeline is to send the same data to multiple new pipeline nodes. These nodes are not dependent on each other, so they should simply branch off. In the generic case these could be two new pipelines themselves.

Has anyone tried this, or does anyone have a nice idea of how this could be done? I like the "simplicity" of the AMPcamp pipeline, which relies on type-safety, but I'm confused about how to create a branching pipeline using only type declarations.

Thanks,
Staffan

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pipelines-for-controlling-workflow-tp22403.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
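P.S. To make the question concrete, here is a rough sketch of the kind of type-safe branching I have in mind, using plain Scala functions as pipeline nodes. All names here (Node, Stats, validate, report, fanOut, etc.) are illustrative, not from spark.ml or the AMPcamp library:

```scala
// A minimal, hypothetical sketch of branching pipelines via types alone.
object PipelineSketch {
  // A pipeline node is just a typed function A => B; composition via andThen.
  type Node[A, B] = A => B

  // Failure statistics carried on the Left branch.
  final case class Stats(reason: String)

  // Case 1: a node whose output is Either a Stats object (failure) or the
  // expected output (success).
  val validate: Node[Int, Either[Stats, Int]] =
    x => if (x >= 0) Right(x) else Left(Stats(s"negative input: $x"))

  // The reporting step (consumes Stats) and the normal continuation.
  val report: Node[Stats, String] = s => s"report: ${s.reason}"
  val continue: Node[Int, String] = x => s"result: ${x * 2}"

  // Branching node: Left goes to the reporting step, Right continues on.
  val branched: Node[Int, String] =
    validate.andThen(_.fold(report, continue))

  // Case 2: fan-out -- send the same input to two independent branches.
  def fanOut[A, B, C](f: Node[A, B], g: Node[A, C]): Node[A, (B, C)] =
    a => (f(a), g(a))

  def main(args: Array[String]): Unit = {
    println(branched(21))  // success path
    println(branched(-1))  // failure path, routed to the report node
    println(fanOut(continue, validate)(3)) // same input to two branches
  }
}
```

This works for simple cases, but it is not obvious to me how to express it when each node should itself be a reusable pipeline stage in the AMPcamp style, which is what I'm asking about.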