Hi,
I am building a pipeline and I've read most of what I can find on the topic
(the spark.ml library and the AMPcamp version of pipelines:
http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html).
I do not have structured data as in the case of the new spark.ml library,
which uses SchemaRDD/DataFrames, so the second alternative seems the more
convenient to me. I'm writing in Scala.

The problem is that I want to build a pipeline that can branch in (at
least) two ways:
1. One of my steps outputs an Either type, where the output is either an
object containing statistics about why this step/data failed, or the
expected output. I would like the pipeline to branch so that it either
skips the rest of the pipeline and continues into a reporting step (writing
a report with the help of the statistics object), or continues on to the
next step in the pipeline. In the generic case this could of course be two
independent pipelines (like a first pipeline node that takes multiple
datatypes and passes the input on to the correct pipeline in the following
step).
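To make the idea concrete, here is a minimal sketch of the Either-based branch. None of these names come from spark.ml or the AMPcamp code; FailureStats, validate, report and continue are made-up illustrations:

```scala
// Hypothetical sketch: an Either-producing step with two typed branches.
case class FailureStats(reason: String, badCount: Long)

// A step whose output is Either: Left carries statistics on why the
// step/data failed, Right carries the expected output.
def validate(xs: Seq[Double]): Either[FailureStats, Seq[Double]] =
  if (xs.exists(_.isNaN))
    Left(FailureStats("NaN values present", xs.count(_.isNaN).toLong))
  else Right(xs)

// Branch A: the reporting step.
def report(stats: FailureStats): String =
  s"Pipeline aborted: ${stats.reason} (${stats.badCount} bad records)"

// Branch B: the rest of the pipeline.
def continue(xs: Seq[Double]): String =
  s"normalized: ${xs.map(_ / xs.max).mkString(", ")}"

// fold is the single type-safe branch point: both arms must agree on the
// result type, which the compiler checks for you.
def run(xs: Seq[Double]): String =
  validate(xs).fold(report, continue)
```

Calling run(Seq(1.0, Double.NaN)) diverts into the reporting branch, while run(Seq(1.0, 2.0, 4.0)) flows on down the pipeline; the branching itself is just Either.fold, so it stays purely type-declared.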
 
2. The other way I would like to branch the pipeline is to send the same
data to multiple new pipeline nodes. These nodes are not dependent on each
other, so they should just branch off. In the generic case these could be
two new pipelines themselves.
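The fan-out case can also be sketched with plain functions. The fanOut combinator below is a made-up name, not part of any library; it assumes the branches share the same input and result types:

```scala
// Hypothetical sketch: send one node's output to several independent
// downstream branches.
def fanOut[A, B](branches: (A => B)*): A => Seq[B] =
  in => branches.map(f => f(in))

// Two independent branches consuming the same data.
val mean: Seq[Double] => Double   = xs => xs.sum / xs.size
val spread: Seq[Double] => Double = xs => xs.max - xs.min

val stats = fanOut(mean, spread)
```

Here stats(Seq(1.0, 3.0, 5.0)) runs both branches on the same input. If the branches need different result types, a fixed-arity variant returning a tuple (A => B1, A => B2) => (A => (B1, B2)) keeps everything checked by the compiler.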

Has anyone tried this, or does anyone have a nice idea of how it could be
done? I like the "simplicity" of the AMPcamp pipeline, which relies on type
safety, but I'm confused about how to create a branching pipeline using
only type declarations.

Thanks,
Staffan  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Pipelines-for-controlling-workflow-tp22403.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
