Hello all. New Spark user here. We've been looking at the Spark ecosystem to build some new parts of our log-processing pipeline.
The spark-dataflow project looks especially interesting. The windowing and triggers concepts look like a good fit for what we need: our log data goes into Kafka only in approximate time order, and some events are delayed for quite a while (a sketch of the kind of pipeline we have in mind is in the P.S. below).

https://cloud.google.com/dataflow/model/windowing
https://cloud.google.com/dataflow/model/triggers

A few questions:

0. Is this a good forum for questions about spark-dataflow?
1. Is anybody using spark-dataflow for serious projects running outside of Google Cloud? How's it going with 0.2.3? Do windowing and triggers work?
2. Is anybody looking at adding Spark Streaming support to spark-dataflow? It looks like SparkPipelineRunner and other parts would need to be extended to understand StreamingContext.
3. Are there good alternatives to spark-dataflow that we should consider?
4. Should we instead roll our own windowing + triggers setup directly on top of Spark/Spark Streaming rather than use spark-dataflow? (A rough sketch of what I mean is in the P.P.S. below.)
5. If number 4 sounds like an option, is there any existing code doing this that we could look at for inspiration?

Any advice appreciated.

Thanks,
Albert
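P.S. To make question 1 more concrete, below is roughly the shape of pipeline we have in mind, written against the Dataflow Java SDK per the two docs linked above (plain Strings as elements to keep it self-contained; class and method names are from the 1.x SDK, so please correct me if spark-dataflow expects something different). Whether spark-dataflow 0.2.3 actually honors the trigger and allowed-lateness settings at runtime is exactly what I'm asking in question 1.

import com.google.cloud.dataflow.sdk.transforms.windowing.AfterProcessingTime;
import com.google.cloud.dataflow.sdk.transforms.windowing.AfterWatermark;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import com.google.cloud.dataflow.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowingSketch {
  // Assign log lines to 5-minute event-time windows; fire once the
  // watermark passes the end of a window, then again for late arrivals.
  static PCollection<String> window(PCollection<String> events) {
    return events.apply(
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(5)))
            .triggering(AfterWatermark.pastEndOfWindow()
                // Re-fire for each batch of late events, after a short delay.
                .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(1))))
            // Keep window state for events up to an hour late, then drop them.
            .withAllowedLateness(Duration.standardHours(1))
            // Each firing re-emits the full accumulated pane.
            .accumulatingFiredPanes());
  }
}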
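P.P.S. For question 4, here is the (purely hypothetical) direction we'd sketched for hand-rolling this on Spark Streaming 1.x: bucket each event into a fixed event-time window keyed by window start, then use updateStateByKey so late events still update their (old) window's aggregate across batches. This needs checkpointing enabled, and as far as I can see it gives nothing like watermarks or triggers, i.e. no principled way to decide when a window is complete and its state can be emitted and expired, which is why spark-dataflow looked attractive in the first place. All names here (HandRolledWindows, countByEventTimeWindow) are made up for illustration.

import com.google.common.base.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

import java.util.List;

public class HandRolledWindows {
  static final long WINDOW_MS = 5 * 60 * 1000L;

  // Counts per 5-minute event-time window, keyed by window start, given a
  // DStream of (eventTimeMillis, logLine) pairs already parsed from Kafka.
  static JavaPairDStream<Long, Long> countByEventTimeWindow(
      JavaPairDStream<Long, String> timestampedLines) {
    JavaPairDStream<Long, Long> ones = timestampedLines.mapToPair(
        (PairFunction<Tuple2<Long, String>, Long, Long>) e ->
            new Tuple2<>((e._1() / WINDOW_MS) * WINDOW_MS, 1L));
    // updateStateByKey keeps a running count per window across batches, so a
    // late event simply bumps its old window's count. Deciding when a window
    // is "done" and can be emitted/expired is the hard part left unsolved.
    return ones.updateStateByKey(
        (Function2<List<Long>, Optional<Long>, Optional<Long>>) (newOnes, state) -> {
          long sum = state.or(0L);
          for (long one : newOnes) sum += one;
          return Optional.of(sum);
        });
  }
}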