Re: Scheduling Spark process

2015-11-08, Hitoshi Ozawa
I'm not sure I understand your question about scheduling. Have you created a Spark application, and are you asking how to schedule it to run? Are you going to write the results of the scheduled run to HDFS and join them, in the first chain, with the real-time results?
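A minimal sketch of the join described above, written in plain Python rather than Spark and assuming that both the scheduled (batch) run and the real-time chain produce per-key counts. The names `merge_views`, `batch_view`, and `realtime_view` are hypothetical. In Spark the equivalent step would be a full outer join on the key followed by adding the two sides.

```python
from typing import Dict


def merge_views(batch_view: Dict[str, int], realtime_view: Dict[str, int]) -> Dict[str, int]:
    """Combine per-key totals from a scheduled batch run with the
    real-time increments computed since that run finished.

    Behaves like a full outer join on the key: a key present in only one
    view keeps its value, a key present in both gets the sum.
    """
    merged = dict(batch_view)  # copy so the batch view is left untouched
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


# Hypothetical data: totals written to HDFS by the last scheduled run,
# plus counts the streaming job has seen since then.
batch_view = {"page_a": 120, "page_b": 45}
realtime_view = {"page_b": 5, "page_c": 2}

print(merge_views(batch_view, realtime_view))
# {'page_a': 120, 'page_b': 50, 'page_c': 2}
```

Summing only works for additive metrics such as counts or sums, and only if the real-time view covers events after the batch run's cutoff; otherwise events are counted twice.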

2015-11-05, Danilo Rizzo
Hi Adrian, yes, your assumption is correct. I'm using HBase to store the partial calculations. Thank you for the feedback; it is exactly what I had in mind.

Thx
D

On Thu, Nov 5, 2015 at 10:43 AM, Adrian Tanase wrote:
> You should also specify how you’re planning to query or “publish” th
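Storing partial calculations works best when each partial can be merged with the next one. The sketch below assumes the metric is an average, so it keeps a running sum and count per key instead of the average itself (an average of averages is wrong unless weighted). A plain dict stands in for the HBase table; `Partial`, `apply_micro_batch`, and the sensor keys are hypothetical and are not the HBase API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Partial:
    """Mergeable partial result: keep sum and count, derive the mean later."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


# In-memory stand-in for an HBase table keyed by row key (hypothetical).
store: Dict[str, Partial] = {}


def apply_micro_batch(table: Dict[str, Partial], events: Iterable[Tuple[str, float]]) -> None:
    """Aggregate one micro-batch locally, then merge each key's partial into the table."""
    local: Dict[str, Partial] = {}
    for key, value in events:
        local[key] = local.get(key, Partial()).merge(Partial(value, 1))
    for key, partial in local.items():
        table[key] = table.get(key, Partial()).merge(partial)


apply_micro_batch(store, [("sensor_1", 10.0), ("sensor_1", 20.0)])
apply_micro_batch(store, [("sensor_1", 30.0), ("sensor_2", 5.0)])
print(store["sensor_1"].mean, store["sensor_2"].count)
# 20.0 1
```

Aggregating each micro-batch locally first means the store is touched once per key per batch rather than once per event.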

2015-11-05, Adrian Tanase
You should also specify how you’re planning to query or “publish” the data. I would consider a combination of:
- a Spark Streaming job that ingests the raw events in real time, validates and pre-processes them, and saves them to stable storage
- stable storage could be HDFS/Parquet or a database optimized for ti
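The first stage of that combination (ingest, validate, pre-process, save) can be sketched as below. This is a minimal, Spark-free model assuming events arrive as JSON strings with hypothetical `ts` (epoch seconds) and `user` fields; `preprocess`, `ingest`, and `storage` are illustrative names. Valid events are normalised and appended to a per-date partition, standing in for date-partitioned HDFS/Parquet directories (in Spark the write would typically go through `DataFrame.write.partitionBy(...).parquet(...)`). Partitioning by date lets a scheduled batch job read only complete days.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def preprocess(raw: str) -> Optional[dict]:
    """Validate one raw JSON event and normalise it; return None if invalid."""
    try:
        event = json.loads(raw)
        ts = datetime.fromtimestamp(int(event["ts"]), tz=timezone.utc)
        user = str(event["user"]).strip().lower()
    except (ValueError, KeyError, TypeError, OverflowError, OSError):
        return None
    if not user:
        return None
    return {"date": ts.strftime("%Y-%m-%d"), "user": user, "ts": ts.isoformat()}


def ingest(batch: List[str], storage: Dict[str, List[dict]]) -> int:
    """Append valid events to a per-date partition; return the number rejected.

    `storage` is an in-memory stand-in for date-partitioned HDFS/Parquet output.
    """
    rejected = 0
    for raw in batch:
        event = preprocess(raw)
        if event is None:
            rejected += 1
            continue
        storage.setdefault(event["date"], []).append(event)
    return rejected


storage: Dict[str, List[dict]] = {}
raw_events = [
    '{"ts": 1446681600, "user": " Alice "}',
    '{"ts": 1446768000, "user": "bob"}',
    "not json",
    '{"user": "carol"}',
]
print(ingest(raw_events, storage), sorted(storage))
# 2 ['2015-11-05', '2015-11-06']
```

Counting rejected events rather than silently dropping them gives the streaming job a cheap data-quality signal to monitor.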