Re: Spark for Log Analytics

2016-03-31 Thread ashish rawat
Thanks for your replies, Steve and Chris. Steve, I am creating a real-time pipeline, so I am not looking to dump data to HDFS right now. Also, since the log sources would be Nginx, Mongo and application events, it might not always be possible to route events directly from the source to Flume. There

Re: Spark for Log Analytics

2016-03-31 Thread Chris Fregly
oh, and I forgot to mention Kafka Streams, which has been heavily talked about the last few days at Strata here in San Jose. Streams can simplify a lot of this architecture by performing some light-to-medium-complex transformations in Kafka itself. i'm waiting anxiously for Kafka 0.10 with production

Re: Spark for Log Analytics

2016-03-31 Thread Chris Fregly
this is a very common pattern, yes. note that in Netflix's case, they're currently pushing all of their logs to a Fronting Kafka + Samza Router which can route to S3 (or HDFS), Elasticsearch, and/or another Kafka Topic for further consumption by internal apps using other technologies like Spark St

Re: Spark for Log Analytics

2016-03-31 Thread Steve Loughran
On 31 Mar 2016, at 09:37, ashish rawat <dceash...@gmail.com> wrote: Hi, I have been evaluating Spark for analysing Application and Server Logs. I believe there are some downsides of doing this: 1. No direct mechanism of collecting log, so need to introduce other tools like Flume into

Spark for Log Analytics

2016-03-31 Thread ashish rawat
Hi, I have been evaluating Spark for analysing application and server logs. I believe there are some downsides to doing this: 1. There is no direct mechanism for collecting logs, so other tools like Flume need to be introduced into the pipeline. 2. Need to write lots of code for parsing different patterns from