Thanks for your replies Steve and Chris.
Steve,
I am creating a real-time pipeline, so I am not looking to dump data to
HDFS right now. Also, since the log sources would be Nginx, Mongo, and
application events, it might not be possible to always route events
directly from the source to Flume. There
Oh, and I forgot to mention Kafka Streams, which has been heavily talked
about the last few days at Strata here in San Jose.
Streams can simplify a lot of this architecture by performing some
light-to-medium-complexity transformations in Kafka itself.
I'm waiting anxiously for Kafka 0.10 with production
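As a rough illustration of the kind of light transformation being described (the sort of thing a Streams topology would do with its filter/map operators), here is a plain-Python sketch with no Kafka dependency. The log format and field names are invented for the example:

```python
import json

def parse_nginx_line(line):
    # Hypothetical parser for a simplified access-log format:
    # "<ip> <method> <path> <status>"
    ip, method, path, status = line.split()
    return {"ip": ip, "method": method, "path": path, "status": int(status)}

def transform(lines):
    # Filter to server errors and reshape the record -- the kind of
    # light-to-medium transformation a stream processor would apply
    # per-event, without landing the data anywhere first.
    for line in lines:
        event = parse_nginx_line(line)
        if event["status"] >= 500:
            yield json.dumps({"ip": event["ip"], "path": event["path"]})

raw = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 POST /api/orders 503",
]
errors = list(transform(raw))
```

In a real deployment the same filter-and-reshape logic would run inside the stream processor, consuming from one topic and producing to another.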
This is a very common pattern, yes.
Note that in Netflix's case, they're currently pushing all of their logs to
a Fronting Kafka + Samza Router, which can route to S3 (or HDFS),
ElasticSearch, and/or another Kafka topic for further consumption by
internal apps using other technologies like Spark St
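The routing stage in that pattern can be sketched as a predicate-based dispatcher. This is a toy stand-in, not Netflix's actual code: the sink names and rules below are hypothetical, and in the real pipeline the sinks would be S3/HDFS writers, an Elasticsearch indexer, and a Kafka producer:

```python
def route(event, rules, sinks):
    # Deliver an event to every sink whose predicate matches.
    # An event may fan out to several sinks at once.
    for name, predicate in rules:
        if predicate(event):
            sinks[name].append(event)

# Hypothetical routing rules: archive everything, index errors,
# and forward application events to a downstream topic.
rules = [
    ("s3", lambda e: True),
    ("elasticsearch", lambda e: e.get("level") == "ERROR"),
    ("downstream_topic", lambda e: e.get("source") == "app"),
]
sinks = {"s3": [], "elasticsearch": [], "downstream_topic": []}

for event in [
    {"source": "nginx", "level": "INFO", "msg": "GET /"},
    {"source": "app", "level": "ERROR", "msg": "timeout"},
]:
    route(event, rules, sinks)
```

The point of keeping the router as its own stage is that sinks can be added or removed without touching the producers.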
On 31 Mar 2016, at 09:37, ashish rawat <dceash...@gmail.com> wrote:
Hi,
I have been evaluating Spark for analysing application and server logs. I
believe there are some downsides to doing this:
1. No direct mechanism for collecting logs, so we need to introduce other
tools like Flume into the pipeline.
2. Need to write lots of code for parsing different patterns from