Re: Spark for offline log processing/querying

Mat Schaffer Mon, 23 May 2016 00:39:56 -0700

It's really only mildly interactive. When I used Presto + Hive in the past
(just as a consumer, not an admin), it seemed able to provide answers
within ~2m even for fairly large data sets. I'm hoping I can get a similar
level of responsiveness with Spark.


Thanks, Sonal! I'll take a look at the example log processor and see what I
can come up with.
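
For the "raw text logs to parquet" step in my original plan (quoted below),
here is a rough sketch of what I have in mind. It assumes PySpark with the
Spark 2.x SparkSession API and the s3a:// connector. The names parse_key and
convert_to_parquet, and the choice of partition columns, are my own
placeholders rather than anything from the example log processor. Only the
path parsing is plain Python; the Spark part keeps each raw line as a single
string column and does not parse the log format itself.

```python
import re

# Layout from the plan: s3://logs/region/grouping/instance/service/*.logs
KEY_PATTERN = re.compile(
    r"^s3[an]?://(?P<bucket>[^/]+)/(?P<region>[^/]+)/(?P<grouping>[^/]+)/"
    r"(?P<instance>[^/]+)/(?P<service>[^/]+)/(?P<file>[^/]+\.logs)$"
)


def parse_key(path):
    """Split a rotated-log path into its fields, or return None if it does not fit."""
    match = KEY_PATTERN.match(path)
    return match.groupdict() if match else None


def convert_to_parquet(spark, src_glob, dst):
    """Read raw log lines and write them as Parquet, partitioned by path fields.

    `spark` is an existing SparkSession; `src_glob` and `dst` are hypothetical
    locations, e.g. "s3a://logs/*/*/*/*/*.logs" and "s3a://logs-parquet/".
    """
    # Imported here so parse_key stays usable without pyspark installed.
    from pyspark.sql.functions import input_file_name, regexp_extract

    # Same layout as KEY_PATTERN, with numbered groups for regexp_extract.
    path_re = r"^s3[an]?://[^/]+/([^/]+)/([^/]+)/([^/]+)/([^/]+)/[^/]+$"

    # spark.read.text yields one row per line in a column named "value".
    lines = spark.read.text(src_glob).withColumnRenamed("value", "line")
    src = input_file_name()

    tagged = (
        lines
        .withColumn("region", regexp_extract(src, path_re, 1))
        .withColumn("grouping", regexp_extract(src, path_re, 2))
        .withColumn("instance", regexp_extract(src, path_re, 3))
        .withColumn("service", regexp_extract(src, path_re, 4))
    )

    # Partitioning by path fields lets a query engine skip irrelevant files.
    (tagged.write
        .mode("append")
        .partitionBy("region", "grouping", "service")
        .parquet(dst))
```

Partitioning on region/grouping/service is a guess at what queries would
filter on most often; instance is kept as a regular column to avoid creating
a very large number of small partitions.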


-Mat

matschaffer.com

On Mon, May 23, 2016 at 3:08 PM, Jörn Franke <[email protected]> wrote:

> Do you want to replace ELK with Spark? Depending on your queries you could
> do as you proposed. However, many of the text analytics queries will
> probably be much faster on ELK. If your queries are more interactive and
> not about batch processing, then it does not make much sense. I am not
> sure why you plan to use Presto.
>
> On 23 May 2016, at 07:28, Mat Schaffer <[email protected]> wrote:
>
> I'm curious about trying to use Spark as a cheap/slow ELK
> (Elasticsearch, Logstash, Kibana) system. I'm thinking something like:
>
> - instances rotate local logs
> - copy rotated logs to s3
> (s3://logs/region/grouping/instance/service/*.logs)
> - spark to convert from raw text logs to parquet
> - maybe presto to query the parquet?
>
> I'm still new to Spark though, so I thought I'd ask if anyone is familiar
> with this sort of thing and if there are maybe some articles or documents I
> should be looking at in order to learn how to build such a thing, or
> whether such a thing even makes sense.
>
> Thanks in advance, and apologies if this has already been asked and I
> missed it!
>
> -Mat
>
> matschaffer.com
>
>
