Thanks Jeff, your explanation was very useful.
On Mon, Feb 25, 2013 at 12:37 PM, Jeff Lord <[email protected]> wrote:

> Daniel,
>
> Flume was designed as a configurable pipeline for discrete events, in
> order to get them reliably from a source (e.g. a web server application)
> to a destination (e.g. into HDFS).
> Flume provides the facility to write the same event to multiple
> destinations (e.g. HDFS and HBase, or HDFS and Cassandra).
> There is also a third-party Cassandra plugin (sink) for Flume NG that
> will write events into Cassandra:
> https://github.com/btoddb/flume-ng-cassandra-sink
> Whether or not you process the log "on the fly" is going to depend on
> your use case and resources, but if it is feasible, then writing directly
> into Cassandra is probably going to be the most efficient.
>
> I am not personally familiar with the logprocessing plugin you mention,
> but it appears to be built on top of the old Flume.
> We highly recommend using Flume NG going forward, so it sounds like you
> might want to try Flume NG with the Cassandra sink.
>
> Hope this helps.
>
> -Jeff
>
>
> On Sun, Feb 24, 2013 at 8:39 PM, Daniel Bruno <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I'm researching Flume as a solution for web analytics.
>>
>> I have read some texts about it, and my idea is to use Flume to collect
>> the logs and put them into a Cassandra database. But first I have some
>> doubts that I want to share.
>>
>> Is it a good approach to process the log "on the fly" and insert it into
>> the database already processed?
>>
>> Or is it better to collect the logs, store them (e.g. in HDFS), run
>> scheduled jobs with Pig, and then insert the results into a database
>> like HBase or Cassandra?
>>
>> I found an interesting solution made by Gemini (now Cloudian) called
>> logprocessing. Has anyone used it?
>>
>> Thanks
>> --
>> Daniel Bruno
>> http://danielbruno.eti.br

--
Daniel Bruno
http://danielbruno.eti.br
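
For anyone finding this thread later, here is a minimal sketch of the fan-out
setup Jeff describes: a single Flume NG agent tailing a web server log and
replicating every event to both an HDFS sink and the third-party Cassandra
sink, so you can write into Cassandra on the fly and still keep a raw copy in
HDFS for scheduled Pig jobs. All names here (agent1, the source/channel/sink
names, the paths) are illustrative, and the Cassandra sink's fully qualified
class name and parameters are assumptions; check them against the plugin's
README before using this.

    # Hypothetical Flume NG agent: one source fanned out to HDFS and Cassandra.
    agent1.sources  = weblog
    agent1.channels = hdfsChan cassChan
    agent1.sinks    = hdfsSink cassSink

    # Tail the access log. An exec source is the simplest option; a
    # spooling-directory source is more reliable if the log files rotate.
    agent1.sources.weblog.type = exec
    agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
    # The default channel selector is "replicating", so every event is
    # written to both channels.
    agent1.sources.weblog.channels = hdfsChan cassChan

    # Memory channels are fine for a sketch; a file channel is more durable.
    agent1.channels.hdfsChan.type = memory
    agent1.channels.hdfsChan.capacity = 10000
    agent1.channels.cassChan.type = memory
    agent1.channels.cassChan.capacity = 10000

    # Standard HDFS sink: raw events land in HDFS for later batch jobs.
    agent1.sinks.hdfsSink.type = hdfs
    agent1.sinks.hdfsSink.channel = hdfsChan
    agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/weblogs
    agent1.sinks.hdfsSink.hdfs.fileType = DataStream

    # Third-party Cassandra sink. The class name below is an assumption;
    # take the real one from https://github.com/btoddb/flume-ng-cassandra-sink
    agent1.sinks.cassSink.type = com.btoddb.flume.sinks.cassandra.CassandraSink
    agent1.sinks.cassSink.channel = cassChan

With the plugin jar on the agent's classpath, this would be started with
something like: flume-ng agent --conf conf --conf-file agent1.conf --name agent1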
