*We have a mixture of:*
ERROR: something bad happened

*Some logs are "actions":*
user upload a file
user collaborated a file

*Some logs are metrics:*
counter:webapp.rps:+1

-Jonathan

On Fri, Jun 7, 2013 at 8:12 PM, Mark <static.void....@gmail.com> wrote:

> Are these always log files in the sense of log files or do they also
> contain some event data.. i.e. Product A was purchased or User A just
> signed in, etc?
>
> On Jun 7, 2013, at 6:53 PM, Jonathan Creasy <jcre...@box.com> wrote:
>
> Correct, we essentially use the logs as an additional buffer in case of
> outage in the pipeline. Typically though, messages are produced as soon as
> they are written.
>
>
> -Jonathan
>
>
> On Fri, Jun 7, 2013 at 6:06 PM, Mark <static.void....@gmail.com> wrote:
>
>> Ok so in your use case instead of your application(s) writing directly to
>> Kafka you instead have a separate process running that will tail log files
>> and ship them over to Kafka. Is that correct?
>>
>> On Jun 7, 2013, at 5:33 PM, Jonathan Creasy <j...@box.com> wrote:
>>
>> > I recommend Kafka or Flume-NG for this.
>> >
>> > Our Analytics team is using a Kafka Producer on each server to tail logs
>> > and ship them to Kafka. We use Oozie to schedule a MapReduce consumer every
>> > few minutes to read all the Kafka topics into HDFS.
>> >
>> > We use Kafka as a buffer, we keep a few weeks of data there. Our security
>> > team for example sometimes connects up and consumes some logs for various
>> > purposes. Usually when they want aggregate log data in realtime.
>> >
>> > Most folks access them in HDFS. We have <1 minute of delay for most log
>> > lines getting from the server where they were written to HDFS.
>> >
>> > -Jonathan
>> >
>> >
>> > On Fri, Jun 7, 2013 at 5:30 PM, Mark <static.void....@gmail.com> wrote:
>> >
>> >> Like I said, I'm a bit confused. I see the terms "events", "messages" and
>> >> "logs" and not quite sure what to make of it.
>> >>
>> >> We are trying to determine the best way to aggregate all of our logs for
>> >> processing in Hadoop. Kafka seems to fit this bill nicely however I want to
>> >> know if it's suited for other types of messages as well. Are there certain
>> >> determining factors on why one would choose Kafka over RabbitMQ? Is it mostly
>> >> scale or is it the type of messages/events/logs being produced/consumed?
>> >>
>> >> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <alexis.richard...@gmail.com>
>> >> wrote:
>> >>
>> >>> On Sat, Jun 8, 2013 at 1:08 AM, Mark <static.void....@gmail.com> wrote:
>> >>>> I'm a bit confused on the concept of a "message" in Kafka. How does
>> >>>> this differ, if at all, from a message in RabbitMQ? It seems to me that
>> >>>> Kafka is better suited for very write intensive "messages" like log data
>> >>>> but RabbitMQ may be a better fit for traditional "messages"… i.e. "Product
>> >>>> Purchased" or "User Registered" message.
>> >>>
>> >>> I'm not sure why you think this, or how to distinguish between a 'log'
>> >>> message and some other kind.
>> >>>
>> >>> Messages = data, annotated with metadata. The latter is typically a
>> >>> protocol-specific envelope. Kafka and Rabbit certainly have different
>> >>> envelopes, eg for mapping data to subscribers/queries.
>> >>>
>> >>> alexis
>> >>
>> >>
>> >
>> >
>> > --
>> >
>> > *Jonathan Creasy* | Sr. Ops Engineer
>> >
>> > e: j...@box.com | t: 314.580.8909
>>
>
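
For illustration, here is a minimal sketch of how the three kinds of log lines listed at the top of this thread (errors, actions, metrics) could be told apart before being shipped to separate Kafka topics. The regular expression, the category names, and the topic names are all assumptions made up for this example; nothing in the thread says how the lines are actually parsed or which topics are used.

```python
import re

# Assumed metric shape, modelled on the sample "counter:webapp.rps:+1":
# <kind>:<dotted name>:<signed integer>
METRIC_RE = re.compile(r"^(?P<kind>\w+):(?P<name>[\w.]+):(?P<delta>[+-]?\d+)$")

# Hypothetical topic names, one per category.
TOPICS = {
    "error": "logs.errors",
    "action": "logs.actions",
    "metric": "logs.metrics",
    "other": "logs.other",
}


def classify(line):
    """Return 'error', 'metric', 'action', or 'other' for one log line."""
    line = line.strip()
    if line.startswith("ERROR:"):
        return "error"
    if METRIC_RE.match(line):
        return "metric"
    # Assumption: action lines start with "user ", as in the samples.
    if line.startswith("user "):
        return "action"
    return "other"


def topic_for(line):
    """Map a log line to the (hypothetical) topic it would be shipped to."""
    return TOPICS[classify(line)]


if __name__ == "__main__":
    for sample in (
        "ERROR: something bad happened",
        "user upload a file",
        "counter:webapp.rps:+1",
    ):
        print(classify(sample), "->", topic_for(sample))
```

A process tailing the log files could call `topic_for` on each new line and hand the line to whatever producer it uses; the producer itself is left out here on purpose.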