Flume Configuration & topology approach

2014-04-03 Thread Mohit Durgapal
…I hope that's not too vague. Regards, Mohit Durgapal

Re: Flume Configuration & topology approach

2014-04-03 Thread Mohit Durgapal
…will not want to have every agent writing to HDFS, so at some point you may consider adding a collector tier that will aggregate the flow and reduce the connections going into your HDFS cluster. -Jeff. On Thu, Apr 3, 2014 at 6:20 AM, Mohit Durgapal wrote: …
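
A minimal sketch of the two-tier layout Jeff describes: edge agents forward over Avro to a small collector tier, and only the collectors write to HDFS. All hostnames, ports, and component names here are illustrative, not from the thread (channel wiring omitted for brevity):

```
# On each edge agent: send to a collector instead of HDFS
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.hostname = collector1.example.com
agent.sinks.avroSink.port = 4545

# On the collector: receive Avro from many agents, write to HDFS
collector1.sources.avroSrc.type = avro
collector1.sources.avroSrc.bind = 0.0.0.0
collector1.sources.avroSrc.port = 4545
collector1.sinks.hdfsSink.type = hdfs
collector1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
```

This keeps the number of HDFS client connections proportional to the number of collectors rather than the number of edge agents.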

Re: copy to hdfs

2014-06-15 Thread Mohit Durgapal
Replace "-n agent" with "-n tier1". On Sunday, June 15, 2014, kishore alajangi wrote: "Dear Sharminder, Thanks for your reply, yes I am playing with flume. As you suggested I am using the spool directory source. My configuration file looks like: tier1.sources = source1 tier1.channels = chan…"
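
The point of the fix: the agent name passed with -n/--name must match the property prefix used in the configuration file. A minimal illustrative sketch (paths and component names are assumptions, not from the thread):

```
# Start with a matching agent name, e.g.:
#   flume-ng agent --conf conf --conf-file spool.conf --name tier1
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = spooldir
tier1.sources.source1.spoolDir = /var/spool/flume
tier1.sources.source1.channels = channel1
```

If the name and prefix differ, the agent starts but finds no components to run.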

Re: Collect TCP data over TCP stream

2014-07-31 Thread Mohit Durgapal
I am not sure if that's exactly what you need, but have you tried the syslog TCP source? It can listen for and consume events from a TCP connection on a specific host & port. On Fri, Aug 1, 2014 at 8:47 AM, Liu Blade wrote: "Dear all, The scenario is we want to collect data over TCP connection which…"
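
For reference, a minimal syslogtcp source declaration looks like the following (agent, source, and channel names are illustrative):

```
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
```

Anything that can speak the syslog protocol over TCP to that host and port can then feed events into the agent.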

how to load balance flume

2014-08-14 Thread Mohit Durgapal
I have a requirement where I need to feed push traffic (comma-separated logs) to flume at a very high rate. I have three concerns: 1. I am using PHP to send events to flume through rsyslog. The code I am using is: openlog("mylogs", LOG_NDELAY, LOG_LOCAL2); syslog(LOG_INFO, "aaid,bid,ci…
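
Since the PHP code logs to the LOG_LOCAL2 facility, rsyslog needs a forwarding rule for that facility pointing at the flume agent. A sketch of such a rule (hostname and port are assumptions; "@@" means forward over TCP, a single "@" would mean UDP):

```
# /etc/rsyslog.conf (illustrative)
local2.*  @@flume-host.example.com:5140
```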

Re: how to load balance flume

2014-08-14 Thread Mohit Durgapal
…requirement, but if it's based on the headers, you'll have to write your own interceptors. On Thu, Aug 14, 2014 at 12:55 PM, Mohit Durgapal wrote: "I have a requirement where I need to feed push traffic (comma separated logs) at a very high rate to fl…"
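
For the load-balancing half of the question, Flume ships a load_balance sink processor that spreads events across a group of sinks. A minimal sketch (sink names are illustrative):

```
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.backoff = true
```

The selector can also be set to random; header-based routing, as noted above, needs a custom interceptor plus a multiplexing channel selector instead.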

avro source vs syslog source in flume

2014-08-24 Thread Mohit Durgapal
I have to build a flume topology in which I can route events to different sinks, based on some header value, through selectors. My logging script is implemented in PHP, and this PHP script writes to syslog, which then forwards the events to a pre-configured flume node. Could anyone tell me which…
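
The header-based routing described here is done with a multiplexing channel selector on the source. An illustrative sketch (the header name "eventType" and its values are assumptions):

```
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = eventType
a1.sources.r1.selector.mapping.click = c1
a1.sources.r1.selector.mapping.view = c2
a1.sources.r1.selector.default = c1
```

Each channel (c1, c2) then feeds its own sink, so events fan out according to the header value.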

issue with failover sinks in flume

2014-09-16 Thread Mohit Durgapal
We have a two-stage flume topology in which the first tier adds headers based on the hash value of a field in the event. The hashing logic lives in an interceptor in Tier 1 of the flume topology, which basically sets a header field. We then use multiplexing to direct events to Tier 2…

Re: issue with failover sinks in flume

2014-09-16 Thread Mohit Durgapal
…are inverted. The higher the value of the priority, the earlier that sink will get picked up for processing, so a sink with priority 11 gets picked up before a sink with priority 1. On Tue, Sep 16, 2014 at 5:55 AM, Mohit Durgapal wrote: "We have a two st…"
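
A failover sink group with priorities set accordingly might look like this sketch (sink names and values are illustrative; the higher-priority k1 is tried first and k2 only takes over when k1 fails):

```
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# higher number = preferred sink
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000
```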

Re: issue with failover sinks in flume

2014-09-17 Thread Mohit Durgapal
Has anyone ever faced similar problems with sink failover? On Wed, Sep 17, 2014 at 11:25 AM, Mohit Durgapal wrote: "Hi Hari, Even after inverting the priorities the same problem occurs. Are the sink priorities specific to a sink group, or can they be defined just once for…"

Re: issue with failover sinks in flume

2014-09-17 Thread Mohit Durgapal
…Hari. On Tue, Sep 16, 2014 at 6:01 AM, Mohit Durgapal wrote: "We have a two stage topology in flume in which we are in the first tier adding headers based on hash value of a field in the event. The hashing logic is added in the interceptor in Tier 1…"

Re: issue with failover sinks in flume

2014-09-18 Thread Mohit Durgapal
Hi Hari, Were you able to find if there's something wrong with the config? Regards, Mohit. On Thu, Sep 18, 2014 at 10:44 AM, Mohit Durgapal wrote: "Hi Hari, This is our latest config: agent1tier1.sources = tcpsrc agent1tier1.sinks = avro-forward…"

Failover sink groups

2014-09-30 Thread Mohit Durgapal
Has anyone been able to use flume's failover functionality successfully? I tried using it, but it sends events to all the sinks (including the low-priority ones) in the sink group even when the primary sink (the high-priority one) is alive and running. Any help would be great. Regards, Mohit

Re: Failover sink groups

2014-09-30 Thread Mohit Durgapal
…first. Thanks, Hari. On Tue, Sep 30, 2014 at 11:53 AM, Mohit Durgapal wrote: "Has anyone been able to use flume failover functionality successfully? I tried using it but it sends events to all the sinks (including the low priority ones) in…"

issue with rsyslog , AWS ELB & flume

2014-10-15 Thread Mohit Durgapal
Hi All, I have been using a PHP script to write logs to rsyslog, and rsyslog then sends messages directly to flume (syslogtcp source) on a TCP port. Now, as I am moving to AWS, I want to introduce an ELB (Elastic Load Balancer) layer between the rsyslog & flume nodes. So I added an ELB with TCP port forwa…

flume syslog source max msg size

2014-10-27 Thread Mohit Durgapal
Hi, I am using rsyslog to send messages to flume nodes via an AWS ELB. On the flume nodes I am using the source type syslogtcp, to which the ELB forwards the messages. Now I see that messages over 2k in size are being broken into chunks of 2k when I receive them in flume. As my messages are…
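
The likely knob here is the syslog source's eventSize property, which caps the size of a single event (2500 bytes by default, per the Flume User Guide); messages beyond the cap are split or truncated. An illustrative sketch raising the limit (names and values are assumptions):

```
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.eventSize = 65535
a1.sources.r1.channels = c1
```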

Re: flume syslog source max msg size

2014-10-30 Thread Mohit Durgapal
Hi Jeff & Santiago, Thanks for your help!! I realized that just after posting the question. Sorry for not updating it earlier. Thanks, Mohit. On Thu, Oct 30, 2014 at 5:36 PM, Santiago Mola wrote: "Hi Mohit, 2014-10-27 19:30 GMT+01:00 Mohit Durgapal: …"

Re: Appending data into HDFS Sink!

2015-01-19 Thread Mohit Durgapal
But why do you want your MR job to read from the .tmp file? .tmp means it is a temporary file, i.e. its state is not definite (at least not to the user), and hence you're not supposed to read from it. Your MR job should only consider files that do not end with .tmp. Also, there's a very high probabi…
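
The HDFS sink marks in-flight files with configurable decorations; a sketch of the relevant settings (sink name illustrative). Adding a leading "_" or "." is a common trick, since Hadoop's default input formats skip files whose names start with those characters:

```
a1.sinks.k1.type = hdfs
# file currently being written becomes e.g. _events.1421650000000.tmp
a1.sinks.k1.hdfs.inUsePrefix = _
a1.sinks.k1.hdfs.inUseSuffix = .tmp
```

With this in place the MR job never sees a half-written file, even without an explicit .tmp filter.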

Re: When to stop an agent with a Spool Directory source

2015-01-22 Thread Mohit Durgapal
I am not sure about your use case, but I can write about my experience with flume. Generally you would use flume for streaming data and would never stop the components. You have sources & sinks running all the time, so that as soon as an application writes into the spool directory, the source rea…

Re: When to stop an agent with a Spool Directory source

2015-01-22 Thread Mohit Durgapal
Correction: instead of "while starting the sink" I meant "while starting the agent". On Fri, Jan 23, 2015 at 11:39 AM, Mohit Durgapal wrote: "Currently I am using cloudera to start/stop flume agents. But earlier I used to provide the source & sin…"

Re: When to stop an agent with a Spool Directory source

2015-01-22 Thread Mohit Durgapal
…Regards, Mohit. On Fri, Jan 23, 2015 at 11:01 AM, Carlotta Hicks wrote: "How/where are you starting your sources and sinks? From: Mohit Durgapal [mailto:durgapalmo...@gmail.com] Sent: Friday, January 23, 2015 12:20 AM To: user Subject: Re: When to stop an…"

Re: Flume to handle spam detection and rate limiting

2015-01-24 Thread Mohit Durgapal
Hi, I don't think there are any spam detection features built into flume. But don't you think spam filtering and checks against other attacks should happen a layer before flume? Maybe add a layer of LBs between the web server (or any other source generating the data stream) and flume? Di…

Re: Add hostname to event body

2015-08-18 Thread Mohit Durgapal
Why don't you try writing a custom interceptor? It's quite easy, and it gives you the freedom to do anything you want with the incoming events. Regards, Mohit. On Tuesday, August 18, 2015, Balthasar Schopman <b.schop...@tech.leaseweb.com> wrote: "Hi, We're setting up Flume to monitor the…"
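
A sketch of such a custom interceptor, prepending the local hostname to the event body as the thread subject asks. This is not runnable standalone: it depends on the flume-ng-core SDK, and the class and package names are illustrative.

```java
package com.example.flume;

import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class HostnameBodyInterceptor implements Interceptor {
  private String hostname;

  @Override
  public void initialize() {
    try {
      hostname = InetAddress.getLocalHost().getHostName();
    } catch (Exception e) {
      hostname = "unknown";
    }
  }

  @Override
  public Event intercept(Event event) {
    // Prepend "hostname " to the existing body bytes
    byte[] body = event.getBody();
    byte[] prefix = (hostname + " ").getBytes(StandardCharsets.UTF_8);
    byte[] newBody = new byte[prefix.length + body.length];
    System.arraycopy(prefix, 0, newBody, 0, prefix.length);
    System.arraycopy(body, 0, newBody, prefix.length, body.length);
    event.setBody(newBody);
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    for (Event e : events) {
      intercept(e);
    }
    return events;
  }

  @Override
  public void close() {}

  public static class Builder implements Interceptor.Builder {
    @Override
    public Interceptor build() { return new HostnameBodyInterceptor(); }
    @Override
    public void configure(Context context) {}
  }
}
```

It is wired into an agent via the source's interceptors list, e.g. a1.sources.r1.interceptors.i1.type = com.example.flume.HostnameBodyInterceptor$Builder.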