I'd like to set up a tiered configuration with a number of Avro sources accepting events from application servers; these agents add some headers and then forward on to a single agent that persists to HDFS.
This is the same topology as in the 'Consolidation' example in the User Guide.
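Roughly, I picture each tier1 agent forwarding through an Avro sink to an Avro source on the consolidating agent; a sketch with illustrative names, hosts, and ports:
tier1.sinks.avroFwd.type = avro
tier1.sinks.avroFwd.channel = ch1
tier1.sinks.avroFwd.hostname = collector-host
tier1.sinks.avroFwd.port = 4141
collector.sources.avroIn.type = avro
collector.sources.avroIn.channels = ch1
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141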
Is it possible to use the new regex interceptor with Flume 1.2 as packaged by CDH 4.x?
thank you,
Paul Chavez
easier.
Brock
On Mon, Dec 10, 2012 at 10:36 AM, Paul Chavez
wrote:
> I would like to use the new regex interceptor to pull timestamp values out of
> my event data. However, I do not manage the Hadoop installation at my work
> and we are using the CDH distributions, which currently have Flume 1.2.
Currently we use an interceptor which has caused post-processing pain, as the event timestamps are off by ~5 minutes from the header. I'm looking forward to using the regex capture interceptor to timestamp the events with the event time soon.
Thanks,
Paul Chavez
-Original Message-
From: Brock Noland [mailto:br...@cloudera.com]
Any insight or advice is appreciated,
thank you,
Paul Chavez
That works in the Pig case; I'm not sure about Hive, though.
Hari
On Thursday, December 27, 2012, Paul Chavez wrote:
This is kind of a generic HDFS question, but it does relate to flume, so
hopefully someone can provide feedback.
I have a flume configuration that sinks to HDFS using timestamp headers. I
wou
Any help is appreciated.
Thank you,
Paul Chavez
-Original Message-
From: Brock Noland [mailto:br...@cloudera.com]
Sent: Monday, December 10, 2012 8:55 AM
To: user@flume.apache.org
Subject: Re: Possible to use Regex Interceptor in Flume 1.2
Hi,
If you built flume 1.3.0 and took the RegexExtractorInterceptor
Any help is appreciated.
Thanks,
Paul Chavez
upgrade
You need to increase the transactionCapacity of the channel to at least the
batchSize of the HDFS sink. In your case, it is 1000 for the channel
transaction capacity and your hdfs batch size is 1.
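For example, with illustrative names and sizes (the point being that the channel's transactionCapacity must be at least the sink's hdfs.batchSize):
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000
agent.channels.ch1.transactionCapacity = 10000
agent.sinks.hdfsSink.hdfs.batchSize = 10000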
--
Hari Shreedharan
On Thursday, February 28, 2013 at 4:00 PM, Paul Chavez wrote:
Oh, I see the error. You said transaction capacity. It was defaulting to 1000; I had never configured it before, just relied on defaults. Configuring it to 1 worked.
Thank you,
Paul Chavez
-Original Message-
From: Paul Chavez
Sent: Thursday, February 28, 2013 4:11 PM
To: 'user@flume.apache.org'
Here is an example with the host header added in the filePrefix:
staging2.sinks.hdfs_FilterLogs.type = hdfs
staging2.sinks.hdfs_FilterLogs.channel = mc_FilterLogs
staging2.sinks.hdfs_FilterLogs.hdfs.path = /flume_stg/FilterLogsJSON/%Y%m%d
staging2.sinks.hdfs_FilterLogs.hdfs.filePrefix = %{host}
Hope that helps,
Paul Chavez
It just depends on what you want to do with the header. In the case I presented, the header is set by the agent running the HDFS sink, which seemed to align with your use case. If you need to know the originating host, just have the interceptor or originating host set a different header; the %{} notation works with any header.
I am curious about the observed behavior of a set of agents configured with a
Load Balancing sink processor.
I have 4 'tier1' agents receiving events directly from app servers that feed
into 2 'tier2' agents that write to HDFS. They are connected up via Avro
Sink/Sources and a Load Balancing Sink Processor.
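The relevant tier1 configuration looks roughly like this (names are illustrative):
tier1.sinkgroups = g1
tier1.sinkgroups.g1.sinks = avroSink1 avroSink2
tier1.sinkgroups.g1.processor.type = load_balance
tier1.sinkgroups.g1.processor.selector = round_robin
tier1.sinkgroups.g1.processor.backoff = true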
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:109)
On Fri, Mar 29, 2013 at 4:20 PM, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
I am curious about the observed behavior of a set of agents configured with a
Load Balancing sink processor.
(org.apache.flume.channel.file.FileChannel.stop:329) - Stopping FileChannel fc_Default { dataDirs: [c:\flume_data\log] }...
Thank you,
Paul Chavez
From: Israel Ekpo [mailto:isr...@aicer.org]
Sent: Wednesday, April 10, 2013 12:07 PM
To: user@flume.apache.org
Subject: Re: FileChannel on Windows
Paul,
If Flume 1.3.1 is what you are looking for, you don't have to build it from
source.
You can just download it directly from the site. It's already released.
Paul Chavez wrote:
Hello,
I've run into a problem with the spoolDir source, on Windows, and am not sure
how to proceed.
The agent starts fine and the source is created without issue and is apparently
ready. After agent start a .flumespool directory is created in the path the
source is watching. This directory re
Thanks,
Paul
____
From: Paul Chavez [mailto:pcha...@verticalsearchworks.com]
Sent: Thursday, April 11, 2013 3:15 PM
To: user@flume.apache.org
Subject: spoolDir source problem
Hello,
I've run into a problem with the spoolDir source, on Windows, and am not sure
how to proceed.
The agent starts fine and the source is created without issue and is apparently ready.
Move any remaining files that have not yet been processed out of the spooling directory before deleting the folder, so that you can put the files back after the directory is recreated.
Then restart your agent to see if this works.
Let me know if this helps.
On 12 April 2013 14:41, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
ere.
This will give you some confidence that the setup works before you deploy it. I don't really use Windows for development, so unfortunately I am not able to help you troubleshoot this.
On 12 April 2013 16:37, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
1. Flume 1.3.1 I b
meta file " +
trackerFile);
}
}
I am not sure why the agent is not able to delete the file. Does the agent have permission to access those directories? I mean both read and write? I am no expert, just making a guess.
On Sat, Apr 13, 2013 at 2:18 AM, Paul Chavez wrote:
Er, I meant to say: if the file stream is not closed before attempting the delete. I'm not a (good) programmer, but it *looks* like that class is closing before the delete; I honestly didn't understand entirely what was going on.
____
From: Paul Chavez [mailto:pcha...@verticalsearchworks.com]
es. I had some coworker help so the details of why it works now
are in someone else's head.
thanks,
Paul
____
From: Paul Chavez [mailto:pcha...@verticalsearchworks.com]
Sent: Friday, April 12, 2013 2:23 PM
To: user@flume.apache.org
Subject: RE: spoolDir source p
Thank you,
That was the exact issue and I also submitted an alternate fix. I tried to
create a review but the webapp would not let me attach the diff file.
Paul Chavez
From: Israel Ekpo [mailto:isr...@aicer.org]
Sent: Friday, April 12, 2013 6:16 PM
To: user@flume.apache.org
Not sure if this is the issue, but I believe this configuration property is
wrong:
ais_agent.sources.ais-source1.selector.mapping.default = ais-ch1
It should be:
ais_agent.sources.ais-source1.selector.default = ais-ch1
Hope that helps,
Paul Chavez
-Original Message-
From: Steve
No, there are some considerations regarding heap size for very large channel
capacities as documented here:
https://cwiki.apache.org/confluence/display/FLUME/Flume%27s+Memory+Consumption
-Original Message-
From: Matt Wise [mailto:m...@nextdoor.com]
Sent: Monday, May 13, 2013 12:17 PM
To: user@flume.apache.org
There are a few ways to monitor flume in operation. We use the JSON reporting,
which is available via 'http://<host>:<port>/metrics'. You need to
start the agent with the following parameters to get this interface:
-Dflume.monitoring.type=http -Dflume.monitoring.port=34545
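For example, once the agent is up you can poll it from the command line (the host name is illustrative):
curl http://flume-host:34545/metrics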
We use cacti to graph channel sizes.
Is that assumption correct?
thanks,
Paul Chavez
From: Connor Woodson [mailto:cwoodson@gmail.com]
Sent: Tuesday, May 21, 2013 2:13 PM
To: user@flume.apache.org
Subject: Re: HDFSEventSink Memory Leak Workarounds
The other property you will want to look at is maxOpenFiles
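For example (the sink name is illustrative; this caps how many HDFS files the sink keeps open at once):
agent.sinks.hdfsSink.hdfs.maxOpenFiles = 500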
on a single line.
This overall workflow has proven to be extremely useful and flexible. We manage
multiple data flows with a single source/channel/sink by writing to paths based
on the envelope headers (e.g. /flume/%{logType}/%{logSubType}/date=%Y%m%d/hour=%H).
Hope that helps!
Paul Chavez
You can use the timestamp interceptor in much the same way, except in that case it'll stamp the event with whenever the source first saw it. This can result in an event being bucketed in the wrong date/time partition, but that's better than it gumming up the whole data flow.
Hope that helps,
Paul Chavez
Yes, any header works using that same notation. I use it to allow us to bucket
many event 'types' with a single channel.
for example, a path with custom headers and Hive ready partition folders:
/flumelogs/%{someHeader}/%{someOtherHeader}/datekey=%Y%m%d/hour=%H
Hope that helps,
Paul Chavez
those properties set. I would start there.
Hope that helps,
Paul Chavez
-Original Message-
From: Josh Myers [mailto:josh.my...@mydrivesolutions.com]
Sent: Monday, June 17, 2013 6:47 AM
To: user@flume.apache.org
Subject: Flume events rolling file too regularly
Hi guys,
We are sending JSON e
If you're referring to the spooling file source, I am using the following in a
production config:
bufferMaxLineLength = 5000
I'm pretty sure I picked this out from reading the code, and I'm fairly certain it's working as intended. No guarantees, though.
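In context, it is one property on the source definition; a sketch with illustrative names:
agent.sources.spool1.type = spooldir
agent.sources.spool1.bufferMaxLineLength = 5000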
You've specified a channel named NullChannel in the agent1.channels properties,
but you don't define it anywhere in the configuration. You've done the same
with the NullSink.
You'll need to add something like this to the config:
agent1.channels.NullChannel.type = memory
agent1.sinks.NullSink.type = null
headers we use for tokenized paths. The static
interceptor will insert an arbitrary header if it doesn't exist, so I have a couple that put in the value 'Unknown'. That way I can still send the events through the HDFS sink, but I can also find them later if need be.
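A sketch of one of those interceptors, with illustrative names (preserveExisting leaves any value that is already set alone):
agent.sources.src1.interceptors = i1
agent.sources.src1.interceptors.i1.type = static
agent.sources.src1.interceptors.i1.preserveExisting = true
agent.sources.src1.interceptors.i1.key = logSubType
agent.sources.src1.interceptors.i1.value = Unknown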
hope that helps,
Paul Chavez
want events 'near' real time. We didn't want to use the exec source, as that gives no delivery guarantee; at least with a spooling source, if the flume agent stops processing, the incremental files stay in the spool dir until it's back up.
Hope that helps,
Paul Chavez
From: Wang
Yes, I am curious what you mean as well. When testing I had dropped a few 15GB
files in the spoolDir and while they processed slowly they did complete. In
fact, my only issue with that test was the last hop HDFS sinks couldn't keep up
and I had to add a couple more to keep upstream channels from backing up. Adding them at the last hop was enough to clear the bottleneck.
Good luck,
Paul Chavez
From: Wang, Yongkun | Yongkun | BDD [mailto:yongkun.w...@mail.rakuten.com]
Sent: Thursday, August 22, 2013 10:27 PM
To: user@flume.apache.org
Subject: Re: sleep() in script doesn't work when called by exec Source
If it happened at the last
You can use a single configuration file, but make sure each node's configuration has a different agent name. Then on each node invoke Flume with the proper name.
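For example (the config file and agent names are illustrative):
flume-ng agent --conf conf --conf-file shared.conf --name node1
flume-ng agent --conf conf --conf-file shared.conf --name node2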
Hope that helps,
Paul Chavez
From: Suhas Satish [mailto:suhas.sat...@gmail.com]
Sent: Thursday, September 12, 2013 10:34 AM
To: user@flume.apache.org; prashanth.b...@nttd
Seems like the use case the Regex Filter Interceptor was developed for. That's
my first inclination, at least.
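A sketch, assuming the line you want to drop can be matched by a pattern (the pattern here is hypothetical; excludeEvents = true drops matching events):
agent.sources.s1.interceptors = i1
agent.sources.s1.interceptors.i1.type = regex_filter
agent.sources.s1.interceptors.i1.regex = ^#
agent.sources.s1.interceptors.i1.excludeEvents = true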
Thanks,
Paul Chavez
From: ZORAIDA HIDALGO SANCHEZ [mailto:zora...@tid.es]
Sent: Thursday, September 12, 2013 9:47 AM
To: user@flume.apache.org
Subject: Delete first line of a
Sink interface can work
with SSL enabled (hence the extra hop) and if it can, can it use the
ssl_keystore and ssl_truststore already available from a secure hadoop cluster.
On Thu, Sep 12, 2013 at 10:48 AM, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
et this all up on a
flume channel has
space again. Another option may be to add another HDFS sink or two pulling from
the same channel, but from what you are saying this may not increase
performance.
Hope that helps,
Paul Chavez
From: Cameron Wellock [mailto:cameron.well...@nrelate.com]
Sent: Thursday, September
s after the other write stops, as I took the error messages
at face value and restarted flume. I will try that today, time permitting, and
I'll let you know what happens.
Thanks again,
Cameron
On Thu, Sep 26, 2013 at 12:07 PM, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
3 at 1:43 PM, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
Thanks for the update. I remember I had a similar situation now, except that I
had the transactionCapacity lower than the batch size for the sink. I guess
having them exactly the same is not optimal either.
-Paul
F
Option 3 is possible with two agents.
Agent1:
Rabbitmq -> source1 -> replicating channel selector ->
-> channel1 -> avro sink1 -> agent2.avrosource1
-> channel2 -> avro sink2 -> agent2.avrosource2
Agent2:
#CSV path with your interceptor
Avrosource1 -> interceptor -> channel -> sink to CSV
#XML path with your interceptor
Avrosource2 -> interceptor -> channel -> sink to XML
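A sketch of the Agent1 side, with illustrative names and ports (the replicating selector copies every event into both channels):
agent1.sources.source1.channels = channel1 channel2
agent1.sources.source1.selector.type = replicating
agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.channel = channel1
agent1.sinks.avroSink1.hostname = agent2-host
agent1.sinks.avroSink1.port = 4141
agent1.sinks.avroSink2.type = avro
agent1.sinks.avroSink2.channel = channel2
agent1.sinks.avroSink2.hostname = agent2-host
agent1.sinks.avroSink2.port = 4142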
That is exactly what I do for a similar scenario. In my case it's one big log
file that gets written to all day on each server, so I developed a script that
runs once a minute to grab the new lines off the file since last run, creates
an incremental file with that data, and then drops it in a directory watched by the spoolDir source.
I would still like to have a more resilient HDFS sink, and so I
support development effort in this area.
Thanks,
Paul Chavez
From: Roshan Naik [mailto:ros...@hortonworks.com]
Sent: Tuesday, October 15, 2013 11:14 AM
To: d...@flume.apache.org
Cc: user@flume.apache.org; comm...@flume.apache.org
Subject: Re: f
#dataset4 goes to the file channel
agent1.sources.scribe-source-ds1.selector.mapping.dataset4 = file-channel-1
#everything else goes to the memory channel (and eventually null sink)
agent1.sources.scribe-source-ds1.selector.default = mem-channel-1
Hope that helps,
Paul Chavez
From: dwight.marz...@here.com [mailto:dwight.marz...@here.com]
Try bumping your memory channel capacities up, they are the same as the batch
size. I would go to at least 1000 on each mem channel.
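For example (channel names are illustrative):
agent.channels.mc1.capacity = 1000
agent.channels.mc1.transactionCapacity = 1000
agent.channels.mc2.capacity = 1000
agent.channels.mc2.transactionCapacity = 1000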
Also, what do the logs and metrics show?
From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com]
Sent: Thursday, October 31, 2013 11:53 AM
To: user@flume.apache.org
current partition exists and then they process the previous partition.
Good luck,
Paul Chavez
From: chenchun [mailto:chenchun.f...@gmail.com]
Sent: Monday, November 04, 2013 3:35 AM
To: user@flume.apache.org
Subject: logs jams in flume collector
Hi, we are using flume to transfer logs to hdfs. We
On Tue, Nov 5, 2013 at 6:27 AM, Paul Chavez
<pcha...@verticalsearchworks.com> wrote:
What do you mean by 'log jam'? Do you mean events are stuck in the channel and
all processing stops, or just that events are moving slower than you'd like?
If it's just going slowly
Any examples would be appreciated. I would especially like to hear some real-world morphline examples.
Hope that helps,
Paul Chavez
From: Matt Wise [mailto:m...@nextdoor.com]
Sent: Monday, November 11, 2013 10:04 AM
To: user@flume.apache.org
Subject: Re: Dynamic Key=Value Parsing with an Interceptor?
Anyone have any ideas on this?
No, you would need to have some kind of script or application run to read the events and send them to flume: for example, a script scheduled to run every 5 minutes that saves the events since the last interval to a CSV file and drops it into a directory for the spoolDir source to pick up.
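A sketch of the pickup side, with illustrative names and paths:
agent.sources.csvSpool.type = spooldir
agent.sources.csvSpool.spoolDir = /var/flume/incoming
agent.sources.csvSpool.channels = ch1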
channel data/log
directories onto separate drives.
-Paul Chavez
From: Devin Suiter RDX [mailto:dsui...@rdx.com]
Sent: Tuesday, December 17, 2013 8:30 AM
To: user@flume.apache.org
Subject: File Channel Best Practice
Hi,
There has been a lot of discussion about file channel speed today, and I have
41800",
"ChannelFillPercentage": "1.8881",
"Type": "CHANNEL",
"EventPutAttemptCount": "1285941800",
"ChannelSize": "18881",
"StopTime": "0",
"StartTime": "1387357632600",
"
https://cwiki.apache.org/confluence/display/FLUME/Flume%27s+Memory+Consumption.
We have gotten away with default channel sizes (1 million) so far without
issue. We do try to separate the file channels to different physical disks as
much as we can to optimize our hardware.
Hope that helps,
Flume handles your data as discrete events, so its 'file support' is a function of what you can deserialize into a flume event. Data sources do not have to be files, either. Applications can send events directly to flume using a variety of methods, and you can send events on to message queues, search indexes, and so on.
hhaya Vishwakarma"
mailto:chhaya.vishwaka...@lntinfotech.com>>
wrote:
OK. Can I collect data that is in a Word document, Excel, or CSV?
From: Paul Chavez [mailto:pcha...@verticalsearchworks.com]
Sent: Friday, February 14, 2014 11:54 AM
To: user@flume.apache.org
I would recommend using a scheduled script to create diff files off the log files. I have one that runs against large log files that roll over on UTC day. It runs once a minute, checkpoints the log, creates a diff, drops it in the spool directory, and then cleans up any completed files.
I agree.
There is a configuration error in your multiplexing channel selector section.
You are referencing ‘server-agent.sources.avor-Src.’ and it should be
‘server-agent.sources.mySrc.’. Otherwise, the configuration looks good and
should satisfy your requirements.
From: terrey shih [mailto:terreys...@g
Start adding additional HDFS sinks attached to the same channel. You can also
tune batch sizes when writing to HDFS to increase per sink performance.
On Sep 2, 2014, at 11:54 PM, "Sebastiano Di Paola"
<sebastiano.dipa...@gmail.com> wrote:
Hi there,
I'm a complete newbie to Flume, so I
We’ve been running the 1.4 release of flume on windows for over a year. We had
to do a custom build at first before it was initially released to pick up a
SpoolDir source issue.
I use winsw (https://github.com/kohsuke/winsw/wiki) to wrap the java command,
with 64-bit JDK6.
The java command lin
application adds to the initial flume event. To keep channels from blocking if this header goes missing, we have a static interceptor that adds the value 'MissingSubType' if the header does not exist. This setup has worked well for us across dozens of separate log streams for over a year.
and we
have never actually seen corrupted headers.
> On Oct 16, 2014, at 8:24 AM, "Jean-Philippe Caruana"
> wrote:
>
> Le 15/10/2014 17:57, Paul Chavez a écrit :
>> Yes, that will work fine. From experience, I can say definitely account for
>> the possibility of the header going missing.
The documentation states that if it is not set, UTF-8 will be assumed.
Can anyone elaborate on when/why this data is being corrupted?
Thanks,
Paul Chavez
Might it be worth trying to add that in the Flume command line options? Or maybe on the front application?
Regards
Jeff
From: Paul Chavez [mailto:pcha...@ntent.com]
Sent: Tuesday, December 9, 2014 8:25 PM
To: user@flume.apache.org
Subject: UTF-8 data mangled in flight
You can use a regex extractor interceptor to create the time stamp header from
your data.
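A sketch, assuming the timestamp appears as epoch milliseconds at the start of each line (names and the pattern are illustrative):
agent01.sources.source01.interceptors = i1
agent01.sources.source01.interceptors.i1.type = regex_extractor
agent01.sources.source01.interceptors.i1.regex = ^(\\d{13})
agent01.sources.source01.interceptors.i1.serializers = s1
agent01.sources.source01.interceptors.i1.serializers.s1.name = timestamp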
> On Dec 24, 2014, at 1:03 AM, Mungeol Heo wrote:
>
> Hello,
>
> I try to use the configuration, which listed below, to transfer logs to HDFS.
>
> ...
> agent01.sources.source01.interceptors = intercept
n' }
At my work we use a service wrapper called winsw
(https://github.com/kohsuke/winsw) to run Flume as a service and then just
monitor it like any other windows service.
Hope that helps,
Paul Chavez
From: mahendran m [mailto:mahendra...@hotmail.com]
Sent: Thursday, January 08, 2015 1:49 AM
To: user@flume.apache.org
Flume doesn’t really address this use case. Even the spooling directory source will decompose the file into individual events (one event per line, by default).
From: Bob Metelsky [mailto:bob.metel...@gmail.com]
Sent: Monday, February 02, 2015 3:49 PM
To: user@flume.apache.org
Subject: Re: Si
HTTP Source is a listener. It doesn’t actively pull from a remote endpoint.
The address in the config is an address on the flume server to bind to.
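A sketch with illustrative names (bind/port are the local listen address on the flume server):
agent.sources.http1.type = http
agent.sources.http1.bind = 0.0.0.0
agent.sources.http1.port = 8080
agent.sources.http1.channels = ch1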
From: Timothy Garza [mailto:timothy.ga...@collinsongroup.com]
Sent: Thursday, November 12, 2015 2:09 PM
To: user@flume.apache.org
Subject: RE: Flume
I have tried to go the syslogUDP route to get log files from a Windows server
to a flume agent, and did not find it an adequate solution.
- We are seeing corrupted events when sending IIS logs (known issue:
https://issues.apache.org/jira/browse/FLUME-1365)
- Our data is too large to fit in a 1500 byte UDP packet.
Answers inline:
>On Thu, Oct 25, 2012 at 11:54 AM, Paul Chavez
> wrote:
>> I have tried to go the syslogUDP route to get log files from a Windows
>> server to a flume agent, and did not find it an adequate solution.
>>
>> - We are seeing corrupted events when
Once a version of flume-ng containing HTTPSource is packaged along with the
rest of the hadoop distribution I will look at it. As a 'windoze guy' ;-) I do
not manage the hadoop systems.
Thank you,
Paul Chavez
-Original Message-
From: Will McQueen [mailto:w...@cloudera.com]
Would the new
Is there a way to use the console appender to trigger java.exe to send a non-zero exit code if an error occurs, so the rest of my automation tooling can detect the error? I can parse the output with PowerShell and have it send a non-zero exit code itself, but I'm looking for a more 'native' way to do this.
thanks,
Paul Chavez
this.
Thanks,
Hari
--
Hari Shreedharan
On Tuesday, October 30, 2012 at 11:25 AM, Paul Chavez wrote:
I am working on getting the flume avro-client functionality working on Windows,
and am currently stuck on how to determine if the file was sent successfully.
I used the blog post at
http