Re: HDFS sink: "clever" routing

2015-05-29 Thread Guyle M. Taber
OK, I figured this out by using the %{basename} placeholder. However, I’m trying to figure out how to prevent the epoch suffix from being applied to every file as it’s written to HDFS. Example: 20150528133001.txt-.1432920411283 How do I prevent the epoch timestamp from being appended to every fil

Flume truncating files at about 2060 characters

2015-08-31 Thread Guyle M. Taber
I’m using an Avro sink to send events to HDFS, and long content lines seem to be getting truncated at about the 2060-character mark. How can I prevent long lines from being truncated when using an Avro sink in this fashion? Here’s a snippet of an event from the raw lo

Re: Flume truncating files at about 2060 characters

2015-08-31 Thread Guyle M. Taber
anyone except the intended recipient. If you > have received this message in error, or are not the named recipient(s), > please immediately notify the sender by return email, and delete all copies > of this message. > > On Mon, Aug 31, 2015 at 11:03 AM, Guyle M. Taber <mailt

Re: Flume truncating files at about 2060 characters

2015-08-31 Thread Guyle M. Taber
e received this message in error, or are not the named recipient(s), > please immediately notify the sender by return email, and delete all copies > of this message. > > On Mon, Aug 31, 2015 at 11:14 AM, Guyle M. Taber <mailto:gu...@gmtech.net>> wrote: > Fantastic.

flume sink and substitution variables

2016-07-28 Thread Guyle M. Taber
I’m trying to determine whether I can use a substitution variable in the HDFS file path that is derived from the Apache virtual host name requested on a web server that listens under multiple vhost names. Where does the substitution variable %host derive its value, and is there another variable I can use?

Where to put the flume agents within a cluster

2017-06-23 Thread Guyle M. Taber
We have a 32-data-node Hadoop cluster that receives incoming Flume data via three data nodes acting as Flume agents. We’re using round-robin DNS entries to spread incoming Flume data from various external architectures across the Flume agents on those three data nodes. It seems like historica

Fluming spooldir with sub-directories

2017-12-19 Thread Guyle M. Taber
Does anyone have a working example of Flume 1.7 or 1.8 using “recursiveDirectorySearch”? My source (spooldir) has multiple subdirectories, and I’ve read that version 1.7+ can work with subdirectories. I have this configured and Flume starts up without error, but not

Using flume and a Google Cloud Storage (GCS) Sink.

2018-10-11 Thread Guyle M. Taber
I’m having trouble getting a Flume sink to GCS working. There’s not a lot of documentation out there for GCS and Flume. Does anyone have a GCS sink in Flume working? I can successfully list the contents of the GCS bucket using “gsutil” and the values in core-site.xml seem valid, as I can list

flume to s3 - renaming .tmp files fails.

2019-04-25 Thread Guyle M. Taber
I’m using a new Flume sink to S3 that doesn’t seem to successfully close out the .tmp files it creates in S3 buckets, so I’m essentially getting a whole lot of unclosed .tmp files. The IAM role being used has full S3 permissions to this bucket. Here’s the Flume error when trying to rename and close th

Re: flume to s3 - renaming .tmp files fails.

2019-04-25 Thread Guyle M. Taber
one is present? > > Please remove or obfuscate bucket names, account number, etc. > > The policy on the role or bucket is most certainly a missing permission, > rename requires a few odd ones in addition to the usual actions, ie: > > "s3:GetObjectVersion", "

Re: flume to s3 - renaming .tmp files fails.

2019-04-25 Thread Guyle M. Taber
is present? >> >> Please remove or obfuscate bucket names, account number, etc. >> >> The policy on the role or bucket is most certainly a missing permission, >> rename requires a few odd ones in addition to the usual actions, ie: >> >> "s3:GetObje

Flume - Interceptor, regex mapping and two default selectors

2019-06-13 Thread Guyle M. Taber
I've got three channels and three sinks. I'm using a flume regex interceptor and sending only matching data to a specific sink, but I want all data going to the other two channels/sinks. I may be doing this wrong, but it seems like I might be able to use multiple channels for the "selector.defau