Flume 1.5.0 hdfs sink stuck with channel is full messages

2014-07-27 Thread Viral Bajaria
Hi, I am using 1.5.0 with the hdfs sink to write files to S3. The process writes fine for a while, but eventually I start getting the following message: INFO [pool-5-thread-1] (org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents:224) - Last read was never committed - resett

flume hdfs sink notify / callback to add partition

2014-07-28 Thread Viral Bajaria
Hi, Is there a way to get the hdfs sink to signal that a file was just closed, and then use that signal to add a partition to Hive if one does not exist already? Right now, what I do is: - move files to s3 - run recover partitions <--- step takes forever. But given that I have so much historical
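The "add a partition if one does not exist already" step asked about above can be sketched as follows. This is only an illustration, not something the thread confirms the hdfs sink can trigger: it assumes some external process already knows the path of a closed file (for example, by listing the bucket), and the `.../<table>/dt=YYYY-MM-DD/<file>` layout, the `dt` partition column, and the function name are all hypothetical.

```python
import re

# Hypothetical path layout: <prefix>/<table>/dt=YYYY-MM-DD/<file>.
# Adjust the pattern to whatever escape sequences your hdfs.path uses.
PARTITION_RE = re.compile(
    r"^(?P<location>.*/(?P<table>[A-Za-z0-9_]+)/dt=(?P<dt>\d{4}-\d{2}-\d{2}))/[^/]+$"
)


def add_partition_statement(closed_file_path):
    """Build an idempotent Hive DDL statement for the partition that
    contains closed_file_path, or return None if the path does not
    follow the assumed layout."""
    match = PARTITION_RE.match(closed_file_path)
    if match is None:
        return None
    # ADD IF NOT EXISTS makes the statement safe to issue once per file,
    # so no separate "does the partition exist" check is needed.
    return (
        "ALTER TABLE {table} ADD IF NOT EXISTS "
        "PARTITION (dt='{dt}') LOCATION '{location}'"
    ).format(**match.groupdict())


if __name__ == "__main__":
    print(add_partition_statement(
        "s3n://bucket/events/dt=2014-07-28/FlumeData.1406500000000"))
```

Because the statement is idempotent, it can be issued for every closed file rather than scanning all partitions, which is the cost the "recover partitions" step pays.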

Re: Flume 1.5.0 hdfs sink stuck with channel is full messages

2014-07-28 Thread Viral Bajaria
now I have removed the idleTimeout since a file will eventually get closed due to the LRU but this is a known issue. Should I open a JIRA for this ? Thanks, Viral On Sun, Jul 27, 2014 at 7:25 PM, Viral Bajaria wrote: > Hi, > > I am using 1.5.0 with the hdfs sink to write files

Re: AWS S3 flume source

2014-07-31 Thread Viral Bajaria
I have a similar use case that cropped up yesterday. I saw the archive and found that there was a recommendation to build it as Sharninder suggested. For now, I went down the route of writing a python script which downloads from S3 and puts the files in a directory which is configured to be picked
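The hand-off from such a script to a spooling directory can be sketched as below. This is not the author's script: it assumes the file has already been downloaded from S3 to local disk, and the function name and the staging directory are hypothetical. The one firm constraint it reflects is that the spooling directory source expects files to be complete and unchanged once they appear, and file names not to be reused.

```python
import os
import shutil


def publish_to_spool(downloaded_path, staging_dir, spool_dir):
    """Copy a fully downloaded file into staging_dir, then rename it into
    spool_dir so it only ever appears there complete.

    staging_dir must be on the same filesystem as spool_dir for the
    rename to be atomic."""
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(spool_dir, exist_ok=True)

    name = os.path.basename(downloaded_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(spool_dir, name)

    # Refuse to reuse a name. ".COMPLETED" is assumed to be the suffix the
    # source appends to finished files; change it if yours is configured
    # differently.
    if os.path.exists(final) or os.path.exists(final + ".COMPLETED"):
        raise FileExistsError(final)

    shutil.copyfile(downloaded_path, staged)
    os.rename(staged, final)
    return final
```

Writing directly into the spooling directory while the download is still in progress would expose a partially written file to the source, which is why the sketch stages the copy first.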

Re: flume hdfs sink notify / callback to add partition

2014-07-31 Thread Viral Bajaria
at 2:40 PM, Viral Bajaria wrote: > Hi, > > Is there a way to get the hdfs sink to signal that a file was just closed > and then use that signal to add a partition to hive if one does not exist > already. > > Right now, what I do is: > > - move files to s3 > - run reco

Re: flume hdfs sink notify / callback to add partition

2014-08-04 Thread Viral Bajaria
E command. > > > On Thursday, July 31, 2014, Viral Bajaria wrote: > >> Any suggestions on this ? Still trying to figure out how I can get a >> notification that a new partition is being created by the HDFS sink and I >> can add that via an ALTER TABLE statement on a sepa

Re: AWS S3 flume source

2014-08-05 Thread Viral Bajaria
ing config files >>>> stored on disk) stop fetching data from some S3 buckets >>>> >>>> >>>> Would you need to be able to pull files from multiple S3 >>>> directories with the same source? >>>> >>

Re: Next Flume Meetup

2014-08-12 Thread Viral Bajaria
Hi Roshan, I tried searching around, but could not find the ticket. Can you point me to the ticket which shows the details of the Hive Sink ? Is it coming out in the next release ? Thanks, Viral On Tue, Aug 12, 2014 at 1:43 PM, Roshan Naik wrote: > I could also do a talk/demo on the new Hi

flume and columnar file format ?

2014-08-12 Thread Viral Bajaria
All, I had a question around using flume with columnar file formats like Parquet or ORC files. Has anyone tried writing to HDFS by creating ORC files instead of the TEXT or SEQUENCEFILE ? Thanks, Viral

Re: Flume 1.5.0 hdfs sink stuck with channel is full messages

2014-10-16 Thread Viral Bajaria
I have observed this again. I think this is definitely an issue because I can repro it when running the flume agent for a few hundred files with the above settings. Has anyone else seen this ? Thanks, Viral On Mon, Jul 28, 2014 at 3:03 PM, Viral Bajaria wrote: > I found the issue. > > The

Re: Flume 1.5.0 hdfs sink stuck with channel is full messages

2014-10-17 Thread Viral Bajaria
l 28, 2014 at 3:04 PM, Viral Bajaria > wrote: > >> I found the issue. >> >> The deadlock happens since I have 3 events that can close a file: >> >> 1) maxOpenFiles: set to 500 >> 2) maxFileSize : set to 128MB >> 3) idleTimeout : set to 30 seconds >>