Hi Jonathan, how should we interpret your last e-mail? You opened a JIRA issue and want to start implementing this. Do you have any estimate of how long it will take?
I think the biggest challenge here is dynamic configuration of Flume. It doesn't seem to be part of the FLUME-2437 issue. Am I right?

> Would you need to be able to pull files from multiple S3 directories with the same source?

I don't think we need to track multiple S3 buckets with a single source. I imagine an approach where each S3 source can be added or deleted on demand and attached to any Channel. My only concern is this dynamic configuration; I'll open a new thread about it. It seems we have two completely separate things:
* build an S3 source
* make Flume dynamically configurable

--
Paweł

2014-08-01 9:51 GMT+02:00 Otis Gospodnetic <otis.gospodne...@gmail.com>:

> Hi,
>
> On Fri, Aug 1, 2014 at 4:52 AM, Jonathan Natkins <na...@streamsets.com>
> wrote:
>
>> Hey all,
>>
>> I created a JIRA for this:
>> https://issues.apache.org/jira/browse/FLUME-2437
>>
>
> Thanks! Should Fix Version be set to the next Flume release version?
>
>> I thought I'd start working on one myself, which can hopefully be
>> contributed back. I'm curious: do you have particular requirements? Based
>> on the emails in this thread, it sounds like the original goal was to have
>> something that's like a SpoolDirectorySource that just picks up new files
>> from S3. Is that accurate?
>>
>
> Yes, I think so. We need to be able to:
> * fetch data (logs, for pulling them into Logsene
> <http://sematext.com/logsene/>) from S3 periodically (e.g. every 1 min,
> every 5 min, etc.)
> * fetch data from multiple S3 buckets
> * associate an S3 bucket with a user/token/key
> * dynamically (i.e. without editing/writing config files stored on disk)
> add new S3 buckets from which data should be fetched
> * dynamically (i.e. without editing/writing config files stored on disk)
> stop fetching data from some S3 buckets
>
>> Would you need to be able to pull files from multiple S3 directories with
>> the same source?
>>
>
> I think the above addresses this question.
>
>> Thanks,
>> Natty
>>
>
> Thanks!
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>> On Thu, Jul 31, 2014 at 4:58 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>>
>>> +1 for seeing an S3Source, starting with a JIRA issue.
>>>
>>> But being able to dynamically add/remove S3 buckets from which to pull
>>> data seems important.
>>>
>>> Any suggestions for how to approach that?
>>>
>>> Otis
>>>
>>> On Thu, Jul 31, 2014 at 9:14 PM, Hari Shreedharan <
>>> hshreedha...@cloudera.com> wrote:
>>>
>>>> Please go ahead and file a JIRA. If you are willing to submit a patch,
>>>> you can post it on the JIRA.
>>>>
>>>> Viral Bajaria wrote:
>>>>
>>>> I have a similar use case that cropped up yesterday. I saw the archive
>>>> and found that there was a recommendation to build it as Sharninder
>>>> suggested.
>>>>
>>>> For now, I went down the route of writing a Python script that
>>>> downloads from S3 and puts the files in a directory that is
>>>> configured to be picked up via a spooldir source.
>>>>
>>>> I would prefer to get a direct S3 source, and maybe we could
>>>> collaborate on it and open-source it. Let me know if you prefer that
>>>> and we can work directly on it by creating a JIRA.
>>>>
>>>> Thanks,
>>>> Viral
>>>>
>>>> On Thu, Jul 31, 2014 at 10:26 AM, Hari Shreedharan
>>>> <hshreedha...@cloudera.com> wrote:
>>>>
>>>> In both cases, Sharninder is right :)
>>>>
>>>> Sharninder wrote:
>>>>
>>>> As far as I know, there is no (open source) implementation of an S3
>>>> source, so yes, you'll have to implement your own. You'll have to
>>>> implement a PollableSource, and the developer documentation has an
>>>> outline that you can use. You can also look at the existing ExecSource
>>>> and work your way up from there.
>>>>
>>>> As far as I know, there is no way to configure Flume without
>>>> using the configuration file.
>>>>
>>>> On Thu, Jul 31, 2014 at 7:57 PM, Paweł <pro...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I'm wondering if Flume is able to read directly from S3.
>>>>
>>>> I'll describe my case. I have log files stored in AWS S3. I have
>>>> to periodically fetch new S3 objects and read log lines from them.
>>>> Then the log lines (events) are processed in Flume's standard way
>>>> (as with other sources).
>>>>
>>>> *1) Is there any way to fetch S3 objects, or do I have to write
>>>> my own Source?*
>>>>
>>>> There is also a second case. I want my Flume configuration to be
>>>> dynamic. Flume sources can change over time: new AWS keys and S3
>>>> buckets can be added or deleted.
>>>>
>>>> *2) Is there any other way to configure Flume than with a static
>>>> configuration file?*
>>>>
>>>> --
>>>> Paweł Róg
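On question 1 and Sharninder's suggestion to write a PollableSource: the core of a SpoolDirectorySource-like S3 source is remembering which objects have already been read, so each poll only picks up new ones. Below is a minimal Java sketch of just that part. It assumes a hypothetical ObjectLister interface standing in for the S3 listing call (no AWS SDK code is shown), and S3KeyPoller is an illustrative name, not something from Flume or FLUME-2437.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an S3 "list objects" call; a real source would
// implement this with an S3 client. Not a Flume or AWS SDK type.
interface ObjectLister {
    List<String> listKeys(String bucket, String prefix);
}

// Sketch of the SpoolDirectorySource-like part of an S3 source: every poll
// lists the bucket and returns only keys that were not returned before.
class S3KeyPoller {
    private final ObjectLister lister;
    private final String bucket;
    private final String prefix;
    // In-memory only: lost on restart, so a real source would persist it.
    private final Set<String> seenKeys = new HashSet<>();

    S3KeyPoller(ObjectLister lister, String bucket, String prefix) {
        this.lister = lister;
        this.bucket = bucket;
        this.prefix = prefix;
    }

    // Returns keys that appeared since the previous call, in listing order.
    List<String> pollNewKeys() {
        List<String> fresh = new ArrayList<>();
        for (String key : lister.listKeys(bucket, prefix)) {
            if (seenKeys.add(key)) { // add() returns false for keys already seen
                fresh.add(key);
            }
        }
        return fresh;
    }
}
```

Inside a custom source's process() method, one would call pollNewKeys(), read each new object line by line, hand the resulting events to the channel processor, and return Status.BACKOFF when nothing new was found. Because the seen-key set lives only in memory, a real source would need to persist it (or move/tag processed objects, similar to how SpoolDirectorySource renames finished files) to avoid re-reading everything after a restart.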
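On question 2 and Otis's list (per-bucket keys, different poll intervals, adding and removing buckets without touching config files on disk): since Flume itself reads its configuration from the file, one option is for a single custom source to keep an in-memory registry that some other component, e.g. a small admin endpoint, updates at runtime. This is a hypothetical sketch, not a Flume feature: BucketSubscription, BucketRegistry and the accessKeyId field are illustrative names, and the actual S3 fetching is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-bucket settings: the bucket, the credentials it belongs to,
// and how often it should be polled (e.g. every 1 or 5 minutes).
final class BucketSubscription {
    final String bucket;
    final String accessKeyId; // illustrative; a real source would also need the secret
    final long pollIntervalMillis;

    BucketSubscription(String bucket, String accessKeyId, long pollIntervalMillis) {
        this.bucket = bucket;
        this.accessKeyId = accessKeyId;
        this.pollIntervalMillis = pollIntervalMillis;
    }
}

// Hypothetical runtime registry a single custom source could consult on each
// poll, so buckets are added/removed without editing the Flume config file.
class BucketRegistry {
    private final Map<String, BucketSubscription> subscriptions = new HashMap<>();
    private final Map<String, Long> lastPolledAt = new HashMap<>();

    // Adds or replaces a subscription; returns true if the bucket was new.
    // A new or changed subscription becomes due on the next poll.
    synchronized boolean add(BucketSubscription sub) {
        lastPolledAt.remove(sub.bucket);
        return subscriptions.put(sub.bucket, sub) == null;
    }

    // Stops fetching from a bucket; returns true if it was registered.
    synchronized boolean remove(String bucket) {
        lastPolledAt.remove(bucket);
        return subscriptions.remove(bucket) != null;
    }

    // Returns the buckets whose interval has elapsed at nowMillis (sorted by
    // name) and records nowMillis as their last poll time.
    synchronized List<String> dueBuckets(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (BucketSubscription sub : subscriptions.values()) {
            Long last = lastPolledAt.get(sub.bucket);
            if (last == null || nowMillis - last >= sub.pollIntervalMillis) {
                due.add(sub.bucket);
                lastPolledAt.put(sub.bucket, nowMillis);
            }
        }
        Collections.sort(due);
        return due;
    }
}
```

On each poll the source would call dueBuckets(System.currentTimeMillis()) and list only those buckets. The trade-off versus Paweł's one-source-per-bucket idea is that the Flume configuration stays static (one source, one channel), while the set of buckets becomes runtime state that has to be persisted and exposed separately.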