On Mon, Aug 11, 2014 at 4:04 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:
> Hi,
>
> On Wed, Aug 6, 2014 at 5:04 AM, Ashish wrote:
>
>> Sharing some random thoughts
>>
>> 1. Download the file using S3 SDK and let the SpoolDirectory
>> implementation take care of rest. Like a Decorator
Hi,
On Wed, Aug 6, 2014 at 5:04 AM, Ashish wrote:
> Sharing some random thoughts
>
> 1. Download the file using S3 SDK and let the SpoolDirectory
> implementation take care of rest. Like a Decorator in front of
> SpoolDirectory
>
My worry is that using SpoolDirectory requires temporary writes t
Hi,
On Tue, Aug 5, 2014 at 10:57 PM, Jonathan Natkins
wrote:
> Hi all,
>
> I started trying to write some code on this, and realized there are a
> number of issues that need to be discussed in order to really design this
> feature effectively. The requirements that have been discussed thus far a
I was thinking the same. I think the store (DB, FS, ZK, something else)
used to track state (what's been read from S3, what's been processed, etc.)
would ideally be abstract/extensible.
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematex
Maybe it is best not to depend on ZK directly. Create some sort of
abstraction which can use ZK, a DB, or some other mechanism to share the
distributed state. How about keeping the distributed state out of the picture
till we have a working S3 source, and plugging in the metadata information
piece later?
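To make the abstraction idea concrete, here is a minimal sketch, assuming the source only needs to claim an S3 object key and later record that it was fully processed. The names S3StateStore and InMemoryS3StateStore are hypothetical and are not part of Flume; a ZK- or DB-backed implementation could sit behind the same interface later.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over wherever the S3 source records its progress. */
interface S3StateStore {
    /** Claims an object key; returns true only for the first caller. */
    boolean tryClaim(String objectKey);

    /** Records that the object has been fully turned into events. */
    void markProcessed(String objectKey);

    /** Returns true if the object was already marked as processed. */
    boolean isProcessed(String objectKey);
}

/** Single-agent, in-memory stand-in (assumption: state may be lost on restart). */
final class InMemoryS3StateStore implements S3StateStore {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    @Override
    public boolean tryClaim(String objectKey) {
        // Set.add returns false when the key is already present.
        return claimed.add(objectKey);
    }

    @Override
    public void markProcessed(String objectKey) {
        processed.add(objectKey);
    }

    @Override
    public boolean isProcessed(String objectKey) {
        return processed.contains(objectKey);
    }
}
```

With something like this, a first version of the source could ship with only the in-memory store, and sharing a bucket between several agents would become a matter of adding another implementation rather than changing the source itself.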
Yeah, I realize that. The reason I think it should be somewhat dependent
upon FLUME-1491 is that ZooKeeper seems to me to be a pretty heavy-weight
requirement just to use a particular source. FLUME-1491 would make Flume
generally dependent upon ZooKeeper, which is a good transition point to
start u
Seems like there is a bit of confusion here. FLUME-1491 only deals with the
configuration part, nothing else. Even if it gets integrated, you would
still need to write/expose an API to store metadata info in ZK (FLUME-1491
doesn't bring that in).
HTH !
On Mon, Aug 11, 2014 at 11:39 AM, Jonathan Natkins
wrote:
Given that FLUME-1491 hasn't been committed yet, and may still be a ways
away, does it seem reasonable to punt on having multiple sources working
off of a single bucket until ZK is integrated into Flume? The alternative
probably requires write access to the S3 bucket to record some shared
state, an
Adding the dev list to the discussion
On Wed, Aug 6, 2014 at 9:37 AM, Jonathan Natkins
wrote:
> Ashish, I've put some comments inline.
>
>
> On Tuesday, August 5, 2014, Ashish wrote:
>
>> Sharing some random thoughts
>>
>> 1. Download the file using S3 SDK and let the SpoolDirectory
>> impleme
Ashish, I've put some comments inline.
On Tuesday, August 5, 2014, Ashish wrote:
> Sharing some random thoughts
>
> 1. Download the file using S3 SDK and let the SpoolDirectory
> implementation take care of rest. Like a Decorator in front of
> SpoolDirectory
>
> This works for the simple case, b
Hi,
I think that it is not possible to simply use SpoolDirectorySource. Maybe
it will be possible to use some elements of SpoolDirectory but without
touching its code I think SpoolDirectory is not a good base. At the very
beginning SpoolDirectorySource does this:
File directory = new File(spoolD
I agree with the feedback provided by Ashish.
I have started writing one which is similar to the ExecSource, but I like
the idea of doing something where spooldir takes over most of the hard work
of spitting out events to sinks. Let me think more on how to structure
that.
Quick thinking out loud, I c
Sharing some random thoughts
1. Download the file using S3 SDK and let the SpoolDirectory implementation
take care of rest. Like a Decorator in front of SpoolDirectory
2. Use S3 SDK to create InputStream of S3 objects directly in code and
create events out of it.
Would be great to reuse an exist
Hi all,
I started trying to write some code on this, and realized there are a
number of issues that need to be discussed in order to really design this
feature effectively. The requirements that have been discussed thus far are:
1. Fetching data from S3 periodically
2. Fetching data from multiple
Hi,
Thanks for the explanation, Jonathan. I think I will also start working on it.
When you have any patch (even draft) I'd be glad if you can attach it in
JIRA. I'll do the same.
What do you think?
--
Paweł Róg
2014-08-01 20:19 GMT+02:00 Hari Shreedharan :
> +1 on an S3 Source. I would gladly review
+1 on an S3 Source. I would gladly review.
Jonathan Natkins wrote:
Hey Pawel,
My intention is to start working on it, but I don't know exactly how
long it will take, and I'm not a committer, so time estimates would
have to be taken with a grain of salt regardless. If this is something
that you
Hey Pawel,
My intention is to start working on it, but I don't know exactly how long
it will take, and I'm not a committer, so time estimates would have to be
taken with a grain of salt regardless. If this is something that you need
urgently, it may not be ideal to wait for me to start building so
Hi,
Jonathan, how should we interpret your last e-mail? You opened a JIRA issue
and want to start implementing this; do you have any estimate of how long
it will take?
I think the biggest challenge here is to have dynamic configuration of
Flume. It doesn't seem to be part of FLUME-2437 issue. Am I
Hi,
On Fri, Aug 1, 2014 at 4:52 AM, Jonathan Natkins
wrote:
> Hey all,
>
> I created a JIRA for this:
> https://issues.apache.org/jira/browse/FLUME-2437
>
Thanks! Should Fix Version be set to the next Flume release version?
> I thought I'd start working on one myself, which can hopefully be
> c
Hey all,
I created a JIRA for this: https://issues.apache.org/jira/browse/FLUME-2437
I thought I'd start working on one myself, which can hopefully be
contributed back. I'm curious: do you have particular requirements? Based
on the emails in this thread, it sounds like the original goal was to ha
+1 for seeing S3Source, starting with a JIRA issue.
But being able to dynamically add/remove S3 buckets from which to pull data
seems important.
Any suggestions for how to approach that?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://semat
Please go ahead and file a jira. If you are willing to submit a patch,
you can post it on the jira.
Viral Bajaria wrote:
I have a similar use case that cropped up yesterday. I saw the archive
and found that there was a recommendation to build it as Sharninder
suggested.
For now, I went down t
I have a similar use case that cropped up yesterday. I saw the archive and
found that there was a recommendation to build it as Sharninder suggested.
For now, I went down the route of writing a python script which downloads
from S3 and puts the files in a directory which is configured to be picked
In both cases, Sharninder is right :)
Sharninder wrote:
As far as I know, there is no (open source) implementation of an S3
source, so yes, you'll have to implement your own. You'll have to
implement a Pollable source and the dev documentation has an outline
that you can use. You can also look
As far as I know, there is no (open source) implementation of an S3 source,
so yes, you'll have to implement your own. You'll have to implement a
Pollable source and the dev documentation has an outline that you can use.
You can also look at the existing ExecSource and work your way up.
As far as