I just submitted the patch on https://issues.apache.org/jira/browse/FLUME-1838.

Would love some reviews, thanks!
-Andrew


On Jan 14, 2013, at 1:01 PM, Andrew Otto <[email protected]> wrote:

> Thanks guys!  I've opened up a JIRA here:
> 
> https://issues.apache.org/jira/browse/FLUME-1838
> 
> 
> On Jan 14, 2013, at 12:43 PM, Alexander Alten-Lorenz <[email protected]> 
> wrote:
> 
>> Hey Andrew,
>> 
>> for your reference, we have a lot of developer information in our wiki:
>> 
>> https://cwiki.apache.org/confluence/display/FLUME/Developer+Section
>> https://cwiki.apache.org/confluence/display/FLUME/Developers+Quick+Hack+Sheet
>> 
>> cheers,
>> Alex
>> 
>> On Jan 14, 2013, at 6:37 PM, Hari Shreedharan <[email protected]> 
>> wrote:
>> 
>>> Hi Andrew, 
>>> 
>>> Really happy to hear Wikimedia Foundation is considering Flume. I am fairly 
>>> sure that if you find such a source useful, there would definitely be 
>>> others who find it useful too. I'd recommend filing a jira and starting a 
>>> discussion, and then submitting the patch. We would be happy to review and 
>>> commit it. 
>>> 
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> -- 
>>> Hari Shreedharan
>>> 
>>> 
>>> On Monday, January 14, 2013 at 9:29 AM, Andrew Otto wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I'm a Systems Engineer at the Wikimedia Foundation, and we're 
>>>> investigating using Flume for our web request log HDFS imports. We've 
>>>> previously been using Kafka, but have had to change short-term 
>>>> architecture plans in order to get data into HDFS reliably and regularly 
>>>> soon.
>>>> 
>>>> Our current web request logs are available for consumption over a 
>>>> multicast UDP stream. I could hack something together to try and pipe this 
>>>> into Flume using the existing sources (SyslogUDPSource, or maybe some 
>>>> combination of socat + NetcatSource), but I'd rather reduce the number of 
>>>> moving parts. I'd like to consume directly from the multicast UDP stream 
>>>> as a Flume source.
>>>> 
>>>> I coded up a proof of concept based on the SyslogUDPSource, mainly just 
>>>> stripping out the syslog event header extraction, and adding in multicast 
>>>> Datagram connection code. I plan on cleaning this up, and making this a 
>>>> generic raw UDP source, with multicast being a configuration option.
>>>> 
>>>> My question to you guys is, is this something the Flume community would 
>>>> find useful? If so, should I open up a JIRA to track this? I've got a fork 
>>>> of the Flume git repo over on github and will be doing my work there. I'd 
>>>> love to share it upstream if it would be useful.
>>>> 
>>>> Thanks!
>>>> -Andrew Otto
>>>> Systems Engineer
>>>> Wikimedia Foundation
>>>> 
>>>> 
>>> 
>>> 
>> 
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>> 
> 
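For readers following the thread, the idea in the original message (consume lines directly from a multicast UDP stream, with no syslog header parsing) can be sketched roughly as below. This is only an illustration using plain java.net classes; it is not Flume code and not the patch attached to FLUME-1838. The class name, method names, and buffer size are all assumptions made up for the example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: plain java.net, not Flume's source API.
class MulticastUdpSketch {

    // True if the IP literal is a multicast group address (224.0.0.0/4 for IPv4).
    static boolean isMulticastGroup(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a resolvable address: " + address, e);
        }
    }

    // Decode only the bytes actually received, not the whole backing buffer.
    static String decode(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(),
                packet.getLength(), StandardCharsets.UTF_8);
    }

    // Join the group, print up to maxPackets raw payloads, then leave the group.
    static void consume(String group, int port, int maxPackets) throws IOException {
        if (!isMulticastGroup(group)) {
            throw new IllegalArgumentException("Not a multicast group: " + group);
        }
        InetSocketAddress groupAddress =
                new InetSocketAddress(InetAddress.getByName(group), port);
        byte[] buffer = new byte[65535]; // assumed upper bound for one datagram
        try (MulticastSocket socket = new MulticastSocket(port)) {
            // A null interface defers to the socket's default multicast interface.
            socket.joinGroup(groupAddress, null);
            try {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                for (int i = 0; i < maxPackets; i++) {
                    packet.setLength(buffer.length); // receive() shrinks the length
                    socket.receive(packet);          // blocks until a datagram arrives
                    System.out.println(decode(packet));
                }
            } finally {
                socket.leaveGroup(groupAddress, null);
            }
        }
    }
}
```

Calling something like MulticastUdpSketch.consume("239.1.2.3", 8420, 10) would print the next ten datagrams sent to that group; the group address and port here are placeholders, not the real stream's values. In an actual Flume source, each decoded payload would presumably be wrapped in an event and handed to the channel instead of printed.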
