LinkedIn's Gobblin compaction tool uses Hive to perform the compaction. Does
that mean Lumos has been replaced?

Confused… 

On Mar 17, 2015, at 10:00 PM, Xiao <lixiao1...@gmail.com> wrote:

> Hi, all, 
> 
> Do you know whether Linkedin plans to open source Lumos in the near future?
> 
> I found the answer in Qiao Lin’s post about replication from Oracle/MySQL 
> to Hadoop. 
> 
>       - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
> 
> On the source side, replication can be Databus-based or file-based. 
> 
> On the target side, Lumos rebuilds the snapshots, since Hadoop cannot do 
> in-place updates/deletes. 
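Since HDFS files are immutable, a snapshot rebuild of this kind boils down to merging the previous snapshot with the day's change records, keeping the latest version of each key, and writing the result out as a new file set. A minimal sketch in Python (illustrative only, not Lumos's actual implementation):

```python
def rebuild_snapshot(base, changes):
    """Merge an existing snapshot with change records.

    base:    dict mapping primary key -> row
    changes: iterable of (op, key, row) tuples in commit order,
             where op is 'upsert' or 'delete'.
    Returns the new snapshot; the original base is left untouched,
    mirroring the write-a-new-file-set behavior on HDFS.
    """
    snapshot = dict(base)
    for op, key, row in changes:
        if op == 'delete':
            snapshot.pop(key, None)
        else:  # upsert: insert a new row or overwrite the old version
            snapshot[key] = row
    return snapshot
```

For example, applying an upsert for key 2 and a delete for key 1 to a one-row base snapshot yields a snapshot containing only key 2.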
> 
> The slides about Lumos:
>       http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
> The talk about Lumos: 
>       https://www.youtube.com/watch?v=AGlRjlrNDYk
> 
> Event publishing is different from database replication. Kafka is used for 
> change publishing, and perhaps also for shipping changes recorded in files. 
> 
> Thanks, 
> 
> Xiao Li
> 
> On Mar 17, 2015, at 7:26 PM, Arya Ketan <ketan.a...@gmail.com> wrote:
> 
>> AFAIK, LinkedIn uses Databus to do the same. Aesop is built on top of
>> Databus, extending its beautiful capabilities to MySQL and HBase.
>> On Mar 18, 2015 7:37 AM, "Xiao" <lixiao1...@gmail.com> wrote:
>> 
>>> Hi, all,
>>> 
>>> Do you know how the LinkedIn team publishes changed rows in Oracle to
>>> Kafka? I believe they already understand the whole problem very well.
>>> 
>>> Using triggers? Directly parsing the log? Or using any of the Oracle
>>> GoldenGate interfaces?
>>> 
>>> Any lessons learned, or a standard message format? Could the LinkedIn
>>> folks share them with us? I believe it would help us a lot.
>>> 
>>> Thanks,
>>> 
>>> Xiao Li
>>> 
>>> 
>>> On Mar 17, 2015, at 12:26 PM, James Cheng <jch...@tivo.com> wrote:
>>> 
>>>> This is a great set of projects!
>>>> 
>>>> We should put this list of projects on a site somewhere so people can
>>> more easily see and refer to it. These aren't Kafka-specific, but most seem
>>> to be "MySQL CDC." Does anyone have a place where they can host a page?
>>> Preferably a wiki, so we can keep it up to date easily.
>>>> 
>>>> -James
>>>> 
>>>> On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey <
>>> hisham.mardam...@gmail.com> wrote:
>>>> 
>>>>> Pretty much a hijack / plug as well (=
>>>>> 
>>>>> https://github.com/mardambey/mypipe
>>>>> 
>>>>> "MySQL binary log consumer with the ability to act on changed rows and
>>>>> publish changes to different systems with emphasis on Apache Kafka."
>>>>> 
>>>>> Mypipe currently encodes events using Avro before pushing them into
>>>>> Kafka, and is Avro schema-repository aware. The project is young, and
>>>>> patches for improvements are appreciated (=
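Being schema-repository aware generally means tagging each message with a compact schema id instead of shipping the full schema alongside every event. A rough sketch of that framing pattern (hypothetical layout, not mypipe's actual wire format; JSON stands in for Avro binary encoding to keep the example self-contained):

```python
import json
import struct

MAGIC_BYTE = 0  # marks which serialization format/version this message uses

def encode_event(schema_id, event):
    # Header: magic byte + 4-byte big-endian schema id, followed by
    # the encoded payload. A consumer resolves schema_id against the
    # schema repository to know how to decode the payload.
    header = struct.pack('>bI', MAGIC_BYTE, schema_id)
    return header + json.dumps(event).encode('utf-8')

def decode_event(message):
    magic, schema_id = struct.unpack('>bI', message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError('unknown serialization format')
    return schema_id, json.loads(message[5:].decode('utf-8'))
```

A round trip through `encode_event`/`decode_event` recovers both the schema id and the original event.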
>>>>> 
>>>>> On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan <ketan.a...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> Great work.
>>>>>> Sorry for kind of hijacking this thread, but we had built something on
>>>>>> top of MySQL binlog event propagation and I wanted to share it.
>>>>>> You guys can also look into Aesop (https://github.com/Flipkart/aesop).
>>>>>> It's a change-propagation framework. It has relays which listen to the
>>>>>> binlogs of MySQL and keep track of SCNs, and consumers which can then
>>>>>> transform/map (or interpret as-is) the binlog events to a destination.
>>>>>> Consumers also keep track of SCNs, and a slow consumer can go back to a
>>>>>> previous SCN if it wants to re-listen to events (similar to Kafka's
>>>>>> consumer view).
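The SCN-tracking behavior described above can be sketched as a consumer that checkpoints the last sequence number it processed and can be rewound, much like resetting a Kafka consumer offset. Illustrative Python only; class and method names are made up, not Aesop's API:

```python
class Relay:
    """Buffers change events keyed by a monotonically increasing SCN."""
    def __init__(self):
        self.log = []  # list of (scn, event), in SCN order

    def append(self, scn, event):
        self.log.append((scn, event))

    def events_after(self, scn):
        return [(s, e) for s, e in self.log if s > scn]


class Consumer:
    """Tracks its own SCN so it can resume where it left off, or rewind."""
    def __init__(self, relay):
        self.relay = relay
        self.scn = 0     # last SCN successfully processed
        self.seen = []

    def poll(self):
        for scn, event in self.relay.events_after(self.scn):
            self.seen.append(event)
            self.scn = scn  # checkpoint after each event

    def rewind(self, scn):
        # A slow consumer can jump back to an earlier SCN and
        # re-consume everything after it.
        self.scn = scn
```

After a `rewind(1)`, the next `poll()` re-delivers every event with an SCN greater than 1, which is the "re-listen" behavior described above.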
>>>>>> 
>>>>>> All the producers/consumers are extensible and you can write your own
>>>>>> custom consumer and feed off the data to it.
>>>>>> 
>>>>>> Common use-cases:
>>>>>> a) Archiving MySQL data into, say, HBase
>>>>>> b) Moving MySQL data to, say, a search store for serving reads
>>>>>> 
>>>>>> It has a decent (not an awesome :)) console too, which gives a nice
>>>>>> human-readable view of where the producers and consumers are.
>>>>>> 
>>>>>> Currently supported producers are MySQL binlogs and HBase WAL edits.
>>>>>> 
>>>>>> 
>>>>>> Further insights/reviews/feature requests/pull requests/advice are all
>>>>>> welcome.
>>>>>> 
>>>>>> --
>>>>>> Arya
>>>>>> 
>>>>>> On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira <gshap...@cloudera.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Really really nice!
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard <
>>> p...@spootnik.org
>>>>>>> 
>>>>>>> wrote:
>>>>>>>> Hi kafka,
>>>>>>>> 
>>>>>>>> I just wanted to mention that I published a very simple project which
>>>>>>>> can connect as a MySQL replication client and stream replication
>>>>>>>> events to Kafka: https://github.com/pyr/sqlstream
>>>>>>>> 
>>>>>>>> When you don't have control over an application, it can provide a
>>>>>>>> simple way of consolidating SQL data in Kafka.
>>>>>>>> 
>>>>>>>> This is an early release and there are a few caveats (mentioned in the
>>>>>>>> README): mostly the poor partitioning, which I'm going to evolve
>>>>>>>> quickly, and the reconnection strategy, which doesn't try to keep
>>>>>>>> track of the binlog position. Other than that, it should work as
>>>>>>>> advertised.
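On the partitioning point: a common approach for replication streams is to key each message by table plus primary key, so that all changes to a given row land in the same partition and stay ordered. A minimal sketch of such a partitioner (illustrative only, not sqlstream's code; the function name is made up):

```python
import hashlib

def partition_for(table, primary_key, num_partitions):
    """Route all changes for a given row to the same partition,
    preserving per-row ordering."""
    key = f'{table}:{primary_key}'.encode('utf-8')
    # Use a stable hash (unlike Python's built-in hash(), which is
    # randomized per process) so routing survives restarts.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions
```

The same (table, primary key) pair always maps to the same partition, which is what keeps a row's change history in order within a partition.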
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> - pyr
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Hisham Mardam-Bey
>>>>> http://hisham.cc/
>>>> 
>>> 
>>> 
> 
