I created a wiki page that lists all the MySQL replication options that people posted, plus a couple of others. People may or may not find it useful.
https://github.com/wushujames/mysql-cdc-projects/wiki

I wasn't sure where to host it, so I put it up on a GitHub wiki.

-James

On Mar 17, 2015, at 11:09 PM, Xiao <lixiao1...@gmail.com> wrote:

> LinkedIn's Gobblin compaction tool is using Hive to perform the compaction.
> Does that mean Lumos has been replaced?
>
> Confused…
>
> On Mar 17, 2015, at 10:00 PM, Xiao <lixiao1...@gmail.com> wrote:
>
>> Hi, all,
>>
>> Do you know whether LinkedIn plans to open source Lumos in the near future?
>>
>> I found the answer in Lin Qiao's post about replication from Oracle/MySQL
>> to Hadoop:
>>
>> - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
>>
>> On the source side, it can be Databus-based or file-based.
>>
>> On the target side, Lumos rebuilds the snapshots, since Hadoop cannot do
>> in-place updates/deletes.
>>
>> The slides about Lumos:
>> http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
>> The talk about Lumos:
>> https://www.youtube.com/watch?v=AGlRjlrNDYk
>>
>> Event publishing is different from database replication. Kafka is used for
>> change publishing, and maybe also for sending changes (recorded in files).
>>
>> Thanks,
>>
>> Xiao Li
>>
>> On Mar 17, 2015, at 7:26 PM, Arya Ketan <ketan.a...@gmail.com> wrote:
>>
>>> AFAIK, LinkedIn uses Databus to do the same. Aesop is built on top of
>>> Databus, extending its beautiful capabilities to MySQL and HBase.
>>>
>>> On Mar 18, 2015 7:37 AM, "Xiao" <lixiao1...@gmail.com> wrote:
>>>
>>>> Hi, all,
>>>>
>>>> Do you know how the LinkedIn team publishes changed rows in Oracle to
>>>> Kafka? I believe they already know the whole problem very well.
>>>>
>>>> Using triggers? Directly parsing the log? Or using some Oracle
>>>> GoldenGate interfaces?
>>>>
>>>> Any lessons, or any standard message format? Could the LinkedIn people
>>>> share them with us? I believe it could help us a lot.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao Li
>>>>
>>>> On Mar 17, 2015, at 12:26 PM, James Cheng <jch...@tivo.com> wrote:
>>>>
>>>>> This is a great set of projects!
>>>>>
>>>>> We should put this list of projects on a site somewhere so people can
>>>>> more easily see and refer to it. These aren't Kafka-specific, but most
>>>>> seem to be "MySQL CDC." Does anyone have a place where they can host a
>>>>> page? Preferably a wiki, so we can keep it up to date easily.
>>>>>
>>>>> -James
>>>>>
>>>>> On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey
>>>>> <hisham.mardam...@gmail.com> wrote:
>>>>>
>>>>>> Pretty much a hijack / plug as well (=
>>>>>>
>>>>>> https://github.com/mardambey/mypipe
>>>>>>
>>>>>> "MySQL binary log consumer with the ability to act on changed rows and
>>>>>> publish changes to different systems with emphasis on Apache Kafka."
>>>>>>
>>>>>> Mypipe currently encodes events using Avro before pushing them into
>>>>>> Kafka, and is Avro schema repository aware. The project is young, and
>>>>>> patches for improvements are appreciated (=
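For anyone who wants to see that pattern concretely: below is a minimal sketch
in Java of the "row change, Avro-encode, publish to Kafka" flow mypipe
describes. It is not mypipe's actual code; the RowChange schema, the per-table
topic naming, and the field names are invented for illustration, and it assumes
only the standard Avro GenericRecord API and the Kafka Java producer.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroChangePublisher {

    // Illustrative schema for a single changed row. A real tool derives a
    // schema per table and registers it in a schema repository, as mypipe does.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"RowChange\",\"fields\":["
        + "{\"name\":\"database\",\"type\":\"string\"},"
        + "{\"name\":\"table\",\"type\":\"string\"},"
        + "{\"name\":\"op\",\"type\":\"string\"},"
        + "{\"name\":\"rowJson\",\"type\":\"string\"}]}");

    private final KafkaProducer<byte[], byte[]> producer;

    public AvroChangePublisher(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        producer = new KafkaProducer<>(props);
    }

    // Avro-encode one row change ("insert", "update", or "delete") and
    // publish it to a per-table topic, keyed by table name so all changes
    // to one table land in the same partition and stay ordered.
    public void publish(String db, String table, String op, String rowJson)
            throws IOException {
        GenericRecord rec = new GenericData.Record(SCHEMA);
        rec.put("database", db);
        rec.put("table", table);
        rec.put("op", op);
        rec.put("rowJson", rowJson);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(rec, enc);
        enc.flush();

        producer.send(new ProducerRecord<>(db + "." + table,
            table.getBytes(StandardCharsets.UTF_8), out.toByteArray()));
    }
}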
>>>>>> On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan <ketan.a...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Great work.
>>>>>>> Sorry for kinda hijacking this thread, but I thought I'd mention that
>>>>>>> we had built something for MySQL binlog event propagation and wanted
>>>>>>> to share it. You guys can also look into Aesop
>>>>>>> (https://github.com/Flipkart/aesop). It's a change propagation
>>>>>>> framework. It has relays which listen to the binlogs of MySQL and
>>>>>>> keep track of SCNs, and it has consumers which can then transform/map
>>>>>>> (or interpret as-is) the binlog event to a destination. Consumers
>>>>>>> also keep track of SCNs, and a slow consumer can go back to a
>>>>>>> previous SCN if it wants to re-listen to events (similar to Kafka's
>>>>>>> consumer model).
>>>>>>>
>>>>>>> All the producers/consumers are extensible, and you can write your
>>>>>>> own custom consumer and feed the data to it.
>>>>>>>
>>>>>>> Common use-cases:
>>>>>>> a) Archive MySQL-based data into, say, HBase
>>>>>>> b) Move MySQL-based data to, say, a search store for serving reads
>>>>>>>
>>>>>>> It has a decent (not an awesome :) ) console too, which gives a nice
>>>>>>> human-readable view of where the producers and consumers are.
>>>>>>>
>>>>>>> Currently supported producers are MySQL binlogs and HBase WAL edits.
>>>>>>>
>>>>>>> Further insights/reviews/feature requests/pull requests/advice are
>>>>>>> all welcome.
>>>>>>>
>>>>>>> --
>>>>>>> Arya
>>>>>>>
>>>>>>> On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira <gshap...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Really really nice!
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard
>>>>>>>> <p...@spootnik.org> wrote:
>>>>>>>>
>>>>>>>>> Hi kafka,
>>>>>>>>>
>>>>>>>>> I just wanted to mention I published a very simple project which
>>>>>>>>> can connect as a MySQL replication client and stream replication
>>>>>>>>> events to Kafka: https://github.com/pyr/sqlstream
>>>>>>>>>
>>>>>>>>> When you don't have control over an application, it can provide a
>>>>>>>>> simple way of consolidating SQL data in Kafka.
>>>>>>>>>
>>>>>>>>> This is an early release and there are a few caveats (mentioned in
>>>>>>>>> the README): mostly the poor partitioning, which I'm going to
>>>>>>>>> evolve quickly, and the reconnection strategy, which doesn't try
>>>>>>>>> to keep track of the binlog position. Other than that, it should
>>>>>>>>> work as advertised.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> - pyr
>>>>>>
>>>>>> --
>>>>>> Hisham Mardam-Bey
>>>>>> http://hisham.cc/
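The approach sqlstream takes (connect to MySQL as if you were a replica, then
forward the binlog event stream to Kafka) looks roughly like the following in
Java; sqlstream itself is written in Clojure, so this is a sketch of the
technique rather than sqlstream's code. It assumes the open-source
mysql-binlog-connector-java library and the Kafka Java producer; the host,
credentials, and topic name are placeholders. Like the early sqlstream release
above, it publishes to a single unkeyed topic (the "poor partitioning" caveat)
and does not persist the binlog position across reconnects.

import java.io.Serializable;
import java.util.Arrays;
import java.util.Properties;

import com.github.shyiko.mysql.binlog.BinaryLogClient;
import com.github.shyiko.mysql.binlog.event.Event;
import com.github.shyiko.mysql.binlog.event.EventType;
import com.github.shyiko.mysql.binlog.event.WriteRowsEventData;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BinlogToKafka {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        final KafkaProducer<String, String> producer =
            new KafkaProducer<>(props);

        // Connect as a replication client: MySQL streams its binlog to us
        // exactly as it would to a replica.
        BinaryLogClient client =
            new BinaryLogClient("localhost", 3306, "repl_user", "repl_pass");
        client.registerEventListener(new BinaryLogClient.EventListener() {
            @Override
            public void onEvent(Event event) {
                EventType type = event.getHeader().getEventType();
                // Inserts only, for brevity; updates and deletes arrive as
                // UPDATE_ROWS / DELETE_ROWS events and are handled similarly.
                if (type == EventType.WRITE_ROWS
                        || type == EventType.EXT_WRITE_ROWS) {
                    WriteRowsEventData data = event.getData();
                    for (Serializable[] row : data.getRows()) {
                        // Single unkeyed topic: simple, but ordering is only
                        // guaranteed within a partition.
                        producer.send(new ProducerRecord<String, String>(
                            "mysql.changes", Arrays.toString(row)));
                    }
                }
            }
        });
        client.connect(); // blocks and consumes the binlog stream
    }
}

A fuller implementation would persist the client's current binlog filename and
position so a restart can resume where it left off, and would key records by
table (as in the earlier Avro sketch) to get sane partitioning, which addresses
the two caveats pyr mentions.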