LinkedIn's Gobblin compaction tool uses Hive to perform the compaction. Does that mean Lumos has been replaced?
Confused…

On Mar 17, 2015, at 10:00 PM, Xiao <lixiao1...@gmail.com> wrote:

> Hi, all,
>
> Do you know whether LinkedIn plans to open source Lumos in the near future?
>
> I found the answer in Qiao Lin's post about replication from Oracle/MySQL to Hadoop:
>
> - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
>
> At the source side, it can be Databus-based or file-based.
>
> At the target side, Lumos rebuilds the snapshots, because Hadoop cannot do an in-place update/delete.
>
> The slides about Lumos: http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
> The talk about Lumos: https://www.youtube.com/watch?v=AGlRjlrNDYk
>
> Event publishing is different from database replication. Kafka is used for change publishing, and maybe also for sending changes recorded in files.
>
> Thanks,
>
> Xiao Li
>
> On Mar 17, 2015, at 7:26 PM, Arya Ketan <ketan.a...@gmail.com> wrote:
>
>> AFAIK, LinkedIn uses Databus to do the same. Aesop is built on top of Databus, extending its capabilities to MySQL and HBase.
>>
>> On Mar 18, 2015 7:37 AM, "Xiao" <lixiao1...@gmail.com> wrote:
>>
>>> Hi, all,
>>>
>>> Do you know how the LinkedIn team publishes changed rows in Oracle to Kafka? I believe they already know the whole problem very well.
>>>
>>> Using triggers? Directly parsing the log? Or using any Oracle GoldenGate interfaces?
>>>
>>> Any lessons or any standard message format? Could the LinkedIn people share them with us? I believe they could help us a lot.
>>>
>>> Thanks,
>>>
>>> Xiao Li
>>>
>>>
>>> On Mar 17, 2015, at 12:26 PM, James Cheng <jch...@tivo.com> wrote:
>>>
>>>> This is a great set of projects!
>>>>
>>>> We should put this list of projects on a site somewhere so people can more easily see and refer to it. These aren't Kafka-specific, but most seem to be "MySQL CDC." Does anyone have a place where they can host a page? Preferably a wiki, so we can keep it up to date easily.
>>>>
>>>> -James
>>>>
>>>> On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey <hisham.mardam...@gmail.com> wrote:
>>>>
>>>>> Pretty much a hijack / plug as well (=
>>>>>
>>>>> https://github.com/mardambey/mypipe
>>>>>
>>>>> "MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka."
>>>>>
>>>>> Mypipe currently encodes events using Avro before pushing them into Kafka and is Avro schema repository aware. The project is young, and patches for improvements are appreciated (=
>>>>>
>>>>> On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan <ketan.a...@gmail.com> wrote:
>>>>>
>>>>>> Great work.
>>>>>> Sorry for kind of hijacking this thread, but I thought we had built something on a MySQL binlog event propagator and wanted to share it.
>>>>>> You can also look into Aesop (https://github.com/Flipkart/aesop). It's a change-propagation framework. It has relays which listen to MySQL binlogs and keep track of SCNs, and consumers which can then transform/map (or interpret as-is) the binlog event to a destination. Consumers also keep track of SCNs, and a slow consumer can go back to a previous SCN if it wants to re-listen to events (similar to Kafka's consumer view).
>>>>>>
>>>>>> All the producers/consumers are extensible, and you can write your own custom consumer and feed the data to it.
>>>>>>
>>>>>> Common use cases:
>>>>>> a) Archive MySQL-based data into, say, HBase.
>>>>>> b) Move MySQL-based data to, say, a search store for serving reads.
>>>>>>
>>>>>> It also has a decent (not an awesome :) ) console which gives a nice human-readable view of where the producers and consumers are.
>>>>>>
>>>>>> Currently supported producers are MySQL binlogs and HBase WAL edits.
>>>>>>
>>>>>> Further insights/reviews/feature requests/pull requests/advice are all welcome.
>>>>>>
>>>>>> --
>>>>>> Arya
>>>>>>
>>>>>> On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>
>>>>>>> Really, really nice!
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard <p...@spootnik.org> wrote:
>>>>>>>
>>>>>>>> Hi kafka,
>>>>>>>>
>>>>>>>> I just wanted to mention I published a very simple project which can connect as a MySQL replication client and stream replication events to Kafka: https://github.com/pyr/sqlstream
>>>>>>>>
>>>>>>>> When you don't have control over an application, it can provide a simple way of consolidating SQL data in Kafka.
>>>>>>>>
>>>>>>>> This is an early release and there are a few caveats (mentioned in the README), mostly the poor partitioning, which I'm going to evolve quickly, and the reconnection strategy, which doesn't try to keep track of the binlog position. Other than that, it should work as advertised.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> - pyr
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hisham Mardam-Bey
>>>>> http://hisham.cc/
>>>>
>>>
>>>
>
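Several of the projects mentioned above (mypipe, sqlstream, Aesop's MySQL relay) follow the same basic pattern: connect to MySQL as a replication client, read row-change events from the binlog, and publish each change to Kafka. Below is a minimal sketch of that pattern, not code from any of those projects, assuming the python-mysql-replication and kafka-python libraries; the connection settings, server_id, and topic naming are placeholders.

    # Illustrative binlog-to-Kafka sketch only; not code from mypipe, sqlstream, or Aesop.
    # Assumes python-mysql-replication and kafka-python; connection details are placeholders.
    import json

    from kafka import KafkaProducer
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    MYSQL = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "repl"}

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
    )

    # Connect as a replication client and read only row-change events.
    stream = BinLogStreamReader(
        connection_settings=MYSQL,
        server_id=101,  # must be unique among replicas
        only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
        blocking=True,
        resume_stream=True,
    )

    # Runs until interrupted: one Kafka message per changed row, routed to a per-table topic.
    for event in stream:
        for row in event.rows:
            message = {
                "schema": event.schema,
                "table": event.table,
                "type": type(event).__name__,
                "row": row,
            }
            producer.send("mysql-cdc.%s.%s" % (event.schema, event.table), message)

    producer.flush()
    stream.close()

A real connector would also persist the binlog file name and position (the SCN tracking discussed above for Aesop, and the caveat noted for sqlstream) so it can resume where it left off after a restart; the sketch omits that.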