For #2, definitely use a compacted topic. Compaction will remove old
messages and keep only the latest value for each key. To use this feature
you will need to publish messages as key/value pairs. Apache Kafka 0.10.1
has some important fixes that make compacted topics more reliable when
scaling to large numbers of keys, so make sure to use the latest release
if this grows into a large amount of data.
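The last-write-wins semantics of compaction can be sketched in plain
Python, no broker needed (the keys and values here are made up for
illustration; on a real cluster you would create the topic with
cleanup.policy=compact and produce keyed records):

```python
# Sketch of log-compaction semantics: for each key, only the most
# recent value survives; older updates for the same key are discarded.
def compact(log):
    latest = {}
    for key, value in log:   # replay the log in order
        latest[key] = value  # a newer message overwrites the older one
    return latest

log = [
    ("user-1", "v1"),
    ("user-2", "v1"),
    ("user-1", "v2"),  # supersedes user-1's earlier value
]
print(compact(log))  # {'user-1': 'v2', 'user-2': 'v1'}
```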

For #3, that sounds like a job for a Kafka sink connector for Solr
(something like this:
https://github.com/jcustenborder/kafka-connect-solr)
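A Connect sink is driven by a small JSON (or properties) configuration.
The `name`, `connector.class`, `tasks.max`, and `topics` keys are standard
Kafka Connect settings; the class name and the Solr-specific property
below are assumptions for illustration, so check the connector's README
for the exact values it expects:

```json
{
  "name": "solr-sink",
  "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
  "tasks.max": "1",
  "topics": "documents",
  "solr.url": "http://localhost:8983/solr"
}
```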

For #4, messages in compacted topics do not expire; a value is only
removed when it is superseded by a newer message with the same key.
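Retention is controlled per topic. A compacted topic keeps the latest
value per key indefinitely, and a regular topic can be kept forever by
disabling time-based retention. A sketch using the standard CLI tools
(topic names and connection addresses are placeholders):

```
# Compacted topic: old values per key are compacted away, nothing expires by time
kafka-topics.sh --create --topic latest-docs \
  --zookeeper localhost:2181 --partitions 3 --replication-factor 3 \
  --config cleanup.policy=compact

# Regular topic kept forever: disable time-based retention
kafka-configs.sh --alter --entity-type topics --entity-name all-docs \
  --zookeeper localhost:2181 --add-config retention.ms=-1
```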

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Thu, Dec 15, 2016 at 10:16 AM, Kenny Gorman <ke...@eventador.io> wrote:

> A couple thoughts..
>
> - If you plan on fetching old messages in a non-contiguous manner then
> this may not be the best design. For instance, “give me messages from
> Mondays for the last 3 quarters” is better served by a database. But if
> you want to say “give me messages from the last month until now,” that
> works great.
>
> - I am not sure what you mean by updating messages. You would need to have
> some sort of key and push in new messages with that key. Then when you read
> by key, the application should understand that the latest is the version it
> should use.
>
> - Alternatively, you can consume into something like a DB and select
> what you want using regular SQL. We see this pattern a lot.
>
> - For storing messages indefinitely it’s mostly making sure the config
> options are set appropriately and you have enough storage space. Set
> replication to something that makes you comfortable, maybe take backups as
> was mentioned.
>
> Hope this helps some
>
> Kenny Gorman
> Founder
> www.eventador.io
>
>
> > On Dec 15, 2016, at 12:00 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> >
> > Hello Folks,
> >
> > I am going through an existing design where Kafka is planned to be
> > utilised in the following manner:
> >
> >
> >   1. Messages will be pushed to Kafka by producers
> >   2. There will be updates to existing messages on ongoing basis.  The
> >   expectation is that all the updates are consolidated in Kafka and the
> >   latest and greatest version/copy is kept
> >   3. Consumers will read the messages from Kafka and push to Solr for
> >   ingestion purposes
> >   4. There will be no purging/removal of messages since it is expected to
> >   replay the messages in the future and perform full re-ingestion.  So
> >   messages will be kept in Kafka for an indefinite period, similar to a
> >   database where data, once stored, remains there and can be used later
> >   in the future.
> >
> >
> > Do you see any pitfalls or issues with this design, especially with
> > respect to storing the messages indefinitely?
> >
> >
> > Thanks,
> > Susheel
>
>
