Thanks, Hans, for the insight. Will use a compacted topic.

On Thu, Dec 15, 2016 at 3:53 PM, Hans Jespersen <h...@confluent.io> wrote:
> For #2, definitely use a compacted topic. Compaction will remove old
> messages and keep the last update for each key. To use this feature you
> will need to publish messages as key/value pairs. Apache Kafka 0.10.1 has
> some important fixes that make compacted topics more reliable when scaling
> to large numbers of keys, so make sure to use the latest release if this
> becomes a large amount of data.
>
> #3 sounds like a Kafka Sink Connector for Solr (something like this:
> https://github.com/jcustenborder/kafka-connect-solr)
>
> #4 Messages in compacted topics do not expire and are only removed when
> updated by a newer message with the same key.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Thu, Dec 15, 2016 at 10:16 AM, Kenny Gorman <ke...@eventador.io> wrote:
>
> > A couple thoughts...
> >
> > - If you plan on fetching old messages in a non-contiguous manner, then
> > this may not be the best design. For instance, "give me messages from
> > Mondays for the last 3 quarters" is better served by a database. But if
> > you want to say "give me messages from the last month until now", that
> > works great.
> >
> > - I am not sure what you mean by updating messages. You would need to
> > have some sort of key and push in new messages with that key. Then when
> > you read by key, the application should understand that the latest is
> > the version it should use.
> >
> > - Alternatively, you can consume into something like a DB and select
> > what you want using regular SQL. We see this pattern a lot.
> >
> > - For storing messages indefinitely, it's mostly a matter of making sure
> > the config options are set appropriately and you have enough storage
> > space. Set replication to something that makes you comfortable, and
> > maybe take backups as was mentioned.
> >
> > Hope this helps some.
> >
> > Kenny Gorman
> > Founder
> > www.eventador.io
> >
> >
> > > On Dec 15, 2016, at 12:00 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> > >
> > > Hello Folks,
> > >
> > > I am going through an existing design where Kafka is planned to be
> > > utilised in the below manner:
> > >
> > > 1. Messages will be pushed to Kafka by producers.
> > > 2. There will be updates to existing messages on an ongoing basis. The
> > > expectation is that all the updates are consolidated in Kafka and the
> > > latest and greatest version/copy is kept.
> > > 3. Consumers will read the messages from Kafka and push them to Solr
> > > for ingestion purposes.
> > > 4. There will be no purging/removal of messages, since it is expected
> > > that the messages can be replayed in the future to perform full
> > > re-ingestion. So messages will be kept in Kafka for an indefinite
> > > period, similar to a database where data once stored remains there and
> > > can be used later in the future.
> > >
> > > Do you see any pitfalls / issues with this design, especially with
> > > regard to storing the messages indefinitely?
> > >
> > > Thanks,
> > > Susheel
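[Editor's note: the compaction semantics Hans describes in #2 and #4 (latest record per key is retained; older records for the same key are eventually removed) can be sketched with a small simulation. This is plain Python with illustrative names, not the Kafka client API:]

```python
# Sketch of log-compaction semantics: the broker eventually keeps only the
# newest record for each key, while a consumer that replays the topic from
# the beginning rebuilds the latest state per key (e.g. for re-ingestion
# into Solr). Illustrative simulation only, not the Kafka API.

def compact(log):
    """Return the records a compacted topic would eventually retain:
    the final update for each key, in offset order of that final update."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    return [(key, value)
            for key, (offset, value) in sorted(latest.items(),
                                               key=lambda kv: kv[1][0])]

def replay_latest(log):
    """What a consumer sees after replaying from offset 0 and keeping the
    last value per key: the 'latest and greatest' copy of each message."""
    state = {}
    for key, value in log:
        state[key] = value
    return state

log = [("doc-1", "v1"), ("doc-2", "v1"), ("doc-1", "v2"),
       ("doc-3", "v1"), ("doc-1", "v3")]
print(compact(log))        # [('doc-2', 'v1'), ('doc-3', 'v1'), ('doc-1', 'v3')]
print(replay_latest(log))  # {'doc-1': 'v3', 'doc-2': 'v1', 'doc-3': 'v1'}
```

Note that real compaction is not immediate: the active segment is never cleaned, so a replaying consumer may still see some superseded values and must apply last-write-wins per key, as Kenny points out.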
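[Editor's note: the broker-side setup the thread converges on (a keyed, compacted topic with no time-based expiry) might be created as follows. This is a sketch for a Kafka 0.10.x-era deployment; the topic name, host, partition count, and replication factor are placeholders to adapt:]

```shell
# Create a compacted topic: old values for a key are eventually cleaned up,
# but the latest record for each key is retained indefinitely (no time-based
# expiry of the current value).
bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic documents \
  --partitions 8 \
  --replication-factor 3 \
  --config cleanup.policy=compact

# Related topic-level configs worth reviewing for this design:
#   min.cleanable.dirty.ratio  - how aggressively the log cleaner runs
#   delete.retention.ms        - how long tombstones (null values) are kept
#   segment.ms                 - how quickly segments roll and become cleanable
```

Since compaction is keyed, every record produced to such a topic must carry a non-null key; the broker rejects un-keyed records on compacted topics.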