Yeah, if you're into Flume you can definitely do per-event modification/routing in an interceptor with relative ease. I don't know how many MAC addresses in total you'd need to look up (or, actually, why a hash partitioning scheme wouldn't just work, but w/e, I assume you have your reasons). There's kind of an example of doing this here:
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/

The example in the blog uses HBase to read some "profile" data. You could swap this for whatever other store you wanted (Redis, Cassandra, whatever).

So you'd go:

Your Systems -> Kafka (raw data) -> Flume Source -> Interceptor -> Kafka Channel (raw data to the correct topic by setting the Flume Event header)

Of course you could code that all yourself too. Whatever floats your boat. Writing interceptors is really easy and there are quite a few examples around.

Jeff

On Thu, Jan 29, 2015 at 4:10 AM, David Morales <dmora...@stratio.com> wrote:

> Hi Toni,
>
> 1. Kafka can create topics on the fly, in case you need it.
>
> https://kafka.apache.org/08/configuration.html
>
> auto.create.topics.enable (default: true): Enable auto creation of topic
> on the server. If this is set to true then attempts to produce, consume,
> or fetch metadata for a non-existent topic will automatically create it
> with the default replication factor and number of partitions.
>
> 2. About topic selection based on rules/dictionary, this must be solved on
> your side.
>
> You can use custom-code in your app or an event transport solution, like
> Flume.
>
> Flume 1.6 now includes a sink for Kafka, and it already supports dynamic
> topics (by using a preprocessor)
>
> https://github.com/thilinamb/flume-ng-kafka-sink
>
> - *topic* [optional]
>   - The topic in Kafka to which the messages will be published. If this
>     topic is mentioned, every message will be published to the same topic.
>     If dynamic topics are required, it's possible to use a preprocessor
>     instead of a static topic. It's mandatory that either of the parameters
>     *topic* or *preprocessor* is provided, because the topic cannot be null
>     when publishing to Kafka. If none of these parameters are provided, the
>     messages will be published to a default topic called
>     default-flume-topic.
>
> Regards.
>
> 2015-01-29 0:16 GMT+01:00 Lakshmanan Muthuraman <lakshma...@tokbox.com>:
>
> > Hi Toni,
> >
> > Couple of thoughts.
> >
> > 1. Kafka behaviour need not be changed at run time. Your producers which
> > push your MAC data into Kafka should know to which topic they should
> > write. Your producer can be Flume, Logstash or it can be your own custom
> > written Java producer.
> >
> > As long as your producers know which topic to write to, they can keep
> > creating new topics as new MAC data comes through your pipeline.
> >
> > On Wed, Jan 28, 2015 at 12:10 PM, Toni Cebrián <toni.cebr...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm starting to weigh different alternatives for data ingestion and
> > > I'd like to know whether Kafka meets the problem I have.
> > > Say we have a set of devices each with its own MAC and then we receive
> > > data in Kafka. There is a dictionary defined elsewhere that says each
> > > MAC to which topic must publish. So I have basically 2 questions:
> > > New MACs keep coming and the dictionary must be updated accordingly.
> > > How could I change this Kafka behaviour during runtime?
> > > A problem for the future. Say that dictionaries are so big that they
> > > don't fit in memory. Are there any patterns for bookkeeping internal
> > > data structures and how to route to them?
> > >
> > > T.
>
> --
>
> David Morales de Frías :: +34 607 010 411 :: @dmoralesdf
> <https://twitter.com/dmoralesdf>
>
> <http://www.stratio.com/>
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
> <https://twitter.com/StratioBD>*

--
Jeff Holoman
Systems Engineer
678-612-9519
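
P.S. In case it helps, here is roughly what the per-event step inside the interceptor boils down to. This is only a sketch and deliberately leaves out the Flume classes: `MacTopicRouter`, the `mac` header name, and the `unrouted` fallback topic are all made up for illustration, and the in-memory map is a stand-in for whatever store (HBase, Redis, Cassandra) actually holds the dictionary. It also assumes the downstream Kafka sink/channel picks the destination from a `topic` event header, so check that against your Flume version.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Flume: the lookup an interceptor would run per event.
class MacTopicRouter {
    // Assumed header names; adjust to whatever your source and sink actually use.
    static final String MAC_HEADER = "mac";
    static final String TOPIC_HEADER = "topic";

    // Stand-in for the external dictionary; keys are assumed to be lower-case MACs.
    private final Map<String, String> macToTopic;
    private final String defaultTopic;

    MacTopicRouter(Map<String, String> macToTopic, String defaultTopic) {
        this.macToTopic = macToTopic;
        this.defaultTopic = defaultTopic;
    }

    // Returns a copy of the event headers with the topic header filled in.
    // Unknown or missing MACs fall back to the default topic.
    Map<String, String> route(Map<String, String> headers) {
        Map<String, String> out = new HashMap<>(headers);
        String mac = headers.get(MAC_HEADER);
        String topic = (mac == null) ? null : macToTopic.get(mac.toLowerCase(Locale.ROOT));
        out.put(TOPIC_HEADER, topic != null ? topic : defaultTopic);
        return out;
    }
}
```

In a real interceptor you'd call something like `route(event.getHeaders())` from `intercept(Event)` and write the result back onto the event. Since the dictionary is read on every call, new MACs show up as soon as the store is updated, with no change on the Kafka side.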