no brokers found when trying to rebalance
Using Kafka 0.8.1.1, the cluster had been healthy, with producers and consumers functioning well. After a restart of the cluster, it looks like consumers are locked out.

When I try to consume from a topic, I get this warning:

[2015-01-23 07:48:50,667] WARN [console-consumer-32626_kafka-node-1.abc.net-1421999330230-a3d2b9e1], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)

I don't see any errors in server.log on the Kafka nodes, and there are no firewalls between the hosts (brokers, consumers, and producers).

I can query the topic state:

$ /opt/cloudera/parcels/CLABS_KAFKA/bin/kafka-topics --zookeeper zookeeper-node-1:2181/kafka --describe --topic rawunstruct
Topic:rawunstruct  PartitionCount:5  ReplicationFactor:3  Configs:
  Topic: rawunstruct  Partition: 0  Leader: 328  Replicas: 328,327,329  Isr: 328,327
  Topic: rawunstruct  Partition: 1  Leader: 328  Replicas: 329,328,327  Isr: 328,327
  Topic: rawunstruct  Partition: 2  Leader: 328  Replicas: 327,329,328  Isr: 328,327
  Topic: rawunstruct  Partition: 3  Leader: 328  Replicas: 328,329,327  Isr: 328,327
  Topic: rawunstruct  Partition: 4  Leader: 328  Replicas: 329,327,328  Isr: 328,327

However, when I list /kafka/brokers/ids on any of the ZooKeeper servers, I don't see any broker ids (see the P.S. below).

I did upgrade from CDH 5.2 to 5.3, but other components in the stack seem to be able to talk to ZooKeeper just fine.

Any pointers for troubleshooting?

Thanks
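P.S. For reference, this is the check that comes back empty (using the zookeeper-shell wrapper that ships with Kafka; the exact script name in the Cloudera parcel may differ):

  $ /opt/cloudera/parcels/CLABS_KAFKA/bin/zookeeper-shell zookeeper-node-1:2181 ls /kafka/brokers/ids
  []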
Re: no brokers found when trying to rebalance
Yes, it turns out that during the upgrade from CDH 5.2 to CDH 5.3, the Kafka namespace on ZooKeeper changed from "/kafka" to simply "/". I spoke with the developers over at Cloudera, and they said using "/" seemed to be the convention. I am going to open a case with them to get the change clearly documented so other customers don't end up scratching their heads :) I think "/kafka" is just cleaner.

On Fri, Jan 23, 2015 at 9:34 AM, Jun Rao wrote:
> Is the broker configured with the correct ZK url and the right namespace?
>
> Thanks,
>
> Jun
>
> [...]
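For anyone who hits the same symptom: the chroot lives at the end of zookeeper.connect, and brokers and clients have to agree on it. Roughly (hosts here are from my setup; the exact config location depends on how CDH manages it):

  # server.properties on each broker; the chroot is the suffix after the hosts
  zookeeper.connect=zookeeper-node-1:2181,zookeeper-node-2:2181/kafka

  # clients must use the same chroot
  $ kafka-topics --zookeeper zookeeper-node-1:2181/kafka --describe --topic rawunstruct

If the brokers register under "/" while the clients look under "/kafka" (or vice versa), the clients see an empty /brokers/ids and you get exactly the "no brokers found when trying to rebalance" warning above.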
Data Structure abstractions over kafka
Hi,

In the big data ecosystem, I have started to use Kafka essentially as:
- an unordered list/array, and
- a cluster-wide pipe

I guess you could argue that any message bus product is a simple array/pipe, but Kafka's scale and model make things so easy :)

I am wondering if there are any abstractions on top of Kafka that would let me use it to store/organize other simple data structures, like a linked list. I have a use case for a massive linked list that can easily grow to tens of gigabytes and could use (1) redundancy and (2) multiple producers/consumers processing the list (implemented over Spark, Storm, etc.).

Any ideas? Maybe maintain a linked list of offsets in another store, like ZooKeeper or a NoSQL DB, while storing the messages on Kafka? (Toy sketch below.)

Thanks,

- Tim
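P.S. To make the "linked list of offsets" idea concrete, here is a toy sketch in plain Java. The in-memory ArrayList stands in for a single Kafka partition (offset == index); a real version would use a producer/consumer and keep the head offset in ZooKeeper or a NoSQL table. All names here are made up for illustration.

import java.util.ArrayList;
import java.util.List;

public class OffsetLinkedList {

    // One list node: the payload plus the offset of the previous node.
    static final class Node {
        final long prevOffset; // -1 marks the tail of the list
        final String payload;
        Node(long prevOffset, String payload) {
            this.prevOffset = prevOffset;
            this.payload = payload;
        }
    }

    private final List<Node> log = new ArrayList<>(); // stand-in for one partition
    private long headOffset = -1;                     // would live in ZK/NoSQL

    // Append a node that points at the current head; Kafka's append-only log
    // would hand back the new offset, which becomes the new head pointer.
    long append(String payload) {
        log.add(new Node(headOffset, payload));
        headOffset = log.size() - 1;
        return headOffset;
    }

    // Walk the list newest-to-oldest by chasing prevOffset pointers.
    void traverse() {
        for (long off = headOffset; off != -1; off = log.get((int) off).prevOffset) {
            System.out.println(off + ": " + log.get((int) off).payload);
        }
    }

    public static void main(String[] args) {
        OffsetLinkedList list = new OffsetLinkedList();
        list.append("a");
        list.append("b");
        list.append("c");
        list.traverse(); // prints 2: c, 1: b, 0: a
    }
}

Note that with multiple producers, the naive head pointer would need a compare-and-set in the external store, which is roughly where ZooKeeper would earn its keep.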
Re: Using Kafka as a persistent store
I have had a similar issue, where I wanted a single source of truth between search and HDFS.

First, if you zoom out a little: eventually you are going to have some compute engine(s) process the data. If you store it in a compute-neutral tier like Kafka, then you will need to pull the data out at runtime and stage it for the compute engine to use. So pick your poison: process at ingest and store multiple copies of the data (one per compute engine), OR store it in a neutral store and process at runtime. I am not saying one is better than the other, but that's how I see the trade-off, so depending on your use cases, YMMV.

What I do is:
- store raw data in Kafka
- use Spark Streaming to transform the data to JSON and post it back to Kafka
- hang multiple data stores off Kafka that ingest the JSON
- do no further transformations in the "consumer" stores, and store each copy as immutable events

So I do have multiple copies (one per compute tier), but they all look the same. Unless different compute engines natively start to use a common data storage format, I don't see how one could get away from storing multiple copies. Primarily, I see the Lucene-based products having their own format, the Hadoop ecosystem congregating around Parquet, and the NoSQL players having their own formats (one per product).

My 2 cents worth :)

On Mon, Jul 13, 2015 at 10:35 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
> Am I correct in assuming that Kafka will only retain a file handle for the
> last segment of the log? If the number of handles grows unbounded, then it
> would be an issue. But I plan on writing to this topic continuously anyway,
> so not separating data into cold and hot storage is the entire point.
>
> Daniel Schierbeck
>
>> On 13. jul. 2015, at 15.41, Scott Thibault <scott.thiba...@multiscalehn.com> wrote:
>>
>> We've tried to use Kafka not as a persistent store, but as a long-term
>> archival store. An outstanding issue we've had with that is that the
>> broker holds on to an open file handle on every file in the log! The other
>> issue we've had is that when you create a long-term archival log on shared
>> storage, you can't simply access that data from another cluster, b/c the
>> metadata is stored in ZooKeeper rather than in the log.
>>
>> --Scott Thibault
>>
>> On Mon, Jul 13, 2015 at 4:44 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
>>
>>> Would it be possible to document how to configure Kafka to never delete
>>> messages in a topic? It took a good while to figure this out, and I see it
>>> as an important use case for Kafka.
>>>
>>> On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
>>>
>>>>> On 10. jul. 2015, at 23.03, Jay Kreps wrote:
>>>>> If I recall correctly, setting log.retention.ms and log.retention.bytes
>>>>> to -1 disables both.
>>>>
>>>> Thanks!
>>>>
>>>>> On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
>>>>>
>>>>>> On 10. jul. 2015, at 15.16, Shayne S wrote:
>>>>>>
>>>>>> There are two ways you can configure your topics: log compaction, or
>>>>>> no cleaning at all. The choice depends on your use case. Are the records
>>>>>> uniquely identifiable, and will they receive updates? Then log compaction
>>>>>> is the way to go. If they are truly read-only, you can go without log
>>>>>> compaction.
>>>>>
>>>>> I'd rather be free to use the key for partitioning, and the records are
>>>>> immutable — they're event records — so disabling compaction altogether
>>>>> would be preferable. How is that accomplished?
>>>>>
>>>>>> We have a small process which consumes a topic and performs upserts to
>>>>>> our various database engines. It's easy to change how it all works and
>>>>>> simply consume the single source of truth again.
>>>>>>
>>>>>> I've written a bit about log compaction here:
>>>>>> http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
>>>>>>
>>>>>> On Fri, Jul 10, 2015 at 3:46 AM, Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:
>>>>>>
>>>>>>> I'd like to use Kafka as a persistent store – sort of as an alternative
>>>>>>> to HDFS. The idea is that I'd load the data into various other systems
>>>>>>> in order to solve specific needs such as full-text search, analytics,
>>>>>>> indexing by various attributes, etc. I'd like to keep a single source
>>>>>>> of truth, however.
>>>>>>>
>>>>>>> I'm struggling a bit to understand how I can configure a topic to retain
>>>>>>> messages indefinitely. I want to make sure that my data isn't deleted.
>>>>>>> Is there a guide to configuring Kafka like this?
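P.S. To put Jay's note from the thread above into concrete form, this is roughly what a "never delete" broker config looks like (double-check the property names against the docs for your Kafka version; topic-level overrides such as retention.ms may behave differently across releases):

  # server.properties
  log.retention.ms=-1       # disable time-based deletion
  log.retention.bytes=-1    # disable size-based deletion
  log.cleanup.policy=delete # plain delete policy, i.e. no compaction; with
                            # the limits above, nothing actually gets deleted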
Limits of REST interface to kafka
At the outset, this isn't about challenging the work that has been done, primarily by the folks at Confluent, on wrapping Kafka in a REST API. Clearly there is a use case for a REST service, and they rose to the challenge.

That said, I am trying to evaluate the limitations, if any, of a REST service around Kafka, specifically for a Spark Streaming app as a consumer. The reason I ask is that I have been asked to move my Spark Streaming app, which is a native Kafka consumer, to a REST service that provides security (auth/encryption). I am trying to figure out what requirements I should place on the REST service if I give up my direct access to the Kafka cluster.

With Spark Streaming, what I have learned is that you need to maintain a certain ratio between Spark executors and Kafka partitions to balance the processing load evenly across the app. The consumer group rebalancing in Kafka works really well for transitioning load off a dead Spark executor to an idle one. I am not sure how these features, and other features that I use almost without noticing in Kafka, translate to a REST service like the one kafka-rest provides.

More broadly, are there features in Kafka that inherently run into limitations of the stateless/REST model?

Thanks,
Tim
Re: Flume use case for Kafka & HDFS
Not out of the box, no. I don't think you can use an attribute of the posted JSON to specify the topic for Kafka or the folder for HDFS.

For dynamically creating topics in Kafka, you would have to write some kind of custom Kafka producer; the Kafka channel and sink in Flume require the topic to be defined in the Flume config. For the HDFS sink in Flume, you can set parameters with a custom interceptor, or maybe use morphlines/grok, and then pass them to the sink (rough sketch at the bottom of this mail).

On Sat, Sep 19, 2015 at 11:26 AM, Hemanth Abbina wrote:
> I'm new to Flume and thinking of using Flume in the below scenario.
>
> Our system receives events as HTTP POSTs, and we need to store them in
> Kafka (for processing) as well as HDFS (as a permanent store).
>
> Can we configure Flume as below?
>
> * Source: HTTP (expecting a JSON event as the HTTP body, with a dynamic
>   topic name in the URI)
> * Channel: Kafka (should store the received JSON body to the topic
>   mentioned in the URI)
> * Sink: HDFS (should store the data in a folder mentioned in the URI)
>
> For example, if I receive a JSON event from an HTTP source with the below
> attributes:
>
> * URL: https://xx.xx.xx.xx/event/abc
> * Body of POST: { name: xyz, value=123}
>
> The event should be saved to the Kafka channel with topic 'abc' and written
> to HDFS in a folder 'abc'. This 'abc' is dynamic and changes from event
> to event.
>
> Is this possible with Flume?
>
> Thanks in advance
> Hemanth

--
Thanks,
Tim
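P.S. Here is the rough flume.conf sketch I mentioned, for the fixed-topic variant (property names are from the Flume 1.6-era docs, so double-check them against your version; the agent name, host/port, topic name, and the 'topicDir' header are all placeholders):

  # HTTP source -> Kafka channel -> HDFS sink
  a1.sources  = r1
  a1.channels = c1
  a1.sinks    = k1

  # HTTP source: POSTed JSON is turned into Flume events by the default JSONHandler
  a1.sources.r1.type = http
  a1.sources.r1.port = 8080
  a1.sources.r1.channels = c1

  # Kafka channel: note the topic is fixed in the config, not per-event
  a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
  a1.channels.c1.brokerList = kafka-node-1:9092
  a1.channels.c1.zookeeperConnect = zookeeper-node-1:2181
  a1.channels.c1.topic = events

  # HDFS sink: the path *can* vary per event via a header, e.g. one set by a
  # custom interceptor that parses the request URI
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hdfs.path = hdfs://namenode/data/%{topicDir}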