no brokers found when trying to rebalance

2015-01-23 Thread Tim Smith
Using kafka 0.8.1.1, the cluster had been healthy, with producers and
consumers functioning well. After a restart of the cluster, it looks
like consumers are locked out.

When I try to consume from a topic, I get this warning:
[2015-01-23 07:48:50,667] WARN [console-consumer-32626_kafka-node-1.abc.net-1421999330230-a3d2b9e1], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)

I don't see any errors in server.log on the kafka nodes and there aren't
any firewalls between the hosts (brokers, consumers and producers).

I can query the topic state:
$ /opt/cloudera/parcels/CLABS_KAFKA/bin/kafka-topics --zookeeper \
    zookeeper-node-1:2181/kafka --describe --topic rawunstruct
Topic:rawunstruct  PartitionCount:5  ReplicationFactor:3  Configs:
  Topic: rawunstruct  Partition: 0  Leader: 328  Replicas: 328,327,329  Isr: 328,327
  Topic: rawunstruct  Partition: 1  Leader: 328  Replicas: 329,328,327  Isr: 328,327
  Topic: rawunstruct  Partition: 2  Leader: 328  Replicas: 327,329,328  Isr: 328,327
  Topic: rawunstruct  Partition: 3  Leader: 328  Replicas: 328,329,327  Isr: 328,327
  Topic: rawunstruct  Partition: 4  Leader: 328  Replicas: 329,327,328  Isr: 328,327

However, when I list /kafka/brokers/ids on any of the ZK servers, I don't
see any broker ids.
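
For the record, this is roughly how I am checking for registered brokers
(assuming the parcel ships the stock zookeeper-shell tool next to
kafka-topics; adjust the path for your install):

$ /opt/cloudera/parcels/CLABS_KAFKA/bin/zookeeper-shell zookeeper-node-1:2181
ls /kafka/brokers/ids
ls /brokers/ids

If ids such as [327, 328, 329] show up under one path but not the other,
the brokers and the clients disagree on the chroot.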

I did upgrade from CDH 5.2 to 5.3 but other components in the stack seem to
be able to talk to Zookeeper just fine.

Any pointers for troubleshooting?

Thanks


Re: no brokers found when trying to rebalance

2015-01-24 Thread Tim Smith
Yes, it turns out that during the upgrade from CDH 5.2 to CDH 5.3, the
namespace on ZK changed from "/kafka" to simply "/". I spoke with the
developers over at Cloudera and they said that using "/" seemed to be the
convention. I am going to open a case with them to clearly document the
change so other customers don't end up scratching their heads :)

I think "/kafka" is just cleaner.



On Fri, Jan 23, 2015 at 9:34 AM, Jun Rao  wrote:

> Is the broker configured with the correct ZK url and the right namespace?
>
> Thanks,
>
> Jun


Data Structure abstractions over kafka

2015-07-13 Thread Tim Smith
Hi,

In the big data ecosystem, I have started to use kafka, essentially, as:
- an unordered list/array, and
- a cluster-wide pipe

I guess you could argue that any message bus product is a simple array/pipe
but kafka's scale and model make things so easy :)

I am wondering if there are any abstractions on top of kafka that will let
me use it to store/organize other simple data structures, like a linked
list. I have a use case for a massive linked list that can easily grow to
tens of gigabytes and could easily use (1) redundancy and (2) multiple
producers/consumers working on processing the list (implemented over spark,
storm etc.).

Any ideas? Maybe maintain a linked list of offsets in another store like
ZooKeeper or a NoSQL DB while storing the messages on kafka?
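
For what it's worth, here is the kind of sketch I have in mind (not an
existing library; class, broker and topic names are made up). Offsets only
mean anything per partition, so this pins everything to partition 0 and
uses a plain in-memory map as a stand-in for the ZooKeeper/NoSQL index:

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Sketch: list nodes live in a kafka topic; "next" pointers are kafka
// offsets; only the head pointer and an id->offset index live elsewhere.
public class KafkaLinkedList {
    private final KafkaProducer<String, String> producer;
    private final String topic;
    // Stand-in for ZooKeeper / a NoSQL table: nodeId -> offset of that node.
    private final Map<String, Long> index = new HashMap<>();
    private String headId; // id of the most recently appended node

    public KafkaLinkedList(String brokers, String topic) {
        Properties p = new Properties();
        p.put("bootstrap.servers", brokers);
        p.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(p);
        this.topic = topic;
    }

    // Append a node whose payload carries the offset of the previous head,
    // so a reader can walk the list by seeking from offset to offset.
    // Everything goes to partition 0 so plain offsets work as pointers;
    // a real version would store (partition, offset) pairs instead.
    public void append(String nodeId, String payload)
            throws ExecutionException, InterruptedException {
        long next = (headId == null) ? -1L : index.get(headId);
        String value = next + "|" + payload; // "nextOffset|data"
        RecordMetadata md =
                producer.send(new ProducerRecord<>(topic, 0, nodeId, value)).get();
        index.put(nodeId, md.offset());
        headId = nodeId;
    }
}

Walking the list back would then just be fetching by offset with a simple
consumer, starting from the head pointer held in the external store.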

Thanks,

- Tim


Re: Using Kafka as a persistent store

2015-07-13 Thread Tim Smith
I have had a similar issue where I wanted a single source of truth between
Search and HDFS. First, if you zoom out a little, eventually you are going
to have some compute engine(s) process the data. If you store it in a
compute-neutral tier like kafka, then you will need to suck the data out at
runtime and stage it for the compute engine to use. So pick your poison:
process at ingest and store multiple copies of the data, one per compute
engine, OR store it in a neutral store and process at runtime. I am not
saying one is better than the other, but that's how I see the trade-off, so
depending on your use cases, YMMV.

What I do is:
- store raw data into kafka
- use spark streaming to transform the data to JSON and post it back to
kafka (sketched below)
- hang multiple data stores off kafka that ingest the JSON
- not do any other transformations in the "consumer" stores, and store each
copy as an immutable event

So I do have multiple copies (one per compute tier) but they all look the
same.
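
For the curious, the shape of that transform stage is roughly the loop
below. The real thing is a Spark Streaming job; this is a plain-Java
stand-in using the 0.8 high-level consumer plus the 0.8.2 producer, and the
host names, topic names and the inline JSON wrapping are made up for
illustration:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Stand-in for the Spark Streaming job: read raw events, emit JSON copies.
public class RawToJson {
    public static void main(String[] args) {
        // Old (0.8) high-level consumer, reading the raw topic via ZooKeeper.
        Properties cp = new Properties();
        cp.put("zookeeper.connect", "zookeeper-node-1:2181/kafka");
        cp.put("group.id", "raw-to-json");
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(cp));

        // New (0.8.2+) producer, writing the transformed JSON back to kafka.
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "kafka-node-1:9092");
        pp.put("key.serializer",
               "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer",
               "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(pp);

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(
                        Collections.singletonMap("raw-events", 1));
        ConsumerIterator<byte[], byte[]> it =
                streams.get("raw-events").get(0).iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            // The real transformation (parse, normalize, enrich) goes here.
            String json = "{\"raw\":\"" + new String(msg.message()) + "\"}";
            producer.send(new ProducerRecord<>("events-json", json));
        }
    }
}

The point is just that the "consumer" stores never see anything but the
JSON copy.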

Unless different compute engines natively start to use a common data
storage format, I don't see how one could get away from storing multiple
copies. Primarily, I see that Lucene-based products have their own format,
the Hadoop ecosystem seems to be congregating around Parquet, and the NoSQL
players have their own formats (one per product).

My 2 cents worth :)



On Mon, Jul 13, 2015 at 10:35 AM, Daniel Schierbeck <
daniel.schierb...@gmail.com> wrote:

> Am I correct in assuming that Kafka will only retain a file handle for the
> last segment of the log? If the number of handles grows unbounded, then it
> would be an issue. But I plan on writing to this topic continuously anyway,
> so not separating data into cold and hot storage is the entire point.
>
> Daniel Schierbeck
>
> On 13. jul. 2015, at 15.41, Scott Thibault <
> scott.thiba...@multiscalehn.com> wrote:
>
> > We've tried to use Kafka not as a persistent store, but as a long-term
> > archival store. An outstanding issue we've had with that is that the
> > broker holds on to an open file handle on every file in the log! The
> > other issue we've had is when you create a long-term archival log on
> > shared storage, you can't simply access that data from another cluster
> > b/c of meta data being stored in zookeeper rather than in the log.
> >
> > --Scott Thibault
> >
> > On Mon, Jul 13, 2015 at 4:44 AM, Daniel Schierbeck <
> > daniel.schierb...@gmail.com> wrote:
> >
> > > Would it be possible to document how to configure Kafka to never
> > > delete messages in a topic? It took a good while to figure this out,
> > > and I see it as an important use case for Kafka.
> > >
> > > On Sun, Jul 12, 2015 at 3:02 PM Daniel Schierbeck <
> > > daniel.schierb...@gmail.com> wrote:
> > >
> > > > On 10. jul. 2015, at 23.03, Jay Kreps  wrote:
> > > >
> > > > > If I recall correctly, setting log.retention.ms and
> > > > > log.retention.bytes to -1 disables both.
> > > >
> > > > Thanks!
> > > >
> > > > > On Fri, Jul 10, 2015 at 1:55 PM, Daniel Schierbeck <
> > > > > daniel.schierb...@gmail.com> wrote:
> > > > >
> > > > > > On 10. jul. 2015, at 15.16, Shayne S  wrote:
> > > > > >
> > > > > > > There are two ways you can configure your topics, log
> > > > > > > compaction and with no cleaning. The choice depends on your
> > > > > > > use case. Are the records uniquely identifiable and will they
> > > > > > > receive updates? Then log compaction is the way to go. If
> > > > > > > they are truly read only, you can go without log compaction.
> > > > > >
> > > > > > I'd rather be free to use the key for partitioning, and the
> > > > > > records are immutable — they're event records — so disabling
> > > > > > compaction altogether would be preferable. How is that
> > > > > > accomplished?
> > > > > >
> > > > > > > We have a small processes which consume a topic and perform
> > > > > > > upserts to our various database engines. It's easy to change
> > > > > > > how it all works and simply consume the single source of
> > > > > > > truth again.
> > > > > > >
> > > > > > > I've written a bit about log compaction here:
> > > > > > > http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
> > > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 3:46 AM, Daniel Schierbeck <
> > > > > > > daniel.schierb...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I'd like to use Kafka as a persistent store – sort of as an
> > > > > > > > alternative to HDFS. The idea is that I'd load the data
> > > > > > > > into various other systems in order to solve specific needs
> > > > > > > > such as full-text search, analytics, indexing by various
> > > > > > > > attributes, etc. I'd like to keep a single source of truth,
> > > > > > > > however.
> > > > > > > >
> > > > > > > > I'm struggling a bit to understand how I can configure a
> > > > > > > > topic to retain messages indefinitely. I want to make sure
> > > > > > > > that my data isn't deleted. Is there a guide to configuring
> > > > > > > > Kafka like this?

Limits of REST interface to kafka

2015-09-24 Thread Tim Smith
At the outset, this isn't about challenging the work that has been done,
primarily by folks @ Confluent, on wrapping kafka in a REST API. Clearly,
there is a use case for a REST service, and they rose to the challenge.

That said, I am trying to evaluate the limitations, if any, of a REST
service around kafka, specifically for a Spark Streaming app as a consumer.

The reason I ask is that I have been asked to move my Spark Streaming app,
which is a native kafka consumer, to a REST service that provides security
(auth/encryption). And I am trying to figure out what requirements I should
place on the REST service if I give up my direct access to the kafka
cluster.

With Spark Streaming, what I have learned is that you need to maintain a
certain ratio between Spark executors and kafka partitions to balance the
processing load evenly across the app. The consumer group re-balancing in
kafka works really well to transition load off a dead Spark executor to an
idle one. I am not sure how these and other features that I use almost
unknowingly in kafka translate to a REST service like the one kafka-rest
provides.

More broadly, are there features in kafka that inherently run into
limitations of the stateless/REST model?
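
For reference, the consume side of kafka-rest (going from my reading of the
v1 API docs; host, port, group and topic below are placeholders) is itself
stateful - you create a named consumer instance and then poll through the
URI it hands back:

$ curl -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
    --data '{"format": "json", "auto.offset.reset": "smallest"}' \
    http://rest-proxy:8082/consumers/my_group

# the response carries an instance_id and base_uri; polling goes through it
$ curl -H "Accept: application/vnd.kafka.json.v1+json" \
    http://rest-proxy:8082/consumers/my_group/instances/<instance_id>/topics/<topic>

So the group re-balancing presumably still happens inside the proxy; what
seems harder to control from the outside is which executor ends up mapped
to which partition.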

Thanks,

Tim


Re: Flume use case for Kafka & HDFS

2015-09-24 Thread Tim Smith
Not out of the box, no - I don't think you can use an attribute of the
posted JSON to specify the topic for kafka or the folder for HDFS.

For dynamically creating topics in kafka, you would have to write some kind
of custom kafka producer - the kafka channel or sink in flume requires the
kafka topic to be defined in the flume config. For the HDFS sink in flume,
you can set event headers with a custom interceptor (or maybe use
morphlines/grok) and then pass them to the hdfs sink.
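
That said, the HDFS half can come close: the hdfs sink can pick its folder
from an event header, so whatever stamps the topic name into a header (the
custom producer or an interceptor) gets you the dynamic path. A fragment
along these lines (agent/sink names and the "topic" header are made up;
the %{...} escape is the standard header substitution in hdfs.path):

# assumes something upstream sets a "topic" header on each event
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/events/%{topic}
agent.sinks.hdfs1.hdfs.fileType = DataStream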



On Sat, Sep 19, 2015 at 11:26 AM, Hemanth Abbina 
wrote:

> I'm new to Flume and thinking to use Flume in the below scenario.
>
> Our system receives events as HTTP POST, and we need to store them in
> Kafka(for processing) as well as HDFS(as permanent store).
>
> Can we configure Flume as below ?
>
> * Source:  HTTP (expecting JSON event as HTTP body, with a dynamic
> topic name in the URI)
>
> * Channel: KAFKA (should store the received JSON body, to a topic
> mentioned in the URI)
>
> * Sink:  HDFS (should store the data in a folder mentioned in the
> URI.
>
> For example, If I receive a JSON event from a HTTP source with the below
> attributes,
>
> * URL: https://xx.xx.xx.xx/event/abc
>
> * Body of POST:  { name: xyz, value=123}
>
>
> The event should be saved to Kafka channel - with topic 'abc' and written
> to HDFS to a folder as 'abc'.
> This 'abc' will be dynamic and change from event to event.
>
> Is this possible with Flume ?
>
> Thanks in advance
> Hemanth
>



--
Thanks,

Tim