Consumer seek on 0.9.0 API

2016-02-17 Thread Péricé Robin
Hi, I'm trying to use the new Consumer API with this example: https://github.com/apache/kafka/tree/trunk/examples/src/main/java/kafka/examples With a Producer I sent 1000 messages to my Kafka broker. I need to know whether it's possible, for example, to read messages from offset 500 to 1000. What I d

Re: Questions from new user

2016-02-17 Thread Alexis Midon
0. I don't understand how deleting log files quickly relates to the file/page cache. Consumer read patterns are the main factor here, as far as I know. The OS will eventually discard unused cached pages. I'm not an expert on page-cache policies, though, and will be happy to learn. 1. Have a look at per-topic

Re: Kafka as master data store

2016-02-17 Thread Damian Guy
Hi Ted - if the data is keyed you can use a key-compacted topic and essentially keep the data 'forever', i.e., you'll always have the latest version of the data for a given key. However, you'd still want to back up the data someplace else, just in case. On 16 February 2016 at 21:25, Ted Swerve wrote
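
To make the "latest version per key" idea concrete, here is a minimal producer sketch against a compacted topic. The topic name "users" and the records are illustrative; the one real detail is that the topic must be created with cleanup.policy=compact, which is the topic-level setting that enables compaction:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompactedTopicWriter {
        public static void main(String[] args) {
            // Assumes the "users" topic was created with cleanup.policy=compact,
            // so compaction eventually keeps only the newest record per key.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("users", "user-42", "{\"name\":\"Ted\"}"));
            // A later record with the same key supersedes the earlier one once
            // compaction runs; readers always converge on the latest value.
            producer.send(new ProducerRecord<>("users", "user-42", "{\"name\":\"Ted S.\"}"));
            producer.close();
        }
    }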

Re: Replication Factor and number of brokers

2016-02-17 Thread Alexis Midon
it will throw an exception: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminUtils.scala#L76-L78 On Tue, Feb 16, 2016 at 3:55 PM Alex Loddengaard wrote: > Hi Sean, you'll want equal or more brokers than your replication factor. > Meaning, if your replication factor

Re: Kafka as master data store

2016-02-17 Thread Daniel Schierbeck
I'm also very interested in using Kafka as a persistent, distributed commit log – essentially the write side of a distributed database, with the read side being an array of various query stores (Elasticsearch, Redis, whatever) and stream processing systems. The benefit of retaining data in Kafka i

Re: Compression - MessageSet size

2016-02-17 Thread Alexis Midon
- it would be interesting to see the actual ProduceRequests/Responses and FetchReq/Resp. - at this point I would dive into the broker source code and follow the fetch request handling. https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L418 On Tue, Feb 16

Re: Replication Factor and number of brokers

2016-02-17 Thread Ben Stopford
If you create a topic with more replicas than brokers, it should throw an error; but if you lose a broker afterwards, you'd have under-replicated partitions. B On Tuesday, 16 February 2016, Alex Loddengaard wrote: > Hi Sean, you'll want equal or more brokers than your replication factor. > Meaning, if your r
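
A small sketch of both behaviors, assuming the 0.9-era AdminUtils/ZkUtils API (the connection string, topic name, and counts are placeholders):

    import java.util.Properties;
    import kafka.admin.AdminUtils;
    import kafka.utils.ZkUtils;

    public class TopicCreationCheck {
        public static void main(String[] args) {
            ZkUtils zkUtils = ZkUtils.apply("localhost:2181", 30000, 30000, false);
            try {
                // On a 2-broker cluster, asking for replication factor 3 fails up
                // front with an AdminOperationException instead of creating the
                // topic. Losing a broker *after* creation, by contrast, leaves the
                // topic's partitions under-replicated rather than erroring.
                AdminUtils.createTopic(zkUtils, "my-topic", 4, 3, new Properties());
            } finally {
                zkUtils.close();
            }
        }
    }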

Re: 0.9.0.1 RC1

2016-02-17 Thread Christian Posta
BTW, what's the etiquette for votes (non-binding) for this community? welcomed? noise? happy to see the non-binding votes, I'd like to contribute, just don't want to pollute the vote call. thoughts? thanks! On Tue, Feb 16, 2016 at 10:56 PM, Jun Rao wrote: > Thanks everyone for voting. The result

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Alexis Midon
By "re-connect", I'm assuming that the ZK session is expired, not disconnected. For details see http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions In that case, the high level consumer is basically dead, and the application should create a new instance of it. On Mon, F

Kafka response ordering guarantees

2016-02-17 Thread Ivan Dyachkov
Hello all. I'm developing a Kafka client and have a question about Kafka server guarantees. A statement from https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Network confuses me a bit: "The server guarantees that on a single TCP con

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Christian Posta
Yep, assuming you haven't completely partitioned that client from the cluster, ZK should automatically try to connect/reconnect to other peers in the server list. Otherwise, it's as Alexis said -- your session would expire; you'd have to recreate the session once you have connectivity On Wed, Feb

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Joe San
) ~[na:1.8.0_60]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
20160217-20:12:44.960+0100 [sally-kafka-consumer

Resetting Kafka Offsets -- and What are offsets.... exactly?

2016-02-17 Thread John Bickerstaff
*Use Case: Disaster Recovery & Re-indexing SOLR* I'm using Kafka to hold messages from a service that prepares "documents" for SOLR. A second micro service (a consumer) requests these messages, does any final processing, and fires them into SOLR. The whole thing is (in part) designed to be used

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Christian Posta
> ketChannelImpl.java:717) ~[na:1.8.0_60]
> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-15

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Joe San
> > 8.0_60]
> > at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.ja

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

2016-02-17 Thread Christian Posta
The number is the log-ordered number of bytes. So really, the offset is kinda like the "number of bytes" to begin reading from. 0 means read the log from the beginning. The second message is at 0 + the size of the first message. So the message "ids" are really just the running sum of the previous messages' sizes. For exa

Re: Enable Kafka Consumer 0.8.2.1 Reconnect to Zookeeper

2016-02-17 Thread Christian Posta
> > > at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
> > > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-

Re: Consumer seek on 0.9.0 API

2016-02-17 Thread Alex Loddengaard
Hi Robin, I believe seek() needs to be called after the consumer gets its partition assignments. Try calling poll() before you call seek(), then poll() again and process the records from the latter poll(). There may be a better way to do this -- let's see if anyone else has a suggestion. Alex O
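
For reference, a minimal sketch of that poll-then-seek pattern with the 0.9 KafkaConsumer. The offsets 500 and 1000 come from the original question; the broker address, group id, and topic name are illustrative:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SeekExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "seek-demo");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"));

            // The first poll() joins the group and receives partition assignments;
            // seek() only takes effect on partitions the consumer actually owns.
            consumer.poll(0);
            for (TopicPartition tp : consumer.assignment()) {
                consumer.seek(tp, 500); // start reading at offset 500
            }

            // Later polls return records from offset 500 onward; stop at 1000.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                if (record.offset() >= 1000) break;
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            consumer.close();
        }
    }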

Re: Replication Factor and number of brokers

2016-02-17 Thread Alex Loddengaard
Thanks, Alexis and Ben! Alex On Wed, Feb 17, 2016 at 5:57 AM, Ben Stopford wrote: > If you create a topic with more replicas than brokers it should throw an > error but if you lose a broker you'd have under replicated partitions. > > B > > On Tuesday, 16 February 2016, Alex Loddengaard wrote:

Re: 0.9.0.1 RC1

2016-02-17 Thread Jun Rao
Christian, Similar to other Apache projects, a vote from a committer is considered binding. During the voting process, we encourage non-committers to vote as well. We will cancel the release even if a critical issue is reported by a non-committer. Thanks, Jun On Tue, Feb 16, 2016 at 11:05 PM,

Re: 0.9.0.1 RC1

2016-02-17 Thread Christian Posta
Awesome, glad to hear. Thanks Jun! On Wed, Feb 17, 2016 at 12:57 PM, Jun Rao wrote: > Christian, > > Similar to other Apache projects, a vote from a committer is considered > binding. During the voting process, we encourage non-committers to vote as > well. We will cancel the release even if a c

Re: 0.9.0.1 RC1

2016-02-17 Thread Gwen Shapira
Actually, for releases, committers are non-binding. PMC votes are the only binding ones for releases. On Wed, Feb 17, 2016 at 11:57 AM, Jun Rao wrote: > Christian, > > Similar to other Apache projects, a vote from a committer is considered > binding. During the voting process, we encourage non-c

Re: 0.9.0.1 RC1

2016-02-17 Thread Christian Posta
yep! http://www.apache.org/dev/release-publishing.html#voted On Wed, Feb 17, 2016 at 1:05 PM, Gwen Shapira wrote: > Actually, for releases, committers are non-binding. PMC votes are the only > binding ones for releases. > > On Wed, Feb 17, 2016 at 11:57 AM, Jun Rao wrote: > > > Christian, > > >

Kafka Connect HDFS connector

2016-02-17 Thread Venkatesh Rudraraju
Hi, I tried using the HDFS sink connector with Kafka Connect and it works as described: http://docs.confluent.io/2.0.0/connect/connect-hdfs/docs/index.html My scenario: I have plain JSON data in a Kafka topic. Can I still use the HDFS sink connector to read data from the Kafka topic and write to HDFS in
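
In case it helps: for plain JSON the usual Kafka Connect approach is the JsonConverter with schemas disabled. A sketch of the relevant settings, with topic, URL, and names as placeholders; whether the HDFS sink's default output format accepts schemaless records may depend on the connector version, so treat this as a starting point rather than a confirmed recipe:

    # Worker config: plain JSON with no embedded schema envelope
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=false
    value.converter.schemas.enable=false

    # Connector config (e.g. hdfs-sink.properties)
    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=my-json-topic
    hdfs.url=hdfs://namenode:8020
    flush.size=1000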

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

2016-02-17 Thread John Bickerstaff
Thank you, Christian -- I appreciate your taking the time to help me out on this. Here's what I found while continuing to dig into this. If I take 30024 and subtract the number of messages I know I have in Kafka (3500), I get 26524. If I reset thus: set /kafka/consumers/myGroupName/offsets/myTopi
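
For the archives, the reset described here would look roughly like this in a zkCli.sh session, assuming partition 0 and the /kafka chroot shown in the post (stop the consumer first, or it will write its old offset back):

    get /kafka/consumers/myGroupName/offsets/myTopicName/0
    set /kafka/consumers/myGroupName/offsets/myTopicName/0 26524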

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

2016-02-17 Thread John Bickerstaff
Hmmm... more info. So, inside /var/log/kafka-logs/myTopicName-0 I find two files: 00026524.index and 00026524.log. Interestingly, they both bear the number of the "lowest" offset returned by the command I mention above. If I "cat" the 00026524.log file, I get all my mes

Re: Kafka response ordering guarantees

2016-02-17 Thread Ben Stopford
So long as you set max.in.flight.requests.per.connection = 1, Kafka should provide strong ordering within a partition (so use the same key for messages that should retain their order). There is a bug currently raised against this feature, though, where an edge case can cause ordering i
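
A hedged sketch of the producer settings being discussed; the broker address, topic, and key are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Only one unacknowledged request per connection, so a retried batch
            // cannot jump ahead of an earlier in-flight batch.
            props.put("max.in.flight.requests.per.connection", "1");
            props.put("retries", "3");
            props.put("acks", "all");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Same key => same partition => ordering preserved for these records.
            producer.send(new ProducerRecord<>("my-topic", "order-key", "first"));
            producer.send(new ProducerRecord<>("my-topic", "order-key", "second"));
            producer.close();
        }
    }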

Re: Optimize the performance of inserting data to Cassandra with Kafka and Spark Streaming

2016-02-17 Thread yuanjia8...@163.com
Hi Jerry, 1. Make sure that the 1000 messages have been sent to Kafka before consuming. 2. If you don't care about the ordering between messages, you can use multiple partitions and more consumers. LiYuanJia From: Jerry Wong Date: 2016-02-17 05:33 To: users Subject: Optimize the performance
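
Illustrating point 2 outside of Spark, a minimal sketch with the 0.9 consumer API: several consumers in one group split a topic's partitions between them. The topic, group id, and counts are made up, and the Cassandra write is left as a comment:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ParallelConsumers {
        public static void main(String[] args) {
            int numConsumers = 4; // beyond the partition count, extra consumers sit idle
            ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
            for (int i = 0; i < numConsumers; i++) {
                pool.submit(() -> {
                    Properties props = new Properties();
                    props.put("bootstrap.servers", "localhost:9092");
                    props.put("group.id", "cassandra-writers"); // same group => partitions shared out
                    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                        consumer.subscribe(Collections.singletonList("events"));
                        while (true) {
                            ConsumerRecords<String, String> records = consumer.poll(1000);
                            for (ConsumerRecord<String, String> r : records) {
                                // write r.value() to Cassandra here
                            }
                        }
                    }
                });
            }
            // Sketch only: a real service would also handle shutdown of the pool.
        }
    }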

Re: Optimize the performance of inserting data to Cassandra with Kafka and Spark Streaming

2016-02-17 Thread Jerry Wong
Hi LiYuan, Thank you very much for your response. The problem is solved. A lot of messages were created at almost the same time (even at millisecond granularity). I changed the key to use "UUID.randomUUID()", so all messages can be inserted into the Cassandra table without key collisions. Re
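
For anyone hitting the same collision, the change amounts to generating a random key per message instead of a timestamp-derived one. A trivial sketch; the post doesn't show the surrounding producer/Cassandra code, so only the key generation is illustrated:

    import java.util.UUID;

    public class UniqueKeys {
        public static void main(String[] args) {
            // Two messages created in the same millisecond no longer collide,
            // because each key is random rather than timestamp-derived.
            String key1 = UUID.randomUUID().toString();
            String key2 = UUID.randomUUID().toString();
            System.out.println(key1.equals(key2)); // false
        }
    }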