Re: Migrating data from old brokers to new brokers question

2014-09-16 Thread Gwen Shapira
Since these tools are so useful, I wonder what it requires (from both Airbnb and Kafka) to merge this into the Kafka project. I think there are a couple of JIRAs regarding improved tool usability that this would resolve. On Mon, Sep 15, 2014 at 11:45 AM, Alexis Midon wrote: > distribution will be even based

Re: Non-blocking High-Level Consumer

2014-09-16 Thread Gwen Shapira
For Fluffka, I created a wrapping function: IterStatus timedHasNext() { try { long startTime = System.currentTimeMillis(); it.hasNext(); long endTime = System.currentTimeMillis(); return new IterStatus(true,endTime-startTime); } catch (ConsumerTimeoutException e)

Re: How to use kafka as flume source.

2014-09-19 Thread Gwen Shapira
Just to update (better late than never!): The Kafka source & sink for Flume were updated to the latest Kafka version and improved a bit (offsets are now committed after data is written to the Flume channel). If you build Flume from trunk, you'll get these. Gwen On Sun, Aug 3, 2014 at 10:31 AM, Andrew Ehr

Re: Read a specific number of messages using kafka

2014-09-25 Thread Gwen Shapira
Using high level consumer and assuming you already created an iterator: while (msgCount < maxMessages && it.hasNext()) { bytes = it.next().message(); eventList.add(bytes); } (See a complete example here: https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-kafka-source/src/main/jav

Re: programmatically get number of items in topic/partition

2014-10-01 Thread Gwen Shapira
Take a look at ConsumerOffsetChecker. It does just that: print the offset and lag for each consumer and partition. You can either use that class directly, or use it as a guideline for your implementation On Wed, Oct 1, 2014 at 2:10 AM, Shlomi Hazan wrote: > Hi, > How can I programmatically get t

Re: Reassigning Partition Failing

2014-10-06 Thread Gwen Shapira
Do we have a jira to support removal of dead brokers without having to start a new broker with the same id? I think its something we'll want to allow. On Thu, Oct 2, 2014 at 7:45 AM, Jun Rao wrote: > The reassign partition process only completes after the new replicas are > fully caught up and t

Re: Kafka AWS deployment + UI console

2014-10-07 Thread Gwen Shapira
I'm using Hue's ZooKeeper app: http://gethue.com/new-zookeeper-browser-app/ This UI looks very cute, but I didn't try it yet: https://github.com/claudemamo/kafka-web-console Gwen On Tue, Oct 7, 2014 at 5:08 PM, Shafaq wrote: > We are going to deploy Kafka in Production and also monitor it via c

Re: Producer connection timing out

2014-10-08 Thread Gwen Shapira
can you check that you can connect on port 9092 from producer to broker? (check with telnet or something similar) ping may succeed when a port is blocked. On Wed, Oct 8, 2014 at 9:40 AM, ravi singh wrote: > Even though I am able to ping to the broker machine from my producer > machine , the produ

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
If you use the high level consumer implementation, and register all consumers as part of the same group - they will load-balance automatically. When you add a consumer to the group, if there are enough partitions in the topic, some of the partitions will be assigned to the new consumer. When a con

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
harninder > > > On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira wrote: > >> If you use the high level consumer implementation, and register all >> consumers as part of the same group - they will load-balance >> automatically. >> >> When you add a consumer to

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
ddff9-0 Isn't Kafka the best thing ever? :) Gwen On Wed, Oct 8, 2014 at 11:23 AM, Gwen Shapira wrote: > yep. exactly. > > On Wed, Oct 8, 2014 at 11:07 AM, Sharninder wrote: >> Thanks Gwen. >> >> When you're saying that I can add consumers to the same g

Re: Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Gwen Shapira
The problem with Kafka is that we never know when a consumer is "truly" inactive. But - if you decide to define inactive as a consumer whose last offset is lower than anything available on the log (or perhaps lagging by over X messages?), it's fairly easy to write a script to detect and clean them di
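
The "lagging by over X messages" heuristic is simple enough to sketch in plain Java. This is a sketch under assumptions: the committed and log-end offsets have already been fetched (e.g. via ConsumerOffsetChecker or ZooKeeper), and the class and method names are illustrative, not Kafka APIs.

```java
// Sketch: flag consumer groups whose committed offset lags the log-end
// offset by more than maxLag. Offsets are assumed to be fetched elsewhere.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleGroupDetector {
    // Returns the names of groups whose lag exceeds maxLag.
    static List<String> staleGroups(Map<String, Long> committedOffsets,
                                    long logEndOffset, long maxLag) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : committedOffsets.entrySet()) {
            if (logEndOffset - e.getValue() > maxLag) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        committed.put("dead-group", 100L);   // far behind: candidate for cleanup
        committed.put("active-group", 990L); // close to the log end
        System.out.println(staleGroups(committed, 1000L, 500L)); // [dead-group]
    }
}
```

A cleanup script would then delete the ZooKeeper nodes under /consumers for each flagged group.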

Re: kafka java api (written in 100% clojure)

2014-10-13 Thread Gwen Shapira
Out of curiosity: did you choose Redis because ZooKeeper is not well supported in Clojure? Or were there other reasons? On Mon, Oct 13, 2014 at 2:04 PM, Gerrit Jansen van Vuuren wrote: > Hi Steven, > > Redis: > > I've had a discussion on redis today, and one architecture that does come > up is

Re: Achieving Consistency and Durability

2014-10-14 Thread Gwen Shapira
ack = 2 *will* throw an exception when there's only one node in ISR. The problem with ack=2 is that if you have 3 replicas and you got acks from 2 of them, the one replica which did not get the message can still be in ISR and get elected as leader, leading to loss of the message. If you specify
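
For reference, the combination usually recommended instead of ack=2 looks like this. This is a sketch, not taken from the thread: min.insync.replicas only became available in 0.8.2, and the exact property names depend on the producer version.

```properties
# Producer side (0.8.x sync producer): wait for all in-sync replicas
request.required.acks=-1
# Broker/topic side (0.8.2+): reject writes once ISR shrinks below 2
min.insync.replicas=2
```

With these two settings together, a write is acknowledged only after at least two replicas have it, and the broker refuses new writes rather than silently falling back to a single copy.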

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Gwen Shapira
Just note that this is not a universal solution. Many use-cases care about which partition you end up writing to since partitions are used to... well, partition logical entities such as customers and users. On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao wrote: > Kyle, > > What you wanted is not supp

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Gwen Shapira
I assume the messages themselves contain the timestamp? If you use Flume, you can configure a Kafka source to pull data from Kafka, use an interceptor to pull the date out of your message and place it in the event header and then the HDFS sink can write to a partition based on the timestamp. Gwen

Re: Topic doesn't exist exception

2014-10-17 Thread Gwen Shapira
If you have "auto.create.topics.enable" set to "true" (default), producing to a topic creates it. It's a bit tricky because the "send" that creates the topic can fail with "leader not found" or similar issue. Retrying a few times will eventually succeed as the topic gets created and the leader gets e
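
The "retrying a few times" part can be sketched as a generic retry-with-backoff loop. This is illustrative pure Java, not a Kafka API: RetriableAction stands in for the producer send that may fail with "leader not found" while the topic is being created.

```java
// Generic retry sketch for the "first send fails until the leader is
// elected" situation. Names (RetriableAction, withRetries) are illustrative.
public class SendWithRetry {
    interface RetriableAction { void run() throws Exception; }

    static void withRetries(RetriableAction action, int maxAttempts,
                            long backoffMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return;                              // succeeded
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e; // give up
                Thread.sleep(backoffMs);             // leader election in progress
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice (simulating "leader not found"), succeeds on the third try.
        withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("leader not found");
        }, 5, 10);
        System.out.println(calls[0]); // 3
    }
}
```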

Re: Topic doesn't exist exception

2014-10-17 Thread Gwen Shapira
0.8.1.1 producer is Sync by default, and you can set producer.type to async if needed. On Fri, Oct 17, 2014 at 2:57 PM, Mohit Anchlia wrote: > Thanks! How can I tell if I am using async producer? I thought all the > sends are async in nature > On Fri, Oct 17, 2014 at 11:44 AM, Gwe
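
The relevant 0.8.1.1 producer settings look like this (a sketch; the values shown are illustrative, not recommendations):

```properties
# sync is the default; flip to async for batched, fire-and-forget sends
producer.type=async
# async-only tuning knobs
queue.buffering.max.ms=5000
batch.num.messages=200
```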

Re: read N items from topic

2014-10-17 Thread Gwen Shapira
btw. I got a blog post where I show how I work around the blocking hasNext() thing. May be helpful: http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/ On Thu, Oct 16, 2014 at 12:52 PM, Neha Narkhede wrote: > Josh, > > The consumer's API doesn't allow you to specify

Re: Topic doesn't exist exception

2014-10-17 Thread Gwen Shapira
> > request.required.acks=0, >> > I thought this sets the producer to be async? >> > On Fri, Oct 17, 2014 at 11:59 AM, Gwen Shapira >> > wrote: >> >> 0.8.1.1 producer is Sync by default, and you can set producer.type to >> >> async if need

Re: Topic doesn't exist exception

2014-10-17 Thread Gwen Shapira
ives the message. And async means it just dispatches the message > without any gurantees that message is delivered. Did I get that part right? > On Fri, Oct 17, 2014 at 1:28 PM, Gwen Shapira wrote: > >> Sorry if I'm confusing you :) >> >> Kafka 0.8.1.1 has two

Re: Topic doesn't exist exception

2014-10-17 Thread Gwen Shapira
s but is there a > place that lists some important performance specific parameters? > On Fri, Oct 17, 2014 at 2:43 PM, Gwen Shapira wrote: > >> If I understand correctly (and I'll be happy if someone who knows more >> will jump in and correct me): >> >> The Sync/

Re: Achieving Consistency and Durability

2014-10-20 Thread Gwen Shapira
3-replica topic in a > 12-node Kafka cluster, there's a relatively high probability that losing 2 > nodes from this cluster will result in an inability to write to the cluster. > > On Tue, Oct 14, 2014 at 4:50 PM, Gwen Shapira wrote: > >> ack = 2 *will* throw an excepti

Re: frequent periods of ~1500 replicas not in sync

2014-10-21 Thread Gwen Shapira
Consumers always read from the leader replica, which is always in sync by definition. So you are good there. The concern would be if the leader crashes during this period. On Tue, Oct 21, 2014 at 2:56 PM, Neil Harkins wrote: > Hi. I've got a 5 node cluster running Kafka 0.8.1, > with 4697 parti

Re: Partition and Replica assignment for a Topic

2014-10-21 Thread Gwen Shapira
Anything missing in the output of: kafka-topics.sh --describe --zookeeper localhost:2181 ? On Tue, Oct 21, 2014 at 4:29 PM, Jonathan Creasy wrote: > I¹d like to be able to see a little more detail for a topic. > > What is the best way to get this information? > > Topic Partition Replica B

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Gwen Shapira
RAID-10? Interesting choice for a system where the data is already replicated between nodes. Is it to avoid the cost of large replication over the network? how large are these disks? On Wed, Oct 22, 2014 at 10:00 AM, Todd Palino wrote: > In fact there are many more than 4000 open files. Many of o

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Gwen Shapira
n > > > On Oct 22, 2014, at 11:01 AM, Gwen Shapira wrote: > >> RAID-10? >> Interesting choice for a system where the data is already replicated >> between nodes. Is it to avoid the cost of large replication over the >> network? how large are these disks? >>

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Gwen Shapira
While I agree with Mark that testing the end-to-end pipeline is critical, note that in terms of performance - whatever you write to hook-up Teradata to Kafka is unlikely to be as fast as Teradata connector for Sqoop (especially the newer one). Quite a lot of optimization by Teradata engineers went

Re: How many partition can one single machine handle in Kafka?

2014-10-24 Thread Gwen Shapira
Todd, Did you load-test using SSDs? Got numbers to share? On Fri, Oct 24, 2014 at 10:40 AM, Todd Palino wrote: > Hmm, I haven't read the design doc lately, but I'm surprised that there's > even a discussion of sequential disk access. I suppose for small subsets of > the writes you can write larg

Re: broker down,the cluster can't work normal

2014-10-28 Thread Gwen Shapira
note that --zookeeper is the location of the zookeeper server, not Kafka broker. Are you running zookeeper on both 192.168.100.91 and 192.168.100.92? Zookeeper is based on simple majority, therefore you can't run it with 2 nodes (well you can, but it will freeze if you lose one node), you need ei

Re: High Level Consumer and Close with Auto Commit On

2014-10-28 Thread Gwen Shapira
High level consumer commits before shutting down. If you'll look at ZookeeperConsumerConnector.scala (currently the only implementation of ConsumerConnector) you'll see shutdown() includes the following: if (config.autoCommitEnable) commitOffsets() Gwen On Tue, Oct 28, 201

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-10-31 Thread Gwen Shapira
The producer configuration should list the kafka brokers, not the zookeeper quorum. See here: http://kafka.apache.org/documentation.html#producerconfigs (and send my regards to Alex Gorbachev ;) Gwen On Fri, Oct 31, 2014 at 8:05 PM, Tomas Nunez wrote: > Hi > > I'm trying to upgrade a 0.7 kaf

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-10-31 Thread Gwen Shapira
This is part of Scala, so it should be in the scala-library-...jar On Fri, Oct 31, 2014 at 8:26 PM, Tomas Nunez wrote: > Well... I used strace and I found it was looking for some classes in a > wrong path. I fixed most of them, but there's one that isn't anywhere, > neither the new nor the old

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-10-31 Thread Gwen Shapira
I'm following > https://cwiki.apache.org/confluence/display/KAFKA/Migrating+from+0.7+to+0.8 > and I can't see there anything about downloading classes, and I don't find > much people with the same problem, which leads me to think that I'm doing > something wrong... >

Re: Spark Kafka Performance

2014-11-03 Thread Gwen Shapira
Not sure about the throughput, but: "I mean that the words counted in spark should grow up" - The spark word-count example doesn't accumulate. It gets an RDD every n seconds and counts the words in that RDD. So we don't expect the count to go up. On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Al

Re: Dynamically adding Kafka brokers

2014-11-03 Thread Gwen Shapira
+1 That's what we use to generate broker id in automatic deployments. This method makes troubleshooting easier (you know where each broker is running), and doesn't require keeping extra files around. On Mon, Nov 3, 2014 at 2:17 PM, Joe Stein wrote: > Most folks strip the IP and use that as the br
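
A sketch of the "strip the IP" convention: derive a stable broker.id from the host's IPv4 address. The scheme shown (last two octets) is illustrative; any deterministic mapping that is unique within the cluster works.

```java
// Derive broker.id deterministically from the host IP, so the id is
// stable across redeployments and you can tell where a broker runs.
public class BrokerIdFromIp {
    static int brokerId(String ipv4) {
        String[] octets = ipv4.split("\\.");
        // e.g. 10.0.12.34 -> 12 * 256 + 34 = 3106
        return Integer.parseInt(octets[2]) * 256 + Integer.parseInt(octets[3]);
    }

    public static void main(String[] args) {
        System.out.println(brokerId("10.0.12.34")); // 3106
    }
}
```

The deployment tooling would write the computed value into server.properties as broker.id before starting the broker.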

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
lientCnxn$AuthData.class > org/apache/zookeeper/ClientCnxn$EndOfStreamException.class > org/apache/zookeeper/ClientCnxn$EventThread.class > org/apache/zookeeper/ClientCnxn$Packet.class > org/apache/zookeeper/ClientCnxn$SendThread.class > org/apache/zookeeper/ClientCnxn$SessionExpiredEx

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
Regarding more information: Maybe ltrace? If I were you, I'd go to MigrationTool code and start adding LOG lines. because there aren't enough of those to troubleshoot. On Wed, Nov 5, 2014 at 6:13 PM, Gwen Shapira wrote: > org.apache.zookeeper.ClientCnxn is throwing the exceptio

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
Also, can you post your configs? Especially the "zookeeper.connect" one? On Wed, Nov 5, 2014 at 6:15 PM, Gwen Shapira wrote: > Regarding more information: > Maybe ltrace? > > If I were you, I'd go to MigrationTool code and start adding LOG lines. > because t

Re: Spark and Kafka

2014-11-06 Thread Gwen Shapira
What's the window size? If the window is around 10 seconds and you are sending data at very stable rate, this is expected. On Thu, Nov 6, 2014 at 9:32 AM, Eduardo Costa Alfaia wrote: > Hi Guys, > > I am doing some tests with Spark Streaming and Kafka, but I have seen > something strange, I hav

Re: No longer supporting Java 6, if? when?

2014-11-06 Thread Gwen Shapira
+1 for dropping Java 6 On Thu, Nov 6, 2014 at 9:31 AM, Steven Schlansker wrote: > Java 6 has been End of Life since Feb 2013. > Java 7 (and 8, but unfortunately that's too new still) has very compelling > features which can make development a lot easier. > > The sooner more projects drop Java 6

Re: No longer supporting Java 6, if? when?

2014-11-06 Thread Gwen Shapira
Java6 is supported on CDH4 but not CDH5. On Thu, Nov 6, 2014 at 9:54 AM, Koert Kuipers wrote: > when is java 6 dropped by the hadoop distros? > > i am still aware of many clusters that are java 6 only at the moment. > > > > On Thu, Nov 6, 2014 at 12:44 PM, Gwen Shapira

Re: powered by kafka

2014-11-08 Thread Gwen Shapira
Done! Thank you for using Kafka and letting us know :) On Sat, Nov 8, 2014 at 2:15 AM, vipul jhawar wrote: > Exponential @exponentialinc is using kafka in production to power the > events ingestion pipeline for real time analytics and log feed consumption. > > Please post on powered by kafka wi

Re: powered by kafka

2014-11-09 Thread Gwen Shapira
Updated. Thanks! On Sat, Nov 8, 2014 at 12:16 PM, Jimmy John wrote: > Livefyre (http://web.livefyre.com/) uses kafka for the real time > notifications, analytics pipeline and as the primary mechanism for general > pub/sub. > > thx... > jim > > On Sat, Nov 8, 2014

Re: Issues Running Kafka Producer Java example

2014-11-09 Thread Gwen Shapira
The producer code here looks fine. It may be an issue with the consumer, or how the consumer is used. If you are running the producer before starting a consumer, make sure you get all messages by setting auto.offset.reset=smallest (in the console consumer you can use --from-beginning) Also, you c

Re: powered by kafka

2014-11-09 Thread Gwen Shapira
I'm not Jay, but fixed it anyways ;) Gwen On Sun, Nov 9, 2014 at 10:34 AM, vipul jhawar wrote: > Hi Jay > > Thanks for posting the update. > > However, i checked the page history and the hyperlink is pointing to the > wrong domain. > Exponential refers to www.exponential.com. I sent the twitter

Re: Programmatic Kafka version detection/extraction?

2014-11-11 Thread Gwen Shapira
In Sqoop we do the following: Maven runs a shell script, passing the version as a parameter. The shell-script generates a small java class, which is then built with a Maven plugin. Our code references this generated class when we expose "getVersion()". It's complex and ugly, so I'm kind of hoping
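
The build-time generation described above boils down to a tiny class with the version baked in. This is a sketch; class name, method name, and version string are all illustrative stand-ins for whatever the build script actually emits.

```java
// Sketch of a build-generated version class: the build script substitutes
// the VERSION constant, and runtime code simply calls getVersion().
public class KafkaVersion {
    // In the real setup this constant is filled in by the build.
    public static final String VERSION = "0.8.2-SNAPSHOT";

    public static String getVersion() {
        return VERSION;
    }

    public static void main(String[] args) {
        System.out.println(getVersion());
    }
}
```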

Re: Programmatic Kafka version detection/extraction?

2014-11-11 Thread Gwen Shapira
T 2011 > version=10.0.1 > groupId=com.google.guava > artifactId=guava > > Thanks, > > Bhavesh > > On Tue, Nov 11, 2014 at 10:34 AM, Gwen Shapira > wrote: > > > In Sqoop we do the following: > > > > Maven runs a shell script, passing the version

Re: No longer supporting Java 6, if? when?

2014-11-11 Thread Gwen Shapira
Perhaps relevant: Hadoop is moving toward dropping Java6 in next release. https://issues.apache.org/jira/browse/HADOOP-10530 On Thu, Nov 6, 2014 at 11:03 AM, Jay Kreps wrote: > Yeah it is a little bit silly that people are still using Java 6. > > I guess this is a tradeoff--being more conserva

Re: Security in 0.8.2 beta

2014-11-11 Thread Gwen Shapira
Nope. Here's the JIRA where we are still actively working on security, targeting 0.9: https://issues.apache.org/jira/browse/KAFKA-1682 Gwen On Tue, Nov 11, 2014 at 7:37 PM, Kashyap Mhaisekar wrote: > Hi, > Is there a way to secure the topics created in Kafka 0.8.2 beta? The need > is to ensure

Re: Programmatic Kafka version detection/extraction?

2014-11-12 Thread Gwen Shapira
, Nov 12, 2014 at 9:09 AM, Mark Roberts wrote: > Just to be clear: this is going to be exposed via some Api the clients can > call at startup? > > > > On Nov 12, 2014, at 08:59, Guozhang Wang wrote: > > > > Sounds great, +1 on this. > > > >> On T

Re: Programmatic Kafka version detection/extraction?

2014-11-12 Thread Gwen Shapira
Actually, Jun suggested exposing this via JMX. On Wed, Nov 12, 2014 at 9:31 AM, Gwen Shapira wrote: > Good question. > > The server will need to expose this in the protocol, so Kafka clients will > know what they are talking to. > > We may also want to expose this in the pro

Re: Programmatic Kafka version detection/extraction?

2014-11-14 Thread Gwen Shapira
every server in the cluster? Is there a reason >> not to include this in the API itself? >> >> -Mark >> >> On Wed, Nov 12, 2014 at 9:50 AM, Joel Koshy wrote: >> >> > +1 on the JMX + gradle properties. Is there any (seamless) way of >> > includ

Re: Create topic creates extra partitions

2014-11-21 Thread Gwen Shapira
I think the issue is that you are: " running the above snippet for every broker ... I am assuming that item.partitionsMetadata() only returns PartitionMetadata for the partitions this broker is responsible for " This is inaccurate. Each broker will check ZooKeeper for PartitionMetadata and return

Re: Two Kafka Question

2014-11-24 Thread Gwen Shapira
Hi Casey, 1. There's some limit based on size of zookeeper nodes, not sure exactly where it is though. We've seen 30 node clusters running in production. 2. For your scenario to work, the new broker will need to have the same broker id as the old one - or you'll need to manually re-assign partiti

Re: rule to set number of brokers for each server

2014-11-28 Thread Gwen Shapira
I don't see any advantage to more than one broker per server. In my experience a single broker is capable of saturating the network link and therefore I can't see how a second or third broker will give any benefits. Gwen On Fri, Nov 28, 2014 at 9:24 AM, Sa Li wrote: > Dear all > > I am provisio

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Gwen Shapira
Can you elaborate a bit on what an object API wrapper will look like? Since the serialization API already exists today, its very easy to know how I'll use the new producer with serialization - exactly the same way I use the existing one. If we are proposing a change that will require significant c

Re: Producer can writes to a follower during preferred lead election?

2014-12-07 Thread Gwen Shapira
If you write to a non-leader partition, I'd expect you'd get NotLeaderForPartitionException (thrown by Partition.appendMessagesToLeader). This will get sent to the producer as error code 6. I don't see anything special in the producer side to handle this specific (although I'd expect a forced meta

Re: Producer can writes to a follower during preferred lead election?

2014-12-08 Thread Gwen Shapira
I think that A will not be able to become a follower until B becomes a leader. On Sun, Dec 7, 2014 at 11:07 AM, Xiaoyu Wang wrote: > On preferred replica election, controller sends LeaderAndIsr requests to > brokers. Broker will handle the LeaderAndIsr request by either become a > leader or becom

Re: leaderless topicparts after single node failure: how to repair?

2014-12-10 Thread Gwen Shapira
It looks like none of your replicas are in-sync. Did you enable unclean leader election? This will allow one of the un-synced replicas to become leader, leading to data loss but maintaining availability of the topic. Gwen On Tue, Dec 9, 2014 at 8:43 AM, Neil Harkins wrote: > Hi. We've suffered

Re: OutOfMemoryException when starting replacement node.

2014-12-10 Thread Gwen Shapira
There is a parameter called replica.fetch.max.bytes that controls the size of the message buffer a broker will attempt to fetch at once. It defaults to 1MB, and has to be at least message.max.bytes (so at least one message can be sent). If you try to support really large messages and increase t
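
A back-of-the-envelope calculation for the OOM scenario: the replica fetcher can end up buffering roughly partitions × replica.fetch.max.bytes. The numbers below are illustrative (1000 partitions, the 10MB fetch size mentioned later in the thread).

```java
// Worst-case replica-fetcher memory: one fetch-sized buffer per partition.
public class ReplicaFetchMemory {
    static long worstCaseBytes(int partitions, long fetchMaxBytes) {
        return partitions * fetchMaxBytes;
    }

    public static void main(String[] args) {
        long bytes = worstCaseBytes(1000, 10L * 1024 * 1024); // 1000 partitions, 10MB each
        System.out.println(bytes / (1024 * 1024) + " MB");    // 10000 MB -- over an 8GB heap
    }
}
```

This is why a broker with many partitions and a large replica.fetch.max.bytes can OOM even on a generous heap.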

Re: OutOfMemoryException when starting replacement node.

2014-12-10 Thread Gwen Shapira
to 10MB to > allow larger messages, so perhaps that's related. But should that really be > big enough to cause OOMs on an 8GB heap? Are there other broker settings we > can tune to avoid this issue? > > On Wed, Dec 10, 2014 at 11:05 AM, Gwen Shapira > wrote: > &

Re: OutOfMemoryException when starting replacement node.

2014-12-10 Thread Gwen Shapira
Ah, found where we actually size the request as partitions * fetch size. Thanks for the correction, Jay and sorry for the mix-up, Solon. On Wed, Dec 10, 2014 at 10:41 AM, Jay Kreps wrote: > Hey Solon, > > The 10MB size is per-partition. The rationale for this is that the fetch > size per-partiti

Re: OutOfMemoryException when starting replacement node.

2014-12-11 Thread Gwen Shapira
didn't realize we needed to > take the (partitions * fetch size) calculation into account when choosing > partition counts for our topics, so this is a bit of a rude surprise. > > On Wed, Dec 10, 2014 at 3:50 PM, Gwen Shapira wrote: > >> Ah, found where we actually size the re

Re: Kafka design pattern question - multiple user ids

2014-12-15 Thread Gwen Shapira
When you send messages to Kafka you send a key-value pair. The key can include the user id. Here's how: KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, user_id, event); producer.send(data); Hope this helps, Gwen On Mon, Dec 15, 2014 at 10:29 AM, Harold Nguyen wrote: > Hello Kafka Experts! > >

Re: Kafka design pattern question - multiple user ids

2014-12-15 Thread Gwen Shapira
many different keys can Kafka > support ? > > Harold > > On Mon, Dec 15, 2014 at 10:46 AM, Gwen Shapira > wrote: >> >> When you send messages to Kafka you send a pair. The key >> can include the user id. >> >> Here's how: >

Re: Number of Consumers Connected

2014-12-15 Thread Gwen Shapira
Currently you can find the number of consumer groups through ZooKeeper: connect to ZK and run ls /consumers and count the number of results On Mon, Dec 15, 2014 at 11:34 AM, nitin sharma wrote: > Hi Team, > > Is it possible to know how many Consumer Group connected to kafka broker Ids > and as

Re: Number of Consumers Connected

2014-12-15 Thread Gwen Shapira
connect to a zookeeper..? > > Regards, > Nitin Kumar Sharma. > > > On Mon, Dec 15, 2014 at 6:36 PM, Neha Narkhede wrote: >> >> In addition to Gwen's suggestion, we actually don't have jmx metrics that >> give you a list of actively consuming processes. >

Re: consumer groups

2014-12-16 Thread Gwen Shapira
" If all the consumers stop listening how long will Kafka continue to store messages for that group?" Kafka retains data for a set amount of time, regardless of whether anyone is listening or not. This amount of time is configurable. Because Kafka performance is generally constant with the amount of
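
The retention knobs being referred to (broker-level defaults; the values shown are illustrative):

```properties
# keep data for 7 days, whether or not anyone consumes it
log.retention.hours=168
# optionally also cap the size retained per partition
log.retention.bytes=1073741824
```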

Re: consumer groups

2014-12-17 Thread Gwen Shapira
2:33 PM, Greg Lloyd wrote: > Thanks for the reply, > > So if I wanted to add a new group of consumers 6 months into the lifespan > of my implementation and I didn't want that new group to process all the > last six months is there a method to manage this? > > > >

Re: can't produce message in kafka production

2014-12-18 Thread Gwen Shapira
Looks like you can't connect to: 10.100.98.100:9092 I'd validate that this is the issue using telnet and then check the firewall / ipfilters settings. On Thu, Dec 18, 2014 at 2:21 PM, Sa Li wrote: > Dear all > > We just build a kafka production cluster, I can create topics in kafka > production

Re: can't produce message in kafka production

2014-12-18 Thread Gwen Shapira
8:9092. > > Just in case, is it possibly caused by other types of issues? > > thanks > > Alec > > On Thu, Dec 18, 2014 at 2:33 PM, Gwen Shapira wrote: >> >> Looks like you can't connect to: 10.100.98.100:9092 >> >> I'd validate that this i

Re: Uneven disk usage in Kafka 0.8.1.1

2014-12-25 Thread Gwen Shapira
Hi, LogManager.nextLogDir() has the logic for choosing which directory to use. The documentation of the method says: /** * Choose the next directory in which to create a log. Currently this is done * by calculating the number of partitions in each directory and then choosing the * data

Re: Kafka 0.8.2 release - before Santa Claus?

2014-12-25 Thread Gwen Shapira
IMO: KAFKA-1790 - can be pushed out (or even marked as "won't fix") KAFKA-1782 - can be pushed out (not really a blocker) The rest look like actual blockers to me. Gwen On Tue, Dec 23, 2014 at 1:32 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > I see 16 open issues for 0.8.

Re: Kafka 0.8.2 release - before Santa Claus?

2014-12-26 Thread Gwen Shapira
Actually, KAFKA-1785 <https://issues.apache.org/jira/browse/KAFKA-1785> can also wait - since it is likely to be part of a larger patch. On Thu, Dec 25, 2014 at 10:39 AM, Gwen Shapira wrote: > IMO: > KAFKA-1790 - can be pushed out (or even marked as "won't fix") >

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-05 Thread Gwen Shapira
OffsetCommitRequest has two constructors now: For version 0: OffsetCommitRequest(String groupId, Map<TopicAndPartition, OffsetAndMetadata> offsetData) And version 1: OffsetCommitRequest(String groupId, int generationId, String consumerId, Map<TopicAndPartition, OffsetAndMetadata> offsetData) None of them seem to require timestamps... so I'm not sure where you see that

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-05 Thread Gwen Shapira
Ah, I see :) The readFrom function basically tries to read two extra fields if you are on version 1: if (versionId == 1) { groupGenerationId = buffer.getInt consumerId = readShortString(buffer) } The rest looks identical in version 0 and 1, and still no timestamp in sight... Gwe

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-05 Thread Gwen Shapira
t; > > Dana Powers > Rdio, Inc. > dana.pow...@rd.io > rdio.com/people/dpkp/ > > On Mon, Jan 5, 2015 at 9:49 AM, Gwen Shapira wrote: > >> Ah, I see :) >> >> The readFrom function basically tries to read two extra fields if you >> are on version 1: >

Re: Current vote - 0.8.2.0-RC1 or 0.8.2.0?

2015-01-14 Thread Gwen Shapira
The Apache process is that you vote for an RC, and if the vote passes (i.e. three +1 from PMC and no -1) the same artifacts will be released (without RC). If issues are discovered, there may be another RC. Note that the RC is published on Jun's directory, not an official Kafka repository. You can

Re: Delete topic

2015-01-14 Thread Gwen Shapira
At the moment, the best way would be: * Wait about two weeks * Upgrade to 0.8.2 * Use kafka-topics.sh --delete :) 2015-01-14 9:26 GMT-08:00 Armando Martinez Briones : > Hi. > > What is the best way to delete a topic into production environment? > > -- > [image: Tralix][image: 1]José Armando Martí

Re: Delete topic

2015-01-14 Thread Gwen Shapira
From: Armando Martinez Briones > To: users@kafka.apache.org > Sent: Wednesday, January 14, 2015 11:33 AM > Subject: Re: Delete topic > > thanks Gwen Shapira ;) > > El 14 de enero de 2015, 11:31, Gwen Shapira > escribió: > >> At the moment, the best way would be

Re: "java.io.IOException: Too many open files" error

2015-01-15 Thread Gwen Shapira
You may find this article useful for troubleshooting and modifying TIME_WAIT: http://www.linuxbrigade.com/reduce-time_wait-socket-connections/ The line you have for increasing the file limit is fine, but you may also need to increase the limit system wide: insert "fs.file-max = 10" in /etc/sysctl.conf

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-15 Thread Gwen Shapira
It would make sense to enable it after we have an authorization feature and admins can control who can delete what. On Thu, Jan 15, 2015 at 6:32 PM, Jun Rao wrote: > Yes, I agree it's probably better not to enable "delete.topic.enable" by > default. > > Thanks, > > Jun > > On Thu, Jan 15, 2015 at 6:29

Re: kafka brokers going down within 24 hrs

2015-01-16 Thread Gwen Shapira
Those errors are expected - if broker 10.0.0.11 went down, it will reset the connection and the other broker will close the socket. However, it looks like 10.0.0.11 crashes every two minutes? Do you have the logs from 10.0.0.11? On Thu, Jan 15, 2015 at 9:51 PM, Tousif wrote: > i'm using kafka 2.

Re: Kafka Out of Memory error

2015-01-19 Thread Gwen Shapira
Two things: 1. The OOM happened on the consumer, right? So the memory that matters is the RAM on the consumer machine, not on the Kafka cluster nodes. 2. If the consumers belong to the same consumer group, each will consume a subset of the partitions and will only need to allocate memory for those
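
Point 2 can be turned into a sizing sketch: each consumer in a group buffers roughly (its share of the partitions) × fetch.message.max.bytes × queued.max.message.chunks. This is illustrative arithmetic, not a Kafka API, and the values below are made up for the example.

```java
// Rough per-consumer memory for the 0.8 high-level consumer.
public class ConsumerMemory {
    static long perConsumerBytes(int totalPartitions, int consumersInGroup,
                                 long fetchMessageMaxBytes, int queuedChunks) {
        // each consumer owns roughly an even share of the partitions
        int myPartitions = (int) Math.ceil((double) totalPartitions / consumersInGroup);
        return (long) myPartitions * fetchMessageMaxBytes * queuedChunks;
    }

    public static void main(String[] args) {
        // 100 partitions split across 4 consumers, 1MB fetch, 2 queued chunks
        System.out.println(perConsumerBytes(100, 4, 1024 * 1024, 2)); // 52428800
    }
}
```

Adding consumers to the group shrinks each consumer's partition share, and with it the memory each one needs.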

Re: Backups

2015-01-19 Thread Gwen Shapira
Hi, As a former DBA, I hear you on backups :) Technically, you could copy all log.dir files somewhere safe occasionally. I'm pretty sure we don't guarantee the consistency or safety of this copy. You could find yourself with a corrupt "backup" by copying files that are either in the middle of get

Re: Backups

2015-01-20 Thread Gwen Shapira
ld one use ZFS or BTRFS snapshot functionality for this? > > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > > > On Tue, Jan 20, 2015 at 1:39 AM, Gwen Shapira wrote: &g

Re: Backups

2015-01-20 Thread Gwen Shapira
the console-consumer every once in a while. Although note that the "full" > is constrained by the retention period of the data (controlled at the > queue/cluster level). > From: Gwen Shapira > To: "users@kafka.apache.org" > Sent: Tuesday, January 20, 20

Re: Help: Kafka LeaderNotAvailableException

2015-01-22 Thread Gwen Shapira
It sounds like you have two zookeepers, one for HDP and one for Kafka. Did you move Kafka from one zookeeper to another? Perhaps Kafka finds the topics (logs) on disk, but they do not exist in ZK because you are using a different zookeeper now. Gwen On Thu, Jan 22, 2015 at 6:38 PM, Jun Rao wrot

Re: Can't create a topic; can't delete it either

2015-01-27 Thread Gwen Shapira
Also, do you have delete.topic.enable=true on all brokers? Automatic topic creation can fail if the default number of replicas is greater than the number of available brokers. Check the default.replication.factor parameter. Gwen On Tue, Jan 27, 2015 at 12:29 AM, Joel Koshy wrote: > Which versio
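The two broker settings mentioned above would look roughly like this in `server.properties`; the values here are illustrative, not from the thread:

```properties
# Allow topic deletion (must be set on every broker for delete to work)
delete.topic.enable=true

# Default replica count for auto-created topics;
# must not exceed the number of live brokers
default.replication.factor=3
```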

Re: Resilient Producer

2015-01-28 Thread Gwen Shapira
It sounds like you are describing Flume, with SpoolingDirectory source (or exec source running tail) and Kafka channel. On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and >
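A Flume agent along the lines described (spooling-directory source feeding a Kafka channel) might be configured roughly as below. All names, paths, brokers, and topic are hypothetical, and the channel property names are those used by the Flume-trunk Kafka channel at the time; check the Flume docs for your build:

```properties
# Hypothetical Flume agent: pick up spooled log files, land them in Kafka
a1.sources = src1
a1.channels = kchan

a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/log/myapp/spool
a1.sources.src1.channels = kchan

a1.channels.kchan.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kchan.brokerList = broker1:9092
a1.channels.kchan.topic = app-logs
a1.channels.kchan.zookeeperConnect = zk1:2181
```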

Re: create topic does not really executed successfully

2015-02-02 Thread Gwen Shapira
IIRC, the directory is only created after you send data to the topic. Do you get errors when your producer sends data? Another common issue is that you specify replication-factor 3 when you have fewer than 3 brokers. Gwen On Mon, Feb 2, 2015 at 2:34 AM, Xinyi Su wrote: > Hi, > > I am using Kaf

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
If you want to emulate the old sync producer behavior, you need to set the batch size to 1 (in producer config) and wait on the future you get from Send (i.e. future.get) I can't think of good reasons to do so, though. Gwen On Mon, Feb 2, 2015 at 11:08 AM, Otis Gospodnetic wrote: > Hi, > > Is
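The "wait on the future" pattern described above looks like the sketch below. A real `KafkaProducer` needs a running broker, so this self-contained version uses a plain `java.util.concurrent.Future` from an executor as a stand-in for the future returned by `send()`; with the 0.8.2 producer you would call `producer.send(record).get()` the same way, and `get()` throwing an exception is how a failed send surfaces.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncSendSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Stand-in for producer.send(record): an async call returning a Future.
        Future<Long> future = pool.submit(() -> {
            // ...the network send would happen here; return an "offset"
            return 42L;
        });

        // Blocking on get() is what makes the send effectively synchronous:
        // it does not return until the (simulated) send has completed.
        long offset = future.get();
        System.out.println("acked at offset " + offset);

        pool.shutdown();
    }
}
```

With the real producer, combining `batch.size=1` with an immediate `get()` after every `send()` gives one request per record, at a significant throughput cost compared to letting the producer batch.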

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
gress into a single >> request--giving a kind of "group commit" effect. >> >> The hope is that this will be both simpler to understand (a single api that >> always works the same) and more powerful (you always get a response with >> error and offset informatio

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
he Producer was > using SYNC mode?" is YES, in which case the connection from X to Y would be > open for just as long as with a SYNC producer running in Y? > > Thanks, > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elast

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
whole batch. This significantly complicates recovery logic > where we need to commit a batch as opposed 1 record at a time. > > Do you guys have any plans to add better semantics around batches? > > On Mon, Feb 2, 2015 at 1:34 PM, Gwen Shapira wrote: > >> If I understood the

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-03 Thread Gwen Shapira
When's the party? :) On Mon, Feb 2, 2015 at 8:13 PM, Jay Kreps wrote: > Yay! > > -Jay > > On Mon, Feb 2, 2015 at 2:23 PM, Neha Narkhede wrote: >> >> Great! Thanks Jun for helping with the release and everyone involved for >> your contributions. >> >> On Mon, Feb 2, 2015 at 1:32 PM, Joe Stein wr

Re: New Producer - ONLY sync mode?

2015-02-04 Thread Gwen Shapira
ription you are saying you actually > > > care > > > > how many physical requests are issued. I think it is more like it is > > just > > > > syntactically annoying to send a batch of data now because it needs a > > for > > > > loop. > >

Re: question about new consumer offset management in 0.8.2

2015-02-05 Thread Gwen Shapira
Thanks Jon. I updated the FAQ with your procedure: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdowemigratetocommittingoffsetstoKafka(ratherthanZookeeper)in0.8.2 ? On Thu, Feb 5, 2015 at 9:16 AM, Jon Bringhurst < jbringhu...@linkedin.com.invalid> wrote: > There should probably be
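The migration procedure referenced in that FAQ entry is driven by two 0.8.2 consumer configs, `offsets.storage` and `dual.commit.enabled`. A sketch of the consumer properties involved (see the linked FAQ for the authoritative rolling-bounce sequence):

```properties
# Phase 1: roll consumers with dual commit on, so offsets go to
# both ZooKeeper and Kafka while the group migrates
offsets.storage=kafka
dual.commit.enabled=true

# Phase 2 (a later rolling bounce, once all consumers run phase 1):
# stop committing to ZooKeeper
# dual.commit.enabled=false
```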

Re: Kafka Architecture diagram

2015-02-05 Thread Gwen Shapira
The Kafka documentation has several good diagrams. Did you check it out? http://kafka.apache.org/documentation.html On Thu, Feb 5, 2015 at 6:31 AM, Ankur Jain wrote: > Hi Team, > > I am looking out high and low level architecture diagram of Kafka with > Zookeeper, but haven't got any good one ,
