Re: Get Latest Offset for Specific Topic for All Partition

2015-02-05 Thread Gwen Shapira
You can use the metrics Kafka publishes. I think the relevant metrics are: Log.LogEndOffset Log.LogStartOffset Log.size Gwen On Thu, Feb 5, 2015 at 11:54 AM, Bhavesh Mistry wrote: > HI All, > > I just need to get the latest offset # for topic (not for consumer group). > Which API to get this i

Re: Kafka broker core dump

2015-02-06 Thread Gwen Shapira
Hi Xinyi, The core dump never made it to the list. Perhaps it's too large for an attachment. Can you upload it somewhere and send a link? Also, the broker logs will be helpful, and a description of what the test did. Gwen On Thu, Feb 5, 2015 at 11:27 PM, Xinyi Su wrote: > Hi, > > I encounter a

Re: Doubts Kafka

2015-02-08 Thread Gwen Shapira
Hi Eduardo, 1. "Why sometimes the applications prefer to connect to zookeeper instead brokers?" I assume you are talking about the clients and some of our tools? These are parts of an older design and we are actively working on fixing this. The consumer used Zookeeper to store offsets, in 0.8.2 t

Re: High Latency in Kafka

2015-02-08 Thread Gwen Shapira
I'm wondering how much of the time is spent by Logstash reading and processing the log vs. time spent sending data to Kafka. Also, I'm not familiar with Logstash internals, perhaps it can be tuned to send the data to Kafka in larger batches? At the moment it's difficult to tell where is the slowdo

Re: Doubts Kafka

2015-02-08 Thread Gwen Shapira
storage. > On Sun, Feb 8, 2015 at 9:25 AM, Gwen Shapira > wrote: > > > Hi Eduardo, > > > > 1. "Why sometimes the applications prefer to connect to zookeeper instead > > brokers?" > > > > I assume you are talking about the clients and some o

Re: Apache Kafka 0.8.2 Consumer Example

2015-02-08 Thread Gwen Shapira
I have a 0.8.1 high level consumer working fine with 0.8.2 server. Few of them actually :) AFAIK the API did not change. Do you see any error messages? Do you have timeout configured on the consumer? What does the offset checker tool say? On Fri, Feb 6, 2015 at 4:49 PM, Ricardo Ferreira wrote:

Re: Apache Kafka 0.8.2 Consumer Example

2015-02-08 Thread Gwen Shapira
g.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /consumers/test-consumer-group/offsets/test/0. > > Perhaps the solution is downgrade the consumer libs to 0.8.1? > > Thanks, > > Ricardo > > On Sun, Feb 8, 2015 at 11:27 AM, Gwen Shapira

Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-08 Thread Gwen Shapira
Looks good to me. I like the idea of not blocking additional sends but not guaranteeing that flush() will deliver them. I assume that with linger.ms = 0, flush will just be a noop (since the queue will be empty). Is that correct? Gwen On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps wrote: > Follow

Re: Doubts Kafka

2015-02-08 Thread Gwen Shapira
e marker. > > I would almost like the client to have a callback mechanism for processing, > and the marker only gets moved if the high level consumer successfully > implements my callback/processor (with no exceptions, at least). > > > > On Sun, Feb 8, 2015 at 9:49 AM, Gw

Re: Apache Kafka 0.8.2 Consumer Example

2015-02-09 Thread Gwen Shapira
I know from the consumer, and the offset checker gave me nothing > about that group. > > Thanks, > > Ricardo > > On Sun, Feb 8, 2015 at 1:19 PM, Gwen Shapira > wrote: > >> Sorry, both the consumer and the broker are 0.8.2? >> >> So what's on 0.8.1?

Re: Is auto.commit.enable still applicable when setting offsets.storage to kafka

2015-02-09 Thread Gwen Shapira
Yep, still applicable. They will do the same thing (commit offset on regular intervals) only with Kafka instead of Zookeeper. On Mon, Feb 9, 2015 at 2:57 AM, tao xiao wrote: > Hi team, > > If I set offsets.storage=kafka can I still use auto.commit.enable to turn > off auto commit and auto.commi

Re: Handling multi-line messages?

2015-02-09 Thread Gwen Shapira
Since the console-consumer seems to display strings correctly, it sounds like an issue with LogStash parser. Perhaps you'll have better luck asking on LogStash mailing list? Kafka just stores the bytes you put in and gives the same bytes out when you read messages. There's no parsing or encoding d

Re: ping kafka server

2015-02-09 Thread Gwen Shapira
It's safe. Just note that if you send Kafka anything it does not like, it will close the connection on you. This is intentional and doesn't signal an issue with Kafka. Not sure if Nagios does this, but I like "canary" tests - produce a message with timestamp every X seconds and have a monitor tha
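The canary idea in the reply above can be sketched as follows. This is a hypothetical illustration of the monitoring side only, not real Kafka client code: `is_stale` stands in for whatever the monitor does after reading the newest canary message, and the 30-second threshold is an arbitrary choice.

```python
import time

# Hypothetical sketch of the "canary" check: the producer side writes a
# timestamped message every X seconds; the monitor reads the newest one and
# alerts if it is older than a threshold. Threshold is an arbitrary example.

def is_stale(last_canary_ts, now, threshold_seconds=30):
    """Return True if the pipeline looks stuck (newest canary is too old)."""
    return (now - last_canary_ts) > threshold_seconds

now = time.time()
print(is_stale(now - 10, now))  # False -- canary written 10s ago, healthy
print(is_stale(now - 60, now))  # True  -- canary written 60s ago, alert
```

The nice property, as the reply notes, is that this tests the whole produce-consume path end to end rather than just the TCP port.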

Re: Get Latest Offset for Specific Topic for All Partition

2015-02-10 Thread Gwen Shapira
l I can tell . > > 1189855393 LogEndOffset > 1165330350 Size > 1176813232 LogStartOffset > > Thanks for your help !! > > Thanks, > > Bhaevsh > > Thanks, > Bhavesh > > On Thu, Feb 5, 2015 at 12:55 PM, Gwen Shapira > wrote: > > > You can use the m

Re: Lack of JMX LogCleaner and LogCleanerManager metrics

2015-02-10 Thread Gwen Shapira
btw. the name LogCleaner is seriously misleading. It's more of a log compactor. Deleting old logs happens elsewhere from what I've seen. Gwen On Tue, Feb 10, 2015 at 8:07 AM, Jay Kreps wrote: > Probably you need to enable the log cleaner for those to show up? We > disable it by default and so I

Re: Ship Kafka in on prem product

2015-02-10 Thread Gwen Shapira
Thanks Joe :) We definitely found Kafka to be solid enough for enterprise. It does require taking care in shipping "safe" configurations. Whether it's secure enough depends on requirements - with no authentication, authorization, or encryption, it's definitely not secure enough for some use cases (

Re: Lack of JMX LogCleaner and LogCleanerManager metrics

2015-02-10 Thread Gwen Shapira
shy wrote: > > > +1 > > > > On Tue, Feb 10, 2015 at 01:32:13PM -0800, Jay Kreps wrote: > > > I agree that would be a better name. We could rename it if everyone > likes > > > Compactor better. > > > > > > -Jay > > > > > >

Re: [DISCUSS] KIP-12 - Kafka Sasl/Kerberos implementation

2015-02-11 Thread Gwen Shapira
Looks good. Thanks for working on this. One note, the Channel implementation from SSL only works on Java7 and up. Since we are still supporting Java 6, I'm working on a lighter wrapper that will be a composite on SocketChannel but will not extend it. Perhaps you'll want to use that. Looking forwa

Re: Producing message set

2015-02-11 Thread Gwen Shapira
Looking at the code, it does look like appending a message set to a partition will succeed or fail as a whole. You can look at Log.append() for the exact logic if you are interested. Gwen On Wed, Feb 11, 2015 at 6:13 AM, Piotr Husiatyński wrote: > Hi, > > I'm writing new client library for kaf

Re: [DISCUSS] KIP-12 - Kafka Sasl/Kerberos implementation

2015-02-12 Thread Gwen Shapira
. > > ~ Joe Stein > - - - - - - - - - - - - - - - - - > > http://www.stealth.ly > - - - - - - - - - - - - - - - - - > > On Thu, Feb 12, 2015 at 11:37 AM, Harsha wrote: > > > > > Thanks for the review Gwen. I'll keep in mind about java 6 support. > > -Harsha > > On Wed, Feb 11, 2015,

Re: Does RequestTimedOut Error Commit Locally?

2015-02-12 Thread Gwen Shapira
It would. The way we see things is that if retrying has a chance of success, we will retry. Duplicates are basically unavoidable, because the producer can't always know if the leader received the message or not. So we expect the consumers to de-duplicate messages. Gwen On Thu, Feb 12, 2015 at 7:

Re: Having trouble with the simplest remote kafka config

2015-02-17 Thread Gwen Shapira
Is it possible that you have iptables on the Ubuntu where you run your broker? Try disabling iptables and see if it fixes the issue. On Tue, Feb 17, 2015 at 8:47 PM, Richard Spillane wrote: > So I would like to have two machines: one running zookeeper and a single > kafka node and another machi

Re: Having trouble with the simplest remote kafka config

2015-02-17 Thread Gwen Shapira
ort is explicitly > EXPOSEd and other ports in a similar range (e.g., 8080) can be accessed > just fine. > > > On Feb 17, 2015, at 8:56 PM, Gwen Shapira wrote: > > > > Is it possible that you have iptables on the Ubuntu where you run your > > broker? > > > > T

Re: Having trouble with the simplest remote kafka config

2015-02-17 Thread Gwen Shapira
refused > telnet: Unable to connect to remote host > > > > On Feb 17, 2015, at 10:03 PM, Gwen Shapira > wrote: > > > > What happens when you telnet to port 9092? try it from both your mac and > > the ubuntu vm. > > > > > > On Tue, Feb 17, 2015 at 9

Re: New Producer - Is the configurable partitioner gone?

2015-02-20 Thread Gwen Shapira
Hi Daniel, I think you can still use the same logic you had in the custom partitioner in the old producer. You just move it to the client that creates the records. The reason you don't cache the result of partitionsFor is that the producer should handle the caching for you, so it's not necessarily

Re: Offset management implementation

2015-02-20 Thread Gwen Shapira
We store offsets in INT64, so you can go as high as: 9,223,372,036,854,775,807 messages per topic-partition before looping around :) Gwen On Fri, Feb 20, 2015 at 12:21 AM, Clement Dussieux | AT Internet < clement.dussi...@atinternet.com> wrote: > Hi, > > > I am using Kafka_2.9.2-0.8.2 and play a
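The figure quoted above is simply the maximum signed 64-bit integer; a quick back-of-the-envelope check shows why wrap-around is not a practical concern:

```python
# Offsets are signed 64-bit integers, so the per-partition ceiling is 2**63 - 1.
max_offset = 2**63 - 1
print(max_offset)  # 9223372036854775807

# Even at a sustained million messages per second to one partition,
# reaching that offset would take roughly 292,000 years.
years = max_offset / (1_000_000 * 60 * 60 * 24 * 365)
print(years)
```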

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-24 Thread Gwen Shapira
* ZK was not built for 5K/s writes type of load * Kafka 0.8.2.0 allows you to commit messages to Kafka rather than ZK. I believe this is recommended. * You can also commit batches of messages (i.e. commit every 100 messages). This will reduce the writes and give you at least once while controlling
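The "commit every 100 messages" suggestion can be sketched as below. This is illustrative pseudologic, not a real consumer API: `commit_offset` stands in for the high-level consumer's commitOffsets() call, and processing is elided. The at-least-once property comes from committing *after* processing, so a crash between commits re-delivers (and re-processes) the uncommitted tail.

```python
# Sketch of batched offset commits for at-least-once delivery (illustrative
# names; commit_offset() stands in for the consumer's commitOffsets() call).

COMMIT_INTERVAL = 100  # commit once per 100 processed messages

def consume(messages, commit_offset):
    uncommitted = 0
    for msg in messages:
        # process(msg) would go here; if we crash before the next commit,
        # these messages are re-delivered -- that is the at-least-once part.
        uncommitted += 1
        if uncommitted >= COMMIT_INTERVAL:
            commit_offset()
            uncommitted = 0
    if uncommitted:
        commit_offset()  # flush the tail

commits = []
consume(range(250), lambda: commits.append(1))
print(len(commits))  # 3 -- commits at message 100, 200, and the final 50
```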

Re: Stream naming conventions?

2015-02-24 Thread Gwen Shapira
Nice :) I like the idea of tying topic name to avro schemas. I have experience with other people's data, and until now I mostly recommended: ... So we end up with things like: etl.onlineshop.searches.validated Or if I have my own test dataset that I don't want to share: users.gshapira.newapp.tes

Re: what groupID does camus use?

2015-02-24 Thread Gwen Shapira
Camus uses the simple consumer, which doesn't have the concept of "consumer group" in the API (i.e. Camus is responsible for allocating threads to partitions on its own). The client-id is hard coded and is "hadoop-etl" in some places (when it initializes the offsets) and "camus" in other places. T

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-25 Thread Gwen Shapira
e to be able to improve throughput on the consumer, will need to play > with the number of partitions. Is there any recommendation on that ratio > partition/topic or that can be scaled up/out with powerful/more hardware? > > Thanks > Anand > > On Tue, Feb 24, 2015 at 8:11 PM, Gwen Shapira

Re: 0.8 scala API reference

2015-03-01 Thread Gwen Shapira
Actually, that's a good point. I don't think we are publishing our Scala / Java docs anywhere (well, Jun Rao has the RC artifacts in his personal apache account: https://people.apache.org/~junrao/) Any reason we are not posting those to our docs SVN and linking our website? Many Apache projects wit

Re: moving replications

2015-03-02 Thread Gwen Shapira
Take a look at the Reassign Partition Tool. It lets you specify which replica exists on which broker: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-6.ReassignPartitionsTool It's a bit tricky to use, so feel free to follow up with more questions :) Gwen On Mo

Re: Camus Issue about Output File EOF Issue

2015-03-02 Thread Gwen Shapira
Do you have the command you used to run Camus? and the config files? Also, I noticed your file is on maprfs - you may want to check with your vendor... I doubt Camus was extensively tested on that particular FS. On Mon, Mar 2, 2015 at 3:59 PM, Bhavesh Mistry wrote: > Hi Kafka User Team, > > I ha

Re: moving replications

2015-03-02 Thread Gwen Shapira
new broker and eventually move leader also to new broker (with out > incrementing replica count for that partition) > Please let me know if more information required. > > t > SunilKalva > > On Mon, Mar 2, 2015 at 10:43 PM, Gwen Shapira > wrote: > >> Take a look at

Re: Camus Issue about Output File EOF Issue

2015-03-02 Thread Gwen Shapira
"This file was not closed so try to close during the > JVM finalize.."); > > try{ > > writer.close(); > > }catch(Throwable th){ > > log.error("File Close erorr during finalize()"); > > } > > } > >

Re: Got negative offset lag after restarting brokers

2015-03-02 Thread Gwen Shapira
of course :) unclean.leader.election.enable On Mon, Mar 2, 2015 at 9:10 PM, tao xiao wrote: > How do I achieve point 3? is there a config that I can set? > > On Tue, Mar 3, 2015 at 1:02 PM, Jiangjie Qin > wrote: > >> The scenario you mentioned is equivalent to an unclean leader election. >> The

Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-03 Thread Gwen Shapira
Hi, Good catch, Joe. Releasing with a broken test is not a good habit. I provided a small patch that fixes the issue in KAFKA-1999. Gwen On Tue, Mar 3, 2015 at 9:08 AM, Joe Stein wrote: > Jun, I have most everything looks good except I keep getting test failures > from wget > https://people.a

Re: reassign a topic partition which has no ISR and leader set to -1

2015-03-03 Thread Gwen Shapira
I hate bringing bad news, but... You can't really reassign replicas if the leader is not available. Since the leader is gone, the replicas have nowhere to replicate the data from. Until you bring the leader back (or one of the replicas with unclean leader election), you basically lost this parti

Re: Set up kafka cluster

2015-03-05 Thread Gwen Shapira
Jay Kreps has a gist with step by step instructions for reproducing the benchmarks used by LinkedIn: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c And the blog with the results: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Gwe

Re: Set up kafka cluster

2015-03-05 Thread Gwen Shapira
be really helpful. > > Thanks gain. > > best, > Yuheng > > On Thu, Mar 5, 2015 at 3:22 PM, Gwen Shapira wrote: > >> Jay Kreps has a gist with step by step instructions for reproducing >> the benchmarks used by LinkedIn: >> https://gist.github.com/jkreps/c7ddb4041

Re: Possible to count for unclosed resources in process

2015-03-06 Thread Gwen Shapira
It doesn't keep track specifically, but there are open sockets that may take a while to clean themselves up. Note that if you use the async producer and don't close the producer nicely, you may miss messages as the connection will close before all messages are sent. Guess how we found out? :) Sim

Re: blog on choosing # topics/partitions in Kafka

2015-03-12 Thread Gwen Shapira
Nice! Thank you, this is super helpful. I'd add that not just the client can run out of memory with large number of partitions - brokers can run out of memory too. We allocate max.message.size * #partitions on each broker for replication. Gwen On Thu, Mar 12, 2015 at 8:53 AM, Jun Rao wrote: > S

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
There's a KafkaRDD that can be used in Spark: https://github.com/tresata/spark-kafka. It doesn't exactly replace Camus, but should be useful in building Camus-like system in Spark. On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin wrote: > We use spark on mesos. I don't want to partition our clust

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
ed-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html >> >> -Will >> >> On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin > > wrote: >> >>> I'll try this too. It looks very promising. >>> >>> Thx >>> >>>

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
using an HDFS write-ahead log to ensure zero data loss while >>> streaming: >>> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html >>> >>> -Will >>> >>> On Fri, Mar 13, 2015 at 3:28

Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
Camus uses MapReduce though. If Alberto uses Spark exclusively, I can see why installing MapReduce cluster (with or without YARN) is not a desirable solution. On Fri, Mar 13, 2015 at 1:06 PM, Thunder Stumpges wrote: > Sorry to go back in time on this thread, but Camus does NOT use YARN. We hav

Re: High level consumer group

2015-03-16 Thread Gwen Shapira
If you want each application to handle half of the partitions for the topic, you need to configure the same group.id for both applications. In this case, Kafka will just see each app as a consumer in the group and it won't care that they are on different boxes. If you want each application to hand
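The two behaviors described above can be illustrated with a toy assignment function. This is not the real rebalancing code (the 0.8 high-level consumer uses a range assignor and ZooKeeper coordination); it only shows the group.id semantics: same group.id means the partitions are split, different group.ids mean each group independently consumes everything.

```python
# Toy round-robin assignment to illustrate group.id semantics (the real
# consumer's rebalancing algorithm differs, but the sharing behavior is the
# same idea).

def assign(partitions, consumers):
    """Split partitions across the consumers of ONE group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Same group.id: two apps share a 6-partition topic, 3 partitions each.
print(assign(list(range(6)), ["app1", "app2"]))
# {'app1': [0, 2, 4], 'app2': [1, 3, 5]}

# Different group.ids: each group is assigned independently and gets all 6.
print(assign(list(range(6)), ["group-a"]))
# {'group-a': [0, 1, 2, 3, 4, 5]}
```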

Re: KafkaException: Wrong request type 17260

2015-03-16 Thread Gwen Shapira
Kafka currently has request types 0-12. If the bytes Kafka got were parsed to request type 17260, it looks like someone is sending malformed or even random data. Do you recognize the sending IP? do you know about client errors from the same point in time? Perhaps someone is running a port scanner?
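One way to investigate a bogus request type like this: the type field in a Kafka request is a 2-byte big-endian integer, so decoding 17260 back into bytes shows what raw data would have produced it. The example below is a diagnostic sketch, not anything Kafka itself does:

```python
import struct

# Decode the mystery "request type" back into the two raw bytes behind it.
raw = struct.pack(">h", 17260)
print(raw)  # b'Cl' -- printable ASCII, i.e. someone likely sent plain text

# The reverse also holds: any two text bytes parse as *some* request type,
# e.g. the first two bytes of an HTTP GET line.
print(struct.unpack(">h", b"GE")[0])  # 18245
```

So an error like "Wrong request type 17260" often means a non-Kafka client (port scanner, health checker, browser) connected to the broker port, which matches the advice above.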

Re: High level consumer group

2015-03-16 Thread Gwen Shapira
; or > If i start one application does it start consuming all 6 partitions, and if > start later the other one, does it start sharing the partitions with the > other application threads ? > > t > SunilKalva > > On Mon, Mar 16, 2015 at 10:48 PM, Gwen Shapira > wrote: >>

Re: Assigning partitions to specific nodes

2015-03-16 Thread Gwen Shapira
Any reason not to use SparkStreaming directly with HDFS files, so you'll get locality guarantees from the Hadoop framework? StreamingContext has a textFileStream() method you could use for this. On Mon, Mar 16, 2015 at 12:46 PM, Daniel Haviv wrote: > Hi, > Is it possible to assign specific partitions

Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-16 Thread Gwen Shapira
Really really nice! Thank you. On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard wrote: > Hi kafka, > > I just wanted to mention I published a very simple project which can > connect as MySQL replication client and stream replication events to > kafka: https://github.com/pyr/sqlstream > >

Re: Assigning partitions to specific nodes

2015-03-16 Thread Gwen Shapira
t; to make several copies of it, which wasteful in terms of space and time. > > Thanks, > Daniel > >> On 16 במרץ 2015, at 22:12, Gwen Shapira wrote: >> >> Any reason not to use SparkStreaming directly with HDFS files, so >> you'll get locality guarantees from the

Re: Assigning partitions to specific nodes

2015-03-16 Thread Gwen Shapira
ill > listen on directoryA, B on directoryB, C on directoryC. > For the file to be processed I will have to copy the file both to > directoryA and to directoyB and only then will the streaming apps will > start processing it. > > Daniel > > > > On Mon, Mar 16, 2015 at 10:25 P

Re: Problem starting Kafka Broker

2015-03-16 Thread Gwen Shapira
Your kafka log directory (in config file, under log.dir) contains directories that are not Kafka topics. Possibly a hidden directory. Check what "ls -la" shows in that directory. Gwen On Mon, Mar 16, 2015 at 1:58 PM, Zakee wrote: > Running into problem starting Kafka Brokers after migrating to a ne

Re: No broker partitions consumed by consumer thread

2015-03-20 Thread Gwen Shapira
I think the issue is with " --from-latest" - this means consumers will consume only data that arrives AFTER they start. If you do that, first start consumers, leave them running, and then start producers. If you want to run producers first and only start consuming when producers are done, remove
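The reset behavior behind "--from-latest" can be sketched as a simple rule. This is illustrative logic, not client code: when a consumer has no committed offset, the auto.offset.reset setting (values "largest"/"smallest" in the 0.8 consumer) decides where it starts, and "largest" skips everything already in the log.

```python
# Sketch of the offset-reset rule (illustrative; function and argument names
# are made up for this example).

def starting_offset(committed, log_start, log_end, reset="largest"):
    if committed is not None:
        return committed        # a committed offset always wins
    return log_end if reset == "largest" else log_start

# Producer already wrote offsets 0..99 before the consumer first started:
print(starting_offset(None, 0, 100, reset="largest"))   # 100 -> sees nothing old
print(starting_offset(None, 0, 100, reset="smallest"))  # 0   -> replays the log
```

This is why the reply says to either start the consumers first, or drop "--from-latest" so the consumer starts from the smallest available offset.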

Re: Updates To cwiki For Producer

2015-03-20 Thread Gwen Shapira
We have a patch with examples: https://issues.apache.org/jira/browse/KAFKA-1982 Unfortunately, its not committed yet. Gwen On Fri, Mar 20, 2015 at 11:24 AM, Pete Wright wrote: > Thanks that's helpful. I am working on an example producer using the > new API, if I have any helpful notes or exam

Re: Updates To cwiki For Producer

2015-03-23 Thread Gwen Shapira
If you have feedback, don't hesitate to comment on the JIRA. On Mon, Mar 23, 2015 at 4:19 PM, Pete Wright wrote: > Hi Gwen - thanks for sending this along. I'll patch my local checkout > and take a look at this. > > Cheers, > -pete > > On 03/20/15 21:16, Gwen Sha

Re: reusing node ids

2015-04-02 Thread Gwen Shapira
On Thu, Apr 2, 2015 at 12:19 PM, Wes Chow wrote: > > How reusable are node ids? Specifically: > > 1. Node goes down and loses all data in its local Kafka logs. I bring it > back up, same hostname, same IP, same node id, but with empty logs. Does it > properly sync with the replicas and continue o

Re: Problem with node after restart no partitions?

2015-04-02 Thread Gwen Shapira
Wow, that's scary for sure. Just to be clear - all you did is restart *one* broker in the cluster? everything else was ok before the restart? and that was controlled shutdown? Gwen On Wed, Apr 1, 2015 at 11:54 AM, Thunder Stumpges wrote: > Well it appears we lost all the data on the one node ag

Re: offset-management-in-kafka

2015-04-08 Thread Gwen Shapira
This is available from 0.8.2.0, and is enabled on the server by default. The consumer needs to specify the offsets.storage parameter - the default is still zookeeper, so the consumers should set it to 'kafka'. The documentation also explains how to migrate from zookeeper offsets to kafka offsets. Gwen On
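A minimal sketch of the consumer settings involved, per the migration path described above (verify the exact names against the 0.8.2 documentation for your version):

```properties
# High-level consumer (0.8.2): store offsets in Kafka instead of ZooKeeper.
offsets.storage=kafka

# During migration, commit to both Kafka and ZooKeeper; set back to false
# once all consumers in the group have been upgraded.
dual.commit.enabled=true
```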

Re: offset-management-in-kafka

2015-04-08 Thread Gwen Shapira
a > class org.apache.kafka.clients.consumer.ConsumerConfig and I do not see > there a constant for offsets.storage > > Am I missing something? > > This is my pom dependency definition: > > > org.apache.kafka > kafka_2.10 > 0.8.2.1 > > > On Wed, Apr 8, 2015 at 6:29 PM,

Re: Change default min.insync.replicas for cluster

2015-04-11 Thread Gwen Shapira
this seems like a bug. We expect broker settings to set defaults for topics. Perhaps open a JIRA? On Fri, Apr 10, 2015 at 1:32 PM, Bryan Baugher wrote: > To answer my own question via testing, setting min.insync.replicas on the > broker does not change the default. The only way I can fin

Re: Topic to broker assignment

2015-04-13 Thread Gwen Shapira
Here's the algorithm as described in AdminUtils.scala: /** * There are 2 goals of replica assignment: * 1. Spread the replicas evenly among brokers. * 2. For partitions assigned to a particular broker, their other replicas are spread over the other brokers. * * To achieve this goa
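The two goals quoted above can be sketched in a few lines. This is a simplified illustration, not a port of AdminUtils.scala (the real code also randomizes the starting broker and the replica shift): first replicas go round-robin across brokers, and each partition's remaining replicas are placed on other brokers using an increasing shift.

```python
# Simplified sketch of round-robin replica placement (assumes
# replication_factor <= n_brokers and n_brokers > 1).

def assign_replicas(n_partitions, n_brokers, replication_factor):
    assignment = {}
    for p in range(n_partitions):
        # Shift grows each time we wrap around the broker list, so followers
        # of co-located leaders land on different brokers.
        shift = 1 + (p // n_brokers) % (n_brokers - 1)
        replicas = [p % n_brokers]  # leaders spread evenly (goal 1)
        for r in range(1, replication_factor):
            replicas.append((replicas[0] + r * shift) % n_brokers)  # goal 2
        assignment[p] = replicas
    return assignment

print(assign_replicas(n_partitions=4, n_brokers=4, replication_factor=2))
# {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 0]}
```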

Re: Automatic recreation of the topic after the topic was deleted

2015-04-14 Thread Gwen Shapira
Were there any errors on broker logs when you sent the new batch of messages? On Tue, Apr 14, 2015 at 7:41 AM, Pavel Bžoch wrote: > Hi all, > I have a question about automatic re-/creation of the topic. I started my > broker with property "auto.create.topics.enable " set to true. Then I tried > t

Re: serveral questions about auto.offset.reset

2015-04-15 Thread Gwen Shapira
I think of it as default.offset :) Realistically though, changing the configuration name will probably cause more confusion than leaving a bad name as is. On Tue, Apr 14, 2015 at 11:10 AM, James Cheng wrote: > "What to do when there is no initial offset in ZooKeeper or if an offset is > out of

Re: what response does broker send when there is not message on the partition

2015-04-17 Thread Gwen Shapira
You should receive OffsetOutOfRange code (1) for the partition. You can also send a FetchOffsetRequest and check for the last available offset (log end offset) - this way you won't have to send a fetch request that is likely to fail. Are you implementing your own consumer from scratch? or using on

Re: what response does broker send when there is not message on the partition

2015-04-20 Thread Gwen Shapira
nd offset) - this way you won't have to send a >> fetch request that is likely to fail. > > Does this takes in account specific consumer offsets stored in Zookeeper? > > On Fri, Apr 17, 2015 at 5:57 PM, Gwen Shapira wrote: > >> You should receive OffsetOutOfRange code

Re: SimpleConsumer.getOffsetsBefore

2015-04-20 Thread Gwen Shapira
1. getOffsetsBefore is per partition, not per consumer group. The SimpleConsumer is completely unaware of consumer groups. 2. "offsets closest to specified timestamp per topic-consumerGroup" <- I'm not sure I understand what you mean. Each consumer group persists one offset per partition, thats the

Re: what response does broker send when there is not message on the partition

2015-04-21 Thread Gwen Shapira
nd* procedure. For this we need to be > able to get offset for topic and partition for specified timestamp. > > On Tue, Apr 21, 2015 at 4:36 AM, Gwen Shapira wrote: > >> I believe it doesn't take consumers into account at all. Just the >> offset available on the partiti

Re: what to do if replicas are not in sync

2015-04-21 Thread Gwen Shapira
They should be trying to get back into sync on their own. Do you see any errors in broker logs? Gwen On Tue, Apr 21, 2015 at 10:15 AM, Thomas Kwan wrote: > We have 5 kafka brokers available, and created a topic with replication > factor of 3. After a few broker issues (e.g. went out of file desc

Re: Remote kafka - Connection refused.

2015-04-23 Thread Gwen Shapira
Perhaps this will help: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers? On Thu, Apr 23, 2015 at 3:24 PM, madhavan kumar wrote: > Dear all, > I am trying to connect my python consumer to a remote kafka server. But > in kafka/conn.py#r

Re: NoClassDefFoundError at producer end

2015-04-23 Thread Gwen Shapira
Since it is a runtime error, Maven dependency is less relevant than what you have in your class path (unless you built a shaded uber-jar). You'll need Scala runtime and zkclient jar in the classpath, can you check that you have those around? On Thu, Apr 23, 2015 at 6:15 AM, abdul hameed pathan w

Re: Kafka client - 0.9

2015-04-23 Thread Gwen Shapira
We don't normally plan dates for releases, when we are done with features we want in the next release and happy with quality, we release. Many Apache communities are like that. If you need firmer roadmaps and specific release dates, there are a few vendors selling Kafka distributions and support. A

Re: New and old producers partition messages differently

2015-04-25 Thread Gwen Shapira
Ouch. That can be a painful discovery after a client upgrade. It can break a lot of app code. I can see the reason for custom hash algorithm (lots of db products do this, usually for stability, but sometimes for other hash properties (Oracle has some cool guarantees around modifying number of part

Re: New and old producers partition messages differently

2015-04-26 Thread Gwen Shapira
We are doing work for supporting custom partitioner, so everything is customizable :) On Sun, Apr 26, 2015 at 8:52 PM, Wes Chow wrote: > > Along these lines too, is the function customizable? I could see how mmh3 > (or 2) would be generally sufficient, however in some cases you may want > someth

Re: New and old producers partition messages differently

2015-04-26 Thread Gwen Shapira
Definitely +1 for advertising this in the docs. What I can't figure out is the upgrade path... if my application assumes that all data for a single user is in one partition (so it subscribes to a single partition and expects everything about a specific subset of users to be in that partition), thi

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
Batch failure is a bit meaningless, since in the same batch, some records can succeed and others may fail. To implement an error handling logic (usually different than retry, since the producer has a configuration controlling retries), we recommend using the callback option of Send(). Gwen P.S Aw
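The per-record callback pattern recommended above can be simulated like this. Everything here is a stand-in, not the real producer API: `send` fakes delivery (the real call is KafkaProducer.send with a Callback whose onCompletion receives metadata and an exception), and odd-numbered records are arbitrarily chosen to fail.

```python
# Simulation of per-record error handling via callbacks. send() and the
# failure condition are stand-ins for the real producer and broker.

def send(record, on_completion):
    ok = record % 2 == 0  # pretend odd records fail delivery
    on_completion(record, None if ok else RuntimeError("delivery failed"))

failed = []
def handle(record, error):
    if error is not None:
        failed.append(record)  # e.g. log it, or route to a dead-letter topic

for r in range(6):
    send(r, handle)
print(failed)  # [1, 3, 5] -- each failure handled individually
```

This is exactly why "batch failure" is ill-defined: in one batch, records 0, 2, 4 succeeded while 1, 3, 5 failed, and only a per-record callback can tell them apart.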

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
client internal construct? > If the former I was under the impression that a produced MessageSet either > succeeds delivery or errors in its entirety on the broker. > > Thanks, > Magnus > > > 2015-04-27 23:05 GMT+02:00 Gwen Shapira : > > > Batch failure is a bit meani

Re: New Producer API - batched sync mode support

2015-04-27 Thread Gwen Shapira
@Roshan - if the data was already written to Kafka, your approach will generate LOTS of duplicates. I'm not convinced its ideal. What's wrong with callbacks? On Mon, Apr 27, 2015 at 2:53 PM, Roshan Naik wrote: > @Gwen > > - A failure in delivery of one or more events in the batch (typical Flum

Re: hive output to kafka

2015-04-28 Thread Gwen Shapira
Kind of what you need but not quite: Sqoop2 is capable of getting data from HDFS to Kafka. AFAIK it doesn't support Hive queries, but feel free to open a JIRA for Sqoop :) Gwen On Tue, Apr 28, 2015 at 4:09 AM, Svante Karlsson wrote: > What's the best way of exporting contents (avro encoded) fr

Re: Kafka client - 0.9

2015-04-29 Thread Gwen Shapira
on, it seems like the current high level consumer API does > > not seem to be supporting it, atleast not in a straight forward fashion. > > > > Appreciate any alternate solutions. > > > > On Thu, Apr 23, 2015 at 8:08 PM, Gwen Shapira > > wrote: > > >

Re: New Producer API - batched sync mode support

2015-04-29 Thread Gwen Shapira
I'm starting to think that the old adage "If two people say you are drunk, lie down" applies here :) Current API seems perfectly clear, useful and logical to everyone who wrote it... but we are getting multiple users asking for the old batch behavior back. One reason to get it back is to make upgr

Re: New Producer API - batched sync mode support

2015-04-30 Thread Gwen Shapira
Why do we think atomicity is expected, if the old API we are emulating here lacks atomicity? I don't remember emails to the mailing list saying: "I expected this batch to be atomic, but instead I got duplicates when retrying after a failed batch send". Maybe atomicity isn't as strong requirement a

Re: Pulling Snapshots from Kafka, Log compaction last compact offset

2015-04-30 Thread Gwen Shapira
I feel a need to respond to the Sqoop-killer comment :) 1) Note that most databases have a single transaction log per db and in order to get the correct view of the DB, you need to read it in order (otherwise transactions will get messed up). This means you are limited to a single producer reading

Re: Leaderless topics

2015-04-30 Thread Gwen Shapira
Which Kafka version are you using? On Thu, Apr 30, 2015 at 4:11 PM, Dillian Murphey wrote: > Scenario with 1 node broker, and 3 node zookeeper ensemble. > > 1) Create topic > 2) Delete topic > 3) Re-create with same name > > I'm noticing this recreation gives me Leader: none, and Isr: empty. >

Re: Leaderless topics

2015-05-02 Thread Gwen Shapira
en. > > On Thu, Apr 30, 2015 at 5:34 PM, Gwen Shapira > wrote: > > > Which Kafka version are you using? > > > > On Thu, Apr 30, 2015 at 4:11 PM, Dillian Murphey < > crackshotm...@gmail.com> > > wrote: > > > > > Scenerio with 1 node broker,

Re: Kafka - is there a quorum a used for reading when min isrs greater than 1

2015-05-02 Thread Gwen Shapira
Hi, The leader always serves all requests. Replicas are used just for high availability (i.e to become leader if needed). On Fri, May 1, 2015 at 3:27 PM, Gomathivinayagam Muthuvinayagam < sankarm...@gmail.com> wrote: > I am just wondering if I have no of replicas as 3, and > if min.insync.replic

Re: Pulling Snapshots from Kafka, Log compaction last compact offset

2015-05-09 Thread Gwen Shapira
to achieve nice parallelism. So it seems the producer is the potential > bottleneck, but I imagine you can scale that appropriately vertically and > put the proper HA. > > Would love to hear your thoughts on this. > > Jonathan > > > > On Thu, Apr 30, 2015 at 5:

Re: Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

2015-05-10 Thread Gwen Shapira
For Flume, we use the timeout configuration and catch the exception, with the assumption that "no messages for few seconds" == "the end". On Sat, May 9, 2015 at 2:04 AM, James Cheng wrote: > Hi, > > I want to use the high level consumer to read all partitions for a topic, > and know when I have
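The Flume approach described here hinges on the old high-level consumer's `consumer.timeout.ms` setting: when no message arrives within that window, the consumer iterator throws `ConsumerTimeoutException`, which the application catches and treats as "end of partition". A minimal sketch of the relevant consumer.properties, assuming a 0.8.x high-level consumer (the group name and timeout value are illustrative):

```properties
# Hypothetical 0.8.x consumer.properties sketch.
group.id=snapshot-reader
zookeeper.connect=localhost:2181
auto.offset.reset=smallest
# With this set, the iterator throws ConsumerTimeoutException
# after 5 seconds with no messages, which the application can
# interpret as having reached the end of the partition.
consumer.timeout.ms=5000
```

Note this is a heuristic, not a true end-of-log signal: a slow producer can also trigger the timeout.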

Re: Kafka Client in Rust

2015-05-11 Thread Gwen Shapira
You may want to announce this at kafka-clie...@googlegroups.com, a mailing list specifically for Kafka clients. I'm sure they'll be thrilled to hear about it. It is also a good place for questions on client development, if you ever need help. On Mon, May 11, 2015 at 4:57 AM, Yousuf Fauzan wrote:

Re: Kafka integration with Hadoop

2015-05-11 Thread Gwen Shapira
Also, Flume 1.6 has Kafka source, sink and channel. Using Kafka source or channel with Dataset sink should work. Gwen On Tue, May 12, 2015 at 6:16 AM, Warren Henning wrote: > You could start by looking at Linkedin's Camus and go from there? > > On Mon, May 11, 2015 at 8:10 PM, Rajesh Datla > w

Re: questions

2015-05-12 Thread Gwen Shapira
Kafka with Flume is one way (Just use Flume's SpoolingDirectory source with Kafka Channel or Kafka Sink). Also, Kafka itself has a Log4J appender as part of the project, this will work if the log is written with log4j. Gwen On Tue, May 12, 2015 at 12:52 PM, ram kumar wrote: > hi, > > How can i
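For the log4j route mentioned above, Kafka 0.8.x ships a `KafkaLog4jAppender`. A hypothetical log4j.properties sketch, assuming that appender class and a local broker (topic name, broker address, and pattern are illustrative, not from the original thread):

```properties
# Hypothetical log4j.properties sketch using Kafka's bundled appender.
log4j.rootLogger=INFO, KAFKA
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.topic=app-logs
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d %-5p %c - %m%n
```

With this in place, anything the application logs through log4j is produced to the `app-logs` topic.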

Re: Kafka 0.9.0 release date

2015-05-16 Thread Gwen Shapira
Being an open source project, we don't have committed release dates. You can follow the progress of our security features by watching https://issues.apache.org/jira/browse/KAFKA-1682 and its sub-tasks. Hope this helps! Gwen On Sat, May 16, 2015 at 1:22 PM, Madabhattula Rajesh Kumar < mrajaf...@

Re: kafkacat

2015-05-18 Thread Gwen Shapira
Flume with netcat source and Kafka Channel or Kafka Sink will do that. A bit more complex than a Kafkacat equivalent, but will get the job done. On Tue, May 19, 2015 at 3:02 AM, clay teahouse wrote: > Hi All, > > Does anyone know of an implementation of kafkacat that reads from socket? > Or for

Re: Simple consumer rebalancing

2015-05-20 Thread Gwen Shapira
See inline. On Wed, May 20, 2015 at 12:51 PM, Harut Martirosyan < harut.martiros...@gmail.com> wrote: > Hi. I've got several questions. > > 1. As far as I understood from docs, if rebalancing feature is needed (when > consumer added/removed), High-level Consumer should be used, what about > Simpl

Re: Kafka broker - IP address instead of host name

2015-05-24 Thread Gwen Shapira
If you set advertised.host.name in server.properties to the ip address, the IP will be registered in ZooKeeper. On Fri, May 22, 2015 at 2:20 PM, Achanta Vamsi Subhash < achanta.va...@flipkart.com> wrote: > Hi, > > Currently Kafka brokers register the hostname in zookeeper. > > [zk: localhost:2181
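For reference, in 0.8.x the relevant server.properties keys are `advertised.host.name` and `advertised.port`. A hypothetical fragment (the address and port are illustrative):

```properties
# Hypothetical server.properties fragment: advertise an IP address so
# that the IP, not the hostname, is what clients get from ZooKeeper.
advertised.host.name=10.0.1.12
advertised.port=9092
```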

Re: Kafka broker - IP address instead of host name

2015-05-24 Thread Gwen Shapira
> configs to be reloaded but only variable external config changes should be > allowed. > > https://issues.apache.org/jira/browse/KAFKA-1229 > > On Sun, May 24, 2015 at 1:14 PM, Gwen Shapira > wrote: > > > If you set advertised.hostname in server.properties to the ip address,

Re: Difference between NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND

2015-06-08 Thread Gwen Shapira
Hi, What you said is exactly correct :) We check ISR size twice: once before writing to the leader, and once when checking for acks. The first error is thrown if we detect a small ISR before writing to the leader. The second is thrown if the ISR shrank after we wrote to the leader but before we got enough acks
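Both checks only come into play when the producer requests acknowledgment from all in-sync replicas. A hypothetical pairing of settings that exercises them (the replica count is illustrative; the two keys live in different config files, as the comments note):

```properties
# Hypothetical sketch. Broker/topic configuration: require at least
# two in-sync replicas before a write is accepted.
min.insync.replicas=2

# Producer configuration (new 0.8.2 producer): wait for acknowledgment
# from all in-sync replicas. With fewer than two replicas in sync, the
# broker returns NOT_ENOUGH_REPLICAS (detected before the append) or
# NOT_ENOUGH_REPLICAS_AFTER_APPEND (ISR shrank after the append).
acks=all
```

The distinction matters for retries: after NOT_ENOUGH_REPLICAS_AFTER_APPEND the message was already written to the leader, so a retry can produce a duplicate.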

Re: offsets.storage=kafka, dual.commit.enabled=false still requires ZK

2015-06-09 Thread Gwen Shapira
The existing consumer uses Zookeeper both to commit offsets and to assign partitions to different consumers and threads in the same consumer group. While offsets can be committed to Kafka in 0.8.2 releases, which greatly reduces the load on Zookeeper, the consumer still requires Zookeeper to manage
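A hypothetical 0.8.2 consumer.properties sketch illustrating this point: offsets go to Kafka, yet `zookeeper.connect` is still mandatory because the old consumer uses ZooKeeper for group membership and partition assignment (group name and address are illustrative):

```properties
# Hypothetical 0.8.2 high-level consumer configuration sketch.
offsets.storage=kafka
dual.commit.enabled=false
# Still required: group coordination happens via ZooKeeper.
zookeeper.connect=localhost:2181
group.id=my-group
```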

Re: Increased replication factor. Replication didn't happen!

2015-06-10 Thread Gwen Shapira
What do the logs show? On Wed, Jun 10, 2015 at 5:07 PM, Dillian Murphey wrote: > Ran this: > > $KAFKA_HOME/bin/kafka-reassign-partitions.sh > > But Kafka did not actually do the replication. Topic description shows the > right numbers, but it just didn't replicate. > > What's wrong, and how do I
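For context on the tool being discussed: increasing the replication factor with kafka-reassign-partitions.sh requires feeding `--execute` a hand-written JSON file that lists a larger replica set per partition (the tool's `--generate` mode keeps the existing replication factor). A hypothetical reassignment file, with topic name and broker ids purely illustrative:

```json
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "my-topic", "partition": 1, "replicas": [2, 3, 1]}
  ]
}
```

After `--execute`, running the tool again with `--verify` shows whether the new replicas have actually caught up, which is where a failure like the one reported would surface.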
