Re: Brokers is down by “java.io.IOException: Too many open files”

2017-05-12 Thread Caleb Welton
You need to up your OS open file limits, something like this should work: # /etc/security/limits.conf * - nofile 65536 On Fri, May 12, 2017 at 6:34 PM, Yang Cui wrote: > Our Kafka cluster is broken down by the problem “java.io.IOException: Too > many open files” three times in 3 weeks. > >

Brokers is down by “java.io.IOException: Too many open files”

2017-05-12 Thread Yang Cui
Our Kafka cluster is broken down by the problem “java.io.IOException: Too many open files” three times in 3 weeks. We encounter these problem on both 0.9.0.1 and 0.10.2.1 version. The error is like: java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0

Reassign Partitions Programmatically

2017-05-12 Thread Kerry Wei
Hi there, Can't find Java API to do partition reassignment, so i took a look at the source code (not a deep look). It seems that the kafka-reassign-partitions.sh script created the znode /admin/reassign_partitions with the new plan (in JSON) as the data. I wonder if I could post data to zookeeper i

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
I added the feedback to https://issues.apache.org/jira/browse/KAFKA-3514 -Matthias On 5/12/17 10:38 AM, Thomas Becker wrote: > Thanks. I think the system time based punctuation scheme we were discussing > would not result in repeated punctuations like this, but even using stream > time it seem

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Thomas Becker
Thanks. I think the system time based punctuation scheme we were discussing would not result in repeated punctuations like this, but even using stream time it seems a bit odd. If you do anything in a punctuate call that is relatively expensive it's especially bad. __

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
Thanks for the comments, here are some clarifications: I did look at interactive queries, if I understood them correctly it means that my state store must hold all the results in order for it to be queried, either in memory or through disk (RocksDB). 1. The retention policy on my aggregate operati

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
Thanks for sharing. As punctuate is called with "streams time" you see the same time value multiple times. It's again due to the coarse grained advance of "stream time". @Thomas: I think, the way we handle it just simplifies the implementation of punctuations. I don't see any other "advantage".

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Well, this is also a good question, because it is triggered with the same timestamp 3 times, so in order to create my update for both three seconds, I will have to count the number of punctuations and calculate the missed stream times for myself. It's ok for me to trigger it 3 times, but the ti

Rebalance topic partitions over HDD

2017-05-12 Thread Igor Kuzmenko
My kafka broker v0.10.1 failed with exception, because no disk space was left: [root@queue001 log]# df -h Filesystem Size Used Avail Use% Mounted on */dev/mapper/ol-queue00103-data_3 1.1T 1.1T 0 100% /data/3/dev/mapper/ol-queue00101-data_1 1.1T 628G 417G

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread Matthias J. Sax
Hi, I am not sure about your overall setup. Are your stores local (similar to RocksDB) or are you using one global remote store? If you use a global remote store, you would not need to back your changes in a changelog topic, as the store would not be lost if case of failure. Also (in case that yo

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread Eno Thereska
Hi there, A couple of general comments, plus some answers: - general comment: have you thought of using Interactive Queries to directly query the aggregate data, without needing to store them to an external database (see this blog: https://www.confluent.io/blog/unifying-stream-processing-and-i

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Thomas Becker
I'm a bit troubled by the fact that it fires 3 times despite the stream time being advanced all at once; is there a scenario when this is beneficial? From: Matthias J. Sax [matth...@confluent.io] Sent: Friday, May 12, 2017 12:38 PM To: users@kafka.apache.o

Simulating Kafka being down

2017-05-12 Thread Omar Othman
Hi All, We depend on Kafka a lot in my company and one of the projects we wanted to do is writing all the logs from SysLog to Kafka via "omkafka" module that rsyslogd provides. We enabled that once and, because of some rsyslog

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Hi, It is actually critical for me (and of course I would like to understand it too), because using this design the "first message" will be processed in a bad window/segment (I mean the timeframe between the punctuate() calls): it will be processed in the "3 seconds before segment" instead of

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
Hi Peter, It's by design. Streams internally tracks time progress (so-called "streams time"). "streams time" get advanced *after* processing a record. Thus, in your case, "stream time" is still at its old value before it processed the first message of you send "burst". After that, "streams time"

RE: Why do I need to specify replication factor when creating a topic?

2017-05-12 Thread Thomas Becker
I can only assume that it was. I have heard it stated as a goal that it should be easier to simply create topics with the "defaults" but we seem to be making only incremental progress towards that goal. I don't know if the assumption is that most users are routinely varying the number of partiti

Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
On a stream definition I perform an "aggregate" which is configured with a state store. *Goal*: Persist the aggregation results into a database, e.g. MySQL or MongoDB *Working approach*: I have created a custom StateStore backed by a changelog topic like the builtin state stores. Whenever the sto

Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Hi, Let's assume the following case. - a stream processor that uses the Processor API - context.schedule(1000) is called in the init() - the processor reads only one topic that has one partition - using custom timestamp extractor, but that timestamp is just a wall clock time Image the followin

Re: Securing Kafka - Keystore and Truststore question

2017-05-12 Thread Rajini Sivaram
Raqhav, 1. Clients need a keystore if you are using TLS client authentication. To enable client authentication, you need to configure ssl.client.auth in server.properties. This can be set to required|requested|none. If you don't enable client authentication, any client will be able to connect to y

Re: Why do I need to specify replication factor when creating a topic?

2017-05-12 Thread Jeff Widman
> The problem is that the AdminUtils requires this info to be known client side, but there is no API to get it. Why does the client side need it? If the broker can auto-create topics, then the broker is aware of the default param. > I think things will be better in 0.11.0 where we have the AdminC

Securing Kafka - Keystore and Truststore question

2017-05-12 Thread Raghav
Hi I read the documentation here: https://kafka.apache.org/documentation/#security_ssl I have few questions about trust store and keystore based on this scenario: We have 5 Kafka Brokers in our cluster. We want our clients to write to our Kafka brokers in a secure way. Suppose, we also host a pr

Partition offsets decreased

2017-05-12 Thread Andrey Bratus
Hi, we are running a 0.10.1.1 Kafka cluster with 2 brokers and recently encountered an issue with partition offsets. Our application relies on saving the offsets for each partition in order to be able to replay the events that occurred after a certain point in time. The issue is that we found

Re: How to chain increasing window operations one after another

2017-05-12 Thread Garrett Barton
Hey all, Just wanted to follow up. I have not had much time to work on this, been trying to figure out why leaving the stream on for a few days causes some of the stream apps to go into this death spiral of connecting/disconnecting. The store growing unbounded stopped me in my tracks and I ha

RE: Why do I need to specify replication factor when creating a topic?

2017-05-12 Thread Thomas Becker
Yes, this has been an issue for some time. The problem is that the AdminUtils requires this info to be known client side, but there is no API to get it. I think things will be better in 0.11.0 where we have the AdminClient that includes support for both topic CRUD APIs (not just ZK modifications