Re: Streams RocksDBException with no message?

2017-03-27 Thread Michael Noll
We're talking about `ulimit` (CLI tool) and the `nofile` limit (number of open files), which you can access via `ulimit -n`. Examples: https://access.redhat.com/solutions/61334 https://stackoverflow.com/questions/21515463/how-to-increase-maximum-file-open-limit-ulimit-in-ubuntu Depending on the o

Re: YASSQ (yet another state store question)

2017-03-27 Thread Michael Noll
IIRC this may happen, for example, if the first host runs all the stream tasks (here: 2 in total) and migration of stream task(s) to the second host hasn't happened yet. -Michael On Sun, Mar 26, 2017 at 3:14 PM, Jon Yeargers wrote: > Also - if I run this on two hosts - what does it imply if t
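
For what it's worth, a rough sketch of how one could check which instance currently hosts which stores and partitions, using KafkaStreams#allMetadata(); the variable name kafkaStreams and the fields printed are illustrative:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.state.StreamsMetadata;

    // kafkaStreams is the running KafkaStreams instance of the application
    for (StreamsMetadata md : kafkaStreams.allMetadata()) {
        System.out.println(md.hostInfo().host() + ":" + md.hostInfo().port()
                + " stores=" + md.stateStoreNames()
                + " partitions=" + md.topicPartitions());
    }

If both hosts appear but one of them reports no stores or partitions yet, the stream tasks simply have not been migrated to it.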

Re: kafka streams 0.10.2 failing while restoring offsets

2017-03-27 Thread Damian Guy
Hi Sachin, This should not happen. The previous owner of the task should have stopped processing before the restoration begins. So if this is happening, then that signals a bug. Do you have more logs? Thanks, Damian On Sat, 25 Mar 2017 at 08:20 Sachin Mittal wrote: > Hi, > So recently we fixed

Re: YASSQ (yet another state store question)

2017-03-27 Thread Damian Guy
Hi Jon, The store you have created is a window store, so you need to use: kafkaStreams.store("AggStore", QueryableStoreTypes.windowStore()) Thanks, Damian On Sun, 26 Mar 2017 at 14:14, Jon Yeargers wrote: Also - if I run this on two hosts - what does it imply if the response to 'streams.allM
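
A minimal sketch of that lookup end to end, assuming String keys and Long aggregate values (the types are not stated in the thread):

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;

    // the KafkaStreams instance must be RUNNING before the store can be queried
    ReadOnlyWindowStore<String, Long> aggStore =
        kafkaStreams.store("AggStore", QueryableStoreTypes.<String, Long>windowStore());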

Re: kafka streams 0.10.2 failing while restoring offsets

2017-03-27 Thread Sachin Mittal
Hi, These are the logs https://www.dropbox.com/s/2t4ysfdqbtmcusq/complete_84_85_87_log.zip?dl=0 I think this may not always be the case, especially if the previous owner is on a different machine. Say it is processing and takes more than the poll timeout to process or commit the offset. The group

offset commitment from another client

2017-03-27 Thread Vova Shelgunov
Hi, I have an application which consumes messages from Kafka, then it creates a Docker container via Mesos which processes the incoming message (image), but I need to commit an offset only once the message is processed inside the Docker container. So basically I need to commit the offset from another broker (th
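
One common pattern, sketched below, is to keep the commit in the consuming application but defer it until the container reports completion. This assumes auto-commit is disabled and uses a hypothetical blocking helper submitToMesosAndWait(), which is not a Kafka API:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "image-processors");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit manually
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("images"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
            submitToMesosAndWait(record.value());   // block until the container succeeds
            consumer.commitSync(Collections.singletonMap(
                    new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1)));
        }
    }

Note that blocking inside the poll loop for a long-running container can exceed max.poll.interval.ms and get the consumer kicked out of the group, so long jobs usually need pause()/resume() or a larger interval.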

RE: Kafka queue full configuration

2017-03-27 Thread Mohapatra, Sudhir (Nokia - IN/Gurgaon)
Can you please let me know the new parameter name for the same functionality, to simulate the queue-full scenario? Regards, Sudhir From: Mohapatra, Sudhir (Nokia - IN/Gurgaon) Sent: Thursday, March 23, 2017 11:01 AM To: 'users@kafka.apache.org' ; 'd...@kafka.apache.org' Subject: Kafka q

Re: Kafka queue full configuration

2017-03-27 Thread Manikumar
Looks like you are referring to the Scala producer configs: https://kafka.apache.org/082/documentation.html#producerconfigs The Scala producer is deprecated now and will be removed in the future. You can use the Java producer instead: http://kafka.apache.org/documentation/#producerconfigs The buffering-related configs are
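
For illustration, one way to simulate a "queue full" situation with the Java producer is to shrink the accumulator and fail fast when it fills up; these settings roughly play the role that the old queue.buffering.* settings played for the Scala producer (values below are deliberately tiny and made up):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "65536");  // 64 KB accumulator
    props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "100");     // fail fast when it is full
    // with these values, send() blocks for at most 100 ms once the buffer is
    // exhausted and then fails with a TimeoutException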

Re: kafka streams 0.10.2 failing while restoring offsets

2017-03-27 Thread Damian Guy
Yes, but we don't know why it is still processing the data. We don't want to have multiple processes acting on the same tasks, hence the exception. What if, for some reason, the other task is processing a large backlog? How long do we wait before giving up? I think in this case the exception is the

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-27 Thread Thomas Becker
Couldn't this have been solved by having ReadOnlyKeyValueStore.iterator() return a ReadOnlyKeyValueIterator that throws an exception from remove()? That preserves the ability to call remove() when it's appropriate and moves the refused bequest to the cases where you shouldn't. On Thu, 2017-03-23 at 11:0

Iterating stream windows

2017-03-27 Thread Jon Yeargers
I'm hoping to support external queries into a windowed state store aggregator. Thanks to a previous question here I see where to use a ReadOnlyWindowStore, but I'm not clear on how to define the boundaries for the call. Assume I have a one-hour window with a 5-minute 'slide' between new windows. If
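
A rough sketch of querying such a store, with a hypothetical store name and String/Long types; the key point is that fetch() is bounded by window start timestamps, so to see every window that still covers "now" you look back one full window size:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;
    import org.apache.kafka.streams.state.WindowStoreIterator;

    ReadOnlyWindowStore<String, Long> store =
        kafkaStreams.store("hour-window-store", QueryableStoreTypes.<String, Long>windowStore());

    long windowSizeMs = 60 * 60 * 1000L;              // one-hour windows
    long now = System.currentTimeMillis();
    WindowStoreIterator<Long> iter = store.fetch("someKey", now - windowSizeMs, now);
    while (iter.hasNext()) {
        KeyValue<Long, Long> entry = iter.next();     // key = window start, value = aggregate
        System.out.println("window starting at " + entry.key + " -> " + entry.value);
    }
    iter.close();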

using a state store for deduplication

2017-03-27 Thread Jon Yeargers
I've been (re)reading this document (http://docs.confluent.io/3.2.0/streams/developer-guide.html#state-stores) hoping to better understand StateStores. At the top of the section there is a tantalizing note implying that one could do deduplication using a store. At present we are using Redis for this as

Re: using a state store for deduplication

2017-03-27 Thread Damian Guy
Jon, You don't need all the data for every topic as the data is partitioned by key. Therefore each state-store instance is de-duplicating a subset of the key set. Thanks, Damian On Mon, 27 Mar 2017 at 13:47 Jon Yeargers wrote: > Ive been (re)reading this document( > http://docs.confluent.io/3.2.
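
A sketch of what such per-partition de-duplication could look like with a Transformer backed by a local key-value store; the store name "dedup-store", the String types and the keep-forever policy are illustrative only:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    class DeduplicationTransformer implements Transformer<String, String, KeyValue<String, String>> {
        private KeyValueStore<String, Long> seen;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            seen = (KeyValueStore<String, Long>) context.getStateStore("dedup-store");
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
            if (seen.get(key) != null) {
                return null;                          // duplicate: drop it
            }
            seen.put(key, System.currentTimeMillis());
            return KeyValue.pair(key, value);         // first sighting: forward it
        }

        @Override
        public KeyValue<String, String> punctuate(long timestamp) { return null; }

        @Override
        public void close() {}
    }

Wiring it in would be along the lines of builder.addStateStore(Stores.create("dedup-store").withStringKeys().withLongValues().persistent().build()) followed by stream.transform(DeduplicationTransformer::new, "dedup-store"). The example Michael links further down takes a similar Transformer-based approach.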

kafka connector for mongodb as a source

2017-03-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All, I am creating a Kafka connector for MongoDB as a source. My connector is starting and connecting with Kafka but it is not committing any offsets. This is the output after starting the connector. [root@localhost kafka_2.11-0.10.1.1]# bin/connect-standalone.sh config/connect-standalone.properties con

[Kafka Streams] - problem joining 2 different streams

2017-03-27 Thread Marco Abitabile
Hi all, I'm struggling with an apparently simple problem. I'm joining 2 different streams: Stream1. User activity data, with key, value --> Stream2. User location data (such as the city name) with key, value --> Keys are homogeneous in content and represent the id of the user's device. The

Re: kafka streams 0.10.2 failing while restoring offsets

2017-03-27 Thread Sachin Mittal
Well, let's say the previous thread was processing a large block and, while processing, it got bumped out. So the new thread which got this partition may see the offset change between the time it starts the restore and the time it completes it. If that is the case then it should just ignore that task for that

Re: [Kafka Streams] - problem joining 2 different streams

2017-03-27 Thread Damian Guy
Hi Marco, It looks like you are creating 2 independent instances of KafkaStreams and trying to join across those instances. This won't work, and I'm surprised it has let you get that far without some other exception. You should remove this bit: >KafkaStreams userLocationKafkaStream = new >Kafka
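
A sketch of the single-builder setup Damian describes, where both input streams come from the same KStreamBuilder and only one KafkaStreams instance is started; the topic names, serdes and the 5-minute join window are placeholders, not Marco's actual values:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "activity-location-join");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> activity = builder.stream("user-activity");
    KStream<String, String> location = builder.stream("user-location");

    activity.join(location,
            (act, loc) -> act + " @ " + loc,          // combine the two values
            JoinWindows.of(5 * 60 * 1000L))           // records within 5 minutes of each other
            .to("user-activity-with-location");

    new KafkaStreams(builder, props).start();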

Re: using a state store for deduplication

2017-03-27 Thread Michael Noll
Jon, Damian already answered your direct question, so my comment is a FYI: There's a demo example at https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java (this is for Confluent 3.2 / Kafka 0.10.2

Kafka heap space issue

2017-03-27 Thread Karthik Jayaraman
I have posted my issue in - http://stackoverflow.com/questions/43050796/kafka-server-kafkaserverstartable-java-lang-outofmemoryerror-java-heap-space . Any help appreciated. Thanks, JK

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-27 Thread Matthias J. Sax
Yes, that is what we do in 0.10.2 -- it was a bug in 0.10.1 to not throw an exception :) -Matthias On 3/27/17 5:07 AM, Thomas Becker wrote: > Couldn't this have been solved by returning a ReadOnlyKeyValueIterator > that throws an exception from remove() from the > ReadOnlyKeyValueStore.iterator(
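
For reference, a minimal sketch of the wrapper Thomas describes, with remove() refused; the class name is made up and this is not the actual Streams implementation:

    import java.util.Iterator;

    class ReadOnlyIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;

        ReadOnlyIterator(Iterator<T> delegate) { this.delegate = delegate; }

        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public T next() { return delegate.next(); }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("state store iterators are read-only");
        }
    }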

Re: APPLICATION_SERVER_CONFIG ?

2017-03-27 Thread Michael Noll
Yes, agreed -- the re-thinking pre-existing notions is a big part of such conversations. A bit like making the mental switch from object-oriented programming to functional programming -- and, just like in this case, neither is more "right" than the other. Personal opinion/preference/context matte

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Hi, So, I simplified the topology by making sure we have only 1 source topic. Now I have 1 source topic, 8 partitions, 2 instances. And here is what the topology looks like: instance 1: KafkaStreams processID: 48b58bc0-f600-4ec8-bc92-8cb3ea081aac StreamsThread appId: mar-23-modular StreamsThread

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-27 Thread Matthias J. Sax
Sachin, about this statement: >> Also note that when an identical streams application with single thread on >> a single instance is pulling data from some other non partitioned identical >> topic, the application never fails. What about some "combinations": - single threaded multiple instances

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
Ara, thanks for the detailed information. If I parse this correctly, both instances run the same number of tasks (12 each). That is all Streams promises. To come back to your initial question: > Is there a way to tell kafka streams to uniformly assign partitions across > instances? If I have n

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Thanks for the response Matthias! The reason we want this exact task assignment to happen is that a critical part of our pipeline involves grouping relevant records together (that's what the aggregate function in the topology is for). And for hot keys this can lead to sometimes 100s of records t

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-27 Thread Eno Thereska
Also just a heads up that a PR that increases resiliency is being currently reviewed and should hopefully hit trunk soon: https://github.com/apache/kafka/pull/2719 . This covers certain broker failure scenarios as well as a (hopefully last) case when s

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Damian Guy
Hi Ara, There is a performance issue in the 0.10.2 release of session windows. It is fixed with this PR: https://github.com/apache/kafka/pull/2645 You can work around this on 0.10.2 by calling the aggregate(..), reduce(..) etc. methods and supplying a StateStoreSupplier with caching disabled, i.e., by

KafkaProducer overriding security.protocol config value and unable to connect to Broker

2017-03-27 Thread Srikrishna Alla
Hi everyone, I am facing an issue when writing to a Kafka broker. I am instantiating a KafkaProducer by passing it configuration properties, against a secured Kafka broker with SASL_PLAINTEXT as the security.protocol. I can see that I am passing the right security.protocol when instantiating the producer, but l
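
For comparison, a sketch of a minimal SASL_PLAINTEXT producer configuration; the broker address, Kerberos service name and JAAS file path are placeholders. The effective values can be checked in the "ProducerConfig values" block the producer logs at startup:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    props.put("security.protocol", "SASL_PLAINTEXT");
    props.put("sasl.kerberos.service.name", "kafka");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    // the JAAS configuration is typically supplied via the JVM flag
    // -Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);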

Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-27 Thread Florian Hussonnois
Hi Guozhang, Matthias, It's a great idea to add sub-topology descriptions. This would help developers to better understand the topology concept. I agree that it is not really user-friendly to have to check whether `StreamsMetadata#streamThreads` is returning null. The method name localThreadsMetadata looks go

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
Ara, I assume your performance issue is most likely related to the fix Damian pointed out already. Couple of follow up comments: > critical part of our pipeline involves grouping relevant records together Can you explain this a little better? The abstraction of a task does group data together a

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Holy s...! This led to ~4-5x better performance. I'm gonna try this with more nodes and if performance improves almost linearly then we are good for now. Thanks! Ara. > On Mar 27, 2017, at 2:10 PM, Damian Guy wrote: > > Hi Ara, > > There is a performance issue in the 0.10.2 release of session

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Let me clarify, cause I think we're using different terminology: - message key is phone number, reversed - all call records for a phone number land on the same partition - then we apply a session window on them and aggregate+reduce - so we end up with a group of records for a phone number. This

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
Great! So overall, the issue is not related to task assignment. Also, the description below does not indicate that a different task assignment would change anything. -Matthias On 3/27/17 3:08 PM, Ara Ebrahimi wrote: > Let me clarify, cause I think we're using different terminologies: > > - messa

Upgrade ssl.enabled.protocols to TLSv1.2

2017-03-27 Thread Samuel Zhou
Hi, In my previous server and consumer configuration, we had set ssl.enabled.protocols=TLSv1, but we want to upgrade to ssl.enabled.protocols=TLSv1.2 now. Since the default value of ssl.enabled.protocols supports 3 versions (TLSv1, TLSv1.1 and TLSv1.2), what should I do to force both broker and consumer to us
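
A sketch of the client-side change, assuming an SSL listener; the same ssl.enabled.protocols line would go into the broker's server.properties so that both sides only offer TLSv1.2:

    import java.util.Properties;

    Properties props = new Properties();
    props.put("security.protocol", "SSL");
    props.put("ssl.enabled.protocols", "TLSv1.2");   // restrict the handshake to TLSv1.2
    props.put("ssl.protocol", "TLSv1.2");            // protocol used to create the SSLContext
    // keystore/truststore settings omitted; add the usual ssl.truststore.* entries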

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-27 Thread Matthias J. Sax
Hi, I would like to trigger this discussion again. It seems that the naming question is rather subjective and both main alternatives (w/ or w/o the word "Topology" in the name) have pros/cons. If you have any further thought, please share it. At the moment I still propose `StreamsBuilder` in the

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Well, even with 4-5x better performance thanks to the session window fix, I expect to get ~10x better performance if I throw 10x more nodes at the problem. That won’t be the case due to task assignment unfortunately. I may end up with say 5-6 nodes with aggregation assigned to them and 4-5 nodes

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-27 Thread Sachin Mittal
- single threaded multiple instances This option we could not try. However, what we observed is that running multiple instances on the same machine with a single thread would still create multiple RocksDB instances, and somehow the VM is not able to handle many RocksDB instances running. Here the bottleneck used

Re: org.apache.kafka.common.errors.TimeoutException

2017-03-27 Thread R Krishna
Are you able to publish any messages at all? If it is a one-off, then it is possible that the broker is busy and the client is busy enough that it could not publish that batch of messages to partition 0 within 1732 ms, in which case you should increase the message timeouts and retries. Search the timeout e
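
A sketch of the producer settings usually involved, with illustrative values only (the right numbers depend on broker load and batch sizes):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    Properties props = new Properties();
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");  // default is 30000
    props.put(ProducerConfig.RETRIES_CONFIG, "5");                 // retry transient send failures
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "500");
    props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000");       // how long send() may block when the buffer is full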