Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Matthias J. Sax
I understand point (1) about when materialization happens. But I cannot follow your conclusion about how this should influence the DSL because I don't see a functional difference in "provide a store name in an overload" vs "call .materialize()" -- both mechanisms can do the exact same thing. I also
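
For readers skimming the thread, the two API shapes being compared would look roughly like this (hypothetical sketch only -- neither signature is part of the released API, and filter() stands in for any KTable operation):

    // Option A: store name passed via an overload (hypothetical)
    KTable<String, Long> t1 = table.filter((k, v) -> v > 0, "counts-store");

    // Option B: explicit materialize() call (hypothetical)
    KTable<String, Long> t2 = table.filter((k, v) -> v > 0).materialize("counts-store");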

Re: At Least Once semantics for Kafka Streams

2017-01-30 Thread Matthias J. Sax
Hi, yes, all examples have at-least-once semantics because this is the only "mode" Kafka Streams supports -- you cannot "disable" it. (btw: we are currently working on exactly-once for Streams, which you will be able to turn on/off). There is not much documentation about how it works internally, bec

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
Hi there, The inconsistency will be resolved, whether with materialize or overloaded methods. With the discussion on the DSL & stores I feel we've gone off on a slightly different tangent, which is worth discussing nonetheless. We have entered into an argument around the scope of the DSL. The DSL

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Jan Filipiak
Hi Eno, I have a really hard time understanding why we can't. From my point of view everything could be super elegant: DSL only + public API for the PAPI people, as already exists. The above approach of implementing a .get(K) on KTable is foolish in my opinion, as it would be too late to know that m
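
As context for the .get(K) debate: with the interactive-queries API that already exists (since 0.10.1), a materialized KTable's state is queried through the KafkaStreams handle rather than through the KTable itself. A minimal sketch, assuming a running instance `streams` and a table materialized under the store name "my-store":

    ReadOnlyKeyValueStore<String, Long> store =
        streams.store("my-store", QueryableStoreTypes.keyValueStore());
    Long value = store.get("some-key");  // local lookup; the key may live on another instance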

Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Jendrik Poloczek
Hi, I want to read the Kafka Streams JMX metrics using jmxterm, similar to this Kafka documentation: https://cwiki.apache.org/confluence/display/KAFKA/jmxterm+quickstart. I am using the same version: jmxterm-1.0-alpha-4-uber.jar. I managed to retrieve metrics from the Kafka Streams application v
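
For comparison, the same MBeans can be listed programmatically. A minimal sketch, assuming the Streams app was started with -Dcom.sun.management.jmxremote.port=9999 and with JMX authentication/SSL disabled:

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class StreamsJmxDump {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // List every MBean in the kafka.streams domain
                Set<ObjectName> names = mbsc.queryNames(new ObjectName("kafka.streams:*"), null);
                for (ObjectName name : names) {
                    System.out.println(name);
                }
            }
        }
    }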

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Eno Thereska
So I think there are several important discussion threads that are emerging here. Let me try to tease them apart: 1. inconsistency in what is materialized and what is not, what is queryable and what is not. I think we all agree there is some inconsistency there and this will be addressed with a

Re: Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Eno Thereska
Hi Jendrik, I haven't tried jmxterm. Can you confirm if it is able to access the Kafka producer/consumer metrics (they exist since Kafka Streams internally uses Kafka)? I've personally used jconsole to look at the collected streams metrics, but that might be limited for your needs. Thanks Eno

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Jan Filipiak
Hi Eno, thanks for putting it into different points. I want to put a few remarks inline. Best Jan On 30.01.2017 12:19, Eno Thereska wrote: So I think there are several important discussion threads that are emerging here. Let me try to tease them apart: 1. inconsistency in what is materialized

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

2017-01-30 Thread Gerard Klijs
Not really, as you can update the schema, and have multiple of them at the same time. By default each schema has to be backwards compatible, so you do have to exclude the specific topic you use with different schemas. With every write, the 'id' of the schema used is also written, so when you deserial
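
A minimal sketch of the producer side being described, assuming a Schema Registry at http://localhost:8081; the serializer registers each record's schema under the topic's subject and embeds the schema id in every message it writes:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081");
    // GenericRecords with different schemas can go to the same topic;
    // each message carries the id of the schema it was written with
    Producer<Object, Object> producer = new KafkaProducer<>(props);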

Re: Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Jendrik Poloczek
Hi Eno, I tried accessing the Kafka consumer and producer beans using:

    info -d kafka.consumer -b kafka.consumer:client-id=app-c7117b6f-3af1-473a-a87a-1d981574c071-StreamThread-1-consumer,type=kafka-metrics-count
    info -d kafka.producer -b kafka.producer:client-id=app-c7117b6f-3af1-473a-a87a-1d981

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

2017-01-30 Thread Mike Cargal
This helps some. We’re planning to write a non-homogeneous set of records to a single topic (to preserve order). There would be no compatibility between records of different types. I assume that if I set the schema compatibility for this subject to “none” this would not be a problem. (can you

kafka connect architecture

2017-01-30 Thread Koert Kuipers
i have been playing with kafka connect in standalone and distributed mode. i like standalone because: * i get to configure it using a file. this is easy for automated deployment (chef, puppet, etc.). configuration using a rest api i find inconvenient. * errors show up in log files instead of having
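
A minimal sketch of the standalone worker file being referred to (property names from the stock connect-standalone.properties that ships with Kafka; values are placeholders):

    bootstrap.servers=localhost:9092
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # standalone mode keeps source-connector offsets in a local file
    offset.storage.file.filename=/tmp/connect.offsets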

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

2017-01-30 Thread Andy Chambers
How about defining an avro union type containing all the schemas you wish to put on this topic (the schemas themselves could be defined independently and then bundled into an "uber-schema" at build time)? That means any messages you put on the topic must match one of the schemas defined in the uni
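
A minimal sketch of such an uber-schema, assuming two independently defined record schemas com.example.OrderCreated and com.example.OrderShipped; a top-level Avro union is just a JSON array listing the member schemas (here by full name, which assumes the named schemas are resolvable when the union is assembled at build time):

    ["com.example.OrderCreated", "com.example.OrderShipped"]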

Viewing timestamps with console consumer (0.10.1.1)

2017-01-30 Thread Meghana Narasimhan
Hi, Is there a way to view message timestamp using console consumer ? Thanks, Meghana

Re: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-30 Thread Matthias J. Sax
Hi, I think Eno's separation is very clear and helpful. In order to streamline this discussion, I would suggest we focus back on point (1) only, as this is the original KIP question. Even if I started the DSL design discussion somehow, because I thought it might be helpful to resolve both in a sin

Re: Cannot access Kafka Streams JMX metrics using jmxterm

2017-01-30 Thread Guozhang Wang
Hi Jendrik, Could you share with us what is your specified application id and client id? The reason that "app-c7117b6f-3af1-473a-a87a-1d981574c071" is used as the client id could be that the client id was not specified in the configs. Guozhang On Mon, Jan 30, 2017 at 4:36 AM, Jendrik Poloc
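
A minimal sketch of setting both ids explicitly (config keys from StreamsConfig; values are placeholders). If client.id is left unset, Streams derives per-thread client ids from the application id plus a process-unique suffix, which matches the pattern above:

    Properties config = new Properties();
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, "app");
    config.put(StreamsConfig.CLIENT_ID_CONFIG, "app-client");
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // builder is an existing KStreamBuilder with the topology defined
    KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(config));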

Re: Viewing timestamps with console consumer (0.10.1.1)

2017-01-30 Thread Amrit Jangid
Hi Meghana, Please try: kafka-console-consumer.sh --property print.timestamp=true - Amrit On Mon, Jan 30, 2017 at 10:20 PM, Meghana Narasimhan < mnarasim...@bandwidth.com> wrote: > Hi, > Is there a way to view message timestamp using console consumer ? > > Thanks, > Meghana >

Re: Viewing timestamps with console consumer (0.10.1.1)

2017-01-30 Thread Meghana Narasimhan
Hi Amrit, I tried that but received the following warning message, WARN The configuration 'print.timestamp' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig) Thanks, Meghana On Mon, Jan 30, 2017 at 1:09 PM, Amrit Jangid wrote: > Hi Meghana, > > > Please
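
If the formatter property isn't recognized in your version, the timestamp is always available from the Java consumer directly. A minimal sketch, assuming a topic named "my-topic":

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "ts-check");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("my-topic"));
    for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
        // timestampType() says whether this is CreateTime or LogAppendTime
        System.out.println(record.timestamp() + " (" + record.timestampType() + ") " + record.value());
    }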

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

2017-01-30 Thread Mike Cargal
If it comes to that we may consider it. However we will have a LOT of different schemas coming through and new ones added frequently. (Seems we’ve also seen issues that the Schema Registry doesn’t allow references to anything not in the same “file” for lack of a better term, that would become

Understanding output of KTable->KTable join

2017-01-30 Thread Jon Yeargers
I want to do a one:many join between two streams. There should be ~1:100 with <1% having no match. My topology is relatively simple:

    KTable1.join(KTable2) ---> to("other topic")
                          \
                           \--> toStream().print()

In the join it takes both Value1 and Value2 as JSON, converts
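
Rendered as code, that topology would look roughly like this (a sketch; the table names, topic name, and mergeJson() helper are placeholders, and join() is the primary-key KTable-KTable join):

    KTable<String, String> joined =
        table1.join(table2, (value1, value2) -> mergeJson(value1, value2));
    joined.to(Serdes.String(), Serdes.String(), "other-topic");  // write the changelog to another topic
    joined.toStream().print();                                   // and print it for debugging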

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-01-30 Thread Guozhang Wang
Hello Folks, We have addressed all the comments collected so far, and would like to propose a voting thread this Wednesday. If you have any further comments on this KIP, please feel free to continue sending them on this thread before that. Guozhang On Mon, Jan 30, 2017 at 1:10 PM, Jason Gustaf

Re: Understanding output of KTable->KTable join

2017-01-30 Thread Matthias J. Sax
If you join two KTables, a one-to-many join is currently not supported (only one-to-one, i.e., primary-key join). In upcoming 0.10.2 there will be global KTables that allow something similar to one-to-many joins -- however, only for KStream-GlobalKTable joins, so not sure if this can help you. About
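
For reference, the upcoming KStream-GlobalKTable join is shaped roughly like this (a sketch against the 0.10.2 API; Order, Customer, and EnrichedOrder are placeholder types). The KeyValueMapper extracts the lookup key from each stream record, which is what enables the one-to-many-style enrichment:

    KStream<String, Order> orders = builder.stream("orders");
    GlobalKTable<String, Customer> customers = builder.globalTable("customers", "customers-store");
    KStream<String, EnrichedOrder> enriched = orders.join(customers,
        (orderId, order) -> order.customerId,                      // map each record to its lookup key
        (order, customer) -> new EnrichedOrder(order, customer));  // combine the matched pair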

Re: "End of Batch" event

2017-01-30 Thread Eric Dain
Thanks Matthias for your reply. I'm not trying to stop the application. I'm importing inventory from CSV files coming from 3rd party sources. The CSVs are snapshots of each source's inventory. I need to delete all items from that source that don't exist in the latest CSV file. I was thinking o

Re: Ideal value for Kafka Connect Distributed tasks.max configuration setting?

2017-01-30 Thread Ewen Cheslack-Postava
On Fri, Jan 27, 2017 at 10:49 AM, Phillip Mann wrote: > I am looking to productionize and deploy my Kafka Connect application. > However, there are two questions I have about the tasks.max setting which > is required and of high importance but details are vague for what to > actually set this va
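
For orientation, a minimal sketch of where tasks.max sits in a connector config (using the FileStreamSinkConnector that ships with Kafka; names and paths are placeholders). Note it is a ceiling: the connector may create fewer tasks than this:

    name=local-file-sink
    connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
    tasks.max=8
    topics=my-topic
    file=/tmp/sink.txt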

Re: special characters in kafka log

2017-01-30 Thread Ewen Cheslack-Postava
Not sure what special characters you are referring to, but for data in the key and value fields in Kafka, it handles arbitrary binary data. "Special characters" aren't special because Kafka doesn't even inspect the data it is handling: clients tell it the length of the data and then it copies that
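
A minimal sketch illustrating the point: with the byte-array serializer, any bytes at all round-trip through Kafka untouched (topic name is a placeholder):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
    byte[] anything = {0, -1, 10, 13, 27};  // "special" bytes are just bytes to the broker
    producer.send(new ProducerRecord<>("my-topic", anything));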

Re: kafka connect architecture

2017-01-30 Thread Ewen Cheslack-Postava
On Mon, Jan 30, 2017 at 8:24 AM, Koert Kuipers wrote: > i have been playing with kafka connect in standalone and distributed mode. > > i like standalone because: > * i get to configure it using a file. this is easy for automated deployment > (chef, puppet, etc.). configuration using a rest api i

Re: Kafka JDBC connector vs Sqoop

2017-01-30 Thread Ewen Cheslack-Postava
For MySQL you would either want to use Debezium's connector (which can handle bulk dump + incremental CDC, but requires direct access to the binlog) or the JDBC connector (does an initial bulk dump + incremental queries, but has limitations compared to a "true" CDC solution). Sqoop and the JDBC co
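
A minimal sketch of the JDBC-connector option for MySQL (config keys from Confluent's JdbcSourceConnector; connection details and column names are placeholders). mode=timestamp+incrementing is what produces the initial dump followed by incremental queries:

    name=mysql-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:mysql://localhost:3306/mydb?user=user&password=pass
    mode=timestamp+incrementing
    timestamp.column.name=updated_at
    incrementing.column.name=id
    topic.prefix=mysql-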

Re: using kafka log compaction withour key

2017-01-30 Thread Ewen Cheslack-Postava
The log compaction functionality uses the key to determine which records to deduplicate. You can think of it (very roughly) as deleting entries from a hash map as the value for each key is overwritten. This functionality doesn't have much of a point unless you include keys in your records. -Ewen
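
A minimal sketch of the two operations compaction cares about, assuming a compacted topic named "table-topic": a write with an existing key supersedes the old value, and a null value is a tombstone that deletes the key:

    producer.send(new ProducerRecord<>("table-topic", "user-42", "v2"));  // supersedes any earlier value for user-42
    producer.send(new ProducerRecord<>("table-topic", "user-42", null)); // tombstone: user-42 is eventually removed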

Re: Upgrade questions

2017-01-30 Thread Ewen Cheslack-Postava
Note that the documentation that you linked to for upgrades specifically lists configs that you need to be careful to adjust in your server.properties. In fact, the server.properties shipped with Kafka is meant for testing only. There are some configs in the example server.properties that are not
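
For the rolling-upgrade case specifically, the settings the upgrade notes call out are the protocol and message-format versions, e.g. when moving a 0.10.0 cluster to 0.10.1 (versions here are illustrative):

    # keep these pinned to the old version during the rolling bounce,
    # then bump them once all brokers run the new code
    inter.broker.protocol.version=0.10.0
    log.message.format.version=0.10.0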

Re: kafka_2.10-0.8.1 simple consumer retrieves junk data in the message

2017-01-30 Thread Ewen Cheslack-Postava
What are the 26 additional bytes? That sounds like a header that a decoder/deserializer is handling with the high level consumer. What class are you using to deserialize the messages with the high level consumer? -Ewen On Fri, Jan 27, 2017 at 10:19 AM, Anjani Gupta wrote: > I am using kafka_2.1

Re: When publishing to non existing topic, TimeoutException is thrown instead of UnknownTopicOrPartitionException

2017-01-30 Thread Ewen Cheslack-Postava
Stevo, Agreed that this seems broken if we're just timing out trying to fetch metadata if we should be able to tell that the topic will never be created. Clients can't explicitly tell whether auto topic creation is on. Implicit indication via the error code seems like a good idea. My only concern