[DISCUSS] KIP-372: Naming Joins and Grouping

2018-09-12 Thread Bill Bejeck
All I'd like to start a discussion on KIP-372 for the naming of joins and grouping operations in Kafka Streams. The KIP page can be found here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Joins+and+Grouping I look forward to feedback and comments. Thanks, Bill

Re: SAM Scala aggregate

2018-09-12 Thread John Roesler
I'm not 100% certain, but you might need to do "import _root_.scala.collection.JavaConverters._" etc. Sometimes, you run into trouble with ambiguity if the compiler can't tell if "scala" references the top-level package or the intermediate package inside Streams. Hope this helps! -John On Wed, Se

Re: KStreams / KSQL processes that span multiple clusters

2018-09-12 Thread John Roesler
Hi Elliot, This is not currently supported, but I, for one, think it would be awesome. It's something I have considered tackling in the future. Feel free to create a Jira ticket asking for it (but please take a minute to search for preexisting tickets). Offhand, my #1 concern would be how it wor

Re: SAM Scala aggregate

2018-09-12 Thread Michael Eugene
Hey, thanks for the help everyone. I'm going to use the new Scala 2.0 libraries. I'm getting the craziest error when building this, though, but I'm not a Maven expert. I have to use Maven right now (not sbt) because I don't own this project at work. Anyway, whenever I add the Maven dependency - or

Re: Timing state changes?

2018-09-12 Thread John Roesler
Hi Tim, The general approach used by Streams is resilience by wrapping all state updates in a "changelog topic". That is, when Streams updates a key/value pair in the state store, it also sends the update to a special topic associated with that store. The record is only considered "committed" aka
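John's changelog mechanism can be sketched in plain Java. This is a toy simulation, not the real Streams API: the `Map` stands in for the state store and the `List` for the changelog topic, but the recovery idea is the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a changelog-backed state store: every put is mirrored to an
// append-only "changelog" so the state can be rebuilt after a crash.
public class ChangelogStore {
    final Map<String, String> store = new HashMap<>();
    final List<String[]> changelog = new ArrayList<>(); // stands in for the changelog topic

    void put(String key, String value) {
        changelog.add(new String[]{key, value}); // send the update to the changelog
        store.put(key, value);                   // then update local state
    }

    // Recovery: replay the changelog into a fresh store.
    static Map<String, String> restore(List<String[]> changelog) {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] kv : changelog) rebuilt.put(kv[0], kv[1]);
        return rebuilt;
    }

    public static void main(String[] args) {
        ChangelogStore s = new ChangelogStore();
        s.put("a", "1");
        s.put("a", "2");
        s.put("b", "3");
        // Simulate losing the local store and rebuilding from the changelog.
        System.out.println(restore(s.changelog).equals(s.store)); // true: state recovered
    }
}
```

In real Streams the changelog is a compacted Kafka topic, so replay cost stays proportional to the number of distinct keys rather than the number of updates.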

Re: Need info

2018-09-12 Thread Mich Talebzadeh
Hi James, As a matter of interest, is this streaming data fed into some Operational Data Store (ODS) like MongoDB? In general, using this method will create a near-real-time snapshot for business users and customers. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=A

Re: Need info

2018-09-12 Thread James Kwan
We have banking customers sending data from DB2 z to Kafka on Linux (not cloud) at a transaction rate of 30K per second. Kafka can handle more than this rate. > On Sep 12, 2018, at 2:31 AM, Chanchal Chatterji > wrote: > > Hi, > > In the process of mainframe modernization, we are attempting to str

Re: Java 11 OpenJDK/Oracle Java Release Cadence Questions

2018-09-12 Thread Jeremiah Adams
Thank you Ismael. Jeremiah Adams Software Engineer www.helixeducation.com Blog | Twitter | Facebook | LinkedIn From: Ismael Juma Sent: Wednesday, September 12, 2018 8:54 AM To: Kafka Users Subject: Re: Java 11 OpenJDK/Oracle Java Release Cadence Question

Re: Java 11 OpenJDK/Oracle Java Release Cadence Questions

2018-09-12 Thread Ismael Juma
The release plan: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044 Ismael On Wed, Sep 12, 2018 at 7:51 AM Jeremiah Adams wrote: > Thanks Ismael, > > Is there a rough estimated time of arrival for Kafka 2.1.0? > > > Jeremiah Adams > Software Engineer > www.helixeducatio

Re: Java 11 OpenJDK/Oracle Java Release Cadence Questions

2018-09-12 Thread Jeremiah Adams
Thanks Ismael, Is there a rough estimated time of arrival for Kafka 2.1.0? Jeremiah Adams Software Engineer www.helixeducation.com Blog | Twitter | Facebook | LinkedIn From: Ismael Juma Sent: Tuesday, September 11, 2018 6:04 PM To: Kafka Users Subject:

RE: Need info

2018-09-12 Thread Tauzell, Dave
If you size your cluster right, you can send large messages of many megabytes. We send lots (millions per day) of medium-sized messages (5-10 KB) without any issues. -Dave -Original Message- From: Chanchal Chatterji [mailto:chanchal.chatte...@infosys.com] Sent: Wednesday, September 12, 2
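Dave's sizing advice maps to a few producer settings. A minimal sketch, assuming the default 1 MB cap needs raising; the config names are real Kafka producer settings, but the values and broker address are illustrative, and the broker-side message.max.bytes (and topic-level max.message.bytes) must be raised to match:

```java
import java.util.Properties;

// Illustrative producer settings for larger messages. max.request.size caps
// the size of a single produce request on the client side.
public class LargeMessageConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder address
        props.setProperty("max.request.size", String.valueOf(10 * 1024 * 1024)); // 10 MB cap
        props.setProperty("buffer.memory", String.valueOf(64L * 1024 * 1024));   // room to batch
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("max.request.size")); // 10485760
    }
}
```

Raising these limits trades memory and replication bandwidth for message size, which is the hardware tradeoff discussed elsewhere in this thread.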

RE: Timing state changes?

2018-09-12 Thread Tim Ward
From: John Roesler > As you noticed, a windowed computation won't work here, because you would > be wanting to alert on things that are absent from the window. > Instead, you can use a custom Processor with a Key/Value store and schedule > punctuations to send the alerts. For example, you can sto
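The absence-alerting pattern John describes can be sketched without the Streams API: a map of last-seen timestamps stands in for the Key/Value store, and a periodic scan stands in for the scheduled punctuation. All names here are illustrative, not part of any Kafka API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy version of "alert on absence": record the last time each key was seen;
// a periodic scan (the punctuator) flags keys silent longer than the threshold.
public class AbsenceAlerter {
    final Map<String, Long> lastSeen = new HashMap<>();
    final long thresholdMs;

    AbsenceAlerter(long thresholdMs) { this.thresholdMs = thresholdMs; }

    void onRecord(String key, long timestampMs) {
        lastSeen.put(key, timestampMs); // state-store update on each incoming record
    }

    // Would run inside a scheduled punctuation in a real Processor.
    List<String> punctuate(long nowMs) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > thresholdMs) alerts.add(e.getKey());
        }
        return alerts;
    }

    public static void main(String[] args) {
        AbsenceAlerter a = new AbsenceAlerter(1000);
        a.onRecord("sensor-1", 0);
        a.onRecord("sensor-2", 900);
        System.out.println(a.punctuate(1500)); // only sensor-1 has been silent too long
    }
}
```

In a real Processor the map would be a persistent Key/Value store (so it survives restarts via the changelog) and punctuate would be registered with the ProcessorContext on a wall-clock or stream-time schedule.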

Re: Kafka compression - results

2018-09-12 Thread darekAsz
I ran the tests with this command: bin/kafka-producer-perf-test.sh --topic topicname --num-records 5 --throughput -1 --producer.config config/producer.properties --record-size 64 Wed., 12 Sep 2018 at 13:38, Liam Clarke wrote: > 500 million * 64B is 32GB. Are you sure you actually sent 500

Re: Need info

2018-09-12 Thread Liam Clarke
So you need to figure out your needs. Kafka can deliver near-real-time streaming, and it can function as a data store. It can handle quite large messages if you want, but there are tradeoffs - you'd obviously need more hardware. I have no idea how many MB a bank transaction is, but you nee

Re: Kafka compression - results

2018-09-12 Thread Liam Clarke
500 million * 64B is 32GB. Are you sure you actually sent 500 million messages? (I assumed that mln = million) On Wed, 12 Sep. 2018, 9:54 pm darekAsz, wrote: > sorry, I wrote bad results :/ > here are correctly > > Directory size after sending uncompressed data: 1.5 GB > > Directory size after s
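Liam's back-of-envelope figure is easy to verify: 500 million 64-byte records is 32 GB (decimal) before any broker overhead or compression.

```java
// Sanity check on the thread's arithmetic: records * bytes-per-record.
public class SizeCheck {
    static long totalBytes(long records, long bytesEach) {
        return records * bytesEach;
    }

    public static void main(String[] args) {
        long total = totalBytes(500_000_000L, 64);
        System.out.println(total);                   // 32000000000 bytes
        System.out.println(total / 1_000_000_000.0); // 32.0 GB (decimal)
    }
}
```

The 1.5 GB directory size reported earlier in the thread is therefore far below what 500 million uncompressed 64 B records would occupy, which supports Liam's suspicion about the actual record count.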

Re: Kafka compression - results

2018-09-12 Thread darekAsz
sorry, I posted the wrong results :/ here are the correct ones Directory size after sending uncompressed data: 1.5 GB Directory size after sending data compressed with gzip: 1.2 GB Directory size after sending data compressed with snappy: 366 MB Directory size after sending data compressed with lz4: 2.4 GB

KStreams / KSQL processes that span multiple clusters

2018-09-12 Thread Elliot West
Hello, Apologies if this is a naïve question, but I'd like to understand if and how KStreams and KSQL can deal with topics that reside in more than one cluster. For example, is it possible for a single KS[treams|QL] application to: 1. Source from a topic on cluster A 2. Produce/Consume to i

RE: Kafka compression - results

2018-09-12 Thread Chanchal Chatterji
It's not just lz4; except in the case of gzip, everything else increases the directory size. -Original Message- From: darekAsz Sent: Wednesday, September 12, 2018 2:43 PM To: users@kafka.apache.org Subject: Kafka compression - results Hi I made some tests of compression in kafka. First I

Kafka compression - results

2018-09-12 Thread darekAsz
Hi, I made some tests of compression in Kafka. First I wanted to check the speed of the producer with compression. Here are my results: with no compression: 112.58 MB/s with gzip compression: 63.24 MB/s with snappy compression: 132.43 MB/s with lz4 compression: 136.66 MB/s Then I wanted to check the siz
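For reference, darekAsz's throughput numbers normalized against the uncompressed baseline. The figures are taken verbatim from the message above; the code just computes the ratios:

```java
// Normalize the reported producer throughputs against the no-compression baseline.
public class ThroughputRatios {
    static double ratio(double compressedMBs, double baselineMBs) {
        return compressedMBs / baselineMBs;
    }

    public static void main(String[] args) {
        double baseline = 112.58; // MB/s with no compression
        System.out.printf("gzip:   %.2fx%n", ratio(63.24, baseline));  // well below 1x: slower
        System.out.printf("snappy: %.2fx%n", ratio(132.43, baseline)); // above 1x: faster
        System.out.printf("lz4:    %.2fx%n", ratio(136.66, baseline)); // fastest of the four
    }
}
```

The pattern is typical: gzip trades throughput for ratio, while snappy and lz4 are cheap enough that smaller network payloads more than pay for the compression CPU.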

RE: Need info

2018-09-12 Thread Chanchal Chatterji
In simple words, it is like this: We have an MF application which is sending statement data to Kafka from internal data sources after some processing, which would later be pushed to the cloud (through Kafka) and staged in an Amazon S3 bucket. The first time, the entire relevant data will be pus

Re: Need info

2018-09-12 Thread Mich Talebzadeh
Hi, As I understand you are trying to create an operational data store from your transactional database(s) upstream? Do you have stats on the rate of DML in the primary source? These insert/update/deletes need to pass to Kafka as messages. Besides what Kafka can handle (largely depending on the a

RE: Need info

2018-09-12 Thread Chanchal Chatterji
We are planning to produce bank statements out of data traversing through Kafka (a simple example would be a bank statement for a savings account / current account in printable format in our daily life). So your three suggestions: 1. Build your cluster right 2. Size your message righ

Re: Need info

2018-09-12 Thread Liam Clarke
The answer to your question is "It depends". If you build your cluster right, size your messages right, and tune your producers right, you can achieve near-real-time transport of terabytes of data a day. There's been plenty of articles written about Kafka performance, e.g. https://engineering.lin
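The "tune your producers right" part typically comes down to a handful of settings. A hedged sketch, using real Kafka producer config names but illustrative values that should be benchmarked against the actual workload:

```java
import java.util.Properties;

// Illustrative throughput-oriented producer settings: bigger batches, a small
// linger to let batches fill, and compression to cut bytes on the wire.
public class ThroughputTuning {
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092");      // placeholder address
        props.setProperty("batch.size", String.valueOf(256 * 1024)); // 256 KB batches
        props.setProperty("linger.ms", "20");         // wait up to 20 ms to fill a batch
        props.setProperty("compression.type", "lz4"); // cheap CPU, good ratio
        props.setProperty("acks", "all");             // durability; relax only if loss is acceptable
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("compression.type")); // lz4
    }
}
```

For bank statements, acks=all with a replicated topic is the safe default; batch.size and linger.ms are the knobs that trade per-record latency for aggregate throughput.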

Need info

2018-09-12 Thread Chanchal Chatterji
Hi, In the process of mainframe modernization, we are attempting to stream mainframe data to the AWS Cloud using Kafka. We are planning to use the Kafka 'Producer API' on the mainframe side and the 'Connector API' on the cloud side. Since our data is processed by a module called 'Central dispatch' located in