I am using RocksDbSessionBytesStoreSupplier in my Kafka Streams application
for an aggregation like this:
var materialized =
    Materialized.<...>as(
        new RocksDbSessionBytesStoreSupplier(env.getProperty("messages.cdc.pft.topic", "N
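Not from the original thread, but as a reconstruction aid: a minimal sketch of how Materialized.as(...) is usually wired up with this (internal) supplier class. The store name, key/value types, and retention period below are placeholders, and I believe the constructor takes a store name plus a retention period in milliseconds; the public Stores.persistentSessionStore(...) factory is the safer route.

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.SessionStore;
import org.apache.kafka.streams.state.internals.RocksDbSessionBytesStoreSupplier;

// Sketch only: store name, key/value types, and retention are placeholders.
long retentionMs = 24 * 60 * 60 * 1000L;  // assumed 1-day retention
Materialized<String, Long, SessionStore<Bytes, byte[]>> materialized =
    Materialized.<String, Long>as(
        new RocksDbSessionBytesStoreSupplier("my-session-store", retentionMs));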
Thank you Matthias - we're using version 1.0. I can tell my team to relax
and look at upgrading :)
On Mon, May 14, 2018 at 3:48 PM, Matthias J. Sax
wrote:
It depends on your version. The behavior is known and we put one
improvement into 1.1 release: https://github.com/apache/kafka/pull/4410
Thus, it's "by design" (for 1.0 and older) but we want to improve it.
Cf: https://issues.apache.org/jira/browse/KAFKA-4969
-Matthias
On 5/13/18 7:52 PM, Lia
Hi all,
We are running a Kafka Streams app with a basic topology of:
consume from topic A -> transform and write through topic B (making the app
a consumer of topic B also) -> finally write to topic C.
We are running it with two instances of the application. Topic A has 100
partitions, topics B and
Is your network saturated? If yes, you can try to start more instances
of Kafka Streams instead of running multiple threads within one
instance, to increase the available network capacity.
-Matthias
On 2/8/18 12:30 AM, ? ? wrote:
Hi:
I use Kafka Streams for real-time analysis.
I set the number of stream threads equal to the number of partitions of the topic
and set ConsumerConfig.max_poll_records = 500.
I only use the foreach method in this application,
but I find that with a large number of records in Kafka, the consumer LAG is
sometimes big and triggers a Kafka topic rebalance.
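Not part of the original mail, but a minimal sketch of where those settings usually live in a Streams configuration (Kafka Streams 1.x era API; application id, broker address, and values are examples only). Rebalances like the ones described are often caused by max.poll.interval.ms being exceeded rather than by max.poll.records itself:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "realtime-analysis");   // example id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // example broker
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 8);                 // e.g. one thread per owned partition
// Consumer-level settings are passed through with the consumer prefix:
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 500);
// If processing one batch takes longer than max.poll.interval.ms, the consumer
// is dropped from the group and a rebalance is triggered; raising it can help:
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), 600000);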
You can add more instances of your application, to allow processing the
incoming data in parallel.
On 6 November 2017 at 20:11, Ranjit Kumar wrote:
Hi,
I am using Kafka Streams and a state store in my application using Java, but
my application logic is taking more time (around 5 ms) to pick up the next
packet from the queue; due to that, data is piling up in the topic queue (the
mean waiting time to pick the next packet from the queue is increasing).
Can you
Hi Kafka Community,
I would like to know if Kafka can stream out data including large
objects (with images and videos - traditionally known as BLOBs) from one
RDBMS system to another with better performance?
Is this something Kafka is good at?
I understand that Kafka can do it and would like t
Hi,
I ran my application on an AWS machine with 32 GB of RAM. The application is
written in Java using the Kafka Streams low-level API. When I run it for 72
hours or more while messages keep being pumped into the Kafka topic, the
entire 32 GB of memory on the AWS machine gets exhausted. While I ran my
Hi Sachin,
If you check kafka-run-class.bat you can see that when environment variable
KAFKA_LOG4J_OPTS is not provided, a default log4j configuration under
"tools" will be loaded. So setting the environment variable to something
like
"-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4
Hi,
As suggested this is how I am starting my stream
D:\kafka_2.10-0.10.1.1>bin\windows\kafka-run-class.bat -Dlog4j.debug
-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties
TestKafkaWindowStream
log4j: Using URL
[file:D:/kafka_2.10-0.10.1.1/config/tools-log4j.properties] fo
On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax wrote:
> You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
> even more detailed).
>
> There is not a lot of documentation for 0.10.0 or 0.10.1, but it works
> the same way as for the consumer/producer metrics that are documented.
>
> -Matthias
On 1/24/17 10:38 PM, Sachin Mittal wrote:
Hi All,
I am running a Kafka Streams application with a simple pipeline of:
source topic -> group -> aggregate by key -> for each -> save to a sink.
The source topic gets messages at a rate of 5000 - 1 messages per second.
During peak load we see the delay reaching to 3 million mes
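As an illustration only (not code from this thread), a sketch of that pipeline shape in a newer Streams DSL; topic and store names are made up and the value type is assumed to be Long to keep it small:

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Long> source = builder.stream("source-topic");          // source topic
source.groupByKey()                                                     // group
      .aggregate(
          () -> 0L,                                                     // initializer
          (key, value, agg) -> agg + value,                             // aggregate by key
          Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("agg-store"))
      .toStream()
      .foreach((key, agg) ->                                            // for each -> save to a sink
          System.out.println(key + " -> " + agg));                      // stand-in for the real sink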
On Mon, Dec 26, 2016 at 8:07 AM, 황보동규 wrote:
Hi there!
I'm a newbie with Kafka.
I'm interested in streaming services, especially Kafka Streams, but I have
no idea what the difference is between Kafka Streams and Samza.
Both have similar architecture and functionality, I think.
What's the main difference? What are the pros and cons? It's
ke it work -- but it is not recommended.
-Matthias
On 12/17/16 10:42 PM, Sachin Mittal wrote:
Hi folks,
I need a bit of feedback from you based on your experiences using Kafka
Streams applications.
We have a replicated Kafka cluster running in a data center in one city.
We are running a Kafka Streams application which reads from a source
topic on that cluster and commits the output
Hello Mohit,
I'm copy-pasting Mathieu's previous email on making Streams work on
Windows; note it was for 0.10.0.1, but I think the process should be very
similar.
To any who follow in my footsteps, here is my trail:
1. Upgrade to at least Kafka Streams 0.10.0.1 (currently only in
I just cloned 3.1x and tried to run a test. I am still seeing rocksdb error:
Exception in thread "StreamThread-1" java.lang.UnsatisfiedLinkError:
C:\Users\manchlia\AppData\Local\Temp\librocksdbjni108789031344273.dll:
Can't find dependent libraries
On Mon, Oct 24, 2016 at 11:26 AM, Matthias J
ould share some of your code so we can have a look? One
>> thing I'd check is if you are using compacted Kafka topics. If so, and if
>> you have non-unique keys, compaction happens automatically and you might
>> only see the latest value for a key.
>>
>> Tha
On 18 Nov 2016, at 13:49, Ryan Slade wrote:
Hi
I'm trialling Kafka Streaming for a large stream processing job; however,
I'm seeing message loss even in the simplest scenarios.
I've tried to boil it down to the simplest scenario where I see loss, which
is the following:
1. Ingest messages from an input stream (String, St
can think of is that I can set some global variable in the output sink.
So the next time the aggregate function runs, it can look up the global
variable and remove items from the list. So new list = old list + new
values added - old values removed.
In Spark we have something like broadcast variables to do the same.
Is there any such similar concept in Kafka Streams?
This way we can keep the changelog topic messages from growing and prevent
the max message bytes exception.
Thanks
Sachin
On Fri, Nov 11
Sachin,
my comment about deleting was about deleting values from the list, not
about deleting the whole key/value record.
If you want to delete a whole key/value record when there is no update
for it for some time, you can combine compaction with re
Hi Sachin,
You can achieve what you want by setting the correct cleanup.policy on
these topics.
In this case you want cleanup.policy=compact,delete - you'll also want to
set retention.ms and/or retention.bytes.
The topic will then be compacted, but it will also delete any segments
based on the re
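For later readers, a sketch of applying such a config to an existing changelog topic from Java. Note this uses the AdminClient, which arrived in a later Kafka release than the 0.10.x versions discussed here; the topic name and retention values are placeholders:

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MakeChangelogCompactAndDelete {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder topic name; internal changelogs follow "<app-id>-<store>-changelog".
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "my-app-my-store-changelog");
            Config config = new Config(Arrays.asList(
                new ConfigEntry("cleanup.policy", "compact,delete"),
                new ConfigEntry("retention.ms", "604800000"),        // 7 days, example value
                new ConfigEntry("retention.bytes", "1073741824"))); // 1 GiB, example value
            admin.alterConfigs(Collections.singletonMap(topic, config)).all().get();
        }
    }
}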
Hi,
As per Eno's suggestion, I have pre-created the internal changelog topics with
an increased max.message.bytes config to handle the big messages that grow
over time.
As Matthias has pointed out, we cannot use the retention.ms setting to delete
older message data after a given time; is there a way
My two cents:
Changelog topics are compacted topics, thus they do not have a
retention time (there is an exception for windowed KTable changelog
topics, which are compacted and do have a retention time though).
However, I do not understand how changin
Hi Sachin,
One option right now would be to pre-create all internal topics in Kafka, and
only after that start the Kafka Streams application. This would require you
to know the internal names of the topics (in this case you probably already
know them, but I agree that in general this is a bit cumbe
Per-message payload size. The basic question is how I can control the
internal changelog topic parameters so as to avoid these errors.
On Tue, Nov 8, 2016 at 11:37 PM, R Krishna wrote:
Are you talking about total messages and therefore size, or per-message
payload size?
On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal wrote:
The message size itself increases over time.
A message is something like
key=[list of objects]
This grows with time, and then at a point Kafka is not able to add any
message to its topic because the message size is greater than max.message.bytes.
Since this is an internal topic based off a table I d
Hi Sachin,
Could you clarify what you mean by "message size increases"? Are messages going
to the changelog topic increasing in size? Or is the changelog topic getting
full?
Thanks
Eno
On 8 Nov 2016, at 16:49, Sachin Mittal wrote:
Hi,
We are using aggregation by key on a KStream to create a KTable.
As I read from
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams%3A+Internal+Data+Management
it creates an internal changelog topic.
However, as the streaming application runs over time, the message size
increases and
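For context, a rough sketch of the kind of aggregation being described, written against a newer DSL than the 0.10.x version in this thread; names, types, and the serde are placeholders. The per-key list is what keeps growing until a record exceeds max.message.bytes on the changelog topic:

import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("events-topic");   // example topic
Serde<List<String>> listSerde = null;   // placeholder: supply a real Serde<List<String>> here

// Each update appends to the per-key list, so the stored (and changelogged)
// value only ever grows unless the application itself removes old entries.
KTable<String, List<String>> perKeyList = events
    .groupByKey()
    .aggregate(
        ArrayList::new,
        (key, value, list) -> { list.add(value); return list; },
        Materialized.<String, List<String>, KeyValueStore<Bytes, byte[]>>as("per-key-list")
            .withValueSerde(listSerde));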
Logs would be very helpful; I can look further into it.
Thanks, Ara.
On Mon, Oct 24, 2016 at 11:04 PM, Ara Ebrahimi wrote:
This was in 10.1.0. What happened was that a kafka broker went down and then
this happened on the kafka streaming instance which had connection to this
broker. I can send you all logs I got.
Ara.
On Oct 24, 2016, at 10:41 PM, Guozhang Wang wrote:
Hello Ara,
It's a client issue... But CP 3.1 should be out in about 2 weeks...
Of course, you can use Kafka 0.10.1.0 for now. It was released last
week and does contain the fix.
-Matthias
On 10/24/16 9:19 AM, Mohit Anchlia wrote:
Would this be an issue if I connect to a remote Kafka instance running on
the Linux box? Or is this a client issue? What's RocksDB used for, to keep
state?
On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax wrote:
Kafka 0.10.1.0, which was released last week, does contain the fix
already. The fix will be in CP 3.1, coming up soon!
(Sorry that I mixed up versions in a previous email.)
-Matthias
On 10/23/16 12:10 PM, Mohit Anchlia wrote:
You can build librocksdbjni locally to fix it.
I did that in my case.
It is a bit tricky and you need MS Visual Studio 15.
I suggest using the following link:
http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%3CCAHoiPjweo-xSj3TiodcDVf4wNnnJ8u6PcwWDPF7LT5ps%2BxQ3eA%40mail.gmail.com%3E
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:98)
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
... 13 more
Ara.
On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi wrote:
Hi,
This happens when I hammer our 5 kafka streaming nodes (each with 4 streaming
threads) hard enough for an hour or so:
2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2]
Failed to flush state for StreamTask 3_8
So if I get it right, I will not have this fix for 4 months? Should I just
create my own example with the next version of Kafka?
On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax wrote:
The current version is 3.0.1.
CP 3.1 should be released in the next few weeks.
So CP 3.2 should be there in about 4 months (Kafka follows a time-based
release cycle of 4 months and CP usually aligns with Kafka releases).
-Matthias
On 10/20/16 5:10 PM, Mohit Anch
Any idea of when 3.2 is coming?
On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax wrote:
I mixed up version numbers... The current CP version is 3.0.1 and not 3.1.
I.e., 3.0.1 contains Kafka 0.10.0.1, and 3.1, which will be released soon,
will contain Kafka 0.10.1.0.
The examples master branch uses CP-3.1-SNAPSHOT.
Sorry for the confusion.
On 10/20/16 4:53 P
No problem. Asking questions is the purpose of mailing lists. :)
The issue will be fixed in the next version of the examples branch.
The examples branch is built with a CP dependency and not with a Kafka
dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0
So this issue I am seeing is fixed in the next version of the examples
branch? Can I change my pom to point to the higher version of Kafka, if
that is the issue? Or do I need to wait until the new branch is made
available? Sorry, lots of questions :)
On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax wrote:
The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros and
ones -- super easy to mix up).
However, the examples master branch uses CP-3.1-SNAPSHOT (i.e., Kafka
0.10.1.0) -- there will be a 0.10.1 examples branch after CP-3.1 is
released.
-Ma
I just now cloned this repo. It seems to be using 10.1
https://github.com/confluentinc/examples and running examples in
https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll wrote:
I suspect you are running Kafka 0.10.0.x on Windows? If so, this is a
known issue that is fixed in Kafka 0.10.1 that was just released today.
Also: which examples are you referring to? And, to confirm: which git
branch / Kafka version / OS in case my guess above was wrong.
On Thursday, October
I am trying to run the examples from git. While running the wordcount
example I see this error:
Caused by: java.lang.RuntimeException: librocksdbjni-win64.dll was not
found inside JAR.
Am I expected to include this jar locally?
Haha, I feel the same pain with you man.
On Tue, Oct 11, 2016 at 8:59 PM, Ali Akhtar wrote:
Thanks. That filter() method is a good solution. But whenever I look at it,
I feel an empty spot in my heart which can only be filled by:
filter(Optional::isPresent)
On Wed, Oct 12, 2016 at 12:15 AM, Guozhang Wang wrote:
Ali,
We are working on moving from Java 7 to Java 8 in Apache Kafka, and the
Streams client is one of the motivations for doing so. Stay tuned on the
mailing list for when it will come.
Currently Streams won't automatically filter out null values for you since
in some other cases they may have semantic mea
It isn't a fatal error. It should be logged as a warning, and then the
stream should be continued w/ the next message.
Checking for null is 'ok', in the sense that it gets the job done, but
after java 8's release, we really should be using optionals.
Hopefully we can break compatibility w/ the ba
Ali,
In your scenario, if the serde fails to parse the bytes, should that be
treated as a fatal failure or is it expected?
In the former case, instead of returning a null I think it is better to
throw a runtime exception in order to let the whole client stop and
notify the error; in the latter case
Hey G,
Looks like the only difference is a valueSerde parameter.
How does that prevent having to look for nulls in the consumer?
E.g., I wrote a custom Serde which converts the messages (which are JSON
strings) into a Java class using Jackson.
If the JSON parse fails, it sends back a null.
When
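For readers of the archive, a rough sketch of the kind of deserializer being described (the class and the MyEvent type are made-up placeholders, not code from this thread):

import java.io.IOException;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import com.fasterxml.jackson.databind.ObjectMapper;

// MyEvent is a placeholder POJO standing in for whatever the JSON maps to.
public class MyEventDeserializer implements Deserializer<MyEvent> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public MyEvent deserialize(String topic, byte[] data) {
        try {
            return mapper.readValue(data, MyEvent.class);
        } catch (IOException e) {
            return null;   // a parse failure is signalled downstream as a null value
        }
    }

    @Override public void configure(Map<String, ?> configs, boolean isKey) { }
    @Override public void close() { }
}

The stream then has to drop those nulls explicitly, e.g. with stream.filter((key, value) -> value != null), which is the null check being discussed.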
Hello Ali,
We do have corresponding overloaded functions for most KStream / KTable
operators to avoid forcing users to specify "null"; in these cases the
default serdes specified in the configs are then used. For example:
KTable aggregate(Initializer initializer,
Ali, the Apache Kafka project still targets Java 7, which means we can't
use Java 8 features just yet.
FYI: There's an ongoing conversation about when Kafka would move from Java
7 to Java 8.
On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar wrote:
Since we're using Java 8 in most cases anyway, Serdes / Serializers should
use Optionals, to avoid having to deal with the lovely nulls.
Hi Gary,
In the upcoming 0.10.1 release you can do regex subscription - will that
help?
Thanks,
Damian
On Fri, 30 Sep 2016 at 14:57 Gary Ogden wrote:
> Is it possible to use the topic filter whitelist within a Kafka Streaming
> application? Or can it only be done in a consumer job?
>
Is it possible to use the topic filter whitelist within a Kafka Streaming
application? Or can it only be done in a consumer job?
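For reference, a small sketch of what that regex subscription looks like in the 0.10.1-era API (the topic pattern is just an example):

import java.util.regex.Pattern;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder builder = new KStreamBuilder();
// Subscribes to every topic matching the pattern, instead of a fixed whitelist:
KStream<String, String> lines = builder.stream(Pattern.compile("events-.*"));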
Hello Farhon,
I think your idea about KStream-KTable join is a good approach with some
tweaks, more specifically:
1. Model your rider request as a normal record stream with the combo key of
(latitude, longitude).
2. Model your driver location as an ever-updating table with the combo key
of (lati
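To make the suggested shape concrete, here is a bare-bones sketch of a KStream-KTable join (topics, types, and the value joiner are placeholders; the combo-key and geo handling from the tweaks above are omitted):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> riderRequests = builder.stream("rider-requests");     // keyed by the combo key
KTable<String, String> driverLocations = builder.table("driver-locations");   // latest location per key

// For each rider request, look up the current driver record with the same key:
KStream<String, String> matches = riderRequests.join(
    driverLocations,
    (request, driver) -> "request=" + request + ", driver=" + driver);
matches.to("rider-driver-matches");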
Quick reply only, since I am on my mobile. Not an exact answer to your
problem but still somewhat related:
http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
(perhaps you have seen this already).
-Michael
On Sun, Aug 28, 2016 at 4:55 AM, Farhon Zaharia
wrote:
Hello friends,
I am designing a new streaming component and am looking at how to use Kafka
Streams.
I need some guidance with the appropriate flow.
*Problem to solve:*
The problem I am working on is I have a large group of riders and drivers.
I would like to match available riders to nearby driv
an improvement for this -- should be available soon.
See https://issues.apache.org/jira/browse/KAFKA-3185
-Matthias
On 07/20/2016 01:57 PM, Pariksheet Barapatre wrote:
Hi Experts,
I have written a Kafka Streams app that just filters rows based on some
condition and loads them into MongoDB.
The streaming process is working fine, but due to some flaw in my code, I
want to reprocess the whole data again.
One way to do this is: kill the streaming app, change the consumer group
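Related note for the archive: the local-state half of such a reset can be done programmatically with KafkaStreams#cleanUp(), sketched below with placeholder names; committed offsets for the consumer group and data in internal topics still have to be handled separately, which is what the application reset tool (KAFKA-3185, mentioned in the reply above) automates.

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "mongo-filter-app");   // example id
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");     // example broker

StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").to("output-topic");   // stand-in for the real topology

KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.cleanUp();   // wipes this instance's local state directory for the application id
streams.start();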
Many thanks, Eno. I will try the .until method.
Cheers
Pari
On 13 June 2016 at 14:10, Eno Thereska wrote:
Hi Pari,
Try the .until method like this:
(TimeWindows) TimeWindows.of("tumbling-window-example", windowSizeMs).until(60 * 1000L)
Thanks
Eno
On 13 Jun 2016, at 08:31, Pariksheet Barapatre wrote:
Hello Experts,
As per the documentation in the Kafka docs -
*Windowing* is a common prerequisite for stateful transformations which
group records in a stream, for example, by their timestamps. A local state
store is usually needed for a windowing operation to store recently
received records based on the w
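To round out the archive, a small sketch of a windowed count where .until() controls how long window state is retained (written against a newer DSL than this 2016 thread; topic names and durations are examples):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("events-topic");   // example topic

KTable<Windowed<String>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.of(60 * 1000L)          // 1-minute tumbling windows
                           .until(5 * 60 * 1000L))  // keep window state around for 5 minutes
    .count();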