My understanding is that the FlinkKafkaConsumer is a wrapper around the
Kafka consumer libraries, so if you've set the group.id property you
should be able to see the offsets with something like
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe
--group my-flink-application.
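For completeness, a minimal sketch of passing group.id through to the consumer (the topic name and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "my-flink-application"); // the group you pass to --describe --group
FlinkKafkaConsumer<String> consumer =
    new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

If I remember right, offsets are only committed back to Kafka on checkpoint completion (or via the client's auto-commit when checkpointing is disabled), so what --describe reports can lag the job's internal state a bit.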
On Tue, D
Based on these docs,
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html,
the default partitioning behavior is not quite clear to me.
If no value for sink.partitioner is given, is the default behavior
just that of the native Kafka library? (with key use murm
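While waiting on a definitive answer, one thing that removes the ambiguity is pinning the partitioner explicitly; a hedged sketch (table, topic, and column names are made up):

tableEnv.executeSql(
    "CREATE TABLE my_sink (" +
    "  id STRING," +
    "  payload STRING" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'my-topic'," +
    "  'properties.bootstrap.servers' = 'localhost:9092'," +
    "  'format' = 'json'," +
    // 'fixed', 'round-robin', or the class name of a custom FlinkKafkaPartitioner, as far as I recall
    "  'sink.partitioner' = 'fixed'" +
    ")");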
I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields
feature of the Kafka SQL Connector. My current connector is configured
as,
connector.type= 'kafka'
connector.version = 'universal'
connector.topic = 'my-topic'
connector.properties.group.id = 'my-consumer-group'
connector.pro
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/kafka.html#dependencies
>
> Best,
> Piotrek
>
> Wed, Jan 6, 2021 at 04:37 Aeden Jameson wrote:
>>
>> I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields
>> feature of the K
and
> `key.fields` is only available there. Please check all properties again
> when upgrading, they are mentioned here [2].
>
> Regards,
> Timo
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
> [2] htt
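For anyone following along, a hedged sketch of the FLIP-122 style options with key.fields (topic, group, and column names are made up):

tableEnv.executeSql(
    "CREATE TABLE my_topic_source (" +
    "  user_id STRING," +
    "  payload STRING" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'my-topic'," +
    "  'properties.bootstrap.servers' = 'localhost:9092'," +
    "  'properties.group.id' = 'my-consumer-group'," +
    "  'key.fields' = 'user_id'," +
    "  'key.format' = 'raw'," +
    "  'value.format' = 'json'" +
    ")");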
n the new design. The Avro
> schema is derived entirely from the table's schema.
>
> Regards,
> Timo
>
>
>
> On 07.01.21 09:41, Aeden Jameson wrote:
> > Hi Timo,
> >
> > Thanks for responding. You're right. So I did update the properties.
> >
When using statement sets, if two select queries use the same table
(e.g. Kafka Topic), does each query get its own copy of data?
Thank you,
Aeden
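For anyone searching the archive later, a minimal sketch of the statement-set pattern in question (sink and column names are made up):

StatementSet set = tableEnv.createStatementSet();
set.addInsertSql("INSERT INTO sink_a SELECT id, amount FROM my_topic WHERE amount > 100");
set.addInsertSql("INSERT INTO sink_b SELECT id, amount FROM my_topic WHERE amount <= 100");
set.execute(); // both INSERTs are optimized together and submitted as one job

My understanding, though worth verifying on your version, is that when both inserts are executed as one statement set the planner can reuse the common source sub-plan rather than reading the topic once per query.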
This may help you out.
https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/CEPITCase.java
On Sun, Jan 17, 2021 at 10:32 AM narasimha wrote:
>
> Hi,
>
> I'm using Flink CEP, but couldn't find any examples for writing test cases
> for the stream
Hi
How does one specify computed columns when converting a DataStream
to a temporary view? For example
final DataStream stream = env.addSource(..);
tEnv.createTemporaryView(
"myTable"
,stream
,$("col1")
,$("col2")
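One approach I've seen for the computed-column part is the newer Schema-based conversion; a hedged sketch (1.13+ API, if I remember right; epochMillis is a made-up field of the stream's type):

Table table = tEnv.fromDataStream(
    stream,
    Schema.newBuilder()
        // computed column derived from a physical field of the stream's type
        .columnByExpression("event_time", "TO_TIMESTAMP_LTZ(epochMillis, 3)")
        // optionally promote it to the rowtime attribute
        .watermark("event_time", "event_time - INTERVAL '5' SECOND")
        .build());
tEnv.createTemporaryView("myTable", table);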
In my job graph viewed through the Flink UI I see a task named,
rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME))
that has an upstream Kafka source task. What exactly does the rowtime task do?
--
Thank you,
Aeden
I have a job made up of a few FlinkSQL statements using a
statement set. In my job graph viewed through the Flink UI a few of
the tasks/statements are preceded by this task
rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME))
that has an upstream Kafka source/sink task.
Occasionally,
Do all operators have the same parallelism?
>
> Regards,
> Timo
>
>
> On 25.02.21 00:49, Aeden Jameson wrote:
> > I have a job made up of a few FlinkSQL statements using a
> > statement set. In my job graph viewed through the Flink UI a few of
> > the task
I'm hoping to have my confusion clarified regarding the settings,
1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-
2. setAutoWatermarkInterval
https://ci.apache.org/p
Correction: The first link was supposed to be,
1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#pipeline-auto-watermark-interval
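For what it's worth, my understanding is that the two refer to the same setting, exposed once as a configuration option and once on the ExecutionConfig; a sketch:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().setAutoWatermarkInterval(200L); // ms between invocations of periodic watermark generators

// and, as far as I can tell, the equivalent flink-conf.yaml entry:
// pipeline.auto-watermark-interval: 200 ms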
On Wed, Mar 3, 2021 at 7:46 PM Aeden Jameson wrote:
>
> I'm hoping to have m
I have a cluster with 18 task managers, 4 task slots each, running a
job whose source/sink(s) are declared with FlinkSQL using the Kafka
connector. The topic being read has 36 partitions. The problem I'm
observing is that the subtasks for the sources are not evenly
distributed. For example, 1 tas
managers 2 slots. In that way, the
> scheduler can only evenly distribute.
>
> On Wed, Mar 10, 2021 at 7:21 PM Aeden Jameson wrote:
>>
>> I have a cluster with 18 task managers 4 task slots each running a
>> job whose source/sink(s) are declared with FlinkSQL using the Ka
/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerConfiguration.java#L141
>
> On Fri, Mar 12, 2021 at 12:58 AM Aeden Jameson
> wrote:
>>
>> Hi Arvid,
>>
>> Thanks for responding. I did check the configuration tab of the job
>> manager and the setting cluster
g Song
>
>
>
> On Mon, Mar 15, 2021 at 3:54 AM Chesnay Schepler wrote:
>>
>> Is this a brand-new job, with the cluster having all 18 TMs at the time
>> of submission? (or did you add more TMs while the job was running)
>>
>> On 3/12/2021 5:47 PM, Aeden
>
> If all the tasks have the same parallelism 36, your job should only allocate
> 36 slots. The evenly-spread-out-slots option should help in your case.
>
> Is it possible for you to share the complete jobmanager logs?
>
>
> Thank you~
>
> Xintong Song
>
>
>
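For anyone else hitting the same skew, my understanding is that the option mentioned above is enabled in flink-conf.yaml like this:

cluster.evenly-spread-out-slots: true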
I'm trying to get my head around the impact of setting max parallelism.
* Does max parallelism primarily serve as a reservation for future
increases to parallelism? The reservation being the ability to restore
from checkpoints and savepoints after increases to parallelism.
* Does it serve as a ru
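For reference, the knob itself is set like this (a minimal sketch; the numbers are arbitrary):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setMaxParallelism(720); // number of key groups; ceiling for rescaling keyed state later
env.setParallelism(72);     // current parallelism, must be <= max parallelism

As far as I know, max parallelism fixes the number of key groups used to shard keyed state, so later increases in parallelism only work up to that ceiling, and changing max parallelism itself breaks keyed-state compatibility.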
It's my understanding that a group by is also a key by under the hood.
As a result that will cause a shuffle operation to happen. Our source
is a Kafka topic that is keyed so that any given partition contains all
the data that is needed for any given consuming TM. Is there a way
using FlinkSQL to el
I've probably overlooked something simple, but when converting a
datastream to a table how does one convert a long to a TIMESTAMP(3) that
will not be your event or processing time.
I've tried
tEnv.createTemporaryView(
"myTable"
,myDatastream
,
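One approach that has worked for me is a computed column at conversion time; a sketch assuming the long holds epoch milliseconds and using a made-up field name createdAtMillis:

Table t = tEnv.fromDataStream(
    myDatastream,
    Schema.newBuilder()
        // plain TIMESTAMP(3) column; not declared as a time attribute
        .columnByExpression("created_at",
            "CAST(TO_TIMESTAMP_LTZ(createdAtMillis, 3) AS TIMESTAMP(3))")
        .build());
tEnv.createTemporaryView("myTable", t);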
One way to deal with this is by using the source idleness setting.
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources
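Roughly like this (a sketch; MyEvent and the durations are placeholders):

WatermarkStrategy<MyEvent> strategy =
    WatermarkStrategy
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
        .withIdleness(Duration.ofMinutes(1)); // mark a source split idle after 1 minute without records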
On Sat, Jun 19, 2021 at 12:06 PM Dan Hill wrote:
> Hi.
>
> I remember listening to Flink For
Hi Li,
How big is your keyspace? I had a similar problem which turned out to
be scenario 2 in this issue
https://issues.apache.org/jira/browse/FLINK-19970. Looks like the bug
in scenario 1 got fixed but scenario 2 did not. There's more detail in
this thread,
http://deprecated-apache-flink-user-ma
't know it is not fixed yet.
> thank you again and do you have any solutions ?
>
> On 2021/07/07 01:47:00, Aeden Jameson wrote:
> > Hi Li,
> >
> >How big is your keyspace? Had a similar problem which turns out to
> > be scenario 2 in this issue
> > ht
This can happen if you have an idle partition. Are all partitions
receiving data consistently?
On Tue, Jul 13, 2021 at 2:59 PM Jerome Li wrote:
>
> Hi,
>
>
>
> I got a question about the Flink Kafka consumer. I am facing the issue that the
> Kafka consumer somehow stops consuming data from Kafka after
state and no new watermark
> generated from there, and then it stuck all the downstreams and stopped
> consuming data from Kafka? I didn't use watermarks in my application though.
>
>
>
> I checked that all the Kafka partition has data consistently coming in.
>
>
>
> Bes
My use case is that I'm producing a set of measurements by key every
60 seconds. Currently, this is handled with the usual pattern of
keyBy().window(Tumbling...(60)).process(...). I need to provide the same
output, but as a result of a timeout. The data needed for the timeout
summary will be in the
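For concreteness, the current pattern looks roughly like this (type and function names are placeholders):

measurements
    .keyBy(m -> m.getKey())
    .window(TumblingEventTimeWindows.of(Time.seconds(60)))
    .process(new SummarizeMeasurements()); // made-up ProcessWindowFunction emitting the per-key summary

For the timeout output, one route, though certainly not the only one, is to replace the window with a KeyedProcessFunction that keeps the running summary in keyed state and registers a timer per key.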
Flink Version: 1.13.2
In the section on Default Triggers of Window Assigners the documentation
states:
By specifying a trigger using trigger() you are overwriting the default
trigger of a WindowAssigner. For example, if you specify a CountTrigger for
TumblingEventTimeWindows you will no longer get
ratorBuilder will change when you call that method, and thus
> the default trigger will be overwritten by calling WindowedStream#trigger.
>
> Aeden Jameson wrote on Wed, Sep 1, 2021 at 12:32 AM:
>
>> Flink Version: 1.13.2
>>
>> In the section on Default Triggers of Window Assign
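To make the overriding behaviour concrete, a small sketch (the count and window size are arbitrary; MyWindowFunction is a placeholder):

input
    .keyBy(e -> e.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .trigger(CountTrigger.of(100)) // replaces the default EventTimeTrigger, so firing becomes purely count-based
    .process(new MyWindowFunction());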
When using tumbling windows the windows materialize all at once, which
results in bursty traffic. How does one go about unaligned tumbling
windows? Does this require going down the road of custom window assigners
and triggers?
--
Cheers,
Aeden
reaming windows?
>
> Also the only type of window capable of emitting results for different
> parallelisms at different times is the session window [1]. Does that meet
> your needs?
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#
On Thu, Jan 20, 2022 at 2:46 AM yidan zhao wrote:
> self-define the window assigners.
>
Thanks, I'll check that out. If you have links to especially good examples
and explanations, that would be great. Otherwise, I presume the Flink
codebase itself is the place to start.
--
Cheers,
Aeden
I believe you can solve this issue with,
.setDeserializer(KafkaRecordDeserializationSchema.of(new
JSONKeyValueDeserializationSchema(false)))
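A slightly fuller sketch of where that lands in the source definition (servers, topic, and group id are placeholders):

KafkaSource<ObjectNode> source = KafkaSource.<ObjectNode>builder()
    .setBootstrapServers("localhost:9092")
    .setTopics("my-topic")
    .setGroupId("my-group")
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setDeserializer(KafkaRecordDeserializationSchema.of(
        new JSONKeyValueDeserializationSchema(false))) // false = don't attach Kafka metadata to each record
    .build();
DataStream<ObjectNode> json =
    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-json-source");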
On Thu, Mar 3, 2022 at 8:07 AM Kamil ty wrote:
>
> Hello,
>
> Sorry for the late reply. I have checked the issue and it seems to be a type
> issue as the e
I've had success using Kafka for Junit,
https://github.com/mguenther/kafka-junit, for these kinds of tests.
On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote:
>
> Hello,
> We have Flink job that read data from multiple Kafka topics, transforms data
> and write in output Kafka topics. We wan
We're using S3 to store checkpoints. They are taken every minute. I'm
seeing a large number of 404 responses from S3 being generated by the
job manager. The order of the entries in the debugging log would imply
that it's a result of a HEAD request to a key. For example, all the
incidents look like t
I have checkpoints set up against S3 using the Hadoop plugin. (I'll
migrate to Presto at some point.) I've set up entropy injection per the
documentation with
state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
s3.entropy.key: _entropy_
I'm seeing some behavior that I don't quite unde
in the
>> metadata file and you don't see any entropy injection happening. See the
>> comments on [2] for more on this.
>>
>> FWIW, I would urge you to use presto instead of hadoop for checkpointing on
>> S3. The performance of the hadoop "filesystem" i
ause that's all Flink
> needs for checkpointing, this works much better.
>
> Best,
> David
>
> On Thu, May 12, 2022 at 1:53 AM Aeden Jameson wrote:
>>
>> We're using S3 to store checkpoints. They are taken every minute. I'm
>> seeing a large numb
the parallelism, and the checkpointing
> interval.
>
> On Thu, May 19, 2022 at 8:04 PM Aeden Jameson wrote:
>>
>> Thanks for the response David. That's the conclusion I came to as
>> well. The Hadoop plugin behavior doesn't appear to reflect more
>> re
Depending on the kind of testing you're hoping to do, you may want to
look into https://github.com/mguenther/kafka-junit. For example, say
you're looking for some job-level smoke tests that just answer the
question "Is everything wired up correctly?" Personally, I like how
this approach doesn't require