Hey,
I was wondering how the parallelism in the YARN properties file is set by
YARN. Currently, as far as I know, the `writeYarnPropertiesFile` method is
executed only once, and it sets the parallelism in the YARN properties to
the number of workers * the number of slots per worker.
Hey,
I just want to understand something, because I am observing weird behavior
of the Kafka Consumer for Kafka versions > 0.8.
So the idea is, if we enable checkpointing and enable committing offsets
on checkpoints, which AFAIK is enabled by default, then for versions
of Kafka > 0.8 we should see the changes in the
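For reference, this is roughly the setup I mean (a minimal sketch only; the
topic name, group id and checkpoint interval are placeholders, not my actual
job):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Inside the job's main method:
val env = StreamExecutionEnvironment.getExecutionEnvironment
// Checkpointing must be enabled, otherwise offsets are not committed on checkpoints.
env.enableCheckpointing(10000L)

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "my-flink-group")

val consumer = new FlinkKafkaConsumer[String]("input-topic", new SimpleStringSchema(), props)
// True by default when checkpointing is enabled; shown here only to make it explicit.
consumer.setCommitOffsetsOnCheckpoints(true)

env.addSource(consumer).print()
env.execute("committing-consumer-sketch")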
Hey,
I was wondering whether something has changed for KafkaConsumer, since I am
using Kafka 2.0.0 with Flink and I wanted to use group offsets, but there
seems to be no change in the topic where Kafka stores its offsets; after a
restart, Flink uses `auto.offset.reset`, so it seems that there is no
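To make it concrete, the startup configuration I have in mind looks like the
sketch below (names are placeholders); as far as I understand,
`auto.offset.reset` should only matter when no committed group offset exists:

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Inside the job's main method:
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(60000L) // so that offsets get committed back to Kafka

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "my-flink-group")
// Expected to kick in only when no committed offset exists for a partition.
props.setProperty("auto.offset.reset", "earliest")

val consumer = new FlinkKafkaConsumer[String]("events", new SimpleStringSchema(), props)
consumer.setStartFromGroupOffsets() // start from the committed group offsets

val stream = env.addSource(consumer)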
your setting falls in one of the two cases?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Sep 4, 2019 at 9:03 PM Dominik Wosiński wrote:
>
> > Hey,
> > I was wondering whether something has changed for KafkaConsumer, since I
> > am
> >
Hello,
I have a slight doubt about checkpointing in Flink and wanted to clarify my
understanding. Flink uses barriers internally to keep track of the records
that were processed. The documentation[1] describes it as if the checkpoint
only happens once the barriers have been transferred to the sink. So l
Okay, thanks for clarifying. I have a follow-up question here. If we
consider Kafka offset commits, this basically means that
the offsets committed during the checkpoint are not necessarily the
offsets that were really processed by the pipeline and written to the sink?
I mean, if there is a window i
Hey,
I have a question regarding CEP. Assume I have a stream of readings from
various sensors. The application is running in EventTime, so according to
the CEP docs the events are buffered and sorted by ascending timestamp.
So, I want to record the situations when a reading from the sensor goes abov
Hey all,
I was wondering whether for CEP the *AfterMatchSkipStrategy* is applied
during matching or if the results are simply removed after the match. The
question comes from some experiments I was doing with CEP. Say I have
the readings from some sensor and I want to detect events over some
Hey Guys,
I have observed weird behavior when using the Temporal Table Join and the
way it pushes the Watermark forward. Generally, I think the question is: *when
is the Watermark pushed forward by the Temporal Table Join?*
The issue I have noticed is that the Watermark seems to be pushed forward even
es then I can help to
> analyze and debug?
>
> Best,
> Kurt
>
>
> On Tue, Mar 17, 2020 at 9:53 PM Dominik Wosiński wrote:
>
> > Hey Guys,
> > I have observed a weird behavior on using the Temporal Table Join and the
> > way it pushes the Watermark forward.
Hey,
generally, that's what I thought more or less. I think I understand the
behavior itself, thanks for explaining it to me.
But what actually concerns me is the fact that this
*assignTimestampsAndWatermarks* is required if you select this Long
field, which basically means that the type of s
Hey, thanks for the answer.
But if I add the *AfterMatchSkipStrategy*, it simply seems to emit event by
event, so in the case described above it emits: [400], [500].
Shouldn't the *greedy* quantifier guarantee that this will be matched as
many times as possible, thus creating [400, 500]?
Thanks
P.S.
So now my pattern looks like this:
Pattern.begin[AccelVector](EventPatternName, AfterMatchSkipStrategy.skipPastLastEvent())
  .where(_.data() > Threshold)
  .oneOrMore
  .greedy
  .consecutive()
  .within(Time.minutes(1))
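For completeness, a sketch of how I wire this up (the reading type, pattern
name and threshold below are simplified stand-ins for my real ones; in the
actual job timestamps and watermarks are assigned on the stream):

import org.apache.flink.cep.nfa.aftermatch.AfterMatchSkipStrategy
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Placeholder reading type and constants.
case class AccelVector(sensorId: String, timestamp: Long, data: Double)
val EventPatternName = "aboveThreshold"
val Threshold = 300.0

// Inside the job's main method:
val env = StreamExecutionEnvironment.getExecutionEnvironment
val readings: DataStream[AccelVector] = env.fromElements(
  AccelVector("s1", 1000L, 400.0),
  AccelVector("s1", 2000L, 500.0))

val pattern = Pattern
  .begin[AccelVector](EventPatternName, AfterMatchSkipStrategy.skipPastLastEvent())
  .where(_.data > Threshold)
  .oneOrMore
  .greedy
  .consecutive()
  .within(Time.minutes(1))

// What I would expect: a single match containing both readings, i.e. [400.0, 500.0].
val matches = CEP
  .pattern(readings.keyBy(_.sensorId), pattern)
  .select(m => m(EventPatternName).map(_.data).toList)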
Wed, Mar 25, 2020 at 10:03 Dominik Wosiński wrote:
Hello,
I have a question, since I am observing quite weird behavior. In the
documentation[1], the example of FlinkMiniCluster usage shows that we can
expect the results to appear in the same order as they were injected into the
stream by use of *fromElements()*. I mean that the Java version of the code i
>     .setParallelism(2);
>
> This is not:
>
> env.fromElements(1L, 21L, 22L)
>     .map(x -> x * 2)
>     .setParallelism(2)
>
>     .setParallelism(1);
>
> In other words, if you never reduce the parallelism your functions should
> be fine.
> If you h
Hey,
I have a question that I have not been able to find an answer to in the
docs nor in any other source. Suppose we have a business system and we are
using an Elasticsearch sink, not for the business case itself, but
rather for keeping info on the data that is flowing through the system. Th
Hey,
I have a question regarding Avro types and schema evolution. According to
the docs, the schema resolution is compatible with the Avro docs [1].
But I have done some testing. For example, I created a record, wrote it
to Kafka, and then changed the order of the fields in the schema and tried to
new
> snapshot will be written entirely with the new updated avro schema.
>
> Hope this clarifies how the integration with Avro in Flink works.
>
> Best,
>
> Dawid
>
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/schema
Hey,
I wanted to ask whether the *HadoopInputFormat* currently supports some
custom partitioning scheme. Say I have 200 files in HDFS, each having the
partitioning key in its name; can we currently use HadoopInputFormat to
distribute reading to multiple TaskManagers using the key?
Best Regards,
Dom.
Hey,
I have a very specific use case. I have a history of records stored as
Parquet in S3. I would like to read and process them with Flink. The issue
is that the number of files is quite large (>100k). If I provide the full
list of files to the HadoopInputFormat that I am using, it will fail with
AskT
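A rough sketch of the kind of setup I mean (paths and the record type are
simplified, and it assumes the parquet-avro input format; in the real job I
add each of the >100k files individually, which is where the problem starts):

import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.parquet.avro.AvroParquetInputFormat

// Inside the batch job's main method:
val env = ExecutionEnvironment.getExecutionEnvironment

val job = Job.getInstance()
FileInputFormat.addInputPath(job, new Path("s3://bucket/history/"))
FileInputFormat.setInputDirRecursive(job, true)

val inputFormat = new HadoopInputFormat[Void, GenericRecord](
  new AvroParquetInputFormat[GenericRecord](), classOf[Void], classOf[GenericRecord], job)

// Records arrive as (key, value) tuples; the key is always null for Parquet.
val records = env.createInput(inputFormat).map { case (_, record) => record.toString }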
>>
>> `akka.ask.timeout` is the default RPC timeout, while some RPCs may
>> override
>> this timeout for their own purpose. e.g. the RPCs from web usually use
>> `web.timeout`.
>> Providing the detailed call stack of the AskTimeoutException may help to
>> identif
an take a look and try.[2]
>
> [1] https://issues.apache.org/jira/browse/FLINK-14722
> [2]
>
> https://github.com/JingsongLi/flink/commit/90c021ab8e7a175c6644c345e63383d828c415d7
>
> Best,
> Jingsong Lee
>
> On Tue, Nov 12, 2019 at 6:49 PM Dominik Wosiński wrote:
>
>
I have managed to locate the issue with the timeout; changing `web.timeout` was
the solution. However, now I am getting the following error:
2019-11-12 16:58:00,741 INFO  org.apache.parquet.hadoop.ParquetInputFormat - Total input paths to process : 671
2019-11-12 16:58:04,878 INFO  org.
Hey,
I have a question, since I have observed the following situation:
we have two streams, A and B, that will be read from Kafka. Let's say we
have a set of rules for processing that is stored in A, and B is the stream
that we will process.
Since there is no guarantee that elements from A will be pr
Hello,
I wanted to ask whether the idea and the general concept that I have is
correct, or if there is anything better in Flink to use.
Say that I have 3 different streams:
- currency_codes, which has the name of the currency. It has the
following fields (*currency_iso_code, tst, curren
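Roughly, what I have in mind for the currency stream is a temporal table
function like the sketch below (simplified field names, 1.9-style Scala Table
API, and timestamps/watermarks assumed to be assigned upstream; all of that is
an assumption on my side):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._

// Guessed shape of the truncated field list above; tst is the event-time attribute.
case class CurrencyCode(currency_iso_code: String, tst: Long, currency_name: String)

// Inside the job's main method:
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)

// In the real job this would come from Kafka, with timestamps/watermarks assigned.
val currencyCodes: DataStream[CurrencyCode] = ???

val currencyTable = tEnv.fromDataStream(
  currencyCodes, 'currency_iso_code, 'tst.rowtime, 'currency_name)

// Temporal table function: versioned by tst, keyed by currency_iso_code.
val currencyHistory = currencyTable.createTemporalTableFunction('tst, 'currency_iso_code)
tEnv.registerFunction("CurrencyHistory", currencyHistory)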
-stable/dev/table/streaming/temporal_tables.html
> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/temporal_tables.html>
>
> > On 20 Jan 2020, at 17:02, Dominik Wosiński wrote:
> >
> > Hello,
> > I wanted to ask whether the idea and t
lease-1.9/dev/table/streaming/joins.html#event-time-temporal-joins
> >
>
> Piotrek
>
> > On 21 Jan 2020, at 11:25, Dominik Wosiński wrote:
> >
> > Hey,
> > I have considered the Temporal Table Joins, but as far as I know from the
> > docs, it is only curre
Hey,
But isn't the temporal table join generating output only on watermarks? I
have found such info here:
https://stackoverflow.com/questions/54441594/how-to-use-flink-temporal-tables.
But since one of the tables will have data that changes very rarely, this
would mean that using a temporal table
+1, I agree that frequent releases are good for users.
Dominik
Sent from Mail for Windows 10
From: Stefan Richter
Sent: Thursday, August 16, 2018 10:57
To: dev
Subject: Re: [DISCUSS] Release Flink 1.5.3
+1 sounds good.
Stefan
> On 16.08.2018 at 09:55, Piotr Nowojski wrote:
>
>
IMHO,
such tests should be off by default, as they impose some requirements on the
people trying to build Flink from source, and one would have to install Docker
only for the tests to pass. But of course they should be enabled for CI builds.
Best Regards,
Dominik.
Sent from Mail for
Hey,
Do we have any list of the current limitations of the SQL Client available
somewhere, or is the only way to go through JIRA issues?
For example:
I tried to use a Group By Tumble Window and an Inner Join in one query and it
seems that this is not currently possible, and I was wondering whether it's
an issue
Hey,
I don't think we are currently supporting this, but it would be a good idea
to have a Kafka sink with support for keys. I have worked on something
similar to the SQL Client before it was created, but the keys in Kafka are
crucial to us and this is currently the limitation that keeps us from
switchin
36ca230d@%3Cdev.flink.apache.org%3E
> [2]
>
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y
>
> On Tue, Oct 23, 2018 at 14:46, Dominik Wosiński <
> wos...@gmail.com>:
>
> > Hey,
> > I don't think we are currently supp
+1, Thanks for the proposal.
I guess this is a long-awaited change. It can vastly increase the
functionality of the SQL Client, as it will be possible to use complex
extensions, for example those provided by Apache Bahir[1].
Best Regards,
Dom.
[1]
https://github.com/apache/bahir-flink
Hey Boris,
We have developed something very similar for our needs, but we faced some
issues when running it in HA mode, mainly because of the fact that
TensorFlow uses native functions, and this caused some issues in connection
with automatic job restarts.
As far as I remember, the issue w
Thanks Till for being the release manager!
Thanks Everyone and Great Job.
Best Regards,
Dom.
Fri, Nov 30, 2018 at 13:19 vino yang wrote:
> Thanks Till for your great work, also thanks to the whole community!
>
> Thanks, vino.
>
> Timo Walther wrote on Friday, November 30, 2018 at 7:42 PM:
>
> > Thanks for being
>
> Hey,
>
I agree with Timo here that we should introduce labels that will improve
communication for PRs. IMHO this will show what percentage of PRs are really
stale and not just abandoned due to misunderstandings or other
communication issues.
Best Regards,
Dom.
Hey,
Could you please format your snippet? It is very hard to understand what is
going on in there.
Best Regards,
Dom.
Wed, Jan 16, 2019 at 13:05 Ramya Ramamurthy wrote:
> Hi
>
> I have a Flink 1.7 with Kafka 0.11 and ES 6.5 setup.
>
> I can see the Flink Kafka Consumer consuming messages, bu
Hey!
I also think that creating a separate branch for Blink in the Flink repo is a
better idea than creating a fork, as IMHO it will allow merging changes
more easily.
Best Regards,
Dom.
Tue, Jan 22, 2019 at 10:09 Ufuk Celebi wrote:
> Hey Stephan and others,
>
> thanks for the summary. I'm ve
Hey, currently when trying to get the state inside the *open* method of a
Flink RichAsyncFunction, we get a *State is not supported in rich async
functions* exception. Why doesn't AsyncFunction support state?
Thanks in advance,
Best Regards,
Dom.
Hey,
I am building jobs that use Typesafe Config under the hood. The configs
tend to grow big. I was wondering whether there is a possibility to use the
WebUI to show the config that the job was run with; currently the only idea
is to log the config and check it inside the logs, but with dozens of job
Hey all,
I've seen several tasks with propositions for this, both in Flink and in
Bahir[1]. I haven't seen any discussion, nor was any work actually done
to implement this (that I know of). So I would like to start a discussion
about whether this is something you think would be beneficial to have a
Hey,
I have a question about the ordering of the messages in the Temporal Table.
I can observe that for one of my jobs the order of input is correct but the
order of the output is not correct.
Say I have two streams that both have *id* field which will be used to join
and also for Kafka partitionin
Hey,
Is it currently possible to obtain the state that was created by an SQL query
via the State Processor API? I am able to load the checkpoint via the State
Processor API, but I wasn't able to think of a way to access the internal
state of my JOIN query.
Best Regards,
Dom.
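Conceptually, what I tried looks like the sketch below; every specific in it
(the operator uid, the state name, the state type) is a placeholder, and
knowing what the SQL JOIN operator actually registers is exactly the part I
cannot figure out:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.configuration.Configuration
import org.apache.flink.runtime.state.memory.MemoryStateBackend
import org.apache.flink.state.api.Savepoint
import org.apache.flink.state.api.functions.KeyedStateReaderFunction
import org.apache.flink.util.Collector

// Placeholder reader: the state name and types would have to match whatever
// the SQL JOIN operator created internally.
class JoinStateReader extends KeyedStateReaderFunction[String, String] {
  private var state: ValueState[String] = _

  override def open(parameters: Configuration): Unit = {
    state = getRuntimeContext.getState(
      new ValueStateDescriptor[String]("left-records", classOf[String]))
  }

  override def readKey(key: String,
                       ctx: KeyedStateReaderFunction.Context,
                       out: Collector[String]): Unit =
    out.collect(s"$key -> ${state.value()}")
}

// Inside the batch job's main method:
val bEnv = ExecutionEnvironment.getExecutionEnvironment
val savepoint = Savepoint.load(bEnv, "s3://bucket/checkpoints/chk-42", new MemoryStateBackend())
// The uid here is hypothetical; SQL operators get auto-generated ids.
val joinState = savepoint.readKeyedState("sql-join-operator-uid", new JoinStateReader)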
join/stream/StreamingJoinOperator.java#L83
>
> Best
> Yun Tang
>
>
>
>
> From: Dominik Wosiński
> Sent: Tuesday, December 1, 2020 21:05
> To: dev
> Subject: State Processor API SQL State
>
> Hey,
> Is it currently possible to obtain the state that was create
Hey,
I was wondering whether it's currently possible to use a KeyedStream to create
a properly partitioned Table in Flink 1.11. I have a use case where I want
to first join two streams using Flink SQL and then process them via a
*KeyedProcessFunction*. So I do something like:
implicit val env = Stream
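In full, what I am trying looks roughly like the sketch below (the event type,
the query and the trivial process function are simplified placeholders):

import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.bridge.scala._
import org.apache.flink.types.Row
import org.apache.flink.util.Collector

case class Event(id: String, amount: Double)

// Inside the job's main method:
implicit val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = StreamTableEnvironment.create(env)

tableEnv.createTemporaryView("A", env.fromElements(Event("1", 1.0)))
tableEnv.createTemporaryView("B", env.fromElements(Event("1", 2.0)))

val joined: Table = tableEnv.sqlQuery(
  "SELECT a.id, a.amount AS aAmount, b.amount AS bAmount FROM A a JOIN B b ON a.id = b.id")

tableEnv
  .toRetractStream[Row](joined)
  .filter(_._1)                           // keep only insertions
  .map(_._2)
  .keyBy(row => row.getField(0).toString) // key by the id column after the join
  .process(new KeyedProcessFunction[String, Row, String] {
    override def processElement(row: Row,
                                ctx: KeyedProcessFunction[String, Row, String]#Context,
                                out: Collector[String]): Unit =
      out.collect(row.toString)
  })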
Hey,
Thanks for the answer. That's what I've been observing but wanted to know
for sure.
Best Regards,
Dom.
Hey,
I have a question regarding a DataStream created from multiple files in S3. I
have several files in AWS S3; say the path is s3://files/, and there
are several folders for different days, so in the end the full paths look
like: s3://files/day=1/file.parquet, s3://files/day=2/file.parquet. I
Hey,
Thanks for the answer. As I've mentioned in the email, the data range is
only 30 days; for the tests I've used the data from October, so I basically
have timestamps starting at midnight of 1st October 2020 and finishing at
23:59 on 30 October 2020, so if I understand correctly this shouldn't cause
Hey Till,
You were obviously right, my bad here. My math was incorrect. The correct
reasoning is that indeed the first 5 days of October will be added to
window number 1 and the rest of the days will end up in the second window.
Solved!
Thanks a lot,
Best Regards,
Dom.
Hey all,
I think I've hit some weird issue in Flink TypeInformation generation. I
have the following code:
val stream: DataStream[Event] = ...
tableEnv.createTemporaryView("TableName", stream)
val table = tableEnv.sqlQuery("SELECT id, timestamp, eventType from TableName")
tableEnvironment.toAppen
owing error:
*Invalid argument type at position 0. Data type RAW('org.test.OneEnum',
'...') expected but RAW('org.test.OneEnum', '...') passed.*
Mon, Aug 9, 2021 at 15:13 Dominik Wosiński wrote:
> Hey all,
>
> I think I've hit some weird
elp is to explicitly define the types:
> >
> > E.g. you can also use `DataTypes.of(TypeInformation)` both in
> > `ScalarFunction.getTypeInference` and
> > `StreamTableEnvironment.toDataStream()`.
> >
> > I hope this helps.
> >
> > Timo
> >
> > On 09.08.21 16:27, Dominik Wos
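If I understand the suggestion quoted above correctly, it would look roughly
like this (a sketch only, assuming Flink 1.13+ and that `Event` is the case
class with the org.test.OneEnum field from my earlier snippet; I have not
verified that this resolves the RAW mismatch):

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.DataTypes

// Explicitly wrap the Scala-derived TypeInformation in a DataType so that the
// planner uses exactly the same RAW type on both sides of the conversion.
val eventTypeInfo: TypeInformation[Event] = implicitly[TypeInformation[Event]]
val resultStream = tableEnv.toDataStream(table, DataTypes.of(eventTypeInfo))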
Hey,
I've stumbled across a weird behavior and was wondering whether this is
intentional for some reason or the result of a weird bug. So, basically,
currently if we want to create an *org.apache.flink.table.api.Schema* that has
one of the types defined as *RAW* (an Avro enum in my case), it's probably not