Hi Juan,
Both the local execution environment and the remote execution environment
run the same code to execute the program.
The implementation of the sortPartition operator was designed to scale to
data sizes that exceed the available memory.
Internally, it serializes all records into byte arrays and sorts
Hi Jun,
Thank you very much for your contribution.
I think a Bucketing File System Table Sink would be a great addition.
Our code contribution guidelines [1] recommend discussing the design with
the community before opening a PR.
First of all, this ensures that the design is aligned with Flink's
at least 64 bytes.
> If we have 200,000,000 per day and the allowed lateness is
> set to 7 days:
> 200,000,000 * 64 * 7 = ~83GB
>
> *For the scenario above the window metadata is useless*.
> Is there a possibility to *keep using window API*, *set allowed lateness*
> and *not
Hi,
It depends.
There are many things that can be changed. A savepoint in Flink contains
only the state of the application and not the configuration of the system.
So an application can be migrated to another cluster that runs with a
different configuration.
There are some exceptions like the con
Btw. there is a set difference or minus operator in the Table API [1] that
might be helpful.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/tableApi.html#set-operations
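A minimal Java sketch of how the minus operator could be used (table names are
hypothetical; note that these set operations are, at least in the linked
release, only supported on batch tables):

// Hypothetical batch tables registered in the TableEnvironment.
Table left = tEnv.scan("OrdersA");
Table right = tEnv.scan("OrdersB");

// Records of left that do not exist in right (set difference, duplicates removed).
Table onlyInLeft = left.minus(right);

// minusAll keeps duplicates according to bag semantics.
Table onlyInLeftAll = left.minusAll(right);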
On Fri., Sep. 20, 2019 at 15:30, Fabian Hueske wrote:
> Hi Juan,
>
> Both,
rty(ProducerConfig.RETRIES_CONFIG, "3");
>
> ObjectMapper mapper = new ObjectMapper();
> DataStream sinkStreamMaliciousData = outStreamMalicious
> .map(new MapFunction,String>() {
> private static final long serialVersionUID = -6347120202L;
> @Override
> public St
Hi,
To expand on Dian's answer.
You should not add Flink's core libraries (APIs, core, runtime, etc.) to
your fat JAR. However, connector dependencies (like Kafka, Cassandra, etc.)
should be added.
If all your jobs require the same dependencies, you can also add JAR files
to the ./lib folder of y
Hi QiShu,
It might be that Flink's OrcInputFormat has a bug.
Can you open a Jira issue to report the problem?
In order to be able to fix this, we need as much information as possible.
It would be great if you could create a minimal example of an ORC file and
a program that reproduces the issue.
If
Hi,
AFAIK, Flink SQL Temporal table function joins are only supported as inner
equality joins.
An extension to left outer joins would be great, but is not on the
immediate roadmap AFAIK.
If you need the inverse, I'd recommend implementing the logic in a
DataStream program with a KeyedCoProcessFun
Hi,
It's not possible to create a temporal table function from SQL, but you can
define it in the config.yaml of the SQL client as described in the
documentation [1].
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#temporal-tables
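For programs that use the Table API directly (rather than the SQL Client), a
temporal table function can also be created and registered in Java; a small
sketch with hypothetical table and column names:

// ratesHistory is an existing Table with a rowtime attribute "r_rowtime"
// and a primary key column "r_currency" (both names are made up).
TemporalTableFunction rates =
    ratesHistory.createTemporalTableFunction("r_rowtime", "r_currency");

// Register the function so it can be used in Table API / SQL queries.
tableEnv.registerFunction("Rates", rates);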
On Tue., the 24th
atch the column, so it can read the fields.
>
> Thanks for your Help!
>
> Qi Shu
>
>
> On Sep. 24, 2019, at 4:36 PM, Fabian Hueske wrote:
>
> Hi QiShu,
>
> It might be that Flink's OrcInputFormat has a bug.
> Can you open a Jira issue to report the problem?
> In order
Hi Nishant,
To answer your questions:
1) yes, the SQL time-windowed join and the DataStream API Interval Join are
the same (with different implementations though)
2) DataStream Session-window joins are not directly supported in SQL. You
can play some tricks to make it work, but it wouldn't be eleg
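For 1), a minimal sketch of the DataStream interval join (stream names, key
fields, and bounds are hypothetical; both streams are assumed to be
DataStream<Tuple2<String, Long>> with an id in f0 and an event timestamp in f1):

DataStream<String> joined = orders
    .keyBy(0)
    .intervalJoin(shipments.keyBy(0))
    .between(Time.hours(0), Time.hours(4))   // shipment within 4 hours after the order
    .process(new ProcessJoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
        @Override
        public void processElement(Tuple2<String, Long> order, Tuple2<String, Long> shipment,
                                   Context ctx, Collector<String> out) {
            out.collect(order.f0 + " shipped " + (shipment.f1 - order.f1) + " ms after the order");
        }
    });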
Hi,
You enabled incremental checkpoints.
This means that parts of older checkpoints that did not change since the
last checkpoint are not removed because they are still referenced by the
incremental checkpoints.
Flink will automatically remove them once they are not needed anymore.
Are you sure t
Hi,
I don't think that the memory configuration is the issue.
The problem is the join query. The join does not have any temporal
boundaries.
Therefore, both tables are completely stored in memory and never released.
You can configure a memory eviction strategy via idle state retention [1]
but you
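A minimal sketch of configuring idle state retention for such a query (table
names and retention times are just examples; the exact API depends on the
Flink version):

// Keep state for a key at least 12 hours after its last access and remove it
// at the latest after 24 hours.
StreamQueryConfig qConfig = tEnv.queryConfig();
qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));

Table result = tEnv.sqlQuery(
    "SELECT o.id, p.name FROM Orders o JOIN Products p ON o.productId = p.id");

// The query config must be passed when the table is translated.
DataStream<Tuple2<Boolean, Row>> stream = tEnv.toRetractStream(result, Row.class, qConfig);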
Hi,
State is always associated with a single task in Flink.
The state of a task cannot be accessed by other tasks of the same operator
or tasks of other operators.
This is true for every type of state, including broadcast state.
Best, Fabian
On Tue., Oct. 1, 2019 at 08:22, Navneeth Kr
Hi Oliwer,
I think you are right. There seems to be something going wrong.
Just to clarify, you are sure that the growing state size is caused by the
window operator?
From your description I assume that the state size does not depend (solely)
on the number of distinct keys.
Otherwise, the state
Hi Bruce,
I haven't seen such an exception yet, but maybe Till (in CC) can help.
Best,
Fabian
On Tue., Oct. 1, 2019 at 05:51, Hanson, Bruce <
bruce.han...@here.com>:
> Hi all,
>
>
>
> We are running some of our Flink jobs with Job Manager High Availability.
> Occasionally we get a clu
Hi Vishwas,
First of all, 8 GB for 60 cores is not a lot.
You might not be able to utilize all cores when running Flink.
However, the memory usage depends on several things.
Assuming you are using Flink for stream processing, the type of the state
backend is important. If you use the FSStateBack
Hi,
the exception says: "Rowtime attributes must not be in the input rows of a
regular join. As a workaround you can cast the time attributes of input
tables to TIMESTAMP before.".
The problem is that your query first joins the two tables without a
temporal condition and then wants to do a window
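A sketch of the suggested workaround with hypothetical table and column names;
casting the rowtime attribute to TIMESTAMP turns it into a regular column, so
the non-temporal join is accepted (at the cost of losing the time attribute for
later windowing):

Table joined = tEnv.sqlQuery(
    "SELECT a.id, CAST(a.rowtime AS TIMESTAMP) AS ts, b.payload " +
    "FROM TableA AS a JOIN TableB AS b ON a.id = b.id");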
Hi Michael,
One reason might be that S3's file listing command is only eventually
consistent.
It might take some time until the file appears and is listed.
Best, Fabian
On Wed., Oct. 23, 2019 at 22:41, Nguyen, Michael <
michael.nguye...@t-mobile.com>:
> Hello all,
>
>
>
> I am running
Hi Komal,
Measuring latency is always a challenge. The problem here is that your
functions are chained, meaning that the result of a function is directly
passed on to the next function, and the first function is only called with a
new record once the last function has emitted its result.
This makes meas
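One way to make per-operator behavior visible for measurements is to break the
chain, at the cost of extra serialization between tasks; a sketch with
hypothetical functions:

DataStream<String> result = source
    .map(new ParseFunction()).name("parse").disableChaining()
    .map(new EnrichFunction()).name("enrich").disableChaining()
    .map(new FormatFunction()).name("format");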
Hi Fanbin,
One approach would be to ingest the field as a VARCHAR / String and
implement a Scalar UDF to convert it into a nested tuple.
The UDF could use the code of the flink-json module.
AFAIK, there is some work on the way to add built-in JSON functions.
Best, Fabian
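A minimal sketch of such a scalar UDF using Jackson (the field path "nested.id"
is hypothetical; a real implementation could reuse the deserialization code of
flink-json and return a structured type instead of a String):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.table.functions.ScalarFunction;

public class ExtractNestedId extends ScalarFunction {

    private transient ObjectMapper mapper;

    public String eval(String json) {
        try {
            if (mapper == null) {
                mapper = new ObjectMapper();
            }
            // Navigate into the nested object and return a single field.
            return mapper.readTree(json).path("nested").path("id").asText();
        } catch (Exception e) {
            return null;   // malformed record
        }
    }
}

// Registration: tEnv.registerFunction("extractNestedId", new ExtractNestedId());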
On Thu., Oct. 24, 2019 at
Hi,
I did not understand what you are trying to achieve.
Which field of the input table do you want to write to the output table?
Flink SQL> insert into nestedSink select nested from nestedJsonStream;
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement
Hi Vinay,
Maybe Gordon (in CC) has an idea about this issue.
Best, Fabian
On Thu., Oct. 24, 2019 at 14:50, Vinay Patil <
vinay18.pa...@gmail.com>:
> Hi,
>
> Can someone pls help here , facing issues in Prod . I see the following
> ticket in unresolved state.
>
> https://issues.apache.
Hi Jakub,
I had a look at the changes of Flink 1.5 [1] and didn't find anything
obvious.
Something that might cause a different behavior is the new deployment and
process model (FLIP-6).
In Flink 1.5, there is a switch to disable it and use the previous
deployment mechanism.
You could try to disa
Hi Wojciech,
I posted an answer on StackOverflow.
Best, Fabian
On Thu., Oct. 24, 2019 at 13:03, Wojciech Indyk <
wojciechin...@gmail.com>:
> Hi!
> I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee the correct order
> of events by event timestamp. I generate periodic watermarks ev
Hi,
Dynamic tables might not be persisted at all; they are only persisted when it
is necessary for the computation of a query.
For example, a simple "SELECT * FROM t WHERE a = 1" query on an append-only
table t does not require persisting t.
However, there are a bunch of operations that require storing some part
tream with various filters applied to it. I usually see
> around 6-7 of my datastreams successfully list the JSON file in my S3
> bucket upon cancelling my Flink job.
>
>
>
> Even in my situation, would this still be an issue with S3’s file listing
> command?
>
>
>
>
Hi all,
Flink Forward North America returns to San Francisco on March 23-25, 2020.
For the first time in North America, the conference will feature two days
of talks and one day of training.
We are happy to announce that the Call for Presentations is open!
If you'd like to give a talk and share
Hi,
The inline lambda MapFunction produces a Row with 12 String fields (12
calls to String.join()).
You use RowTypeInfo rowTypeDNS to declare the return type of the lambda
MapFunction. However, rowTypeDNS is defined with many more String fields.
The exception tells you that the number of fields r
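A sketch of a matching definition (the 12-field layout and the input type are
hypothetical); the Row arity, the number of fields set in the MapFunction, and
the arity of the RowTypeInfo must all be identical:

RowTypeInfo rowTypeDNS = new RowTypeInfo(
    Types.STRING, Types.STRING, Types.STRING, Types.STRING,
    Types.STRING, Types.STRING, Types.STRING, Types.STRING,
    Types.STRING, Types.STRING, Types.STRING, Types.STRING);   // exactly 12 fields

// input is assumed to be a DataStream<String[]> with 12 entries per record.
DataStream<Row> rows = input
    .map((String[] fields) -> {
        Row row = new Row(12);               // same arity as rowTypeDNS
        for (int i = 0; i < 12; i++) {
            row.setField(i, fields[i]);
        }
        return row;
    })
    .returns(rowTypeDNS);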
Hi Chris,
Your query looks OK to me.
Moreover, you should get a SQLParseException (or something similar) if it
weren't valid SQL.
Hence, I assume you are running into a bug in one of the optimizer rules.
I tried to reproduce the problem on the SQL training environment and
couldn't write a query
you suggested,
> https://issues.apache.org/jira/browse/FLINK-15112
>
> Many thanks,
> Chris
>
>
> ------ Original Message --
> From: "Fabian Hueske"
> To: "Chris Miller"
> Cc: "user@flink.apache.org"
> Sent: 06/12/2019 14:52:16
> S
Congrats Zhu Zhu and welcome on board!
Best, Fabian
On Fri., Dec. 13, 2019 at 17:51, Till Rohrmann <
trohrm...@apache.org>:
> Hi everyone,
>
> I'm very happy to announce that Zhu Zhu accepted the offer of the Flink PMC
> to become a committer of the Flink project.
>
> Zhu Zhu has been
Hi all,
First of all, Happy New Year to everyone!
Many of you probably didn't spend the holidays thinking a lot about Flink.
Now, however, is the right time to focus again and decide which talk(s) to
submit for Flink Forward San Francisco because the Call for Presentations
is closing this Sunday,
Hi everyone,
We know some of you only came back from holidays last week.
To give you more time to submit a talk, we decided to extend the Call for
Presentations for Flink Forward San Francisco 2020 until Sunday January
19th.
The conference takes place on March 23-25 with two days of talks and one
Hi,
Large state is mainly an issue for Flink's fault-tolerance mechanism, which is
based on periodic checkpoints: the state is copied to a remote storage system
at regular intervals.
In case of a failure, the state copy needs to be loaded which takes more
time with growing state si
Hi Eleanore,
A dynamic filter like the one you need is essentially a join operation.
There are two ways to do this:
* partitioning the key set and the message on the attribute. This would be
done with a KeyedCoProcessFunction.
* broadcasting the key set and just locally forwarding the messages. T
Hi,
Which version are you using?
I can't find the error message in the current code base.
When writing data to a JDBC database, all Flink types must be correctly
matched to a JDBC type.
The problem is probably that Flink cannot match the 8th field of your Row
to a JDBC type.
What's the type of th
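For reference, a sketch of declaring the JDBC types explicitly on the
JDBCOutputFormat builder (driver, URL, and column layout are hypothetical);
there must be one entry per parameter of the INSERT statement so that every
Row field can be matched to a JDBC type:

JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
    .setDrivername("org.postgresql.Driver")
    .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
    .setQuery("INSERT INTO results (c0, c1, c2, c3, c4, c5, c6, c7, c8) " +
              "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)")
    .setSqlTypes(new int[] {                 // java.sql.Types constants
        java.sql.Types.VARCHAR, java.sql.Types.VARCHAR, java.sql.Types.INTEGER,
        java.sql.Types.BIGINT, java.sql.Types.VARCHAR, java.sql.Types.VARCHAR,
        java.sql.Types.VARCHAR, java.sql.Types.DOUBLE, java.sql.Types.VARCHAR})
    .finish();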
Hi,
The exception is thrown by Postgres.
I'd start investigating there what the problem is.
Maybe you need to tweak your Postgres configuration, but it might also be
that the Flink connector needs to be differently configured.
If the necessary config option is missing, it would be good to add it.
H
Hi everyone,
The registration for Flink Forward SF 2020 is open now!
Flink Forward San Francisco 2020 will take place from March 23rd to 25th.
The conference will start with one day of training and continue with two
days of keynotes and talks.
We would like to invite you to join the Apache Flink
Hi,
I think you are looking for BroadcastState [1].
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
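A minimal sketch of the broadcast state pattern (stream names and types are
hypothetical): a control stream of (key, threshold) records is broadcast to all
parallel tasks and the main stream is filtered against that state locally:

MapStateDescriptor<String, Long> thresholds =
    new MapStateDescriptor<>("thresholds", Types.STRING, Types.LONG);

BroadcastStream<Tuple2<String, Long>> controlBroadcast = controlStream.broadcast(thresholds);

DataStream<Tuple2<String, Long>> filtered = mainStream
    .connect(controlBroadcast)
    .process(new BroadcastProcessFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple2<String, Long>>() {

        @Override
        public void processElement(Tuple2<String, Long> value, ReadOnlyContext ctx,
                                   Collector<Tuple2<String, Long>> out) throws Exception {
            Long threshold = ctx.getBroadcastState(thresholds).get(value.f0);
            if (threshold != null && value.f1 > threshold) {
                out.collect(value);
            }
        }

        @Override
        public void processBroadcastElement(Tuple2<String, Long> control, Context ctx,
                                            Collector<Tuple2<String, Long>> out) throws Exception {
            // Every parallel task receives the control record and updates its local state copy.
            ctx.getBroadcastState(thresholds).put(control.f0, control.f1);
        }
    });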
On Fri., Jan. 17, 2020 at 14:50, Soheil Pourbafrani <
soheil.i...@gmail.com>:
> Hi,
>
> According to the processing logic,
Congrats team and a big thank you to the release managers!
On Wed., Feb. 12, 2020 at 16:33, Timo Walther wrote:
> Congratulations everyone! Great stuff :-)
>
> Regards,
> Timo
>
>
> On 12.02.20 16:05, Leonard Xu wrote:
> > Great news!
> > Thanks everyone involved !
> > Thanks Gary and Yu fo
Hi everyone,
We announced the program of Flink Forward San Francisco 2020.
The conference takes place at the Hyatt Regency in San Francisco from March
23rd to 25th.
On the first day we offer four training sessions [1]:
* Apache Flink Developer Training
* Apache Flink Runtime & Operations Training
Fri., Feb. 14, 2020 at 17:48, Fabian Hueske wrote:
> Hi everyone,
>
> We announced the program of Flink Forward San Francisco 2020.
> The conference takes place at the Hyatt Regency in San Francisco from
> March 23rd to 25th.
>
> On the first day we offer four
Congrats Jingsong!
Cheers, Fabian
On Fri., Feb. 21, 2020 at 17:49, Rong Rong wrote:
> Congratulations Jingsong!!
>
> Cheers,
> Rong
>
> On Fri, Feb 21, 2020 at 8:45 AM Bowen Li wrote:
>
> > Congrats, Jingsong!
> >
> > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann
> > wrote:
> >
> >> Congr
the actual timestamps of their input data. For me it was helpful to make
> this change in my Flink job: for late data output, include both processing
> time (DateTime.now()) along with the event time (original timestamp).
>
> On Mon, May 14, 2018 at 12:42 PM, Fabian Hueske wrote:
>
Hi Antonio,
Cascading window aggregations as done in your example are a good idea and are
preferable if the aggregation function is combinable, which is true for sum
(count can be done as sum of 1s).
Best, Fabian
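A sketch of such a cascade with hypothetical (key, count) tuples; because sum
is combinable, summing the 1-minute partial sums per hour gives the same result
as summing the raw records:

DataStream<Tuple2<String, Long>> perMinute = events
    .keyBy(0)
    .timeWindow(Time.minutes(1))
    .sum(1);

DataStream<Tuple2<String, Long>> perHour = perMinute
    .keyBy(0)
    .timeWindow(Time.hours(1))
    .sum(1);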
2018-06-09 4:00 GMT+02:00 antonio saldivar :
> Hello
>
> Has anyone worked this way? I
Hi Angelica,
The Flink cluster needs to provide a sufficient number of slots to process
the tasks of all submitted jobs.
Besides that, there is no limit. However, if you run a very large number of
jobs, you might need to tune a few configuration parameters.
Best, Fabian
2018-06-12 8:46 GMT+02:00 Sampath Bhat
ike the issue is caused by the fact
> that Memory states are large as it is throwing error states are larger than
> certain size. So solution of (1) will possibly solve (2) as well.
>
> Thanks again,
>
> Ashish
>
>
> On Jun 7, 2018, at 4:25 PM, Fabian Hueske wrote:
>
Hi everyone,
*Flink Forward Berlin 2018 will take place from September 3rd to 5th.*
The conference will start with one day of training and continue with two
days of keynotes and talks.
*The registration for Flink Forward Berlin 2018 is now open!*
A limited amount of early-bird passes is available
Hi,
At the moment (Flink 1.5.0), the operator UIDs depend on the overall
application and not only on the query.
Hence, changing the application by adding another query might change the
existing UIDs.
In general, you can only expect savepoint restarts to work if you don't
change the application an
Hi,
Which Flink version are you using?
Did you try to analyze the bottleneck of the application, i.e., is it CPU,
disk IO, or network bound?
Regarding the task scheduling. AFAIK, before version 1.5.0, Flink tried to
schedule tasks on the same machine to reduce the amount of network transfer.
Henc
Hi Johannes,
EventTimeSessionWindows [1] use the EventTimeTrigger [2] as default trigger
(see EventTimeSessionWindows.getDefaultTrigger()).
I would take the EventTimeTrigger and extend it with early firing
functionality.
However, there are a few things to consider
* you need to be aware that sess
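A sketch of what an event-time trigger with early, processing-time based
firings could look like (the firing interval is hypothetical and the cleanup of
the early-firing timer is simplified; for session windows the merging behavior
would need extra care):

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class EarlyFiringEventTimeTrigger extends Trigger<Object, TimeWindow> {

    private final long earlyFiringInterval;   // early-firing interval in milliseconds

    public EarlyFiringEventTimeTrigger(long earlyFiringInterval) {
        this.earlyFiringInterval = earlyFiringInterval;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            return TriggerResult.FIRE;                       // window is already complete
        }
        ctx.registerEventTimeTimer(window.maxTimestamp());   // final firing
        ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + earlyFiringInterval);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        // Early firing: emit the current (partial) result and keep the window state.
        ctx.registerProcessingTimeTimer(time + earlyFiringInterval);
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        // Deleting the pending early-firing timer would require remembering its
        // timestamp (e.g., in partitioned state); omitted in this sketch.
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(TimeWindow window, OnMergeContext ctx) {
        long windowMaxTimestamp = window.maxTimestamp();
        if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
            ctx.registerEventTimeTimer(windowMaxTimestamp);
        }
    }
}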
fectly 1-2-4-8-16 because all happens in same TM. When
> scale to 32 the performance drop, not even in par with case of parallelism
> 16. Is this something expected? Thank you.
>
> Regards,
> Yow
>
> --
> *From:* Fabian Hueske
> *Sent:* Mon
h
> more TM in play.
>
> @Ovidiu question is interesting to know too. @Till do you mind to share
> your thoughts?
>
> Thank you guys!
>
> --
> *From:* Ovidiu-Cristian MARCU
> *Sent:* Monday, June 18, 2018 6:28 PM
> *To:* Fabian Hueske
>
intain an operator state inside a trigger.
> TriggerContext only allows to interact with state that is scoped to the
> window and the key of the current trigger invocation (as shown in
> Trigger#TriggerContext)
>
> Now I've come to a conclusion that it might not be possible using
approach wasn't driven by the requirements but by operational
> aspects (state size), so using a concept like idle state retention time
> would be a more natural fit.
>
> Thanks,
>
> Johannes
>
> On Mon, Jun 18, 2018 at 9:57 AM Fabian Hueske wrote:
>
>> Hi
Hi,
Which version are you using? We fixed a similar issue for Flink 1.5.0.
If you can't upgrade yet, you can also implement a user-defined function
that evaluates the big CASE WHEN statement.
Best, Fabian
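A sketch of such a UDF (the mapping logic is made up); the long CASE WHEN
expression is replaced by a single function call in the query:

import org.apache.flink.table.functions.ScalarFunction;

public class MapCode extends ScalarFunction {

    public String eval(int code) {
        // Equivalent of the CASE WHEN cascade, expressed as plain Java.
        if (code < 100) {
            return "low";
        } else if (code < 1000) {
            return "medium";
        } else {
            return "high";
        }
    }
}

// Registration: tEnv.registerFunction("mapCode", new MapCode());
// Usage in SQL:  SELECT mapCode(code) FROM events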
2018-06-19 16:27 GMT+02:00 zhangminglei <18717838...@163.com>:
> Hi, friends.
>
> When I e
nt. Is it hard to implement? I am new to flink table api & sql.
>
> Best Minglei.
>
> On Jun. 19, 2018, at 10:36 PM, Fabian Hueske wrote:
>
> Hi,
>
> Which version are you using? We fixed a similar issue for Flink 1.5.0.
> If you can't upgrade yet, you can also implement
0, we'll
> see an incorrect value from a dashboard.
> This is the biggest concern of mine at this point.
>
> Best,
>
> - Dongwon
>
>
> On Tue, Jun 19, 2018 at 7:14 PM, Fabian Hueske wrote:
>
>> Hi Dongwon,
>>
>> Do you need to n
Hi Avihai,
Rafi pointed out the two common approaches to deal with this situation. Let
me expand a bit on those.
1) Transactional producing into queues: There are two approaches to
accomplish exactly-once producing into queues, 1) using a system with
transactional support such as Kafka or 2) mai
Hi Vishal,
In general, Kryo serializers are not very upgrade friendly.
Serializer compatibility [1] might be the right approach here, but Gordon (in
CC) might know more about this.
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/custom_serialization.html
Hi Dulce,
This functionality is not supported by the JDBCOutputFormat.
Some database systems (AFAIK, MySQL) support Upsert writes, i.e., writes
that insert if the primary key is not present or update the row if the PK
exists. Not sure if that would meet your requirements.
If you don't want to go
Hi Garrett,
I agree, there seems to be an issue and increasing the timeout should not
be the right approach to solve it.
Are you running streaming or batch jobs, i.e., do some of the tasks finish
much earlier than others?
I'm adding Till to this thread who's very familiar with scheduling and
proc
Hi Manuel,
I had a look and couldn't find a way to do it.
However, this sounds like a very useful feature to me.
Would you mind creating a Jira issue [1] for that?
Thanks, Fabian
[1] https://issues.apache.org/jira/projects/FLINK
2018-06-18 16:23 GMT+02:00 Haddadi Manuel :
> Hi all,
>
>
> I wo
re right. It is Trigger.clear(), not
> Trigger.onClose().
>
> Best,
> - Dongwon
>
>
> On Wed, Jun 20, 2018 at 7:30 PM, Chesnay Schepler
> wrote:
>
>> Checkpointing of metrics is a manual process.
>> The operator must write the current value into state, retrie
Hi,
If the list is static and not too large, you can pass it as a parameter to
the function.
Function objects are serialized (using Java's default serialization) and
shipped to the workers for execution.
If the data is dynamic, you might want to have a look at Broadcast state
[1].
Best, Fabian
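A minimal sketch of passing a static list via the constructor (filter logic and
values are made up); the list is serialized as part of the function object and
shipped to the workers:

import java.util.List;
import org.apache.flink.api.common.functions.FilterFunction;

public class InListFilter implements FilterFunction<String> {

    private final List<String> allowed;

    public InListFilter(List<String> allowed) {
        this.allowed = allowed;   // must be serializable, e.g. an ArrayList
    }

    @Override
    public boolean filter(String value) {
        return allowed.contains(value);
    }
}

// Usage inside a job:
// DataStream<String> filtered =
//     input.filter(new InListFilter(new ArrayList<>(Arrays.asList("a", "b", "c"))));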
state seems not supported in Flink-1.3 .
> I found this in Flink-1.3:
> Broadcasting
> DataStream → DataStream
>
> Broadcasts elements to every partition.
>
> dataStream.broadcast();
>
> But I don’t know how to convert it to list and get it in stream context .
>
> 在 20
Hi,
Although this solution looks straight-forward, custom triggers cannot be
added that easily.
The problem is that a window operator with a Trigger that emits early
results produces updates, i.e., results that have been emitted might be
updated later.
The default Trigger only emits the final resu
Hi Vinay,
This looks like a bug.
Would you mind creating a Jira ticket [1] for this issue?
Thank you very much,
Fabian
[1] https://issues.apache.org/jira/projects/FLINK
2018-06-21 9:25 GMT+02:00 Vinay Patil :
> Hi,
>
> I have deployed Flink 1.3.2 and enabled SSL settings. From the ssl debug
>
Great, thank you!
2018-06-22 10:16 GMT+02:00 Vinay Patil :
> Hi Fabian,
>
> Created a JIRA ticket : https://issues.apache.org/jira/browse/FLINK-9643
>
> Regards,
> Vinay Patil
>
>
> On Fri, Jun 22, 2018 at 1:25 PM Fabian Hueske wrote:
>
>> Hi Vinay,
>&
Hi,
I would not encode this information in watermarks. Watermarks are rather an
internal mechanism to reason about event-time.
Flink also generates watermarks internally. This makes the behavior less
predictable.
You could either inject special meta data records (which Flink handles just
like othe
Hi,
Flink distributes task instances to slots and does not expose physical
machines.
Records are partitioned to task instances by hash partitioning. It is also
not possible to guarantee that the records in two different operators are
sent to the same slot.
Sharing information by side-passing it (e
Hi Vishal,
1. I don't think a rolling update is possible. Flink 1.5.0 changed the
process orchestration and how they communicate. IMO, the way to go is to
start a Flink 1.5.0 cluster, take a savepoint on the running job, start
from the savepoint on the new cluster and shut the old job down.
2. Sav
different
>> slots/threads on the same Task Manager instance(aka cam1 partition) using
>> keyBy(seq#) & setParallelism() ? Can *forward* Strategy be used to
>> achieve this ?
>>
>> TIA
>>
>>
>> On Mon, Jun 25, 2018 at 1:03 AM Fabian Hueske wrote:
Hi,
Measuring latency is tricky and you have to be careful about what you
measure.
Aggregations like window operators make things even more difficult because
you need to decide which timestamp(s) to forward (smallest?, largest?, all?)
Depending on the operation, the measurement code might even add
Hi Sagar,
That's more a question for the ORC community, but AFAIK, the top-level type
is always a struct because it needs to wrap the fields, e.g.,
struct(name:string, age:int)
Best, Fabian
2018-06-26 22:38 GMT+02:00 sagar loke :
> @zhangminglei,
>
> Question about the schema for ORC format:
>
Hi,
You can just add a cast to StateBackend to get rid of the deprecation
warning:
env.setStateBackend((StateBackend) new
FsStateBackend("hdfs://myhdfsmachine:9000/flink/checkpoints"));
Best, Fabian
2018-06-27 5:47 GMT+02:00 Rong Rong :
> Hmm.
>
> If you have a wrapper function like this, i
Hi Elias,
Till (in CC) is familiar with Flink's HA implementation.
He might be able to answer your question.
Thanks,
Fabian
2018-06-25 23:24 GMT+02:00 Elias Levy :
> I noticed in one of our clusters that there are relatively old
> submittedJobGraph* and completedCheckpoint* files. I was wonderin
Hi,
The OVER window operator can only emit results when the watermark is
advanced, due to SQL semantics which define that all records with the same
timestamp need to be processed together.
Can you check if the watermarks make sufficient progress?
Btw. did you observe state size or IO issues? The O
a full day's worth of data is
>>>> loaded into the system before the watermark advances. At that point the
>>>> checkpoints stall indefinitely with a couple of the tasks in the 'over'
>>>> operator never acknowledging. Any thoughts on what wou
Hi Osh,
You can certainly apply multiple reduce functions on a DataSet, however, you
should make sure that the data is only partitioned and sorted once.
Moreover, you would end up with multiple data sets that you need to join
afterwards.
I think the easier approach is to wrap your functions in a s
ipeline.
>
> I guess I might have to use a ThreadPool within each Slot(cam partition)
> to work on each seq# ??
>
> TIA
>
> On Tue, Jun 26, 2018 at 1:06 AM Fabian Hueske wrote:
>
>> Hi,
>>
>> keyBy() does not work hierarchically. Each keyBy() overrides the prev
>> have physical partitioning in a way where physical partitioning happens
>> first by parent key and localize grouping by child key, is there a need to
>> using custom partitioner? Obviously we can keyBy twice but was wondering if
>> we can minimize the re-partition stress.
>
Hi,
Let me summarize:
1) Sometimes you get the error message
"org.apache.flink.client.program.ProgramMissingJobException: The program
didn't contain a Flink job." when submitting a program through the
YarnClusterClient
2) The logs and the dashboard state that the job ran successfully
3) The job pe
Hi Mich,
FlinkKafkaConsumer09 is the connector for Kafka 0.9.x.
Have you tried to use FlinkKafkaConsumer011 instead of FlinkKafkaConsumer09?
Best, Fabian
2018-07-02 22:57 GMT+02:00 Mich Talebzadeh :
> This is becoming very tedious.
>
> As suggested I changed the kafka dependency from
>
> ibra
Hi Will,
The community is currently working on improving the Kafka Avro integration
for Flink SQL.
There's a PR [1]. If you like, you could try it out and give some feedback.
Timo (in CC) has been working Kafka Avro and should be able to help with
any specific questions.
Best, Fabian
[1] https:
Hi,
The docs explain that the ExternalCatalog interface *can* be used to
implement a catalog for HCatalog or Metastore.
However, there is no such implementation in Flink yet. You would need to
implement such a catalog connector yourself.
I think there would be quite a few people interested in su
Hi Xilang,
Let me try to summarize your requirements.
If I understood you correctly, you are not only concerned about the
exactly-once guarantees but also need a consistent view of the data.
The data in all files that are finalized needs to originate from a prefix of
the stream, i.e., all records w
Hi,
In addition to what Rong said:
- The types look OK.
- You can also use Types.STRING, and Types.LONG instead of BasicTypeInfo.xxx
- Beware that in the failure case, you might have multiple entries in the
database table. Some databases support an upsert syntax which (together
with key or unique
d example source code that does that.
> Thanks again,
> Chris
>
>
> On Tue, Jul 3, 2018 at 5:24 AM, Fabian Hueske wrote:
>
>> Hi,
>>
>> In addition to what Rong said:
>>
>> - The types look OK.
>> - You can also use Types.STRING, and Types.LON
There is also the SQL:2003 MERGE statement that can be used to implement
UPSERT logic.
It is a bit verbose but supported by Derby [1].
Best, Fabian
[1] https://issues.apache.org/jira/browse/DERBY-3155
2018-07-04 10:10 GMT+02:00 Fabian Hueske :
> Hi Chris,
>
> MySQL (and maybe othe
Looking at the other threads, I assume you solved this issue.
The problem should have been that FlinkKafkaConsumer09 is not included in
the flink-connector-kafka-0.11 module, because it is the connector for
Kafka 0.9 and not Kafka 0.11.
Best, Fabian
2018-07-02 11:20 GMT+02:00 Mich Talebzadeh :
2018-07-02 15:37 GMT+02:00 ashish pok :
> Thanks Fabian! It sounds like KeyGroup will do the trick if that can be
> made publicly accessible.
>
> On Monday, July 2, 2018, 5:43:33 AM EDT, Fabian Hueske
> wrote:
>
>
> Hi Ashish, hi Vijay,
>
> Flink does not disting
Hi Elias,
I agree, the docs lack a coherent discussion of event time features.
Thank you for this write up!
I just skimmed your document and will provide more detailed feedback later.
It would be great to add such a page to the documentation.
Best, Fabian
2018-07-03 3:07 GMT+02:00 Elias Levy :
Hi,
The Evictor is useful if you want to remove some elements from the window
state but not all.
This also implies that a window is evaluated multiple times because
otherwise you could just filter in the user function (as you suggested)
and purge the whole window afterwards.
Evictors are commo
Hi Amol,
The memory consumption depends on the query/operation that you are doing.
Time-based operations like group-window-aggregations,
over-window-aggregations, or window-joins can automatically clean up their
state once data is no longer needed.
Operations such as non-windowed aggregations
Hi Jungtaek,
Flink 1.5.0 features a TableSource for Kafka and JSON [1], incl. timestamp
& watermark generation [2].
It would be great if you could let us know if that addresses your use case
and, if not, what's missing or not working.
So far Table API / SQL does not have support for late-data side
ording to above conversation flink will persist state forever for non
> windowed operations. I want to know how flink persists the state, i.e.
> Database or file system or in memory etc.
>
> On Wed, 4 Jul 2018 at 2:12 PM, Fabian Hueske wrote:
>
>> Hi Amol,
>>
>> The memory
Hi Ahmad,
Some tricks that might help to bring down the effort per tenant if you run
one job per tenant (or key per tenant):
- Pre-aggregate records in a 5 minute Tumbling window. However,
pre-aggregation does not work for FoldFunctions.
- Implement the window as a custom ProcessFunction that mai