in SQL might be tricky (as the semantics of a SQL query do not allow for
> multiple outputs).
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Jul 4, 2018 at 5:49 PM, Fabian Hueske wrote:
>
>> Hi Jungtaek,
>>
>> Flink 1.5.0 features a TableSource for Kafka and JSON [1], incl.
>
Hi Xilang,
I thought about this again.
The bucketing sink would need to roll on event-time intervals (similar to
the current processing time rolling) which are triggered by watermarks in
order to support consistency.
However, it would also need to maintain a write-ahead log of all received
rows an
ngAwareExistingField("eventTime"),
> new BoundedOutOfOrderTimestamps(Time.minutes(1).toMilliseconds)
> )
> .build()
>
> Thanks again!
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Jul 4, 2018 at 8:18 PM, Fabian Hueske wrote:
>
>> Hi Jungtaek,
>>
>> I
Hi Yersinia,
The main idea of an event-driven application is to hold the state (i.e.,
the account data) in the streaming application and not in an external
database like Couchbase.
This design is very scalable (state is partitioned) and avoids look-ups
from the external database because all state
e.getSchema.getColumnNames,
> outTable.getSchema.getTypes)
> tableEnv.toRetractStream[Row](outTable).print()
>
>
> Thanks again,
> Jungtaek Lim (HeartSaVioR)
>
> [1] https://issues.apache.org/jira/browse/FLINK-9742
>
> On Wed, Jul 4, 2018 at 10:03 PM, Fabian Hueske wrote:
>
Hi,
> Flink doesn't support connecting multiple streams with heterogeneous
> schema.
This is not correct.
Flink is very well able to connect streams with different schema. However,
you cannot union two streams with different schema.
In order to reconfigure an operator with changing rules, you can use
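For reference, a minimal sketch of connecting two differently typed streams
(Java; the element types and the rule handling are placeholder assumptions,
not from the original thread):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a", "b");  // schema 1
        DataStream<Integer> rules = env.fromElements(1, 2);      // schema 2

        // connect() combines two streams of different types in one operator;
        // union() would require both streams to have the same type.
        events.connect(rules)
            .map(new CoMapFunction<String, Integer, String>() {
                public String map1(String event) { return "event: " + event; }
                public String map2(Integer rule) { return "rule: " + rule; }
            })
            .print();

        env.execute("connect-example");
    }
}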
Hi Elias,
Thanks for the great document!
I made a pass over it and left a few comments.
I think we should definitely add this to the documentation.
Thanks,
Fabian
2018-07-04 10:30 GMT+02:00 Fabian Hueske :
> Hi Elias,
>
> I agree, the docs lack a coherent discussion of event time
Hi Yennie,
You might want to have a look at the OVER windows of Flink's Table API or
SQL [1].
An OVER window computes an aggregate (such as a count) for each incoming
record over a range of previous events.
For example, the query:
SELECT ip, successful, COUNT(*) OVER (PARTITION BY ip, successful
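The query is cut off in the archive; a hedged sketch of how it might be
completed and run (Java; the rowtime attribute rowTime, the table name
LoginEvents, and the one-hour RANGE interval are assumptions):

// assumes a StreamTableEnvironment tableEnv with a registered table LoginEvents
Table failures = tableEnv.sqlQuery(
    "SELECT ip, successful, " +
    "  COUNT(*) OVER (PARTITION BY ip, successful ORDER BY rowTime " +
    "    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) AS cnt " +
    "FROM LoginEvents");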
a problem cause you have two separate events).
>>>
>>> Moreover, a second PoC I was considering is related to Flink CEP. Let's
>>> say I am elaborating sensor data, I want to have a rule which is working on
>>> the following principle:
>>> - If the
Hi,
I don't think that is possible.
The Evictor interface does not provide access to a state store, so there is
no way to access state.
Best, Fabian
2018-07-10 13:26 GMT+02:00 Jayant Ameta :
> Hi,
> I'm using the GlobalWindow with a custom CountTrigger (similar to the
> CountTrigger provided by
Hi Gerard,
Thanks for reporting this issue. I'm pulling in Nico and Piotr who have
been working on the networking stack lately and might have some ideas
regarding your issue.
Best, Fabian
2018-07-13 13:00 GMT+02:00 Zhijiang(wangzhijiang999) <
wangzhijiang...@aliyun.com>:
> Hi Gerard,
>
> I thou
Hi Alexei,
Till (in CC) is familiar with Flink's Mesos support in 1.4.x.
Best, Fabian
2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI :
> Can someone please clarify how Flink on Mesos is containerized?
>
>
>
> On 5-node Mesos cluster I started Flink (1.4.2) with two Task Managers.
> Mesos shows “f
Hi everyone,
I'd like to announce the program for Flink Forward Berlin 2018.
The program committee [1] assembled a program of about 50 talks on use
cases, operations, ecosystem, tech deep dive, and research topics.
The conference will host speakers from Airbnb, Amazon, Google, ING, Lyft,
Microsof
Hi Shay,
This sounds very much like the off-by-one bug described by FLINK-9857 [1].
The problem was identified in another recent user ml thread and fixed for
Flink 1.5.2 and 1.6.0.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-9857
2018-07-18 19:00 GMT+02:00 Andrey Zagrebin :
>
Hi Soheil,
Hequn is right. This might be an issue with advancing event-time.
You can monitor that by checking the watermarks in the web dashboard or
print-debug it with a ProcessFunction which can look up the current
watermark.
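A minimal sketch of such a print-debugging function (Java; the Event type is
a placeholder):

// prints the operator's current watermark for every record passing through
stream.process(new ProcessFunction<Event, Event>() {
    @Override
    public void processElement(Event value, Context ctx, Collector<Event> out) {
        System.out.println("current watermark: " + ctx.timerService().currentWatermark());
        out.collect(value);
    }
});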
Best, Fabian
2018-07-19 3:30 GMT+02:00 Hequn Cheng :
> Hi Soheil,
>
Hi Chirag,
Stop with savepoint is not mentioned in the 1.5.0 release notes [1].
Since it's a frequently requested feature, I'm pretty sure that it would
have been mentioned if it was added.
Best, Fabian
[1] http://flink.apache.org/news/2018/05/25/release-1.5.0.html
2018-07-19 8:39 GMT+02:00 vin
ument.
>>> I think you did not to enable access for comments for the link. Would
>>> you mind enabling comments for the google doc?
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> On Thu, Jul 5, 2018 at 8:39 AM Fabian Hueske wrote:
>&g
Hi Nick,
What Ken said is correct, but let me add two more things.
1) State
Usually, you only need to partition (keyBy()) the data if you want to
process tuples with the same key together.
Therefore, it is necessary to hold some tuples or intermediate results
(like partial or running aggrega
Hi James,
Yes, that should also do the trick.
Best, Fabian
2018-07-19 16:06 GMT+02:00 Porritt, James :
> It looks like the following gives me the result I’m interested in:
>
>
>
> batchEnv
>
> .createInput(dataset)
>
> .groupBy("id")
>
> .sortGrou
Hi,
Flink guarantees order only within a partition. For example, if you have
the program map_1 -> map_2 and both map functions run with parallelism 4,
the order of records in each of the 4 partitions is not changed.
In case of a shuffle (such as a keyBy or change in parallelism) records are
shipp
Hi Henkka,
You might want to consider implementing a dedicated job for state
bootstrapping that uses the same operator UUID and state names. That might
be easier than integrating the logic into your regular job.
I think you have to use the monitoring file source because AFAIK it won't
be possible
disabling
> checkpointing).
>
> I can see the point of making the checkpoint triggering more flexible and
> giving some control to the user. In contrast to savepoints, checkpoints are
> considered for recovery. My question here would be, what would be the
> triggering condition in
Hi Darshan,
The join implementation in SQL / Table API does what is demanded by the SQL
semantics.
Hence, what results to emit and also what data to store (state) to compute
these results is pretty much given.
You can think of the semantics of the join as writing both streams into a
relational DBM
Hi,
The problem is that Flink tracks which files it has read by remembering the
modification time of the file that was added (or modified) last.
We use the modification time to avoid having to remember the names
of all files that were ever consumed, which would be expensive to check and
sto
Hi,
Thanks for creating the Jira issue.
I'm not sure if I would consider this a blocker but it is certainly an
important problem to fix.
Anyway, in the original version Flink checkpoints the modification
timestamp up to which all files have been read (or at least up to which
point it *thinks* to
Hi,
First of all, the ticket reports a bug (or improvement or feature
suggestion) such that others are aware of the problem and understand its
cause.
At some point it might be picked up and implemented. In general, there is
no guarantee whether or when this happens, but the Flink community is of
Hi,
Flink processes streams record by record, instead of micro-batching records
together. Since every record comes by itself, there is no for-each.
Simple record-by-record transformations can be done with a MapFunction,
filtering out records with a FilterFunction. You can also implement a
FlatMapF
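A short sketch of these record-by-record transformations (Java; the input
data is made up):

DataStream<String> lines = env.fromElements("a b", "", "c");
// MapFunction: exactly one output record per input record
DataStream<String> upper = lines.map(String::toUpperCase);
// FilterFunction: drops records for which the predicate is false
DataStream<String> nonEmpty = upper.filter(s -> !s.isEmpty());
// FlatMapFunction: zero, one, or more output records per input record
DataStream<String> words = nonEmpty
    .flatMap((String line, Collector<String> out) -> {
        for (String w : line.split(" ")) {
            out.collect(w);
        }
    })
    .returns(String.class);  // needed because the lambda erases the output type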
Hi Chang,
The state handle objects are not created per key but just once per function
instance.
Instead they route state accesses to the backend (JVM heap or RocksDB) for
the currently active key.
Best, Fabian
2018-07-30 12:19 GMT+02:00 Chang Liu :
> Hi Andrey,
>
> Thanks for your reply. My que
Hi,
Watermarks are not holding back records. Instead they define the event-time
at an operator (as Vino said) and can trigger the processing of data if the
logic of an operator is based on time.
For example, a window operator can emit complete results for a window once
the time passed the window's
Hi Averell,
The records emitted by the monitoring tasks are "just" file splits, i.e.,
meta information that defines which data to read from where.
The reader tasks receive these splits and process them by reading the
corresponding files.
You could of course partition the splits based on the file
Hi,
If you are using a custom source, you can call
SourceContext.markAsTemporarilyIdle() to indicate that a task is currently
not producing new records [1].
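A minimal sketch of how a source might use it (Java; the poll() method and
running flag are hypothetical):

@Override
public void run(SourceContext<String> ctx) throws Exception {
    while (running) {
        String record = poll();  // hypothetical fetch
        if (record == null) {
            // lets downstream watermarks advance without input from this task
            ctx.markAsTemporarilyIdle();
            Thread.sleep(1000L);
        } else {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(record);
            }
        }
    }
}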
Best, Fabian
2018-07-31 8:50 GMT+02:00 Reza Sameei :
> It's not a real solution; but why don't you change the parallelism for
> your `Sour
Levy :
> Fabian,
>
> You have any time to review the changes?
>
> On Thu, Jul 19, 2018 at 2:19 AM Fabian Hueske wrote:
>
>> Hi Elias,
>>
>> Thanks for the update!
>> I'll try to have another look soon.
>>
>> Best, Fabian
>>
>> 2018
Hi Averell,
please find my answers inlined.
Best, Fabian
2018-07-31 13:52 GMT+02:00 Averell :
> Hi Fabian,
>
> Thanks for the information. I will try to look at the change to that
> complex
> logic that you mentioned when I have time. That would save one more shuffle
> (from 1 to 0), wouldn't t
Hi, I think you are mixing Java and Scala dependencies.
org.apache.flink.streaming.api.datastream.DataStream is the DataStream of
the Java DataStream API.
You should use the DataStream of the Scala DataStream API.
Best, Fabian
2018-08-01 14:01 GMT+02:00 Mich Talebzadeh :
> Hi,
>
> I believed I t
Hi,
Paul is right.
Which and how much data is stored in state for a window depends on the type
of the function that is applied on the windows:
- ReduceFunction: Only the reduced value is stored
- AggregateFunction: Only the accumulator value is stored
- WindowFunction or ProcessWindowFunction: Al
discussion of your document.
Elias, do you want to put your document into Markdown and open a PR for the
documentation?
Thanks,
Fabian
2018-07-31 18:16 GMT+02:00 Fabian Hueske :
> Hi Elias,
>
> Sorry for the delay. I just made a pass over the document.
> I think it is very good.
>
>
Hi,
By setting the time characteristic to EventTime, you enable the internal
handling of record timestamps and watermarks.
In contrast to EventTime, ProcessingTime does not require any additional
data.
You can use both, EventTime and ProcessingTime in the same application and
StreamExecutionEnvir
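A minimal setup sketch (Java):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// enables internal handling of record timestamps and watermarks
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// processing-time windows remain available in the same application, e.g.:
// stream.keyBy(...).window(TumblingProcessingTimeWindows.of(Time.minutes(1)))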
)Unit
>>>> [error] cannot be applied to (String, org.apache.flink.streaming.
>>>> api.environment.StreamExecutionEnvironment, Symbol, Symbol, Symbol,
>>>> Symbol)
>>>> [error] tableEnv.registerDataStream("table1", streamExecEnv, 'key,
&g
Hi Mugunthan,
This depends on the type of your job. Is it a batch or a streaming job?
Some queries could be ported to Flink's SQL API as suggested by the link
that Hequn posted. In that case, the query would be executed in Flink.
Other options are to use a JDBC InputFormat or persisting the resul
Hi Averell,
As Vino said, checkpoints store the state of all operators of an
application.
The state of a monitoring source function is the position in the currently
read split and all splits that have been received and are currently pending.
In case of a recovery, the splits are recovered and the
Hi Dylan,
Yes, that's a bug.
As you can see from the plan, the partitioning step is pushed past the
Filter.
This is possible, because the optimizer knows that a Filter function cannot
modify the data (it only removes records).
A workaround should be to implement the filter as a FlatMapFunction. A
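A sketch of this workaround (Java; the Row type and the keep() predicate are
placeholders):

// Behaves like a filter, but the optimizer cannot push partitioning past it
DataSet<Row> filtered = input.flatMap(new FlatMapFunction<Row, Row>() {
    @Override
    public void flatMap(Row value, Collector<Row> out) {
        if (keep(value)) {  // hypothetical predicate
            out.collect(value);
        }
    }
});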
om>:
> Thanks for the reply. I was mainly thinking of the usecase of streaming
> job.
> In the approach to port to Flink's SQL API, is it possible to read parquet
> data from S3 and register table in flink?
>
>
> On Tue, Aug 7, 2018 at 1:05 PM, Fabian Hueske wrote:
&g
I've created FLINK-10100 [1] to track the problem and suggest a solution
and workaround.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-10100
2018-08-08 10:39 GMT+02:00 Fabian Hueske :
> Hi Dylan,
>
> Yes, that's a bug.
> As you can see from the plan, th
The code or the execution plan (ExecutionEnvironment.getExecutionPlan()) of
the job would be interesting.
2018-08-08 10:26 GMT+02:00 Chesnay Schepler :
> What have you tried so far to increase performance? (Did you try different
> combinations of -yn and -ys?)
>
> Can you provide us with your app
Hi Alexis,
First of all, I think you can leverage the partitioning and sorting
properties of the data returned by the database using SplitDataProperties.
However, please be aware that SplitDataProperties are a rather experimental
feature.
If used without query parameters, the JDBCInputFormat generate
Hi everybody,
The Flink community maintains a directory of organizations and projects
that use Apache Flink [1].
Please reply to this thread if you'd like to add an entry to this list.
Thanks,
Fabian
[1] https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
Thanks Amit!
I've added Limeroad to the list with your description.
Best, Fabian
2018-08-08 14:12 GMT+02:00 amit.jain :
> Hi Fabian,
>
> We at Limeroad, are using Flink for multiple use-cases ranging from ETL
> jobs, ClickStream data processing, real-time dashboard to CEP. Could you
> list us on
Hi Juan,
The state will be purged if you return None instead of a Some.
However, this only happens when the function is called for a specific key,
i.e., state won't be automatically removed after some time.
If this is your use case, you have to implement a ProcessFunction and use
timers to manuall
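A minimal sketch of such timer-based cleanup (Java; the Event type, state
name, and one-hour retention are assumptions; the function must run on a
keyed stream):

public class CleanupFunction extends ProcessFunction<Event, Event> {
    private transient ValueState<Long> state;

    @Override
    public void open(Configuration conf) {
        state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("myState", Long.class));
    }

    @Override
    public void processElement(Event e, Context ctx, Collector<Event> out) throws Exception {
        state.update(e.getTimestamp());
        // (re)register a cleanup timer one hour into the future
        ctx.timerService().registerProcessingTimeTimer(
            ctx.timerService().currentProcessingTime() + 60 * 60 * 1000L);
        out.collect(e);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) {
        state.clear();  // removes the state of the current key
    }
}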
Hi David,
Did you try to set the encoding on the TextInputFormat with
TextInputFormat tif = ...
tif.setCharsetName("UTF-16");
Best, Fabian
2018-08-08 17:45 GMT+02:00 David Dreyfus :
> Hello -
>
> It does not appear that Flink supports a charset encoding of "UTF-16". In
> particular, it doesn't
nericInputSplit takes two parameters:
> partitionNumber and totalNumberOfPartitions. Should I assume that there are
> 2 splits divided into 24 partitions?
>
> Regards,
> Alexis.
>
>
>
> On Wed, Aug 8, 2018 at 11:57 AM Fabian Hueske wrote:
>
>> Hi Alexis,
>&
Hi,
regarding the plans. There are no plans to support custom window assigners
and evictors.
There were some thoughts about supporting different result update
strategies that could be used to return early results or update results in
case of late data.
However, these features are currently not on
Hi Averell,
One comment regarding what you said:
> As my files are small, I think there would not be much benefit in
checkpointing file offset state.
Checkpointing is not about efficiency but about consistency.
If the position in a split is not checkpointed, your application won't
operate with e
e BOM indicates Little Endian and the caller indicates
> UTF-16BE, Flink should rewrite the charsetName as UTF-16LE.
>
> I hope this makes sense and that I haven't been testing incorrectly or
> misreading the code.
>
> Thank you,
> David
>
> On Thu, Aug 9, 2018
Hi Will,
The distinct operator is implemented as a groupBy(distinctKeys) and a
ReduceFunction that returns the first argument.
Hence, it depends on the order in which the records are processed by the
ReduceFunction.
Flink does not maintain a deterministic order because it is quite expensive
in di
Hi,
Elias and Paul have good points.
I think the performance degradation is mostly due to the lack of function
chaining in the rebalance case.
If all steps are just map functions, they can be chained in the
no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing,
upBy(0, 1)
>> .reduceGroup(groupReducer)
>> .withForwardedFields("_1")
>> .output(outputFormat)
>>
>> It seems to work well, and the semantic annotation does remove a hash
>> partition from the execution plan.
>>
>> Regards,
>> Ale
Hi Averell,
Conceptually, you are right. Checkpoints are taken at every operator at the
same "logical" time.
It is not important that each operator checkpoints at the same wallclock
time. Instead, the operators need to take a checkpoint when they have
processed the same input.
This is implemented with so-c
Hi Henry,
The problem is that the table that results from the query does not have a
unique key.
You can only use an upsert sink if the table has a (composite) unique key.
Since this is not the case, you cannot use an upsert sink.
However, you can implement a RetractStreamTableSink, which allows to
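For reference, a minimal sketch of consuming the result as a retract stream
instead (Java; assumes a StreamTableEnvironment tEnv and a result Table
resultTable):

// f0 == true marks an insertion, f0 == false a retraction of the row in f1
DataStream<Tuple2<Boolean, Row>> retractStream =
    tEnv.toRetractStream(resultTable, Row.class);
retractStream.print();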
Hi,
StreamExecutionEnvironment.socketTextStream is deprecated and it is very likely
that it will be removed because of its limited use.
I would recommend having a look at the implementation of the SourceFunction [1]
and adapt it to your needs.
Best, Fabian
[1]
https://github.com/apache/flink/blob/master/fli
ong configuration of the cluster; there was only 1
> task manager with 1 slot.
>
> If I submit a job with "flink run -p 24 ...", will the job hang until at
> least 24 slots are available?
>
> Regards,
> Alexis.
>
> On Fri, 10 Aug 2018, 14:01 Fabian Hueske wrote:
>
Hi Mingliang,
let me answer your second question first:
> Another question is about the alignment buffer, I thought it was only
used for multiple input stream cases. But for keyed process function, what
is actually aligned?
When a task sends records to multiple downstream tasks (task not
operat
Hi,
It is sufficient to implement the CheckpointedFunction interface.
Since SourceFunctions emit records in a separate thread, you need to ensure
that no record is emitted while the snapshotState method is called.
Flink provides a lock to synchronize data emission and state snapshotting.
See the
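A minimal sketch of the emission side (Java; fetchNext(), running, and the
offset field are hypothetical):

@Override
public void run(SourceContext<Long> ctx) throws Exception {
    while (running) {
        long record = fetchNext();  // hypothetical
        // hold the checkpoint lock so no record is emitted while
        // snapshotState() captures the offset
        synchronized (ctx.getCheckpointLock()) {
            ctx.collect(record);
            offset = record;
        }
    }
}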
Hi Paul,
Maybe Aljoscha (in CC) can help you with this question.
AFAIK, he has some experience with Flink and Kerberos.
Best, Fabian
2018-08-13 14:51 GMT+02:00 Paul Lam :
> Hi,
>
> I built Flink from the latest 1.5.x source code, and got some strange
> outputs from the command line when submitt
Hi,
It is recommended to always call update().
Modifying state by mutating objects is only possible because the heap-based
backends do not serialize or copy records (to avoid additional costs).
Hence, this is rather a side effect than a provided API. As soon as you
change the state backend, st
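A short sketch of the recommended pattern (Java; MyPojo is a placeholder
type):

// read, modify, and explicitly write back -- safe for all state backends
MyPojo current = state.value();  // state is a ValueState<MyPojo>
current.setCount(current.getCount() + 1);
state.update(current);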
Hi,
Flink InputFormats generate their InputSplits sequentially on the
JobManager.
These splits are stored in the heap of the JM process and lazily handed out
to SourceTasks when they request them.
Split assignment is done by a InputSplitAssigner, that can be customized.
FileInputFormats typically
Hi,
I've given you Contributor permissions for Jira and assigned the issue to
you.
You can now also assign other issues to yourself.
Looking forward to your contribution.
Best, Fabian
2018-08-14 19:45 GMT+02:00 Guibo Pan :
> Hello, I am a new user for flink jira. I reported an issue and would like
>
Hi John,
Watermarks cannot make progress if you have stream partitions that do not
carry any data.
What kind of source are you using?
Best,
Fabian
2018-08-15 4:25 GMT+02:00 vino yang :
> Hi Johe,
>
> In local mode, it should also work.
> When you debug, you can set a breakpoint in the getCurren
tionManager$1.get(PoolingHttpClientConnectionMan
> ager.java:263)
> at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.amazon.ws.emr.hadoop.
Hi,
I've recently published a blog post about Broadcast State [1].
Cheers,
Fabian
[1]
https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink
2018-08-20 3:58 GMT+02:00 Paul Lam :
> Hi Rong, Hequn
>
> Your answers are very helpful! Thank you!
>
> Best Regards,
> Paul
Hi,
No, it won't. It will simply remove state that has not been accessed for the
configured time but not change the result.
For example, if you have a GROUP BY aggregation and the state for a
grouping key is removed, the operator will start a new aggregation if a
record with the removed grouping ke
blocker or that I've identified the right component.
> I'm afraid I don't have the bandwidth or knowledge to make the kind of
> pull request you really need. I do hope my suggestions prove a little
> useful.
>
> Thank you,
> David
>
> On Fri, Aug 10, 2018 at
the dynamic table.
>
> Best,
> Henry
>
>
> On Aug 21, 2018, at 4:16 PM, Fabian Hueske wrote:
>
> Hi,
>
> No, it won't. It will simply remove state that has not been accessed for
> the configured time but not change the result.
> For example, if you have a GROUP BY agg
rtain threshold.
In that case, the query would only operate on the tail of the stream, e.g., the
last day or week.
Best, Fabian
2018-08-21 12:03 GMT+02:00 徐涛 :
> Hi Fabian,
> Is the behavior a bit weird? Because it leads to data inconsistency.
>
> Best,
> Henry
>
>
> On Aug 21, 2018, at 5
Hi,
The semantics of a query do not depend on the way that it is used.
praiseAggr is a table that grows by one row per second and article_id. If
you use that table in a join, the join will fully materialize the table.
This is a special case because the same row is added multiple times, so the
stat
Hi Henry,
Flink is an open source project. New built-in functions are constantly
contributed to Flink. Right now, there are more than 5 PRs open to add or
improve various functions.
If you find that some functions are not working correctly or could be
improved, you can open a Jira issue. The same
Hi,
I don't think that recommending Gists is a good idea.
Sure, well formatted and highlighted code is nice and much better than
posting screenshots but Gists can be deleted.
Deleting a Gist would make an archived thread useless.
I would definitely support instructions on how to add code to a mail
I agree to remove the slides section.
A lot of the content is out-dated and hence not only useless but might
sometimes even cause confusion.
Best,
Fabian
On Mon, Aug 27, 2018 at 08:29, Renjie Liu <
liurenjie2...@gmail.com> wrote:
> Hi, Stephan:
> Can we put project wiki in some place? I
Hi Averell,
If this is a more general error, I'd prefer a separate issue & PR.
Thanks,
Fabian
On Fri, Aug 24, 2018 at 13:15, Averell wrote:
> Good day everyone,
>
> I'm writing a unit test for the bug fix FLINK-9940, and found that some
> existing tests in flink-fs-tests cannot detect t
paste text instead of screenshots of text
> 3. you keep formatting when pasting code in order to keep the code readable
> 4. there are enough import statements to avoid ambiguities
>
>
>
> On Mon, Aug 27, 2018 at 10:51 AM Fabian Hueske wrote:
>
>> Hi,
>>
>> I
Hi Averell,
Barriers are injected into the regular data flow by source functions.
In case of a file monitoring source, the barriers are injected into the
stream of file splits that are passed to the
ContinuousFileReaderOperator.
The reader operator puts the splits into a queue and processes them with a d
GROUP BY
>> article_id", the answer is "101,102,103"
>> 2. if you change your sql to s"SELECT last_value(article_id) FROM
>> praise", the answer is "100"
>>
>> Best, Hequn
>>
>> On Tue, Aug 21, 2018 at 8:52 PM, 徐涛 wrote:
Hi,
The ContinuousFileReaderOperator is not a source, only the file monitoring
function is. Barriers are injected by the monitoring function when the JM
sends a checkpoint message. They then travel to the reader operator and
trigger its checkpointing.
Fabian
Averell wrote on Tue, Aug 28, 2018, 12:02:
> Hello Fabian,
>
> Thanks for t
Hi,
I guess that this is not a fundamental problem but just a limitation in the
current implementation.
Andrey (in CC) who implemented the TTL support should be able to give more
insight on this issue.
Best, Fabian
On Wed, Aug 29, 2018 at 04:06, vino yang wrote:
> Hi Elias,
>
> From the
Hi Elias,
Your assumption is correct. An operation on a KeyedStream results in a
regular DataStream because the operation might change the data type or the
key field.
Hence, it is not guaranteed that the same keys can be extracted from the
output of the keyed operation.
However, there is a way to
Hi,
You are using SQL syntax in a Table API query. You have to stick to Table
API syntax or use SQL as
tEnv.sqlQuery("SELECT col1,CONCAT('field1:',col2,',field2:',CAST(col3 AS
string)) FROM csvTable")
The Flink documentation lists all supported functions for Table API [1] and
SQL [2].
Best, Fabian
Actually, some parts of Flink's batch engine are similar to streaming as
well. If the data does not need to be sorted or put into a hash-table, the
data is pipelined (like in many relational database systems).
For example, if you have a job that joins two inputs with a HashJoin, only
the build side
> is pushed from operators to operators in both stream and batch
>
> On Wed 12 Sep, 2018, 4:28 PM Fabian Hueske, wrote:
>
>> Actually, some parts of Flink's batch engine are similar to streaming as
>> well. If the data does not need to be sorted or put into a hash
Hi,
The problem is that Flink SQL does not expose the UIDs of the generated
operators.
We've met that issue before, but it is still not fully clear what would be
the best way to make this accessible.
Best, Fabian
2018-09-13 5:15 GMT-04:00 Dawid Wysakowicz :
> Hi Oleksandr,
>
> The mapping of state t
Hi John,
Are you sure that the first rows of the first window are dropped?
When a query with processing time windows is terminated, the last window is
not computed. This is in fact intentional and does not apply to event-time
windows.
Best, Fabian
2018-09-17 17:21 GMT+02:00 John Stone :
> Hello,
Hi Chen,
Yes, this is possible.
Have a look at the configuration of idle state retention time [1].
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/streaming.html#idle-state-retention-time
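A minimal configuration sketch for the 1.6 Table API (Java; the query and
table names are made up):

StreamQueryConfig qConf = tEnv.queryConfig();
// state of a key is dropped if it is not accessed for 12 to 24 hours
qConf.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));
Table result = tEnv.sqlQuery("SELECT userId, COUNT(*) FROM clicks GROUP BY userId");
tEnv.toRetractStream(result, Row.class, qConf).print();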
2018-09-17 20:10 GMT+02:00 burgesschen :
> Hi everyone,
> I'm tryin
Hmm, that's interesting.
HOP and TUMBLE window aggregations are directly translated into their
corresponding DataStream counterparts (Sliding, Tumble).
There should be no filtering of records.
I assume you tried a simple query like "SELECT * FROM MyEventTable" and
received all expected data?
Fabian
Hi Alejandro,
asScala calls iterator() the first time and reduce() another time.
These iterators can only be iterated once because they are possibly backed
by multiple sorted files which have been spilled to disk and are
merge-sorted while iterating.
I'm actually surprised that you found this cod
Hi,
Any operator can have multiple outgoing edges.
If you implement something like:
DataStream instream = ...
DataStream outstream1 = instream.map(new MapFunc1());
DataStream outstream2 = instream.map(new MapFunc2());
The node representing instream will have two outgoing edges that lead to
the
Hi,
The functionality of the SQL ScalarFunction is backed by Flink's
distributed cache and just passes on the function call.
I tried it locally on my machine and it works for me.
What is your setup? Are you running on Yarn?
Maybe Chesnay or Dawid (added to CC) can help to track the problem down.
The auto-generated ids are included in the savepoint data. So, it should be
possible to extract them from the savepoint.
However, AFAIK, there is no tool to do that. You'd need to manually dig
into the serialized data.
Cheers, Fabian
2018-09-18 13:30 GMT+02:00 vino yang :
> Hi Paul,
>
> Referring to the
,
> Paul Lam
>
>
> On Sep 18, 2018, at 20:09, Fabian Hueske wrote:
>
> The auto-generated ids are included in the savepoint data. So, it should
> be possible to extract them from the savepoint.
> However, AFAIK, there is no tool to do that. You'd need to manually dig
> into the seriali
Hi John,
Just to clarify, this missing data is due to the starting overhead and not
due to a bug?
Best, Fabian
2018-09-18 15:35 GMT+02:00 John Stone :
> Thank you all for your assistance. I believe I've found the root cause if
> the behavior I am seeing.
>
> If I just use "SELECT * FROM MyEven