Hi Pieter,
cross is indeed too expensive for this task.
If dataset A fits into memory, you can do the following: Use a
RichMapPartitionFunction to process dataset B and add dataset A as a
broadcastSet. In the open method of mapPartition, you can load the
broadcasted set and sort it by a.propertyX
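A rough sketch of that pattern in Java (the types A and B, the field
propertyX, and the broadcast set name "datasetA" are placeholders):

// process B; the small dataset A is broadcast to every parallel task
DataSet<B> result = b
    .mapPartition(new RichMapPartitionFunction<B, B>() {

        private List<A> sortedA;

        @Override
        public void open(Configuration parameters) {
            // load the broadcast set once per task and sort it by propertyX
            sortedA = new ArrayList<>(
                getRuntimeContext().<A>getBroadcastVariable("datasetA"));
            sortedA.sort(Comparator.comparing((A x) -> x.propertyX));
        }

        @Override
        public void mapPartition(Iterable<B> values, Collector<B> out) {
            for (B value : values) {
                // e.g., binary search in sortedA and emit a result
                out.collect(value);
            }
        }
    })
    .withBroadcastSet(a, "datasetA");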
ers for getting started with the 'tricky range
> partitioning'? I am quite keen to get this working with large datasets ;-)
>
> Cheers,
>
> Pieter
>
> 2015-09-30 10:24 GMT+02:00 Fabian Hueske :
>
>> Hi Pieter,
>>
>> cross is indeed
Hi Flavio,
I don't think this is a feature that needs to go into Flink core.
To me it looks like this can be implemented as a utility method by anybody
who needs it without major effort.
Best, Fabian
2015-10-02 15:27 GMT+02:00 Flavio Pompermaier :
> Hi to all,
>
> in many of my jobs I have to read
artitionFunction
>>> on A:
>>> In the open method of mapPartition, you sort B. Then, for each element
>>> of A, you do a binary search in B, and look at the index found by the
>>> binary search, which will be the count that you are looking for.
>>>
>>
Hi Lydia,
you need to implement a custom InputFormat to read binary files.
Usually you can extend the FileInputFormat.
The implementation depends a lot on your use case, for example whether each
binary file is read into a single or multiple records and how records are
delimited if there are more t
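For illustration, a rough sketch for fixed-length binary records (the record
size is made up; a real implementation would also need to align the split
start/length to record boundaries):

public class FixedLengthRecordInputFormat extends FileInputFormat<byte[]> {

    private static final int RECORD_SIZE = 12; // assumed record length in bytes

    private DataInputStream dataIn;
    private long bytesRead;

    @Override
    public void open(FileInputSplit split) throws IOException {
        super.open(split);                  // 'stream' is provided by FileInputFormat
        dataIn = new DataInputStream(stream);
        bytesRead = 0;
    }

    @Override
    public boolean reachedEnd() throws IOException {
        return bytesRead >= splitLength;    // splitLength is set by FileInputFormat
    }

    @Override
    public byte[] nextRecord(byte[] reuse) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        dataIn.readFully(record);
        bytesRead += RECORD_SIZE;
        return record;
    }
}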
Hi Flavio,
it is not possible to split by line count because that would mean reading
and parsing the file just for splitting.
Parallel processing of data sources depends on the input splits created by
the InputFormat. Local files can be split just like files in HDFS. Usually,
each file corresponds
description of
> their internals?
>
> On Wed, Oct 7, 2015 at 3:09 PM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> it is not possible to split by line count because that would mean to read
>> and parse the file just for splitting.
>>
>> Parallel processing
Hi Konstantin,
Flink uses managed memory only for its internal processing (sorting, hash
tables, etc.).
If you allocate too much memory in your user code, it can still fail with
an OOME. This can also happen for large broadcast sets.
Can you check how much memory the JVM allocated and how muc
I think that's actually very use case specific.
Your code will never see the malformed record because it is dropped by
the input format.
Other applications might rely on complete input and would prefer an
exception so that they are notified about invalid input.
Flink's CsvInputFormat has a parameter "lenie
Hi Philip,
here a few additions to what Max said:
- ORDER BY: As Max said, Flink's sortPartition() only sorts within a
partition and does not produce a total order. You can either set the
parallelism to 1 as Max suggested or use a custom partitioner to range
partition the data.
- SORT BY: From y
tribute
> By] + [Sort By]. Therefore, according to your suggestion, should it be
> partitionByHash() + sortGroup() instead of sortPartition() ?
>
> Or probably I did not still get much difference between Partition and
> scope within a reduce.
>
> Regards,
> Philip
>
> On Mon,
Thanks for starting this Kostas.
I think the list is quite hidden in the wiki. Should we link from
flink.apache.org to that page?
Cheers, Fabian
2015-10-19 14:50 GMT+02:00 Kostas Tzoumas :
> Hi everyone,
>
> I started a "Powered by Flink" wiki page, listing some of the
> organizations that are
Sounds good +1
2015-10-19 14:57 GMT+02:00 Márton Balassi :
> Thanks for starting and big +1 for making it more prominent.
>
> On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote:
>
>> Thanks for starting this Kostas.
>>
>> I think the list is quite hidden in
ult to answer to
> interested users.
>
>
> On 19.10.2015 15:08, Suneel Marthi wrote:
>
> +1 to this.
>
> On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote:
>
>> Sounds good +1
>>
>> 2015-10-19 14:57 GMT+02:00 Márton Balassi <
>> balassi.mar...
I think it's not a nice solution to check for the type of the returned
execution environment to determine whether it is a local or a remote
execution environment.
Wouldn't it be better to add a method isLocal() to ExecutionEnvironment?
Cheers, Fabian
2015-10-14 19:14 GMT+02:00 Flavio Pompermaier
It might even be materialized (to disk) if both derived data sets are
joined.
2015-10-22 12:01 GMT+02:00 Till Rohrmann :
> I fear that the filter operations are not chained because there are at
> least two of them which have the same DataSet as input. However, it's true
> that the intermediate re
> reading the entire input multiple times (we are talking 100GB+ on max 32
> workers) but I would have to run some experiments to confirm that.
>
>
>
> 2015-10-22 12:06 GMT+02:00 Fabian Hueske :
>
>> It might even be materialized (to disk) if both derived data sets are
>
Hi Philip,
the CsvInputFormat does not support reading empty fields.
I see two ways to achieve this functionality:
- Use a TextInputFormat that returns each line as a String and do the
parsing in a subsequent MapFunction (see the sketch below)
- Extend the CsvInputFormat to support empty fields
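A minimal sketch of the first option (the 3-column layout with an optional
integer in the second field is just an example):

DataSet<Tuple3<String, Integer, String>> parsed = env
    .readTextFile("hdfs:///path/to/data.csv")
    .map(new MapFunction<String, Tuple3<String, Integer, String>>() {
        @Override
        public Tuple3<String, Integer, String> map(String line) {
            String[] fields = line.split(",", -1);   // -1 keeps trailing empty fields
            Integer f1 = fields[1].isEmpty() ? 0 : Integer.parseInt(fields[1]);
            return new Tuple3<>(fields[0], f1, fields[2]);
        }
    });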
Cheers,
Fabian
2015-10
Hi Thomas,
until recently, Flink provided its own implementation of an S3FileSystem,
which wasn't fully tested and was buggy.
We removed that implementation and are now using (in 0.10-SNAPSHOT)
Hadoop's S3 implementation by default.
If you want to continue using 0.9.1 you can configure Flink to use Hado
Hi Johann,
I see three options for your use case.
1) Generate Pojo code at planning time, i.e., when the program is composed.
This does not work when the program is already running. The benefit is that
you can use key expressions, have typed fields, and type specific
serializers and comparators.
Hi Jeffery,
Flink uses a (potentially external) merge-sort to group data. Combining is
done using an in-memory sort.
Because Flink uses pipelined data transfer, the execution of operators in a
program can overlap. For example in WordCount, the sort of a groupBy will
immediately start as soon as th
You refer to the DataSet (batch) API, right?
In that case you can evaluate your condition in the program and fetch a
DataSet back to the client using List myData = DataSet.collect();
Based on the result of the collect() call you can define and execute a new
program.
Note: collect() will immediate
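A rough sketch of that pattern (the transformations, the condition, and the
paths are placeholders):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// first program: compute something on A and fetch the result to the client
List<Tuple2<String, Long>> counts = env
    .readTextFile("hdfs:///path/to/A")
    .flatMap(new Tokenizer())     // some user-defined transformation
    .groupBy(0)
    .sum(1)
    .collect();                   // triggers execution of the first program

boolean conditionHolds = counts.stream().anyMatch(t -> t.f1 > 1000L);

if (conditionHolds) {
    // define and execute a second program, e.g., on dataset C
    env.readTextFile("hdfs:///path/to/C")
       .filter(line -> line.contains("ERROR"))
       .writeAsText("hdfs:///path/to/output");
    env.execute("conditional follow-up job");
}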
aset.
>
> Some pseudocode from your solution:
> DataSet A = env.readFile(...);
> DataSet C = env.readFile(...);
>
> A.groupBy().reduce().filter(*Check conditions here and in case start
> processing C*);
>
>
> Thanks,
> Giacomo
>
>
>
>
> On Fri, Oct 30
Hi Philip,
thanks for reporting the issue. I just verified the problem.
It is working correctly for the Java API, but is broken in Scala.
I will work on a fix and include it in the next RC for 0.10.0.
Thanks, Fabian
2015-11-02 12:58 GMT+01:00 Philip Lee :
> Thanks for your reply, Stephan.
>
>
Converting String ids into Long ids can be quite expensive, so you should
make sure it pays off.
The safe way to do it is to get all unique String ids (project, distinct),
do zipWithUniqueId, and join all DataSets that have the String id with the
new long id. So it is a full sort for the unique an
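A sketch of that approach (assuming the String id is the first field of a
Tuple2<String, String> data set):

// all unique String ids
DataSet<String> uniqueIds = data
    .map(new MapFunction<Tuple2<String, String>, String>() {
        @Override
        public String map(Tuple2<String, String> t) { return t.f0; }
    })
    .distinct();

// zipWithUniqueId assigns a unique Long to each String id: (longId, stringId)
DataSet<Tuple2<Long, String>> dictionary = DataSetUtils.zipWithUniqueId(uniqueIds);

// replace the String id by joining on it
DataSet<Tuple2<Long, String>> withLongIds = data
    .join(dictionary).where(0).equalTo(1)
    .with(new JoinFunction<Tuple2<String, String>, Tuple2<Long, String>,
                           Tuple2<Long, String>>() {
        @Override
        public Tuple2<Long, String> join(Tuple2<String, String> t,
                                         Tuple2<Long, String> d) {
            return new Tuple2<>(d.f0, t.f1);
        }
    });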
The reason is that the non-rich function interfaces are SAM (single
abstract method) interfaces.
In Java 8, SAM interfaces can be specified as concise lambda functions.
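For example (assuming a DataSet<String> words):

// MapFunction is a SAM interface, so a concise Java 8 lambda works:
DataSet<Integer> lengths = words.map(s -> s.length());

// RichMapFunction is an abstract class with several methods (open, close, ...),
// so it cannot be written as a lambda:
DataSet<Integer> lengths2 = words.map(new RichMapFunction<String, Integer>() {
    @Override
    public Integer map(String s) {
        return s.length();
    }
});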
Cheers, Fabian
2015-11-09 10:45 GMT+01:00 Flavio Pompermaier :
> Hi flinkers,
> I have a simple question for you that I want
Why don't you use a composite key for the Flink join
(first.join(second).where(0,1).equalTo(2,3).with(...)?
This would be more efficient and you can omit the check in the join
function.
Best, Fabian
2015-11-08 19:13 GMT+01:00 Philip Lee :
> I want to join two tables with two columns like
>
> //
Hi Flavio,
this will not work out of the box. If you extend a Flink tuple and add
additional fields, the type will be recognized as tuple and the
TupleSerializer will be used to serialize and deserialize the record. Since
the TupleSerializer is not aware of your additional fields it will not
seria
Hi Nick,
I think you can do this with Flink quite similar to how it is explained in
the Samza documentation by using a stateful CoFlatMapFunction [1], [2].
Please have a look at this snippet [3].
This code implements an updateable stream filter. The first stream is
filtered by words from the seco
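In case the linked snippet moves, the idea looks roughly like this (types and
names are placeholders):

// stream 1 carries the data, stream 2 carries the filter words
DataStream<String> filtered = dataStream
    .connect(filterWords)
    .flatMap(new CoFlatMapFunction<String, String, String>() {

        // the updatable filter state (one instance per parallel task)
        private final Set<String> blocked = new HashSet<>();

        @Override
        public void flatMap1(String value, Collector<String> out) {
            // data stream: forward the element unless it is currently blocked
            if (!blocked.contains(value)) {
                out.collect(value);
            }
        }

        @Override
        public void flatMap2(String word, Collector<String> out) {
            // control stream: update the filter set, emit nothing
            blocked.add(word);
        }
    });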
Hi everybody,
The Flink community is excited to announce that Apache Flink 0.10.0 has
been released.
Please find the release announcement here:
--> http://flink.apache.org/news/2015/11/16/release-0.10.0.html
Best,
Fabian
Hi,
this is an artifact of how the solution set is internally implemented.
Usually, a CoGroup is executed using a sort-merge strategy, i.e., both
inputs are sorted, merged, and handed to the CoGroup function in a streaming
fashion. Both inputs are treated equally, and if one of both inputs does
not
Hi Ron,
Have you checked:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
?
Fold is like reduce, except that you define a start element (of a different
type than the input type) and the result type is the type of the initial
value. In reduce,
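A small example of the difference (assuming a keyed DataStream<Integer>):

KeyedStream<Integer, Integer> keyed = values
    .keyBy(new KeySelector<Integer, Integer>() {
        @Override
        public Integer getKey(Integer v) { return v % 10; }
    });

// reduce: input and result have the same type (here: Integer)
DataStream<Integer> sums = keyed.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer a, Integer b) { return a + b; }
});

// fold: you provide an initial value and the result type is the type of
// that initial value (here: String)
DataStream<String> concatenated = keyed.fold("", new FoldFunction<Integer, String>() {
    @Override
    public String fold(String acc, Integer value) { return acc + "," + value; }
});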
Hi Konstantin,
let me first summarize to make sure I understood what you are looking for.
You computed an aggregate over a keyed event-time window and you are
looking for the maximum aggregate for each group of windows over the same
period of time.
So if you have
(key: 1, w-time: 10, agg: 17)
(key
1, w-time: 20, agg: 30)
> (key: 1, w-time: 20, agg: 30)
>
> Because the reduce function is evaluated for every incoming event (i.e.
> each key), right?
>
> Cheers,
>
> Konstantin
>
> On 23.11.2015 10:47, Fabian Hueske wrote:
> > Hi Konstantin,
> >
> > let me firs
Hi Nick,
you can use Flink's HadoopInputFormat wrappers also for the DataStream API.
However, DataStream does not offer as much "sugar" as DataSet because
StreamEnvironment does not offer dedicated createHadoopInput or
readHadoopFile methods.
In DataStream Scala you can read from a Hadoop InputFo
Hi Anton,
If I got your requirements right, you are looking for a solution that
continuously produces updated partial aggregates in a streaming fashion.
When a special event (no more trades) is received, you would like to store
the last update as a final result. Is that correct?
You can compute
A strong argument for YARN mode can be the isolation of multiple users and
jobs. You can easily start a new Flink cluster for each job or user.
However, this comes at the price of resource (memory) fragmentation. YARN
mode does not use memory as effectively as cluster mode.
2015-11-25 9:46 GMT+01:00
use only YARN without Hadoop ?
>
> Currently we are using Cassandra and CFS ( cass file system )
>
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:51 PM, Fabian Hueske wrote:
>
>> A strong argument for YARN mode can be the isolation of multiple users
>> and jobs. You can e
Hi Radu,
the connectors are available in Maven Central.
Just add them as a dependency in your project and they will be fetched and
included.
Best, Fabian
2015-11-27 14:38 GMT+01:00 Radu Tudoran :
> Hi,
>
>
>
> I was trying to use flink connectors. However, when I tried to import this
>
>
>
> im
>
>
>
> *From:* Fabian Hueske [mailto:fhue...@gmail.com]
> *Sent:* Friday, November 27, 2015 2:41 PM
> *To:* user@flink.apache.org
>
Hi Nirmalya,
can you describe the semantics that you want to implement?
Do you want to find the max temperature every 5 milliseconds or the max of
every 5 records?
Right now, you are using a non-keyed timeWindow of 5 milliseconds. This
will create a window for the complete stream every 5 msecs.
H
Hi Nirmalya,
it is correct that the evictor is called BEFORE the window function is
applied because this is required to support certain types of sliding
windows.
If you want to remove all elements from the window after the window
function was applied, you need a trigger that purges the window. The
Hi Anwar,
Your trigger looks good!
I just want to make sure you know what it is exactly happening after a
window was evaluated and purged.
Once a window is purged, the whole window is cleared and removed. If a new
element arrives that would have fit into the purged window, a new window
with exac
If you are just looking for an exchange format between two Flink jobs, I
would go for the TypeSerializerInput/OutputFormat.
Note that these are binary formats.
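A rough sketch (MyPojo and the path are placeholders; depending on the Flink
version, the input format constructor takes the TypeInformation or the
TypeSerializer of the exchanged type):

TypeInformation<MyPojo> typeInfo = TypeExtractor.getForClass(MyPojo.class);

// job 1: write the data set in Flink's binary serialization format
dataSet.write(new TypeSerializerOutputFormat<MyPojo>(), "hdfs:///tmp/exchange");

// job 2: read it back
DataSet<MyPojo> exchanged = env.readFile(
    new TypeSerializerInputFormat<MyPojo>(typeInfo), "hdfs:///tmp/exchange");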
Best, Fabian
2015-11-27 15:28 GMT+01:00 Flavio Pompermaier :
> Hi to all,
>
> I have a complex POJO (with nexted objects) that I'd like
ompletion.
> In that 1 min window, is my modified count trigger still valid ? Say, if
> in that one minute window, I have 100 events after 30 s, it will still fire
> at 30th second ?
>
> Cheers,
> anwar.
>
>
>
> On Fri, Nov 27, 2015 at 3:31 PM, Fabian Hueske wrote:
>
rmal or TypeSerializer is
> supposed to perform better than this?
>
>
> On Fri, Nov 27, 2015 at 3:39 PM, Fabian Hueske wrote:
>
>> If you are just looking for an exchange format between two Flink jobs, I
>> would go for the TypeSerializerInput/OutputFormat.
>>
Hi Guido,
please check this section of the configuration documentation:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html#configuring-the-network-buffers
It should answer your questions. Please let us know, if not.
Cheers, Fabian
2015-11-27 16:41 GMT+01:00 Guido :
Hi Nirmalya,
thanks for the detailed description of your understanding of Flink's window
semantics.
Most of it is correct, but a few things need a bit of correction ;-)
Please see my comments inline.
2015-11-28 4:36 GMT+01:00 Nirmalya Sengupta :
> Hello Fabian,
>
>
> A little long mail; please
Sorry, I have to correct myself. The windowing semantics are not easy ;-)
2015-11-30 15:34 GMT+01:00 Fabian Hueske :
> Hi Nirmalya,
>
> thanks for the detailed description of your understanding of Flink's
> window semantics.
> Most of it is correct, but a few things need
Hi Flavio,
Flink does not support caching of data sets in memory yet.
Best, Fabian
2015-11-30 16:45 GMT+01:00 Flavio Pompermaier :
> Hi to all,
> I was wondering if Flink could fit a use case where a user loads a dataset
> in memory and then he/she wants to explore it interactively. Let's say I
Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> Flink does not support caching of data sets in memory yet.
>>
>> Best, Fabian
>>
>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier :
>>
>>> Hi to all,
>>>
Hi Radu,
with the corrected setters/getters, Flink accepts your data type as a POJO
type, which can automatically be used as a key (in contrast to the
GenericType it was before).
There is no need to implement the Key interface.
Best, Fabian
2015-11-30 17:46 GMT+01:00 Radu Tudoran :
> Thank you both for
Yes, that is correct. The first element will be lost.
In fact, you need neither a trigger nor an evictor if you want to get
the max element for each group of 5 elements.
See my reply on your other mail.
Cheers,
Fabian
2015-11-30 18:47 GMT+01:00 Nirmalya Sengupta :
> Hello Aljoscha ,
>
> Many
Hi Nirmalya,
please don't feel discouraged from writing blog posts! There are many things
you could write about Flink's support for windows, e.g., you could discuss
use cases / applications that require advanced window semantics.
Regarding my comment on how/when elements are removed from a window:
El
Hi Madhu,
checkout the following resources:
- Apache Flink Blog: http://flink.apache.org/blog/index.html
- Data Artisans Blog: http://data-artisans.com/blog/
- Flink Forward Conference website (Talk slides & recordings):
http://flink-forward.org/?post_type=session
- Flink Meetup talk recordings:
Hi Phil,
an apply method after a join runs pipelined with the join, i.e., it starts
processing when the first join result is emitted and finishes after it
handled the last join result.
As long as the logic in your apply function is not terribly complex, this
should be OK. If you do not specify an appl
Hi Welly,
at the moment we only provide native Windows .bat scripts for start-local
and the CLI client.
However, we check that the Unix scripts (including start-webclient.sh) work
in a Windows Cygwin environment.
I have to admit, I am not familiar with MinGW, so not sure what is
happening there.
Hi Nirmalya,
please find my answers in line.
2015-12-02 3:26 GMT+01:00 Nirmalya Sengupta :
> Hello Fabian (),
>
> Many thanks for your encouraging words about the blogs. I want to make a
> sincere attempt.
>
> To summarise my understanding of the rule of removal of the elements from
> the window
Hi Mihail,
not sure if I correctly got your requirements, but you can define windows
on a keyed stream. This basically means that you partition the stream, for
example by order-id, and compute windows over the keyed stream. This will
create one (or more, depending on the window type) window for ea
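In code that could look roughly like this (the Order type, its fields, and
the window size are made up):

DataStream<Order> orders = ...;

DataStream<Order> totals = orders
    .keyBy("orderId")                 // partition the stream by order-id
    .timeWindow(Time.minutes(10))     // one independent window per key
    .sum("amount");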
Hi Nick,
thanks for pushing this and opening the JIRA issue.
The issue came up a couple of times and is a known limitation (see FLINK-1256).
So far the workaround of marking member variables as transient and
initializing them in the open() method of a RichFunction has been good
enough for all cases
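The workaround looks roughly like this (the client type and its methods are
made up):

public class LookupMapper extends RichMapFunction<String, String> {

    // not serializable -> transient, so it is not shipped with the function
    private transient MyNonSerializableClient client;

    @Override
    public void open(Configuration parameters) {
        client = new MyNonSerializableClient();   // created on the worker
    }

    @Override
    public String map(String value) throws Exception {
        return client.lookup(value);              // hypothetical call
    }
}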
Hi Sebastian,
There is no way to return memory from a Flink process except shutting the
process down.
I think YARN could help in your setup. In a YARN setup, you can flexibly
start and stop Flink sessions with different configurations (memory, TMs,
slots) or run a single job. When running a single
memory lazily
> - Set the memory to offheap memory. That way the JVM heap is small. The
> off-heap memory is deallocated when no longer used - this releases
> memory much better than the JVM shrinking the heap.
>
>
>
> On Wed, Dec 9, 2015 at 10:06 AM, Fabian Hueske w
memory consuming operators release the
> memory.
>
> The Java process releases that memory then on the next GC, as far as I
> know.
>
> On Wed, Dec 9, 2015 at 11:01 AM, Fabian Hueske wrote:
>
>> Streaming mode with on-heap memory won't help because the JVM allocates
>
Hi Martin,
you can get the start and end time of a window from the TimeWindow object.
The following Scala code snippet shows how to access the window end time
(start time is equivalent):
.timeWindow(Time.minutes(5))
.trigger(new EarlyCountTrigger(earlyCountThreshold))
.apply { (
  key: Int,
  window: TimeWindow,            // window.getStart / window.getEnd give the boundaries
  values: Iterable[MyEvent],     // your element type
  out: Collector[MyResult]) =>   // your result type
    ...
}
ltiValuePoissonPreProcess())
>
> How can I get access to the time window object in the fold function?
>
>
> cheers Martin
>
>
> On Thu, Dec 10, 2015 at 12:20 PM, Fabian Hueske wrote:
>
>> Hi Martin,
>>
>> you can get the start and end time of a windo
ocessed (at least to my
> understanding) this will lead to problems. Any Idea how to get around this?
>
> cheers Martin
>
>
>
> On Thu, Dec 10, 2015 at 2:59 PM, Fabian Hueske wrote:
>
>> Sure. You don't need a trigger, but a WindowFunction instead of the
>> F
Hi Ovidiu,
the job execution information is only held in memory and completely
discarded when the JobManager is shut down.
However, you can query all stats that are displayed by the dashboard via a
REST API [1] while the JM is running and save them yourself.
This way you can analyze the data also
Hi Nirmalya,
sorry for the delayed answer.
First of all, Flink does not take care that your windows fit into memory.
The default trigger depends on the way in which you define a window. Given
a KeyedStream you can define a window in the following ways:
KeyedStream s = ...
s.timeWindow() // this w
Hi,
In your program, you apply a distinct transformation on a data set that has
a (nested) GValue[] type. Distinct requires that all fields are comparable
with each other. Therefore all fields of the data set's type must be valid
key types.
However, Flink does not support object arrays as keys typ
Hi Nirmalya,
event time events (such as an event time trigger to compute a window) are
triggered when a watermark is received that is larger than the triggers
timestamp. By default, watermarks are emitted with a fixed time interval,
i.e., every x milliseconds. When a new watermark is emitted, Flin
Hi,
you can only get the execution plan for programs that have a data sink and
haven't been executed yet. In your code, print() defines the data sink;
however, it also eagerly executes the program. After execution, the program is
"removed" from the execution environment. Therefore, Flink complains that
no
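So, roughly (paths are placeholders):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

env.readTextFile("hdfs:///path/to/input")
   .filter(line -> !line.isEmpty())
   .writeAsText("hdfs:///path/to/output");   // a sink that does not execute the program

// works: the program has a sink and has not been executed yet
System.out.println(env.getExecutionPlan());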
Hi Gordon,
this is great news! I am very happy to hear that there is a new local Flink
community in Taiwan!
Translating blog posts, slide sets, and other documentation is very
valuable and makes the project known to a broader audience and accessible
to more people.
Please let us know, if you need
Hi Serkan,
yes this is expected. The reference implementations use DataStream.print()
to emit the resulting data stream to the standard output. If the program is
executed in an IDE, the standard output goes to the IDE's console. If
you start Flink in a local (or cluster) setup, the standard out
Hi Sourav,
Flink does not support executing SQL queries on DataSets yet.
We just started an effort to change that. See the discussion on the dev
mailing list [1] and the corresponding design document [2].
For simple relational queries, you can use the Table API [3].
Best,
Fabian
[1]
https://m
Hi,
I answered your question on Stack Overflow.
To summarize, the program reads data from a text socket.
You need to open the socket before starting the program. You can open the
socket on any (available) port.
If you run
nc -lk
from a command line, you should use localhost and port .
Hi Christian,
the open method is called by the Flink workers when the parallel tasks are
initialized.
The configuration parameter is the configuration object of the operator.
You can set parameters in the operator config as follows:
DataSet text = ...
DataSet wc = text.flatMap(new
Tokenizer()).ge
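In full, the pattern looks roughly like this (the parameter name and the
Tokenizer logic are made up):

Configuration config = new Configuration();
config.setInteger("tokenizer.min-length", 3);

DataSet<String> text = ...;
DataSet<String> tokens = text
    .flatMap(new Tokenizer())
    .withParameters(config);   // config is passed to Tokenizer.open(...)

public static class Tokenizer extends RichFlatMapFunction<String, String> {

    private int minLength;

    @Override
    public void open(Configuration parameters) {
        // read the value that was set via withParameters(), default to 0
        minLength = parameters.getInteger("tokenizer.min-length", 0);
    }

    @Override
    public void flatMap(String line, Collector<String> out) {
        for (String token : line.split(" ")) {
            if (token.length() >= minLength) {
                out.collect(token);
            }
        }
    }
}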
@Christian: I don't think that is possible.
There are quite a few things missing in the JSON including:
- User function objects (Flink ships objects not class names)
- Function configuration objects
- Data types
Best, Fabian
2016-01-14 16:02 GMT+01:00 lofifnc :
> Hi Márton,
>
> Thanks for your
Hi Tal,
you said that most processing will be done in external processes. If these
processes are stateful, this might be hard to integrate with Flink's
fault-tolerance mechanism.
In principle, Flink requires two things to achieve exactly-once processing:
1) A data source that can be replayed from
Hi Saliya,
yes that is possible, however the requirements for reading a binary file
from the local FS are basically the same as for reading it from HDFS.
In order to be able to start reading different sections of a file in
parallel, you need to know the different starting positions. This can be
done b
o,
> the file is replicated across nodes and the reading (mapping) happens only
> once.
>
> Thank you,
> Saliya
>
> On Mon, Jan 25, 2016 at 4:38 AM, Fabian Hueske wrote:
>
>> Hi Saliya,
>>
>> yes that is possible, however the requirements for reading a bi
You can start a job and then periodically request and store information
about the running job and its vertices using the corresponding REST calls [1].
The data will be in JSON format.
After the job finished, you can stop requesting data.
Next you parse the JSON, extract the information you need and g
Hi,
it is correct that the metrics are collected from the task managers.
In Flink 0.9.1 the metrics are visualized as charts in the web dashboard.
This visualization was removed when the dashboard was redesigned and
updated for 0.10, but will hopefully be added again.
For Flink 0.9.1, the metr
Hi,
this is currently not supported yet. However, this feature is on our roadmap
and has been requested a few times.
So I hope somebody will pick it up soon.
If the static data set is small enough, you can read the full data set
(e.g., as a file) in the open method of FlatMapFunction, build a
h
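A rough sketch of that workaround (the file path and types are placeholders):

public class EnrichFlatMap extends RichFlatMapFunction<String, Tuple2<String, String>> {

    private transient Map<String, String> lookup;

    @Override
    public void open(Configuration parameters) throws Exception {
        lookup = new HashMap<>();
        // read the small static data set, e.g., a CSV file available on every worker
        for (String line : Files.readAllLines(Paths.get("/path/to/static-data.csv"))) {
            String[] kv = line.split(",", 2);
            lookup.put(kv[0], kv[1]);
        }
    }

    @Override
    public void flatMap(String key, Collector<Tuple2<String, String>> out) {
        String value = lookup.get(key);
        if (value != null) {
            out.collect(new Tuple2<>(key, value));   // emit the enriched record
        }
    }
}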
Hi Emmanuel,
the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:
1) Event-Time concepts:
http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink:
http://fl
> I did not find the network usage metric in it.
>
> Best,
> Phil
>
> On Mon, Jan 25, 2016 at 5:06 PM, Fabian Hueske wrote:
>
>> You can start a job and then periodically request and store information
>> about the running job and vertices from using corresponding RE
Hi Sana,
The feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:
1) Event-Time concepts:
http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink:
http://flink.
Hi Flavio,
using a default FileOutputFormat, Flink writes one output file for each
data sink task, i.e., as many files as the defined parallelism.
The size of these files depends on the total output size and the
distribution. If you write to HDFS, a file consists of one or more HDFS
blocks.
Parque
e InputSplits. Am I right? Or am I misunderstanding something?
>
> Best,
> Flavio
>
> On Fri, Jan 29, 2016 at 11:14 AM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> using a default FileOutputFormat, Flink writes one output file for each
>> data sink task, i.e.
ile), right?
>
>
> On Fri, Jan 29, 2016 at 11:56 AM, Fabian Hueske wrote:
>
>> The number of input splits does not depend on the number of files but on
>> the number of HDFS blocks of all files.
>> Reading a single file with 100 HDFS blocks and reading of 100 files wit
Hi,
1) At the moment, state is kept on the JVM heap in a regular HashMap.
However, we added an interface for pluggable state backends. State backends
store the operator state (Flink's built-in window operators are based on
operator state as well). A pull request to add a RocksDB backend (going to
Hi Flavio,
we use tags to identify releases. The "release-0.10.1" tag, refers to the
code that has been released as Flink 0.10.1.
The "release-0.10" branch is used to develop 0.10 releases. Currently, it
contains Flink 0.10.1 and additionally a few more bug fix commits. We will
fork off this branc
Hi Arnaud,
in a previous mail you said:
"Note that I did not rebuild & reinstall flink, I just used a 0.10-SNAPSHOT
compiled jar submitted as a batch job using the "0.10.0" flink installation"
This will not fix the Netty version error. You need to install a new Flink
version or submit the Flink
> However, I’ve just recompiled everything and ran with a real 0.10.1
> snapshot and everything worked at an astounding speed with a reasonable
> memory amount.
>
> Thanks for the great work and the help, as always,
>
> Arnaud
>
>
>
> *De :* Fabian Hueske [mailto:fh
Hi Anastasiia,
this is difficult because the input is usually read in parallel, i.e., an
input file is split into several blocks which are independently read and
processed by different threads (possibly on different machines). So it is
difficult to have a sequential row number.
If all rows have th
Soumya Simanta :
> Fabian,
>
> Thank a lot for your response. Really appreciated. I've some additional
> questions (please see inline)
>
> On Wed, Feb 3, 2016 at 2:42 PM, Fabian Hueske wrote:
>
>> Hi,
>>
>> 1) At the moment, state is kept on the JVM h
You can get the end time of a window from the TimeWindow object which is
passed to the AllWindowFunction. This is basically a window ID / index.
I would go for a custom output sink which writes records to files based on
their timestamp.
IMO, this would be cleaner & easier than implementing the file
Hi,
please try to replace
DataSet ds = env.createInput(sif);
by
DataSet ds = env.createInput(sif,
ValueTypeInfo.SHORT_VALUE_TYPE_INFO);
Best, Fabian
2016-02-08 19:33 GMT+01:00 Saliya Ekanayake :
> Till,
>
> I am still having trouble getting this to work. Here's my code (
> https://github.com/es
java:169)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
>
> On Mon, Feb 8, 2016 at 3:50 PM, Fabian Hueske wrote:
>
>> Hi,
>>
>> please try to replace
>> DataSet ds = env.createInput(sif);
>>
Hi,
where did you observe the duplicates, within Flink or in Kafka?
Please be aware that the Flink Kafka Producer does not provide exactly-once
consistency. This is not easily possible because Kafka does not support
transactional writes yet.
Flink's exactly-once guarantees are only valid within t
What is the type of sessionId?
It must be a key type in order to be used as key. If it is a generic class,
it must implement Comparable to be used as key.
2016-02-09 11:53 GMT+01:00 Dominique Rondé :
> The fields in SourceA and SourceB are private but have public getters and
> setters. The classe