Hi,
how are you executing your code? From an IDE or on a running Flink instance?
If you execute it on a running Flink instance, you have to look into the
.out files of the task managers (located in ./log/).
Best, Fabian
2016-10-06 22:08 GMT+02:00 drystan mazur :
> Hello I am reading a csv file
Hi Greg,
print is only eagerly executed for DataSet programs.
In the DataStream API, print() just appends a print sink and execute() is
required to trigger an execution.
2016-10-06 22:40 GMT+02:00 Greg Hogan :
> The program executes when you call print (same for collect), which is why
> you are
Maybe this can be done by assigning the same window id to each of the N
local windows, and do a
.keyBy(windowId)
.countWindow(N)
This should create a new global window for each window id and collect all N
windows.
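A rough Scala sketch of that idea (the LocalResult type, its fields, and the
aggregation are assumptions for illustration, not taken from your program):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import org.apache.flink.util.Collector

case class LocalResult(windowId: Long, value: Long)

def mergeLocalWindows(local: DataStream[LocalResult], n: Long): DataStream[(Long, Long)] =
  local
    .keyBy(_.windowId)   // group the N local window results by window id
    .countWindow(n)      // fires once all N local results have arrived
    .apply { (windowId: Long, w: GlobalWindow, elems: Iterable[LocalResult],
              out: Collector[(Long, Long)]) =>
      out.collect((windowId, elems.map(_.value).sum))
    }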
Best, Fabian
2016-10-06 22:39 GMT+02:00 AJ Heller :
> The goal is:
> * to split
in
> the mapper that increments on every map call). It works, but by any chance
> is there a more succinct way to do it?
>
> On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske wrote:
>
>> Maybe this can be done by assigning the same window id to each of the N
>> local windows, an
As the exception says, the class
org.apache.flink.api.scala.io.jdbc.JDBCInputFormat does not exist.
You have to do:
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
There is no Scala implementation of this class but you can also use Java
classes in Scala.
2016-10-07 21:38 GMT+02:00 Alber
> Your solution compiles without errors, but includedFields isn't working:
> [image: Inline image 1]
>
> The output is incorrect:
> [image: Inline image 2]
>
> The correct result must contain only the first column:
> (a,1)
> (aa,1)
>
> 2016-10-06 21:37 GMT+02:
","
> ,includedFields = Array(2))
> val counts4 = text3
> .map { (_, 1) }
> .groupBy(0)
> .sum(1)
> counts4.print()
>
> The result is:
> [image: Inline image 1]
>
> Can you see any bug in my code for reading only the first column?
>
>
> 2
Hi Rashmi,
as Marton said, you do not need to start a local Flink instance
(start-local.bat) if you want to run programs from your IDE.
Maybe running a local instance causes a conflict when starting an instance
from the IDE.
Developing and running Flink programs on Windows should work, both from the
I
Hi,
you can do it like this:
1) you have to split each label record of the main dataset into separate
records:
(0,List(a, b, c, d, e, f, g)) -> (0, a), (0, b), (0, c), ..., (0, g)
(1,List(b, c, f, a, g)) -> (1, b), (1, c), ..., (1, g)
2) join the word index dataset with the split main dataset:
Data
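A rough DataSet sketch of the two steps (the element types and values are
just for illustration):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val main = env.fromElements((0, List("a", "b", "c")), (1, List("b", "c")))
val wordIndex = env.fromElements(("a", 100), ("b", 101), ("c", 102))

// 1) split each label record into separate (id, label) records
val split = main.flatMap(rec => rec._2.map(label => (rec._1, label)))

// 2) join the word index with the split main dataset on the label / word
val joined = split
  .join(wordIndex).where(1).equalTo(0)
  .map(pair => (pair._1._1, pair._2._2))   // (id, word index)

joined.print()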
Hi Ken,
I think your solution should work.
You need to make sure though, that you properly manage the state of your
function, i.e., memorize all records which have been received but haven't
been emitted yet.
Otherwise records might get lost in case of a failure.
Alternatively, you can implement thi
rd = super.nextRecord(record);
> } catch (IOException e) {
> e.printStackTrace();
> }
> } while (returnRecord == null && !reachedEnd());
> return returnRecord;
> }
> }
>
> Thanks,
> Yassine
>
>
ead of (2016-08-31
>> 12:08:11.223, 000)
>>
>> DataSet> withReadCSV =
>> env.readCsvFile("C:\\Users\\yassine\\Desktop\\test.csv")
>> .ignoreFirstLine()
>> .fieldDelimiter(",")
>> .includeFields("101")
>>
Hi Shannon,
I tried to reproduce the problem in a unit test without success.
My test configures a HadoopOutputFormat object, serializes and deserializes
it, calls open(), and verifies that a configured String property is present
in the getRecordWriter() method.
Next I would try to reproduce the err
Hi Pedro,
support for window aggregations in SQL and Table API is currently work in
progress.
We have a pull request for the Table API and will add this feature for the
next release.
For SQL we depend on Apache Calcite to include the TUMBLE keyword in its
parser and optimizer.
At the moment the o
Hi Pedro,
the DataStream program would look like this:
val eventData: DataStream[?] = ???
val result = eventData
.filter("action = denied")
.keyBy("user", "ip")
.timeWindow(Time.hours(1))
.apply("window.end, user, ip, count(*)")
.filter("count > 5")
.map("windowEnd, user, ip")
Note, this
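For reference, a possible concrete Scala version of the sketch above (the
Event case class and its field types are assumptions, not your actual schema):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

case class Event(action: String, user: String, ip: String)

def deniedPerUserAndIp(events: DataStream[Event]): DataStream[(Long, String, String)] =
  events
    .filter(_.action == "denied")
    .keyBy(e => (e.user, e.ip))
    .timeWindow(Time.hours(1))
    // count per (user, ip) and window; emit only groups with more than 5 events
    .apply { (key: (String, String), window: TimeWindow, elems: Iterable[Event],
              out: Collector[(Long, String, String)]) =>
      if (elems.size > 5) out.collect((window.getEnd, key._1, key._2))
    }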
Hi Ken,
FYI: we just received a pull request for FLIP-12 [1].
Best, Fabian
[1] https://github.com/apache/flink/pull/2629
2016-10-11 9:35 GMT+02:00 Fabian Hueske :
> Hi Ken,
>
> I think your solution should work.
> You need to make sure though, that you properly manage the s
apply() accepts a WindowFunction which is essentially the same as a
GroupReduceFunction, i.e., you have an iterator over all events in the
window.
If you only want to count, you should have a look at incremental window
aggregation with a ReduceFunction or FoldFunction [1].
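A minimal sketch of the incremental variant (the key and window size are just
examples); only one running value is kept per key and window instead of all
events:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

def countPerKey(clicks: DataStream[(String, Long)]): DataStream[(String, Long)] =
  clicks
    .keyBy(0)
    .timeWindow(Time.minutes(1))
    .reduce((a, b) => (a._1, a._2 + b._2))  // incremental count per key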
Best, Fabian
[1]
https:
Hi everybody,
I would like to propose to deprecate the utility methods to read data with
Hadoop InputFormats from the (batch) ExecutionEnvironment.
The motivation for deprecating these methods is to reduce Flink's dependency
on Hadoop and instead have Hadoop as an optional dependency for users that
a
Hi Santlal,
I'm afraid I don't know what is going wrong here either.
Debugging and correctly configuring the Taps was one of the major obstacles
when implementing the connector.
Best, Fabian
2016-10-14 14:40 GMT+02:00 Aljoscha Krettek :
> +Fabian directly looping in Fabian since he worked on th
tInputFormat and CqlBulkOutputFormat in Flink (although we won't
> be using CqlBulkOutputFormat any longer because it doesn't seem to be
> reliable).
>
> -Shannon
>
> From: Fabian Hueske
> Date: Friday, October 14, 2016 at 4:29 AM
> To: , "d...@flink.apache
Hi Yassine,
the difference is the following:
1) The BoundedOutOfOrdernessTimestampExtractor is a built-in timestamp
extractor and watermark assigner.
A timestamp extractor tells Flink when an event happened, i.e., it extracts
a timestamp from the event. A watermark assigner tells Flink what the
c
evaluated and late events are processed alone, i.e.,
in my example <12:09, G> would be processed without [A, B, C, D].
When the allowed lateness is passed, all window state is purged regardless
of the trigger.
Best, Fabian
2016-10-17 16:24 GMT+02:00 Fabian Hueske :
> Hi Yassine,
>
>
Hi Pedro,
The sql() method calls the Calcite parser in line 129.
Best, Fabian
2016-10-17 16:43 GMT+02:00 PedroMrChaves :
> Hello,
>
> I am pretty new to Apache Flink.
>
> I am trying to figure out how Flink parses an Apache Calcite sql query
> to its own Streaming API in order to maybe ext
The translation is done in multiple stages.
1. Parsing (syntax check)
2. Validation (semantic check)
3. Query optimization (rule and cost based)
4. Generation of physical plan, incl. code generation (DataStream program)
The final translation happens in the DataStream nodes, e.g., DataStreamCalc
[
The error message suggests that Flink tries to resolve "D:" as a file
system scheme such as "file:" or "hdfs:".
Can you try to specify your path as "file:/D:/dir/myfile.csv"?
Best, Fabian
2016-10-20 14:41 GMT+02:00 Radu Tudoran :
> Hi,
>
>
>
> I know that Flink in general supports files als
Hi Robert,
it is certainly possible to feed the same DataStream into two (or more)
operators.
Both operators should then process the complete input stream.
What you describe is an unintended behavior.
Can you explain how you figure out that both window operators only receive
half of the events?
> EventTimeSessionWindow, or perhaps create a custom EventTrigger, to force a
> session to close after either X seconds of inactivity or Y seconds of
> duration (or perhaps after Z events).
>
>
>
>
>
>
>
> *From: *Fabian Hueske
> *Reply-To: *"user@f
d get rid of my heap issues.
>
>
>
> Thanks!
>
>
>
> *From: *Fabian Hueske
> *Reply-To: *"user@flink.apache.org"
> *Date: *Monday, October 24, 2016 at 2:27 PM
>
> *To: *"user@flink.apache.org"
> *Subject: *Re: multiple processing of st
The window is evaluated when a watermark arrives that is behind the
window's end time.
For instance, given the window definition in your example, there are windows that end at
1:00:00, 1:00:30, 1:01:00, 1:01:30, ... (every 30 seconds).
given the windows above, the window from 00:59:00 to 1:00:00 will be
evalua
hat is inserted into the window
> and when a previously registered timer times out"*
> Thanks !!
>
> 2016-10-24 20:45 GMT+02:00 Fabian Hueske :
>
>> The window is evaluated when a watermark arrives that is behind the
>> window's end time.
>>
>> For inst
Hi Yassine,
I thought I had fixed that bug a few weeks ago, but apparently the fix
did not catch all cases.
Can you please reopen FLINK-2662 and post the program to reproduce the bug
there?
Thanks,
Fabian
[1] https://issues.apache.org/jira/browse/FLINK-2662
2016-10-25 12:33 GMT+02:00 Yassine
Hi Paul,
Flink pushes the results of operators (including GroupReduce) to the next
operator or sink as soon as they are computed. So what you are asking for
is actually happening.
However, before the GroupReduceFunction can be applied, the whole data is
sorted in order to group the data. This step
Hi Radu,
I might not have complete understood your problem, but if you do
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
val ds = env.fromElements( (1, 1L, new Time(1,2,3)) )
val t = ds.toTable(tEnv, 'a, 'b, 'c)
val results = t
Hi,
a NoSuchMethod indicates that you are using incompatible versions.
You should check that the versions of your job dependencies and the version
of the cluster you want to run the job on are the same.
Best, Fabian
2016-10-27 7:13 GMT+02:00 NagaSaiPradeep :
> Hi,
> I am working on connecting Flink
Wouldn't that be orthogonal to adding it to the TypeInfoParser?
2016-10-27 15:22 GMT+02:00 Greg Hogan :
> Fabian,
>
> Should we instead add this as a registered TypeInfoFactory?
>
> Greg
>
> On Thu, Oct 27, 2016 at 3:55 AM, Fabian Hueske wrote:
>
>> Yes, I thi
Hi Luis,
these blogposts should help you with the periodic partial result trigger
[1] [2].
Regarding the second question:
Time windows are by default aligned to the epoch (1970-01-01 00:00:00).
So a 24 hour window will always start at 00:00.
Best, Fabian
[1] http://flink.apache.org/news/2015/12/04/Introduc
Hi Niklas,
I don't know exactly what is going wrong there, but I have a few pointers
for you:
1) in cluster setups, Flink redirects println() to ./log/*.out files, i.e.,
you have to search for the task manager that ran the DirReader and check
its ./log/*.out file
2) you are using Java's File class
Hi,
a MapFunction should be the way to go for this use case.
What exactly is not working? Do you get an exception? Is the map method not
called?
Best, Fabian
2016-11-03 0:00 GMT+01:00 Sandeep Vakacharla :
> Hi there,
>
>
>
> I have the following use case-
>
>
>
> I have data coming from Kafka w
Hi Dominik,
the discussion about the 1.2 release was started on the dev mailing list
[1] about 2 weeks ago.
So far the proposed timeline is to have a release in mid-December.
Best, Fabian
[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Schedule-and-Scope-for-Flink-1-2-tp
Yes, that is true for Flink 1.1.x.
The upcoming Flink 1.2.0 release will be able to restore jobs from
savepoints with different parallelism.
2016-11-04 11:24 GMT+01:00 Renjie Liu :
> Hi, all:
> It seems that flink's checkpoint mechanism saves state per partition.
> However, if I want to change c
will not lose any state.
Best, Fabian
2016-11-04 12:26 GMT+01:00 Renjie Liu :
> Hi, Fabian:
> Will the checkpointed state be restored between streaming jobs?
>
> On Fri, Nov 4, 2016 at 6:47 PM Fabian Hueske wrote:
>
>> Yes, that is true for Flink 1.1.x.
>>
>> The
First of all, the document only proposes semantics for Flink's support of
relational queries on streams.
It does not describe the implementation and in fact most of it is not
implemented.
How the queries will be executed would depend on the definition of the
table, i.e., whether the tables are der
> I'm try to find the video of: http://flink-forward.org/kb_se
> ssions/scaling-stream-processing-with-apache-flink-to-very-large-state/
>
> 2016-11-07 22:02 GMT+01:00 Fabian Hueske :
>
>> First of all, the document only proposes semantics for Flink's support of
Hi,
I encountered this issue before as well.
Which Maven version are you using?
Maven 3.3.x does not properly shade dependencies.
You have to use Maven 3.0.3 (see [1]).
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/building.html
2016-11-08 11:05 GMT+01:00 T
/16, 7:18 AM, "Steffen Hausmann"
> wrote:
>
> Hi Fabian,
>
> I can confirm that the behaviour is reproducible with both, Maven
> 3.3.9 and Maven 3.0.5.
>
> Cheers,
> Steffen
>
> On 8 November 2016 at 11:11:19 CET, Fabian Hue
Hi Stephan,
I just wrote an answer to your SO question.
Best, Fabian
2016-11-10 11:01 GMT+01:00 Stephan Epping :
> Hello,
>
> I found this question in the Nabble archive (http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/Maintaining-
> watermarks-per-key-instead-of-per-oper
Hi Mich,
at the moment there is not much support for handling such data-driven exceptions
(badly formatted data, late data, ...).
However, there is a proposal to improve this: FLIP-13 [1]. So it is work in
progress.
It would be very helpful if you could check if the proposal would address
your use case
tioned design?
>
> kind regards,
> Stephan
>
>
> On 11 Nov 2016, at 00:39, Fabian Hueske wrote:
>
> Hi Stephan,
>
> I just wrote an answer to your SO question.
>
> Best, Fabian
>
> 2016-11-10 11:01 GMT+01:00 Stephan Epping :
>
>> Hello,
>
Hi,
there are no built-in methods to read JSON.
How this can be done depends on the formatting of the JSON.
If you have a collection of JSON objects which are separated by newline
(and there are no other newlines) you can read the file with
env.readTextFile().
This will give you one String per lin
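A rough sketch of that approach, assuming one JSON object per line and a
Jackson dependency on the classpath (the field name "id" is just an example):

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val ids = env
  .readTextFile("/path/to/data.json")  // one String per line
  .map { line =>
    // for production, create the mapper once in a RichMapFunction's open()
    val mapper = new ObjectMapper()
    mapper.readTree(line).get("id").asText()
  }
ids.print()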
Hi,
it is not possible to mix the DataSet and DataStream APIs at the moment.
If the DataSet is constant and not too big (which I assume, since otherwise
crossing would be extremely expensive), you can load the data into a
stateful MapFunction.
For that you can implement a RichFlatMapFunction and
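A rough sketch of such a function, assuming the constant data fits into memory
and can be read from a file that is readable from every worker (the path, the
CSV layout, and the field types are assumptions):

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

class EnrichWithStaticData
    extends RichFlatMapFunction[(String, Long), (String, Long, String)] {

  private var lookup: Map[String, String] = _

  override def open(parameters: Configuration): Unit = {
    // load the constant data once per parallel task instance
    lookup = scala.io.Source.fromFile("/path/to/lookup.csv").getLines()
      .map { line => val Array(k, v) = line.split(","); k -> v }
      .toMap
  }

  override def flatMap(in: (String, Long),
                       out: Collector[(String, Long, String)]): Unit =
    lookup.get(in._1).foreach(v => out.collect((in._1, in._2, v)))
}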
Hi,
that does not sound like a time window problem because there is no
time-related condition to split the windows.
I think you can implement that with a GlobalWindow and a custom trigger.
The documentation about global windows, triggers, and evictors [1] and this
blogpost [2] might be helpful
O
A common choice is Apache Avro.
You can define a schema for your POJOs and generate serializers and
deserializers.
2016-11-18 5:11 GMT+01:00 Matt :
> Just to be clear, what I'm looking for is a way to serialize a POJO class
> for Kafka but also for Flink, I'm not sure the interface of both fra
Hi Gabor,
I don't think there is a way to tune the memory settings for specific
operators.
For that you would need to change the memory allocation in the optimizers,
which is possible but not a lightweight change either.
If you want to get something working, you could add a method to the API to
m
Hi Vinay,
not sure why it's not working, but maybe Timo (in CC) can help.
Best, Fabian
2016-11-18 17:41 GMT+01:00 Vinay Patil :
> Hi,
>
> According to JavaDoc if I use the below method
> *env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.HINT);*
>
> ,it will print the program improvements to
Hi,
the result of a window operation on a KeyedStream is a regular DataStream.
So, you would need to call keyBy() on the result again if you'd like to
have a KeyedStream.
You can also key a stream by two or more attributes:
DataStream> windowedStream =
jsonToTuple.keyBy(0,1,3) /
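A minimal Scala sketch of both points (the tuple type, field positions, and
aggregation are just examples):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

def windowAndReKey(jsonToTuple: DataStream[(String, String, Long, Int)]) = {
  val windowed = jsonToTuple
    .keyBy(0, 1, 3)                                    // key by three attributes
    .timeWindow(Time.minutes(5))
    .reduce((a, b) => (a._1, a._2, a._3 + b._3, a._4)) // example aggregation
  // the window result is a regular DataStream again, so key it once more
  // if a KeyedStream is needed downstream
  windowed.keyBy(0, 1, 3)
}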
This looks rather like a version conflict. If Curator wasn't on the
classpath it should be a ClassNotFoundException.
Can you check if any of your jobs dependencies depends on a different
Curator version?
Best, Fabian
2016-11-22 12:06 GMT+01:00 Maximilian Michels :
> As far as I know we're shadi
Hi Lydia,
that is certainly possible, however you need to adapt the algorithm a bit.
The straight-forward approach would be to replicate the input data and
assign IDs for each k-means run.
If you have a data point (1, 2, 3) you could replicate it to three data
points (10, 1, 2, 3), (15, 1, 2, 3),
Hi Anastasios,
that's certainly possible. The most straight-forward approach would be a
synchronous call to the database.
Because only one request is active at the same time, you do not need a
thread pool.
You can establish the connection in the open() method of a RichMapFunction.
The problem with
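A rough sketch of the synchronous variant (the JDBC URL, query, and types are
hypothetical):

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

class DbLookupMap extends RichMapFunction[String, (String, String)] {
  private var conn: java.sql.Connection = _
  private var stmt: java.sql.PreparedStatement = _

  override def open(parameters: Configuration): Unit = {
    // one connection per parallel task instance
    conn = java.sql.DriverManager.getConnection("jdbc:postgresql://host/db")
    stmt = conn.prepareStatement("SELECT v FROM lookup WHERE k = ?")
  }

  override def map(key: String): (String, String) = {
    // blocking call; only one request is in flight at a time
    stmt.setString(1, key)
    val rs = stmt.executeQuery()
    val value = if (rs.next()) rs.getString(1) else null
    rs.close()
    (key, value)
  }

  override def close(): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}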
Hi Pedro,
if I read you code correctly, you are not assigning timestamps and
watermarks to the rules stream.
Flink automatically derives watermarks from all streams involved.
If you do not assign a watermark, the default watermark is
Long.MIN_VALUE, which is exactly the value you are observing.
will the concept of generating timestamps/watermarks be applicable
> in this scenario ?
>
>
> On Fri, Nov 18, 2016 at 9:50 AM, Fabian Hueske wrote:
>
>> Hi,
>>
>> that does not sound like a time window problem because there is not
>> time-related condition
Hi Flavio,
I think the easiest solution is to read the CSV file with the
CsvInputFormat and use a subsequent MapPartition to batch 1000 rows
together.
In each partition, you might end up with an incomplete batch.
However, I don't see yet how you can feed these batches into the
JdbcInputFormat whic
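For the batching part, a minimal sketch (the path, field types, and batch size
are just examples; the last batch of each partition may be smaller):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val rows = env.readCsvFile[(String, String, Int)]("/path/to/data.csv")

// batch the rows of each partition into chunks of (at most) 1000 elements
val batches: DataSet[List[(String, String, Int)]] =
  rows.mapPartition { it => it.grouped(1000).map(_.toList) }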
pseudocode) of a mapPartition that groups together elements into chunks of
> size n?
>
> Best,
> Flavio
>
> On Mon, Nov 28, 2016 at 8:24 PM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> I think the easiest solution is to read the CSV file with the
>> CsvIn
Hi Diego,
If you want the data of all streams to be written to the same files, you
can also union the streams before sending them to the sink.
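A minimal sketch (the streams and the output path are just examples):

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val s1 = env.fromElements("a", "b")
val s2 = env.fromElements("c", "d")
val s3 = env.fromElements("e")

// union the three streams and attach a single sink,
// so all records are written to the same files
s1.union(s2, s3).writeAsText("/path/to/output")
env.execute("union to one sink")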
Best, Fabian
2016-11-29 15:50 GMT+01:00 Kostas Kloudas :
> Hi Diego,
>
> You cannot prefix each stream with a different
> string so that the paths do no
Hi Miguel,
the exception does indeed indicate that the process ran out of available
disk space.
The quoted paragraph of the blog post describes the situation when you
receive the IOE.
By default, the system's default temp dir is used. I don't know which folder
that would be in a Docker setup.
You ca
Hi Konstantin,
Regarding 2): I've opened FLINK-5227 to update the documentation [1].
Regarding the Row type: The Row type was introduced for flink-table and was
later used by other modules. There is FLINK-5186 to move Row and all the
related TypeInfo (+serializer and comparator) to flink-core [2]
Hi Manu,
As far as I know, there are not plans to change the stand-alone deployment.
FLIP-6 is focusing on deployments via resource providers (YARN, Mesos,
etc.) which allow starting Flink processes per job.
Till (in CC) is more familiar with the FLIP-6 effort and might be able to
add more detail
Hi Miguel,
have you found a solution to your problem?
I'm not a docker expert but this forum thread looks like could be related
to your problem [1].
Best,
Fabian
[1] https://forums.docker.com/t/no-space-left-on-device-error/10894
2016-12-02 17:43 GMT+01:00 Miguel Coimbra :
> Hello Fabian,
>
>
Hi Max,
Tuples in Flink are of fixed length. You can define your own data types and
serializers, but this is not the easiest solution.
I would go for Array types, especially if your data can be primitive types
(long).
The serializer for primitive arrays should be almost as efficient as the
Tuple
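A minimal sketch with a primitive array type (the values are just examples):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// variable-length records as primitive long arrays instead of fixed-length tuples
val ds: DataSet[Array[Long]] = env.fromElements(Array(1L, 2L), Array(3L, 4L, 5L))
ds.map(a => a.sum).print()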
ue(), new ArrayList());
> }
> commSizes.get(v.getValue()).add(v.getId());
> }
>
>
> System.out.println("#communities:\t" +
> commSizes.keySet().size() + "\n|result|:\t" + result.count() +
> "\n|col
Hi Arnaud,
Flink does not cache data at the moment.
What happens is that for every day, the complete program is executed, i.e.,
also the program that computes wholeSet.
Each execution should be independent from each other and all temporary data
be cleaned up.
Since Flink executes programs in a pip
Hi,
the heap mem usage should be available via Flink's metrics system.
Not sure if that also captures spilled data. Chesnay (in CC) should know
that.
If the spilled data is not available as a metric, you can try to write a
small script that monitors the directories to which Flink spills (Config
p
of encoded class
> org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage
> was 79885809 bytes.*
>
> Do you have any idea what this might be?
>
> Kind regards,
>
> Miguel E. Coimbra
> Email: miguel.e.coim...@gmail.com
> Skype: miguel.e.coimbra
>
>
Hi Gennady,
this bloom filter is actually not distributed and only used internally as
an optimization to reduce the amount of data spilled by a hash join.
So, it is not meant to be user facing and not integrated in any API.
You could of course use the code, but there might be better implementation
The system metrics [1] are only available on a system level, i.e. not for
an individual job.
The reason is that multiple jobs might run concurrently on the same task
manager JVM process. So it would not be possible to separate their heap
usage.
The same would be true for the approach that monitors t
Hi Matt,
the combination of a tumbling time window and a count window is one way to
define a sliding window.
In your example, a 30 secs tumbling window and a (3,1) count window
result in a sliding time window of 90 secs width and 30 secs slide.
You could define a time sliding window of 90 secs
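For example (the key and the aggregated field are just examples):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

def slidingSums(events: DataStream[(String, Long)]): DataStream[(String, Long)] =
  events
    .keyBy(0)
    .timeWindow(Time.seconds(90), Time.seconds(30)) // 90s windows, sliding by 30s
    .sum(1)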
Hi Yury,
Flink's operators start processing as soon as they receive data. If an
operator produces more data than its successor task can process, the data
is buffered in Flink's network stack, i.e., its network buffers.
The backpressure mechanism kicks in when all network buffers are in use and
no
Your functions do not need to implement RichFunction (although each
function can be a RichFunction and it should not be a problem to adapt the
job).
The system metrics are automatically collected. Metrics are exposed via a
Reporter [1].
So you do not need to take care of the collection but rather
Hi Yury,
your solution should exactly solve your problem.
An operator sends all outgoing records to all connected successor operators.
There should not be any non-deterministic behavior or splitting of records.
Can you share some example code that produces the non-deterministic
behavior?
Best, F
Hi Gwenhael,
The _SUCCESS files were originally generated by Hadoop for successful jobs.
AFAIK, Spark leverages Hadoop's Input and OutputFormats and seems to have
followed this approach as well to be compatible.
You could use Flink's HadoopOutputFormat which is a wrapper for Hadoop
OutputFormats
Hi Mäki,
some additions to Greg's answer: The flatMapWithState shortcut of the Scala
API uses Flink's key-value state while your TestCounters class uses the
Checkpointed interface.
As Greg said, the checkpointed interface operates on an operator level not
per key. The key-value state automatically
nhael Pasquiers <
gwenhael.pasqui...@ericsson.com>:
> Thanks, it is working properly now.
>
> NB: Had to delete the folder by code because Hadoop's OutputFormats will
> only overwrite file by file, not the whole folder.
>
>
>
> *From:* Fabian Hueske [mailto:fhue...@gmai
;Records In" values in successor operators
> were not exactly equal which made me think they receive different portions
> of the stream. I believe the inaccuracy is somewhat intrinsic to live
> stream sampling, so that's fine.
>
>
> 2016-12-20 14:35 GMT+03:00 Yury Ruchin
Hi Sandeep,
I'm sorry but I think I do not understand your question.
What do you mean by static or dynamic look ups? Do you want to access an
external data store and cache data?
Can you give a bit more detail about your use?
Best, Fabian
2016-12-20 23:07 GMT+01:00 Meghashyam Sandeep V :
> Hi t
Hi Abiy,
to which type of filesystem are you persisting your checkpoints?
We have seen problems with S3 and its consistency model. These issues have
been addressed in newer versions of Flink.
Not sure if the fix went into 1.1.3 already, but release 1.1.4 is currently
being voted on and has tons of other
:
> Hi,
>
> Any updates on this thread ?
>
> Regards,
> Vinay Patil
>
> On Fri, Nov 18, 2016 at 10:25 PM, Fabian Hueske-2 [via Apache Flink User
> Mailing List archive.] <[hidden email]> wrote:
Copying my reply from the other thread with the same issue to have the
discussion in one place.
--
Hi Abiy,
to which type of filesystem are you persisting your checkpoints?
We have seen problems with S3 and its consistency model. These issues have
been addressed in newer versions of Flink.
This issue was posted twice to the ML.
The discussion should be continued on the other thread with the
subject "Stateful
Stream Processing with RocksDB causing Job failure"
2016-12-21 9:44 GMT+01:00 Fabian Hueske :
> Hi Abiy,
>
> to which type of filesystem are you persisti
; Thanks,
> Sandeep
>
> On Dec 21, 2016 12:35 AM, "Fabian Hueske" wrote:
>
>> Hi Sandeep,
>>
>> I'm sorry but I think I do not understand your question.
>> What do you mean by static or dynamic look ups? Do you want to access an
>> exte
applying operators on stream from Kafka.
>
> Thanks,
> Sandeep
>
> On Wed, Dec 21, 2016 at 6:17 AM, Fabian Hueske wrote:
>
>> OK, I see. Yes, you can do that with Flink. It's actually a very common
>> use case.
>>
>> You can store the names in operator st
I have a flink streaming process with Kafka
>>> as a source. I only have ids coming from kafka messages. My look ups
>>> () which is a static map come from a different source. I would
>>> like to use those lookups while applying operators on stream from Kafka.
>>>
>
Hi Markus,
thanks for reporting this issue. This bug was introduced when the opt.xml
file was added to the repository a few days ago.
There are two open JIRAs, FLINK-5392 and FLINK-5396, each one with a pull
request to fix the problem.
Best, Fabian
2016-12-26 2:16 GMT+01:00 M. Dale :
> I cloned
Hi Kanagaraj,
I would assume that the issue is caused by this configuration parameter:
taskmanager.memory.segment-size: 131072
I think the maximum possible value given Netty's "writeBufferHighWaterMark"
parameter is 65536.
There might be a way to tune Netty's parameters but I don't know how to d
Hi,
no, broadcast sets are not available in the DataStream API.
There might be other ways to achieve similar functionality, but the optimal
solution depends on the use case.
If you give a few details about what you would like to do, we might be able
to suggest alternatives.
Best, Fabian
2016-12-
Hi Robert,
this is indeed a bit tricky to do. The problem is mostly with the
generation of the input splits, setup of Flink, and the scheduling of tasks.
1) you have to ensure that on each worker at least one DataSource task is
scheduled. The easiest way to do this is to have a bare metal setup (
Twitter]
> <http://twitter.com/verizon> [image: LinkedIn]
> <http://www.linkedin.com/company/verizon>
>
>
>
> *From:* Fabian Hueske [mailto:fhue...@gmail.com]
> *Sent:* Tuesday, December 27, 2016 10:01 AM
> *To:* user@flink.apache.org
> *Cc:* Ufuk Celebi
> *Sub
Hi,
Flink's batch join API only features a binary join.
However, if the join functions have semantic annotations [1], multiple
binary joins on the same attributes can be executed in a pipelined fashion
without additional shuffles or sorts.
Best, Fabian
[1]
https://ci.apache.org/projects/flink/fl
Hi Lawrence,
comparisons of binary data are mainly used by the DataSet API when sorting
large data sets or building and probing hash tables.
The DataStream API mainly benefits from Flink's custom and efficient
serialization when sending data over the wire or taking checkpoints.
There are also plan
Hi Henri,
can you express the logic of your FoldFunction (or WindowFunction) as a
combination of ReduceFunction and WindowFunction [1]?
ReduceFunction should be supported by a merging WindowAssigner and has the
same resource consumption as a FoldFunction, i.e., a single record per
window.
Best, F
Hi Konstantin,
the DataSet API tries to execute all operators as soon as possible.
I assume that in your case, Flink does not do this because it tries to
avoid a deadlock.
A dataflow which replicates data from the same source and joins it again
might get deadlocked because all pipelines need to m