Hello -
An application that uses Akka 2.5 and Flink 1.8.0 gives runtime errors
because of a version mismatch between the Akka version we use and the one
that Flink uses (which is Akka 2.4). Has anyone tried shading the Akka
dependency with Flink? Or is there any other way to handle this issue? I kno
Hi Andres,
In that case you should use the `flatMap` method instead of the `map` method.
`flatMap` allows you to return multiple elements and collect them all into
one DS. This applies even if you have multiple pieces of content in your DS.
public static void main(String[] args) throws Exception {
Strea
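For illustration, a minimal self-contained sketch of the flatMap approach could look like this (the payload format and field names are made up for the example):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlatMapExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // hypothetical input: each element is one comma-separated payload
        DataStream<String> payloads = env.fromElements("a,b,c", "d,e,f,g");

        // flatMap can emit zero, one or many elements per input record,
        // and they are all collected into the same output DataStream
        DataStream<String> fields = payloads.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String payload, Collector<String> out) {
                for (String field : payload.split(",")) {
                    out.collect(field);
                }
            }
        });

        fields.print();
        env.execute("flatMap example");
    }
}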
Hi,
Flink acquires these 'Status_JVM_Memory' metrics through the MXBean
library. According to the MXBean documentation, non-heap is "the Java virtual
machine manages memory other than the heap (referred as non-heap memory)".
I am not sure whether that is equivalent to the metaspace. If the
'-XX:MaxMetaspaceSize
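As a side note, the same non-heap figure can be inspected with the standard java.lang.management API; a small sketch (plain JDK code, not Flink's metric code) shows that non-heap is an aggregate of several pools, of which Metaspace is only one:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class NonHeapInspector {
    public static void main(String[] args) {
        // the aggregate non-heap figure reported by the MemoryMXBean
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        System.out.println("Non-heap used: " + memoryBean.getNonHeapMemoryUsage().getUsed());

        // non-heap consists of several pools (Metaspace, Code Cache,
        // Compressed Class Space, ...), so it is not just the metaspace
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                System.out.println(pool.getName() + " used: " + pool.getUsage().getUsed());
            }
        }
    }
}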
Hello Weng,
This definitely helps a lot. However, I know my initial DS has single-row
content, so in theory I would just create one DS, which is what I need. That
is why I need to know how to create a new environment DS within a map
function.
Thanks so much
On Tue, Jul 23, 2019 at 11:41 PM Caizh
Hi Andres,
Thanks for the detailed explanation.
> but apparently I can't create a new DS within a map function
If you create a new DS within the map function, then you'll create as many
DSs as the number of elements in the old DS which... doesn't seem to be
your desired situation? I suppose you w
Hello,
Let me properly list the questions I have:
* How do I capture the content of a DataStream into a string? On this point,
basically I have a DS, and the only ways I can use its content are within a
map function, by printing it, by storing the content somewhere, or via SQL
queries. The point is that I need the c
Hi Andres,
Sorry, I can't quite get your question... Do you mean how to split the
string into fields?
There is a `split` method in Java. You can give it a regexp and it will
return an array containing all the split fields.
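For example (a tiny sketch with made-up field names):

public class SplitExample {
    public static void main(String[] args) {
        // the argument to split() is a regular expression
        String payload = "field1,field2,field3";
        String[] fields = payload.split(",");
        System.out.println(fields.length);  // prints 3
        System.out.println(fields[0]);      // prints field1
    }
}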
Andres Angel wrote on Wed, Jul 24, 2019 at 10:28 AM:
> Hello Weng,
>
> thanks for
Hello Weng,
thanks for your reply. However, I'm struggling to read the content of my DS,
whose payload defines how many fields the message contains, into a String.
That is the reason why I thought of a map function for that DS.
The Tuple part can change over time and can even pass from
Hi Andres,
Are the payloads strings? If yes, one method is to store them as strings
and process them further with user-defined functions when you need to use
them.
Another method is to store them in arrays.
Also, if the types of the first 3 fields are the same for the first an
Hi alaa,
In the KMeans example, in each iteration the new centers are computed in a
map-reduce pattern. Each task maintains a part of the points; it first chooses
the new center for each point, and then the sum(point) and num(point) for each
new center are computed in the CentroidAccumulator, an
Hi Fanbin,
Fabian is right, it should be a watermark problem. Probably some tasks of
the source don't have enough data to advance the watermark. Furthermore,
you could also monitor event time through the Flink web interface.
I have answered a similar question on Stack Overflow, see more details
here [1
Hello everyone,
I need to read an element from my DS and, according to its content, create a
new DS on the fly and register it as a new environment table.
I'm using the map function for my input DS, however when I try to use the
variable env (the environment, in my case a StreamExecutionEnvironment) I ca
This has been fixed now. Something weird is that, according to the
documentation, I might need to include around 4 Maven packages to work
properly with the Table/SQL API
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/ .
However, I solved my issue working without:
Hello everyone,
I need to dynamically determine the size of the Tuple that feeds a DS; let me
explain it better. Let's assume the first payload I read has the format
"field1,field2,field3", which might require a Tuple3<>, but my payload
later can be "field1,field2,field3,field4", so my Tuple migh
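One possible direction (just a sketch, assuming the payload is a comma-separated string) is to avoid the fixed-arity TupleN classes and use org.apache.flink.types.Row, whose arity can be chosen per record:

import org.apache.flink.types.Row;

public class DynamicRowExample {
    public static void main(String[] args) {
        // hypothetical payloads with different numbers of fields
        String[] payloads = {"field1,field2,field3", "field1,field2,field3,field4"};

        for (String payload : payloads) {
            String[] fields = payload.split(",");

            // the arity of a Row is decided at runtime, unlike Tuple3/Tuple4
            Row row = new Row(fields.length);
            for (int i = 0; i < fields.length; i++) {
                row.setField(i, fields[i]);
            }
            System.out.println(row.getArity() + " -> " + row);
        }
    }
}

Note that when a Row is used inside a DataStream, its field types usually have to be declared explicitly (for example via .returns(...)), otherwise Flink falls back to a GenericType, as mentioned elsewhere in this thread.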
Hi Andres,
Can you post your entire code (including the import section) in this thread?
It might be that this Exception has something to do with your imports. If
you are coding in a Java environment, you should import StreamTableEnvironment
from the Java API, not from the Scala API.
Andres Angel 于
If I use proctime, the groupBy happens without any delay.
On Tue, Jul 23, 2019 at 10:16 AM Fanbin Bu wrote:
> not sure whether this is related:
>
> public SingleOutputStreamOperator assignTimestampsAndWatermarks(
> AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) {
>
>// m
not sure whether this is related:
public SingleOutputStreamOperator assignTimestampsAndWatermarks(
        AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) {

    // match parallelism to input, otherwise dop=1 sources could lead to some strange
    // behaviour: the watermark will creep a
Thanks Fabian for the prompt reply. I just started using Flink and this is
a great community.
The watermark setting only accounts for a 10 sec delay. Besides that,
the local job in IntelliJ runs fine without issues.
Here is the code:
class EventTimestampExtractor(slack: Long = 0L) extend
Hello guys, I'm working in a Java environment and I have sample code like:
Table schemafit = tenv.sqlQuery("Here is my query");
I need to turn this into a DS to print it and apply other transformations, so
I'm doing something like:
DataStream resultSet = tenv.toAppendStream(schemafit, Row.class);
resultSet.pr
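For reference, a minimal end-to-end sketch of that conversion (the registered table and the query are placeholders; the environment creation call may differ slightly between Flink versions, and note the Java StreamTableEnvironment import mentioned in the reply above):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment; // Java variant, not the Scala one
import org.apache.flink.types.Row;

public class TableToStreamExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tenv = StreamTableEnvironment.create(env);

        // hypothetical input table registered from a DataStream
        DataStream<String> words = env.fromElements("a", "b", "c");
        tenv.registerDataStream("words", words, "word");

        Table schemafit = tenv.sqlQuery("SELECT word FROM words");

        // append-only query results can be converted back into a DataStream of Rows
        DataStream<Row> resultSet = tenv.toAppendStream(schemafit, Row.class);
        resultSet.print();

        env.execute("table to stream");
    }
}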
Looks like this is the issue:
https://issues.apache.org/jira/browse/FLINK-11164
We'll try switching to 1.8 and see if it helps.
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Sure:
                         /--> AsyncIO --\
STREAM --> ProcessFunc --                -- Union --> WindowFunc
                         \--------------/
ProcessFunc keeps track of the unique keys per window duration and emits
each
Hi Shuyi,
I think there were some discussions in the mailing list [1,2] and JIRA
tickets [3,4] that might be related.
Since the blink table planner doesn't produce such an error, I think this
problem is valid and should be fixed.
Thanks,
Rong
[1]
http://apache-flink-user-mailing-list-archive.233605
Using that classifier worked, the code builds fine now, thanks a lot. I'm
using 1.8.0 by the way
Greetings,
Juan
On Tue, Jul 23, 2019 at 5:06 AM Haibo Sun wrote:
> Hi, Juan
>
> It is dependent on "flink-runtime-*-tests.jar", so build.sbt should be
> modified as follows:
>
> *scalaVersion := "
Hi Dongwon,
Sorry for the late reply. I did try some experiments and it seems like you are
right:
Setting the type via `.returns()` actually alters the underlying type of the
DataStream from a GenericType into a specific RowTypeInfo. Please see the
JIRA ticket [1] for more info.
Regarding the approach, yes
Hi Bao,
Thanks for your answer.
1. Integration tests for my project.
2. Both data stream and data sets
On Mon, Jul 22, 2019 at 11:44 PM Biao Liu wrote:
> Hi Juan,
>
> I'm not sure what you really want. Before giving some suggestions, could
> you answer the questions below first?
>
> 1. Do yo
For each key I need to call an external REST service to get the current
status and this is why I'd like to use Async IO. At the moment I do this in
a process function but I'd like a cleaner solution (if possible).
Do you think your proposal of forking could be a better option?
Could you provide a s
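In case it helps, here is a rough sketch of what the Async I/O variant could look like (the REST client and the status lookup are placeholders; the actual Flink pieces are RichAsyncFunction, ResultFuture and AsyncDataStream):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class StatusEnricher extends RichAsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        // placeholder for a real non-blocking REST call
        CompletableFuture
                .supplyAsync(() -> fetchStatusFromRest(key))
                .thenAccept(status -> resultFuture.complete(Collections.singleton(key + ":" + status)));
    }

    private String fetchStatusFromRest(String key) {
        return "UNKNOWN"; // hypothetical REST lookup
    }

    // usage: wrap an existing stream of keys with the async operator
    public static DataStream<String> enrich(DataStream<String> keys) {
        return AsyncDataStream.unorderedWait(keys, new StatusEnricher(), 5, TimeUnit.SECONDS, 100);
    }
}

Keep in mind that, as discussed in this thread, the async operator does not give you keyed state, so any per-key bookkeeping still has to happen in a (keyed) ProcessFunction before or after it.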
Hi Pedro,
each pattern gets translated into one or more Flink operators. Hence, your
Flink program becomes *very* large and requires much more time to be
deployed.
Hence, the timeout.
I'd try to limit the size of your job by grouping your patterns and creating
a separate job for each group.
You can also
Indeed Kafka Connect is perfect, but I think Flink could easily do the same
without much work... this is what I'm asking for... whether anybody has ever
thought about it.
Thank you for responding! I'll subscribe to dev@
---
Oytun Tez
*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com
On Tue, Jul 23, 2019 at 10:25 AM Timo Walther wrote:
> Hi Oytun,
>
> the community is working hard to release 1.9. You can see
Hi Oytun,
the community is working hard to release 1.9. You can see the progress
here [1] and on the dev@ mailing list.
[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=328&view=detail
Regards,
Timo
On 23.07.19 at 15:52, Oytun Tez wrote:
Ping, any estimates?
---
Oytun
I've looked into this problem a little bit more, and it looks like it is
caused by some problem with the Kinesis sink. There is an exception
in the logs at the moment when the job gets restored after being
stalled for about 15 minutes:
Encountered an unexpected expired iterator
AA
Ping, any estimates?
---
Oytun Tez
*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com
On Thu, Jul 18, 2019 at 11:07 AM Oytun Tez wrote:
> Hi team,
>
> 1.9 is bringing very exciting updates, State Processor API and MapState
> migrations bein
Ping, any ideas?
---
Oytun Tez
*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com
On Mon, Jul 22, 2019 at 9:39 AM Oytun Tez wrote:
> I did take a look at it, but things got out of hand very quickly from
> there on :D
>
> I see that WebSubmi
I agree with Robert: localization (this is what we do at MotaWord) is
maintenance work. If not maintained as well as the mainstream site, it will
only damage and distance devs who use those localized websites.
Re: comments, I don't think people will really discuss furiously. But we at
least need a system wh
Hi Tim,
I think this might be a useful sink for small interactions with the outside
world. Are you planning to open source this? If yes, can you try to make it
agnostic so that people can plug in their own WebSocket protocol (Stomp,
etc.)? :) We can publish this on the upcoming community website as an
extensi
OK, I see. What information will be sent out via the async request?
Maybe you can fork off a separate stream with the info that needs to be sent
to the external service and later union the result with the main stream
before the window operator?
On Tue, Jul 23, 2019 at 2:12 PM, Flavio Po
Hi Fabian,
Thanks for clarification :-)
I could convert back and forth without worrying about it as I keep using
Row type during the conversion (even though fields are added).
Best,
Dongwon
On Tue, Jul 23, 2019 at 8:15 PM Fabian Hueske wrote:
> Hi Dongwon,
>
> regarding the question about t
I'll take a look.
Cheers,
Till
On Tue, Jul 23, 2019 at 3:07 PM Oleksandr Nitavskyi
wrote:
> Hey guys.
>
>
>
> We have also made an implementation in the Flink-on-Mesos component in order to
> support network bandwidth configuration.
>
>
>
> Would somebody be able to have a look at our PR:
> *https://g
Hi Richard,
it looks as if the zNode of a completed job has not been properly removed.
Without the logs of the respective JobMaster, it is hard to debug any
further. However, I suspect that this is an instance of FLINK-11665. I am
currently working on a fix for it.
[1] https://issues.apache.org/j
Good to know that you were able to fix the issue!
I definitely agree that it would be good to know why this situation
occurred.
On Tue, Jul 23, 2019 at 2:38 PM, Richard Deurwaarder <rich...@xeli.eu> wrote:
> Hi Fabian,
>
> I followed the advice of another flink user who mailed me directly,
Hey guys.
We have also made an implementation in the Flink-on-Mesos component in order to
support network bandwidth configuration.
Would somebody be able to have a look at our PR:
https://github.com/apache/flink/pull/8652
There are for sure some details to clarify.
Cheers
Oleksandr
From: Till Rohrman
Hello,
I have used this k-means code in Flink
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
and I would like to add noise that follows a Laplace distribution to the sum of
the data items and to the number
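In case it's useful, here is a small sketch of drawing Laplace-distributed noise in plain Java via inverse transform sampling (the scale parameter b and the perturbed values are placeholders you would choose for your own setup):

import java.util.Random;

public class LaplaceNoise {

    private final Random random = new Random();
    private final double scale; // the b parameter of the Laplace(0, b) distribution

    public LaplaceNoise(double scale) {
        this.scale = scale;
    }

    // inverse transform sampling: X = -b * sgn(u) * ln(1 - 2|u|), u ~ Uniform(-0.5, 0.5)
    public double sample() {
        double u = random.nextDouble() - 0.5;
        return -scale * Math.signum(u) * Math.log(1.0 - 2.0 * Math.abs(u));
    }

    public static void main(String[] args) {
        LaplaceNoise noise = new LaplaceNoise(1.0);
        double noisySum = 42.0 + noise.sample();   // e.g. perturb the sum of the points
        double noisyCount = 10.0 + noise.sample(); // e.g. perturb the number of points
        System.out.println(noisySum + " / " + noisyCount);
    }
}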
Hi,
We're running a relatively simple Flink application that uses a bunch of
state in RocksDB on Kubernetes.
During the course of development and going to production, we found that we
were often running into memory issues made apparent by Kubernetes OOMKilled
and Java OOM log events.
In order to
Hi Fabian,
I followed the advice of another Flink user who mailed me directly; he has
the same problem and told me to use something like: rmr zgrep
/flink/hunch/jobgraphs/1dccee15d84e1d2cededf89758ac2482
which allowed us to start the job again.
It might be nice to investigate what went wrong as i
Does anyone else have experience on this topic and could provide additional
feedback here?
On Thu, Jul 18, 2019 at 1:18 PM Flavio Pompermaier
wrote:
> I think that using Kafka to get CDC events is fine. The problem, in my
> case, is really about how to proceed:
> 1) do I need to create Flink tabl
The problem with bundling all records together within a window is that this
solution doesn't scale (in the case of large time windows and numbers of
events)... My requirement could be fulfilled by a keyed ProcessFunction, but I
think AsyncDataStream should provide first-class support for keyed streams
(
Hi, Juan
It is dependent on "flink-runtime-*-tests.jar", so build.sbt should be modified
as follows:
scalaVersion := "2.11.0"
val flinkVersion = "1.8.1"
libraryDependencies ++= Seq(
"org.apache.flink" %% "flink-test-utils" % flinkVersion % Test,
"org.apache.flink" %% "flink-runtime
Hi,
Right now it is not possible to mix batch and streaming environments in a
job.
You would need to implement the batch logic via the streaming API which is
not always straightforward.
However, the Flink community is spending a lot of effort on unifying batch
and stream processing. So this will
Hi Dongwon,
regarding the question about the conversion: if you keep using the Row type
and do not add/remove fields, the conversion is pretty much for free right now.
It will be a MapFunction (sometimes even no function at all) that should
be chained with the other operators. Hence, it should
Hi Tim,
One thing that might be interesting is that Flink might emit results more
than once when a job recovers from a failure.
It is up to the receiver to deal with that.
Depending on the type of results this might be easy (idempotent updates) or
impossible.
Best, Fabian
On Fri, Jul 19, 2019
I would like to have a KTable, or maybe in Flink terms a dynamic table, that
only contains the latest value for each keyed record. This would allow me
to perform aggregations and joins based on the latest state of every record,
as opposed to every record over time, or over a period of time.
On Sun, Jul 2
Hi Flavio,
Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window
operator and forward one record for each key-window to an AsyncIO operator?
Best, Fabian
On Thu, Jul 18, 2019 at 12:20 PM, Flavio Pompermaier <
pomperma...
I agree, but you have to know which jar a job is contained in... when you
upload the jar to our application you immediately know the qualified name
of the job class and which jar it belongs to. I think that when you
upload a jar to Flink, Flink should list all available jobs inside it
(IMHO)... it c
Hi Richard,
I hope you could resolve the problem in the meantime.
Nonetheless, maybe Till (in CC) has an idea what could have gone wrong.
Best, Fabian
On Wed, Jul 17, 2019 at 7:50 PM, Richard Deurwaarder <rich...@xeli.eu> wrote:
> Hello,
>
> I've got a problem with our flink cluster where
Hi John,
You could implement your own n-ary Either type.
It's a bit of work because you'd also need a custom TypeInfo & serializer,
but it's rather straightforward if you follow the implementation of Either.
Best,
Fabian
On Wed, Jul 17, 2019 at 4:28 PM, John Tipper <john_tip...@hotmail.com>
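For what it's worth, a bare-bones sketch of what a 3-ary variant could look like (only the value holder; the custom TypeInformation and serializer Fabian mentions are omitted here):

// minimal 3-ary union type, modeled after org.apache.flink.types.Either;
// for efficient serialization you would still add a custom TypeInformation
// and TypeSerializer, as suggested above
public abstract class Either3<A, B, C> {

    public static <A, B, C> Either3<A, B, C> first(A value)  { return new First<>(value); }
    public static <A, B, C> Either3<A, B, C> second(B value) { return new Second<>(value); }
    public static <A, B, C> Either3<A, B, C> third(C value)  { return new Third<>(value); }

    public boolean isFirst()  { return this instanceof First; }
    public boolean isSecond() { return this instanceof Second; }
    public boolean isThird()  { return this instanceof Third; }

    static final class First<A, B, C> extends Either3<A, B, C> {
        final A value;
        First(A value) { this.value = value; }
    }

    static final class Second<A, B, C> extends Either3<A, B, C> {
        final B value;
        Second(B value) { this.value = value; }
    }

    static final class Third<A, B, C> extends Either3<A, B, C> {
        final C value;
        Third(C value) { this.value = value; }
    }
}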
Hi Peter,
The performance drop is probably due to de/serialization.
When tasks are chained, records are simply forwarded as Java objects via
method calls.
When a task chain is broken into multiple operators, the records (Java
objects) are serialized by the sending task, possibly shipped over the
IIUC the list of jobs contained in a jar means the jobs you defined in the
pipeline. Then I don't think it is Flink's responsibility to maintain the
job list info; it is the job scheduler that defines the pipeline. So the job
scheduler should maintain the job list.
Flavio Pompermaier wrote on Tue, Jul 23, 2019
Hi Yitzchak,
Thanks for reaching out.
I'm not an expert on the Kafka consumer, but I think the number of
partitions and the number of source tasks might be interesting to know.
Maybe Gordon (in CC) has an idea of what's going wrong here.
Best, Fabian
On Tue, Jul 23, 2019 at 8:50 AM, Y
Hi Juan,
Which Flink version do you use?
Best, Fabian
On Tue, Jul 23, 2019 at 6:49 AM, Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com> wrote:
> Hi,
>
> I'm trying to use AbstractTestBase in a test in order to use the mini
> cluster. I'm using specs2 with Scala, so I cannot extend
Hi Fanbin,
The delay is most likely caused by the watermark delay.
A window is computed when the watermark passes the end of the window. If
you configured the watermark to be 10 minutes before the current max
timestamp (probably to account for out of order data), then the window will
be computed w
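For reference, a minimal sketch of such a bounded-out-of-orderness assigner (the 10 minute bound and the Tuple2 event type are just examples):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// the watermark trails the max seen event timestamp by 10 minutes, so an
// event-time window only fires once the watermark passes its end, which is
// where the perceived delay described above comes from
public class MyTimestampExtractor extends BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>> {

    public MyTimestampExtractor() {
        super(Time.minutes(10)); // maximum expected out-of-orderness
    }

    @Override
    public long extractTimestamp(Tuple2<String, Long> event) {
        return event.f1; // assumes f1 carries the event timestamp in epoch millis
    }
}

It would be attached with stream.assignTimestampsAndWatermarks(new MyTimestampExtractor()) before the window operator.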
The jobs are somehow related to each other in the sense that we have a
configurable pipeline where there are optional steps you can enable/disable
(and thus we create a single big jar).
Because of this, we have our own application REST service that also works
as a job scheduler and uses the job
Thanks a lot Marta for offering to write a blog post about the community
site!
I'm not sure if multi-language support for the site is a good idea. I see
the packages site as something similar to GitHub or Jira. The page itself
contains very few things we could actually translate. The package owne
Hi Zili,
here are the release notes for 1.8.1:
https://flink.apache.org/news/2019/07/02/release-1.8.1.html
But I could not find any ticket related to the "unexpected time-consuming" issue.
I have just tested our application with both versions, and this issue can be
reproduced every time with version 1.8
Thanks Flavio,
I get most of your points except one:
- Get the list of jobs contained in a jar (ideally this is true for
every engine beyond Spark or Flink)
Just curious to know how you submit a job via the REST API: if there are
multiple jobs in one jar, then do you need to submit the jar one time a
Hi Andrew,
FilesCreated = CreateFileOps + FsDirMkdirOp. Please refer to [1] and [2] for
the meaning of these metrics.
[1]
https://github.com/apache/hadoop/blob/377f95bbe8d2d171b5d7b0bfa7559e67ca4aae46/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirMk
Hi Jeff, the thing about the manifest is really about having a way to list
multiple main classes in the jar (without the need to inspect every Java
class or forcing a 1-to-1 mapping between jar and job like it is now).
My requirements were driven by the UI we're using in our framework:
- Get the list