Async I/O: preserve stream order for requests on key level

2023-06-09 Thread Juho Autio
I need to make some slower external requests in parallel, so Async I/O helps greatly with that. However, I also need to make the requests in a certain order per key. Is that possible with Async I/O? The documentation[1] talks about preserving the stream order of results, but it doesn't discuss the

Re: Data loss when restoring from savepoint

2019-02-14 Thread Juho Autio
ic)? Is the count reported there correct (no missing data)? > > Cheers, > > Konstantin > > > > > On Wed, Feb 13, 2019 at 3:19 PM Gyula Fóra wrote: > >> Sorry not posting on the mail list was my mistake :/ >> >> >> On Wed, 13 Feb 2019 at 15:01,

Re: Data loss when restoring from savepoint

2019-02-13 Thread Juho Autio
Stefan (or anyone!), please, could I have some feedback on the findings that I reported on Dec 21, 2018? This is still a major blocker.. On Thu, Jan 31, 2019 at 11:46 AM Juho Autio wrote: > Hello, is there anyone that could help with this? > > On Fri, Jan 11, 2019 at 8:14 AM Juho Aut

Re: Data loss when restoring from savepoint

2019-01-31 Thread Juho Autio
Hello, is there anyone that could help with this? On Fri, Jan 11, 2019 at 8:14 AM Juho Autio wrote: > Stefan, would you have time to comment? > > On Wednesday, January 2, 2019, Juho Autio wrote: > >> Bump – does anyone know if Stefan will be available to comment the latest &

Re: Data loss when restoring from savepoint

2019-01-10 Thread Juho Autio
Stefan, would you have time to comment? On Wednesday, January 2, 2019, Juho Autio wrote: > Bump – does anyone know if Stefan will be available to comment the latest > findings? Thanks. > > On Fri, Dec 21, 2018 at 2:33 PM Juho Autio wrote: > >> Stefan, I managed to analyze

Re: Data loss when restoring from savepoint

2019-01-02 Thread Juho Autio
Bump – does anyone know if Stefan will be available to comment the latest findings? Thanks. On Fri, Dec 21, 2018 at 2:33 PM Juho Autio wrote: > Stefan, I managed to analyze savepoint with bravo. It seems that the data > that's missing from output *is* found in savepoint. > >

Re: Data loss when restoring from savepoint

2018-12-21 Thread Juho Autio
me "side effect kafka output" on individual operators. This should allow tracking more closely at which point the data gets lost. However, maybe this would have to be in some Flink's internal components, and I'm not sure which those would be. Cheers, Juho On Mon, Nov 19, 2018 at 1

Re: Data loss when restoring from savepoint

2018-11-19 Thread Juho Autio
18 at 3:32 PM Juho Autio wrote: > I was glad to find that bravo had now been updated to support installing > bravo to a local maven repo. > > I was able to load a checkpoint created by my job, thanks to the example > provided in bravo README, but I'm still missing the es

Re: Data loss when restoring from savepoint

2018-10-23 Thread Juho Autio
window-contents" threw – obviously there's no operator by that name). Cheers, Juho On Mon, Oct 15, 2018 at 2:25 PM Juho Autio wrote: > Hi Stefan, > > Sorry but it doesn't seem immediately clear to me what's a good way to use > https://github.com/king/bravo. > > How

Re: Data loss when restoring from savepoint

2018-10-15 Thread Juho Autio
t (locally) and run it from there (using an IDE, for example)? Also it doesn't seem like the bravo gradle project supports building a flink job jar, but if it does, how do I do it? Thanks. On Thu, Oct 4, 2018 at 9:30 PM Juho Autio wrote: > Good then, I'll try to analyze the savepo

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
with most of Flink's internals. Anyway, high backpressure is not seen on this job after it has caught up the lag, so I thought it would be worth mentioning. On Thu, Oct 4, 2018 at 6:24 PM Stefan Richter wrote: > Hi, > > Am 04.10.2018 um 16:08 schrieb Juho Autio : > > &g

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
nder if it would not make sense to use a batch job instead? > > Best, > Stefan > > [1] https://github.com/king/bravo > [2] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration > > Am 04.10.2018 um 14

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
e records as input, e.g. tuples of primitive types saved as >> csv >> - minimal deduplication job which processes them and misses records >> - check if it happens for shorter windows, like 1h etc >> - setup which you use for the job, ideally locally reproducible or cloud &g

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
data that is missing from window output. On Mon, Oct 1, 2018 at 11:56 AM Juho Autio wrote: > Hi Andrey, > > To rule out for good any questions about sink behaviour, the job was > killed and started with an additional Kafka sink. > > The same number of ids were missed in

Re: Data loss when restoring from savepoint

2018-10-01 Thread Juho Autio
9 PM Juho Autio wrote: > Thanks, Andrey. > > > so it means that the savepoint does not loose at least some dropped > records. > > I'm not sure what you mean by that? I mean, it was known from the > beginning, that not everything is lost before/after restoring a savepoin

Re: Data loss when restoring from savepoint

2018-09-21 Thread Juho Autio
> the sink from the state. > > Another suggestion is to try to write records to some other sink or to > both. > E.g. if you can access file system of workers, maybe just into local files > and check whether the records are also dropped there. > > Best, > Andrey > > O

Re: Data loss when restoring from savepoint

2018-09-20 Thread Juho Autio
> is key) it can be that you do not get it in the list or exists (HEAD) > returns false and you risk to rewrite the previous part. > > The BucketingSink was designed for a standard file system. s3 is used over > a file system wrapper atm but does not always provide normal file system >

Re: 1.5.1

2018-09-17 Thread Juho Autio
skManager? From the log extracts you sent, I cannot really draw any > conclusions. > > Best, > Gary > > > On Wed, Aug 15, 2018 at 10:38 AM, Juho Autio wrote: > >> Thanks Gary.. >> >> What could be blocking the RPC threads? Slow checkpoin

Re: Data loss when restoring from savepoint

2018-08-29 Thread Juho Autio
og message indicates a border between the events that should > be included into the savepoint (logged before) or not: > “{} ({}, synchronous part) in thread {} took {} ms” (template) > Also check if the savepoint has been overall completed: > "{} ({}, asynchronous part) in thread {}

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
] introduced in Flink 1.6.0 > instead of the previous 'BucketingSink’? > > Cheers, > Andrey > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html > > On 24 Aug 2018, at 18:03, Juho Autio wrote: > > Yes, sorry

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
Kafka but > before the window result is written into s3. > > Allowed lateness should not affect it, I am just saying that the final > result in s3 should include all records after it. > This is what should be guaranteed but not the contents of the intermediate > savepoint. > >

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
oint but it should be behind the snapshotted offset in Kafka. > Then it should just come later after the restore and should be reduced > within the allowed lateness into the final result which is saved into s3. > > Also, is this `DistinctFunction.reduce` just an example or the actual

Re: Data loss when restoring from savepoint

2018-08-23 Thread Juho Autio
I changed to allowedLateness=0, no change, still missing data when restoring from savepoint. On Tue, Aug 21, 2018 at 10:43 AM Juho Autio wrote: > I realized that BucketingSink must not play any role in this problem. This > is because only when the 24-hour window triggers, BucketingSink

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
first and then cancel > the job. Thus, the later savepoints might complete or not depending on the > correct timing. Since savepoint can flush results to external systems, I > would recommend not calling the API multiple times. > > Cheers, > Till > > On Wed, Aug 22, 2018 at

State TTL in Flink 1.6.0

2018-08-22 Thread Juho Autio
First, I couldn't find anything about State TTL in Flink docs, is there anything like that? I can manage based on Javadocs & source code, but just wondering. Then to the main question, why doesn't the TTL support event time, and is there any sensible use case for the TTL if the streaming charater

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
> Because cancelWithSavepoint is actually waiting for savepoint to complete >> synchronization, and then execute the cancel command. >> >> We do not use CLI. I think since you are through the CLI, you can observe >> whether the savepoint is complete by combining the log

Re: Data loss when restoring from savepoint

2018-08-21 Thread Juho Autio
data when restoring a savepoint? On Fri, Aug 17, 2018 at 4:23 PM Juho Autio wrote: > Some data is silently lost on my Flink stream job when state is restored > from a savepoint. > > Do you have any debugging hints to find out where exactly the data gets > dropped? > > My job

Data loss when restoring from savepoint

2018-08-17 Thread Juho Autio
Some data is silently lost on my Flink stream job when state is restored from a savepoint. Do you have any debugging hints to find out where exactly the data gets dropped? My job gathers distinct values using a 24-hour window. It doesn't have any custom state management. When I cancel the job wi
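The job described above gathers distinct values per key in a 24-hour window; later messages in this thread refer to a `DistinctFunction.reduce`. As a rough plain-Java model of the "first value per key wins" dedup semantics (all names are hypothetical; no Flink dependency, so this is only a sketch of the reduce logic, not the actual job):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistinctSketch {
    // Plain-Java model of a ReduceFunction that keeps the earlier record,
    // i.e. reduce(a, b) -> a. In the real job this logic would run inside
    // a keyed 24-hour event-time window.
    static Map<String, String> distinct(List<Map.Entry<String, String>> records) {
        Map<String, String> firstPerKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> r : records) {
            firstPerKey.putIfAbsent(r.getKey(), r.getValue()); // first wins
        }
        return firstPerKey;
    }

    public static void main(String[] args) {
        Map<String, String> out = distinct(List.of(
                Map.entry("id-1", "first"),
                Map.entry("id-1", "second"),
                Map.entry("id-2", "only")));
        System.out.println(out);
    }
}
```
With such semantics, a record missing from the window output but present in the savepoint state (as reported later in the thread with bravo) points at the window firing/restore path rather than the reduce itself.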

Re: 1.5.1

2018-08-15 Thread Juho Autio
/ops/config.html#web-timeout Cheers, Juho On Wed, Aug 15, 2018 at 11:43 AM Juho Autio wrote: > Vishal, from which version did you upgrade to 1.5.1? Maybe from 1.5.0 > (release)? Knowing that might help narrowing down the source of this. > > On Wed, Aug 15, 2018 at 11:38 AM Juho Aut

Re: 1.5.1

2018-08-15 Thread Juho Autio
Vishal, from which version did you upgrade to 1.5.1? Maybe from 1.5.0 (release)? Knowing that might help narrowing down the source of this. On Wed, Aug 15, 2018 at 11:38 AM Juho Autio wrote: > Thanks Gary.. > > What could be blocking the RPC threads? Slow checkpointing? > > In p

Re: 1.5.1

2018-08-15 Thread Juho Autio
c84dfe69ea0/flink-runtime/src/main/java/org/apache/flink/runtime/heartbeat/HeartbeatManagerSenderImpl.java#L64 > > On Mon, Aug 13, 2018 at 9:52 AM, Juho Autio wrote: > >> I also have jobs failing on a daily basis with the error "Heartbeat of >> TaskManager with id time

Re: 1.5.1

2018-08-13 Thread Juho Autio
I also have jobs failing on a daily basis with the error "Heartbeat of TaskManager with id timed out". I'm using Flink 1.5.2. Could anyone suggest how to debug possible causes? I already set these in flink-conf.yaml, but I'm still getting failures: heartbeat.interval: 1 heartbeat.timeout: 10
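The values quoted above are truncated in the archived mail. For reference, a flink-conf.yaml sketch using the defaults Flink 1.5 ships with (both values are in milliseconds; raising the timeout is the usual first mitigation for spurious heartbeat failures):

```yaml
# flink-conf.yaml (defaults shown; raise heartbeat.timeout if TaskManagers
# are alive but slow to answer, e.g. under GC pressure)
heartbeat.interval: 10000
heartbeat.timeout: 50000
```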

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-09 Thread Juho Autio
t and multiple users on the mailing list have > encountered similar problems. > > In our environment, it seems that JM shows that the save point is complete > and JM has stopped itself, but the client will still connect to the old JM > and report a timeout exception. > > Thanks, v

Could not cancel job (with savepoint) "Ask timed out"

2018-08-08 Thread Juho Autio
I was trying to cancel a job with savepoint, but the CLI command failed with "akka.pattern.AskTimeoutException: Ask timed out". The stack trace reveals that ask timeout is 10 seconds: Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_0#106635280]] a
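The 10-second limit seen in the stack trace corresponds to Flink's default ask timeout. A flink-conf.yaml sketch of the knobs this thread (and the related 1.5.1 thread, which links to the web-timeout docs) points at; the raised values are illustrative, not recommendations:

```yaml
# flink-conf.yaml — give long-running savepoints more time before the
# client-side Ask times out (akka.ask.timeout default is "10 s",
# web.timeout is in milliseconds, default 10000)
akka.ask.timeout: 60 s
web.timeout: 60000
```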

Re: Flink kafka consumer stopped committing offsets

2018-06-19 Thread Juho Autio
that the Flink Kafka Consumer does not rely on the committed >>> offsets for fault tolerance guarantees. The committed offsets are only a >>> means to expose the consumer’s progress for monitoring purposes. >>> >>> Can you post full logs from all TaskManagers/JobM

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread Juho Autio
ion > /Cannot-auto-commit-offsets-for-group-console-consumer-79720/m-p/51188 > https://stackoverflow.com/questions/42362911/kafka-high-leve > l-consumer-error-code-15/42416232#42416232 > Especially that in your case this offsets committing is superseded by > Kafka coordinator failure. &

Flink kafka consumer stopped committing offsets

2018-06-08 Thread Juho Autio
Hi, We have a Flink stream job that uses Flink kafka consumer. Normally it commits consumer offsets to Kafka. However this stream ended up in a state where it's otherwise working just fine, but it isn't committing offsets to Kafka any more. The job keeps writing correct aggregation results to the

Re: Late data before window end is even close

2018-06-08 Thread Juho Autio
unnecessary components Etc. >> Either such process help you figure out what’s wrong on your own and if >> not, if you share us such minimal program that reproduces the issue, it >> will allow us to debug it. >> >> Piotrek >> >> >> On 11 May 2018,

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-05-30 Thread Juho Autio
only. One thing which we have to > fix first is that also the jar file upload goes through REST. > > [1] https://issues.apache.org/jira/browse/FLINK-9478 > > Cheers, > Till > > On Wed, May 30, 2018 at 9:07 AM, Juho Autio wrote: > >> Hi, I tried to search Flink Jira

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-05-30 Thread Juho Autio
at this > improvement will make it into the 1.6 release. > > Cheers, > Till > > On Thu, Apr 5, 2018 at 4:45 PM, Juho Autio wrote: > >> Thanks for the answer. Wrapping with GET sounds good to me. You said next >> version; do you mean that Flink 1.5 would already i

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
er.handleWatermark(StreamInputProcessor.java:262) ... 7 more On Fri, May 18, 2018 at 11:06 AM, Juho Autio wrote: > Thanks Sihua, I'll give that RC a try. > > On Fri, May 18, 2018 at 10:58 AM, sihua zhou wrote: > >> Hi Juho, >> would you like to try out the latest RC(http://people

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
t RC includes a fix for the potential silently data > lost. If it's the reason, you will see a different exception when you > trying to recover you job. > > Best, Sihua > > > > On 05/18/2018 15:02,Juho Autio > wrote: > > I see. I appreciate keeping this optio

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
single existing checkpoint or also creating other problematic > checkpoints? I am asking because maybe a log from the job that produces the > problematic checkpoint might be more helpful. You can create a ticket if > you want. > > Best, > Stefan > > > Am 18.05.2018 um 09:02 schrie

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
I see. I appreciate keeping this option available even if it's "beta". The current situation could be documented better, though. As long as rescaling from checkpoint is not officially supported, I would put it behind a flag similar to --allowNonRestoredState. The flag could be called --allowRescal

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
> I think you're asking the question I have asked in > https://github.com/apache/flink/pull/5490, you can refer to it and find > the comments there. > > @Stefan, PR(https://github.com/apache/flink/pull/6020) has been prepared. > > Best, Sihua > > > On 05/16/2018 17:

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
I'd like to dig it out because currently we also > use the checkpoint like the way you are) ... > > Best, Sihua > > On 05/16/2018 01:46,Juho Autio > wrote: > > I was able to reproduce this error. > > I just happened to notice an important detail about the original fa

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
cli.properties log4j-console.properties log4j.properties log4j-yarn-session.properties logback-console.xml logback.xml logback-yarn.xml On Tue, May 15, 2018 at 11:49 AM, Stefan Richter < s.rich...@data-artisans.com> wrote: > Hi, > > Am 15.05.2018 um 10:34 schrieb Juho Autio : > > Ok

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
e written for checkpoints/savepoints. > - completed checkpoints/savepoints ids. > - the restored checkpoint/savepoint id. > - files that are loaded on restore. > > Am 15.05.2018 um 10:02 schrieb Juho Autio : > > Thanks all. I'll have to see about sharing the logs & config

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
, e.g. if you are using incremental checkpoints or not. > Are you using the local recovery feature? Are you restarting the job from a > checkpoint or a savepoint? Can you provide logs for both the job that > failed and the restarted job? > > Best, > Stefan > > > Am 14.05.2

Missing MapState when Timer fires after restored state

2018-05-14 Thread Juho Autio
We have a Flink streaming job (1.5-SNAPSHOT) that uses timers to clear old state. After restoring state from a checkpoint, it seems like a timer had been restored, but not the data that was expected to be in a related MapState if such timer has been added. The way I see this is that there's a bug,

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
> > https://gist.github.com/pnowojski/8cd650170925cf35be521cf236f1d97a > > Prints only ONE number to the standard err: > > > 1394 > > And there is nothing on the side output. > > Piotrek > > On 11 May 2018, at 12:32, Juho Autio wrote: > > Thanks. What I still don't get is

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
would end up in can > also be tricky if records are assigned to multiple windows (e.g., sliding > windows). > In this case, a side-outputted records could still be in some windows and > not in others. > > @Aljoscha (CC) Might have an explanation for the current behavior. >

Late data before window end is even close

2018-05-11 Thread Juho Autio
I don't understand why I'm getting some data discarded as late on my Flink stream job a long time before the window even closes. I can not be 100% sure, but to me it seems like the kafka consumer is basically causing the data to be dropped as "late", not the window. I didn't expect this to ever ha

Re: Flink State monitoring

2018-05-11 Thread Juho Autio
Bump this. I can create a ticket if it helps? On Tue, Apr 24, 2018 at 4:47 PM, Juho Autio wrote: > Anything to add? Is there a Jira ticket for this yet? > > > On Fri, Apr 20, 2018 at 1:03 PM, Stefan Richter < > s.rich...@data-artisans.com> wrote: > >> If estimates

Re: Application logs missing from jobmanager log

2018-05-11 Thread Juho Autio
feature request to include the client logs though. > Would you mind opening a JIRA issue for this? > > Thanks, Fabian > > 2018-04-27 11:27 GMT+02:00 Juho Autio : > >> Ah, found the place! In my case they seem to be going to >> /home/hadoop/flink-1.5-SNAPSHOT/log/flin

Re: Kafka Consumers Partition Discovery doesn't work

2018-05-11 Thread Juho Autio
onstrate the > configuration for partition discovery. > Could you open a JIRA for that? > > Cheers, > Gordon > > On Tue, Apr 10, 2018, 8:44 AM Juho Autio wrote: > >> Ahhh looks like I had simply misunderstood where that property should go. >> >> The docs

Re: Application logs missing from jobmanager log

2018-04-27 Thread Juho Autio
Ah, found the place! In my case they seem to be going to /home/hadoop/flink-1.5-SNAPSHOT/log/flink-hadoop-client-ip-10-0-10-29.log (for example). Any reason why these can't be shown in Flink UI, maybe in jobmanager log? On Fri, Apr 27, 2018 at 12:13 PM, Juho Autio wrote: > The logs l

Application logs missing from jobmanager log

2018-04-27 Thread Juho Autio
The logs logged by my job jar before env.execute can't be found in jobmanager log. I can't find them anywhere else either. I can see all the usual logs by Flink components in the jobmanager log, though. And in taskmanager log I can see both Flink's internal & my application's logs from the executi

Re: Flink State monitoring

2018-04-24 Thread Juho Autio
could be misleading. > > > Am 20.04.2018 um 11:59 schrieb Juho Autio : > > Thanks. At least for us it doesn't matter how exact the number is. I would > expect most users to be only interested in monitoring if the total state > size keeps growing (rapidly), or remains abo

Re: Flink State monitoring

2018-04-20 Thread Juho Autio
Thanks. At least for us it doesn't matter how exact the number is. I would expect most users to be only interested in monitoring if the total state size keeps growing (rapidly), or remains about the same. I suppose all of the options that you suggested would satisfy this need? On Fri, Apr 20, 2018

Re: Flink State monitoring

2018-04-20 Thread Juho Autio
Hi Aljoscha & co., Is there any way to monitor the state size yet? Maybe a ticket in Jira? When using incremental checkpointing, the total state size can't be seen anywhere. For example the checkpoint details only show the size of the increment. It would be nice to add the total size there as wel

Re: Kafka topic partition skewness causes watermark not being emitted

2018-04-16 Thread Juho Autio
A possible workaround while waiting for FLINK-5479, if someone is hitting the same problem: we chose to send "heartbeat" messages periodically to all topics & partitions found on our Kafka. We do that through the service that normally writes to our Kafka. This way every partition always has some ~r

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Juho Autio
ree Juho! > > Do you want to contribute the docs fix? > If yes, we should update FLINK-5479 to make sure that the warning is > removed once the bug is fixed. > > Thanks, Fabian > > 2018-04-12 9:32 GMT+02:00 Juho Autio : > >> Looks like the bug https://issues.apache.

Re: Kafka consumer to sync topics by event time?

2018-04-12 Thread Juho Autio
ht be a window (such as a session window) that is > open much longer than all other windows and which would hold back the > offset. Other applications might not use the built-in windows at all but > custom ProcessFunctions. > > Have you considered tracking progress using watermarks? &

Re: Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
Thanks! On Wed, Apr 11, 2018 at 12:59 PM, Chesnay Schepler wrote: > Data that arrives within the allowed lateness should not be written to the > side output. > > > On 11.04.2018 11:12, Juho Autio wrote: > > If I use a non-zero value for allowedLateness and also sideOutpu

Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
If I use a non-zero value for allowedLateness and also sideOutputLateData, does the late data output contain also the events that were triggered in the bounds of allowed lateness? By looking at the docs I can't be sure which way it is. Code example: .timeWindow(Time.days(1)) .allowedLateness(Time
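The answer given later in this thread (data arriving within the allowed lateness is not written to the side output) can be sketched without Flink as a small decision function. This is a simplified model with made-up names, not Flink's actual code: it compares against the window end rather than Flink's end-exclusive max timestamp.

```java
public class LatenessDemo {
    enum Disposition { ON_TIME, LATE_BUT_ALLOWED, SIDE_OUTPUT }

    // Simplified model of how a windowed operator treats an element whose
    // window ends at windowEnd, given the current watermark: only elements
    // arriving after windowEnd + allowedLateness go to the late-data side output.
    static Disposition classify(long windowEnd, long watermark, long allowedLateness) {
        if (watermark < windowEnd) {
            return Disposition.ON_TIME;          // window has not fired yet
        } else if (watermark <= windowEnd + allowedLateness) {
            return Disposition.LATE_BUT_ALLOWED; // window re-fires; NOT side output
        } else {
            return Disposition.SIDE_OUTPUT;      // dropped from window; side output only
        }
    }

    public static void main(String[] args) {
        long end = 1000, lateness = 500;
        System.out.println(classify(end, 900, lateness));
        System.out.println(classify(end, 1200, lateness));
        System.out.println(classify(end, 1600, lateness));
    }
}
```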

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-10 Thread Juho Autio
this mistake. The docs are clear though, I just had become blind to this detail as I thought I had already read it. On Thu, Apr 5, 2018 at 10:26 AM, Juho Autio wrote: > Still not working after I had a fresh build from https://github.com/ > apache/flink/tree/release-1.5. > > When the

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-05 Thread Juho Autio
se/YARN-2084 > > Cheers, > Till > > On Wed, Apr 4, 2018 at 4:31 PM, Fabian Hueske wrote: > >> Hi Juho, >> >> Thanks for raising this point! >> >> I'll add Chesnay and Till to the thread who contributed to the REST API. >> >> Best, Fabi

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-05 Thread Juho Autio
ing about discovering a new partition. We should probably add this. > > And yes, it would be great if you can report back on this using either the > latest master, release-1.5 or release-1.4 branches. > > On 22 March 2018 at 10:24:09 PM, Juho Autio (juho.au...@rovio.com) wrote: > > Th

REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-04 Thread Juho Autio
I just learned that Flink savepoints API was refactored to require using HTTP POST. That's fine otherwise, but makes life harder when Flink is run on top of YARN. I've added example calls below to show how POST is declined by the hadoop-yarn-server-web-proxy*, which only supports GET and PUT. Ca

Re: cancel-with-savepoint: 404 Not Found

2018-04-04 Thread Juho Autio
d '{"cancel-job": true}' > > Let me know if it works for you. > > Best, > Gary > > On Thu, Mar 29, 2018 at 10:39 AM, Juho Autio wrote: > >> Thanks Gary. And what if I want to match the old behaviour ie. have the >> job cancelled after savepoi

Re: All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Juho Autio
Sorry, my bad. I checked the persisted jobmanager logs and can see that job was still being restarted at 15:31 and then at 15:36. If I wouldn't have terminated the cluster, I believe the flink job / yarn app would've eventually exited as failed. On Thu, Mar 29, 2018 at 4:49 PM, Juho Au

Re: All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Juho Autio
> As a side note, beginning from Flink 1.5, you do not need to specify -yn > -ys > because resource allocations are dynamic by default (FLIP-6). The > parameter -yst > is deprecated and should not be needed either. > > Best, > Gary > > On Thu, Mar 29, 2018 at 8:59 AM, Ju

Re: cancel-with-savepoint: 404 Not Found

2018-03-29 Thread Juho Autio
org/jira/browse/FLINK-9104 > [4] https://github.com/apache/flink/blob/release-1.5/flink- > runtime/src/main/java/org/apache/flink/runtime/rest/ > handler/job/savepoints/SavepointHandlers.java#L59 > > On Thu, Mar 29, 2018 at 10:04 AM, Juho Autio wrote: > >> With a fresh build fro

cancel-with-savepoint: 404 Not Found

2018-03-29 Thread Juho Autio
With a fresh build from release-1.5 branch, calling /cancel-with-savepoint fails with 404 Not Found. The snapshot docs still mention /cancel-with-savepoint: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#cancel-job-with-savepoint 1. How can I achieve the same res
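Per the replies later in this thread (which point at the 1.5 `SavepointHandlers`), the old GET endpoint was replaced by an asynchronous trigger-then-poll flow. A sketch of the request shapes, assuming the 1.5 REST API; the paths and body fields below are reconstructed from the thread, not verified against a running cluster:

```
# trigger a savepoint and cancel the job afterwards
POST /jobs/:jobid/savepoints
{ "target-directory": "<savepoint dir>", "cancel-job": true }

# response contains a trigger id; poll it for completion
GET /jobs/:jobid/savepoints/:triggerid
```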

All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-28 Thread Juho Autio
I built a new Flink distribution from release-1.5 branch yesterday. The first time I tried to run a job with it, it ended up in some stalled state: the job didn't manage to (re)start, and what makes it worse is that it didn't exit as failed either. Next time I tried running the same job (but ne

Re: NoClassDefFoundError for jersey-core on YARN

2018-03-28 Thread Juho Autio
Never mind, I'll post this new problem as a new thread. On Wed, Mar 28, 2018 at 6:35 PM, Juho Autio wrote: > Thank you. The YARN job was started now, but the Flink job itself is in > some bad state. > > Flink UI keeps showing status CREATED for all sub-tasks and nothing seems

Re: NoClassDefFoundError for jersey-core on YARN

2018-03-28 Thread Juho Autio
s://ci.apache.org/projects/flink/flink-docs- > master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths > > > On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio wrote: > >> I built a new Flink distribution from release-1.5 branch today. >> >> I tried runni

NoClassDefFoundError for jersey-core on YARN

2018-03-28 Thread Juho Autio
I built a new Flink distribution from release-1.5 branch today. I tried running a job but get this error: java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties I use yarn-cluster mode. The jersey-core jar is found in the hadoop lib on my EMR cluster, but seems like it's

Re: Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
s://issues.apache.org/jira/browse/FLINK-8419 > > This issue should have been fixed in the recently released 1.4.2 version. > > Cheers, > Gordon > > On 22 March 2018 at 8:02:40 PM, Juho Autio (juho.au..

Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
According to the docs*, flink.partition-discovery.interval-millis can be set to enable automatic partition discovery. I'm testing this, apparently it doesn't work. I'm using Flink Version: 1.5-SNAPSHOT Commit: 8395508 and FlinkKafkaConsumer010. I had my flink stream running, consuming an existin
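As the later messages in this thread resolve, `flink.partition-discovery.interval-millis` must be set on the `Properties` object passed to the Kafka consumer constructor, not in flink-conf.yaml. A minimal sketch of building those properties (broker address and interval are made-up examples; the resulting object would then be passed to e.g. `FlinkKafkaConsumer010`):

```java
import java.util.Properties;

public class PartitionDiscoveryProps {
    // Builds consumer properties with partition discovery enabled.
    // Note: the discovery key goes here, into the consumer Properties,
    // NOT into flink-conf.yaml (the mistake reported in this thread).
    static Properties consumerProperties(String brokers, long discoveryIntervalMillis) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        props.setProperty("flink.partition-discovery.interval-millis",
                Long.toString(discoveryIntervalMillis));
        return props;
    }

    public static void main(String[] args) {
        Properties p = consumerProperties("localhost:9092", 60_000L);
        // e.g. new FlinkKafkaConsumer010<>(topic, schema, p) in the actual job
        System.out.println(p.getProperty("flink.partition-discovery.interval-millis"));
    }
}
```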

Re: SQL Table API: Naming operations done in query

2018-03-16 Thread Juho Autio
Hi, has there been any changes to state handling with Flink SQL? Anything planned? I didn't find it at https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html. Recently I ran into problems when trying to restore the state after changes that I thought wouldn't change the executi

Adding a new field in java class -> restore fails with "KryoException: Unable to find class"

2018-03-16 Thread Juho Autio
Is it possible to add new fields to the object type of a stream, and then restore from savepoint? I tried to add a new field "private String" to my java class. It previously had "private String" and a "private final Map". When trying to restore an old savepoint after this code change, it failed wi

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
also seems a bit risky to me, like the above attempt shows. Maybe it's best to wait for this to be supported properly. As I said I don't seem to really need early firing right now, because writing out all distinct values once window closes is not too slow for us at the moment. Thanks ag

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
OWTIME(rowtime, INTERVAL '10' SECOND) AS rowtime > FROM events > WHERE s_aid1 IS NOT NULL > GROUP BY > s_aid1, > s_cid, > TUMBLE(rowtime, INTERVAL '10' SECOND) > ) > > Early triggering is not yet supported for SQL queries. > > B

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
oduces the > results in real-time. If the batching is not required, you should be good > by adding a filter on occurrence = 1. > Otherwise, you could add the filter and wrap it by 10 secs tumbling window. > > Hope this helps, > Fabian > > > 2018-02-14 15:30 GMT+01:00 Juho

Unexpected hop start & end timestamps after stream SQL join

2018-02-14 Thread Juho Autio
I'm joining a tumbling & hopping window in Flink 1.5-SNAPSHOT. The result is unexpected. Am I doing something wrong? Maybe this is just not a supported join type at all? Anyway, here goes: I first register these two tables: 1. new_ids: a tumbling window of seen ids within the last 10 seconds: SE

Trigger Time vs. Latest Acknowledgement

2018-01-29 Thread Juho Autio
I'm triggering nightly savepoints at 23:59:00 with crontab on the flink cluster. For example last night's savepoint has this information: Trigger Time: 23:59:14 Latest Acknowledgement: 00:00:59 What are the min/max boundaries for the data contained by the savepoint? Can I deduce from this either

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
> > > On 15.01.2018 14:09, Juho Autio wrote: > > Thanks for the explanation. Did you mean that process() would return a > SingleOutputWithSideOutputOperator? > > Anyway, that should be enough to avoid the problem that I hit (and it > also seems like the best & only way). &

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
explicitly define the code as below, which makes the > behavior unambiguous: > > processed = stream > .process(...) > > filtered = processed > .filter(...) > > filteredSideOutput = processed > .getSideOutput(...) > .filter(...) > > > On 15.01.2

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
uld get resource and how much > sharing same hardware resources, I guess it will open gate to lots of edge > cases -> strategies-> more edge cases :) > > Chen > > On Fri, Jan 12, 2018 at 6:34 AM, Juho Autio wrote: > >> Maybe I could express it in a slightly different

Re: Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Juho Autio
Thanks, the window operator is just: .timeWindow(Time.seconds(10)) We haven't changed key types. I tried debugging this issue in IDE and found the root cause to be this: !this.keyDeserializer.equals(keySerializer) -> true => throw new IllegalStateException("Tried to initialize restored TimerS

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Juho Autio
> Aljoscha would be the expert here, maybe he’ll have more insights. I’ve > looped him in cc’ed. > > Cheers, > Gordon > > > On 12 January 2018 at 4:05:13 PM, Juho Autio (juho.au...@rovio.com) wrote: > > When I run the code below (Flink 1.4.0 or 1.3.1), only "a"

Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Juho Autio
I'm trying to restore savepoints that were made with Flink 1.3.1 but getting this exception. The few code changes that had to be done to switch to 1.4.0 don't seem to be related to this, and it seems like an internal issue of Flink. Is 1.4.0 supposed to be able to restore a savepoint that was made

SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Juho Autio
When I run the code below (Flink 1.4.0 or 1.3.1), only "a" is printed. If I switch the position of .process() & .filter() (ie. filter first, then process), both "a" & "b" are printed, as expected. I guess it's a bit hard to say what the side output should include in this case: the stream before fi

Re: Kafka consumer to sync topics by event time?

2017-12-04 Thread Juho Autio
l catching up. > The watermarks are moving at the speed of the bigger topic, but all > "early" events of the smaller topic are stored in stateful operators and > are checkpointed as well. > > So, you do not lose neither early nor late data. > > Best, Fabian > >

Re: Kafka consumer to sync topics by event time?

2017-12-01 Thread Juho Autio
Thanks for the answers, I still don't understand why I can see the offsets being quickly committed to Kafka for the "small topic"? Are they committed to Kafka before their watermark has passed on Flink's side? That would be quite confusing.. Indeed when Flink handles the state/offsets internally, t

Kafka consumer to sync topics by event time?

2017-11-22 Thread Juho Autio
I would like to understand how FlinkKafkaConsumer treats "unbalanced" topics. We're using FlinkKafkaConsumer010 with 2 topics, say "small_topic" & "big_topic". After restoring from an old savepoint (4 hours before), I checked the consumer offsets on Kafka (Flink commits offsets to kafka for refer
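The behavior discussed in this thread follows from Flink's watermark rule for multi-partition/multi-topic sources: the operator-level event-time watermark is the minimum over the per-partition watermarks, so the lagging "big" topic gates overall event-time progress even while offsets of the caught-up "small" topic are already being committed. A sketch of that min rule (plain Python illustration, not Flink internals; the partition names are made up):

```python
def source_watermark(partition_watermarks):
    """Operator-level event-time watermark: the minimum of the
    per-partition watermarks. Event time only advances as fast as
    the slowest partition/topic."""
    return min(partition_watermarks.values())

wm = source_watermark({
    "small_topic-0": 1_700_000_600_000,  # already caught up
    "big_topic-0":   1_700_000_000_000,  # 10 minutes behind
})
# The overall watermark is held back by big_topic-0.
print(wm)  # 1700000000000
```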

Flink config as an argument (akka.client.timeout)

2016-05-24 Thread Juho Autio
Is there any way to set akka.client.timeout (or other flink config) when calling bin/flink run instead of editing flink-conf.yaml? I tried to add it as a -yD flag but couldn't get it working. Related: https://issues.apache.org/jira/browse/FLINK-3964

Re: Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Related issue: https://issues.apache.org/jira/browse/FLINK-2672 On Wed, May 25, 2016 at 9:21 AM, Juho Autio wrote: > Thanks, indeed the desired behavior is to flush if bucket size exceeds a > limit but also if the bucket has been open long enough. Contrary to the > current RollingSink
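The desired behavior described in this thread (flush a bucket once it exceeds a size limit, but also once it has been open long enough) boils down to a simple roll decision. A sketch under those assumptions (plain Python; names and thresholds are illustrative, not the RollingSink API):

```python
import time
from typing import Optional

def should_roll(bucket_size_bytes: int,
                bucket_opened_at: float,
                max_size_bytes: int = 128 * 1024 * 1024,
                max_open_secs: float = 600.0,
                now: Optional[float] = None) -> bool:
    """Roll (close and flush) the bucket if it is too big OR too old."""
    now = time.time() if now is None else now
    return (bucket_size_bytes >= max_size_bytes
            or now - bucket_opened_at >= max_open_secs)
```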
