In the web UI, I can see the following information under JobManager. How can I
access the job_env variable in my job's main method?
Job Manager
Configuration
env.hadoop.conf.dir /etc/hadoop/conf
env.yarn.conf.dir /etc/hadoop/conf
high-availability.cluster-id application_1517362137681_0001
job_env stage
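One possible way to get at it (not from this thread, and assuming job_env lives
in the flink-conf.yaml visible to the client that runs the main method) is to
load the global configuration:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.GlobalConfiguration;

    public class ReadJobEnv {
        public static void main(String[] args) {
            // Reads flink-conf.yaml from the configuration directory of the
            // process executing main(), i.e. the submitting client.
            Configuration conf = GlobalConfiguration.loadConfiguration();
            String jobEnv = conf.getString("job_env", "not-set");
            System.out.println("job_env = " + jobEnv);
        }
    }

If the client's configuration does not contain the entry, passing it explicitly
as a program argument (e.g. via ParameterTool.fromArgs(args)) is the more
reliable route.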
Stefan,
So are we essentially saying that in this case, for now, I should stick to
DataSet / Batch Table API?
Thanks,
Hayden
-----Original Message-----
From: Stefan Richter [mailto:s.rich...@data-artisans.com]
Sent: Tuesday, January 30, 2018 4:18 PM
To: Marchant, Hayden [ICG-IT]
Cc: user@flink.apache.org
Hi Hayden,
A full-history join on two streams is not natively supported at the moment.
As a workaround, you may implement a CoProcessFunction and cache the
records from both sides in state until the stream with less data has been
fully cached. Then you could safely clear the cache for the ...
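A minimal sketch of that workaround, assuming both inputs are keyed by the join
key before being connected (class and field names here are illustrative, not
from the original thread):

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.util.Collector;

    // Buffers every record of both keyed inputs in state, so each left record
    // eventually meets every right record sharing its key (full-history join).
    public class FullHistoryJoin
            extends CoProcessFunction<Tuple2<String, Integer>, Tuple2<String, String>, String> {

        private transient ListState<Tuple2<String, Integer>> leftCache;
        private transient ListState<Tuple2<String, String>> rightCache;

        @Override
        public void open(Configuration parameters) {
            leftCache = getRuntimeContext().getListState(new ListStateDescriptor<>(
                    "left-cache", TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {})));
            rightCache = getRuntimeContext().getListState(new ListStateDescriptor<>(
                    "right-cache", TypeInformation.of(new TypeHint<Tuple2<String, String>>() {})));
        }

        @Override
        public void processElement1(Tuple2<String, Integer> left, Context ctx,
                Collector<String> out) throws Exception {
            leftCache.add(left);
            Iterable<Tuple2<String, String>> rights = rightCache.get();
            if (rights != null) { // get() may return null when the state is empty
                for (Tuple2<String, String> right : rights) {
                    out.collect(left.f0 + ": " + left.f1 + " / " + right.f1);
                }
            }
        }

        @Override
        public void processElement2(Tuple2<String, String> right, Context ctx,
                Collector<String> out) throws Exception {
            rightCache.add(right);
            Iterable<Tuple2<String, Integer>> lefts = leftCache.get();
            if (lefts != null) {
                for (Tuple2<String, Integer> left : lefts) {
                    out.collect(left.f0 + ": " + left.f1 + " / " + right.f1);
                }
            }
        }
    }

It would be wired up with something like
left.connect(right).keyBy(0, 0).process(new FullHistoryJoin()); clearing the
cached side once it is no longer needed is what keeps the state from growing
without bound.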
I've seen a similar issue while running successive Flink SQL batches on
1.4. In my case, the JobManager would fail with log output about
unreachability (with an additional statement about something going
"horribly wrong"). Under workload pressure, I reverted to 1.3.2, where
everything works perfectly.
Hi,
this looks like the timer service is the culprit for this problem. Timers are
currently not stored in the state backend, but in a separate on-heap data
structure that does not support copy-on-write or async snapshots in general.
Therefore, writing the timers for a snapshot is always synchronous.
Hi group,
In our project we are using the asynchronous FsStateBackend, and we are trying to
move to distributed storage, currently S3.
When using this storage we are experiencing high backpressure and
high latency compared to local storage.
We are trying to understand the reason, so ...
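For context, this is roughly how an asynchronous FsStateBackend pointed at S3
would be set up (the bucket name is made up, the interval arbitrary):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class S3CheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60s; the boolean flag requests asynchronous
            // snapshots of the heap-based keyed state.
            env.enableCheckpointing(60_000);
            env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints", true));

            // ... sources, transformations, sinks, env.execute(...) ...
        }
    }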
Thanks Chesnay, so if I read it correctly it shouldn't be too long (at least
less time than between regular 1.x releases).
On Mon, Jan 29, 2018 at 4:24 PM, Chesnay Schepler wrote:
> As of right now there is no specific date, see also
> https://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-time
When reading from input files, at EOF the readers will send
Watermark(Long.MAX_VALUE) on all of their output edges, and those watermarks will
be propagated accordingly. So your ACK operator will get
Watermark(Long.MAX_VALUE) only when it has received it from ALL of its input edges.
When read...
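One hedged way to react to that final watermark in user code (assuming a keyed
stream, since event-time timers need keys; the names are illustrative): register
an event-time timer at Long.MAX_VALUE, which can only fire once the EOF
watermark has reached the operator on all of its inputs.

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Emits one ACK per key once Watermark(Long.MAX_VALUE) has reached this operator.
    public class EndOfInputAck extends ProcessFunction<String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out)
                throws Exception {
            // Timers are deduplicated per key and timestamp, so this registers
            // at most one timer per key; it can only fire on the final watermark.
            ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            out.collect("ACK: input fully consumed for this key");
        }
    }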
Hi,
as far as I know, this is not easily possible. What would be required is
something like a CoFlatMap function, where one input stream blocks until
the second stream is fully consumed to build up the state to join against.
Maybe Aljoscha (in CC) can comment on future plans to support this.
Hi,
backpressure comes into play when the source is attempting to pass the
generated events to downstream operators. If the downstream operators build up
backpressure, passing data to them can block. You might think of this like a
bounded queue that is full in case of backpressure and blocks until space
becomes available again.
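As a hedged illustration of that queue analogy (not from the original mail): in
a custom SourceFunction, the SourceContext.collect() call is exactly where this
blocking happens, so the emission loop slows down automatically when downstream
operators cannot keep up.

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // The emission rate of this source is throttled implicitly: collect()
    // blocks while downstream network buffers are full (backpressure).
    public class CountingSource implements SourceFunction<Long> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            long counter = 0;
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(counter++); // blocks here under backpressure
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }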
Hi,
I am interested in how backpressure is handled by sources in Flink, e.g. the
Kinesis source. From what I understand, backpressure is a mechanism to slow
down the rate at which records are read from the stream. However, in the
Kinesis source code I can see that it is configured to read the same number of
records ...
Ah, I see. Yes, it seems like serializers for Scala tuples are generated as
anonymous classes.
Since your window operator uses reducing state, the state type would be the
same as the input type of the window (which in your case is a Scala 2-tuple).
In general, using Scala collections, case classes, ...
We have a use case with two data sets: one reasonably large data set (a
few million entities) and a smaller set of data. We want to do a join between
these data sets. We will be doing this join after both data sets are available.
In the world of batch processing, this is pretty straightforward.
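For reference, a hedged sketch of what that looks like with the batch DataSet
API (field layout and values are made up; joinWithTiny hints that the second
input is small enough to broadcast):

    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;

    public class BatchJoinExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Large side: (entityId, value); small side: (entityId, label).
            DataSet<Tuple2<Long, Double>> large =
                    env.fromElements(Tuple2.of(1L, 42.0), Tuple2.of(2L, 7.5));
            DataSet<Tuple2<Long, String>> small =
                    env.fromElements(Tuple2.of(1L, "gold"), Tuple2.of(2L, "silver"));

            DataSet<Tuple3<Long, Double, String>> joined = large
                    .joinWithTiny(small)
                    .where(0).equalTo(0)
                    .with(new JoinFunction<Tuple2<Long, Double>, Tuple2<Long, String>,
                            Tuple3<Long, Double, String>>() {
                        @Override
                        public Tuple3<Long, Double, String> join(
                                Tuple2<Long, Double> l, Tuple2<Long, String> r) {
                            return Tuple3.of(l.f0, l.f1, r.f1);
                        }
                    });

            joined.print();
        }
    }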
I looked into it a little more. The anonymous-class serializer is being
created here:
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/WindowedStream.java#L1247
So far the only strategy for making it less likely to break is ...
Hi,
I think a) doesn't hold because there is no synchronisation between the
CheckpointCoordinator and the sources doing the reading. I think b) will hold
but it's also not exact because of clock differences between different machines
and whatnot.
Best,
Aljoscha
> On 29. Jan 2018, at 15:34, Ju
If you don't run Flink in standalone mode, then you can activate
taskmanager.exit-on-fatal-akka-error. However, keep in mind that at some
point you might run out of spare TMs to run your jobs unless you restart
them manually.
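For reference, that is a flink-conf.yaml entry (it is off by default, as far as
I know):

    # flink-conf.yaml: let a quarantined TaskManager terminate itself instead
    # of lingering in an unusable state.
    taskmanager.exit-on-fatal-akka-error: true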
Cheers,
Till
On Mon, Jan 29, 2018 at 6:41 PM, Vishal Santoshi wrote: