Hi all,
I am doing a simple word count example and want to checkpoint the
accumulated word counts. I am not having any luck getting the counts saved
and restored. Can someone help?
env.enableCheckpointing(1000)
env.setStateBackend(new MemoryStateBackend())
> ...
inStream
> .keyBy({s =>
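A minimal sketch of what such a job could look like (hedged: a 1.0-era Java
DataStream job; Tokenizer and the socket source are placeholders):

    // With checkpointing enabled, the keyed sum's accumulated counts are
    // snapshotted automatically by the chosen state backend.
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(1000);                 // checkpoint every second
    env.setStateBackend(new MemoryStateBackend()); // state on the JobManager heap

    env.socketTextStream("localhost", 9999)        // placeholder source
       .flatMap(new Tokenizer())                   // hypothetical: emits (word, 1)
       .keyBy(0)
       .sum(1)                                     // counts live in keyed state
       .print();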
Hey Till & Ufuk,
We're running on self-managed EC2 instances (and we'll eventually have a mirror
cluster in our colo). The provided documentation notes that for Hadoop 2.6,
we'd need such-and-such version of hadoop-aws and guice on the CP. If I wanted
to instead use Hadoop 2.7, which versions o
Hi Max,
Thank you for your reply. Exactly, I want to set up the Yarn cluster and
submit a job through code, not using the cmd client.
I had done what you suggested, I used part of the deploy method to write
my own code that starts up the cluster which seems to be working fine.
Could you point m
If the data exceeds the main memory of your machine, then you should use
the RocksDBStateBackend as a state backend. It allows you to store state
(including windows) on disk. Thus, the size of state you can store is then
limited by your hard disk capacity.
If the expected data size can be kept in
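For illustration, switching backends is a one-liner (a sketch; the
checkpoint URI is a placeholder):

    // RocksDB keeps working state on local disk; checkpoints go to the URI.
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));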
Hi Stefano,
Hadoop supports this feature since version 2.6.0. You can define a time
interval for the maximum number of application attempts. This means that
this number of application failures has to occur within the time interval
before the application ultimately fails. Flink will activate thi
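For illustration, the attempt count itself is set in flink-conf.yaml (a
hedged fragment; the value is arbitrary):

    yarn.application-attempts: 10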
Aljoscha -
I want to use a RichFoldFunction to get the open() hook. I cheat and use
this structure instead with a (non-Rich) FoldFunction:
public class InfinitResponseFilterFolder implements
        FoldFunction<..., String> {
    private BackingStore backingStore;

    @Override
    public String fold(Infini
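For comparison, a hedged sketch of the Rich variant with the open() hook
(BackingStore is the message's hypothetical helper; generics simplified to
String):

    import org.apache.flink.api.common.functions.RichFoldFunction;
    import org.apache.flink.configuration.Configuration;

    public class ResponseFilterFolder extends RichFoldFunction<String, String> {
        private transient BackingStore backingStore;

        @Override
        public void open(Configuration parameters) {
            backingStore = new BackingStore();  // initialized once per task
        }

        @Override
        public String fold(String accumulator, String value) {
            return backingStore.filter(accumulator, value);  // hypothetical call
        }
    }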
Hi Norman,
sorry for the late reply. I finally found time and could, thanks to you,
reproduce the problem. The problem was that the window borders were treated
differently in two parts of the code. Now the left border of a window is
inclusive and the right border (late elements) is exclusive. I've
Hey Stefano,
Flink's resource management has been refactored for 1.1 recently. This
could be a regression introduced by this. Max can probably help you
with more details. Is this currently a blocker for you?
– Ufuk
On Tue, Apr 19, 2016 at 6:31 PM, Stefano Baghino
wrote:
> Hi everyone,
>
> I'm c
Hi everyone,
I'm currently experiencing a weird situation, I hope you can help me out
with this.
I've cloned and built from the master, then I've edited the default config
file by adding my Hadoop config path, exported the HADOOP_CONF_DIR env var
and ran bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm
I use maven to generate the shaded jar (and the classes are inside it) but
when the job starts it can't load those classes using Class.forName()
(required to instantiate the JDBC connections).
I think it's probably a problem related to class loading of Flink
On Tue, Apr 19, 2016 at 6:02 PM, Balaji R
Hi Till and Aljoscha,
Thank you so much for your suggestions and I'll try them out. I have
another question.
Since S2 may be days delayed, there may be lots of windows and a large
amount of data stored in memory waiting for computation. How does Flink
deal with that?
Thanks,
Yifei
On Tue,
Flink version : 1.0.0
Kafka version : 0.8.2.1
Try using a topic which has no messages posted to it at the time Flink
starts.
On Tue, Apr 19, 2016 at 5:41 PM, Robert Metzger wrote:
> Can you provide me with the exact Flink and Kafka versions you are using
> and the steps to reproduce the issue?
In your pom.xml, add the Maven plugins like this; you will have to add
all the dependent artifacts. This works for me: if you run mvn clean
compile package, the created jar is a fat jar.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>2.9</version>
</plugin>
Hi,
In my case the root cause for this was mainly that I was using Eclipse to
package the jar. Try using mvn instead. Additionally, you can copy the
dependency jars into the lib directory of the task managers and restart them
Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R&D Division
Hi to all,
I just tried to submit my application to the Flink cluster (1.0.1) but I get
ClassNotFound exceptions for classes inside my shaded jar (like
oracle.jdbc.OracleDriver or org.apache.commons.pool2.PooledObjectFactory).
Those classes are in the shaded jar but aren't found.
If I put the jars
Have you made sure that Flink is using logback [1]?
[1]
https://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#using-logback-instead-of-log4j
Cheers,
Till
On Tue, Apr 19, 2016 at 2:01 PM, Balaji Rajagopalan <
balaji.rajagopa...@olacabs.com> wrote:
> There are two files in
I assume that the provided FetchStock code is not complete. As the
exception indicates, you somehow store a LocalStreamEnvironment in your
source function. The StreamExecutionEnvironments are not serializable and
cannot be part of the source function’s closure.
Cheers,
Till
On Tue, Apr 19, 2016
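A hedged sketch of the fix: keep the environment out of the source's
fields so its closure stays serializable (FetchStock's internals here are
placeholders):

    // No reference to any StreamExecutionEnvironment inside the function.
    public class FetchStock implements SourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                ctx.collect(fetchNextQuote());  // hypothetical helper
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }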
The picture you reference does not really show how dataflows are connected.
For a better picture, visit this link:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows
Let me know if this doesn't answer your question.
On 19.04.2016 14:22, Ravind
Hi Theofilos,
I'm not sure whether I understand correctly what you are trying to do.
I'm assuming you don't want to use the command-line client.
You can set up the Yarn cluster in your code manually using the
FlinkYarnClient class. The deploy() method will give you a
FlinkYarnCluster which you can
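Roughly, the flow looks like this (a sketch against the 0.10-era API;
exact names and the resource setup may differ by version):

    FlinkYarnClient yarnClient = new FlinkYarnClient();
    // ... set container count, memory, slots, config/jar paths here ...
    FlinkYarnCluster cluster = yarnClient.deploy();  // starts JM/TM containers
    // submit jobs against the cluster's JobManager address afterwards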
Hi everyone,
I'm using Flink 0.10.1 and hadoop 2.4.0 to implement a client that
submits a flink application to Yarn. To keep it simple I use the
ConnectedComponents app from flink examples.
I set the required properties (Resources, AM ContainerLaunchContext
etc.) on the YARN client interface
Hello all,
My requirement is to re-read the csv file from a file path at certain time
intervals and process the csv data. The csv file gets updated at regular
intervals.
Below is my code:
StreamExecutionEnvironment see =
StreamExecutionEnvironment.getExecutionEnvironment();
DataStream dataStream
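One hedged way to achieve the periodic re-read is a custom source that
polls the file (path and interval are placeholders):

    DataStream<String> dataStream = see.addSource(new SourceFunction<String>() {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                // re-emit the whole file on every tick
                for (String line : Files.readAllLines(Paths.get("/data/in.csv"))) {
                    ctx.collect(line);
                }
                Thread.sleep(60_000);  // poll once a minute
            }
        }

        @Override
        public void cancel() { running = false; }
    });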
Hello All,
Considering the following streaming dataflow of the example WordCount, I
want to understand how Sink is parallelised.
Source --> flatMap --> groupBy(), sum() --> Sink
If I set the parallelism at runtime using -p, as shown here
https://ci.apache.org/projects/flink/flink-docs-release-1
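As an aside, a sink's parallelism can also be pinned explicitly,
independently of -p (a sketch; counts is a placeholder stream):

    counts.addSink(new PrintSinkFunction<>())  // inherits -p unless overridden
          .setParallelism(1);                  // force a single sink instance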
Can you provide me with the exact Flink and Kafka versions you are using
and the steps to reproduce the issue?
On Tue, Apr 19, 2016 at 2:06 PM, Balaji Rajagopalan <
balaji.rajagopa...@olacabs.com> wrote:
> It does not seem to fully work if there is no data in the kafka stream,
> the flink applica
It does not seem to fully work if there is no data in the Kafka stream; the
Flink application emits this error and bails. Could this be a missed use case
in the fix?
On Tue, Apr 19, 2016 at 3:58 PM, Robert Metzger wrote:
> Hi,
>
> I'm sorry, the documentation in the JIRA issue is a bit incorrect.
There are two files in the /usr/share/flink/conf directory, and I was trying
to set up rolling of the application logs, which go to the following directory
on the task nodes:
/var/log/hadoop-yarn/containers/application_*/container_*/taskmanager.{log,out,err}
Changing the logback.xml and logback-yarn.xml has no
Hi everyone,
we are using a long running yarn session and changed
jobmanager.web.checkpoints.history to 20. On the dashboard's job manager
panel I can see the changed config, but the checkpoint history for the
job still has only 10 entries.
Are these properties only supported in stand-alone mode?
Hi,
I'm sorry, the documentation in the JIRA issue is a bit incorrect. The
issue has been fixed in all versions including and after 1.0.0. Earlier
releases (0.10, 0.9) will fail when the leader changes.
However, you don't necessarily need to upgrade to Flink 1.0.0 to resolve
the issue: With checkp
I am facing this exception repeatedly while trying to consume from a Kafka
topic. It seems it was reported in 1.0.0 and fixed in 1.0.0. How can I be
sure it is fixed in the version of Flink that I am using? Does it require
me to install patch updates?
Caused by: java.lang.RuntimeException: Unabl
Hi Leonard,
the UUID class cannot be treated as a POJO by Flink, because it is lacking
the public getters and setters for mostSigBits and leastSigBits. However,
it should be possible to treat it as a generic type. I think the difference
is that you cannot use key expressions and key indices to def
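A hedged illustration of the KeySelector route for such a generic type
(Event and getId are placeholders):

    DataStream<Event> keyed = events.keyBy(new KeySelector<Event, UUID>() {
        @Override
        public UUID getKey(Event event) {
            return event.getId();  // key by the UUID field explicitly
        }
    });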
Hi Alex,
I suspect it's a GC issue with the code generated by ScalaBuff. Can you
maybe try to do something like a standalone test where you use a
while(true) loop to see how fast you can deserialize elements from your Foo
type?
Maybe you'll find that the JVM is growing all the time. Then there's
pro
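The standalone test could look roughly like this (Foo.parseFrom and the
payload path are assumptions about the ScalaBuff-generated code):

    byte[] payload = Files.readAllBytes(Paths.get("/tmp/foo.bin"));
    long count = 0;
    long start = System.currentTimeMillis();
    while (true) {
        Foo foo = Foo.parseFrom(payload);  // assumed generated parser
        if (++count % 1_000_000 == 0) {    // log throughput periodically
            System.out.printf("%d msgs in %d ms%n",
                count, System.currentTimeMillis() - start);
        }
    }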
Thank you! Totally works.
Best,
Sendoh
Hi Ovidiu,
Hash tables are currently used for joins (inner & outer) and the solution
set of delta iterations.
There is a pending PR that implements a hash table for partial aggregations
(combiner) [1] which should be added soon.
Joins (inner & outer) are already implemented as Hybrid Hash joins t
Hi Sendoh,
you have to edit your log4j.properties file to set log4j.rootLogger=OFF in
order to turn off the logger. Depending on how you run Flink and where you
wanna turn off the logging, you either have to edit the log4j.properties
file in the FLINK_HOME/conf directory or the one in your project, whi
Hi Yifei,
if you don't wanna implement your own join operator, then you could also
chain two join operations. I created a small example to demonstrate that:
https://gist.github.com/tillrohrmann/c074b4eedb9deaf9c8ca2a5e124800f3.
However, bear in mind that for this approach you will construct two wi
Hi,
Can I ask how to turn off Flink logging to avoid seeing INFO? I have tried
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().disableSysoutLogging();
env.execute()
and
Configuration env_config = new Configuration();
env_config.setBoo
Hi,
right now, there is no built-in support for n-ary joins. I am working on
this, however.
For now you can simulate n-ary joins by using a tagged union and doing the
join yourself in a WindowFunction. I created a small example that
demonstrates this:
https://gist.github.com/aljoscha/a2a213d90c7c1
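The shape of the workaround, roughly (the gist has the real code; Tagged
is a hypothetical wrapper type):

    public class Tagged {
        public int tag;       // 0 = first stream, 1 = second stream
        public String value;
        public Tagged() {}
        public Tagged(int tag, String value) { this.tag = tag; this.value = value; }
    }

    // Union both (wrapped) inputs, then key/window once and separate the
    // tags again inside the WindowFunction.
    DataStream<Tagged> both = s1.map(v -> new Tagged(0, v))
                                .union(s2.map(v -> new Tagged(1, v)));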
Hey Alex,
(1) Which Flink version are you using for this?
(2) Can you also get a heap dump after the job slows down? Slow downs
like this are often caused by some component leaking memory, maybe in
Flink, maybe the Scalabuff deserializer. Can you also share the Foo
code?
– Ufuk
On Mon, Apr 18,
Hey Michael-Keith,
are you running self-managed EC2 instances or EMR?
In addition to what Till said:
We tried to document this here as well:
https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#provide-s3-filesystem-dependency
Does this help? You don't really need to install Ha
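For reference, the kind of core-site.xml entry that page describes (a
sketch; credentials configuration is elided):

    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>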
Hi Michael-Keith,
you can use S3 as the checkpoint directory for the filesystem state
backend. This means that whenever a checkpoint is performed the state data
will be written to this directory.
The same holds true for the zookeeper recovery storage directory. This
directory will contain the sub
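A hedged flink-conf.yaml fragment for that setup (1.0-era key names;
bucket paths are placeholders):

    state.backend: filesystem
    state.backend.fs.checkpointdir: s3://my-bucket/flink/checkpoints
    recovery.mode: zookeeper
    recovery.zookeeper.storageDir: s3://my-bucket/flink/recovery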