Hi,
I have a use case to stream messages from Kafka to Amazon S3. I am not
using the S3 file system approach, since I need object tags to be added
to each object written to S3.
So I am planning to use the AWS S3 SDK, but I have a question about how to
hold the data until the accumulated size reaches a few MBs.
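One way to do this is a custom sink that buffers records in memory and
uploads a tagged object once the buffer crosses a size threshold. Below is
a minimal sketch against the AWS SDK for Java v1; the bucket name, key
pattern, tag values, and 5 MB threshold are placeholders, and a plain
in-memory buffer like this can lose data on failure unless you tie the
flush to Flink's checkpointing:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.ObjectTagging;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.Tag;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class TaggedS3Sink extends RichSinkFunction<String> {

    private static final long FLUSH_BYTES = 5L * 1024 * 1024; // ~5 MB, tune as needed

    private transient AmazonS3 s3;
    private transient ByteArrayOutputStream buffer;
    private long partCounter;

    @Override
    public void open(Configuration parameters) {
        s3 = AmazonS3ClientBuilder.defaultClient();
        buffer = new ByteArrayOutputStream();
    }

    @Override
    public void invoke(String value) throws Exception {
        buffer.write(value.getBytes(StandardCharsets.UTF_8));
        buffer.write('\n');
        if (buffer.size() >= FLUSH_BYTES) {
            flush();
        }
    }

    private void flush() {
        byte[] bytes = buffer.toByteArray();
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(bytes.length);
        // "my-bucket" and the tag below are placeholders for illustration.
        PutObjectRequest request =
            new PutObjectRequest("my-bucket", "part-" + (partCounter++),
                    new ByteArrayInputStream(bytes), meta)
                .withTagging(new ObjectTagging(
                    Collections.singletonList(new Tag("source", "kafka"))));
        s3.putObject(request);
        buffer.reset();
    }

    @Override
    public void close() {
        if (buffer != null && buffer.size() > 0) {
            flush(); // best-effort flush of the last partial buffer
        }
    }
}

Note that PutObjectRequest#withTagging attaches the object tags at write
time, so no separate tagging call is needed after the upload.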
Hey Fabi,
many thanks for your clarifications! It seems flink-shaded-hadoop2
itself is already included in the binary distribution:
> $ jar tf flink-1.2.0/lib/flink-dist_2.10-1.2.0.jar | grep org/apache/hadoop |
> head -n3
> org/apache/hadoop/
> org/apache/hadoop/fs/
> org/apache/hadoop/fs/FileS
Hi Petr,
I think that's expected behavior, because the exception is intercepted
and enriched with an instruction on how to solve the problem.
As you assumed, you need to add the flink-hadoop-compatibility JAR file to
the ./lib folder. Unfortunately, the file is not included in the binary
distribution.
Hi all,
I'm looking for an example of Tranquility (Druid's library) as a Flink sink.
I'm trying to follow the code in [1], but I feel it's incomplete or maybe
outdated; it doesn't mention anything about another method (tranquilizer)
that seems to be part of the BeamFactory interface in the current version
Hi Wolfe,
that's all correct. Thank you!
I'd like to emphasize that the FsStateBackend stores all state on the heap
of the worker JVM, so you might run into OutOfMemoryErrors if your state
grows too large.
Therefore, the RocksDBStateBackend is the recommended choice for most
production use cases.
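For reference, switching backends is a one-line change on the execution
environment. A minimal sketch, assuming the RocksDB backend dependency is
on the classpath and using a placeholder checkpoint URI:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Working state lives in RocksDB on the TaskManager's local disk;
        // checkpoints are written to the given URI (placeholder below).
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds
    }
}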
Hi Kant,
Jumping in here, would love corrections if I'm wrong about any of this.
The short answer is no: HDFS is not necessary to run stateful stream
processing. In the minimal case, you can use the MemoryStateBackend to back
up your state onto the JobManager.
In any production scenario, you will w
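To illustrate the minimal case, a sketch with the heap-only backend. Note
that the MemoryStateBackend keeps checkpoints on the JobManager heap and by
default caps each checkpointed state at about 5 MB, so it is really meant
for local testing:

import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Working state on the TaskManager heap, checkpoints on the
        // JobManager heap; no HDFS (or any other file system) involved.
        env.setStateBackend(new MemoryStateBackend());
    }
}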
Hello,
with 1.2.0 `WritableTypeInfo` got moved into its own artifact
(flink-hadoop-compatibility_2.10-1.2.0.jar). Unlike with 1.1.0, the
distribution jar `flink-dist_2.10-1.2.0.jar` no longer includes the hadoop
compatibility classes. However, `TypeExtractor`, which is part of
the distribution
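For context, here is a sketch of the kind of program that runs into this:
merely using a Hadoop Writable as a field type makes the TypeExtractor look
up WritableTypeInfo reflectively, which fails at runtime unless
flink-hadoop-compatibility is on the classpath (e.g. in ./lib):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Type extraction for Text/LongWritable requires WritableTypeInfo,
        // which lives in flink-hadoop-compatibility since 1.2.0.
        DataSet<Tuple2<Text, LongWritable>> data =
            env.fromElements(Tuple2.of(new Text("a"), new LongWritable(1L)));
        data.print();
    }
}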
Hi All,
I read the docs, however I still have the following question: for stateful
stream processing, is HDFS mandatory? In some places I see it is required,
and in other places I see that RocksDB can be used. I just want to know if
HDFS is mandatory for stateful stream processing.
Thanks!
Hi,
I'm currently examining the I/O patterns of Flink, and I'd like to know
when and how Flink goes to disk. Let me give an overview of what I have
done so far.
I am running TeraGen (from the Hadoop examples package) + TeraSort (
https://github.com/robert-schmidtke/terasort) on a 16 node cluster,
Yeah, hoping you'd say that :)
It would be nice to share them, just because of the network overhead
between Flink and Kafka.
Thanks
Flink does not provide any tooling to link two jobs together.
I would use Kafka to connect Flink jobs with each other.
Best, Fabian
2017-04-06 17:28 GMT+02:00 nragon :
> Hi,
>
> Can we share the endpoint of one job (DataStream) with another job?
> The architecture I'm working on should provide a
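To make the Kafka handoff concrete, here is a minimal sketch using the
Kafka 0.10 connector that matches the 1.2-era APIs discussed on this list;
the topic name, broker address, and string schema are placeholder choices,
and in practice the producer and consumer would live in two separate
programs:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaHandoff {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");

        // Job A: publish the stream you want to share.
        DataStream<String> produced = env.fromElements("a", "b", "c");
        produced.addSink(
            new FlinkKafkaProducer010<>("handoff-topic", new SimpleStringSchema(), props));

        // Job B (normally a separate program): consume the shared stream.
        props.setProperty("group.id", "job-b");
        DataStream<String> consumed = env.addSource(
            new FlinkKafkaConsumer010<>("handoff-topic", new SimpleStringSchema(), props));
        consumed.print();

        env.execute("kafka handoff sketch");
    }
}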
Hey,
I had a similar problem when I tried to list the jobs and kill one by name
in a YARN cluster. Initially I also tried to set YARN_CONF_DIR, but it
didn't work.
What helped, though, was passing the Hadoop conf dir to my application on
the classpath when starting it, like this:
java -cp application.jar:/etc/hadoop/conf
Rea
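For what it's worth, my understanding of why the classpath approach works
(a sketch, not the exact code from my application): Hadoop's configuration
classes load the *-site.xml files from the classpath, so putting
/etc/hadoop/conf on -cp is enough for the client to locate the
ResourceManager:

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnConfProbe {
    public static void main(String[] args) {
        // YarnConfiguration picks up core-site.xml and yarn-site.xml
        // from the classpath, e.g. the /etc/hadoop/conf entry above.
        YarnConfiguration conf = new YarnConfiguration();
        System.out.println("RM address: " + conf.get(YarnConfiguration.RM_ADDRESS));
    }
}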