Hi Robert,
Thank you for the prompt reply. You're right, it was a leftover from a
previous build. With the fixed dependencies, I still get the same error though.
I have a question on job submission as well. I use the following code to
submit the job:
InetSocketAddress jobManagerAddress = cluster
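(The line above is cut off in the archive. For context, a rough sketch of what the launch-and-submit path can look like against the 1.0.x flink-yarn/flink-clients APIs; the class and method names below reflect my reading of those modules and may differ in your version:)

// launch a YARN session and obtain the JobManager address
AbstractFlinkYarnCluster cluster = yarnClient.deploy();
InetSocketAddress jobManagerAddress = cluster.getJobManagerAddress();

// point the client configuration at the YARN JobManager
Configuration config = new Configuration();
config.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerAddress.getHostName());
config.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerAddress.getPort());

// package the user jar and submit it blocking (path is a placeholder)
PackagedProgram program = new PackagedProgram(new File("/path/to/job.jar"));
Client client = new Client(config);
client.runBlocking(program, 1);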
Hi Theo,
you can't mix different Flink versions in your dependencies. Please use
1.0.2 for the flink-yarn client as well or 1.1-SNAPSHOT everywhere.
On Fri, May 6, 2016 at 7:02 PM, Theofilos Kakantousis wrote:
> Hi everyone,
> Flink 1.0.2
> Hadoop 2.4.0
>
> I am running Flink on Yarn by using FlinkYarnClient to launch a Flink
> cluster and Flink Client to submit a PackagedProgram.
Hi
I am new to Apache Flink and using Flink 1.0.1.
I have a streaming program that fetches data from Kafka, performs some
computation, and sends the results back to Kafka.
I want to compare results between Flink and Spark.
I have the below information from Spark. Can I get similar information from Flink?
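(For reference, a minimal Java skeleton of such a Kafka-in/Kafka-out job with the 0.8 connector that ships with 1.0.x; broker/ZooKeeper addresses, topic names and the computation are placeholders:)

import java.util.Properties;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaCompareJob {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("zookeeper.connect", "localhost:2181");
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "flink-vs-spark");

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<String> in = env.addSource(
        new FlinkKafkaConsumer08<>("input-topic", new SimpleStringSchema(), props));

    // placeholder for the actual computation
    DataStream<String> result = in.map(new MapFunction<String, String>() {
      @Override
      public String map(String value) {
        return value.toUpperCase();
      }
    });

    result.addSink(new FlinkKafkaProducer08<>(
        "localhost:9092", "output-topic", new SimpleStringSchema()));

    env.execute("kafka-in-kafka-out");
  }
}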
Hi everyone,
Flink 1.0.2
Hadoop 2.4.0
I am running Flink on Yarn by using FlinkYarnClient to launch a Flink
cluster and Flink Client to submit a PackagedProgram. To keep it simple,
for batch jobs I use the WordCount example and for streaming the
IterateExample and IncrementalLearning ones wit
OK, thanks for reporting back. Thanks to Igor as well.
I just updated the docs with a note about this.
On Thu, May 5, 2016 at 3:16 AM, Chen Qin wrote:
> Uruk & Igor,
>
> Thanks for helping out! Yup, it fixed my issue.
>
> Chen
>
>
>
> On Wed, May 4, 2016 at 12:57 PM, Igor Berman wrote:
>>
>> I
Hi,
I am trying to run the code examples from the Gelly documentation, in
particular this code:
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph
object SampleObject {
  def main(args: Array[String]) {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // GridGraph is part of the Java Gelly API, hence env.getJavaEnv
    val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
    graph.getEdges.print()
  }
}
Yes, you can transform the broadcast set when it is accessed with
RuntimeContext.getBroadcastVariableWithInitializer() and a
BroadcastVariableInitializer.
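(For example, eagerly turning a broadcast list into a lookup map on first access; "metadata", Event, MetaData and getSourceId() are illustrative names:)

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.BroadcastVariableInitializer;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class EnrichMapper extends RichMapFunction<Event, Event> {

  private transient Map<Integer, MetaData> meta;

  @Override
  public void open(Configuration parameters) {
    meta = getRuntimeContext().getBroadcastVariableWithInitializer(
      "metadata",
      new BroadcastVariableInitializer<MetaData, Map<Integer, MetaData>>() {
        @Override
        public Map<Integer, MetaData> initializeBroadcastVariable(Iterable<MetaData> data) {
          // runs once; the resulting Map is shared by the tasks that read it
          Map<Integer, MetaData> map = new HashMap<>();
          for (MetaData m : data) {
            map.put(m.getSourceId(), m);
          }
          return map;
        }
      });
  }

  @Override
  public Event map(Event value) {
    MetaData m = meta.get(value.getSourceId());
    // ... enrich the event with m ...
    return value;
  }
}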
2016-05-06 14:07 GMT+02:00 Flavio Pompermaier :
> That was more or less what I was thinking. The only thing I'm not sure
> about is the usage of the broadcasted dataset.
Hi Fabian,
So I misunderstood the behaviour of configure(), thank you.
Andrea
2016-05-06 14:17 GMT+02:00 Fabian Hueske :
> Hi Andrea,
>
> actually, OutputFormat.configure() will also be invoked per task. So you
> would also end up with 16 ActorSystems.
> However, I think you can use a synchronized singleton object to start one
> ActorSystem per TM.
Hi Andrea,
actually, OutputFormat.configure() will also be invoked per task. So you
would also end up with 16 ActorSystems.
However, I think you can use a synchronized singleton object to start one
ActorSystem per TM (each TM and all tasks run in a single JVM).
So from the point of view of configur
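(A sketch of such a singleton holder, assuming Akka is on the classpath; the actor system name is arbitrary:)

import akka.actor.ActorSystem;

public final class ActorSystemHolder {

  private static ActorSystem system;

  private ActorSystemHolder() {}

  // call this from OutputFormat.configure(): the first task in the JVM
  // creates the ActorSystem, all later tasks on the same TM reuse it
  public static synchronized ActorSystem get() {
    if (system == null) {
      system = ActorSystem.create("tm-actor-system");
    }
    return system;
  }
}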
Hi Palle,
this sounds indeed like a good use case for Flink.
Depending on the complexity of the aggregated historical views, you can
implement a Flink DataStream program which builds the views on the fly,
i.e., you do not need to periodically trigger MR/Flink/Spark batch jobs to
compute the views
Hi Fabian,
At the moment I am not interested in guaranteeing exactly-once processing,
thank you for the clarification.
As far as I know, there is no method similar to OutputFormat's configure()
for RichSinkFunction, correct? So I am not able to instantiate an
ActorSystem per TM but I have to instantiat
That was more or less what I was thinking. The only thing I'm not sure
about is the usage of the broadcasted dataset, since I'd need to access the
MetaData dataset by sourceId (so I'd need a Map).
Probably I'd do:
Map meta = ...; // prepare the metadata lookup table
...
ds.map(MetaMapFunctionWrapper(new
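(The line above is cut off; a guess at the full shape, where MetaMapFunctionWrapper/MetaMapFunction are the hypothetical classes from above and the broadcast name is arbitrary:)

Map<Integer, MetaData> meta = ...; // the metadata lookup table
DataSet<MetaData> metaDs = env.fromCollection(meta.values());

ds.map(new MetaMapFunctionWrapper(new MetaMapFunction()))
  .withBroadcastSet(metaDs, "metadata");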
Hi Andrea,
you can use any OutputFormat to emit data from a DataStream using the
writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of
a failure, it might emit some records a second time. Hence the results will
be written at least once.
Hope this helps.
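(Usage is a one-liner; MyOutputFormat stands in for the custom format:)

DataStream<MyType> stream = ...;
stream.writeUsingOutputFormat(new MyOutputFormat()); // at-least-once, as noted above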
I see the flow as below:
LogStash -> Log Stream -> Flink -> Kafka -> Live Model
                            |
                            v
                       Mongo/HBase
The Live Model will again be Flink streaming data sets from Kafka.
There you analyze the incoming stream for a certain value and
1. Suppose I have a stream of different events (A, B, C). Each event needs
its own processing pipeline.
What is the recommended approach for splitting pipelines per event? I can
do some filter operator at the beginning, I can set up different jobs per
event, or I can hold every such event in diffe
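(One way to do the first option, using the DataStream split/select API; Event and getType() are placeholders:)

SplitStream<Event> split = events.split(new OutputSelector<Event>() {
  @Override
  public Iterable<String> select(Event value) {
    return Collections.singletonList(value.getType()); // "A", "B" or "C"
  }
});

// each pipeline only sees its own event type
split.select("A").map(new PipelineA());
split.select("B").map(new PipelineB());
split.select("C").map(new PipelineC());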
Hi Palle,
I am a beginner in Flink.
However, I can say something about your other questions:
1. It is better to use Spark to create aggregate views. It is a lot
faster than MR. You could use either batch or streaming mode in Spark,
based on your needs.
2. If your a
Hi there.
We are putting together some BigData components for handling a large amount of
incoming data from different log files and perform some analysis on the data.
All data being fed into the system will go into HDFS. We plan on using
Logstash, Kafka and Flink for bringing data from the log
Hi,
I created a custom OutputFormat to send data to a remote actor. Are there
any issues with using an OutputFormat in a streaming job? Or will it be
treated like a Sink?
I prefer to use it in order to create a custom ActorSystem per TM in the
configure() method.
Cheers,
Andrea
I have a requirement where I want to do aggregation on one data stream
every 5 minutes, and on a different data stream every 1 minute. I wrote some
example code to test this out, but the behavior is different from what I
expected: I expected window2 to be called 5 times and window1 to be called
once, but
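(For reference, the basic two-window setup; keys and aggregation fields are placeholders:)

// stream one: aggregate every 5 minutes
stream1.keyBy(0)
    .timeWindow(Time.minutes(5))
    .sum(1);

// stream two: aggregate every 1 minute
stream2.keyBy(0)
    .timeWindow(Time.minutes(1))
    .sum(1);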
Thanks Robert, appreciate your help.
On Fri, May 6, 2016 at 3:07 PM, Robert Metzger wrote:
> Hi,
>
> yes, you can use Kafka's configuration setting for that. It's called
> "auto.offset.reset". Setting it to "latest" will change the restart
> behavior to the current offset ("earliest" is the opposite).
Hi Flavio,
I'll open a JIRA for de/serializing TableSource to textual JSON.
Would something like this work for you?
main() {
  ExecutionEnvironment env = ...
  TableEnvironment tEnv = ...
  // accessing an external catalog
  YourTableSource ts = Catalog.getTableSource("someIdentifier");
  tEnv.
Hi,
yes, you can use Kafka's configuration setting for that. It's called
"auto.offset.reset". Setting it to "latest" will change the restart
behavior to the current offset ("earliest" is the opposite).
How heavy is the processing you are doing? 4500 events/second does not
sound like a lot of throughput
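(Concretely, the property goes into the consumer's Properties; broker address and topic are examples, and note that with the 0.8 consumer the equivalent values are "largest"/"smallest":)

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "my-group");
// where to start when no valid committed offset is available
props.setProperty("auto.offset.reset", "latest");

DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));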