Hey Jerry,
Jay is on the right track, this issue has to do with the Flink operator
lifecycle. When you call execute(), all your user-defined classes get
serialized so that they can be shipped to the workers on the cluster. To
execute some code before your FlatMapFunction starts processing the data,
you can use a RichFlatMapFunction and put the setup code in its open()
method, which is called on the worker before the first element arrives.
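A minimal sketch of that open() pattern, assuming a plain Jedis client and a
local Redis instance (class name, host/port, and lookup logic are illustrative,
not from this thread):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
import redis.clients.jedis.Jedis;

// Illustrative only: the Jedis client is created in open(), after the
// function has been deserialized on the worker, so the client itself is
// never serialized with the job.
public class RedisLookupFlatMap extends RichFlatMapFunction<String, String> {

    private transient Jedis jedis;  // not part of the serialized function

    @Override
    public void open(Configuration parameters) {
        jedis = new Jedis("localhost", 6379);  // assumed Redis host/port
    }

    @Override
    public void flatMap(String key, Collector<String> out) {
        String value = jedis.get(key);  // look up the incoming element in Redis
        if (value != null) {
            out.collect(value);
        }
    }

    @Override
    public void close() {
        if (jedis != null) {
            jedis.close();
        }
    }
}

The same pattern should work for any non-serializable client: keep the field
transient and build the connection in open().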
Maybe wrapping Jedis with a serializable class will do the trick?
But in general, is there a way to reference jar classes in Flink apps without
serializing them?
> On Sep 4, 2015, at 1:36 PM, Jerry Peng wrote:
>
> Hello,
>
> So I am trying to use jedis (redis java client) with Flink streaming api,
Hello,
So I am trying to use jedis (redis java client) with Flink streaming api,
but I get an exception:
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
at or
Hi All,
We're running into a memory management issue when using the
iterateWithTermination function.
Using a small amount of data, everything works perfectly fine. However,
as soon as the main memory is filled up on a worker, nothing seems to be
happening any more. Once this happens, any worker
Hello,
Does Flink currently support operators that use Redis? If I'm using the
streaming API in Flink and I need to look up something in a Redis database
during the processing of the stream, how can I do that?
Hi,
I have an open Pull Request for a RollingFile sink. It is integrated with
checkpointing, so it can provide exactly-once behavior. If you're
interested, please check it out: https://github.com/apache/flink/pull/1084
Cheers,
Aljoscha
On Wed, 26 Aug 2015 at 10:31 Stephan Ewen wrote:
> BTW: We
Hi to all,
running a job with Flink 0.10-SNAPSHOT I got the following Exception:
java.lang.IllegalArgumentException: expectedEntries should be > 0
at org.apache.flink.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.flink.runtime.opera
Hi Andres
Does something like this solve what you're trying to achieve?
https://github.com/apache/flink/pull/918/files
Regards
Sachin
On Sep 4, 2015 6:24 PM, "Stephan Ewen" wrote:
> I think you can do this with the current interface. The convergence
> criterion object stays around, so you should
I think you can do this with the current interface. The convergence
criterion object stays around, so you should be able to simply store the
current aggregator value in a field (when the check is invoked). Any round
but the first could compare against that field.
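A rough sketch of that, assuming a DoubleValue aggregate and a made-up
tolerance (the class name and epsilon are illustrative):

import org.apache.flink.api.common.aggregators.ConvergenceCriterion;
import org.apache.flink.types.DoubleValue;

// Keeps the aggregate from the previous round in a field and compares the
// current round's value against it.
public class FixedPointCriterion implements ConvergenceCriterion<DoubleValue> {

    private static final double EPSILON = 1e-6;  // assumed tolerance
    private Double previous;  // null in the first round

    @Override
    public boolean isConverged(int iteration, DoubleValue value) {
        double current = value.getValue();
        if (previous == null) {  // first round: nothing to compare against yet
            previous = current;
            return false;
        }
        boolean converged = Math.abs(current - previous) < EPSILON;
        previous = current;
        return converged;
    }
}

You would register it on the iteration together with the aggregator, via
registerAggregationConvergenceCriterion(...) on the IterativeDataSet, if I
recall the method name correctly.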
On Fri, Sep 4, 2015 at 2:25 PM, An
Hi,
I'm trying to implement some machine learning algorithms that involve
several iterations until convergence (to a fixed point).
My idea is to use an IterativeDataSet with an Aggregator which produces
the result (i.e. a set of parameters defining the model).
From the interface "ConvergenceCriterion"
Interesting question, you are the second to ask that.
Batching in user code is a way, as Matthias said. We have on the roadmap a
way to transform a stream to a set of batches, but it will be a bit until
this is in. See
https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Str
Hi Andres,
you could do this by using your own data type, for example
> public class MyBatch {
>     private ArrayList data = new ArrayList();
> }
In the DataSource, you need to specify your own InputFormat that reads
multiple tuples into a batch and emits the whole batch at once.
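Purely as a sketch of the shape such a format could take (it generates
integers instead of reading a file, and the names and sizes are made up):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.io.GenericInputFormat;

// Toy batch container, along the lines of the MyBatch class above.
class IntBatch {
    private final List<Integer> data = new ArrayList<Integer>();
    public List<Integer> getData() { return data; }
}

// Emits one IntBatch per record instead of one element per record. A
// file-based variant would accumulate parsed records the same way.
public class BatchingInputFormat extends GenericInputFormat<IntBatch> {

    private static final int TOTAL = 1000;      // assumed number of elements
    private static final int BATCH_SIZE = 100;  // assumed batch size
    private int next = 0;

    @Override
    public boolean reachedEnd() throws IOException {
        return next >= TOTAL;
    }

    @Override
    public IntBatch nextRecord(IntBatch reuse) throws IOException {
        IntBatch batch = new IntBatch();
        for (int i = 0; i < BATCH_SIZE && next < TOTAL; i++, next++) {
            batch.getData().add(next);
        }
        return batch;
    }
}

env.createInput(new BatchingInputFormat()) should then give you a DataSet of
whole batches rather than single elements.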
However, be aware,
+1 for dropping
On 09/04/2015 11:04 AM, Maximilian Michels wrote:
> +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
> release is hardly used and complicates the important high-availability
> changes in Flink.
>
> On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen wrote:
>> I am good
Hi,
I'm trying to code some machine learning algorithms on top of Flink, such
as variational Bayes learning algorithms. Instead of working at the data
element level (i.e. using map transformations), it would be far more
efficient to work at a "batch of elements" level (i.e. I get a batch of
elements
Hi Jerry,
If you don't want to use Hadoop, simply pick _any_ Flink version. We
recommend the Hadoop 1 version because it contains the fewest dependencies,
i.e. you need to download less and the installation occupies less space.
Other than that, it doesn't really matter if you don't use the HDFS
functionality
+1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
release is hardly used and complicates the important high-availability
changes in Flink.
On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen wrote:
> I am good with that as well. Mind that we are not only dropping a binary
> distribution
Hi Jerry,
yes, that's possible. You can download the appropriate version from
https://flink.apache.org/downloads.html
Cheers
On Fri, Sep 4, 2015 at 1:57 AM, Jerry Peng
wrote:
> Hello,
>
> Does flink require hdfs to run? I know you can use hdfs to checkpoint and
> process files
Hi Stephan,
Thanks for your clarification.
Basically we will have lots of sensors that will push this kind of data to a
queuing system (currently we are using RabbitMQ, but will soon move to
Kafka).
We will also use the same pipeline to process the historical data.
I also want to minimize the chai
Hi Stephan,
Cheers
On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen wrote:
> We will definitely also try to get the chaining overhead down a bit.
>
> BTW: To reach this kind of throughput, you need sources that can produce
> very fast...
>
> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan wrote:
>
I am good with that as well. Mind that we are not only dropping a binary
distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.
Let's also reconfigure Travis to test
- Hadoop 1
- Hadoop 2.3
- Hadoop 2.4
- Hadoop 2.6
- Hadoop 2.7
On Fri, Sep 4, 2015 at 6:19 AM, Chiwan
We will definitely also try to get the chaining overhead down a bit.
BTW: To reach this kind of throughput, you need sources that can produce
very fast...
On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan wrote:
> Hi Stephan,
>
> That's good information to know. We will hit that throughput easily