Hi,
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tr = env.fromParallelCollection(data)
I don't know how to initialize the data; can someone tell me how?
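For reference, here is a minimal sketch of one way to initialize that call. fromParallelCollection expects a SplittableIterator, and Flink ships a NumberSequenceIterator that fits; class and method names are per the Flink Scala API, so please verify them against your Flink version:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.NumberSequenceIterator

object ParallelCollectionExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // NumberSequenceIterator is a SplittableIterator, so Flink can split
    // the range 1..1000 across the parallel source instances.
    val tr = env.fromParallelCollection(new NumberSequenceIterator(1L, 1000L))
    tr.print()
    env.execute("parallel collection example")
  }
}
```

Any SplittableIterator of your own elements would work the same way; the point is that the source needs something Flink can partition across parallel instances, not an arbitrary collection.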
Hi,
I have had issues when processing large amounts of data (large windows
where I could not do incremental updates); Flink slowed down significantly.
Increasing the amount of memory and using off-heap allocation helped,
but it only delayed the onset of the problem without solving it.
C
Hi,
some time ago I found a problem with backpressure in Spark and prepared a
simple test to check it and compare with Flink.
https://github.com/rssdev10/spark-kafka-streaming
+
https://mail-archives.apache.org/mod_mbox/spark-user/201607.mbox/%3CCA+AWphp=2VsLrgSTWFFknw_KMbq88fZhKfvugoe4YYByEt7
Thanks Robert,
I tried to check out the commit you mentioned, but git returns an error:
"fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e".
I've searched for a solution but could not find one. Am I doing something
wrong?
-
$ git clone https://github.com/rmetz
Of course! I really appreciate your interest & attention. I hope we will figure
out solutions that other people can use.
I agree with your analysis. Your triggering syntax is particularly nice. I
wrote a custom trigger which does exactly that but without the nice fluent API.
As I considered the
On Fri, Sep 2, 2016 at 5:24 PM, Aljoscha Krettek
wrote:
> Hi,
> from this I would expect to get as many HashMaps as you have keys. The
> winFunction is also executed per-key so it cannot combine the HashMaps of
> all keys.
>
> Does this describe the behavior that you're seeing?
>
yes, it's the b
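The per-key behaviour described above can be illustrated with plain Scala collections (no Flink involved, purely an analogy): grouping by key and folding each group yields one map per key, and nothing merges those maps across keys.

```scala
object PerKeyMaps {
  def main(args: Array[String]): Unit = {
    val events = Seq(("a", 1), ("a", 2), ("b", 3), ("c", 4))
    // Analogous to keyBy + a per-window fold: each key gets its own accumulator.
    val perKey: Map[String, Map[String, Int]] =
      events.groupBy(_._1).map { case (k, evs) =>
        k -> evs.foldLeft(Map.empty[String, Int]) { case (acc, (key, v)) =>
          acc.updated(key, acc.getOrElse(key, 0) + v)
        }
      }
    // One map per key: three keys, three maps, never combined.
    println(perKey.size) // 3
  }
}
```

If a single combined result over all keys is needed, that combination has to happen in a separate downstream step, because the window function itself only ever sees one key's elements.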
Hi,
from this I would expect to get as many HashMaps as you have keys. The
winFunction is also executed per-key so it cannot combine the HashMaps of
all keys.
Does this describe the behavior that you're seeing?
Cheers,
Aljoscha
On Fri, 2 Sep 2016 at 17:37 Luis Mariano Guerra
wrote:
> hi!
>
> I
Hi Eric,
I'm sorry that you are running into these issues. I think the version is
0.10-SNAPSHOT, and I think I've used this commit:
https://github.com/rmetzger/flink/commit/547e749 for some of the runs (of
the throughput / latency tests, not for the yahoo benchmark). The commit
should at least poi
hi!
I'm trying to collect some metrics by key per window and emit the full
result at the end of the window to Kafka. I started with a simple count by
key to test it, but my requirements are a little more complex than that.
What I want to do is fold the stream events as they come and then at
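The fold-style accumulation per key can be sketched in plain Scala, as an analogy for what a windowed fold would do; the Metrics case class and its fields are hypothetical, purely for illustration:

```scala
object WindowFoldSketch {
  // Hypothetical per-key metrics, accumulated as events arrive.
  case class Metrics(count: Long, sum: Long, max: Long)

  def foldFn(acc: Metrics, value: Long): Metrics =
    Metrics(acc.count + 1, acc.sum + value, math.max(acc.max, value))

  def main(args: Array[String]): Unit = {
    val windowEvents = Seq(("sensor-1", 5L), ("sensor-1", 9L), ("sensor-2", 2L))
    // Analogue of keyBy(...).window(...).fold(zero)(foldFn): one accumulator
    // per key, emitted once when the window fires.
    val result = windowEvents.groupBy(_._1).map { case (k, evs) =>
      k -> evs.map(_._2).foldLeft(Metrics(0L, 0L, Long.MinValue))(foldFn)
    }
    println(result("sensor-1")) // Metrics(2,14,9)
  }
}
```

The accumulator type can carry whatever metrics are needed; the fold stays incremental, so the window never has to buffer all elements.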
Hi Greg,
thanks for your response!
I just had a look and realized that it's only about 85 GB of data. Sorry
for the wrong information.
It's read from a csv file on the master node's local file system. The 8
nodes have more than 40 GB available memory each and since the data is
equally di
Hi Robert,
I've been trying to build the "performance" project using various versions
of Flink, but failing. It seems that I need both KafkaZKStringSerializer
class and FlinkKafkaConsumer082 class to build the project, but none of the
branches has both of them. KafkaZKStringSerializer existed in 0
Hi Dan,
Where are you reading the 200 GB "data" from? How much memory per node? If
the DataSet is read from a distributed filesystem and if with iterations
Flink must spill to disk then I wouldn't expect much difference. About how
many iterations are run in the 30 minutes? I don't know that this i
How about using a source and broadcast variable?
You could write the model to storage (DFS), then read it with a source
and use a broadcast variable to send it to all tasks.
A single record can be very large, so it should work even if your model is
quite big.
Does that sound feasible?
In futu
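A sketch of the broadcast-variable approach in the DataSet API, with a String standing in for the model; the method names (withBroadcastSet, getBroadcastVariable) are per the Flink Java/Scala DataSet API, so check them against your version:

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._

object BroadcastModelSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // In practice the model would be read from DFS; a literal stands in here.
    val model: DataSet[String] = env.fromElements("model-bytes")
    val data = env.fromElements(1, 2, 3)

    val scored = data.map(new RichMapFunction[Int, String] {
      private var m: String = _
      override def open(parameters: Configuration): Unit =
        // The broadcast variable is visible to every parallel task instance.
        m = getRuntimeContext.getBroadcastVariable[String]("model").asScala.head
      override def map(v: Int): String = s"$v scored with $m"
    }).withBroadcastSet(model, "model")

    scored.print()
  }
}
```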
This should be of concern mostly to the users of the Storm compatibility layer:
We just received a pull request [1] for updating the Storm
compatibility layer to support Storm versions >= 1.0.0. This is a
major change because all Storm imports have changed their namespace
due to package renaming.
Hi,
I'd like to write a client that can execute an already 'uploaded' JAR (i.e.
the JAR is deployed and available by some other external process). This is
similar to what the web console allows which consists of 2 steps: upload
the JAR followed by a submit with parameters.
I'm looking at the Flin
If an operator's input stream is faster than its output stream, its
input buffer will block the previous operator's output thread that transfers
the data to this operator. Right?
Do Flink and Spark both handle backpressure by blocking the
thread? So what's the difference between
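The blocking mechanism in the question can be illustrated without Flink: with a bounded buffer between a producer and a consumer, put() blocks the producer once the buffer is full. This is only a plain-JVM analogy for backpressure by blocking, not Flink's actual network stack:

```scala
import java.util.concurrent.ArrayBlockingQueue

object BackpressureSketch {
  def main(args: Array[String]): Unit = {
    // A bounded buffer between "operators": once full, put() blocks the producer.
    val buffer = new ArrayBlockingQueue[Int](2)
    buffer.put(1)
    buffer.put(2)
    // The buffer is now full; a further put() would block until the
    // consumer takes an element. That blocking is the backpressure signal.
    println(buffer.remainingCapacity()) // 0
    val consumed = buffer.take() // consumer frees a slot
    buffer.put(3)                // producer can proceed again
    println(consumed)            // 1
  }
}
```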
Hi,
I’m using HDFS as state backend.
The checkpoints folder keeps growing.
What shall I do?
Regards.
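Completed checkpoints are normally discarded automatically, so unbounded growth is often a symptom of retained/externalized checkpoints or of jobs failing before cleanup runs. If retained checkpoints are the cause, the retention count can be bounded in flink-conf.yaml; the key name is per the Flink configuration docs, so check that it exists in your version:

```yaml
# flink-conf.yaml: keep at most one completed checkpoint around
state.checkpoints.num-retained: 1
```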
Hi Stefan,
Thank you so much for the answer. Ok, I'll do it asap.
For the sake of argument, could the issue be related to the low number of
blocks? I noticed that the Flink implementation, by default, sets the number of
blocks to the input count (which is actually a lot). So with a low
cardinality and bi
Hello Everybody,
I'm currently refactoring some code and am looking for a better alternative for
handling JPMML models in data streams. At the moment, the Flink job I'm working on
references a model object as a singleton, which I want to change because static
references tend to cause problems in dis
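One common alternative to a static singleton is to load the model once per parallel task instance in the open() method of a rich function. The sketch below uses a String in place of the JPMML evaluator, and loadModel is a hypothetical placeholder for your model-loading code:

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Sketch: load the model once per task in open() instead of holding it in a
// static singleton. `loadModel` is a placeholder for e.g. JPMML evaluator
// construction; a String stands in for the model type here.
class ModelScorer(modelPath: String) extends RichMapFunction[Double, String] {
  @transient private var model: String = _

  override def open(parameters: Configuration): Unit = {
    // Runs once per parallel task instance, after the function has been
    // deserialized on the worker, so no static state is shared.
    model = loadModel(modelPath)
  }

  override def map(value: Double): String = s"scored $value with $model"

  private def loadModel(path: String): String = s"model@$path" // placeholder
}
```

Only the path (or other small configuration) travels with the serialized function; the heavy model object itself is built on the worker.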
I see, I didn't forget about this, it's just that I'm thinking hard.
I think in your case (which I imagine some other people to also have) we
would need an addition to the windowing system that the original Google
Dataflow paper called retractions. The problem is best explained with an
example. Sa
Hi,
unfortunately, the log does not contain the required information for this case.
It seems like a sender to the SortMerger failed. The best way to find the
problem is to take a look at the exceptions reported in the web
front-end for the failing job. Could you check if you find any