Hi Elias!
There is a feature pending that uses an optimized version for aligned time
windows. In that case, elements would go into a single window pane, and the
full window would be composed of all panes it spans (in the case of sliding
windows). That should help a lot in those cases.
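As a sketch of the pane arithmetic (all sizes assumed; sizing panes by the gcd of window size and slide follows the common pane-based design, not a released API):

    // Each element is written to exactly one pane; a full window is then
    // assembled from the panes it spans.
    public class PaneArithmetic {
        public static void main(String[] args) {
            long windowSizeMs = 60 * 60 * 1000; // 1 h window (assumed)
            long slideMs = 15 * 60 * 1000;      // 15 min slide (assumed)
            long paneMs = gcd(windowSizeMs, slideMs); // 15 min panes here
            System.out.println("panes per window: " + windowSizeMs / paneMs); // 4
        }

        static long gcd(long a, long b) {
            return b == 0 ? a : gcd(b, a % b);
        }
    }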
The default
Hello
I executed my Flink code in Eclipse and it properly generated the output by
creating a folder (as specified in the string) and placing output files in
it.
But when I exported the project as a JAR and ran the same code using ./flink
run, it generated the output, but instead of creating a fol
Did you specify a parallelism? The default parallelism of a Flink instance
is 1 [1].
You can set a different default parallelism in ./conf/flink-conf.yaml or
pass a job-specific parallelism with ./bin/flink using the -p flag [2].
More options to define parallelism are in the docs [3].
[1]
https:/
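As one of the options from [3], a minimal sketch that sets a job-wide parallelism in the program itself (the value 4 is just an example):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class ParallelismExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4); // overrides the instance default of 1 for this job

            env.fromElements(1, 2, 3)
               .map(new MapFunction<Integer, Integer>() {
                   @Override
                   public Integer map(Integer value) {
                       return value * 2;
                   }
               })
               .print();
        }
    }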
Hi Igor, thanks for your reply.
As for your first point I'm not sure I understand correctly. I'm ingesting
records at a rate of about 50k records per second, and those records are
fairly small. If I add a timestamp to each of them, I will have a lot more
data, which is not exactly what I want. In
Hi Elias,
thanks for the long write-up. It's interesting that it actually kinda works
right now.
You might be interested in a design doc that we're currently working on. I
posted it on the dev list but here it is:
https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit
Hi!
There is the option to always create a directory:
"fs.output.always-create-directory"
See
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#file-systems
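i.e., in conf/flink-conf.yaml:

    fs.output.always-create-directory: true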
Greetings,
Stephan
On Tue, May 3, 2016 at 9:26 AM, Punit Naik wrote:
> Hello
>
> I executed my Flink code i
There is a Scaladoc but it is not covering all packages, unfortunately. In
the Scala API you can call transform without specifying a TypeInformation,
it works using implicits/context bounds.
On Tue, 3 May 2016 at 01:48 Srikanth wrote:
> Sorry for the previous incomplete email. Didn't realize I h
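For contrast, a minimal Java sketch (names assumed) where the output TypeInformation must be passed to transform explicitly; the Scala API supplies it through the implicit from the context bound:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.operators.StreamMap;

    public class TransformSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Integer> lengths = env.fromElements("a", "bb")
                .transform("lengths",
                    BasicTypeInfo.INT_TYPE_INFO, // explicit in Java, implicit in Scala
                    new StreamMap<>(new MapFunction<String, Integer>() {
                        @Override
                        public Integer map(String s) {
                            return s.length();
                        }
                    }));
            lengths.print();
            env.execute("transform sketch");
        }
    }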
Hi Stephen, Fabian
setting "fs.output.always-create-directory" to true in flink-config.yml
worked!
On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen wrote:
> Hi!
>
> There is the option to always create a directory:
> "fs.output.always-create-directory"
>
> See
> https://ci.apache.org/projects/flink
Yes, but be aware that your program runs with parallelism 1 if you do not
configure the parallelism.
2016-05-03 11:07 GMT+02:00 Punit Naik :
> Hi Stephen, Fabian
>
> setting "fs.output.always-create-directory" to true in flink-config.yml
> worked!
>
> On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen
Hi,
even with the optimized operator for aligned time windows I would advise
against using long sliding windows with a small slide. The system will
internally create a lot of "buckets", i.e. each sliding window is treated
separately and the element is put into 1,440 buckets, in your case. With a
mo
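The 1,440 figure corresponds, for example, to a one-day window sliding by one minute (sizes assumed):

    public class BucketCount {
        public static void main(String[] args) {
            long sizeMs  = 24L * 60 * 60 * 1000; // 1-day window (assumed)
            long slideMs = 60 * 1000;            // 1-minute slide (assumed)
            // each element is copied into size / slide overlapping windows
            System.out.println(sizeMs / slideMs); // 1440
        }
    }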
Hey Tarandeep,
I think the failures are unrelated. Regarding the number of network
buffers:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-the-network-buffers
The timeouts might occur because the task managers are pretty loaded.
I would suggest to incr
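The key on that page is taskmanager.network.numberOfBuffers; for example, in conf/flink-conf.yaml (value assumed):

    taskmanager.network.numberOfBuffers: 5000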
Hello all,
How could we perform *withBroadcastSet* and *groupBy* in DataStream like
that of DataSet in the below KMeans code:
DataSet<Centroid> newCentroids = points
    .map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
    .map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator()
Hello Fabian,
we delved deeper starting from the input you gave us, but a question arose.
We always assumed that runtime operators were open for extension without
modifying anything inside Flink, but it looks like this is not the case and
the documentation assumes that the developer is working to a con
Yeah thanks for letting me know.
On 03-May-2016 2:40 PM, "Fabian Hueske" wrote:
> Yes, but be aware that your program runs with parallelism 1 if you do not
> configure the parallelism.
>
> 2016-05-03 11:07 GMT+02:00 Punit Naik :
>
>> Hi Stephen, Fabian
>>
>> setting "fs.output.always-create-direc
Hi Simone,
you are right, the interfaces you extend are not considered to be public,
user-facing API.
Adding custom operators to the DataSet API touches many parts of the system
and is not straightforward.
The DataStream API has better support for custom operators.
Can you explain what kind of op
After fixing the clock issue on the application level, the latency is as
expected. Thanks again!
Robert
On Tue, May 3, 2016 at 9:54 AM, Robert Schmidtke
wrote:
> Hi Igor, thanks for your reply.
>
> As for your first point I'm not sure I understand correctly. I'm ingesting
> records at a rate of
Just had a quick chat with Aljoscha...
The first version of the aligned window code will still duplicate the
elements, later versions should be able to get rid of that.
On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek
wrote:
> Hi,
> even with the optimized operator for aligned time windows I w
Hi!
The order in which the elements arrive in an iteration HEAD is the order in
which the last operator in the loop (the TAIL) produces them. If that is a
deterministic ordering (because of a sorted reduce, for example), then you
should be able to rely on the order.
Otherwise, the order of elemen
Hi
I did all the settings required for the cluster setup, but when I ran the
start-cluster.sh script, it only started one jobmanager, on the master node.
Logs are written only on the master node. Slaves don't have any logs. And
when I ran a program it said:
Resources available to scheduler: Number of
I'm not sure this is the right way to do it, but we were exploring all the
possibilities and this one is the most obvious. We also spent some time
studying how to do it, to gain a better understanding of Flink's internals.
What we want to do though is to integrate Flink with another distributed
s
Yes, you are right, the exception occurred because the task managers were
heavily loaded. I checked the Ganglia metrics and CPU usage was very high. I reduced
parallelism and ran with 5000 buffers and didn't get any exception.
Thanks,
Tarandeep
On Tue, May 3, 2016 at 2:19 AM, Ufuk Celebi wrote:
> Hey Tara
Yes, I did notice the usage of implicits in ConnectedStreams.scala.
Better Scaladoc will be helpful, especially when compiler errors are not
clear.
Thanks
On Tue, May 3, 2016 at 5:02 AM, Aljoscha Krettek
wrote:
> There is a Scaladoc but it is not covering all packages, unfortunately. In
> the Sc
I'm not sure with regard to "withBroadcastSet", but in the DataStream API
you use "keyBy" instead of "groupBy".
On Tue, May 3, 2016 at 12:35 PM, subash basnet wrote:
> Hello all,
>
> How could we perform *withBroadcastSet* and *groupBy* in DataStream like
> that of DataSet in the below KMeans code:
>
> D
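A minimal self-contained sketch of keyBy as the DataStream counterpart of groupBy (types and values assumed):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KeyByExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Tuple2<Integer, Integer>> sums = env
                .fromElements(Tuple2.of(0, 1), Tuple2.of(0, 2), Tuple2.of(1, 3))
                .keyBy(0)  // DataStream analogue of the DataSet groupBy(0)
                .sum(1);   // rolling per-key sums
            sums.print();
            env.execute("keyBy sketch");
        }
    }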
Yes, please go ahead. That would be helpful.
On Mon, 2 May 2016 at 21:56 Christopher Santiago
wrote:
> Hi Aljoscha,
>
> Yes, there is still a high partition/window count since I have to keyby
> the userid so that I get unique users. I believe what I see happening is
> that the second window wit
Hello Stefano,
Thank you, I found out just some time ago that I could use keyBy, but I
couldn't find out how to set a broadcast set and use getBroadcastVariable
in DataStream like in DataSet.
For example, in the code below we get the collection of *centroids* via broadcast.
Eg: In KMeans.java
class X extends MapFunction
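For reference, a minimal self-contained sketch of that DataSet-side pattern (types and values assumed, modeled on the KMeans example):

    import java.util.Collection;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Integer> centroidsSet = env.fromElements(1, 2, 3);

            env.fromElements(10, 20)
               .map(new RichMapFunction<Integer, Integer>() {
                   private Collection<Integer> centroids;

                   @Override
                   public void open(Configuration parameters) {
                       // the broadcast set, materialized on every parallel instance
                       centroids = getRuntimeContext().getBroadcastVariable("centroids");
                   }

                   @Override
                   public Integer map(Integer value) {
                       return value + centroids.size();
                   }
               })
               .withBroadcastSet(centroidsSet, "centroids")
               .print();
        }
    }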
Hello all,
Suppose I have the datastream as:
DataStream<Centroid> newCentroids;
How to get a collection of *newCentroids* to be able to loop as below:
private Collection<Centroid> centroids;
for (Centroid cent : centroids) {
}
Best Regards,
Subash Basnet
DataStream<Centroid> newCentroids = ...;
Iterator<Centroid> iter = DataStreamUtils.collect(newCentroids);
List<Centroid> list = Lists.newArrayList(iter);
On Tue, May 3, 2016 at 10:26 AM, subash basnet wrote:
> Hello all,
>
> Suppose I have the datastream as:
> DataStream<Centroid> newCentroids;
>
> How
Hi,
please keep in mind that we're dealing with streams. The Iterator might
never finish.
Cheers,
Aljoscha
On Tue, 3 May 2016 at 16:35 Suneel Marthi wrote:
> DataStream<Centroid> newCentroids = ...;
>
> Iterator<Centroid> iter =
> DataStreamUtils.collect(newCentroids);
>
> List<Centroid> list = Li
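Along those lines, a sketch that caps how much of the (possibly unbounded) iterator is consumed (source and cap assumed):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.flink.contrib.streaming.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CollectSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Long> stream = env.generateSequence(0, 999); // finite here

            // collect() runs the job in the background and hands back an iterator
            Iterator<Long> iter = DataStreamUtils.collect(stream);

            List<Long> first = new ArrayList<>();
            while (first.size() < 100 && iter.hasNext()) { // cap: a stream may never end
                first.add(iter.next());
            }
            System.out.println(first.size());
        }
    }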
Hello,
Just to follow up on this issue: after collecting some data and setting up
additional tests, we have managed to pinpoint the issue to the
ScalaBuff-generated code that decodes enumerations. After switching to the
ScalaPB generator instead, the problem was gone.
One thing peculiar about
Why do you want to collect and iterate? Why not iterate on the DataStream
itself?
Maybe I didn't understand your use case completely.
Srikanth
On Tue, May 3, 2016 at 10:55 AM, Aljoscha Krettek
wrote:
> Hi,
> please keep in mind that we're dealing with streams. The Iterator might
> never finish.
>
Till,
Thanks again for putting this together. It is certainly along the lines of
what I want to accomplish, but I see a problem with it. In your code
you use a ValueState to store the priority queue. If you are expecting to
store a lot of values in the queue, then you are likely to be using
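A sketch of the pattern in question (types assumed; it must run on a keyed stream): the whole queue sits in one ValueState entry, so each access may deserialize and reserialize the entire queue, which gets expensive as it grows.

    import java.util.PriorityQueue;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class QueueInValueState extends RichFlatMapFunction<Long, Long> {
        private transient ValueState<PriorityQueue<Long>> queueState;

        @Override
        @SuppressWarnings("unchecked")
        public void open(Configuration parameters) {
            ValueStateDescriptor<PriorityQueue<Long>> desc =
                new ValueStateDescriptor<>("queue",
                    (Class<PriorityQueue<Long>>) (Class<?>) PriorityQueue.class, null);
            queueState = getRuntimeContext().getState(desc);
        }

        @Override
        public void flatMap(Long timestamp, Collector<Long> out) throws Exception {
            PriorityQueue<Long> queue = queueState.value();
            if (queue == null) {
                queue = new PriorityQueue<>();
            }
            queue.add(timestamp);
            queueState.update(queue); // the full queue is written back each time
            out.collect(queue.peek());
        }
    }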
Hello!
Has anyone used Flink in "production" for PLC anomaly detection?
Any pointers/docs to check?
Best regards,
Iván Venzor C.
Hi there,
I ran a test job with the filesystem state backend and save checkpoints on S3 (via s3a).
The job crashes when a checkpoint is triggered. Looking into the S3 directory and listing
objects, I found the directory is created successfully but all checkpoint
directories are empty.
The host running the task manager show
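For reference, a sketch of that setup (bucket path and interval assumed):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class S3CheckpointSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10000); // checkpoint every 10 s
            // checkpoint data goes to the filesystem behind the URI, here s3a
            env.setStateBackend(new FsStateBackend("s3a://my-bucket/checkpoints"));

            env.generateSequence(0, 1000).print(); // placeholder pipeline
            env.execute("s3 checkpoint sketch");
        }
    }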
I am using the Flink connector to read from a Kafka stream. I ran into a
problem where the Flink job went down due to some application error. It was
down for some time; meanwhile the Kafka queue was growing, as expected, with
no consumer to consume from the given group, and when I started the Flink it
s
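For context, a sketch of such a consumer setup (addresses, topic, and group assumed); with the 0.8 connector the consumed offsets are tracked per group.id:

    import java.util.Properties;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // assumed
            props.setProperty("zookeeper.connect", "localhost:2181"); // 0.8 needs ZK
            props.setProperty("group.id", "my-group"); // offsets tracked per group

            DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props));
            stream.print();
            env.execute("kafka sketch");
        }
    }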
Which Flink documentation were you following to set up your cluster? Can
you point to it?
On Tue, May 3, 2016 at 6:21 PM, Punit Naik wrote:
> Hi
>
> I did all the settings required for cluster setup. but when I ran the
> start-cluster.sh script, it only started one jobmanager on the ma