Taskmanager memory

2015-12-09 Thread Kruse, Sebastian
Hi everyone, I am currently looking into how Flink can coexist and interoperate with other frameworks in a cluster, such as plain single-machine processes or Spark. Tachyon seems to be a nice solution to exchange data between them. However, I think it is a problem that Flink's taskmanagers al

Re: Taskmanager memory

2015-12-09 Thread Kruse, Sebastian
r setup. In a YARN setup, you can flexibly start and stop Flink sessions with different configurations (memory, TMs, slots) or run a single job. When running a single job, Flink will allocate resources and free them after the job is done. Best, Fabian 2015-12-09 9:46 GMT+01:00 Kruse, Sebastian mai

DeltaIterations: shrink solution set

2015-02-10 Thread Kruse, Sebastian
Hi everyone, From playing around a bit with delta iterations, I saw that you can update elements from the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as "deleted" somehow)? My use case at hand fo

RE: DeltaIterations: shrink solution set

2015-02-11 Thread Kruse, Sebastian
particular algorithm? I was thinking about examples of iterative algorithms with this property... Regards, A. 2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <sebastian.kr...@hpi.de>: Hi everyone, From playing around a bit with delta iterations, I saw that you can update elemen

RE: DeltaIterations: shrink solution set

2015-02-11 Thread Kruse, Sebastian
:40 AM, Kruse, Sebastian <sebastian.kr...@hpi.de> wrote: Thanks for your answers. I am trying to build an Apriori-like algorithm to find key candidates in a relational dataset. I was considering delta iterations, because the algorithm should maintain two datasets: a set of

Efficient datatypes?

2015-02-19 Thread Kruse, Sebastian
Hi everyone, I think that during one of the meetups, it was mentioned that Flink can in some cases operate on serialized data. If I understood that correctly, which cases would that be, i.e., which data types and operators support such a feature? Cheers, Sebastian --- Sebastian Kruse Doktor

Release date 0.9

2015-04-13 Thread Kruse, Sebastian
Hello everyone, we are having a seminar at HPI that is supposed to compare Flink and Spark. The students should implement solutions for different data analytics problems on both platforms. Since Flink is still moving a lot, we would like to have the students work on the latest version and mayb

Collections within POJOs/tuples

2015-04-17 Thread Kruse, Sebastian
Hello everyone, I was just wondering, which class would be most efficient to store collections of primitive elements, and which one to store objects, within POJOs and tuples from a serialization point of view. And would it make any difference if such a collection is not embedded within a POJO/t

Load balancing

2015-06-09 Thread Kruse, Sebastian
Hi folks, I would like to do some load balancing within one of my Flink jobs to achieve good scalability. The rebalance() method is not applicable in my case, as the runtime is dominated by the processing of very few larger elements in my dataset. Hence, I need to distribute the processing work

RE: how can Combine between two dataset in on datset and execution more condition in the same time

2015-06-11 Thread Kruse, Sebastian
Hi, You might want to translate your SQL statement into an expression of the relational algebra first [1]. This expression can be expressed with Flink's operators in a straightforward manner. In the end, it will look something like this: Employees.filter(_.job_id = ...).join(departments.fil

RE: Load balancing

2015-06-11 Thread Kruse, Sebastian
...@gmail.com>> wrote: Hi Sebastian, I agree, shuffling only specific elements would be a very useful feature, but unfortunately it's not supported (yet). Would you like to open a JIRA for that? Cheers, Fabian 2015-06-09 17:22 GMT+02:00 Kruse, Sebastian mailto:sebastian.kr...@hpi.de&

RE: how can Combine between two dataset in on datset and execution more condition in the same time

2015-06-12 Thread Kruse, Sebastian
That example was written in Scala. If you are using Java, then the join function is applied with the function "with(..)". However, logically you do the same in Scala and Java; there are just some minor differences in the API. -Original Message- From: hagersaleh [mailto:loveallah1...@yahoo.

RE: Load balancing

2015-06-15 Thread Kruse, Sebastian
u can process the subelements in parallel, and finally group by key to aggregate the result. Cheers, -- Gianmarco On 11 June 2015 at 19:16, Kruse, Sebastian <sebastian.kr...@hpi.de> wrote: Hi Gianmarco, Thanks for the pointer! I had a quick look at the paper, but unfortunately I
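
The pattern mentioned in the snippet above (split large elements, process the subelements in parallel, then group by key to aggregate) can be sketched in plain Java. This is only an illustration with java.util.stream, not Flink API code. The class SplitAndAggregate, the Part type, the chunk size, and the sum-of-squares workload are hypothetical, and the sketch assumes that the per-element work can be decomposed into independent chunks whose partial results are combined by addition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SplitAndAggregate {

    /** One slice of a large element, tagged with the key of the element it came from. */
    static final class Part {
        final String key;
        final List<Integer> chunk;

        Part(String key, List<Integer> chunk) {
            this.key = key;
            this.chunk = chunk;
        }
    }

    /** Splits the values of one element into slices of at most chunkSize entries. */
    static Stream<Part> split(String key, List<Integer> values, int chunkSize) {
        int parts = (values.size() + chunkSize - 1) / chunkSize;
        return IntStream.range(0, parts).mapToObj(i -> new Part(
                key,
                values.subList(i * chunkSize, Math.min(values.size(), (i + 1) * chunkSize))));
    }

    /** Hypothetical stand-in for the expensive per-slice work: a sum of squares. */
    static long work(List<Integer> chunk) {
        long sum = 0;
        for (int v : chunk) {
            sum += (long) v * v;
        }
        return sum;
    }

    static Map<String, Long> process(Map<String, List<Integer>> elements, int chunkSize) {
        // Step 1: materialize all slices, so that the parallel step below is spread
        // over slices rather than over the (few, skewed) original elements.
        List<Part> parts = elements.entrySet().stream()
                .flatMap(e -> split(e.getKey(), e.getValue(), chunkSize))
                .collect(Collectors.toList());

        // Steps 2 and 3: process slices in parallel, then group by key and add up
        // the partial results.
        return parts.parallelStream()
                .collect(Collectors.groupingBy(
                        p -> p.key,
                        TreeMap::new,
                        Collectors.summingLong(p -> work(p.chunk))));
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> elements = new HashMap<>();
        elements.put("big", Arrays.asList(1, 2, 3, 4, 5, 6));
        elements.put("small", Arrays.asList(7));
        System.out.println(process(elements, 2));
    }
}
```

The slices are collected into a list before the parallel step because the contents of a flatMap are processed sequentially per source element, which would leave one large element on a single thread.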

RE: Random Selection

2015-06-15 Thread Kruse, Sebastian
Hi everyone, I did not reproduce it, but I think the problem here is rather the anonymous class. It looks like it is created within a class, not an object. Thus it is not “static” in Java terms, which means that its surrounding class (the job class) will also be serialized. And in this job class,
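
The effect described in the snippet above can be reproduced with plain Java serialization, independent of Flink. The following is a minimal sketch: ClosureCaptureDemo, Mapper, and Job are hypothetical names, with Mapper standing in for a serializable user function and Job for a job class that is not itself serializable. An anonymous class created in an instance method and using an instance field holds a reference to its enclosing object, so serializing it also tries to serialize that object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {

    /** Hypothetical stand-in for a user function that gets shipped to workers. */
    interface Mapper extends Serializable {
        int map(int value);
    }

    /** Hypothetical job class; note that it does NOT implement Serializable. */
    static class Job {
        private final int offset;

        Job(int offset) {
            this.offset = offset;
        }

        /** Created in an instance context and reads a field: keeps a reference to this Job. */
        Mapper capturing() {
            return new Mapper() {
                @Override
                public int map(int value) {
                    return value + offset;
                }
            };
        }

        /** Created in a static context: there is no enclosing instance to drag along. */
        static Mapper nonCapturing(final int offset) {
            return new Mapper() {
                @Override
                public int map(int value) {
                    return value + offset;
                }
            };
        }
    }

    /** Returns true if Java serialization accepts the object, false if it rejects it. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Job job = new Job(1);
        System.out.println("capturing:     " + isSerializable(job.capturing()));
        System.out.println("non-capturing: " + isSerializable(Job.nonCapturing(1)));
    }
}
```

Under these assumptions, typical remedies are to create the function in a static context, to use a static nested or top-level class, or to make the enclosing class serializable together with all of its fields.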