Re: Dataset and select/split functionality

2017-03-03 Thread CPC
Hi Fabian, Thank you for your explanation. Also can you give an example on how the optimizer behaves on the assumption that the outputs of a function are replicated? Thank you... On 3 March 2017 at 13:52, Fabian Hueske wrote: > Hi CPC, > > we had several requests in the past to

Dataset and select/split functionality

2017-03-02 Thread CPC
Hi all, We will try to implement select/split functionality for batch api. We looked at streaming side and understand how it works but since streaming side does not include an optimizer it was easier. Since adding such a runtime operator will affect optimizer layer as well, is there a part that yo

Re: Dataset and eager scheduling

2017-03-02 Thread CPC
Hi Till, Thank you. On 2 March 2017 at 18:13, Till Rohrmann wrote: > Hi CPC, > > I think that the optimizer does not take the scheduling mode into account > when optimizing a Flink job. > > Cheers, > Till > > On Thu, Mar 2, 2017 at 11:52 AM, CPC wrote: > >

Dataset and eager scheduling

2017-03-02 Thread CPC
Hi all, Currently our team trying implement a runtime operator also playing with scheduler. We are trying to understand batch optimizer but it will take some time. What we want to know is whether changing batch scheduling mode from LAZY_FROM_SOURCES to EAGER could affect optimizer? I mean whether

Re: [Discuss] FLIP-13 Side Outputs in Flink

2016-10-26 Thread CPC
Is it just related to stream api? This feature could be really useful for etl scenarios with dataset api as well. On Oct 26, 2016 22:29, "Fabian Hueske" wrote: > Hi Chen, > > thanks for this interesting proposal. I think side output would be a very > valuable feature to have! > > I went of the F

release 1.1 rc2 confusion

2016-08-07 Thread CPC
Hi, On flink downloads page it says that the latest version is 1.1 and have links for 1.1 binaries but hasnt rc2 voted recently? Are those binaries 1.1 or rc2?

Re: DagConnection tempMode vs breakPipeline property

2016-07-04 Thread CPC
izer, but I would expect everything that goes back to > the DataExchangeMode to be correct. The rest should be an artifact of > the old pipeline breaker logic and not be used any more. Does this > help in any way? > > On Thu, Jun 16, 2016 at 4:46 PM, CPC wrote: > > Hi everybody, >

Re: offheap memory allocation and memory leak bug

2016-06-20 Thread CPC
e runtime performance at the > cost of a slightly longer start-up time for your TaskManagers. > > Cheers, > Till > ​ > > On Sun, Jun 19, 2016 at 6:16 PM, CPC wrote: > > > Hi, > > > > I think i found some information regarding this behavior. In jvm it is &

Re: offheap memory allocation and memory leak bug

2016-06-19 Thread CPC
ct-when-JNA-not-available-td6977711.html ). I think currently only way to limit memory usage in flink if you want to use same taskmanager across jobs is via "taskmanager.memory.preallocate: true". Since it allocate memory at the beginning and not freed its memory usage stays constant. PS:

Re: offheap memory allocation and memory leak bug

2016-06-18 Thread CPC
skManager --configDir > /home/capacman/Data/programlama/flink-1.0.3/conf > but memory usage reach up to 13Gb. Could somebodey explain me why memory usage is so high? I expect it to be at most 8GB with some jvm internal overhead. [image: Inline images 1] [image: Inline images 2] On 17 Jun

offheap memory allocation and memory leak bug

2016-06-17 Thread CPC
Hi, I am making some test about offheap memory usage and encounter an odd behavior. My taskmanager heap limit is 12288 Mb and when i set "taskmanager.memory.off-hep:true" for every job it allocates 11673 Mb off heap area at most which is heapsize*0.95(value of taskmanager.memory.fraction). But whe

DagConnection tempMode vs breakPipeline property

2016-06-16 Thread CPC
Hi everybody, I am trying to understand flink optimizer in DataSet api. But i dont understant tempMode and breakPipeline properties. TempMode enum also has a breakPipeline property but they both accessed and set in different places. DagConnection.breakPipeline is used to select network io DataExch

Re: incremental Checkpointing , Rocksdb HA

2016-06-09 Thread CPC
Cassandra backend would be interesting especially if flink could benefit from cassandra data locality. Cassandra/spark integration is using this for information to schedule spark tasks. On 9 June 2016 at 19:55, Nick Dimiduk wrote: > You might also consider support for a Bigtable > backend: HBas

Re: DataStream split/select behaviour

2016-06-07 Thread CPC
Data/wiki/14533") env.execute() } } On 7 June 2016 at 19:26, CPC wrote: > Hello everyone, > > When i use DataStream split/select,it always send all selected records to > same taskmanager. Is there any reason for this behaviour? Also is it > possible to implement same spli

DataStream split/select behaviour

2016-06-07 Thread CPC
Hello everyone, When i use DataStream split/select,it always send all selected records to same taskmanager. Is there any reason for this behaviour? Also is it possible to implement same split/select behaviour for DataSet api(without using a different filter for every output )? I found this https:

Re: Dataset split/demultiplex

2016-05-12 Thread CPC
only allow > one output from an operation right now. Maybe we can extends this somehow > in the future. > > Cheers, > Aljoscha > > On Thu, 12 May 2016 at 17:27 CPC wrote: > > > Hi Gabor, > > > > Yes functionally this helps. But in this case i am proc

Re: Dataset split/demultiplex

2016-05-12 Thread CPC
t[A] = xs.filter(f1) > val split2: DataSet[A] = xs.filter(f2) > > where f1 and f2 are true for those elements that should go into the > first and second DataSets respectively. So far, the splits will just > contain elements from the input DataSet, but you can of course apply

Dataset split/demultiplex

2016-05-12 Thread CPC
Hi folks, Is there any way in dataset api to split Dataset[A] to Dataset[A] and Dataset[B] ? Use case belongs to a custom filter component that we want to implement. We will want to direct input elements whose result is false after we apply the predicate. Actually we want to direct input elements

Re: Data locality and scheduler

2016-04-26 Thread CPC
> Best, Fabian > > > 2016-04-26 16:59 GMT+02:00 CPC : > > > Hi, > > > > I look at some scheduler documentations but could not find answer to my > > question. My question is: suppose that i have a big file on 40 node > hadoop > > cluster and since it is a

Data locality and scheduler

2016-04-26 Thread CPC
Hi, I look at some scheduler documentations but could not find answer to my question. My question is: suppose that i have a big file on 40 node hadoop cluster and since it is a big file every node has at least one chunk of the file. If i write a flink job and want to filter file and if job has par

Re: Flink optimizer optimizations

2016-04-17 Thread CPC
Himmm i understand now. Thank you guys:) On Apr 16, 2016 2:21 PM, "Matthias J. Sax" wrote: > Sure. WITHOUT. > > Thanks. Good catch :) > > On 04/16/2016 01:18 PM, Ufuk Celebi wrote: > > On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax > wrote: > >> (with the need to sort the data, because both >

Flink optimizer optimizations

2016-04-15 Thread CPC
Hi When i look for what kind of optimizations flink does, i found https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals is it up to date? Also i couldnt understand: "Reusing of partitionings and sort orders across operators. If one operator leaves the data in partitioned fashion