Re: sampling function

2016-07-13 Thread Le Quoc Do
> > rs could make a better fit in the ML library and not in the core API but I am not very familiar with the milestones there. Maybe the maintainers of the batch ML library could check if sampling techniques could be useful there I guess. Pa

Re: sampling function

2016-07-11 Thread Le Quoc Do
> > > stable @Public interface. DataSetUtils is marked @PublicEvolving which is intended for public use, has stable behavior, but method signatures may change. It's also good to limit DataSet to common methods whereas the utility methods tend

sampling function

2016-07-09 Thread Le Quoc Do
Hi all, I'm working on approximate computing using sampling techniques. I noticed that Flink supports a sample function for DataSet (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why you didn't merge the function into org/apache/flink/api/java/DataSet.java since the sam

Performing consecutive Action operators

2016-03-31 Thread Le Quoc Do
Hi all, Right now, in Flink, if I call two action operators (print, count, collect, ) consecutively, Flink will create two independent execution plans. A simple example: DataSet text = env.fromElements( "Some text ….", );

Re: Flink deployment fabric script

2015-11-09 Thread Le Quoc Do
Hi Max, Thank you for your comments. You wrote: > Hi Do, > For example, the Flink part is available here: > https://github.com/mxm/yoka/blob/master/cluster/flink.py Nice one. > The scope of such a script would be to a) bring up instances at a cloud provider b) install Flink and its dependen

Flink deployment fabric script

2015-11-08 Thread Le Quoc Do
Hi Flinkers, I'm starting to work with Flink and I would like to contribute to it. However, I'm a very new Flinker, so the first thing I could contribute is a one-click-style deployment script to deploy Flink, Spark and Hadoop YARN on cluster and cloud computing environments (OpenStack based Cloud