Re: sampling function

2016-07-13 Thread Le Quoc Do
ris
> >
> > > On 11 Jul 2016, at 16:15, Le Quoc Do wrote:
> > >
> > > Hi all,
> > >
> > > Thank you all for your answers.
> > > By the way, I also recognized that Flink doesn't support "stratified
> > >

Re: sampling function

2016-07-12 Thread Till Rohrmann
l there I guess.
>
> Paris
>
> > On 11 Jul 2016, at 16:15, Le Quoc Do wrote:
> >
> > Hi all,
> >
> > Thank you all for your answers.
> > By the way, I also recognized that Flink doesn't support "stratified
> > sampling" function (onl

Re: sampling function

2016-07-12 Thread Paris Carbone
On 11 Jul 2016, at 16:15, Le Quoc Do wrote:
>
> Hi all,
>
> Thank you all for your answers.
> By the way, I also recognized that Flink doesn't support "stratified
> sampling" function (only simple random sampling) for DataSet.
> It would be nice if someone c

Re: sampling function

2016-07-11 Thread Le Quoc Do
Hi all, Thank you all for your answers. By the way, I also noticed that Flink doesn't support a "stratified sampling" function (only simple random sampling) for DataSet. It would be nice if someone could create a Jira for it and assign the task to me so that I can work on it. Th
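
For readers unfamiliar with the distinction raised in this message, here is a minimal sketch of stratified sampling in plain Java. It is not Flink code and does not reflect any Flink API: the class `StratifiedSampler`, its `sample` method, and all parameter names are hypothetical, chosen only for illustration. Simple random sampling draws from the whole data set at once, while this sketch first groups records by a key (the stratum) and then draws the same fraction from each group, so small groups stay represented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

/** Hypothetical helper (not part of Flink): per-stratum sampling without replacement. */
final class StratifiedSampler {

    private StratifiedSampler() {}

    /**
     * Groups the records by key and draws ceil(fraction * groupSize)
     * records from every group, without replacement.
     */
    static <T, K> Map<K, List<T>> sample(
            List<T> data, Function<T, K> keyOf, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }

        // Step 1: split the input into strata, keeping first-seen key order.
        Map<K, List<T>> strata = new LinkedHashMap<>();
        for (T record : data) {
            strata.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }

        // Step 2: shuffle a copy of each stratum and keep a prefix of it.
        Random rng = new Random(seed);
        Map<K, List<T>> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<T>> entry : strata.entrySet()) {
            List<T> group = new ArrayList<>(entry.getValue());
            int n = (int) Math.ceil(fraction * group.size());
            Collections.shuffle(group, rng);
            result.put(entry.getKey(), new ArrayList<>(group.subList(0, n)));
        }
        return result;
    }
}
```

A call such as `StratifiedSampler.sample(records, r -> r.getCategory(), 0.1, 42L)` (with a hypothetical `getCategory` accessor) would return roughly 10% of each category. Rounding up is a design choice made here so that every non-empty stratum contributes at least one record when the fraction is positive; a distributed implementation would need a different strategy than shuffling each group in memory.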

Re: sampling function

2016-07-11 Thread Vasiliki Kalavri
11:31, Kostas Kloudas wrote:
> Hi Do,
>
> In DataStream you can always implement your own
> sampling function, hopefully without too much effort.
>
> Adding such functionality to the API could be a good idea.
> But given that in sampling there is no “one-size-fits-all”
> so

Re: sampling function

2016-07-11 Thread Kostas Kloudas
Hi Do, In DataStream you can always implement your own sampling function, hopefully without too much effort. Adding such functionality to the API could be a good idea. But given that in sampling there is no “one-size-fits-all” solution (as not every use case needs random sampling and not

Re: sampling function

2016-07-09 Thread Greg Hogan
Hi Do, DataSet provides a stable @Public interface. DataSetUtils is marked @PublicEvolving, which means it is intended for public use and has stable behavior, but its method signatures may change. It's also good to limit DataSet to common methods whereas the utility methods tend to be used for specific application

sampling function

2016-07-09 Thread Le Quoc Do
Hi all, I'm working on approximate computing using sampling techniques. I noticed that Flink supports a sample function for DataSet (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why you didn't merge the function into org/apache/flink/api/java/DataSet.java since the sam