Re: sampling function

2016-07-09 Thread Greg Hogan
Hi Do, DataSet provides a stable @Public interface. DataSetUtils is marked @PublicEvolving, which means it is intended for public use and has stable behavior, but its method signatures may change. It's also good to limit DataSet to common methods, whereas the utility methods tend to be used for specific application

sampling function

2016-07-09 Thread Le Quoc Do
Hi all, I'm working on approximate computing using sampling techniques. I noticed that Flink supports a sample function for DataSet (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why you didn't merge the function into org/apache/flink/api/java/DataSet.java since the sam
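
For readers unfamiliar with fraction-based sampling of the kind this thread discusses, here is a minimal plain-Java sketch of Bernoulli sampling (sampling without replacement, where each element is kept independently with a given probability). It is not Flink's implementation; the class name `BernoulliSampleSketch` and the `sample` signature are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Flink.
final class BernoulliSampleSketch {

    // Keeps each element independently with probability `fraction`.
    // A fixed seed makes the selection reproducible.
    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random random = new Random(seed);
        List<T> result = new ArrayList<>();
        for (T element : input) {
            // nextDouble() returns a value in [0.0, 1.0), so fraction 1.0
            // keeps every element and fraction 0.0 keeps none.
            if (random.nextDouble() < fraction) {
                result.add(element);
            }
        }
        return result;
    }
}
```

The size of the result is random and only approximately `fraction * input.size()`; a fixed-size sample would need a different technique, such as reservoir sampling.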

Re: Random access to small global state

2016-07-09 Thread Suneel Marthi
You could use Ignite too; I believe they have a plugin for Flink streaming. > On Jul 9, 2016, at 8:05 AM, Sebastian wrote: > > Hi, > > I'm planning to work on a streaming recommender in Flink, and one problem > that I have is that the algorithm needs random access to a small

Random access to small global state

2016-07-09 Thread Sebastian
Hi, I'm planning to work on a streaming recommender in Flink, and one problem that I have is that the algorithm needs random access to a small global state (say a million counts). It should be ok if there is some inconsistency in the state (e.g., delay in seeing updates). Does anyone here ha

Re: Extract type information from SortedMap

2016-07-09 Thread Yukun Guo
Hi Robert, On 9 July 2016 at 00:25, Robert Metzger wrote: > Hi Yukun, > > can you also post the code how you are invoking the GenericFlatMapper on > the mailing list? > Here is the code defining the topology: DataStream stream = ...; stream .keyBy(new KeySelector() { @Overr

Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-09 Thread Saliya Ekanayake
Hi, The current start/stop scripts SSH into a worker node once for each time it appears in the slaves file. When spawning multiple TMs (like 24 per node), this is very inefficient. I've changed the scripts to do one SSH per node and spawn a given number N of TMs afterwards. I can make a pull request if this se