Hi,
S4 is inspired by the actor model: it is not a strict implementation,
but it provides asynchronous event-based processing, state encapsulation,
safe messaging and location transparency.
It also incorporates data partitioning, as in MapReduce.
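
To make those properties a bit more concrete, here is a rough sketch in
Python. It is not S4 code and does not use the S4 API: CounterPE,
send, drain and partition_for are names made up for illustration, and
the byte-sum partitioner is only a stand-in for whatever hashing a real
platform uses.

```python
from collections import deque


class CounterPE:
    """Hypothetical actor-like element: its state is private and is only
    changed by processing messages taken from its own mailbox."""

    def __init__(self, key):
        self.key = key
        self._count = 0         # encapsulated state, never shared
        self.mailbox = deque()  # events wait here until processed

    def send(self, event):
        # Asynchronous hand-off: the sender enqueues and returns at once.
        self.mailbox.append(event)

    def drain(self):
        # Events are handled one at a time, so the state needs no locking.
        while self.mailbox:
            self.mailbox.popleft()
            self._count += 1
        return self._count


def partition_for(key, num_partitions):
    # Deterministic key-to-partition mapping (illustrative byte sum only).
    return sum(key.encode("utf-8")) % num_partitions
```

Because the sender only needs the key, and the key alone decides the
partition, the sender does not need to know where the receiving
element actually runs.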
AFAIK, MapReduce Online extends Hadoop with continuous micro-batching
but still retains some blocking behaviour, disk I/O, and trade-offs
that come from adapting a platform originally optimized for large batch
processing.
In contrast, the main objective of S4 is to provide a generic platform
for low-latency data processing. You can define arbitrarily complex
graphs of PEs (processing elements); everything is processed in memory,
statefully if needed, and there is typically no disk I/O.
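
As a rough illustration of a small PE graph with in-memory state, here
is a sketch in Python. It is not the S4 API: SplitPE, WordCountPE,
KeyedStream, route and on_event are hypothetical names, and the calls
are synchronous for brevity, whereas a real platform would deliver
events asynchronously and across nodes.

```python
class WordCountPE:
    """Stateful element: one instance per word, count kept in memory."""

    def __init__(self, word):
        self.word = word
        self.count = 0

    def on_event(self, event):
        self.count += 1
        return (self.word, self.count)


class KeyedStream:
    """Edge of the graph: routes each event to the instance for its key,
    creating that instance the first time the key is seen."""

    def __init__(self, pe_factory):
        self.pe_factory = pe_factory
        self.instances = {}

    def route(self, key, event):
        if key not in self.instances:
            self.instances[key] = self.pe_factory(key)
        return self.instances[key].on_event(event)


class SplitPE:
    """Stateless element: splits a line and emits one event per word."""

    def __init__(self, downstream):
        self.downstream = downstream

    def on_event(self, line):
        return [self.downstream.route(word, word) for word in line.split()]
```

For example, with stream = KeyedStream(WordCountPE) and
split = SplitPE(stream), calling split.on_event("a b a") returns
[("a", 1), ("b", 1), ("a", 2)], and nothing is written to disk along
the way.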
Hope this helps,
Matthieu
On 10/16/12 7:49 AM, 杨定裕 wrote:
Hello, all,
S4 uses the actor model to implement real-time processing.
Each PE is regarded as an actor, and actors communicate by passing messages.
While reading the paper "MapReduce Online", I saw that it also supports
pipelined online processing and publishes results as a near-real-time stream.
Therefore, I am really interested in where the actor model is in S4, and
what S4 can do that MapReduce cannot.
Thank you!
Dingyu Yang