Just found a paper from Google among this year's SIGMOD accepted papers, on joining continuous data streams:
"Photon: Fault-tolerant and scalable joining of continuous data streams" [slides: http://cloud.berkeley.edu/data/photon.pdf]. It combines sharding and Paxos to get scalability and exactly-once semantics during recovery.
-- Guozhang
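
P.S. To make the "exactly-once during recovery" part concrete, here is a toy sketch of the general idea: record each event id in a sharded registry before emitting the joined record, so that replaying input after a failure does not emit it twice. This is not Photon's code. The in-memory sets stand in for a Paxos-replicated store, and every name here (IdRegistry, try_register, join_events, the event fields) is made up for illustration.

```python
import zlib


class IdRegistry:
    """Toy stand-in for a sharded registry of already-joined event ids.

    Assumption: plain in-memory sets replace the Paxos-replicated storage
    that a real fault-tolerant system would use for each shard.
    """

    def __init__(self, num_shards=4):
        self.shards = [set() for _ in range(num_shards)]

    def _shard_for(self, event_id):
        # Deterministic hash so the same id always maps to the same shard.
        return self.shards[zlib.crc32(event_id.encode("utf-8")) % len(self.shards)]

    def try_register(self, event_id):
        """Insert-if-absent: True only for the first caller with this id."""
        shard = self._shard_for(event_id)
        if event_id in shard:
            return False
        shard.add(event_id)
        return True


def join_events(secondary_events, primary_by_id, registry, output):
    """Join each secondary event to its primary event, emitting at most once."""
    for event in secondary_events:
        primary = primary_by_id.get(event["primary_id"])
        if primary is None:
            continue  # primary not seen yet; a real system would retry later
        if registry.try_register(event["id"]):
            output.append({**primary, **event})


# Hypothetical data: one primary event and two secondary events referring to it.
queries = {"q1": {"query": "shoes"}}
clicks = [{"id": "c1", "primary_id": "q1"}, {"id": "c2", "primary_id": "q1"}]

registry, joined = IdRegistry(), []
join_events(clicks, queries, registry, joined)
join_events(clicks, queries, registry, joined)  # replay, as after a restart
print(len(joined))  # prints 2: the replay adds no duplicates
```

The point of the sketch is only the ordering: the registry write decides who is allowed to emit, so a joiner that restarts and reprocesses the same input is turned away by the registry.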