If new keys don't follow roughly the same distribution as the ones you already have, then yes, you're going to be load balancing a lot.
On Thu, Sep 16, 2010 at 2:24 AM, Hien. To Trong <hie...@vng.com.vn> wrote: > Hi all, > Thanks you for your comments. > I already read some papers about load balancing in P2P which some of them > allow range query. > During this time, I find another problem: "NEW KEYS" > > Karger - Simple efficient load balancing algorithms for p2p systems. > (support OPP) > John Byers - Simple load balancing for distributed hash table. > Third: Ananth Rao - Load balancing in structured P2P systems. > I think the first paper is the best to get the background. > > Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables > and Range queries over DHTs > These paper use PHT to allow range query. > > James Aspnes - Skip Graphs (most recent as I known) > and Distributed Balanced Tables, Not making a hash of it All. > > All of these papers deal with load balancing and range query probs by > creating schemes or strategies > based on only "load - number of key on each nodes", not care about new keys > and highly-accessed keys. > HOWEVER, in fact, "the most recent keys are likely accessed more > frequently". > > I suppose, we have 400.000 "NEW KEYS" in 2 recent days (are likely accessed > more frequently). --> A scheme: these new keys are uniformly partitioned and > divided into some successive nodes. For example, there are 10 node > (N1...N10), keys in day 1 will be put into node_1_2, keys in day 2 will be > put into node_3_4 and so on... But, the prob is that number of keys and > nodes increase/decrease day by day (not fixed) --> the number of node used > to store keys in each day may increase/decrease (1 or 3 node for example). > > Does any one have any ideas or know papers to deal with this prob (new keys > and highly-accessed keys)? > > Thanks a lot. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com