Thank you. I agree that asking "lots of" machines to process a single query could be slow if there are hundreds of them rather than dozens. Would a cluster of, say, 4-20 nodes behave well if we spread the query across all nodes?
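To make the idea concrete, here is a rough sketch of what I mean by spreading the query to all nodes. It is plain Python, not Cassandra code: `Node`, `build_cluster`, and `scatter_gather` are names I made up for illustration, CRC32 merely stands in for the partitioner's token function, and each node is assumed to keep its rows sorted by key rather than by token.

```python
import heapq
import zlib
from bisect import bisect_left, bisect_right


class Node:
    """Hypothetical stand-in for one node whose rows are sorted by key."""

    def __init__(self, rows):
        self.rows = sorted(rows)               # list of (key, value) tuples
        self.keys = [key for key, _ in self.rows]

    def range_slice(self, start, end):
        """Return the local rows with start <= key <= end, in key order."""
        lo = bisect_left(self.keys, start)
        hi = bisect_right(self.keys, end)
        return self.rows[lo:hi]


def build_cluster(rows, node_count):
    """Place each row on a node chosen by a hash of its key (the 'token')."""
    buckets = [[] for _ in range(node_count)]
    for key, value in rows:
        token = zlib.crc32(key.encode("utf-8"))
        buckets[token % node_count].append((key, value))
    return [Node(bucket) for bucket in buckets]


def scatter_gather(nodes, start, end):
    """Ask every node for the key range, then merge the sorted partial results."""
    partials = [node.range_slice(start, end) for node in nodes]
    return list(heapq.merge(*partials))


if __name__ == "__main__":
    # One event per minute; keys are time-based strings that sort chronologically.
    events = [(f"2011-11-04T10:{m:02d}", f"event-{m}") for m in range(60)]
    cluster = build_cluster(events, node_count=4)
    for key, value in scatter_gather(cluster, "2011-11-04T10:10", "2011-11-04T10:19"):
        print(key, value)
```

Every range query here touches all nodes, so the per-query fan-out grows with the cluster size. That is the scalability concern Sylvain raised, and it is why I am asking about small clusters specifically.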
Many articles suggest modeling TimeUUIDs as columns instead of rows, but since only one node can serve a single row, won't this lead to hot spot problems?

On 2011-11-4 at 10:28 PM, "Sylvain Lebresne" <sylv...@datastax.com> wrote:
> On Fri, Nov 4, 2011 at 1:49 PM, Gary Shi <gary...@gmail.com> wrote:
> > I want to save time series event logs into Cassandra, and I need to load
> > them by key range (row key is time-based). But we can't use
> > RandomPartitioner in this way, while OrderPreservingPartitioner leads to
> > hot spot problem.
> >
> > So I wonder why Cassandra save SSTable by sorted row tokens instead of
> > keys: if rows in SSTable are sorted by keys, it should be quite easy to
> > return rows by key range -- token should be used to determine which node
> > contains the data. For key range requests, Cassandra could ask every node
> > for that range of rows, merge them and return to the caller.
>
> Without going for exhaustiveness:
> - Requesting every node is not too scalable. Cassandra is built to target
>   the 'lots of cheap machines' kind of cluster, so that kind of operation
>   is going the exact opposite way. In other words, that would be slow
>   enough that you're better off modeling this using columns for time series.
> - That would make topology operations (bootstrap, move, decommission)
>   much more costly, because we wouldn't be able to tell which keys to move
>   unless we iterate over all the data each time.
>
> --
> Sylvain
>
> > --
> > regards,
> > Gary Shi