> Many articles suggest modeling TimeUUID in columns instead of rows, but since
> only one node can serve a single row, won't this lead to hot spot problems?
It won't cause hotspots as long as you shard by a small enough time period,
like hour, day, or week. I.e. the row key is the hour, day, or week, and the
column can be either the TimeUUID, or a time offset (from the key) + UUID.
Your data will then be split across multiple rows and won't create hotspots.
However, if you are going to query across time boundaries, your client logic
will have to handle that and perform the slices for each row. (A rough sketch
of this bucketing scheme follows below the quoted thread.)

Zach

On Fri, Nov 4, 2011 at 7:45 PM, Gary Shi <gary...@gmail.com> wrote:
> Thank you.
>
> I agree that asking "lots of" machines to process a single query could be
> slow if there are hundreds of them instead of dozens. Will a cluster of
> e.g. 4-20 nodes behave well if we spread the query to all nodes?
>
> Many articles suggest modeling TimeUUID in columns instead of rows, but since
> only one node can serve a single row, won't this lead to hot spot problems?
>
> On 2011-11-4 at 10:28 PM, "Sylvain Lebresne" <sylv...@datastax.com> wrote:
>>
>> On Fri, Nov 4, 2011 at 1:49 PM, Gary Shi <gary...@gmail.com> wrote:
>> > I want to save time series event logs into Cassandra, and I need to load
>> > them by key range (the row key is time-based). But we can't use
>> > RandomPartitioner this way, while OrderPreservingPartitioner leads to a
>> > hot spot problem.
>> >
>> > So I wonder why Cassandra saves SSTables sorted by row tokens instead of
>> > keys: if rows in an SSTable were sorted by keys, it should be quite easy
>> > to return rows by key range -- the token would only be used to determine
>> > which node contains the data. For key range requests, Cassandra could ask
>> > every node for that range of rows, merge the results, and return them to
>> > the caller.
>>
>> Without going for exhaustiveness:
>> - Requesting every node is not very scalable. Cassandra is built to target
>> the 'lots of cheap machines' kind of cluster, and that kind of operation
>> goes the exact opposite way. In other words, it would be slow enough that
>> you're better off modeling this using columns for time series.
>> - It would make topology operations (bootstrap, move, decommission)
>> much more costly, because we wouldn't be able to tell which keys to move
>> unless we iterated over all the data each time.
>>
>> --
>> Sylvain
>>
>> >
>> > --
>> > regards,
>> > Gary Shi
>> >
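
P.S. For what it's worth, here is a minimal sketch, in plain Python rather
than a real Cassandra client, of the bucketing described above. It assumes
day-sized buckets; the helper names and the "events:" key prefix are made up
for illustration and are not part of any client API. It shows the row key per
bucket, a time-based UUID as the column name, and the row keys a client has
to slice when a query range crosses bucket boundaries.

    # Illustrative sketch only (no Cassandra client); names are hypothetical.
    import uuid
    from datetime import datetime, timedelta

    BUCKET = timedelta(days=1)  # shard rows by day; hour or week works the same way

    def bucket_key(ts):
        """Row key: the day the event falls into, e.g. 'events:2011-11-04'."""
        return "events:" + ts.strftime("%Y-%m-%d")

    def column_name():
        """Column name: a version-1 (time-based) UUID, so columns sort by time."""
        # uuid1() stamps the current time; a real client library would also let
        # you build a TimeUUID from an arbitrary timestamp.
        return uuid.uuid1()

    def buckets_for_range(start, end):
        """Row keys the client must slice when a query crosses bucket boundaries."""
        day = start.replace(hour=0, minute=0, second=0, microsecond=0)
        while day <= end:
            yield bucket_key(day)
            day += BUCKET

    # A query from Nov 3 18:00 to Nov 5 06:00 touches three rows; the client
    # slices each row and merges the results.
    print(list(buckets_for_range(datetime(2011, 11, 3, 18, 0),
                                 datetime(2011, 11, 5, 6, 0))))
    # -> ['events:2011-11-03', 'events:2011-11-04', 'events:2011-11-05']

The usual trade-off applies: smaller buckets spread write load across more
rows, but a given time range then touches more rows that the client has to
slice and merge.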