Hi,

Wow, that was a lot of information...
Think about users storing files online (identified by their customer name) - each customer maintains his own "hashtable" of files. Each file can consist of a few or several thousand entries, depending on the size of the whole file.

For example: the file Test.doc consists of 3 * 64K blobs, and each blob has the hash value ABC. So we only store one blob, one hash value, and the entry that this blob is needed 3 times - that way we avoid duplicate file data (and there are a lot of duplicates).

This means our model is:

Filename: Test.doc
Hashes: 3
Hash 1: ABC
Hash 2: ABC
Hash 3: ABC

BLOB: ABC
Used: 3
Binary: Data 1*

Each customer has:

Customer: Customer:ID
  Filename: Test.doc
    MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from / Security)
    Version: 0
      Hash 1 / Hash 2 / Hash 3
  Filename: Another.doc
    MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from / Security)
    Version: 0
      Hash 1 / Hash 2 / Hash 3
    Version: 1
      Hash 1 / Hash 2 / Hash 4
    Version: 2
      Hash 3 / Hash 2 / Hash 2

Hope this clears some things up :-)

Mike

2010/7/26 Aaron Morton <aa...@thelastpickle.com>

> Some background reading:
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> Not sure on your follow-up question, so I'll just wildly blather on about things :)
>
> My assumption about your data is that you have 64K chunks that are identified by a hash and can somehow be grouped together into larger files (so there is a "file name" of sorts).
>
> One possible storage design (assuming the RandomPartitioner) is:
>
> A Chunks CF: each row in this CF uses the hash of the chunk as its key and has a single column with the chunk data. You could use more columns to store metadata here.
>
> A ChunkIndex CF: each row uses the file name (from above) as the key and has one column for each chunk in the file. The column name *could* be an offset for the chunk and the column value could be the hash for the chunk. Or you could use the chunk hash as the column name and the offset as the column value if needed.
>
> To rebuild the file, read the entire row from the ChunkIndex, then make a series of multigets to read all the chunks. Or you could lazily fetch only the ones you need.
>
> This all assumes that the "thousands" comment below means you may want to combine the chunks into 60+ MB files. It would be easier to keep all the chunks together in one row, but if you are going to have large (unbounded) file sizes this may not be appropriate.
>
> You could also think about using the order-preserving partitioner and a compound key for each row such as "file_name_hash.offset". Then, by using get_range_slices to scan the range of chunks for a file, you would not need to maintain a secondary index. There are some drawbacks to that approach - read the article above.
>
> Hope this helps,
> Aaron
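Roughly, the Chunks / ChunkIndex layout described above could look like the following. This is a minimal sketch only, not from the thread: it assumes a pycassa client, a keyspace named "Backup", pre-created standard CFs "Chunks" and "ChunkIndex", and purely illustrative helper names.

```python
# Sketch only (not from the thread): Chunks row key = chunk hash, single "data"
# column; ChunkIndex row key = file name, column name = zero-padded offset
# (sorts correctly under a UTF8/Bytes comparator), column value = chunk hash.
import hashlib

import pycassa

pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
chunks = pycassa.ColumnFamily(pool, 'Chunks')
chunk_index = pycassa.ColumnFamily(pool, 'ChunkIndex')

CHUNK_SIZE = 64 * 1024  # the 64K blobs discussed above


def store_file(file_name, data):
    """Store each 64K chunk once under its hash and index it by offset."""
    index_cols = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        chunks.insert(h, {'data': chunk})       # idempotent: same hash, same data
        index_cols[str(offset).zfill(12)] = h   # zero-pad so offsets sort as text
    chunk_index.insert(file_name, index_cols)
    # Note: the "Used: 3" reference count from the model above is not tracked here.


def read_file(file_name):
    """Read the index row, multiget the chunks, and stitch them together."""
    index = chunk_index.get(file_name, column_count=100000)  # page this for huge files
    hashes = list(index.values())
    rows = chunks.multiget(list(set(hashes)))   # duplicate hashes fetched only once
    return b''.join(rows[h]['data'] for h in hashes)
```

For unbounded file sizes the ChunkIndex row itself becomes very wide, which ties into the row size limits mentioned further down the thread.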
> On 26 Jul, 2010, at 04:01 PM, Michael Widmann <michael.widm...@gmail.com> wrote:
>
> Thanks for this detailed description...
>
> You mentioned the secondary index in a standard column family - would it be better to build several indexes?
> Is it even possible to build an index on, for example, 32 columns?
>
> The hint with the smaller boxes is very valuable!
>
> Mike
>
> 2010/7/26 Aaron Morton <aa...@thelastpickle.com>
>
>> For what it's worth...
>>
>> * Many smaller boxes with local disk storage are preferable to 2 with huge NAS storage.
>> * To cache the hash values, look at the KeysCached setting in the storage-config.
>> * There are some row size limits, see http://wiki.apache.org/cassandra/CassandraLimitations
>> * If you want to get 1000 blobs, rather than grouping them in a single row using a super column, consider building a secondary index in a standard column family: one CF for the blobs keyed by your hash, and one CF keyed by whatever the grouping key is, with a column for every blob's hash value. Read from the index first, then from the blobs themselves.
>>
>> Aaron
>>
>>
>> On 24 Jul, 2010, at 06:51 PM, Michael Widmann <michael.widm...@gmail.com> wrote:
>>
>> Hi Jonathan
>>
>> Thanks for your very valuable input on this.
>>
>> Maybe I didn't explain enough, so I'll try to clarify. Here are some thoughts:
>>
>> - Binary data will not be indexed - only stored.
>> - The file name of the binary data (a hash) should be indexed for search.
>> - We could group the hashes into 62 "entry" points for search retrieval - I think super columns, if I have the terminology right (a-z, A-Z, 0-9); see the sketch below.
>> - The 64K blobs' metadata (which blob belongs to which file) should be stored separately in Cassandra.
>> - For hardware we rely on Solaris / OpenSolaris with ZFS in the backend.
>> - Write operations occur much more often than reads.
>> - Memory should mainly hold the hash values for fast search (not the binary data).
>> - Read operations (restore from Cassandra) may be async - get about 1000 blobs, group them, restore.
>>
>> So my questions also are:
>>
>> 2 or 3 big boxes, or 10 to 20 small boxes for storage?
>> Could we separate "caching" - hash value CFs cached and indexed, binary data CFs not?
>> Writes happen around the clock - not at tremendous speed, but constantly.
>> Would compaction of the database need really much disk space?
>> Is it reliable at this size (this is more my fear)?
>>
>> Thanks for thinking and answers...
>>
>> Greetings,
>>
>> Mike
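As a rough illustration of the grouping idea above (62 entry points keyed by the first character of the hash), here is a sketch that follows Aaron's suggestion of a standard CF index rather than super columns. The CF name "HashIndex", the keyspace, and the pycassa usage are assumptions, not part of the thread.

```python
# Sketch only: HashIndex row key = first character of the hash (a-z, A-Z, 0-9,
# i.e. 62 possible rows), column name = full hash, column value unused here.
# The Chunks CF is the same as in the earlier sketch.
import pycassa
from pycassa import NotFoundException

pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
chunks = pycassa.ColumnFamily(pool, 'Chunks')
hash_index = pycassa.ColumnFamily(pool, 'HashIndex')


def blob_exists(h):
    """Check the index first so duplicate blobs are never uploaded twice."""
    try:
        hash_index.get(h[0], columns=[h])
        return True
    except NotFoundException:
        return False


def store_blob(h, data):
    if not blob_exists(h):
        chunks.insert(h, {'data': data})
    hash_index.insert(h[0], {h: ''})  # tracking a "Used" count would need a
                                      # separate read-modify-write scheme

# Caveat: with ~1 billion blobs these 62 index rows become extremely wide,
# which runs into the row size limits linked above - more entry points
# (e.g. the first two or three hash characters) would spread the load.
```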
>>
>> 2010/7/23 Jonathan Shook <jsh...@gmail.com>
>>
>>> There are two scaling factors to consider here. In general the worst-case growth of operations in Cassandra is kept near to O(log2(N)). Any worse growth would be considered a design problem, or at least a high-priority target for improvement. This is important when considering the load generated by very large column families, as binary search is used when the bloom filter doesn't exclude rows from a query. O(log2(N)) is basically the best achievable growth for this type of data, but the bloom filter improves on it in some cases by paying a small cost every time.
>>>
>>> The other factor to be aware of is the reduction of binary search performance for datasets which can push disk seek times into high ranges. This is mostly a direct consideration for installations which will be doing lots of cold reads (not cached data) against large sets. Disk seek times are much lower for adjacent or nearby tracks, and generally much higher when tracks are sufficiently far apart (as in a very large data set). This can compound with other factors when session times are longer, but that is to be expected with any system. Your storage system may have completely different characteristics depending on caching, etc.
>>>
>>> The read performance is still quite high relative to other systems for a similar data set size, but the drop-off in performance may be much worse than expected if you want it to be linear. Again, this is not unique to Cassandra. It's just an important consideration when dealing with extremely large sets of data, when memory is not likely to be able to hold enough hot data for the specific application.
>>>
>>> As always, the real questions have a lot more to do with your specific access patterns, storage system, etc. I would look at the benchmarking info available on the lists as a good starting point.
>>>
>>>
>>> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann <michael.widm...@gmail.com> wrote:
>>> > Hi
>>> >
>>> > We plan to use Cassandra as data storage on at least 2 nodes with RF=2 for about 1 billion small files.
>>> > We have about 48TB of disc space behind each node.
>>> >
>>> > Now my question is: is this possible with Cassandra, reliably - meaning every blob is stored on 2 JBODs?
>>> >
>>> > We may grow up to nearly 40TB or more of Cassandra "storage" data...
>>> >
>>> > Has anyone out there done something similar?
>>> >
>>> > For retrieval of the blobs we are going to index them with a hash value (meaning hashes are used to store the blob)...
>>> > So we can search fast for the entry in the database and combine the blobs into a normal file again.
>>> >
>>> > Thanks for an answer
>>> >
>>> > Michael
>>
>>
>> --
>> bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies
>
>
> --
> bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies


--
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies
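As a closing back-of-the-envelope check on the numbers in the original question (a sketch only: it assumes every blob is a full 64K, takes the "nearly 40TB or more" post-dedup figure from the thread, and ignores SSTable and index overhead):

```python
# Rough capacity arithmetic for 1 billion 64K blobs on 2 nodes with RF=2.
TB = 1024 ** 4

blob_size = 64 * 1024
blob_count = 10 ** 9
replication_factor = 2

worst_case_tb = blob_size * blob_count / float(TB)   # ~60 TB if every blob were unique
after_dedup_tb = 40.0                                 # the "nearly 40TB or more" estimate above
replicated_tb = after_dedup_tb * replication_factor   # ~80 TB actually written to disk
available_tb = 2 * 48                                 # 96 TB of raw disk across 2 nodes

# ~80 TB of replicated data on 96 TB of disk leaves little headroom for
# compaction (which can temporarily need space on the order of the column
# family being compacted), which supports the "many smaller boxes" advice
# earlier in the thread.
```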