Okay. That really tied a knot in my brain - it's twisting a little bit now. I'll have
to draw it on the whiteboard to understand it better ... but I've seen some very
interesting cornerstones in your answer for our project.

Really, thanks a lot.

Mike

2010/7/26 Aaron Morton <aa...@thelastpickle.com>

> I see, got carried away thinking about it, so here are some thoughts....
>
> Your access patterns will determine the best storage design, so this is
> probably not the best solution. I would welcome thoughts from others.
>
> => Standard CF: Chunks
>
> * key is the chunk hash
> * a col named 'data' whose value is the chunk data
> * another col named 'hash' whose value is the hash
> * could also store access data against the individual chunks here
> * probably cache lots of keys, and only a few rows
>
> - to test if the hashes you have already exist, do a multiget slice. But you
> need to specify a col to return the value for, and not the big data one, so
> use the 'hash' column.
>
> => Standard CF: CustomerFiles
>
> * key is the customer id
> * one col per file name, col value perhaps the latest version number or last
> accessed time
> * cache keys and rows
>
> - to get all the files for a customer, slice the customer's row
> - if you want access data when you list all files for a customer, use a
> super CF with a super col for each file name. Store the meta data for the
> file in this CF *and* in the Files CF.
>
> => Super CF: Files
>
> * key is client_id.file_name
> * a super column called "meta" with path / accessed etc., including versions
> * a super column called "current" with columns named "0001" etc. and col
> values set to the chunk hash
> * a super column called "version.X" with the same format as "current"
> * cache keys and rows
>
> - assumes meta is shared across versions
> - to rebuild a file, get_slice for all cols in the "current" super col and
> then do multi gets for the chunks
> - the row grows as versions of the file grow, but it only stores links to
> the chunks, so it's probably OK. Consider how many versions times how many
> chunks; you may want to make the number of rows grow instead of the row
> size. Perhaps have a FileVersions CF where the key includes the version
> number, then maintain information about the current version in both the
> Files and FileVersions CFs. The Files CF would only ever have the current
> file; update access meta data in both CFs.
>
> => Standard CF: ChunkUsage
>
> * key is the chunk hash
> * col name is a versioned file key cust_id.file_name.version, col value is
> the count of usage in that version
> * no caching
>
> - cannot increment a counter until Cassandra 0.7, so you cannot keep a
> running count of chunk usage.
> - this is the reverse index to see where the chunk is used.
> - does not consider what is current, just all-time usage.
> - not sure how much re-use there is, but row size grows with reuse. Should
> be OK for a couple of million cols.
>
> Oh, and if you're going to use Hadoop / Pig to analyse the data in this
> beastie you need to think about that in the design. You'll probably want a
> single CF to serve those queries with.
>
> Hope that helps
> Aaron
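A minimal sketch of the existence check and write path described above. The thread
does not name a client, so this assumes the pycassa Python client; the keyspace
name, connection details and the SHA-1 hash choice are illustrative, while the
'hash' and 'data' column names follow the Chunks layout above:

    import hashlib
    import pycassa

    pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
    chunks = pycassa.ColumnFamily(pool, 'Chunks')

    def store_chunks(blobs):
        """Store only the 64K blobs whose hash is not already present;
        return the per-blob hash list (duplicates kept for the file's
        chunk list)."""
        hashes = [hashlib.sha1(b).hexdigest() for b in blobs]
        # multiget slice on the small 'hash' column only -- never pull back
        # the big 'data' column just to test for existence
        existing = set(chunks.multiget(list(set(hashes)), columns=['hash']))
        for h, blob in zip(hashes, blobs):
            if h not in existing:
                chunks.insert(h, {'hash': h, 'data': blob})
                existing.add(h)
        return hashes

The point of the extra 'hash' column is exactly the one made above: the existence
test never has to deserialize the 64K 'data' value.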
> On 26 Jul 2010, at 17:29, Michael Widmann wrote:
>
> Hi
>
> Wow, that was a lot of information...
>
> Think of users storing files online (under their customer name) - each
> customer maintains his own "hashtable" of files. Each file can consist of a
> few or several thousand entries (depending on the size of the whole file).
>
> For example:
>
> File Test.doc consists of 3 * 64K blobs - each blob has the hash value ABC -
> so we only store one blob, one hash value and the fact that this blob is
> needed 3 times. That way we try to avoid duplicate file data (and there are
> a lot of duplicates).
>
> So our model is:
>
> Filename: Test.doc
> Hashes: 3
> Hash 1: ABC
> Hash 2: ABC
> Hash 3: ABC
> BLOB: ABC
> Used: 3
> Binary: Data 1*
>
> Each customer has:
>
> Customer: Customer ID
>
> Filename: Test.doc
> MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from /
> Security)
> Version: 0
> Hash 1 / Hash 2 / Hash 3
>
> Filename: Another.doc
> MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from /
> Security)
> Version: 0
> Hash 1 / Hash 2 / Hash 3
> Version: 1
> Hash 1 / Hash 2 / Hash 4
> Version: 2
> Hash 3 / Hash 2 / Hash 2
>
> Hope this clears some things up :-)
>
> Mike
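To make the two messages concrete, here is a rough sketch (same pycassa assumption
as above, key and column names illustrative) of how the Test.doc example could land
in the Files and ChunkUsage CFs from the earlier layout - the three identical chunks
are stored once in Chunks, and the file's "current" super column simply references
the same hash three times:

    import pycassa

    pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
    files = pycassa.ColumnFamily(pool, 'Files')          # super CF
    chunk_usage = pycassa.ColumnFamily(pool, 'ChunkUsage')

    # key is client_id.file_name; "current" maps offset -> chunk hash
    files.insert('customer_1.Test.doc', {
        'meta': {'path': '/docs/Test.doc', 'size': str(3 * 65536), 'version': '0'},
        'current': {'0001': 'ABC', '0002': 'ABC', '0003': 'ABC'},
    })

    # reverse index: chunk hash -> usage per versioned file key. The count is
    # written as a plain value because counters only arrive in Cassandra 0.7.
    chunk_usage.insert('ABC', {'customer_1.Test.doc.0': '3'})

Without counters, keeping that usage value accurate means a read-modify-write on
every new version, which is why the layout records one column per versioned file
key rather than a single running total.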
> 2010/7/26 Aaron Morton <aa...@thelastpickle.com>
>
>> Some background reading...
>> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>>
>> Not sure on your follow-up question, so I'll just wildly blather on about
>> things :)
>>
>> My assumption about your data is that you have 64K chunks identified by a
>> hash, which can somehow be grouped together into larger files (so there is
>> a "file name" of sorts).
>>
>> One possible storage design (assuming the RandomPartitioner) is....
>>
>> A Chunks CF: each row uses the hash of the chunk as its key and has a
>> single column with the chunk data. You could use more columns to store
>> meta data here.
>>
>> A ChunkIndex CF: each row uses the file name (from above) as the key and
>> has one column for each chunk in the file. The column name *could* be the
>> offset of the chunk and the column value the hash of the chunk. Or you
>> could use the chunk hash as the col name and the offset as the col value
>> if needed.
>>
>> To rebuild the file, read the entire row from the ChunkIndex, then make a
>> series of multi gets to read all the chunks. Or you could lazily fetch
>> only the ones you need.
>>
>> This all assumes that the "1000s" comment below means you may want to
>> combine the chunks into 60+ MB files. It would be easier to keep all the
>> chunks for a file together in one row, but if you are going to have large
>> (unbounded) file sizes this may not be appropriate.
>>
>> You could also think about using the OrderPreservingPartitioner and a
>> compound key for each row such as "file_name_hash.offset". Then, by using
>> get_range_slices to scan the range of chunk rows for a file, you would not
>> need to maintain a secondary index. There are some drawbacks to that
>> approach; read the article above.
>>
>> Hope that helps
>> Aaron
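A rough sketch of that OrderPreservingPartitioner variant, under the same pycassa
assumption; it presumes zero-padded offsets so the compound keys sort correctly,
and all names are illustrative:

    import pycassa

    pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
    chunks = pycassa.ColumnFamily(pool, 'Chunks')

    def rebuild_file(file_name_hash):
        """Reassemble a file by scanning the key range
        <file_name_hash>.00000000 .. <file_name_hash>.99999999.
        Only meaningful under the OrderPreservingPartitioner."""
        parts = []
        rows = chunks.get_range(start=file_name_hash + '.00000000',
                                finish=file_name_hash + '.99999999',
                                columns=['data'])
        for key, cols in rows:
            parts.append(cols['data'])
        return b''.join(parts)

The trade-off is the one in the linked article: order-preserving keys make range
scans cheap, but make it much harder to keep the load balanced across nodes.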
>> On 26 Jul 2010, at 04:01 PM, Michael Widmann <michael.widm...@gmail.com>
>> wrote:
>>
>> Thanks for this detailed description ...
>>
>> You mentioned the secondary index in a standard column - would it be
>> better to build several indices? Is it even possible to build an index
>> on, for example, 32 columns?
>>
>> The hint about the smaller boxes is very valuable!
>>
>> Mike
>>
>> 2010/7/26 Aaron Morton <aa...@thelastpickle.com>
>>
>>> For what it's worth...
>>>
>>> * Many smaller boxes with local disk storage are preferable to 2 boxes
>>> with huge NAS storage.
>>> * To cache the hash values, look at the KeysCached setting in the
>>> storage-config.
>>> * There are some row size limits, see
>>> http://wiki.apache.org/cassandra/CassandraLimitations
>>> * If you want to get 1000 blobs, rather than grouping them in a single
>>> row using a super column, consider building a secondary index in a
>>> standard column family: one CF for the blobs keyed by your hash, and one
>>> CF keyed by whatever the grouping key is, with a col for every blob's
>>> hash value. Read from the index first, then from the blobs themselves.
>>>
>>> Aaron
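A sketch of that index-first read path (same pycassa assumption; the CF names, the
zero-padded offset convention and the batch size are illustrative), fetching the
blobs in batches of roughly 1000 as discussed further down the thread:

    import pycassa

    pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
    blob_index = pycassa.ColumnFamily(pool, 'BlobIndex')   # grouping key -> hashes
    blobs = pycassa.ColumnFamily(pool, 'Blobs')            # hash -> 64K blob

    def restore(group_key, batch_size=1000):
        """Read the index row (col name = zero-padded offset, value = blob
        hash), then multiget the blobs themselves in batches."""
        # a real implementation would page with column_start for very wide rows
        index_row = blob_index.get(group_key, column_count=100000)
        ordered_hashes = [index_row[offset] for offset in sorted(index_row)]
        fetched = {}
        for i in range(0, len(ordered_hashes), batch_size):
            wanted = list(set(ordered_hashes[i:i + batch_size]))  # skip repeats
            fetched.update(blobs.multiget(wanted, columns=['data']))
        return b''.join(fetched[h]['data'] for h in ordered_hashes)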
>>> On 24 Jul 2010, at 06:51 PM, Michael Widmann <michael.widm...@gmail.com>
>>> wrote:
>>>
>>> Hi Jonathan
>>>
>>> Thanks for your very valuable input on this.
>>>
>>> Maybe I didn't explain enough - so I'll try to clarify.
>>>
>>> Here are some thoughts:
>>>
>>> - binary data will not be indexed - only stored
>>> - the file name of the binary data (a hash) should be indexed for search
>>> - we could group the hashes into 62 "entry" points (a-z, A-Z, 0-9) for
>>> search and retrieval -> I think supercolumns, if I have the terms right
>>> - the 64K blobs' meta data (which one belongs to which file) should be
>>> stored separately in Cassandra
>>> - for hardware we rely on Solaris / OpenSolaris with ZFS in the backend
>>> - write operations occur much more often than reads
>>> - memory should mainly hold the hash values for fast search (not the
>>> binary data)
>>> - read operations (restore from Cassandra) may be async - get about 1000
>>> blobs, group them, restore
>>>
>>> So my questions also are:
>>>
>>> 2 or 3 big boxes, or 10 to 20 small boxes for storage?
>>> Could we separate "caching" - hash value CFs cached and indexed, binary
>>> data CFs not?
>>> Writes happen around the clock - not at tremendous speed, but constantly.
>>> Would compaction of the database need a lot of disk space?
>>> Is it reliable at this size? (more my fear)
>>>
>>> Thanks for thinking and answering...
>>>
>>> greetings
>>>
>>> Mike
>>>
>>> 2010/7/23 Jonathan Shook <jsh...@gmail.com>
>>>
>>>> There are two scaling factors to consider here. In general, the worst
>>>> case growth of operations in Cassandra is kept near O(log2(N)). Any
>>>> worse growth would be considered a design problem, or at least a high
>>>> priority target for improvement. This is important when considering
>>>> the load generated by very large column families, as binary search is
>>>> used when the bloom filter doesn't exclude rows from a query.
>>>> O(log2(N)) is basically the best achievable growth for this type of
>>>> data, but the bloom filter improves on it in some cases by paying a
>>>> lower cost every time.
>>>>
>>>> The other factor to be aware of is the reduction in binary search
>>>> performance for data sets that push disk seek times into high ranges.
>>>> This is mostly a direct consideration for installations which will be
>>>> doing lots of cold reads (not cached data) against large sets. Disk
>>>> seek times are much lower for adjacent or nearby tracks, and generally
>>>> much higher when tracks are sufficiently far apart (as in a very large
>>>> data set). This can compound with other factors when session times are
>>>> longer, but that is to be expected with any system. Your storage system
>>>> may have completely different characteristics depending on caching, etc.
>>>>
>>>> The read performance is still quite high relative to other systems for
>>>> a similar data set size, but the drop-off in performance may be much
>>>> worse than expected if you are wanting it to be linear. Again, this is
>>>> not unique to Cassandra. It's just an important consideration when
>>>> dealing with extremely large sets of data, when memory is not likely
>>>> to be able to hold enough hot data for the specific application.
>>>>
>>>> As always, the real questions have a lot more to do with your specific
>>>> access patterns, storage system, etc. I would look at the benchmarking
>>>> info available on the lists as a good starting point.
>>>>
>>>> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann
>>>> <michael.widm...@gmail.com> wrote:
>>>> > Hi
>>>> >
>>>> > We plan to use Cassandra as data storage on at least 2 nodes with
>>>> > RF=2 for about 1 billion small files.
>>>> > We have about 48TB of disk space behind each node.
>>>> >
>>>> > Now my question is - is this possible with Cassandra, reliably -
>>>> > meaning every blob is stored on 2 JBODs?
>>>> >
>>>> > We may grow up to nearly 40TB or more of Cassandra "storage" data ...
>>>> >
>>>> > Has anyone out there done something similar?
>>>> >
>>>> > For retrieval of the blobs we are going to index them with a hash
>>>> > value (meaning hashes are used to store the blob) ...
>>>> > so we can search fast for the entry in the database and combine the
>>>> > blobs into a normal file again ...
>>>> >
>>>> > Thanks for the answers
>>>> >
>>>> > Michael

--
bayoda.com - Professional Online Backup Solutions for Small and Medium
Sized Companies