On Fri, Sep 9, 2011 at 10:34 AM, Dean Hiller <d...@alvazan.com> wrote:
> I saw this quote in the pdf.....
>
> "For large indexes with common terms this too much data! Queries with > 100k hits"
>
> 1. What would be considered large? In most of my experience, we have the
> typical size of a RDBMS index but just have many many many more indexes as
> the size of the index is just dependent on our largest partition based on
> how we partition the data.
>
> 2. Does solandra have a lucene api underlying implementation? Our
> preference is to use lucene's api and the underlying implementation could be
> lucene, lucandra or solandra.
>
> 3. Why not just use a 8 bit or 16 bit key as the prefix instead of an sha
> and the rest of the key is unique as the user would have to choose a unique
> key to begin with? After all, the hash only had to be bigger than the max
> number of nodes and 2^16 is quite large.
>
> thanks,
> Dean
>
>
> On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>>
>>
>> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller <d...@alvazan.com> wrote:
>>
>>> I was wondering something. Since I can take OPP and I can create a layer
>>> that for certain column families, I hash the key so that some column
>>> families are just like RP but on top of OPP and some of my other column
>>> families are then on OPP directly so I could use lucandra, why not make RP
>>> deprecated and instead allow users to create OPP by column family or RP
>>> where RP == doing the hash of the key on my behalf and prefixing my key with
>>> that hashcode and stripping it back off when I read it in again.
>>>
>>> ie. why have RP when you could do RP per column family with the above
>>> reasoning on top of OPP and have the best of both worlds?????
>>>
>>> ie. I think of having some column families random and then some column
>>> families ordered so I could range query or use lucandra on top of those
>>> ones.
>>>
>>> thoughts? I was just curious.
>>> thanks,
>>> Dean
>>>
>>>
>> You can use ByteOrderedPartitioner and hash data yourself.
>> However, that makes every row key 128 bits larger, as the key has to be:
>>
>> md5+originalkey
>>
>>
>> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf
>>
>> Solandra now uses a 'modified' RandomPartitioner.
>>
>

I am not quite sure that using 8 bits is good enough. It will shard your data
across a small number of nodes effectively; however, I can imagine the SSTables
will be "clumpy" because you reduce your sorting. It seems like a
http://en.wikipedia.org/wiki/Birthday_problem to me. (I could be wrong)
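
To make the md5+originalkey layout and the birthday-problem concern a bit more concrete, here is a rough Python sketch. It is only an illustration: the function names (prefix_key, strip_prefix, collision_probability) are made up for this mail and are not part of Cassandra, Lucandra or Solandra, and it assumes the prefix is simply the leading bytes of the MD5 digest of the original key.

```python
import hashlib

MD5_BYTES = 16  # an MD5 digest is 128 bits


def prefix_key(original_key: bytes, prefix_bytes: int = MD5_BYTES) -> bytes:
    """Build hash-prefix + original key (hypothetical helper).

    prefix_bytes=16 gives the full md5+originalkey layout; 1 or 2 would be
    the 8-bit / 16-bit prefix variant discussed in this thread.
    """
    if not 1 <= prefix_bytes <= MD5_BYTES:
        raise ValueError("prefix_bytes must be between 1 and 16")
    digest = hashlib.md5(original_key).digest()
    return digest[:prefix_bytes] + original_key


def strip_prefix(stored_key: bytes, prefix_bytes: int = MD5_BYTES) -> bytes:
    """Recover the original key by dropping the hash prefix on read."""
    return stored_key[prefix_bytes:]


def collision_probability(num_keys: int, num_buckets: int) -> float:
    """Birthday problem: chance that at least two of num_keys uniformly
    distributed keys land in the same one of num_buckets prefix values."""
    if num_keys > num_buckets:
        return 1.0
    p_all_distinct = 1.0
    for i in range(num_keys):
        p_all_distinct *= (num_buckets - i) / num_buckets
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    stored = prefix_key(b"user42")
    print(len(stored) - len(b"user42"), "extra bytes per row key")
    print(strip_prefix(stored))
    # 8-bit prefix: 256 possible prefix values
    print(collision_probability(20, 2 ** 8))
```

With an 8-bit prefix there are only 256 prefix values, and the chance that at least two keys share one already comes out above one half for 20 keys. Shared prefixes do not break uniqueness, since the original key is still appended, but keys with the same prefix sort next to each other in their original order, which is the "clumpy" effect I was guessing at. Whether that actually hurts in practice I have not measured.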