I don't really need to answer "which rows contain a column named X", so there's 
no need for a reverse index here. All I want is a distinct set of all the column 
names, so I can answer "what are all the available column names?"
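
A minimal sketch of the separate-CF idea from the quoted message below, using plain Python dicts as stand-ins for the two column families. The class and method names are hypothetical, and nothing here calls a real Cassandra client; it only illustrates that writing the column name with a null value is a blind, idempotent write, so the set of names stays distinct without any locking. The scan method is included only for comparison with the read-everything approach.

```python
from collections import defaultdict


class SchemaFreeStore:
    """In-memory stand-in for the two column families (not a Cassandra client)."""

    def __init__(self):
        # Main CF: row key -> {column name: value}
        self.main_cf = defaultdict(dict)
        # Names CF: column name -> None (only the name matters, the value is unused)
        self.names_cf = {}

    def insert(self, row_key, column, value):
        """Write the key/value pair, then record the column name."""
        self.main_cf[row_key][column] = value
        # Blind write: no read-before-write and no lock. Writing a name that
        # already exists simply overwrites it with the same null value.
        self.names_cf[column] = None

    def all_column_names(self):
        """Answer 'what are all the available column names' from the names CF."""
        return sorted(self.names_cf)

    def all_column_names_by_scan(self):
        """Same answer by reading every row of the main CF (the costly route)."""
        names = set()
        for columns in self.main_cf.values():
            names.update(columns)
        return sorted(names)


store = SchemaFreeStore()
writes = [
    ("Row1", "key1"), ("Row1", "key2"),
    ("Row2", "key1"), ("Row2", "key3"),
    ("Row3", "key4"), ("Row3", "key5"),
    ("Row4", "key2"), ("Row4", "key5"),
]
for row_key, column in writes:
    store.insert(row_key, column, "XXX")

print(store.all_column_names())
```

With the example rows from the original question, both methods return the same five names; the difference is that the names CF stays small when column names are heavily reused, while the scan has to touch every row.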


On Apr 4, 2013, at 4:20 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Your reverse index of "which rows contain a column named X" will have very 
> wide rows. You could look at Cassandra's secondary indexing, or possibly look 
> at a Solandra/Solr approach. Another option is to shift the problem 
> slightly: "which rows have column X that was added between time y and time 
> z". Remember that with few distinct column names, the reverse index of column 
> to row is going to be a very big list.
> 
> 
> On Thu, Apr 4, 2013 at 5:45 PM, Drew Kutcharian <d...@venarc.com> wrote:
> Hi Edward,
> 
> I anticipate that the column names will be reused a lot. For example, key1 
> will be in many rows. So I think the number of distinct column names will be 
> much much smaller than the number of rows. Is there a way to have a separate 
> CF that keeps track of the column names? 
> 
> What I was thinking was to have a separate CF where I write only the column 
> name, with a null value, every time I write a key/value to the main CF. In 
> this case, if that column name already exists, it will just be overwritten. 
> Now if I wanted to get all the column names, I can just query that CF. 
> Not sure if that's the best approach at high load (100k inserts a second).
> 
> -- Drew
> 
> 
> On Apr 4, 2013, at 12:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
>> You cannot fetch only the column names (which you are calling keys). You can 
>> use get_range_slices, which returns whole columns. When you specify an empty 
>> byte array (new byte[0]) as the start and finish, you get back all the 
>> columns. From there you can return only the column names to the user in a 
>> format that you like.
>> 
>> 
>> On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian <d...@venarc.com> wrote:
>> Hey Guys,
>> 
>> I'm working on a project and one of the requirements is to have a 
>> schema-free CF where end users can insert arbitrary key/value pairs per row. 
>> What would be the best way to find all the "keys" that were inserted 
>> (preferably w/o any locking)? For example,
>> 
>> Row1 => key1 -> XXX, key2 -> XXX
>> Row2 => key1 -> XXX, key3 -> XXX
>> Row3 => key4 -> XXX, key5 -> XXX
>> Row4 => key2 -> XXX, key5 -> XXX
>> …
>> 
>> The query would be "give me all the inserted keys" and the response would be 
>> {key1, key2, key3, key4, key5}.
>> 
>> Thanks,
>> 
>> Drew
>> 
>> 
> 
> 
