Re: Data Modeling: How to keep track of arbitrarily inserted column names?

Edward Capriolo Fri, 05 Apr 2013 06:14:30 -0700

Since there are few distinct column names, here is what you can do: make a reverse
index, use a low read repair chance, and be aggressive with compaction. It will
mean many extra writes, but that is OK.


The other option is to turn on the row cache and try read-before-write. It is a
good case for the row cache because it is a very small data set.
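A minimal sketch of the row-cache, read-before-write idea, in Java. It assumes a single, small column-names row; an in-memory map stands in for that row (with the row cache enabled, the "read" would be served from memory), and `ReadBeforeWriteIndex`, `onInsert` and the other names are hypothetical, not a Cassandra or client-library API. The point it illustrates: a name that is already recorded costs one cheap read instead of another write.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: an in-memory map stands in for the small column-names row.
class ReadBeforeWriteIndex {
    // Hypothetical column-names row: column name -> empty value.
    private final ConcurrentHashMap<String, byte[]> columnNamesRow = new ConcurrentHashMap<>();
    // Counts the index writes that were actually issued.
    private final AtomicInteger writes = new AtomicInteger();

    // The "read": is this name already recorded?
    private boolean alreadyRecorded(String columnName) {
        return columnNamesRow.containsKey(columnName);
    }

    // The "write": record the name with an empty value.
    private void writeName(String columnName) {
        columnNamesRow.put(columnName, new byte[0]);
        writes.incrementAndGet();
    }

    /** Call once per key/value written to the main CF. */
    void onInsert(String columnName) {
        // Two concurrent callers may both miss and both write; that is harmless,
        // because writing the same name twice just overwrites it.
        if (!alreadyRecorded(columnName)) {
            writeName(columnName);
        }
    }

    int writeCount() {
        return writes.get();
    }

    Set<String> allColumnNames() {
        return new TreeSet<>(columnNamesRow.keySet());
    }
}
```

Under this sketch, inserting key1, key2, key1, key1, key3 issues three index writes instead of five.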

On Thursday, April 4, 2013, Drew Kutcharian <[email protected]> wrote:
> I don't really need to answer "what rows contain column named X", so there is
> no need for a reverse index here. All I want is the distinct set of all the
> column names, so I can answer "what are all the available column names".
>
> On Apr 4, 2013, at 4:20 PM, Edward Capriolo <[email protected]> wrote:
>
> Your reverse index of "which rows contain a column named X" will have
> very wide rows. You could look at Cassandra's secondary indexing, or
> possibly at a Solandra/Solr approach. Another option is to shift the
> problem slightly: "which rows have column X that was added between time
> y and time z". Remember that with few distinct column names, each entry
> of the column-to-row reverse index is going to be a very big list.
>
>
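The "added between time y and time z" variant can be sketched as one time-ordered row per column name, so a query reads only the slice between two timestamps instead of the whole column-to-row list. This is an in-memory model under assumed names (`TimeRangeReverseIndex`, millisecond timestamps, a half-open interval); in Cassandra the time ordering would presumably come from the column comparator, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch only: one time-ordered "wide row" per column name, kept in memory.
class TimeRangeReverseIndex {
    // column name -> (time the column was added, in ms -> row keys added at that time)
    private final Map<String, NavigableMap<Long, Set<String>>> index = new HashMap<>();

    /** Record that rowKey gained columnName at timestampMillis. */
    void record(String columnName, String rowKey, long timestampMillis) {
        index.computeIfAbsent(columnName, k -> new TreeMap<>())
             .computeIfAbsent(timestampMillis, t -> new TreeSet<>())
             .add(rowKey);
    }

    /** Rows that gained columnName in the half-open interval [fromMillis, toMillis). */
    Set<String> rowsWithColumnBetween(String columnName, long fromMillis, long toMillis) {
        Set<String> result = new TreeSet<>();
        NavigableMap<Long, Set<String>> byTime = index.get(columnName);
        if (byTime == null || fromMillis >= toMillis) {
            return result;
        }
        // Only the slice between the two times is read, not the whole list.
        for (Set<String> rows : byTime.subMap(fromMillis, true, toMillis, false).values()) {
            result.addAll(rows);
        }
        return result;
    }
}
```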
> On Thu, Apr 4, 2013 at 5:45 PM, Drew Kutcharian <[email protected]> wrote:
>>
>> Hi Edward,
>> I anticipate that the column names will be reused a lot. For example,
>> key1 will be in many rows. So I think the number of distinct column names
>> will be much, much smaller than the number of rows. Is there a way to have
>> a separate CF that keeps track of the column names?
>> What I was thinking was to have a separate CF into which I write only the
>> column name, with a null value, every time I write a key/value to the main
>> CF. In this case, if that column name exists, it will just be overwritten.
>> Then, if I want to get all the column names, I can just query that CF. I'm
>> not sure if that's the best approach at high load (100k inserts a second).
>> -- Drew
>>
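Drew's proposal can be sketched as follows: every insert into the main CF also writes the column name into a single row of a separate CF, and listing the names is one read of that row. Two in-memory maps stand in for the two CFs, the class and method names are hypothetical, and an empty byte array is assumed for the "null value".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch only: two in-memory maps stand in for the main CF and a separate
// (hypothetical) column-names CF with a single row. Single-threaded.
class ColumnNameTracker {
    // Main CF: row key -> (column name -> value).
    private final Map<String, SortedMap<String, String>> mainCf = new HashMap<>();
    // Column-names CF: one row whose column names are the distinct names seen.
    // Values are empty; only the names matter.
    private final SortedMap<String, byte[]> columnNamesRow = new TreeMap<>();

    /** Every insert into the main CF also writes the name into the names row. */
    void insert(String rowKey, String columnName, String value) {
        mainCf.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
        // A repeated name simply overwrites the existing entry, so the caller
        // needs no existence check and no lock around this write.
        columnNamesRow.put(columnName, new byte[0]);
    }

    /** "What are all the available column names": a single read of one row. */
    SortedSet<String> allColumnNames() {
        return new TreeSet<>(columnNamesRow.keySet());
    }
}
```

Because every insert also writes to the same names row, that one row absorbs the full write load (100k writes a second in Drew's estimate).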
>> On Apr 4, 2013, at 12:02 PM, Edward Capriolo <[email protected]> wrote:
>>
>> You cannot get only the column names (which you are calling keys), but you
>> can use get_range_slice, which returns all the columns. When you specify an
>> empty byte array (new byte[0]) as the start and finish, you get back all
>> the columns. From there you can return only the column names to the user in
>> whatever format you like.
>>
>>
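The empty start/finish behaviour can be modelled on one row whose columns are sorted by name: an empty bound leaves that side of the range open, and the client keeps only the names from what comes back. This is an in-memory sketch with hypothetical method names, using String names instead of raw byte arrays; it does not call get_range_slice or any Thrift API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;

// Sketch only: models slice-range bounds on a single in-memory row.
class SliceColumnNames {
    /**
     * Returns the columns of a row between start and finish (both inclusive).
     * An empty bound stands in for the empty byte array: it leaves that side
     * of the range open, so "" and "" return every column in the row.
     */
    static SortedMap<String, String> slice(NavigableMap<String, String> row,
                                           String start, String finish) {
        NavigableMap<String, String> view = row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        return view;
    }

    /** Drops the values and keeps only the column names, in sorted order. */
    static List<String> columnNames(SortedMap<String, String> columns) {
        return new ArrayList<>(columns.keySet());
    }
}
```

This yields the names for the rows actually read; building the global list this way means slicing every row in the CF.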
>> On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian <[email protected]> wrote:
>>>
>>> Hey Guys,
>>>
>>> I'm working on a project and one of the requirements is to have a
>>> schema-free CF where end users can insert arbitrary key/value pairs per
>>> row. What would be the best way to find out all the "keys" that were
>>> inserted (preferably without any locking)? For example,
>>>
>>> Row1 => key1 -> XXX, key2 -> XXX
>>> Row2 => key1 -> XXX, key3 -> XXX
>>> Row3 => key4 -> XXX, key5 -> XXX
>>> Row4 => key2 -> XXX, key5 -> XXX
>>> …
>>>
>>> The query would be "give me all the inserted keys", and the response would
>>> be {key1, key2, key3, key4, key5}.
>>>
>>> Thanks,
>>>
>>> Drew
>>>
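For reference, the answer to the example query is the union of the column names of every row. The brute-force sketch below (hypothetical class name, in-memory rows) computes it by scanning all rows, so its cost grows with the number of rows rather than with the number of distinct names, which is what the replies above try to avoid.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch only: answers "give me all the inserted keys" by scanning every row
// and taking the union of its column names.
class DistinctKeysByScan {
    static SortedSet<String> distinctKeys(Map<String, List<String>> rows) {
        SortedSet<String> keys = new TreeSet<>();
        for (List<String> columnNames : rows.values()) {
            keys.addAll(columnNames); // duplicates collapse in the set
        }
        return keys;
    }
}
```

Applied to Row1 through Row4 from the example, it returns [key1, key2, key3, key4, key5].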
>>
>>
>
>
>
