Re: Data Modeling: How to keep track of arbitrarily inserted column names?

Drew Kutcharian Fri, 05 Apr 2013 11:38:23 -0700

One thing I can do is to have a client-side cache of the keys to reduce the 
number of updates.
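
A minimal sketch of that client-side cache, assuming each client process keeps an in-memory set of the column names it has already written to the tracking CF. It is plain Python with a dict standing in for the CF; ColumnNameTracker, write_to_index, index_cf, and index_writes are hypothetical names, not part of any Cassandra client.

```python
class ColumnNameTracker:
    """Forwards each column name to the tracking CF at most once per process."""

    def __init__(self, write_to_index):
        # write_to_index is a hypothetical callback that performs the real
        # write to the column-name CF.
        self._write_to_index = write_to_index
        self._seen = set()

    def record(self, column_name):
        """Return True if the name was written, False if the cache skipped it."""
        if column_name in self._seen:
            return False
        self._write_to_index(column_name)
        # Cache the name only after the write returned, so a failed write
        # is retried the next time the name shows up.
        self._seen.add(column_name)
        return True


index_cf = {}      # stands in for the column-name CF: name -> empty value
index_writes = []  # every write that actually reached the "CF"


def write_to_index(name):
    index_writes.append(name)
    index_cf[name] = b""  # same name twice just overwrites the same column


tracker = ColumnNameTracker(write_to_index)

# The sample rows from the original question.
rows = {
    "Row1": ["key1", "key2"],
    "Row2": ["key1", "key3"],
    "Row3": ["key4", "key5"],
    "Row4": ["key2", "key5"],
}
for row_key, names in rows.items():
    for name in names:
        tracker.record(name)

print(sorted(index_cf))
print(len(index_writes))
```

The cache is per process and not thread-safe as written. Several clients may each write the same name once, which is harmless because the write is an idempotent overwrite.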



On Apr 5, 2013, at 6:14 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Since there are few column names, here is what you can do: make a reverse 
> index, set a low read repair chance, and be aggressive with compaction. It 
> will mean many extra writes, but that is OK. 
> 
> Another option is to turn on the row cache and try a read before each 
> write. It is a good case for the row cache because it is a very small data set.
> 
> On Thursday, April 4, 2013, Drew Kutcharian <d...@venarc.com> wrote:
> > I don't really need to answer "what rows contain a column named X", so 
> > there is no need for a reverse index here. All I want is a distinct set of 
> > all the column names, so I can answer "what are all the available column names?"
> >
> > On Apr 4, 2013, at 4:20 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >
> > Your reverse index of "which rows contain a column named X" will have very 
> > wide rows. You could look at Cassandra's secondary indexing, or possibly 
> > look at a Solandra/Solr approach. Another option is to shift the 
> > problem slightly: "which rows have column X that was added between time y 
> > and time z". Remember that with few distinct column names, a reverse index 
> > of column to row is going to be a very big list.
> >
> >
> > On Thu, Apr 4, 2013 at 5:45 PM, Drew Kutcharian <d...@venarc.com> wrote:
> >>
> >> Hi Edward,
> >> I anticipate that the column names will be reused a lot. For example, key1 
> >> will be in many rows, so I think the number of distinct column names will 
> >> be much smaller than the number of rows. Is there a way to have a 
> >> separate CF that keeps track of the column names? 
> >> What I was thinking was to have a separate CF to which I write only the 
> >> column name, with a null value, every time I write a key/value to the main 
> >> CF. If that column name already exists, it will just be 
> >> overwritten. Then, to get all the column names, I can just 
> >> query that CF. I'm not sure that's the best approach at high load (100k 
> >> inserts a second).
> >> -- Drew
> >>
> >> On Apr 4, 2013, at 12:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >>
> >> You cannot get only the column name (which you are calling a key). You can 
> >> use get_range_slices, which returns all the columns. When you specify an 
> >> empty byte array (new byte[0]) as the start and finish, you get back all 
> >> the columns. From there you can return only the column names to the user 
> >> in a format that you like.
> >>
> >>
> >> On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian <d...@venarc.com> wrote:
> >>>
> >>> Hey Guys,
> >>>
> >>> I'm working on a project and one of the requirements is to have a 
> >>> schema-free CF where end users can insert arbitrary key/value pairs per 
> >>> row. What would be the best way to know all the "keys" that were 
> >>> inserted (preferably w/o any locking)? For example,
> >>>
> >>> Row1 => key1 -> XXX, key2 -> XXX
> >>> Row2 => key1 -> XXX, key3 -> XXX
> >>> Row3 => key4 -> XXX, key5 -> XXX
> >>> Row4 => key2 -> XXX, key5 -> XXX
> >>> …
> >>>
> >>> The query would be "give me all the inserted keys" and the response 
> >>> would be {key1, key2, key3, key4, key5}.
> >>>
> >>> Thanks,
> >>>
> >>> Drew
> >>>
> >>
> >>
> >
> >
> >
