Re: Finding the intersection results of column sets of two rows

Aaron Morton Tue, 08 Feb 2011 12:42:33 -0800

Makes sense, use a get_slice() against the second row and pass in the column 
names. Should be fine.
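
A minimal sketch of that approach in Python, not a definitive implementation. It assumes the column names of the first row have already been read, and it asks the second row for them in batches. `fetch_by_name` is a hypothetical callable standing in for a by-name slice call (for example get_slice() with a predicate that lists column names); here it is backed by an in-memory set so the example runs without a cluster. The names `common_columns`, `group_members` and `batch_size` are illustrative only.

```python
from typing import Callable, Iterable, List, Set


def common_columns(
    first_row_names: Iterable[str],
    fetch_by_name: Callable[[List[str]], Iterable[str]],
    batch_size: int = 100,
) -> Set[str]:
    """Return the names from the first row that also exist in the second row.

    fetch_by_name is a hypothetical stand-in for a by-name slice against the
    second row: given a list of column names, it returns those that exist.
    """
    names = list(first_row_names)
    found: Set[str] = set()
    # Ask for the names in batches so no single request grows too large.
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        found.update(fetch_by_name(batch))
    return found


# In-memory stand-in for the large second row (the group's member list).
group_members = {"user%d" % i for i in range(100000)}


def fetch_from_group_row(names: List[str]) -> List[str]:
    # Assumption: mimics a slice by column names, returning only existing columns.
    return [n for n in names if n in group_members]


# Small first row (the user's connection list).
connections = ["user5", "user42", "stranger1", "user99999", "stranger2"]
mutual = common_columns(connections, fetch_from_group_row, batch_size=2)
print(sorted(mutual))
```

The work is proportional to the size of the small row, not the large one, because only the 100-250 known names are ever requested.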


If you run into performance issues look at slice_buffer_size and 
column_index_size in the config.

Aaron


On 9/02/2011, at 5:16 AM, Aklin_81 <asdk...@gmail.com> wrote:

> Amongst two rows, where I need to find the common columns. I will not
> have more than 200 columns(in 99% cases) for the 1st row. But the 2nd
> row where I need to find these columns may have even around a million
> valueless columns.
> 
> A point to note is:- These calculations are all done for **writing the
> data to the database that has been collected from presentation layer**
> & not while presentation of data.
> 
> I am using the results of such intersection to find the rows(that are
> pointed by names of common columns) that I should write to. The
> calculations are done after a Post is submitted by a user, in a
> discussions forum. Actually this is used to find out the mutual
> connections in a group & write to the rows pointed by common columns.
> 1st row represents the connection list of a user, which is not going
> to be more than 100-250 columns for my case & 2nd row represents the
> members of a group which may contain a million columns as I told.
> I find the mutual connections in a group(by finding the common columns
> in the above two rows) and then write to the rows of those users.
> 
> Can't I run a batch query that asks the 2nd row for all the columns that
> I picked up from the 1st row?
> 
> Is there any better way ?
> 
> Asil
> 
> 
>> 
>> On Feb 7, 2011, at 12:30 AM, Aklin_81 wrote:
>> 
>>> Thanks Aaron & Shaun,
>>> 
>>> ******************************
>>> I think my question might have been unclear to some of you. So I would
>>> again explain my problem(& solution which I thought of) for the sake
>>> of clarity:-
>>> 
>>> Consider I have 2 rows. The 1st row contains 60-70 columns and the 2nd
>>> row contains hundreds of thousands of columns. Both column sets
>>> are valueless. I just need to find out the **common column names**
>>> in the two rows. **These two rows are known to me**. So what I plan to
>>> do is, I just pick up all **columns (names)** of 1st row (60 -70
>>> columns) and just ask for them in 2nd row, whatever column names I get
>>> back is my result.
>>> Would there be any problem with this solution ? This is how I am
>>> expecting to get common column names.
>>> 
>>> Please do not consider it as a JOIN case as it leads to unnecessary
>>> confusions, I just need common column names from valueless columns in
>>> the two rows.
>>> 
>>> ********************************
>>> 
>>> Aaron, actually the intersection data is very much context based. So
>>> say if there are 10 million rows in CF A & 1 million in CF B, then
>>> intersection data would be containing 10 million *1 million rows. This
>>> would involve very huge & unaffordable amounts of denormalization.
>>> And finding columns in client would require pulling unnecessary
>>> columns like pulling 100,000 columns from a row of which only 60-70
>>> are required .
>>> 
>>> Shaun, I hope my above clarification has clarified things a bit. Yes,
>>> the rows, of which I need to find common columns are known to me.
>>> 
>>> 
>>> Thank you all,
>>> Asil
>>> 
>>> 
>>> On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>> In theory, you should be able to do joins by creating an extra column in 
>>>> one column family, holding the "foreign key" of the matching row in the 
>>>> other family.
>>>> 
>>>> This assumes that the info you are joining on is available in both CFs (is 
>>>> not some sort of functional transformation).
>>>> 
>>>> I have just found that the implementation for secondary indexes is not yet 
>>>> very close to optimal for more complex "joins" involving multiple indexes, 
>>>> I'm not sure if that affects you as you didn't say what you are joining on.
>>>> 
>>>> -- Shaun
>>>> 
>>>> 
>>>> On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:
>>>> 
>>>>> Is it possible for you to denormalise and write all the intersection 
>>>>> values? Will depend on how many I guess.
>>>>> 
>>>>> The other alternative is to pull back more data than you need and do the 
>>>>> intersection in code in the client.
>>>>> 
>>>>> 
>>>>> Hope that helps.
>>>>> Aaron
>>>>> On 7/02/2011, at 7:11 AM, Aklin_81 <asdk...@gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> @buddhasystem : yes that's a well known solution. But obviously when
>>>>>> mysql couldn't satisfy my needs, I am here. My question is in context
>>>>>> of Cassandra, if it is possible to achieve the intersection result set
>>>>>> of columns in two rows, by the way I spoke about.
>>>>>> 
>>>>>> @Edward: yes that I know but how does that fit here for obtaining the
>>>>>> common columns among two rows.
>>>>>> 
>>>>>> Thanks for your comments..
>>>>>> 
>>>>>> -Asil
>>>>>> 
>>>>>> 
>>>>>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <edlinuxg...@gmail.com> 
>>>>>> wrote:
>>>>>>> On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <potek...@bnl.gov> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> If the amount of data is _that_ small, you'll have a much easier life 
>>>>>>>> with
>>>>>>>> MySQL, which supports the "join" procedure -- because that's exactly 
>>>>>>>> what
>>>>>>>> you want to achieve.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> asil klin wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I want to procure the intersection of columns set of two rows (from 2
>>>>>>>>> different column families).
>>>>>>>>> 
>>>>>>>>> To achieve the intersection results, Can I, first retrieve all
>>>>>>>>> columns(around 300) from first row and just query by those column
>>>>>>>>> names in the second row(which contains maximum 100 000 columns) ?
>>>>>>>>> 
>>>>>>>>> I am using the results during the write time & not before presentation
>>>>>>>>> to the user, so latency won't be much of a concern while writing.
>>>>>>>>> 
>>>>>>>>> Is it the proper way to procure intersection results of two rows ?
>>>>>>>>> 
>>>>>>>>> Would love to hear your comments..
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ---------
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Asil
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> View this message in context: 
>>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
>>>>>>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive 
>>>>>>>> at Nabble.com.
>>>>>>>> 
>>>>>>> 
>>>>>>> You can use multi-get when fetching lists of already known keys
>>>>>>> to optimize your round trip time.
>>>>>>> 
>>>> 
>>>> 
>> 
>>
