Re: Finding the intersection results of column sets of two rows

Aklin_81 Tue, 08 Feb 2011 08:16:54 -0800

I have two rows and need to find the columns they have in common. The
1st row will not have more than 200 columns (in 99% of cases), but the
2nd row, where I need to look up these columns, may have around a
million valueless columns.


A point to note: these calculations are all done while **writing the
data collected from the presentation layer to the database**, not
while presenting data.

I am using the results of this intersection to find the rows (pointed
to by the names of the common columns) that I should write to. The
calculations are done after a user submits a post in a discussion
forum. This is used to find the mutual connections in a group and
write to the rows pointed to by the common columns.
The 1st row represents the connection list of a user, which will not
exceed 100-250 columns in my case, and the 2nd row represents the
members of a group, which may contain a million columns, as I said.
I find the mutual connections in a group (by finding the common columns
in the above two rows) and then write to the rows of those users.

Can't I run a batch query that asks the 2nd row for all the column
names I picked up from the 1st row?

Is there a better way?
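
A minimal sketch of the batch lookup described above, assuming both rows
hold valueless columns keyed by name. The in-memory dictionaries and the
helper names get_column_names, get_named_columns and common_columns are
hypothetical stand-ins for illustration, not Cassandra client calls; in a
real client, get_named_columns would correspond to a single read of the
2nd row restricted to a list of column names.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for a column family:
# row key -> set of (valueless) column names.
Store = Dict[str, Set[str]]


def get_column_names(store: Store, row_key: str) -> List[str]:
    """Read every column name of the small (1st) row."""
    return sorted(store.get(row_key, set()))


def get_named_columns(store: Store, row_key: str, names: List[str]) -> List[str]:
    """Stand-in for a by-name read: return only the requested names
    that actually exist in the row, without scanning the whole row."""
    row = store.get(row_key, set())
    return [name for name in names if name in row]


def common_columns(small: Store, small_key: str,
                   big: Store, big_key: str,
                   batch_size: int = 100) -> List[str]:
    """Ask the big (2nd) row for the small row's column names, in batches."""
    names = get_column_names(small, small_key)
    common: List[str] = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        common.extend(get_named_columns(big, big_key, batch))
    return common


if __name__ == "__main__":
    connections = {"user:asil": {"u1", "u2", "u3"}}     # 1st row: connection list
    group_members = {"group:42": {"u2", "u3", "u9"}}    # 2nd row: group members
    print(common_columns(connections, "user:asil", group_members, "group:42"))
```

The cost of this approach grows with the size of the 1st row (a few
hundred names), not with the million columns in the 2nd row, since only
the requested names are looked up. The batch_size parameter is an
assumption to keep each request small; with 100-250 names a single
request may be enough.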

Asil


>
> On Feb 7, 2011, at 12:30 AM, Aklin_81 wrote:
>
>> Thanks Aaron & Shaun,
>>
>> ******************************
>> I think my question might have been unclear to some of you, so I will
>> explain my problem (and the solution I thought of) again for the sake
>> of clarity:
>>
>> Consider that I have 2 rows. The 1st row contains 60-70 columns and the
>> 2nd row contains hundreds of thousands of columns. Both column sets
>> are valueless. I just need to find the **common column names**
>> in the two rows. **These two rows are known to me**. So what I plan to
>> do is pick up all the **columns (names)** of the 1st row (60-70
>> columns) and ask for them in the 2nd row; whatever column names I get
>> back are my result.
>> Would there be any problem with this solution? This is how I am
>> expecting to get the common column names.
>>
>> Please do not treat this as a JOIN case, as that leads to unnecessary
>> confusion. I just need the common column names from the valueless
>> columns in the two rows.
>>
>> ********************************
>>
>> Aaron, the intersection data is very much context based. So
>> if there are 10 million rows in CF A & 1 million in CF B, then the
>> intersection data would contain 10 million * 1 million rows. This
>> would involve a huge and unaffordable amount of denormalization.
>> And finding the common columns in the client would require pulling
>> unnecessary columns, e.g. pulling 100,000 columns from a row of which
>> only 60-70 are required.
>>
>> Shaun, I hope the clarification above helps. Yes, the rows whose
>> common columns I need to find are known to me.
>>
>>
>> Thank you all,
>> Asil
>>
>>
>> On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>> In theory, you should be able to do joins by creating an extra column in 
>>> one column family, holding the "foreign key" of the matching row in the 
>>> other family.
>>>
>>> This assumes that the info you are joining on is available in both CFs (is 
>>> not some sort of functional transformation).
>>>
>>> I have just found that the implementation for secondary indexes is not yet 
>>> very close to optimal for more complex "joins" involving multiple indexes, 
>>> I'm not sure if that affects you as you didn't say what you are joining on.
>>>
>>> -- Shaun
>>>
>>>
>>> On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:
>>>
>>>> Is it possible for you to denormalise and write all the intersection 
>>>> values? Will depend on how many I guess.
>>>>
>>>> The other alternative is to pull back more data than you need and do the 
>>>> intersection in code in the client.
>>>>
>>>>
>>>> Hope that helps.
>>>> Aaron
>>>> On 7/02/2011, at 7:11 AM, Aklin_81 <asdk...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> @buddhasystem: yes, that's a well known solution. But obviously I am
>>>>> here because MySQL couldn't satisfy my needs. My question is in the
>>>>> context of Cassandra: whether it is possible to get the intersection
>>>>> of the column sets of two rows in the way I described.
>>>>>
>>>>> @Edward: yes, I know that, but how does it fit here for obtaining the
>>>>> common columns of two rows?
>>>>>
>>>>> Thanks for your comments..
>>>>>
>>>>> -Asil
>>>>>
>>>>>
>>>>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>> On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <potek...@bnl.gov> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> If the amount of data is _that_ small, you'll have a much easier life
>>>>>>> with MySQL, which supports the "join" procedure -- because that's
>>>>>>> exactly what you want to achieve.
>>>>>>>
>>>>>>>
>>>>>>> asil klin wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I want to obtain the intersection of the column sets of two rows
>>>>>>>> (from 2 different column families).
>>>>>>>>
>>>>>>>> To get the intersection, can I first retrieve all the columns
>>>>>>>> (around 300) from the first row and then query by those column
>>>>>>>> names in the second row (which contains at most 100,000 columns)?
>>>>>>>>
>>>>>>>> I am using the results at write time, not before presentation
>>>>>>>> to the user, so latency won't be much of a concern while writing.
>>>>>>>>
>>>>>>>> Is this the proper way to obtain the intersection of two rows?
>>>>>>>>
>>>>>>>> Would love to hear your comments..
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Asil
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context: 
>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
>>>>>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive 
>>>>>>> at Nabble.com.
>>>>>>>
>>>>>>
>>>>>> You can use multi-get when fetching lists of already known keys to
>>>>>> optimize your round trip time.
>>>>>>
>>>
>>>
>
>
