Re: [R] simple matching with R

Birgit Lemcke Mon, 01 Oct 2007 05:46:30 -0700

Hello Jeff,

Thanks a lot for your help. It seems to work well now.


Greetings

Birgit

On 28.09.2007 at 20:33, Jeffrey Robert Spies wrote:

> Hi Birgit,
>
> I've updated the recipe here, including a change to the  
> dissimilarity function (making it more efficient):
>
> http://www.r-cookbook.com/node/40
>
> You'll notice the change is:
>
> dissimilar <- function(tRow){
>       (sum(tRow==FALSE, na.rm=TRUE) + sum(is.na(tRow)))/length(tRow)
> }
>
> It's actually about 40% faster to use sums instead of sub-setting  
> the lists and using lengths (but the speed increase will only be  
> noticeable on very, very, very long lists).
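
A minimal sketch for checking the claim above on your own machine. The names count_by_subset and count_by_sum are hypothetical wrappers around the two expressions from this thread, and the test vector is made up. Only the counts are compared here (the division by length(tRow) is left out), and the timings will vary with machine and R version, so the 40% figure is not confirmed by this sketch.

```r
# Two formulations from the thread, wrapped under hypothetical names
count_by_subset <- function(tRow) length(tRow[tRow == FALSE])
count_by_sum <- function(tRow) sum(tRow == FALSE, na.rm = TRUE) + sum(is.na(tRow))

set.seed(1)
# Made-up test data: a long logical vector containing some NAs
x <- sample(c(TRUE, FALSE, NA), 1e5, replace = TRUE)

# Both expressions count the FALSE entries plus the NA entries,
# so the two results should agree
same <- count_by_subset(x) == count_by_sum(x)
same

# Rough timing over repeated calls; results differ between machines
system.time(for (i in 1:50) count_by_subset(x))
system.time(for (i in 1:50) count_by_sum(x))
```

The subsetting version counts NAs as well because indexing a vector with NA returns NA elements, which is why the sum-based version adds sum(is.na(tRow)) explicitly.
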
>
> --Jeff.
>
> On Sep 28, 2007, at 12:47 PM, Birgit Lemcke wrote:
>
>> Thanks a lot for both solutions to my problem.
>>
>> I tried them immediately and I understood how they work.
>>
>> The next problem for me is how to deal with the NAs. I thought  
>> perhaps it is possible to exclude a variable from the row  
>> comparison if one of the two rows has an NA for it?
>> Furthermore, it would then be useful to divide the resulting number  
>> by the number of variables used for the comparison, to get back a  
>> number between 0 and 1.
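
One possible way to do what is described above, as a sketch only. The helper name dissimilar_excl_na and the small comparison matrix are made up for illustration. It relies on the fact that comparing anything with NA gives NA, so a variable that is missing in either row shows up as NA in the comparison and can simply be left out of both the count and the divisor.

```r
# Hypothetical helper: proportion of mismatches among the variables that
# are non-missing in both rows. tRow is one row of the logical comparison.
dissimilar_excl_na <- function(tRow) {
  used <- sum(!is.na(tRow))          # number of variables actually compared
  if (used == 0) return(NA_real_)    # nothing comparable in this row
  sum(!tRow, na.rm = TRUE) / used    # mismatches / compared variables
}

# Made-up comparison matrix (TRUE = match, FALSE = mismatch, NA = missing)
comparison <- rbind(c(TRUE, FALSE, NA, TRUE),
                    c(NA, NA, NA, NA),
                    c(FALSE, FALSE, TRUE, TRUE))

result <- apply(comparison, 1, dissimilar_excl_na)
result
```

In this example the first row has one mismatch among three usable variables, the second row has nothing to compare and returns NA, and the third row has two mismatches among four variables. Whether an all-NA row should give NA or some other value is a design choice that depends on the analysis.
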
>>
>> Unfortunately I am able to understand what happens if somebody  
>> gives me the code but I am not able at the moment to write it by  
>> myself. I hope this will change by and by.
>>
>> So I would be very pleased if you could help me once again.
>>
>> Greetings
>>
>> Birgit
>>
>>
>> On 28.09.2007 at 18:25, Jeffrey Robert Spies wrote:
>>
>>> Not sure how you want to handle the NAs, but you could try the
>>> following:
>>>
>>> #start
>>> MalVar29_37 <- read.table(textConnection("V1 V2 V3 V4 V5 V6 V7 V8 V9
>>> 0  0  0  0  0  1  0  0  0
>>> 0  0  0  0  0  1  0  0  0
>>> 0  0  0  0  0  1  0  0  0
>>> NA NA NA NA NA NA NA NA NA
>>> 0  1  0  0  0  1  0  0  0"), header=TRUE)
>>>
>>> FemVar29_37 <- read.table(textConnection("V1 V2 V3 V4 V5 V6 V7 V8 V9
>>> 1  1  0  0  0  0  0  0  0
>>> 0  1  0  0  1  1  0  0  0
>>> 1  0  0  1  0  0  0  0  0
>>> 0  1  0  0  1  0  0  0  0
>>> 0  1  0  0  0  0  0  0  0"), header=TRUE)
>>>
>>> comparison <- MalVar29_37 == FemVar29_37
>>>
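>>> dissimilar <- function(tRow){
>>>     # count the FALSE entries of one row; NA entries are counted too,
>>>     # because indexing a vector with NA returns NA elements
>>>     length(tRow[tRow==FALSE])
>>> }
>>>
>>> # apply dissimilar to each row (MARGIN = 1) of comparison
>>> dissimilarity <- apply(comparison, 1, dissimilar)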
>>> dissimilarity
>>> # finish
>>>
>>> The variable comparison is an entry-by-entry comparison, resulting in
>>> values of TRUE or FALSE.  I've defined a function dissimilar as the
>>> number of FALSEs in a given object (tRow).  The variable dissimilarity
>>> is then the application of this dissimilar function to each row of
>>> comparison.  In this example, 0 means all of the entries in a row
>>> matched, 9 means none of them matched.  You can see the solution here
>>> in recipe form: http://www.r-cookbook.com/node/40
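
The same per-row count can also be written without apply(), using rowSums() on the logical comparison. This is an alternative sketch, not the recipe from the link. The short names mal and fem are hypothetical, and the data are the five sample rows from this thread entered as matrices.

```r
# Sample rows copied from the thread; `mal` and `fem` are hypothetical names
mal <- matrix(c(0, 0, 0, 0, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0,
                rep(NA, 9),
                0, 1, 0, 0, 0, 1, 0, 0, 0),
              nrow = 5, byrow = TRUE)
fem <- matrix(c(1, 1, 0, 0, 0, 0, 0, 0, 0,
                0, 1, 0, 0, 1, 1, 0, 0, 0,
                1, 0, 0, 1, 0, 0, 0, 0, 0,
                0, 1, 0, 0, 1, 0, 0, 0, 0,
                0, 1, 0, 0, 0, 0, 0, 0, 0),
              nrow = 5, byrow = TRUE)

# TRUE = match, FALSE = mismatch, NA = at least one entry is missing
comparison <- mal == fem

# Count mismatches per row, treating NA as a mismatch
dissimilarity <- rowSums(is.na(comparison) | !comparison)
dissimilarity
# [1] 3 2 3 9 1
```

Row 4 comes out as 9 because every entry in that row is NA and NAs are counted as mismatches here. Dropping the is.na() term and using rowSums(!comparison, na.rm = TRUE) would ignore them instead.
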
>>>
>>> Hope this helps,
>>>
>>> Jeff.
>>>
>>> On Sep 28, 2007, at 11:13 AM, Birgit Lemcke wrote:
>>>
>>>> Hello!
>>>>
>>>> I am an R beginner and I have a question about simple matching.
>>>>
>>>> I have two datasets that I read in with:
>>>>
>>>> MalVar29_37<-read.table("MalVar29_37.csv", sep = ";")
>>>> FemVar29_37<-read.table("FemVar29_37.csv", sep = ";")
>>>>
>>>> They look like this and show binary variables:
>>>>
>>>>      V1 V2 V3 V4 V5 V6 V7 V8 V9
>>>> 1    0  0  0  0  0  1  0  0  0
>>>> 2    0  0  0  0  0  1  0  0  0
>>>> 3    0  0  0  0  0  1  0  0  0
>>>> 4   NA NA NA NA NA NA NA NA NA
>>>> 5    0  1  0  0  0  1  0  0  0
>>>>
>>>>      V1 V2 V3 V4 V5 V6 V7 V8 V9
>>>> 1    1  1  0  0  0  0  0  0  0
>>>> 2    0  1  0  0  1  1  0  0  0
>>>> 3    1  0  0  1  0  0  0  0  0
>>>> 4    0  1  0  0  1  0  0  0  0
>>>> 5    0  1  0  0  0  0  0  0  0
>>>>
>>>> each with 348 rows.
>>>>
>>>> I would like to perform a simple matching, but only row 1 compared to
>>>> row 1, row 2 compared to row 2 (paired), and so on, giving back a
>>>> number as the dissimilarity for each comparison.
>>>>
>>>> How can I do that?
>>>>
>>>> Thanks in advance
>>>>
>>>> Birgit
>>>>
>>>>
>>>>
>>>>
>>>> Birgit Lemcke
>>>> Institut für Systematische Botanik
>>>> Zollikerstrasse 107
>>>> CH-8008 Zürich
>>>> Switzerland
>>>> Ph: +41 (0)44 634 8351
>>>> [EMAIL PROTECTED]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>








