Re: [R] Unique matching of two sets of multidimensional data

Adams, Jean Tue, 25 Jun 2013 06:36:37 -0700

You could give 1-nearest neighbor classification a try.  For example,

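set.seed(1)  # make the simulated example reproducible

# first measurement: 10 people with known identities
a <- data.frame(person=1:10, ht=rnorm(10, mean=5, sd=1),
     wt=rnorm(10, mean=180, sd=30), bp=rnorm(10, mean=120, sd=10))
# random measurement error added to the second measurement
meas.err <- data.frame(ht=rnorm(10, sd=0.1),
     wt=rnorm(10, sd=3), bp=rnorm(10, sd=1))
# second measurement: same people, rows shuffled, identities dropped
b <- (a[, -1] + meas.err)[sample(10), ]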


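library(class)
# give each row of b the person whose first measurement is nearest
# (unscaled Euclidean distance); the matches are not guaranteed to be unique
b$person <- knn1(a[, -1], b, a$person)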
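One caveat: nearest-neighbor matching works on raw Euclidean distance and can hand the same person to several rows of b, which is the problem described in the original question. A minimal sketch of one way around this, assuming the clue package is installed: put the variables on a common scale using the pooled 2N values (as the original poster did), build the full distance matrix, and solve it as a linear sum assignment problem with solve_LSAP(). The variable names and the choice of pooled standard deviations for scaling are illustrative assumptions, not a recommendation from the package authors.

```r
library(clue)  # assumed to be installed; provides solve_LSAP()

set.seed(42)
n <- 10
vars <- c("ht", "wt", "bp")

# simulated data in the same spirit as the example above
a <- data.frame(person = 1:n,
                ht = rnorm(n, mean = 5,   sd = 1),
                wt = rnorm(n, mean = 180, sd = 30),
                bp = rnorm(n, mean = 120, sd = 10))
err <- data.frame(ht = rnorm(n, sd = 0.1),
                  wt = rnorm(n, sd = 3),
                  bp = rnorm(n, sd = 1))
b <- (a[, vars] + err)[sample(n), ]
truth <- as.integer(rownames(b))  # row names survive the shuffle

# put all variables on a common scale, using the pooled 2N values
pooled <- rbind(a[, vars], b[, vars])
ctr <- colMeans(pooled)
scl <- apply(pooled, 2, sd)
a.s <- scale(a[, vars], center = ctr, scale = scl)
b.s <- scale(b[, vars], center = ctr, scale = scl)

# n x n matrix of scaled Euclidean distances: rows = b, columns = a
d <- as.matrix(dist(rbind(b.s, a.s)))[1:n, n + 1:n]

# nearest neighbor for each row of b: may reuse the same person
nn <- a$person[apply(d, 1, which.min)]

# linear sum assignment: every row of b gets a different column of a,
# chosen so that the total distance is as small as possible
lsap <- solve_LSAP(d)
matched <- a$person[as.integer(lsap)]

anyDuplicated(nn)        # > 0 if some person was assigned more than once
anyDuplicated(matched)   # 0, because the assignment is one-to-one
mean(matched == truth)   # share of correct matches in this simulation
```

With real data there is no truth vector, so only the duplicate check applies. If the measurement error of each variable is known, scaling by that error instead of the pooled standard deviation may be the better choice.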



On Mon, Jun 24, 2013 at 8:47 PM, Leif Kirschenbaum <
kirschenbaum.l...@ssd.loral.com> wrote:

> Dear list,
> I've searched the archives and tried some code, however would appreciate
> some input - even a pointer in the direction of the correct function to use.
>
> Given N samples each of which is measured for characteristics x1, x2,
> x3,... (m  6) where each characteristic is a roughly normally distributed
> numeric, but with different center and scale.
> Then the N samples are measured again for the same characteristics x1, x2,
> x3,..., but the identities of the samples are unknown.
>
> Is there a function which will assign the unique identities from the first
> measurement to the second measurement?
>
> I've tried scaling by using the pooled variance of each x1 (i.e. 2N values
> to estimate the variance of the measure of characteristic x1, the
> characteristic x2, etc.) to construct the normalized distance from one
> sample's second measurement x1, x2, x3... to each of the first measurements
> and then pick the minimum distance to assign an identity to the second
> measurement.  Then loop over all the second measurements to find the first
> measurement "closest" to it.
> However, I end up with one sample ID from the first measurement being
> assigned to multiple second measurements.
>
> How could I minimize the total matching distance between the second
> measurements and the first, with each sample ID assigned only once?
>
>
> Example:
> measure height, weight, and blood pressure of 100 people with their names
> recorded (scale and ruler both have some random unknown error)
> measure the height, weight, and blood pressure of those 100 people again,
> but you forgot to write down their names. (assume that the scale and ruler
> errors have not changed since the first measurement)
>
> How to assign the second set of measurements to the first?
>
>
> Leif Kirschenbaum, Ph.D., PMP
> Principal Reliability Engineer
> Parts Engineering
> Design Reliability
> Product Reliability
> SSL
> 3825 Fabian Way M/S H-21
> Palo Alto, CA 94303
> Tel: +1-650-852-6580
> Facsimile: +1-650-852-7832
> www.ssloral.com
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

