[R] randomly subsample rows from subsets

2012-03-18 Thread aly
Hi,

I have a list of 1787 fish from 948 full-sib families and their lengths. My
table looks like this:

fish fam length
1   a   71.46
2   a   71.06
3   a   62.94
4   b   79.46
5   b   52.38
6   b   56.78
7   b   92.08
8   c   96.86
9   d   98.09
10  d   17.23
11  d   98.35
12  d   82.43
13  e   83.85
14  e   33.92
15  e   23.16
16  e   31.39
17  e   57.08
18  e   27.05
19  f   62.38
20  f   83.21
21  f   18.72
22  f   84.32
23  g   15.99
24  h   40.33
25  h   92.73
26  h   59.08
27  i   29.05

I want to randomly select 2 fish from each family that has 2 or more
individuals and exclude those families that have just one fish. How can I do
that? Thanks,
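A minimal base-R sketch of one way to do this, assuming the table is a data frame called `fish` with columns `fam` and `length` (the data frame below just re-enters the example rows; the names `counts`, `keep`, `picked` and `sampled` are illustrative). It drops single-fish families first, then draws two row positions without replacement within each remaining family.

```r
# Re-enter the example rows from the post (27 fish in 9 families)
fish <- data.frame(
  fam    = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
               times = c(3, 4, 1, 4, 6, 4, 1, 3, 1)),
  length = c(71.46, 71.06, 62.94, 79.46, 52.38, 56.78, 92.08, 96.86,
             98.09, 17.23, 98.35, 82.43, 83.85, 33.92, 23.16, 31.39,
             57.08, 27.05, 62.38, 83.21, 18.72, 84.32, 15.99, 40.33,
             92.73, 59.08, 29.05),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the random draw reproducible

# Keep only the families that have at least two fish
counts <- table(fish$fam)
keep <- fish[fish$fam %in% names(counts)[counts >= 2], ]

# Group the row positions of 'keep' by family and draw 2 per family
# without replacement; drop = TRUE discards families with no rows left
picked <- lapply(split(seq_len(nrow(keep)), keep$fam, drop = TRUE),
                 function(i) i[sample.int(length(i), 2)])

sampled <- keep[unlist(picked), ]
sampled
```

Indexing with `i[sample.int(length(i), 2)]` rather than calling `sample(i, 2)` sidesteps the `sample()` quirk where a single number `n` is treated as "sample from 1:n".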



--
View this message in context: 
http://r.789695.n4.nabble.com/randomly-subsample-rows-from-subsets-tp4483477p4483477.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reshape from long to wide

2012-03-19 Thread aly
Hi,

I'm a total beginner in R and this question is probably very simple but I've
spent hours reading about it and can't find the answer. I'm trying to
reshape a data table from long to wide format. I've tried reshape() and
cast() but I get error messages every time and I can't figure out why. In my
data, I have the length of two fish from each family. My data table (called
fish) looks like this:

family  length
14  18
14  7
15  7
15  21
17  50
17  21
18  36
18  21
20  36
20  42
24  56
24  42
25  43
25  56
27  15
27  42
28  7
28  42
29  56
29  49

I want it to look like this:

family kid1 kid2
14  18  7
15  7   21
17  50  21
18  36  21
20  36  42
24  56  42
25  43  56
27  15  42
28  7   42
29  56  49

I've tried:

>cast( fish, fam~length)

and got the error message:

Using length as value column.  Use the value argument to cast to override
this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) : 
  undefined columns selected

Then I renamed the columns:

>myvars<-c("fam","length")
>fish<-fish[myvars]

and tried cast() again, with no luck (same error).

By using reshape() I don't get the results I want:

>reshape(rdm1, timevar="fam", idvar=c("length"), direction="wide")
> head(first)
      length
14.20     14
14.19      7
15.25     21
17.30     50
18.32     36
20.36     42

Can someone help with this? Thanks a lot!
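One thing that may be tripping up `cast()` here is that the formula uses `fam` while the column in the table is named `family`; in addition, `fam ~ length` would spread the length values themselves into columns, whereas the desired `kid1`/`kid2` columns need a within-family index to spread on. A minimal base-R sketch under that reading: number the fish within each family with `ave()`, then let `reshape()` spread on that index. The column name `kid` and the object `wide` are illustrative.

```r
# Re-enter the example data from the post: two fish per family
fish <- data.frame(
  family = rep(c(14, 15, 17, 18, 20, 24, 25, 27, 28, 29), each = 2),
  length = c(18, 7, 7, 21, 50, 21, 36, 21, 36, 42,
             56, 42, 43, 56, 15, 42, 7, 42, 56, 49)
)

# Number the fish within each family (1, 2, ...); this is the "time" variable
fish$kid <- ave(fish$family, fish$family, FUN = seq_along)

# Spread 'length' into one column per kid number: length1, length2
wide <- reshape(fish, idvar = "family", timevar = "kid",
                v.names = "length", direction = "wide", sep = "")

# Rename to the layout asked for and reset the leftover row names
names(wide) <- c("family", "kid1", "kid2")
rownames(wide) <- NULL
wide
```

With the reshape package, the same `kid` column should make a call along the lines of `cast(fish, family ~ kid, value = "length")` work (the error message above points to the `value` argument), though the resulting column names would be `1` and `2` rather than `kid1` and `kid2`.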




--
View this message in context: 
http://r.789695.n4.nabble.com/Reshape-from-long-to-wide-tp4486875p4486875.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reshape from long to wide

2012-03-21 Thread aly

Thanks a lot,

I tried one of the ways you guys showed me and it totally worked. Just for
fun, I tried all the others and, with some modifications here and there, they
work fine too. It was time-consuming but definitely worth it as a good learning
experience. Thanks again

--
View this message in context: 
http://r.789695.n4.nabble.com/Reshape-from-long-to-wide-tp4486875p4494055.html
Sent from the R help mailing list archive at Nabble.com.



[R] Randomly select elements based on criteria

2012-03-22 Thread aly
Hi,

I want to randomly pick 2 fish born on the same day, but I need those
individuals to be from different families. My table includes 1787 fish
distributed among 948 families. An example of a subset of fish born on one
specific day would look like this:

>fish

fam   born  spawn
25  46  43
25  46  56
26  46  50
43  46  43
131 46  43
133 46  64
136 46  43
136 46  42
136 46  50
136 46  85
137 46  64
142 46  85
144 46  56
144 46  64
144 46  78
144 46  85
145 46  64
146 46  64
147 46  64
148 46  78
149 46  43
149 46  98
149 46  85
150 46  64
150 46  78
150 46  85
151 46  43
152 46  78
153 46  43
156 46  43
157 46  91
158 46  42

Where "fam" is the family that fish belongs to, "born" is the day it was
born (in this case day 46), and "spawn" is the day it was spawned. I want to
know if there is a correlation in the day of spawn between fish born the
same day but that are unrelated (not from the same family). 
I want to randomly select two rows, but they have to be from different families.
The first part (random selection) I got by doing:

> ran <- sample(nrow (fish), size=2); ran

[1]  9 12

> newfish <- fish [ran,];  newfish

    fam born spawn
103 136   46    50
106 142   46    85

In this example I got two individuals from different families (good) but I
will repeat the process many times and there's a chance that I get two fish
from the same family (bad):

> ran<-sample (nrow(fish), size=2);ran

[1] 26 25

> newfish <-fish [ran,]; newfish

    fam born spawn
127 150   46    85
126 150   46    78

I need a conditional, but I have no clue how to include it in the code.
Thanks in advance for any suggestions,

Aly
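A minimal sketch of the "redraw until the families differ" idea (rejection sampling), assuming the data frame is `fish` with a `fam` column as in the post; `pick_unrelated_pair()` is a hypothetical helper name. Each attempt draws two rows uniformly, and same-family draws are simply discarded, so every accepted pair is equally likely among all pairs of rows from different families.

```r
# Re-enter the day-46 subset from the post
fish <- data.frame(
  fam   = c(25, 25, 26, 43, 131, 133, 136, 136, 136, 136, 137, 142,
            144, 144, 144, 144, 145, 146, 147, 148, 149, 149, 149,
            150, 150, 150, 151, 152, 153, 156, 157, 158),
  born  = 46,
  spawn = c(43, 56, 50, 43, 43, 64, 43, 42, 50, 85, 64, 85,
            56, 64, 78, 85, 64, 64, 64, 78, 43, 98, 85,
            64, 78, 85, 43, 78, 43, 43, 91, 42)
)

# Hypothetical helper: draw 2 rows at random, redrawing until their families differ
pick_unrelated_pair <- function(d) {
  # With fewer than two families the loop below could never finish
  stopifnot(length(unique(d$fam)) >= 2)
  repeat {
    ran <- sample(nrow(d), size = 2)
    if (d$fam[ran[1]] != d$fam[ran[2]]) return(d[ran, ])
  }
}

set.seed(42)  # only for reproducibility
newfish <- pick_unrelated_pair(fish)
newfish

# Repeating many times: each column of 'pairs' holds the two spawn days of one draw
pairs <- replicate(1000, pick_unrelated_pair(fish)$spawn)
dim(pairs)
```

An alternative is to sample two distinct families first and then one fish within each; that always succeeds on the first try, but it gives fish from small families a higher chance of being picked, so the two schemes are not equivalent.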

--
View this message in context: 
http://r.789695.n4.nabble.com/Randomly-select-elements-based-on-criteria-tp4496483p4496483.html
Sent from the R help mailing list archive at Nabble.com.



[R] Logistic Regression Fitting with EM-Algorithm

2011-01-03 Thread Robin Aly

Hi all,

Is there any package that can do an EM algorithm fitting of
logistic regression coefficients given only the explanatory
variables? I tried to do this using the Design package,
but I didn't find a way.

Thanks a lot & Kind regards
Robin Aly



Re: [R] Logistic Regression Fitting with EM-Algorithm

2011-01-10 Thread Robin Aly

Dear Ted,

sorry for being unclear. Let me try again.

I indeed have no knowledge about the value of the response variable for 
any object.

Instead, I have a data frame of explanatory variables for
each object. For example,

    x1       x2       x3
1   4.409974 2.348745 1.9845313
2   3.809249 2.281260 1.9170466
3   4.229544 2.610347 0.9127431
4   4.259644 1.866025 1.5982859
5   4.001306 2.225069 1.2551570
...

and I want to fit a logistic regression model of the form y ~ x1 + x2 + x3.

From prior information I know that all slope coefficients are approximately
Gaussian distributed around one, and that the intercept is likewise
Gaussian distributed around -10. Now I think there must be a package which
estimates the coefficients more precisely by fitting the logistic regression
function to the data without knowledge of the response variable (similar to
fitting Gaussians in a mixture model where the class labels are unknown).
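A short derivation of why this analogy may not carry over, assuming the usual logistic model for y given x and a distribution of x that does not involve the coefficients (the symbols sigma, beta, eta_i and w_i are introduced here for illustration):

```latex
P(y_i = 1 \mid x_i, \beta) = \sigma(\eta_i), \quad
\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}, \quad
\sigma(t) = \frac{1}{1 + e^{-t}}

% With y unobserved, the observed-data likelihood sums y out:
L(\beta) = \prod_i \sum_{y_i \in \{0,1\}} P(y_i \mid x_i, \beta)\, p(x_i)
         = \prod_i p(x_i)

% This does not depend on \beta, so the posterior equals the prior:
p(\beta \mid x_1, \dots, x_n) \propto L(\beta)\, p(\beta) \propto p(\beta)

% EM: the E-step gives weights w_i = \sigma(\eta_i^{old}); the M-step maximises
Q(\beta) = \sum_i \left[ w_i \log \sigma(\eta_i) + (1 - w_i) \log\left(1 - \sigma(\eta_i)\right) \right]
% each term peaks where \sigma(\eta_i) = w_i, which \beta^{old} already achieves,
% so EM stays at its starting values
```

In a Gaussian mixture, by contrast, the class-conditional densities p(x | class) are part of the model, which is what lets unlabelled x carry information about the parameters.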


I looked at the flexmix package but this seems to "only" find 
dependencies in the data assuming the presence of some training data.
I also found some evidence in Magder & Hughes (1997, see below) that such an
algorithm exists; however, from the documented math I can't see how to apply
the method to my problem.


Thanks in advance,
Best Regards
Robin

Magder, L. S. & Hughes, J. P. (1997). Logistic Regression When the Outcome Is
Measured with Uncertainty. American Journal of Epidemiology, 146, 195-203.





On 01/04/2011 12:36 AM, (Ted Harding) wrote:

On 03-Jan-11 14:02:21, Robin Aly wrote:

Hi all,
is there any package which can do an EM algorithm fitting of
logistic regression coefficients given only the explanatory
variables? I tried to realize this using the Design package,
but I didn't find a way.

Thanks a lot & Kind regards
Robin Aly

As written, this is a strange question! You imply that you
do not have data on the response (0/1) variable at all,
only on the explanatory variables. In that case there is
no possible estimate, because that would require data on
at least some of the values of the response variable.

I think you should explain more clearly and explicitly what
the information is that you have for all the variables.

Ted.


E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jan-11   Time: 23:36:56

