[Rd] resampling from string when it runs across multiple lines

Suraaga Kulkarni Mon, 24 Mar 2008 10:41:22 -0700

Hi,

I need to resample from a long string, which is written in many lines with
carriage-return marks at the end of each line.  Perhaps because the data
looks like a matrix, sample(data, 25, replace=T) gives me 25 whole columns
of characters from the data: it is resampling columns rather than single
characters.
What I would like it to do is to treat the data as a vector that has just
been spread across many lines, and pick single characters from random
positions in randomly chosen lines.
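One way to get single characters is to split every line into characters and sample from the combined vector. This is a minimal sketch that assumes the long string can be read line by line, for example with readLines(), which accepts LF, CRLF or CR as line endings and drops them. The five lines below are a hypothetical stand-in built from the rows of y shown further down; a real file name would replace the commented readLines() call.

```r
# Hypothetical input: the lines of the long string, as readLines() would
# return them, e.g. lines <- readLines("mydata.txt") (file name assumed).
lines <- c("ACGTTGCAGC",
           "ACGFFFFFFG",
           "ACGSSSSSGA",
           "ACGTTGCAGG",
           "ABBBBBBAGT")

# Split every line into single characters and join them into one long
# character vector, so the line breaks no longer matter.
chars <- unlist(strsplit(lines, ""))

set.seed(1)  # only to make the example reproducible
resampled <- sample(chars, 25, replace = TRUE)
print(resampled)
```

Sampling from the flattened vector gives every character the same chance of being drawn. Picking a random line first and then a random position within it is equivalent only when all lines have the same length; otherwise characters on shorter lines are drawn more often.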


I am reproducing a sample dataset, the command and the output here:

> y
     X..1. X..2. X..3. X..4. X..5. X..6. X..7. X..8. X..9. X..10.
[1,]     A     C     G     T     T     G     C     A     G      C
[2,]     A     C     G     F     F     F     F     F     F      G
[3,]     A     C     G     S     S     S     S     S     G      A
[4,]     A     C     G     T     T     G     C     A     G      G
[5,]     A     B     B     B     B     B     B     A     G      T

> sample(y, 20, replace=T)
     X..9. X..4. X..2. X..7. X..9..1 X..3. X..3..1 X..9..2 X..9..3 X..4..1 X..3..2 X..8. X..9..4 X..3..3 X..6. X..7..1
[1,]     G     T     C     C       G     G       G       G       G       T       G     A       G       G     G       C
[2,]     F     F     C     F       F     G       G       F       F       F       G     F       F       G     F       F
[3,]     G     S     C     S       G     G       G       G       G       S       G     S       G       G     S       S
[4,]     G     T     C     C       G     G       G       G       G       T       G     A       G       G     G       C
[5,]     G     B     B     B       G     B       B       G       G       B       B     A       G       B     B       B

     X..6..1 X..3..4 X..7..2 X..10.
[1,]       G       G       C      C
[2,]       F       G       F      G
[3,]       S       G       S      A
[4,]       G       G       C      G
[5,]       B       B       B      T
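The suffixed column names in this output (X..9..1, X..3..4 and so on) suggest that y is a data frame. For a data frame, length(y) is the number of columns and y[i] selects columns, so sample(y, 20, replace=T) draws whole columns and repeated ones get numeric suffixes. A minimal sketch, assuming each cell holds a single character (or a factor level, as read.table() may produce), is to flatten y into one character vector before sampling. The y built here is a hypothetical stand-in with default column names X1 to X10.

```r
# Hypothetical stand-in for the data frame 'y' shown above:
# 5 rows x 10 columns, one character per cell, stored as factors.
rows <- c("ACGTTGCAGC", "ACGFFFFFFG", "ACGSSSSSGA", "ACGTTGCAGG", "ABBBBBBAGT")
y <- data.frame(do.call(rbind, strsplit(rows, "")), stringsAsFactors = TRUE)

# For a data frame, length() counts columns, which is why sample(y, ...)
# returned whole columns.
length(y)  # 10, the number of columns

# as.matrix() turns the factor or character columns into a character
# matrix; as.vector() then flattens it column by column into one vector.
cells <- as.vector(as.matrix(y))

set.seed(1)  # only to make the example reproducible
resampled <- sample(cells, 20, replace = TRUE)
print(resampled)
```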

I wanted to try the bootstrap approach (since that is what I am doing:
resampling with replacement), but that requires a "statistic", and I don't
know what sense that makes for character data.
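In a bootstrap, the "statistic" is simply whatever summary is computed on each resample; for character data it is often a count or a proportion. As an illustrative assumption, the sketch below bootstraps the proportion of each letter using only base R. boot::boot() would also work, with a statistic of the form function(data, indices). The data and the helper letter_props() are hypothetical.

```r
# Same hypothetical data as a flat character vector.
chars <- unlist(strsplit(c("ACGTTGCAGC", "ACGFFFFFFG", "ACGSSSSSGA",
                           "ACGTTGCAGG", "ABBBBBBAGT"), ""))
letters_seen <- sort(unique(chars))

# Illustrative statistic (an assumption): proportion of each letter.
# factor(..., levels = ) keeps letters that were not drawn, with count 0.
letter_props <- function(x) {
  counts <- table(factor(x, levels = letters_seen))
  setNames(as.vector(counts) / length(x), letters_seen)
}

set.seed(1)  # only to make the example reproducible
n_boot <- 1000
# Each column of 'boot_props' is one bootstrap replicate: resample all
# characters with replacement (same size as the data) and summarise.
boot_props <- replicate(n_boot, letter_props(sample(chars, replace = TRUE)))

# Observed proportions and their bootstrap standard errors.
observed <- letter_props(chars)
boot_se  <- apply(boot_props, 1, sd)
print(round(cbind(observed, boot_se), 3))
```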

Any help will be greatly appreciated.

Thanks,

S.


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
