Groot, Philip de wrote:
> Hello all,
>
> I am not sure whether it actually is a bug, but it is not the behaviour I
> would expect. Please consider this:
>
> > Sibships
>  [1] Patient_2400 Patient_2400 Patient_345  Patient_345  Patient_8901
>  [6] Patient_8901 Patient_4008 Patient_4008 Patient_7991 Patient_7991
> [11] Patient_8353 Patient_8353 Patient_1212 Patient_1212 Patient_2168
> [16] Patient_2168 Patient_2760 Patient_2760 Patient_4726 Patient_4726
> [21] Patient_6699 Patient_6699 Patient_7641 Patient_7641 Patient_8263
> [26] Patient_8263 Patient_1389 Patient_1389 Patient_1618 Patient_1618
> [31] Patient_2410 Patient_2410 Patient_2612 Patient_2612 Patient_2721
> [36] Patient_2721 Patient_5053 Patient_5053 Patient_8458 Patient_8458
> [41] Patient_211  Patient_211  Patient_9004 Patient_9004 Patient_3423
> [46] Patient_3423 Patient_7413 Patient_7413 Patient_7815 Patient_7815
> [51] Patient_9232 Patient_9232 Patient_2267 Patient_2267 Patient_468
> [56] Patient_468
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
> > Comparison_Indices
>  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
> [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> > Sibships[Comparison_Indices]
> [1] Patient_2400 Patient_2400 Patient_345  Patient_345  Patient_8901
> [6] Patient_8901 Patient_7413 Patient_7413
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
> The problem with this last command is that I would expect 4 levels (because
> only 8 "Comparison_Indices" are true, which is equal to 4 sibships). So:
> levels() does not take array indices into account, or stated otherwise: if
> you use a subset of an array (vector), the levels() are not properly updated
> (in my opinion).
>
> What I additionally found is the following:
>
> > small_test <- factor(x=c("a", "b", "c"))
> > typeof(small_test)
> [1] "integer"
>
> The same happens to the Sibships vector that I defined as a factor. Why is
> it of type integer?
>
> This is the version() output:
>
> > version
>                _
> platform       x86_64-unknown-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status
> major          2
> minor          6.1
> year           2007
> month          11
> day            26
> svn rev        43537
> language       R
> version.string R version 2.6.1 (2007-11-26)
>
> So: should I submit a Bug report?

No. This is all completely as designed. Factors are internally integers
(group codes), with a levels attribute that says what the codes mean. If you
want the full story, use dput(small_test) or class(small_test) or
str(small_test).
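
As a small illustration of the point about integer codes, here is a sketch using the same small_test factor from the question. The names codes and labels are introduced here only for illustration; they are not from the original post.

```r
# A factor is stored as integer codes plus a "levels" attribute.
small_test <- factor(c("a", "b", "c"))

typeof(small_test)   # storage type of the underlying vector
class(small_test)    # the class that makes it behave as a factor
unclass(small_test)  # the raw integer codes, with the levels attribute shown

# Rebuild the labels by hand: each code is an index into levels().
codes  <- as.integer(small_test)
labels <- levels(small_test)[codes]
print(codes)
print(labels)
```

So typeof() only reports the storage mode; class(), str() and dput() show the rest of the structure.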
And subsetting a factor retains the original factor levels. To drop unused
levels, just use factor(f[index]) or f[index, drop=TRUE]. Dropping them by
default would be even more annoying/dangerous, because it leads to empty
cells dropping out of tables and bars disappearing from barplots.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907
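
To illustrate the reply above on subsetting, here is a minimal sketch with a made-up three-sibship factor. The names sibships, keep, sub, dropped_a and dropped_b and the labels P1 to P3 are hypothetical stand-ins for the poster's data.

```r
# Hypothetical stand-in for the Sibships factor: three sibships, two members each.
sibships <- factor(c("P1", "P1", "P2", "P2", "P3", "P3"))
keep     <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Plain subsetting keeps all original levels, used or not.
sub <- sibships[keep]
print(nlevels(sub))
print(table(sub))   # unused levels show up as zero counts

# Two ways to drop the unused levels, as suggested in the reply.
dropped_a <- factor(sibships[keep])
dropped_b <- sibships[keep, drop = TRUE]
print(levels(dropped_a))
print(levels(dropped_b))
```

The zero counts in table(sub) are the "empty cells" mentioned in the reply: they remain visible only because the unused levels are retained.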