Hi Gabor,
I am coming back to you about the method you described a month ago for
setting the level order during a read.table call. I initially thought
that I would only need to apply the 'unique' function to a single column
of my dataset, so I used it after the read.table step (to make my life
easier)... Well, I was wrong: I need to reorder the levels of all my
columns (as a reminder, I don't know in advance how many columns my code
has to handle). So here comes trouble.
I first tried to apply your code as is, although I suspected there might
be some problems. The class is actually not recycled when the list
notation is used (the help says: "colClasses character. A *vector*
of classes to be assumed for the columns. Recycled as necessary...").
See the following example:
######################
library(methods)
setClass("my.factor")
setAs("character", "my.factor",
      function(from) factor(from, levels = unique(from)))
Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
5 5 102 o v
6 6 133 l x
7 7 149 w q
8 8 122 x p
9 9 151 u r
10 10 134 e g
11 11 170 j q
12 12 103 v n
13 13 153 n w
14 14 106 x x
15 15 185 v x
16 16 102 s p
17 17 181 i h
18 18 192 o k
19 19 161 d f
20 20 158 n q
"
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = list(c = "my.factor"))
levels(DF$c) # properly ordered
levels(DF$d) # not reordered
######################
I also tried this:
######################
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = c("my.factor"))
levels(DF$c)
levels(DF$d)
######################
In this case, the class is definitely recycled as all the columns of DF
are transformed into factors... Not really useful :)
I tried to modify the content of the list or my second notation, by
including "integer" or a second "my.factor"... but I did not have much
success.
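
One form that may be worth trying here is a *named* character vector: read.table matches the names against the column names and leaves every unnamed column to its default conversion. The following is only a minimal sketch under some assumptions: a reasonably recent R (the `text =` argument stands in for textConnection), and a shortened five-row copy of the data above used purely for illustration.

```r
library(methods)

# Virtual class that only serves as a label for colClasses.
setClass("my.factor")
# Conversion called by read.table: levels follow the order of first appearance.
setAs("character", "my.factor",
      function(from) factor(from, levels = unique(from)))

# Shortened, illustrative copy of the data (first five rows only).
Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
5 5 102 o v
"

# Name every column that should use the custom class; columns that are
# not named fall back to read.table's default type conversion.
DF <- read.table(text = Input, header = TRUE,
                 colClasses = c(c = "my.factor", d = "my.factor"))

levels(DF$c)  # levels of c in order of first appearance
levels(DF$d)  # levels of d in order of first appearance
class(DF$b)   # b is not named in colClasses, so it is converted as usual
```

This still requires the names of the factor columns to be known before the call, so it only helps when those names can be determined up front.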
Any idea how to apply the class "my.factor" to several columns?
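
When neither the number nor the names of the columns are known, a possible alternative is to let read.table create ordinary factors and reorder their levels afterwards. The sketch below assumes that approach; `reorder_levels_by_appearance` is a hypothetical helper name (not something from this thread or from base R), and the five-row `Input` is again only illustrative. It only touches columns that were read in as factors, so numeric columns that should become factors would still need to be converted separately.

```r
# Hypothetical helper: for every factor column of a data frame, reset the
# levels so that they follow the order of first appearance in the data.
reorder_levels_by_appearance <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  for (nm in names(df)[is_fac]) {
    values <- as.character(df[[nm]])
    df[[nm]] <- factor(values, levels = unique(values))
  }
  df
}

# Shortened, illustrative copy of the data (first five rows only).
Input <- "a b c d
1 1 175 n f
2 2 102 n j
3 3 187 o n
4 4 106 u g
5 5 102 o v
"

# stringsAsFactors = TRUE is spelled out because its default differs
# between R versions.
DF <- read.table(text = Input, header = TRUE, stringsAsFactors = TRUE)
DF <- reorder_levels_by_appearance(DF)

levels(DF$c)
levels(DF$d)
```

The helper does not depend on how many columns there are, which seems closer to the "unknown dataset" constraint than building colClasses by hand.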
Thanks in advance
Gabor Grothendieck wrote:
> It's the same principle. Just change the function to be suitable. This one
> arranges the levels according to the input:
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
> function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c
> 1 1 176 w
> 2 2 141 k
> 3 3 172 r
> 4 4 182 s
> 5 5 123 k
> 6 6 153 p
> 7 7 176 l
> 8 8 170 u
> 9 9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
> "
> DF <- read.table(textConnection(Input), header = TRUE,
> colClasses = list(c = "my.factor"))
> str(DF)
>
>
> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>
>> OK, I cannot send you one of my datasets since they are confidential. But
>> I can produce a dummy "mini" dataset to illustrate my question. Let's say I
>> have a csv file with 3 columns and 20 rows whose content is reproduced by
>> the following line.
>>
>> > mydata <- data.frame(a = 1:20, b = sample(100:200, 20, replace = TRUE),
>> +                      c = sample(letters[1:26], 20, replace = TRUE))
>> > mydata
>> a b c
>> 1 1 176 w
>> 2 2 141 k
>> 3 3 172 r
>> 4 4 182 s
>> 5 5 123 k
>> 6 6 153 p
>> 7 7 176 l
>> 8 8 170 u
>> 9 9 140 z
>> 10 10 194 s
>> 11 11 164 j
>> 12 12 100 j
>> 13 13 127 x
>> 14 14 137 r
>> 15 15 198 d
>> 16 16 173 j
>> 17 17 113 x
>> 18 18 144 w
>> 19 19 198 q
>> 20 20 122 f
>>
>> If I had to read the csv file, I would use something like:
>> mydata <- read.table(file = "c:/test.csv", header = TRUE, sep = ",")
>>
>> Now, if you look at mydata$c, the levels are alphabetically ordered.
>>
>> > mydata$c
>> [1] w k r s k p l u z s j j x r d j x w q f
>> Levels: d f j k l p q r s u w x z
>>
>> What I am trying to do is reorder the levels so that they are in the order
>> in which they appear in the table, i.e.
>> Levels: w k r s p l u z j x d q f
>>
>> Again, keep in mind that my script should work on datasets whose contents
>> are unknown to me. In my example, I have used letters for mydata$c, but my
>> code may have to handle factors of numeric or character values (I need to
>> transform specific columns of my dataset into factors for plotting
>> purposes). My goal is to let the code scan the content of each factor of my
>> data.frame during or after the read.table step and reorder their levels
>> automatically, without having to ask the user to hard-code the level order.
>>
>> In a way, my problem is more related to the way the factor levels are
>> ordered than to the read.table function, although I guess there is a link...
>>
>> Gabor Grothendieck wrote:
>>> It's not clear from your description what you want. Could you be a bit more
>>> specific, including an example?
>>>
>>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>>>> Thanks Gabor, I have two questions:
>>>>
>>>> 1- Is there any difference between your code and the following one, with
>>>> regard to Fld2?
>>>> ### test ###
>>>> Input <- "Fld1 Fld2
>>>> 10 A
>>>> 20 B
>>>> 30 C
>>>> 40 A
>>>> "
>>>> DF <- read.table(textConnection(Input), header = TRUE)
>>>> DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))
>>>>
>>>> 2- Do you see any way to bring flexibility to your method? It looks to me
>>>> as if, at this stage, I have to i) know the order of my levels before I
>>>> read the table and ii) create one class per factor.
>>>> My problem is that I am not really working on a specific dataset. My goal is
>>>> to develop R scripts capable of handling datasets which have various contents
>>>> but similar structures. So, I really need to minimize the quantity of
>>>> "user-specific" code.
>>>>
>>>> Sebastien
>
>>>> Gabor Grothendieck wrote:
>>>>> You can create your own class and pass that to read.table. In the example
>>>>> below Fld2 is read in with factor levels C, A, B in that order.
>>>>>
>>>>> library(methods)
>>>>> setClass("my.levels")
>>>>> setAs("character", "my.levels",
>>>>>       function(from) factor(from, levels = c("C", "A", "B")))
>>>>>
>>>>> ### test ###
>>>>> Input <- "Fld1 Fld2
>>>>> 10 A
>>>>> 20 B
>>>>> 30 C
>>>>> 40 A
>>>>> "
>>>>> DF <- read.table(textConnection(Input), header = TRUE,
>>>>>                  colClasses = c("numeric", "my.levels"))
>>>>> str(DF)
>>>>> # or
>>>>> DF <- read.table(textConnection(Input), header = TRUE,
>>>>>                  colClasses = list(Fld2 = "my.levels"))
>>>>> str(DF)
>
>
>>>>> On 8/28/07, Sébastien <[EMAIL PROTECTED]> wrote:
>>>>>> Dear R-users,
>>>>>>
>>>>>> I have found this not-so-recent post in the archives -
>>>>>> http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while I was
>>>>>> looking for a particular way to reorder factor levels. The question
>>>>>> addressed by the author was whether the read.table function could be
>>>>>> modified to order the levels of newly created factors "according to the
>>>>>> order that they appear in the data file". Exactly what I am looking for.
>>>>>> As there was no reply to this post, I wonder if any move has been made
>>>>>> towards the implementation of this suggestion. A quick look at
>>>>>> ?read.table tells me that if this option was implemented, it was not in
>>>>>> the read.table function...
>>>>>>
>>>>>> Sebastien
>>>>>>
>>>>>> PS: I am sorry to post so many messages on the list, but I am learning R
>>>>>> (basically by trial and error ;-) ) and no one around me has even a
>>>>>> slight notion about it...
>>>>>>
>>>>>> ______________________________________________
>>>>>> [EMAIL PROTECTED] mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.