Re: [R] what is wrong with this dataset?

Rob Griffin Thu, 24 Nov 2011 00:43:23 -0800

I (and probably everyone who has looked at your email) have literally no clue 
what you are trying to do. Take a look at this first. 
As for your "question", I assume you have tried to aggregate the data frame 
you have called dataset. What you have produced is a new data frame that has 
all of the combinations of gender and title; it has then simply filled the 
Category and Salary columns with counts of how many female managers, female 
sales reps, and so on there are. If you are trying to show the average 
category and salary for each of the combinations, use:
dataset2 <- aggregate(dataset[, 3:4], by = dataset[, 1:2, drop = FALSE], FUN = mean)
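
To make the difference between counts and averages concrete, here is a minimal sketch in base R. It assumes a small stand-in data frame built from the first five rows of the posted dataset, and it uses the formula interface of aggregate() instead of the indexing form above; the names `means` and `counts` are only illustrative.

```r
# Stand-in data: the first five rows of the posted `dataset`
dataset <- data.frame(
  Gender   = c("M", "F", "M", "M", "F"),
  Title    = c("Manager", "Manager", "Sales Rep", "Sales Rep", "Manager"),
  Category = c(3, 2, 1, 3, 3),
  Salary   = c(27000, 22500, 18000, 27000, 27000)
)

# Average Category and Salary for each Gender/Title combination
means <- aggregate(cbind(Category, Salary) ~ Gender + Title,
                   data = dataset, FUN = mean)

# Number of rows per combination, which is what a `length` default reports
counts <- aggregate(Salary ~ Gender + Title, data = dataset, FUN = length)

print(means)
print(counts)
```

If you would rather stay with the reshape package, passing fun.aggregate = mean to cast() should give the same averages, though that is not run here.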


But, please actually enlighten us as to the nature of your problem.
Rob

-----Original Message----- 
From: Carl Witthoft 
Sent: Thursday, November 24, 2011 3:08 AM 
To: r-help@r-project.org 
Subject: Re: [R] what is wrong with this dataset? 

As the Kroger Data Munger Guru would say, "What is the problem you 
are trying to solve?"

The datasets look just fine from a structural point of view. What do you 
want to do and what is wrong with the results you get?

<quote>
From: Kaiyin Zhong <kindlychung_at_gmail.com>
Date: Thu, 24 Nov 2011 09:39:20 +0800

> d = data.frame(gender=rep(c('f','m'), 5), pos=rep(c('worker', 'manager',
'speaker', 'sales', 'investor'), 2), lot1=rnorm(10), lot2=rnorm(10))
> d

    gender      pos       lot1       lot2
1       f   worker  1.1035316  0.8710510
2       m  manager -0.4824027 -0.2595865
3       f  speaker  0.8933589 -0.5966119
4       m    sales  0.4489920  0.4971199
5       f investor  0.9246900 -0.7531117
6       m   worker  0.2777642 -0.3338369
7       f  manager -1.0890828  0.7073686
8       m  speaker -1.3045821  0.4373199
9       f    sales  0.3092965 -2.6441382
10      m investor -0.5770073 -1.5200347

> cast(melt(d))

Using gender, pos as id variables
    gender      pos       lot1       lot2
1       f investor  0.9246900 -0.7531117
2       f  manager -1.0890828  0.7073686
3       f    sales  0.3092965 -2.6441382
4       f  speaker  0.8933589 -0.5966119
5       f   worker  1.1035316  0.8710510
6       m investor -0.5770073 -1.5200347
7       m  manager -0.4824027 -0.2595865
8       m    sales  0.4489920  0.4971199
9       m  speaker -1.3045821  0.4373199
10      m   worker  0.2777642 -0.3338369

> dataset = read.csv('datalist.csv')
> dataset
    Gender     Title Category Salary
1       M   Manager        3  27000
2       F   Manager        2  22500
3       M Sales Rep        1  18000
4       M Sales Rep        3  27000
5       F   Manager        3  27000
6       M Secretary        4  31500
7       M Sales Rep        2  22500
8       M Secretary        2  22500
9       M    Worker        4  40500
10      M   Manager        4  37100
11      F Secretary        2  22500
12      F   Manager        3  27000
13      M    Worker        2  20000
14      M   Manager        4  32000
15      F Sales Rep        2  22900
16      M Sales Rep        3  27000
17      F Sales Rep        2  22500
18      M   Manager        1  18000
19      M Secretary        3  27000
20      F Sales Rep        3  27000
21      M Secretary        4  31500
22      M    Worker        2  22500
23      M   Manager        2  22500
24      M    Worker        4  40500
25      M    Worker        4  37100
26      F Secretary        2  22500
27      F   Manager        3  27000
28      M    Worker        2  20000
29      M   Manager        4  32000
30      F Sales Rep        2  22900

> cast(melt(dataset))

Using Gender, Title as id variables
Aggregation requires fun.aggregate: length used as default
   Gender     Title Category Salary
1      F   Manager        4      4
2      F Sales Rep        4      4
3      F Secretary        2      2
4      M   Manager        6      6
5      M Sales Rep        4      4
6      M Secretary        4      4
7      M    Worker        6      6

The content of datalist.xls is here:
http://paste.pound-python.org/show/15098/
-- 

Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.