Hello,
I need some help with data cleaning in R. My CSV file looks as
follows.
"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,
I need to reformat it as follows.
id  gender  age  category         rank
1   Male    22   movies           1
1   Male    22   music            2
1   Male    22   travel           3
1   Male    22   cloths           4
1   Male    22   grocery          5
1   Male    22   books            NA
1   Male    22   rent             NA
1   Male    22   fuel             NA
1   Male    22   utility          NA
1   Male    22   online-shopping  NA
...
5   Female  22   movies           NA
5   Female  22   music            NA
5   Female  22   travel           NA
5   Female  22   cloths           NA
5   Female  22   grocery          NA
5   Female  22   books            NA
5   Female  22   rent             1
5   Female  22   fuel             NA
5   Female  22   utility          NA
5   Female  22   online-shopping  2
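So far, my efforts are as follows.
library(reshape2)  # provides melt()
library(sqldf)     # provides sqldf()
mini <- read.csv("~/MS/coding/mini.csv", header=FALSE)
mini_clean <- mini[-1,]  # drop the header row, which was read in as data
df_mini <- melt(mini_clean, id.vars=c("V1","V2","V3"))  # wide to long
sqldf('select * from df_mini order by "V1"')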
Now I want to know the best way to fill in the missing categories for
every user, so that each user has one row per category, with rank NA
where the user did not list that category.
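
To make the goal concrete, here is a minimal base-R sketch of one way this
might be done. It is only an illustration under assumptions: the toy data
frame stands in for the CSV (three category columns instead of ten), the
file would presumably be read with header=TRUE so the columns keep their
names, and the object names long, grid and result are made up for this
example.

```r
# Toy stand-in for the CSV (assumption: read with header = TRUE)
mini <- data.frame(
  id        = c(1, 2),
  gender    = c("Male", "Female"),
  age       = c(22, 27),
  category1 = c("movies", "rent"),
  category2 = c("music", "fuel"),
  category3 = c("travel", NA),
  stringsAsFactors = FALSE
)

cat_cols <- grep("^category", names(mini), value = TRUE)

# Wide to long: unlist() stacks the category columns one after another,
# so ids repeat once per column and rank is the column position.
long <- data.frame(
  id       = rep(mini$id, times = length(cat_cols)),
  rank     = rep(seq_along(cat_cols), each = nrow(mini)),
  category = unlist(mini[cat_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the empty slots (NA or "" depending on how the file was read)
long <- long[!is.na(long$category) & long$category != "", ]

# Every user crossed with every category seen anywhere in the data
grid <- expand.grid(id = mini$id,
                    category = unique(long$category),
                    stringsAsFactors = FALSE)
grid <- merge(mini[c("id", "gender", "age")], grid, by = "id")

# Left join: categories a user never listed get rank NA
result <- merge(grid, long, by = c("id", "category"), all.x = TRUE)
result <- result[order(result$id, result$rank),
                 c("id", "gender", "age", "category", "rank")]
print(result)
```

The left join (all.x = TRUE) is what fills the gaps: the grid supplies one
row per user and category, and only the pairs present in long receive a
rank. order() places NA last by default, so each user's ranked categories
come first.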
Thanks
Nash