Hello,
I need some help with data cleaning in R. My CSV file looks as
follows.

"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,

I need to reformat it as follows.

id  gender  age  category         rank
1   Male    22   movies           1
1   Male    22   music            2
1   Male    22   travel           3
1   Male    22   cloths           4
1   Male    22   grocery          5
1   Male    22   books            NA
1   Male    22   rent             NA
1   Male    22   fuel             NA
1   Male    22   utility          NA
1   Male    22   online-shopping  NA
...................................
5   Female  22   movies           NA
5   Female  22   music            NA
5   Female  22   travel           NA
5   Female  22   cloths           NA
5   Female  22   grocery          NA
5   Female  22   books            NA
5   Female  22   rent             1
5   Female  22   fuel             NA
5   Female  22   utility          NA
5   Female  22   online-shopping  2

So far, my efforts are as follows.

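library(reshape2)  # provides melt()
library(sqldf)     # provides sqldf()
mini <- read.csv("~/MS/coding/mini.csv", header=FALSE)
mini_clean <- mini[-1,]  # drop the header row, which was read in as data
df_mini <- melt(mini_clean, id.vars=c("V1","V2","V3"))  # one row per user and category column
sqldf('select * from df_mini order by "V1"')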

Now I would like to know the best way to fill in all the missing
categories for every user.

Thanks
Nash
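
For the question above, one possible approach is sketched below in base R. It is a minimal sketch under stated assumptions, not a definitive answer: it assumes that the rank is the position of the category column (category1 gives rank 1, category2 gives rank 2, and so on) and that the full set of categories is every distinct value appearing anywhere in the file, spelled exactly as in the data. The names csv_text, long, grid and result are illustrative, and the sample rows are inlined only so that the example runs on its own.

```r
# Sample data inlined for a self-contained run; in practice this would be
# read.csv("~/MS/coding/mini.csv", na.strings = "", stringsAsFactors = FALSE)
csv_text <- '"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,'

mini <- read.csv(text = csv_text, header = TRUE,
                 na.strings = "", stringsAsFactors = FALSE)

cat_cols <- grep("^category", names(mini), value = TRUE)

# Long format: one row per (user, category column).
# Assumption: the column position is the rank.
long <- data.frame(
  id       = rep(mini$id, times = length(cat_cols)),
  category = unlist(lapply(mini[cat_cols], as.character), use.names = FALSE),
  rank     = rep(seq_along(cat_cols), each = nrow(mini)),
  stringsAsFactors = FALSE
)

# Keep only the slots that actually hold a category
long <- long[!is.na(long$category) & long$category != "", ]

# Every user paired with every distinct category (Cartesian product)
users <- mini[c("id", "gender", "age")]
all_categories <- data.frame(category = unique(long$category),
                             stringsAsFactors = FALSE)
grid <- merge(users, all_categories, by = NULL)

# Left join: categories a user never chose get rank NA
result <- merge(grid, long, by = c("id", "category"), all.x = TRUE)
result <- result[order(result$id, result$rank),
                 c("id", "gender", "age", "category", "rank")]
rownames(result) <- NULL

print(result)
```

The cross join followed by a left join is what fills in the missing categories: the grid guarantees one row per user and category, and the join attaches a rank only where the user actually listed that category.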
