Hi all,

Here is my case. I want to build a classification model (SVM). Some of the variables for this model are categorical attributes that represent words (usually 3-10 words: a Google search query). For example:

search_id | query_words                | .. | result
----------+----------------------------+----+-------
1         | how,to,grow,tree           | .. | 4
2         | smartfone,htc,buy,price    | .. | 7
3         | buy,house,realty,london    | .. | 6
4         | where,to,go,weekend,cinema | .. | 4
...

As you can see, the words in a query are unordered, and the same word may occur in different queries. The total number of unique words across all queries is several thousand.

The question is how to represent this variable (query_words) for use in an SVM.
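To make the question concrete: one candidate representation would be a binary bag-of-words (document-term) matrix, with one 0/1 column per unique word, so that word order is ignored. Below is a minimal base-R sketch of that idea, assuming the queries are stored as comma-separated strings. The names `queries`, `tokens`, `vocab`, and `dtm` are made up for illustration, and this is only one possible encoding, not necessarily the best one.

```r
# Hypothetical data: the four example queries from the table
queries <- c("how,to,grow,tree",
             "smartfone,htc,buy,price",
             "buy,house,realty,london",
             "where,to,go,weekend,cinema")

# Split each query into its words (word order is ignored from here on)
tokens <- strsplit(queries, ",", fixed = TRUE)

# Vocabulary: every unique word across all queries
vocab <- sort(unique(unlist(tokens)))

# Binary document-term matrix: dtm[i, j] is 1 if query i contains word j.
# vapply() returns one column per query, so transpose to one row per query.
dtm <- t(vapply(tokens,
                function(w) as.integer(vocab %in% w),
                integer(length(vocab))))
colnames(dtm) <- vocab
rownames(dtm) <- seq_along(queries)

# Inspect the size and two of the word columns
print(dim(dtm))
print(dtm[, c("buy", "to")])
```

With several thousand unique words, a dense matrix like this would be mostly zeros, so a sparse matrix format would probably be a better fit for the real data.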
Thank you for any advice!

Alex

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.