Hi All,

Here is the case. I want to build a classification model (SVM). Some of the
variables for this model are categorical attributes that represent words
(usually 3-10 words, the query for a Google search). For example:
search_id | query_words                | .. | result
----------+----------------------------+----+-------
1         | how,to,grow,tree           | .. | 4
2         | smartfone,htc,buy,price    | .. | 7
3         | buy,house,realty,london    | .. | 6
4         | where,to,go,weekend,cinema | .. | 4
...
As you can see, the words in a query are unordered, and the same word may occur
in several queries. The total number of unique words across all queries is
several thousand. The question is how to represent this variable (query_words)
so that it can be used as input to an SVM.
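
To make the question concrete: one common way to represent an unordered set of
words is a binary bag-of-words encoding, with one 0/1 indicator column per
unique word. The sketch below is only an illustration under that assumption,
written in plain Python; the helper names build_vocabulary and encode_query are
hypothetical, and the data are the four example queries from the table above.

```python
# Example queries copied from the table in this post.
queries = [
    "how,to,grow,tree",
    "smartfone,htc,buy,price",
    "buy,house,realty,london",
    "where,to,go,weekend,cinema",
]


def build_vocabulary(queries):
    """Map each unique word to a column index (sorted for reproducibility)."""
    words = sorted({w for q in queries for w in q.split(",")})
    return {w: i for i, w in enumerate(words)}


def encode_query(query, vocab):
    """Return a 0/1 vector with a 1 in the column of every word in the query."""
    row = [0] * len(vocab)
    for w in query.split(","):
        if w in vocab:  # words not in the vocabulary are ignored
            row[vocab[w]] = 1
    return row


vocab = build_vocabulary(queries)
X = [encode_query(q, vocab) for q in queries]  # one row per query
```

With several thousand unique words each row is almost entirely zeros, so a
sparse matrix format would probably be preferable to dense lists in practice.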

Thank you for any advice!

Alex

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.