On 25/10/2014, 5:25 AM, Wush Wu wrote:
> Dear all,
> 
> Sorry, I am not sure whether I should ask this question here or on
> R-devel. Is there an existing package that implements, or is in the
> process of implementing, feature hashing or a similar function?
> 
> For those who do not know "feature hashing", here is a brief
> explanation.
> 
> Feature hashing is a technique for quickly converting a large number of
> strings to dummy variables (similar to `stats::contrasts`). For example, if
> I want to convert a character vector `x <- c("asdfa", "adsfausd", .....)` to
> dummy variables, I need to construct a mapping between the strings and the
> indices (`base::factor`). However, if `x` has many distinct elements and
> `x` itself is huge, the overhead of constructing the index is large.
> Moreover, the overhead is even larger in a distributed environment.
> 
> A good hash function can map each string to an index quickly, without
> the overhead of constructing the index. The probability of a "collision"
> might be small if we pick a good hash function. For details,
> please see http://en.wikipedia.org/wiki/Feature_hashing

The "digest" package implements several different hash functions.  You
could use the hash values as names in an environment to index arbitrary
objects associated with the values.
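
A rough sketch of both ideas, assuming the "digest" package is installed.
The helper name `hash_index`, the choice of MD5, and the 16 buckets are
illustrative assumptions, not part of any package API:

```r
library(digest)

# Hypothetical helper: map each string to a bucket in 1..n_buckets using a
# hash of the string, so no string-to-index lookup table is ever built.
hash_index <- function(x, n_buckets = 1024L) {
  n_buckets <- as.integer(n_buckets)
  vapply(x, function(s) {
    # serialize = FALSE hashes the characters themselves, not the R object.
    h <- digest(s, algo = "md5", serialize = FALSE)
    # Keep 7 hex digits (28 bits) so the value fits in an R integer.
    strtoi(substr(h, 1L, 7L), base = 16L) %% n_buckets + 1L
  }, integer(1), USE.NAMES = FALSE)
}

x <- c("asdfa", "adsfausd", "asdfa")
idx <- hash_index(x, n_buckets = 16L)

# Dummy-variable matrix: one row per element of x, one column per bucket.
# Distinct strings can land in the same bucket (a "collision").
m <- matrix(0L, nrow = length(x), ncol = 16L)
m[cbind(seq_along(x), idx)] <- 1L

# Environment keyed by hash values, used to look up arbitrary objects.
store <- new.env(hash = TRUE)
for (s in unique(x)) {
  key <- digest(s, algo = "md5", serialize = FALSE)
  assign(key, s, envir = store)
}
```

Identical strings always get the same index, so this can be applied to
separate chunks of the data independently.  A dense matrix is used only to
keep the sketch short; with many buckets a sparse matrix would likely be a
better fit.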

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
