My goal is to create a dissimilarity matrix with the daisy() function from
the cluster package in R and then apply k-medoid clustering for customer
segmentation. The dataset is a data.frame with 133,153 observations of 35
variables, containing numerical and categorical variables, blank cells and
missing values. Missing values are NA, while a blank cell has nothing in it
at all.
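The blank-versus-NA distinction matters for daisy(): NA values are skipped pair by pair, but an empty string in a categorical column is just another category. A minimal sketch, assuming the data come from a CSV file, of unifying the two at import time with read.csv()'s na.strings argument. The tiny CSV text is made up for illustration.

```r
# Made-up CSV text: row 1 has an empty Age field, row 2 has an explicit
# NA for Age and an empty Gender field.
csv_text <- "user_id,Age,Gender\n12945,,Male\n12990,NA,\n13160,25-34,Male"

# Default na.strings = "NA": empty fields in text columns stay as "",
# which becomes an ordinary factor level.
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(raw$Gender)      # "" "Male"

# Treat both "" and "NA" as missing values.
clean <- read.csv(text = csv_text, na.strings = c("", "NA"),
                  stringsAsFactors = TRUE)
sum(is.na(clean$Age))   # 2
```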

Here is my session information:

> sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

I have 35 variables; here are the first 5:

> head(df)
      user_id   Age Gender Household.Income Marital.Status
    1   12945         Male
    2   12947         Male
    3   12990
    4   13160 25-34   Male        100k-125k         Single
    5   13195         Male         75k-100k         Single
    6   13286
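A small reproducible version of the problem helps isolate whether daisy() itself works on this kind of data. A minimal sketch with the same column names as head(df) but invented values (blank cells stored as NA). Unlike the script below, it drops user_id, on the assumption that it is an identifier rather than a customer attribute, and it lets daisy() infer the variable types instead of forcing ordratio.

```r
library(cluster)

# Hypothetical toy data: column names from head(df), values invented.
toy <- data.frame(
  user_id          = 1:6,
  Age              = factor(c("25-34", NA, "35-44", "25-34", NA, "45-54")),
  Gender           = factor(c("Male", "Male", "Female", "Male", "Female", NA)),
  Household.Income = factor(c(NA, "75k-100k", "100k-125k", "100k-125k",
                              "75k-100k", "75k-100k")),
  Marital.Status   = factor(c("Single", NA, "Married", "Single", "Single",
                              "Married"))
)

# Drop the ID column. Factors are compared as nominal variables (0 = same,
# 1 = different), and NA values are skipped for each pair of rows.
d <- daisy(toy[, -1], metric = "gower")
round(as.matrix(d)[3, 4], 2)  # 0.75: 3 of the 4 shared variables differ
```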

Since the Windows computer has only 3 GB of RAM, I increased the virtual
memory to 100 GB, hoping that would be enough to create the matrix, but it
didn't work. I've looked into other R packages for getting around the memory
problem, but they don't fit my data. I cannot use `bigmemory` with the
`biganalytics` package because they only accept numeric matrices. The
`clara()` function (in cluster) and the `ff` package also accept only numeric
matrices. Here's the daisy script:

# Load the csv file
> Store1 <- read.csv("/Users/name/Client1.csv", header = TRUE)

# Convert to a data.frame (read.csv() already returns one, so this is a copy)
> df <- as.data.frame(Store1)

# Raise R's memory limit (Windows only); size is in MB, so 70000 is ~68 GB
> memory.limit(size = 70000)
[1] 70000

# Load the cluster package
> library(cluster)

# Create the daisy dissimilarity matrix, using the Gower coefficient
# for mixed variables and treating all 35 columns as ordinal
# ratio-scaled variables ("ordratio")
> daisy1 <- daisy(df, metric = "gower",
                  type = list(ordratio = 1:35))
#Error: cannot allocate vector of size 66.0 Gb
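The size in the error follows directly from the number of rows: daisy() stores the lower triangle of the n x n dissimilarity matrix as n(n - 1)/2 doubles of 8 bytes each, so memory grows with the square of n. A short check, using a hypothetical helper dissim_gib() (not part of cluster), reproduces the 66.0 Gb and shows that it is more than 20 times the 3 GB of physical RAM.

```r
# Hypothetical helper: GiB needed for the n * (n - 1) / 2 dissimilarities
# (8-byte doubles) that daisy() returns. R's "Gb" in error messages means
# 1024^3 bytes.
dissim_gib <- function(n) n * (n - 1) / 2 * 8 / 1024^3

round(dissim_gib(133153), 1)     # 66, the "66.0 Gb" in the error
round(dissim_gib(5000) * 1024)   # a 5,000-row sample needs about 95 MiB
round(70000 / 1024, 1)           # memory.limit(size = 70000) is ~68.4 GB
```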


How can I fix the error?
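One commonly used workaround, not part of the original script, is the idea behind clara() applied to mixed data: run daisy() and pam() on a random sample, then assign every customer to its nearest medoid, which needs only an n x k matrix instead of n(n - 1)/2 pairs. The sketch below is hypothetical throughout (gower_to_row() and clara_mixed() are made-up names). It handles numeric and factor columns only (no ordratio), and it scales numeric columns by their full-data range, while daisy() on the sample uses the sample's range, so the two stages can differ slightly.

```r
library(cluster)

# Hypothetical helper: Gower dissimilarity from every row of `x` to the
# single reference row `ref`. Numeric columns use |x - ref| / range,
# factor columns use a 0/1 mismatch, and missing values are skipped.
gower_to_row <- function(x, ref, ranges) {
  contrib <- matrix(0, nrow(x), ncol(x))      # per-variable dissimilarities
  usable  <- matrix(FALSE, nrow(x), ncol(x))  # variables present in both rows
  for (j in seq_along(x)) {
    d <- if (is.numeric(x[[j]])) {
      abs(x[[j]] - ref[[j]]) / ranges[j]
    } else {
      as.numeric(as.character(x[[j]]) != as.character(ref[[j]]))
    }
    usable[, j] <- !is.na(d)
    contrib[usable[, j], j] <- d[usable[, j]]
  }
  rowSums(contrib) / rowSums(usable)  # NaN if no variable is usable
}

# Hypothetical CLARA-style k-medoids for mixed data: PAM on a sample,
# then nearest-medoid assignment for all rows (n x k memory, not n^2).
clara_mixed <- function(df, k, sample_size = 2000) {
  idx <- sample(nrow(df), min(sample_size, nrow(df)))
  # pam() fails if the sample's dissimilarities contain NA
  fit <- pam(daisy(df[idx, , drop = FALSE], metric = "gower"),
             k = k, diss = TRUE)
  medoids <- df[idx[fit$id.med], , drop = FALSE]
  # Full-data ranges of numeric columns (1 for factors or constant columns)
  ranges <- vapply(df, function(col) {
    r <- if (is.numeric(col)) diff(range(col, na.rm = TRUE)) else 1
    if (r > 0) r else 1
  }, numeric(1))
  dmat <- sapply(seq_len(k), function(m)
    gower_to_row(df, medoids[m, , drop = FALSE], ranges))
  list(medoids = medoids,
       clustering = max.col(-dmat, ties.method = "first"))
}

# Usage on simulated data (made up for illustration)
set.seed(42)
n <- 10000
sim <- data.frame(
  spend  = c(rnorm(n / 2, 100, 10), rnorm(n / 2, 300, 10)),
  gender = factor(sample(c("Male", "Female", NA), n, replace = TRUE)),
  income = factor(rep(c("low", "high"), each = n / 2))
)
res <- clara_mixed(sim, k = 2, sample_size = 500)
table(res$clustering, sim$income)
```

Repeating the sampling a few times and keeping the medoids with the lowest total dissimilarity, as clara() does with its `samples` argument, would make the result less dependent on a single draw.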
-- 
Scott Davis
San Jose, CA.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.