[R] Random Forest AUC

2010-10-21 Thread vioravis
Guys, I used Random Forest on a couple of data sets to predict a binary response. In every case, the AUC on the training set comes out as 1. Is this always the case with random forests? Can someone please clarify this? I have given a simple example, first using logistic regressi

Re: [R] Random Forest AUC

2010-10-22 Thread vioravis
Thanks Max and Andy. If the Random Forest always gives an AUC of 1, isn't it overfitting? If not, how do you differentiate this from overfitting? I believe random forests are claimed to never overfit (from the following link). http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home

[R] Which - value not present

2010-11-09 Thread vioravis
I am trying to use the which function to obtain the index of a value in a dataframe. Depending on whether the value is present in the dataframe or not, I perform further operations on the dataframe. However, if the value is not present in the dataframe, I get integer(0). How do I chec
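
A minimal sketch of one way to handle the empty result. The data frame `df`, its column `id`, and the helper `find_index` are assumptions made for illustration. `which()` returns `integer(0)` when nothing matches, so testing the length of its result separates the two cases.

```r
# Hypothetical data frame; the column name `id` is an assumption.
df <- data.frame(id = c(10, 20, 30))

# Return the row index of `value`, or NA if it is absent.
find_index <- function(data, value) {
  idx <- which(data$id == value)
  if (length(idx) == 0) {
    return(NA_integer_)   # value not present: which() gave integer(0)
  }
  idx[1]                  # first matching row
}

present <- find_index(df, 20)
absent  <- find_index(df, 99)
```

If only a yes/no answer is needed, `value %in% df$id` avoids the index altogether.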

Re: [R] Which - value not present

2010-11-09 Thread vioravis
Thank you. It works fine. -- View this message in context: http://r.789695.n4.nabble.com/Which-value-not-present-tp3035455p3035575.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/

[R] 3D Binning

2011-01-21 Thread vioravis
I am trying to do binning on three variables (3d binning). The bin boundaries are specified by the user separately for each variable. I used the bin2 function in the 'ash' package for 2d binning that involves only two variables but didn't find any package for similar binning with three variables. Are t
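
One possible approach in base R, sketched under assumptions: the data, variable names, and break points below are all made up for illustration. Each variable is cut at its own user-specified boundaries, and a three-way table then holds the count for every bin combination.

```r
set.seed(1)
# Illustrative data; variable names and break points are assumptions.
x <- runif(100, 0, 10)
y <- runif(100, 0, 5)
z <- runif(100, 0, 1)

x_breaks <- c(0, 2, 5, 10)
y_breaks <- c(0, 1, 5)
z_breaks <- c(0, 0.5, 1)

# cut() assigns each value to an interval; include.lowest keeps
# values equal to the smallest break in the first bin.
bx <- cut(x, breaks = x_breaks, include.lowest = TRUE)
by <- cut(y, breaks = y_breaks, include.lowest = TRUE)
bz <- cut(z, breaks = z_breaks, include.lowest = TRUE)

# A three-way table holds the count in every (x, y, z) bin.
counts <- table(bx, by, bz)
```

`as.data.frame(counts)` turns the array into one row per bin combination if a long format is easier to work with.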

[R] 3D Binning

2011-01-25 Thread vioravis
I am trying to do binning on three variables (3d binning). The bin boundaries are specified by the user separately for each variable. I used the bin2 function in the 'ash' package for 2d binning that involves only two variables but didn't find any package for similar binning with three variables. Are t

[R] Fitting gamma and exponential Distributions with fitdist

2011-04-27 Thread vioravis
I am trying to fit gamma and exponential distributions to my data using the fitdist function in the "fitdistrplus" package and obtain the parameters along with the AIC values of the fit. However, I am getting errors with both distributions. I have given a reproducible example with the errors I

Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-27 Thread vioravis
There was a small error in the data creation step and I have fixed it as below: test <- c(895.1358,2915.7447,335.5472,1470.4022,194.5461,1814.2328, 1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022, 891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306, 837.1481,867.8533,3590.4308,1125

Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-28 Thread vioravis
Joshua, thanks for your reply. I have tried out the following scaling and it seems to work fine: scaledVariable <- (test-min(test)+0.001)/(max(test)-min(test)+0.002) The gamma distribution parameters are obtained using the scaled variable, and samples obtained from this distribution are scaled
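
A small sketch of the scaling above together with its inverse, assuming the goal is to map values back to the original units afterwards. Only the first few data values from the thread are used, and `unscale` is a hypothetical helper, not something from the original post.

```r
# A few of the values quoted earlier in the thread, for illustration.
test <- c(895.1358, 2915.7447, 335.5472, 1470.4022, 194.5461)

lo  <- min(test)
rng <- max(test) - min(test) + 0.002

# Forward transform from the post: maps the data strictly inside (0, 1).
scaledVariable <- (test - lo + 0.001) / rng

# Inverse transform (hypothetical helper): maps scaled values back.
unscale <- function(s) s * rng + lo - 0.001

restored <- unscale(scaledVariable)
```

The offsets 0.001 and 0.002 keep the smallest and largest observations away from exactly 0 and 1.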

Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-28 Thread vioravis
I tried using JMP for the same and get two distinct recommendations when using the unscaled values. When using the unscaled values, Log Normal appears to be best fit. fitdist in R is unable to provide a fit in this case. Compare Distributions ShowDistributionNumber of Parameters -2

[R] Loading a FORTRAN DLL

2011-05-03 Thread vioravis
I have a FORTRAN DLL file obtained from Compaq Visual Fortran and when I try to load the DLL into the R environment I get an error. > dyn.load("my_function.dll") "This application has failed to start because MSCVRTD.dll was not found. Re-installing this application may fix the problem." When I

Re: [R] histograms and for loops

2011-05-06 Thread vioravis
This should work!! for(i in 1:12){ xLabel <- paste("Graph",i) plotTitle <- paste("Graph",i,".jpg") jpeg(plotTitle) print(hist(zNort1[,i], freq=FALSE, xlab=xLabel, col="blue", main="Standardized Residuals Histogram", ylim=c(0,1), xlim=c(-3.0,3.0)),axes = FALSE) axis(1, col = "blue",col.axis = "bl

[R] Fortran Symbol Name not in Load Table

2011-05-09 Thread vioravis
I am trying to call a FORTRAN subroutine from R. is.loaded is turning out to be TRUE. However when I run my .Fortran command I get the following error: Error in .Fortran("VALUEAHROPTIMIZE", as.double(ahrArray), as.double(kwArray), : Fortran symbol name "valueahroptimize" not in load table

Re: [R] Fortran Symbol Name not in Load Table

2011-05-09 Thread vioravis
I used the DLL export viewer to see what name is being exported. It shows as VALUEAHROPTIMIZE_, which is the name of the function we used plus an underscore. Is there any other reason for the function not being recognized? Thanks.

[R] SQP with Constraints

2011-05-09 Thread vioravis
I am trying to optimize a function similar to the following: Minimize x1^2 - x2^2 - x3^2 st x1 < x2 x2 < x3 The constraint is that the variables should be monotonically increasing. Is there any package that implements Sequential Quadratic Programming with the ability to include these constraints?

Re: [R] Total effect of X on Y under presence of interaction effects

2011-05-11 Thread vioravis
This is what I believe is referred to as "suppression" in regression, where the correlation between the independent and the dependent variable turns out to be of one sign whereas the regression coefficient turns out to be of the opposite sign. Read here about suppression: http://www.uv

[R] Building Custom GUIs for R

2011-05-20 Thread vioravis
I am looking to build simple GUIs based on the R codes I have. The main objective is to hide the scary R codes from non-programming people and make it easier for them to try out different inputs. For example, 1. The GUI will have means to upload a csv file which will be read by the R code. 2.

Re: [R] Building Custom GUIs for R

2011-05-20 Thread vioravis
Thanks everyone. I will try out the packages you have mentioned. Ravi

[R] Fortran DLL in Spotfire

2011-05-24 Thread vioravis
I have R code that loads a FORTRAN DLL to do some calculations. The code works fine when I use it in R. But when I try it in Spotfire, it throws an error that it is unable to load the shared library and the specified DLL cannot be found. I have used "setwd" to point to the location in the spot

[R] Using read.xls

2011-05-26 Thread vioravis
I am using read.xls command from the gdata package. I get the following error when I try to read a work sheet from an excel sheet. Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, : Intermediate file 'C:\Tmp\RtmpYvLnAu\file7f06650f.csv' missing! In addition: Warning messa

[R] Text Summarization

2011-05-31 Thread vioravis
Is there a text mining/ NLP package in R that could do text summarization? For example, take a huge text as input and provide a summary of the text. In package tm, summarization is defined more as high frequency terms which is not what I want. I actually want a summary of what is present in the h

[R] Checking for combination of words in a sentence

2011-06-02 Thread vioravis
I am trying to implement some expert rules based on the presence or absence of words in a sentence. I have given a reproducible example below. In this, every time I come across the words lunch and bag in the same sentence, the outcome would be 1. If lunch and pack are in the same sentence, then the
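
A minimal sketch of the "both words present" check, assuming whole-word, case-insensitive matching is what is wanted. The helper `has_all_words` and the example sentences are made up for illustration; only the lunch-and-bag rule from the post is shown.

```r
# Hypothetical helper: TRUE when every word in `words` occurs in `sentence`
# as a whole word (case-insensitive).
has_all_words <- function(sentence, words) {
  hits <- vapply(
    words,
    function(w) grepl(paste0("\\b", w, "\\b"), sentence,
                      ignore.case = TRUE, perl = TRUE),
    logical(1)
  )
  all(hits)
}

sentences <- c("I carry my lunch in a bag", "Lunch was late today")
lunch_bag <- vapply(sentences, has_all_words, logical(1),
                    words = c("lunch", "bag"), USE.NAMES = FALSE)
outcome <- as.integer(lunch_bag)   # 1 where both words are present
```

Further rules can reuse the same helper with a different `words` vector.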

Re: [R] append date to write csv filename

2011-06-03 Thread vioravis
You could use the paste function to define the filename with date appended to it. See the example below: currentDate <- Sys.Date() csvFileName <- paste("C:/R/Remake/XPX",currentDate,".csv",sep="") write.csv(S1X.sub, file=csvFileName)

Re: [R] R program writing standard/practices

2011-06-10 Thread vioravis
Check this out: http://www1.maths.lth.se/help/R/RCC/

[R] Problem with loading the Snowball package

2011-01-31 Thread vioravis
I tried using the "Snowball" package for performing stemming in text mining. But when I tried to load the package the following error is thrown: Error : .onLoad failed in loadNamespace() for 'Snowball', details: call: NULL error: .onLoad failed in loadNamespace() for 'rJava', details: call:

Re: [R] 3D Binning

2011-01-31 Thread vioravis
This worked fine. Thanks.

[R] Zero Inflated Distributions

2011-03-04 Thread vioravis
I am currently fitting the following distributions using JMP and looking for ways to fit the same distributions in R: Zero Inflated Lognormal Zero Inflated Loglogistic Zero Inflated Frechet Zero Inflated Weibull Threshold Frechet Threshold Loglogistic Threshold Lognormal Log Generalized Gamma Thre

Re: [R] Zero Inflated Distributions

2011-03-04 Thread vioravis
Thanks, Thierry. Has anyone used the "bayescount" for estimating zero inflated distributions? It states that it is a "crude function". Does that mean the estimates are only approximate??? The example they have given seems to work only with Gamma Poisson. data <- rpois(100, rgamma(100, shape=1,

Re: [R] Zero Inflated Distributions

2011-03-07 Thread vioravis
Any help on this would be appreciated. Thank you.

[R] Seasonality in STL Decomposition

2011-03-10 Thread vioravis
I am having issues interpreting the results of STL decomposition. The following is the data used as well as the decomposed seasonality, trend and remainder components. The data are weekly. The original data doesn't appear to be seasonal. But there seems to be a periodic peak in the seasonal c

Re: [R] flow map lines between point pairs (latitude/longitude)

2011-03-17 Thread vioravis
I am working on a similar problem. I have to add two columns: one containing the US state to which the origin belongs and another containing the state to which the destination belongs. All I have is the latitude and the longitude of the origin and destination. Are there any packages in R that can do

[R] Optimizing a nested function

2011-04-11 Thread vioravis
I am trying to optimize a nested function using nlminb. This throws out an error that y is missing. Can someone help me with the correct syntax?? Thank you. test1 <- function(x,y) { sum <- x + y return(sum) } test2 <- function(x,y) { sum <- test1(x,y) sumSq <- sum*sum return(sumSq) } n
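
The call in the post is cut off, so the exact cause is not visible. One common reason for a "y is missing" error is that `nlminb()` passes all parameters to the objective as a single numeric vector. A sketch under that assumption, with a hypothetical wrapper `objective` and arbitrary starting values:

```r
test1 <- function(x, y) x + y
test2 <- function(x, y) test1(x, y)^2

# nlminb() passes all parameters as one numeric vector, so wrap the
# two-argument function in an objective that unpacks that vector.
objective <- function(p) test2(p[1], p[2])

fit <- nlminb(start = c(1, 1), objective = objective)
```

If `y` is meant to stay fixed rather than be optimized, it can instead be supplied as an extra named argument, since `nlminb()` forwards `...` to the objective.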

Re: [R] Integrate na.rm in own defined functions

2011-04-20 Thread vioravis
This should work!! rmse <- function(x) { squared <- x^2; sqrt(mean(squared, na.rm = TRUE)) }

[R] Distance between a vector and matrix rows

2011-08-08 Thread vioravis
I am trying to find the distance between a vector and each row of a dataframe. I am using the function "distancevector" in the package "hopach" as follows: mydata<-as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0),nrow=2)) V1 V2 V3 V4 V5 1 1 1 0 1 1 2 1 1 1 1 0 vec <- c(1,1,1,1,1) d2<-distan

Re: [R] Distance between a vector and matrix rows

2011-08-08 Thread vioravis
Thank you both for your reply. I went with the cosine function for similarity and used it with apply to get a measure of distance. Ravi
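
The post does not say which cosine function was used, so the following is only a sketch with a hand-written helper, `cosine_sim`, applied row by row to the data from the original question. A Euclidean distance is included for comparison.

```r
mydata <- as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0), nrow = 2))
vec <- c(1, 1, 1, 1, 1)

# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

m <- as.matrix(mydata)

# apply() over rows (MARGIN = 1) compares each row with `vec`.
similarity <- apply(m, 1, cosine_sim, b = vec)
cos_dist   <- 1 - similarity

# Euclidean distance from each row to `vec`, for comparison.
euclid <- sqrt(rowSums(sweep(m, 2, vec)^2))
```

Both rows differ from `vec` in exactly one position, so they end up at the same distance under either measure.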

[R] findFreqTerms vs minDocFreq in Package 'tm'

2011-09-11 Thread vioravis
I am using the 'tm' package for text mining and facing an issue with finding the frequently occurring terms. From the definition it appears that findFreqTerms and minDocFreq are equivalent commands and both try to identify the documents with terms appearing more than a specified threshold. However, I

Re: [R] findFreqTerms vs minDocFreq in Package 'tm'

2011-09-12 Thread vioravis
Thanks, Bettina.

[R] SVD Memory Issue

2011-09-13 Thread vioravis
I am trying to perform Singular Value Decomposition (SVD) on a Term Document Matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the DTM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am us

[R] Large Test Datasets in R

2012-06-24 Thread vioravis
I am looking for some large datasets (10,000 rows & 100,000 columns or vice versa) to create some test sets. I am not concerned about the individual elements since I will be converting them to binary (0/1) by using arbitrary thresholds. Does any R package provide such big datasets? Also, what
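
Since the individual values do not matter, one option is to simulate the data. A sketch with dimensions scaled down for illustration; the uniform draws and the 0.5 threshold are arbitrary assumptions.

```r
set.seed(42)
n_rows <- 100      # scaled down from 10,000 for illustration
n_cols <- 1000     # scaled down from 100,000 for illustration

# Draw uniform values, then threshold them to 0/1 (threshold is arbitrary).
raw <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
threshold <- 0.5
binary <- (raw > threshold) * 1L   # integer 0/1 matrix
```

At the full size, 10,000 x 100,000 is 1e9 cells, which is roughly 4 GB as an integer matrix and 8 GB as doubles, so memory is worth checking before scaling up.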

[R] Skipping lines and incomplete rows

2012-07-09 Thread vioravis
I have a text file that has semi-colon separated values. The table is nearly 10,000 by 585. The file looks as follows: *** First line: Skip this line Second line: skip this line Third line: skip this line variable1 Variable2 Variable3 Variable4
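
A sketch of reading such a file with `read.table()`, assuming three leading lines to skip, a header row, semicolon separators, and some rows with fewer fields than the header. The inline text below stands in for the real file and is made up for illustration.

```r
# Stand-in for the file contents; the real file would be passed via `file =`.
lines <- c(
  "First line: skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1;variable2;variable3",
  "1;2;3",
  "4;5",          # incomplete row
  "7;8;9"
)

dat <- read.table(
  text   = paste(lines, collapse = "\n"),
  sep    = ";",
  skip   = 3,      # drop the three leading lines
  header = TRUE,   # next line holds the column names
  fill   = TRUE    # pad short rows with NA instead of failing
)
```

For a file on disk, replacing the `text =` argument with the file path should be the only change needed.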

Re: [R] Skipping lines and incomplete rows

2012-07-10 Thread vioravis
Thanks a lot Rui and Arun. The methods work fine with the data I gave, but when I tried the two methods with the following semi-colon separated data using sep = ";", only the first 3 columns are read properly; the rest of the columns are either empty or NAs. **

Re: [R] Skipping lines and incomplete rows

2012-07-11 Thread vioravis
Thanks a lot for the guidance. I have another text file with a time stamp and an empty column as given below: First line: Skip this line Second line: skip this line Third line: skip this line variable1

[R] Merging on Datetime Column

2012-07-13 Thread vioravis
I have the following dataframe with the first column being of type datetime: dateTime <- c("10/01/2005 0:00", "10/01/2005 0:20", "10/01/2005 0:40", "10/01/2005 1:00", "10/01/2005 1:20") var1 <- c(1,2,3,4,5) var2 <- c(10,20,30,40,50) df <- dat
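
The rest of the post is cut off, so the exact problem is not visible. A sketch of one common pattern: parse the strings to `POSIXct` in both tables before merging. The second data frame `df2` and the month/day/year reading of the dates are assumptions.

```r
dateTime <- c("10/01/2005 0:00", "10/01/2005 0:20", "10/01/2005 0:40",
              "10/01/2005 1:00", "10/01/2005 1:20")
df1 <- data.frame(dateTime = dateTime, var1 = c(1, 2, 3, 4, 5),
                  stringsAsFactors = FALSE)
# Second table is hypothetical: it covers only some of the timestamps.
df2 <- data.frame(dateTime = dateTime[c(1, 3, 5)], var3 = c(7, 8, 9),
                  stringsAsFactors = FALSE)

# Assumption: the strings are month/day/year; use "%d/%m/%Y" otherwise.
parse_dt <- function(s) as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")
df1$dateTime <- parse_dt(df1$dateTime)
df2$dateTime <- parse_dt(df2$dateTime)

# all.x = TRUE keeps every row of df1; unmatched rows get NA in var3.
merged <- merge(df1, df2, by = "dateTime", all.x = TRUE)
```

Using the same format string and time zone on both sides matters, since the merge matches on the parsed instants rather than on the text.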

[R] Extracting certain text using tm package

2011-06-26 Thread vioravis
I have used "tm" package to import a set of text documents using the following command: text <- Corpus(DirSource("."),readerControl = list(language ="ansi")) I would like to extract only a certain portion of the text in each document using certain keywords. For example, I would like to include al

[R] Crosstab with Average and Count

2012-07-20 Thread vioravis
I have the following data: x <- as.factor(c(1,1,1,2,2,2,3,3,3)) y <- as.factor(c(10,10,10,20,20,20,30,30,30)) z <- c(100,100,NA,200,200,200,300,300,300) I could create the cross tab of x and y with Sum of z as its elements using the xtabs function as follows: # X Vs. Y with Sum Z xtabs(z ~ x +
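
A sketch of the average and count versions using `tapply()` on the data from the post. Counting only non-missing values of z is an assumption about what "count" should mean here.

```r
x <- as.factor(c(1,1,1,2,2,2,3,3,3))
y <- as.factor(c(10,10,10,20,20,20,30,30,30))
z <- c(100,100,NA,200,200,200,300,300,300)

# Mean of z in each (x, y) cell, ignoring missing values.
avg <- tapply(z, list(x = x, y = y), mean, na.rm = TRUE)

# Number of non-missing z values in each (x, y) cell.
cnt <- tapply(!is.na(z), list(x = x, y = y), sum)
```

Cells with no observations come back as NA in both tables, unlike `xtabs()`, which fills them with 0.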

[R] Automatic Labeling of Document Clusters

2011-11-14 Thread vioravis
I am performing document clustering on a set of documents using R. I performed hierarchical clustering using hclust and have identified the cluster corresponding to each data point. I would like to label each cluster automatically in order to identify the top keywords associated with each cluster.

[R] Spatial Statistics using R

2011-11-16 Thread vioravis
I am looking for online courses to learn Spatial Statistics using R. Statistics.com is offering an online course in December on the same topic but that schedule doesn't suit mine. Are there any other similar modes for learning spatial statistics using R? Can someone please advise? Thank you.

Re: [R] Spatial Statistics using R

2011-11-17 Thread vioravis
Thanks, Raphael. Just checked their website. It appears that they currently do not have any online courses planned.

Re: [R] Spatial Statistics using R

2011-11-17 Thread vioravis
Thanks a lot for the guidance. I will take a look at these options. Ravi

[R] HTML Forms to R

2011-12-06 Thread vioravis
I currently have an R function that reads a csv file, does some computations, produces some plots and writes a csv file as output. I would like to use HTML forms to make a user interface for calling appropriate parts of the functions (reading csv file, doing computations, displaying plots and writin

[R] Sequential Sum in R

2011-12-06 Thread vioravis
I am trying to code the following excel formula in R.
a  b   c    Result  Formula
1  10  0.1  #N/A    IF(B2<20,NA(),C2+IF(ISERROR(D1),0,D1))
2  20  0.2  0.2     IF(B3<20,NA(),C3+IF(ISERROR(D2),0,D2))
3  30
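
A sketch of the rule as read from the visible formula: the result is NA when b is below 20, otherwise c plus the previous result, with a previous NA counted as 0. The helper `sequential_sum`, the column names, and the rows beyond the two complete ones shown are made up for illustration.

```r
# Hypothetical helper implementing the spreadsheet rule with a loop.
sequential_sum <- function(b_col, c_col, threshold = 20) {
  result <- rep(NA_real_, length(b_col))
  prev <- 0                        # previous result, 0 if it was NA
  for (i in seq_along(b_col)) {
    if (b_col[i] >= threshold) {
      result[i] <- c_col[i] + prev
      prev <- result[i]
    } else {
      prev <- 0                    # NA row: the running sum restarts
    }
  }
  result
}

b_col <- c(10, 20, 30, 10, 25)     # illustrative values
c_col <- c(0.1, 0.2, 0.3, 0.4, 0.5)
result <- sequential_sum(b_col, c_col)
```

The first two entries reproduce the #N/A and 0.2 shown in the table above.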

[R] Conditionally adding a constant

2012-01-02 Thread vioravis
I am trying to add a constant to the previous value of a variable based on certain conditions. Maybe there is a simple way to do this that I am missing completely. I have given an example below: df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA)) > df x y 1 1 10 2 2 20 3 3 30 4 4 NA 5 5

[R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
I am using the tm package to do text mining: I have a huge list of stopwords (2000+) that are in a csv file. I read it as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When I try removing the stopwords using tr1=tm_

Re: [R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
The following for loop does the work but it takes a good 30 minutes to run: for(i in 1:length(myStopwords)) { currentWord <- myStopwords[i] tr1=tm_map(tr1,removeWords,currentWord) } Are there any faster alternatives? Thank you. Ravi
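
One idea that may reduce the number of passes is to remove the words in chunks instead of one at a time. The sketch below shows the chunking in base R on a plain character vector rather than a tm corpus. The helper `remove_words_chunked` and the chunk size are assumptions, and the words are assumed to contain no regular-expression metacharacters.

```r
# Hypothetical helper: remove whole-word matches in chunks so that
# each regular expression stays reasonably small.
remove_words_chunked <- function(text, words, chunk_size = 500) {
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    pattern <- paste0("\\b(", paste(chunk, collapse = "|"), ")\\b")
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  # Collapse the whitespace left behind by removed words.
  trimws(gsub("\\s+", " ", text))
}

cleaned <- remove_words_chunked("the cat sat on the mat",
                                c("the", "on"), chunk_size = 1)
```

With a corpus, the same chunks could presumably be passed to `tm_map(tr1, removeWords, chunk)` in place of single words, though that variant has not been tested here.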

[R] Min Frequency in findFreqTerms

2011-11-09 Thread vioravis
I am using 'tm' package for text mining. I use the function findFreqTerms to obtain the frequent words based on their frequency in the term document matrix. The following is the example given in the help page of this function: library("tm") data("crude") tdm <- TermDocumentMatrix(crude) findFreqT

[R] Removing numbers from a list

2011-11-10 Thread vioravis
I am using gsub to remove numbers for each element of a list. Code is given below. testList <- list("this contains a number 1000","this does not contain") removeNumbers <- function(X) { gsub("\\d","",X) } outputList <- lapply(testList,removeNumbers) However, when I try to find the

[R] Error in prune in the rEMM package

2012-03-15 Thread vioravis
I am trying to use the rEMM package for Extensible Markov Models. I tried the following sequence of code: emmt=EMM(measure="euclidean",threshold=0.75,lambda=0.001) emmt=build(emmt,data) new_threshold=sum(cluster_counts(emmt))*0.002 emmt_new=prune(emmt,new_threshold) However, I get the following

[R] Word Count

2012-04-10 Thread vioravis
I have a sentence like the following: sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working" I would like to get the total count of working and not working as Working = 2 and Not Working = 1. Can someone help with how this can be done in R? Thank you. Ravi
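
A base R sketch for this sentence: count every occurrence of "working", count the occurrences of "not working", and subtract. The helper `count_matches` is made up for illustration and does exact, case-sensitive matching.

```r
sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working"

# Count non-overlapping fixed-string matches; gregexpr() returns -1 if none.
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

not_working <- count_matches("not working", sentence)
working     <- count_matches("working", sentence) - not_working
```

The subtraction is needed because every "not working" also contains "working".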

[R] htmlParse Error

2012-05-21 Thread vioravis
I am trying to parse a webpage using the htmlParse command in XML package as follows: library(XML) u = "http://en.wikipedia.org/wiki/World_population"; doc = htmlParse(u) I get the following error: Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_populat