Hello,
I've run into some challenges with my process that are a bit too
complicated for the mailing list.
Is there anyone out there, preferably a real "statistician", who is
willing to consult with me via phone/email for a few hours? I'm happy
to pay you for your time.
Thanks,
-Noah
Hi,
I'm training an SVM (C-classification from the e1071 library).
Some of the variables in my data set are nominal. Is there some
easy/automatic way to convert them to numerical representations?
Thanks,
-N
__
R-help@r-project.org mailing list
https:/
On Wed, 12 Aug 2009, Daniel Malter wrote:
Hi, you can use newvariable = as.numeric(variablename). This converts your
factors into numeric variables, but not always with the desired
result, so
make sure that you check whether "newvariable" gives you what you want.
Otherwise recoding by
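To illustrate the caveat about checking the result (a small sketch, not from the thread):

```r
# as.numeric() on a factor returns the internal level codes,
# not the original values, which is the surprise Daniel warns about.
f <- factor(c("10", "5", "20"))
as.numeric(f)                # 1 3 2 -- level codes (levels sort as "10","20","5")
as.numeric(as.character(f))  # 10 5 20 -- go via character to recover the values
```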
Hi,
The answers to my previous question about nominal variables have led me
to a more important question.
What is the "best practice" way to feed nominal variables to an SVM?
For example:
color = ("red, "blue", "green")
I could translate that into an index so I wind up with
color= (1,2,3)
Bu
PM, Steve Lianoglou wrote:
Hi,
On Aug 12, 2009, at 2:53 PM, Noah Silverman wrote:
Hi,
The answers to my previous question about nominal variables have led
me to a more important question.
What is the "best practice" way to feed nominal variables to an SVM?
For example:
color = ("
factor? i.e. foo$color <-
factor(foo$color)
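As a footnote to the factor() suggestion, a hedged sketch of how a factor expands into 0/1 dummy columns, which is what R's formula interface does internally when a model is fitted (the data frame here is made up):

```r
foo <- data.frame(color = c("red", "blue", "green", "red"))
foo$color <- factor(foo$color)
# One 0/1 indicator column per level; dropping the intercept (- 1)
# keeps a column for every level
model.matrix(~ color - 1, data = foo)
```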
On 8/12/09 2:21 PM, Achim Zeileis wrote:
> On Wed, 12 Aug 2009, Noah Silverman wrote:
>
>> Hi,
>>
>> The answers to my previous question about nominal variables have led
>> me to a more important question.
>>
>
ominal factor, but I'm
not sure.
Can anyone provide an opinion on this?
Thanks!
-N
On 8/12/09 2:21 PM, Achim Zeileis wrote:
> On Wed, 12 Aug 2009, Noah Silverman wrote:
>
>> Hi,
>>
>> The answers to my previous question about nominal variables have led
>>
Hi,
I'm developing an experiment with logistic regression.
I've come across the lrm function in the Design library.
While I understand and can use the basic functionality, there are a ton
of options that go beyond my knowledge.
I've carefully read the help page for lrm, but don't understand
Hello,
I'm using an SVM for predicting a model, but I'm most interested in the
probability output. This is easy enough to calculate.
My challenge is how to measure the relative performance of the SVM for
different settings/parameters/etc.
An AUC curve comes to mind, but I'm NOT interested
Hello,
I'm working on a model to predict probabilities.
I don't really care about binary prediction accuracy.
I do really care about the accuracy of my probability predictions.
Frank was nice enough to point me to the val.prob function from the
Design library. It looks very promising for my ne
Hello,
In my ongoing quest to develop a "best" model, I'm testing various forms
of SVM to see which is best for my application.
I have been using the SVM from the e1071 library without problem for
several weeks.
Now, I'm interested in RVM and LSSVM to see if I get better performance.
When
kind of score that measures just the accuracy?
Thanks!
-N
On 8/19/09 10:42 AM, Frank E Harrell Jr wrote:
> Noah Silverman wrote:
>> Hello,
>>
>> I'm working on a model to predict probabilities.
>>
>> I don't really care about binary prediction accuracy.
>
time. mean(label)
On 8/19/09 11:51 AM, Frank E Harrell Jr wrote:
> Noah Silverman wrote:
>> Thanks for the suggestion.
>>
>> You explained that Brier combines both accuracy and discrimination
>> ability. If I understand you right, that is in relation to binary
:50 AM, Steve Lianoglou wrote:
> Hi,
>
> On Aug 19, 2009, at 1:27 PM, Noah Silverman wrote:
>
>> Hello,
>>
>> In my ongoing quest to develop a "best" model, I'm testing various
>> forms of SVM to see which is best for my application.
>>
>
Steve,
That makes sense, except that x is a data.frame with about 70 columns.
So I don't see how it would convert to a list.
-N
On 8/19/09 12:09 PM, Steve Lianoglou wrote:
> Howdy,
>
> On Aug 19, 2009, at 2:54 PM, Noah Silverman wrote:
>
>> Hi Steve,
>>
>> N
ore
> into discrimination and calibration components (which is not in the
> software).
>
> Frank
>
>>
>> i.e. For predicted probabilities of .10 to .20 the data was actually
>> labeled true .18 percent of the time. mean(label)
>>
>>
>>
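The binned check described above (mean observed label within a band of predicted probabilities) can be sketched like this, with simulated data standing in for the real predictions:

```r
set.seed(1)
pred  <- runif(1000)            # simulated predicted probabilities
label <- rbinom(1000, 1, pred)  # simulated binary outcomes
# Observed event rate among cases predicted in [0.10, 0.20)
mean(label[pred >= 0.10 & pred < 0.20])
# The same check for every decile band at once:
tapply(label, cut(pred, seq(0, 1, 0.1)), mean)
```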
Steve,
Not sure what to do with this.
I have a data.frame. Don't know how to convert it to a list.
Does anybody else have any input on this?
On 8/19/09 12:17 PM, Steve Lianoglou wrote:
>> Steve,
>>
>> That makes sense, except that x is a data.frame with about 70
>> columns. So I don't see
;kernel' "
Any suggestions?
Thanks!
On 8/19/09 3:17 PM, David Winsemius wrote:
>
> On Aug 19, 2009, at 6:11 PM, Noah Silverman wrote:
>
>> Steve,
>>
>> Not sure what to do with this.
>>
>> I have a data.frame. Don't know how to convert it
at 6:36 PM, David Winsemius
> wrote:
>
>>
>> On Aug 19, 2009, at 6:30 PM, Noah Silverman wrote:
>>
>>> Thanks David,
>>>
>>> Then, do you have any clue why RVM or LSSVM would be generating an
>>> error?
>>
>> No.
>>>
&
Hello,
I'm attempting to evaluate the accuracy of the probability predictions
for my model. As previously discussed here, the AUC is not a good
measure as I'm not concerned with classification accuracy but
probability accuracy.
It was suggested to me that the loess function would be a good m
Hi,
I've come across a strange error when using the lrm.fit function and the
subsequent predict function.
The model is created very quickly and can be verified by printing it on
the console. Everything looks good. (In fact, the performance measures
are rather nice.)
Then, I want to use th
Thanks Marc,
My apologies to all for the unnecessary re-posting.
-Noah
On 8/21/09 9:13 AM, Marc Schwartz wrote:
> On Aug 21, 2009, at 11:02 AM, Noah Silverman wrote:
>
>> Hi,
>>
>> I've come across a strange error when using the lrm.fit function and
>&
Hello,
Frank was nice enough to point me to the val.prob function of the Design
library.
It creates a beautiful graph that really helps me visualize how well my
model is predicting probabilities.
By default, there are two lines on the graph
1) fitted logistic calibration curve
2) n
>
> require(Design)
> dd <- datadist(predprob); options(datadist='dd')
> f <- lrm(event ~ rcs(qlogis(predprob), 3))
> plot(f, predprob=NA, fun=plogis)
>
> Frank
>
>
> Noah Silverman wrote:
>> Hello,
>>
>> Frank was nice enough
Hi,
Been running the lrm model from the Design package. (Thanks Frank!)
There are some output columns that I don't quite understand.
What is "Wald Z" and then "P" which is 0 for all rows???
---
Coef S.E. Wald Z P
Intercept -2.797 0
Hi,
For fun, I'm trying to throw some horse racing data into either an svm
or lrm model. Curious to see what comes out as there are so many
published papers on this.
One thing I don't know how to do is to standardize the probabilities by
race.
For example, if I train an LRM on a bunch of
, when
training a clogit is the exact value of the strata saved as part of the
model, or is it just used for grouping?)
On 8/22/09 10:57 AM, Charles C. Berry wrote:
On Fri, 21 Aug 2009, Noah Silverman wrote:
Hi,
For fun, I'm trying to throw some horse racing data into either an
svm or l
utput options. (I can see
one that is a probability option.)
Thanks!!
-N
On 8/22/09 10:57 AM, Charles C. Berry wrote:
> On Fri, 21 Aug 2009, Noah Silverman wrote:
>
>> Hi,
>>
>> For fun, I'm trying to throw some horse racing data into either an
>> svm or lrm
the conditional logit. Chuck's
> reference didn't help me much
> with that so if you know of others, please let me know. Thanks.
>
>
>
> Mark
>
>
> On Aug 25, 2009, *Noah Silve
Hi,
I'm trying to find an easy way to do this.
I want to select the top three values of a specific column in a subset
of rows in a data.frame. I'll demonstrate.
A B C
x 2 1
x 4 1
x 3 2
y 1 5
y 2 6
y 3 8
I want the top 3 values of B from the data.fr
I only have a few values in my example, but the real data set might have
20-100 rows with A="X". So how do I pick just the three highest ones?
-N
On 8/26/09 2:46 AM, Ottorino-Luca Pantani wrote:
> df.mydata[df.mydata$A=="X" AND df.mydata$C < 2, ]
> will do
This should work - head is quite a useful summary function
>
> head(df.mydata[df.mydata$A=="X"& df.mydata$C< 2, ],3)
>
>
> Colin.
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Noah Silve
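One way to get the three largest B values within the A == "x" rows is sketched below; note that head() after filtering alone keeps the first rows, not the largest, so an explicit sort on B is needed:

```r
df.mydata <- data.frame(A = c("x", "x", "x", "y", "y", "y"),
                        B = c(2, 4, 3, 1, 2, 3),
                        C = c(1, 1, 2, 5, 6, 8))
sub <- df.mydata[df.mydata$A == "x", ]
# Sort the subset by B, largest first, then keep the top three rows
head(sub[order(sub$B, decreasing = TRUE), ], 3)
```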
Hi,
Is there a way to build up a vector, item by item. In perl, we can
"push" an item onto an array. How can we can do this in R?
I have a loop that generates values as it goes. I want to end up with a
vector of all the loop results.
In Perl it would be:
for(item in list){
result <- 2
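A minimal sketch of both idioms -- growing with c(), the closest analogue of Perl's push, and preallocating, which is much faster for long loops:

```r
# Growing a vector item by item, like Perl's push:
result <- c()
for (item in 1:5) {
  result <- c(result, item * 2)
}
result  # 2 4 6 8 10

# For long loops, preallocating and assigning by index is much faster:
result2 <- numeric(5)
for (i in 1:5) result2[i] <- i * 2
```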
y/lapply/mapply family of functions?
>
> In general, the "for" loop construct can be avoided so you don't have to
> think about messy indexing. What exactly are you trying to do?
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bou
tatistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
>
> On Wed, 26 Aug 2009, Noah Silverman wrote:
>
>> The actual process is REALLY complicated; I just gave a simple example
>> for the
Deb,
I generally run my larger R tasks on a server.
Here is my workflow.
1) Write an R script using a text editor. (There are many popular ones.)
2) FTP the R script to your server.
3) SSH into the server
4) Run R
5) Run the script that you uploaded from the R process you just started.
On 8/2
Hi,
I need a bit of guidance with the sapply function. I've read the help
page, but am still a bit unsure how to use it.
I have a large data frame with about 100 columns and 30,000 rows. One
of the columns is "group" of which there are about 2,000 distinct "groups".
I want to normalize (s
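One common per-group scaling idiom uses ave() rather than sapply (a sketch; the column names are made up):

```r
d <- data.frame(group = rep(c("a", "b"), each = 4),
                value = c(1, 2, 3, 4, 10, 20, 30, 40))
# Center and scale 'value' within each group
d$value.z <- ave(d$value, d$group,
                 FUN = function(x) (x - mean(x)) / sd(x))
```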
Hello,
I'm using the svm function from the e1071 package.
It works well and gives me nice results.
I'm very curious to see the actual coefficients calculated for each
input variable. (Other packages, like RapidMiner, show you this
automatically.)
I've tried looking at attributes for the mo
ot; for each of the
80 variables.
--
Noah
On 8/30/09 7:47 PM, Steve Lianoglou wrote:
Hi,
On Sun, Aug 30, 2009 at 6:10 PM, Noah Silverman wrote:
Hello,
I'm using the svm function from the e1071 package.
It works well and gives me nice results.
I'm very curious to see the actua
relative weight
(significance.) the SVM assigned to each variable.
On 8/31/09 12:54 AM, Achim Zeileis wrote:
> On Mon, 31 Aug 2009, Noah Silverman wrote:
>
>> Steve,
>>
>> That doesn't work.
>>
>> I just trained an SVM with 80 variables.
>> svm_mod
Hello,
I want to start testing using the MNP probit function in stead of the
lrm function in my current experiment.
I have one dependent label and two independent variables.
The lrm is simple
model <- lrm(label ~ val1 + val2)
I tried the same thing with the mnp function and got an error tha
my thought was
that they were using the same for this application.
Any thoughts?
--
Noah
On 8/31/09 5:07 PM, Achim Zeileis wrote:
> On Mon, 31 Aug 2009, Noah Silverman wrote:
>
>> Hello,
>>
>> I want to start testing using the MNP probit function in stead of the
&
I get that.
Still trying to figure out what the "multi" nominal labels they used
were. That's why I passed on the reference to the seminar summary.
On 8/31/09 5:40 PM, Achim Zeileis wrote:
> On Mon, 31 Aug 2009, Noah Silverman wrote:
>
>> Thanks Achim,
>>
&g
Since the Boltman and Chapman application didn't really
have multiple discrete choices, I'm not sure how the probit model
would. Hence my inquiry.
On 8/31/09 6:23 PM, Achim Zeileis wrote:
> On Mon, 31 Aug 2009, Noah Silverman wrote:
>
>> I get that.
>>
>> St
ken and they
are in fact predicting rank, would you please show me where that is in
their paper.
Thanks!
-N
On 8/31/09 7:17 PM, Achim Zeileis wrote:
> On Mon, 31 Aug 2009, Noah Silverman wrote:
>
>> Um. I did my research. Have been for years. I assume you're
>> referring to
Erin,
Linux supports many scripting languages.
Which language are you interested in: Perl, PHP, Bash, Python, etc???
--
Noah
On 9/2/09 10:35 PM, Erin Hodgess wrote:
> Dear R People:
>
> I know that this is off topic, but could anyone recommend a good book
> on Linux scripting please?
>
> Any he
nking along the lines of sed or awk, please.
On Thu, Sep 3, 2009 at 1:56 AM, Noah Silverman wrote:
Erin,
Linux supports many scripting languages.
Which language are you interested in: Perl, PHP, Bash, Python, etc???
--
Noah
On 9/2/09 10:35 PM, Erin Hodgess wrote:
Dear R People:
I kn
Hi,
I use the max function often to find the top value from a matrix or
column of a data.frame.
Now I'm looking to find the top 2 (or three) values from my data.
I know that I could sort the list and then access the first two items,
but that seems like the "long way". Is there some way to a
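The short idiom is to sort descending and take the first few elements, e.g.:

```r
x <- c(7, 3, 9, 1, 8)
sort(x, decreasing = TRUE)[1:3]      # 9 8 7
head(sort(x, decreasing = TRUE), 2)  # 9 8
```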
Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
>
> On Thu, 3 Sep 2009, Noah Silverman wrote:
>
>> Hi,
>>
>> I use the max function often to find
Hi,
I have a strange one for the group.
We have a system that predicts probabilities using a fairly standard svm
(e1071). We are looking at probabilities of a binary outcome.
The input data is generated by a perl script that calculates a bunch of
things, fetches data from a database, etc.
data = 6.9
2) Run with "bad" data missing = 5.5
3) Run with "correct" data = ?? (We're running now, will take a few
hours to compute.)
I might also try to plot the bad data. It would be interesting to see
what shape it has...
On 9/7/09 1:05 PM, Mark Knecht wrote:
Just for fun, I'll see if I can schedule a few hours to run the same
experiment with the training data order reversed. If I'm correct, the
results should be the same.
Thanks!
--
N
On 9/7/09 2:34 PM, Mark Knecht wrote:
> On Mon, Sep 7, 2009 at 1:22 PM, Noah Silverman
> wrot
Hi,
I'm a daily user of both mac and Linux so wanted to offer some thoughts:
1) R runs great on a Mac. There is a standard install from the cran
website that has a nice GUI built into it. You can do things like drag
files to the console and it will fill in the path name.
2) I like using B
Steve,
You make a good point. I confused 64 bit with a multi-core setup.
That said, I don't believe the pretty packaged up GUI has a 64 bit
version, just the "raw terminal" version does.
On 9/11/09 12:38 PM, Steve Lianoglou wrote:
Hi,
On Sep 11, 2009, at 3:08 PM, Noah Sil
Thanks Steve,
That's a big help.
On 9/11/09 12:48 PM, Steve Lianoglou wrote:
Hi,
On Sep 11, 2009, at 3:40 PM, Noah Silverman wrote:
Steve,
You make a good point. I confused 64 bit with a multi-core setup.
That said, I don't believe the pretty packaged up GUI has a 64 bit
ver
Hi,
Our discussions about 64 bit R has led me to another thought.
I have a nice dual core 3.0 chip inside my Linux Box (Running Fedora 11.)
Is there a version of R that would take advantage of BOTH cores??
(Watching my system performance meter now is interesting, Running R will
hold a single
Hi,
Is there an alternative to the scale function where I can specify my own
mean and standard deviation?
I've come across an interesting issue where this would help.
I'm training and testing on completely different sets of data. The
testing set is smaller than the training set.
Using the
sure that a value is transformed the same regardless of
which data set it is in.
Do I have this correct, or can anybody contribute any more to the concept?
Thanks!
--
Noah
On 9/11/09 1:10 PM, Noah Silverman wrote:
Hi,
Is there an alternative to the scale function where I can specify my
own
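For what it's worth, scale() itself accepts explicit center and scale vectors, so the training set's statistics can be reused on the test set (a sketch with simulated data):

```r
set.seed(1)
train <- matrix(rnorm(20, mean = 5), ncol = 2)
test  <- matrix(rnorm(10, mean = 5), ncol = 2)
mu  <- colMeans(train)
sdv <- apply(train, 2, sd)
# Transform the test set using the training set's mean and sd,
# so a given value maps the same way in either data set
test.scaled <- scale(test, center = mu, scale = sdv)
```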
Genius,
That certainly is much faster than what I had worked out on my own.
I looked at sweep, but couldn't understand the rather thin help page.
Your example makes it really clear.
Thank You!!!
--
Noah
On 9/11/09 1:57 PM, Gavin Simpson wrote:
> On Fri, 2009-09-11 at 13:10 -07
Hello,
I have a very unusual situation with an SVM and wanted to get the
group's opinion.
We developed an experiment where we train the SVM with one set of data
(train data) and then test with a completely independent set of data
(test data). The results were VERY good.
I found an error
inal Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Noah Silverman
Sent: Monday, September 14, 2009 1:00 PM
To: r help
Subject: [R] Strange question/result about SVM
Hello,
I have a very unusual situation with an SVM and wanted to get the
gro
Hi,
I'm not sure of the correct nomenclature or function for what I'm trying
to do.
I'm interested in calculated a logistic regression on a binary dependent
variable (True,False).
There are a few ways to easily do this in R. Both SVM and GLM work easily.
The part that I want to add is "gr
as you work through them would have to be adjusted to look "per group".
I would call this something like "grouped maximum likelihood" if I got
to make up the name.
-N
On 9/17/09 11:06 AM, (Ted Harding) wrote:
On 17-Sep-09 17:28:16, Noah Silverman wrote:
Hi,
I'
Hi,
I've been testing some models with the MCMCpack library.
I can run the process and get a nice model "object". I can easily see
the summary and even plot it.
I can't seem to figure out how to:
1) Access the final coefficients in the model
2) Turn the coefficients into a model so I can the
21/09 7:58 PM, Debabrata Midya wrote:
> Try this:
> apply(foo, 2, mean) or
> apply(foo, 2, median)
> Thanks,
> Deb
>
> >>> Noah Silverman 22/09/2009 12:34 pm >>>
> Hi,
>
> I've been testing some models with the MCMCpack library.
>
> I can
I'm a bit confused on how to use lapply with a data.frame.
For example.
lapply(data, function(x) print(x))
WHAT exactly is passed to the function? Is it each ROW in the data
frame, one by one, or each column, or the entire frame in one shot?
What I want to do is apply a function to each row in
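To answer the question directly: lapply over a data.frame iterates over its columns; for row-wise work use apply with MARGIN = 1. A quick sketch:

```r
d <- data.frame(a = 1:3, b = 4:6)
lapply(d, sum)    # a list of COLUMN results: $a is 6, $b is 15
apply(d, 1, sum)  # ROW-wise results: 5 7 9
```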
Hi,
I'm just learning about Poisson links for the glm function.
One of the data sets I'm playing with has several of the variables as
factors (i.e. month, group, etc.)
When I call the glm function with a formula that has a factor variable,
R automatically converts the variable to a series of
37
From my understanding, the exp of the prediction should be equal to the
fitted value. Here it is not. I don't understand why. Any insight?
-N
On 3/2/10 12:47 AM, (Ted Harding) wrote:
On 02-Mar-10 08:02:27, Noah Silverman wrote:
Hi,
I'm just learning about Poisson links for the g
Hi,
Looking for a function in R that can help me calculate a parameter that
maximizes the likelihood over groups of observations.
The general formula is:
p = exp(xb) / sum(exp(xb))
So, according to the formulas I've seen published, to do this "by group" is
product(p = exp(x_i * b_i) / sum(exp(
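That formula is a softmax within each group (the conditional logit likelihood). A hedged sketch of the log-likelihood one could hand to optim -- the names x, y, g and the single parameter b are made up for illustration:

```r
# x: predictor, y: 1 for the chosen row in each group (0 otherwise),
# g: group identifier
grouped.loglik <- function(b, x, y, g) {
  xb <- x * b
  # log p_i = x_i*b - log(sum over the group of exp(x_j*b))
  logp <- xb - ave(exp(xb), g, FUN = function(e) log(sum(e)))
  sum(logp[y == 1])
}
# Maximize over b (minimize the negative), e.g.:
# optim(0, function(b) -grouped.loglik(b, x, y, g), method = "BFGS")
```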
Corey,
Thanks for the quick reply.
I can't give any sample code as I don't know how to code this in R.
That's why I tried to pass along some pseudo code.
I'm looking for the best "beta" that maximize likelihood over all the
groups. So, while your suggestion is close, it isn't quite what I need.
I'm just starting to learn about GAM models.
When using the lm function in R, any factors I have in my data set are
automatically converted into a series of binary indicator (dummy) variables.
For example, if I have a data.frame with a column named color and values
"red", "green", "blue". The lm function a
rg [r-help-boun...@r-project.org] On Behalf Of
Noah Silverman [n...@smartmediacorp.com]
Sent: March 19, 2010 12:54 PM
To: r-help@r-project.org
Subject: [R] Factor variables with GAM models
I'm just starting to learn about GAM models.
When using the lm function in R, any factors I have in my d
Hello,
I'm brand new to using R. (I've been using Rapid Miner, but would like
to move over to R since it gives me much more functionality.)
I'm trying to learn how to do a conditional logit model.
My data has one dependent variable, 2 independent variables and a
"group" variable.
example:
c
e which field is the "group ID" for the subset
grouping?
3) How do I indicate which field is the label?
4) How do I indicate which fields in my dataset are for training?
Thanks!!!
On 7/16/09 12:54 PM, (Ted Harding) wrote:
> On 16-Jul-09 19:40:20, Noah Silverman wrote:
>
Hello,
I'm using the e1071 library for SVM functions.
I can quickly train an SVM with:
svm(formula = label ~ ., data = testdata)
That works well.
I want to tune the parameters, so I tried:
tune.svm(label ~ ., data=testdata[1:2000, ], gamma=10^(-6:3), cost=10^(1:2))
THIS FAILS WITH AN ERROR:
'
Hello,
I'm coming from RapidMiner, so some of the "easy" things there are a bit
difficult for me to find in R
How do I normalize data in a data frame? Ideally I want to scale the
values for each column in the range of (-1,1)
Thank You,
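A sketch of min-max rescaling every numeric column into [-1, 1]:

```r
rescale <- function(x) {
  # Map the column's observed range onto [-1, 1]
  2 * (x - min(x)) / (max(x) - min(x)) - 1
}
d <- data.frame(a = c(0, 5, 10), b = c(-2, 0, 2))
d.scaled <- as.data.frame(lapply(d, rescale))
```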
Hi,
I am testing out some things with the kernlab library.
The dataframe is 22,000 rows of 32 columns.
The command I execute is:
model <- ksvm(label ~ ., data = traindata, type="C-svc", kernel =
"rbfdot", class.weights= c("0" =1, "1" =3), kpar = "automatic", C = 10,
cross = 3, prob.model = T
Hi,
Quick question.
I'm working on training an SVM.
I have a dataframe with about 50 columns. I want to train on 46 of them.
Is there a way to say "All except columns 22,23,25 and 31"?
It would be nice to not have to do +c1 +c2 +c3 +c4, etc for all 48 columns.
Thanks!
-N
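One common workaround is to drop the unwanted columns from the data.frame itself, so that the label ~ . shorthand covers only what remains (a sketch with made-up column names; the svm call is commented out since it needs e1071):

```r
d <- data.frame(label = 1:5, c1 = rnorm(5), c2 = rnorm(5), c3 = rnorm(5))
# Drop unwanted columns by position (here column 3, i.e. c2);
# label ~ . then picks up everything that is left
d.sub <- d[, -3]
# svm(label ~ ., data = d.sub)
```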
Hi,
I'm not sure that would work for the "formula" format of an SVM function.
the idea is normally
svm(label ~ c1 + c2 +c3, data=mydata);
It doesn't work to say
svm(label ~ -c(22,23,24), data=mydata)
On 7/27/09 12:17 PM, Steve Lianoglou wrote:
> Hi,
>
> On J
Hi,
I'm switching over from RapidMiner to R. (The learning curve is steep, but
there is so much more I can do with R and it runs much faster overall.)
In RapidMiner, I can "tune" a parameter of my svm in a nice cross
validation loop. The process will print out the progress as it goes.
So for a
Hi,
This should be an easy one, but I have some trouble formatting the data
right
I'm trying to replace the column of a subset of a dataframe with the
scaled data for that column of the subset
subset(rawdata, code== "foo", select = a) <- scale( subset(rawdata,
code== "foo", select = a) )
It
That works perfectly.
Thanks!
-N
On 7/31/09 2:04 PM, Steve Lianoglou wrote:
> Hi,
>
> On Jul 31, 2009, at 4:13 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> This should be an easy one, but I have some trouble formatting the data
>> right
>>
>> I
Hello,
I'm trying to duplicate what's an easy process in RapidMiner.
In RM, we can simply use two operators:
subgroup iteration
attribute value selection (Can use a regex for the attrribute name.)
I can do this in R with a lot of code and manual steps. It would be
really nice to find
Hi,
I am reading in a dataframe from a CSV file. It has 70 columns. I do
not have any kind of unique "row id".
rawdata <- read.table("r_work/train_data.csv", header=T, sep=",",
na.strings=0)
When training an svm, I keep getting an error
So, as an experiment, I wrote the data back out to a
'row.names=FALSE' in the write.table.
>
> On Sun, Aug 2, 2009 at 5:10 PM, Noah Silverman
> wrote:
>
>> Hi,
>>
>> I am reading in a dataframe from a CSV file. It has 70 columns. I do
>> not have any kind of unique "row id".
>>
>>
can see
> what it is doing. Most likely you have a format problem, comment
> characters, or mismatched quotes.
>
> On Sun, Aug 2, 2009 at 5:24 PM, Noah Silverman
> wrote:
>
>> Jim,
>>
>> The "write.table" was simply a diagnostic step.
>>
>&g
Somehow, my data is still getting mangled.
Running the SVM gives me the following error:
"names" attribute[1994] must me the same length as the vector[1950]
Any thoughts?
-N
On 8/2/09 2:35 PM, (Ted Harding) wrote:
> On 02-Aug-09 21:10:12, Noah Silverman wrote:
>
>>
he data
after the scale command.
But, issuing the same 0 substitution AFTER the scale command makes
everything work again.
rawdata[is.na(rawdata)] <- 0
VERY strange behavior.
-N
On 8/2/09 3:57 PM, J Dougherty wrote:
> On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:
>
ve realized that's a "bad thing", so am
trying to learn R. Additionally, R seems MUCH MUCH faster.)
I'm open to ideas.
Thanks!
-N
On 8/2/09 4:14 PM, David Winsemius wrote:
>
> On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> I
Just tried your suggestion.
rawdata[is.na(rawdata), ] <- 0
It FAILS with the following error:
Error in `[<-.data.frame`(`*tmp*`, is.na(rawdata), , value = 0) :
non-existent rows not allowed
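For the record, the form that does work indexes the whole data.frame with the logical matrix directly, without the trailing row/column comma:

```r
rawdata <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
# is.na(rawdata) is a logical matrix; indexing with it (no comma)
# replaces every NA in one step
rawdata[is.na(rawdata)] <- 0
```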
14 AM, David Winsemius wrote:
>
>>
>> On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:
>>
>>> Hi,
>>>
>>> It seems as if the problem was caused by an odd quirk of the "scale"
>>> function.
>>>
>>> Some of my da
Hi,
More questions in my ongoing quest to convert from RapidMiner to R.
One thing has become VERY CLEAR: None of the issues I'm asking about
here are addressed in RapidMiner. How it handles missing values,
scaling, etc. is hidden within the "black box". Using R is forcing me
to take a much
Hi,
Thanks for the continued support.
I've been working on this all night, and have learned some things:
1) Since I'm really committed to using an SVM, I need to skip the
examples with missing data. I have a training set of approximately
22,000 examples of which about 500 have missing values