[R] Installation of R, Sweave, ESS and [X]Emacs on Windows?
I'm trying to get R, Sweave, ESS and XEmacs or emacs all installed and working together on my Windows XP Pro system. I've got R 2.6.0 working just fine, installed from the R Windows installer. I also have CYGWIN_NT-5.1 with XEmacs 21.4 working okay. Can anyone point me to any documentation on how to bring these together so that R code typed in Xemacs can be run in R? I found the ESS installation directions here at http://ess.r-project.org/Manual/ess.html#Microsoft-Windows-installation but they seem daunting. I'm not sure that Xemacs from cygwin can work with R installed alone. Can anyone confirm that I just have to follow these directions to have everything I want? Thank you all for your help and advice. -Kevin Kevin Zembower Internet Services Group manager Center for Communication Programs Bloomberg School of Public Health Johns Hopkins University 111 Market Place, Suite 310 Baltimore, Maryland 21202 410-659-6139 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Installation of R, Sweave, ESS and [X]Emacs on Windows?
Jim and Vincent, thank you both so much. Vincent, I really appreciate the time and effort you've put into this project. I was hoping for exactly what you've provided. Thanks again.

-Kevin

-----Original Message-----
From: Vincent Goulet [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 20, 2008 12:02 PM
To: Zembower, Kevin
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Installation of R, Sweave, ESS and [X]Emacs on Windows?

Kevin,

Save yourself a lot of trouble and use my modified version of GNU Emacs, available from http://vgoulet.act.ulaval.ca/en/emacs and also linked from the ESS home page. It comes bundled with ESS and AUCTeX, so the only other things you will need to install for the purposes you mention are R itself (upgrade while you're at it, you're two versions behind) and a TeX distribution (consider TeX Live or MiKTeX). There is no need for Cygwin with this setup.

Hope this helps

---
Vincent Goulet, Associate Professor
École d'actuariat
Université Laval, Québec
[EMAIL PROTECTED]
http://vgoulet.act.ulaval.ca
[R] Newbie help with Sweave
I think I've gotten my Emacs/Sweave/R system set up correctly, thanks to Vincent and Jim, but I haven't been successful getting my first document produced. I'm trying to use one of Friedrich Leisch's examples, http://www.ci.tuwien.ac.at/~leisch/Sweave/example-1.Snw. I cut and pasted the text into a document, sweaveexample.Rnw, in Emacs. It seemed to be processed successfully with R:

> Sweave("sweaveexample.Rnw")
Writing to file sweaveexample.tex
Processing code chunks ...

You can now run LaTeX on 'sweaveexample.tex'
>

However, when I open sweaveexample.tex and process it with LaTeX in Emacs, I get this error:

ERROR: Missing \endcsname inserted.

--- TeX said ---
<to be read again>
                   \protect
l.7 \begin
          {document}
--- HELP ---
From the .log file...

The control sequence marked <to be read again> should
not appear between \csname and \endcsname.

I've tried a variety of examples, but the error messages are the same.

Can anyone point out my errors or mistakes? I've pasted in the full files below. Thanks so much for your help and advice.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139

==========
sweaveexample.tex:
==========
\documentclass[a4paper]{article}

\title{Sweave Example 1}
\author{Friedrich Leisch}

\usepackage{C:/PROGRA~1/R/R-26~1.2/share/texmf/Sweave}
\begin{document}

\maketitle

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:

\begin{Schunk}
\begin{Sinput}
> data(airquality)
> library(ctest)
> kruskal.test(Ozone ~ Month, data = airquality)
\end{Sinput}
\begin{Soutput}
        Kruskal-Wallis rank sum test

data:  Ozone by Month
Kruskal-Wallis chi-squared = 29.2666, df = 4, p-value = 6.901e-06
\end{Soutput}
\end{Schunk}
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:

\begin{center}
\includegraphics{sweaveexample-002}
\end{center}

\end{document}

==========
sweaveexample.Rnw:
==========
\documentclass[a4paper]{article}

\title{Sweave Example 1}
\author{Friedrich Leisch}

\begin{document}

\maketitle

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality)
library(ctest)
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:

\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}
Re: [R] Newbie help with Sweave
Kevin, thanks for writing. Yes, sorry, I forgot to mention that this is a Windows XP Professional system running GNU Emacs 22.1.1 (i386-mingw-nt5.1.2600) from Vincent Goulet, and the Windows version of R 2.6.2. I've pasted the sessionInfo() output from ESS inside Emacs at the end of this note.

Was your TA successful in correcting this error? How? Should I report this to R-development as something worth fixing for the next release?

Thanks, again, for your response and advice.

-Kevin

-----Original Message-----
From: Kevin E. Thorpe [mailto:[EMAIL PROTECTED]
Sent: Monday, March 24, 2008 9:01 PM
To: Zembower, Kevin
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Newbie help with Sweave

Is this on a Windows system? A TA of mine was just getting the exact same message. He tracked it down to the pathname for Sweave.sty having trouble with "Program Files" in the path.

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.864.5776  Fax: 416.864.6057

==========
> sessionInfo()
R version 2.6.2 (2008-02-08)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>
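One possible workaround for the path problem, sketched here rather than confirmed in the thread: ask Sweave to write a plain \usepackage{Sweave} instead of the full path, and then make Sweave.sty visible to TeX some other way (for instance by copying it next to the document). The stylepath argument belongs to the RweaveLatex driver in current R; whether it behaves the same in R 2.6.2 is an assumption, and the file name pathdemo.Rnw is made up for the demonstration.

```r
# Work in a scratch directory so nothing is written next to real documents.
old <- setwd(tempdir())

# A tiny Sweave source file (hypothetical name, not from the thread).
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "1 + 1",
  "@",
  "\\end{document}"
), "pathdemo.Rnw")

# stylepath = FALSE asks the driver to emit \usepackage{Sweave}
# without the absolute path to the R installation.
Sweave("pathdemo.Rnw", stylepath = FALSE, quiet = TRUE)

tex <- readLines("pathdemo.tex")
print(grep("usepackage", tex, value = TRUE))

setwd(old)
```

With this setting TeX must still be able to find Sweave.sty on its own, so the style file has to sit in the document directory or somewhere on the TeX search path.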
[R] Learning to do randomized block design analysis
We just studied randomized block design analysis in my statistics class, and I'm trying to learn how to do it in R. I'm trying to duplicate a case study example from my textbook [1]:

> # Case Study 13.2.1, page 778
> cd <- c(8, 11, 9, 16, 24)
> dp <- c(2, 1, 12, 11, 19)
> lm <- c(-2, 0, 6, 2, 11)
> table <- data.frame(Block=LETTERS[1:5], "Score changes"=c(cd, dp, lm), Therapy=rep(c("Contact Desensitization", "Demonstration Participation", "Live Modeling"), each=5))
> table
   Block Score.changes                     Therapy
1      A             8     Contact Desensitization
2      B            11     Contact Desensitization
3      C             9     Contact Desensitization
4      D            16     Contact Desensitization
5      E            24     Contact Desensitization
6      A             2 Demonstration Participation
7      B             1 Demonstration Participation
8      C            12 Demonstration Participation
9      D            11 Demonstration Participation
10     E            19 Demonstration Participation
11     A            -2               Live Modeling
12     B             0               Live Modeling
13     C             6               Live Modeling
14     D             2               Live Modeling
15     E            11               Live Modeling
> model.aov <- aov(Score.changes ~ Therapy + Error(Block), data=table)
> summary(model.aov)

Error: Block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  438.0   109.5

Error: Within
          Df Sum Sq Mean Sq F value   Pr(>F)
Therapy    2 260.93  130.47  15.259 0.001861 **
Residuals  8  68.40    8.55
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>

I don't understand why R doesn't output a value for F and Pr for the Error (Block) stratum, where my textbook shows 12.807 and 0.0015 respectively. All the other numbers match. Can these two values be recovered? Also, my text shows a total line, which R omits. Is this because it's not particularly useful?

Thanks for your suggestions and advice. Also, if I'm doing this type of problem in R inefficiently, I'd appreciate suggestions.

-Kevin

[1] An Introduction to Mathematical Statistics and Its Applications, Larsen and Marx, fourth edition.

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
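A sketch of one way to recover the block F statistic, assuming the textbook treats Block as an ordinary fixed factor: with Error(Block), the block stratum has nothing to be tested against, whereas putting Block in the model formula tests it against the residual mean square (109.5 / 8.55, which is about 12.807). The therapy labels are abbreviated here and the object names are my own, not the poster's.

```r
# Score changes from the case study: 5 blocks x 3 therapies.
scores <- data.frame(
  Block   = factor(rep(LETTERS[1:5], times = 3)),
  Therapy = factor(rep(c("CD", "DP", "LM"), each = 5)),  # abbreviated labels
  Score   = c(8, 11, 9, 16, 24,   2, 1, 12, 11, 19,   -2, 0, 6, 2, 11)
)

# Block as a plain factor, so it gets its own F test against the residual.
fit <- aov(Score ~ Block + Therapy, data = scores)
tab <- summary(fit)[[1]]
print(tab)

# The "total" line of a textbook ANOVA table is just the column sums.
total <- colSums(tab[, c("Df", "Sum Sq")])
print(total)
```

Because the design is balanced, the Therapy line is unchanged by this reformulation; only the Block line gains an F value and a p-value.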
[R] Using predict()?
I'm trying to solve a homework problem using R. The problem gives a list of cricket chirps per second and the corresponding temperatures, asks for the equation of the linear model, and then asks for the predicted temperature that produces 18 chirps per second. So far, I have:

> # Homework 11.2.1 and 11.3.3
> chirps <- scan()
1: 20
2: 16
3: 19.8
4: 18.4
5: 17.1
6: 15.5
7: 14.7
8: 17.1
9: 15.4
10: 16.2
11: 15
12: 17.2
13: 16
14: 17
15: 14.4
16:
Read 15 items
> temp <- scan()
1: 88.6
2: 71.6
3: 93.3
4: 84.3
5: 80.6
6: 75.2
7: 69.7
8: 82
9: 69.4
10: 83.3
11: 79.6
12: 82.5
13: 80.6
14: 83.5
15: 76.3
16:
Read 15 items
> chirps
 [1] 20.0 16.0 19.8 18.4 17.1 15.5 14.7 17.1 15.4 16.2 15.0 17.2 16.0 17.0 14.4
> temp
 [1] 88.6 71.6 93.3 84.3 80.6 75.2 69.7 82.0 69.4 83.3 79.6 82.5 80.6 83.5 76.3
> chirps.res <- lm(chirps ~ temp)
> summary(chirps.res)

Call:
lm(formula = chirps ~ temp)

Residuals:
     Min       1Q   Median       3Q      Max
-1.56146 -0.58088  0.02972  0.58807  1.53047

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.31433    3.10963  -0.101 0.921028
temp         0.21201    0.03873   5.474 0.000107 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9715 on 13 degrees of freedom
Multiple R-Squared: 0.6975,     Adjusted R-squared: 0.6742
F-statistic: 29.97 on 1 and 13 DF,  p-value: 0.0001067

> # From the linear model summary output above, the equation for the least squares line is:
> # y = -0.3143 + 0.2120*x or chirps = -0.3143 + 0.2120*temp
>

I can then determine the answer to the prediction, using algebra and R:

> pred_temp <- (18+0.3143)/0.2120
> pred_temp
[1] 86.3882

However, I'd like to try to use the predict() function. Since 'chirps' and 'temp' are just vectors of numbers, and not data frames, these failed:

predict(chirps.res, newdata=data.frame(chirp=18))
predict(chirps.res, newdata="chirp=18")
predict(chirps.res, newdata=18)

I then tried to turn my two vectors into a data frame. I would have bet money that this would have worked, but it didn't:

> df <- data.frame(chirps, temp)
> chirps.res <- lm(chirps ~ temp, data=df)
> predict(chirps.res, newdata=data.frame(chirps=18))

Can anyone tell me how to use predict() in this circumstance?

Thanks for your help and advice.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
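A sketch of what is probably going on, offered as a reading of the problem rather than a definitive answer: predict() maps from the predictor to the response, so for a model fitted as chirps ~ temp the newdata frame needs a column named temp, and the result is a number of chirps. Asking for the temperature that gives 18 chirps runs the line backwards, which predict() does not do, so the algebra has to be done by hand (or a separate model temp ~ chirps fitted, which is a different line). The names crickets and temp_for_18 are my own.

```r
chirps <- c(20, 16, 19.8, 18.4, 17.1, 15.5, 14.7, 17.1,
            15.4, 16.2, 15, 17.2, 16, 17, 14.4)
temp   <- c(88.6, 71.6, 93.3, 84.3, 80.6, 75.2, 69.7, 82,
            69.4, 83.3, 79.6, 82.5, 80.6, 83.5, 76.3)
crickets <- data.frame(chirps, temp)

fit <- lm(chirps ~ temp, data = crickets)

# Forward direction: newdata must name the predictor, here temp.
print(predict(fit, newdata = data.frame(temp = c(75, 85))))

# Backward direction: invert the fitted line using the stored coefficients.
b <- coef(fit)
temp_for_18 <- unname((18 - b["(Intercept)"]) / b["temp"])
print(temp_for_18)

# Sanity check: that temperature should map back to 18 chirps per second.
check <- predict(fit, newdata = data.frame(temp = temp_for_18))
print(check)
```

Using coef(fit) instead of retyping rounded coefficients avoids the small rounding difference that hand-copied values introduce.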
Re: [R] R brakes when submitting a query to MySQL
Is it your use of 'con' rather than 'con2' in dbSendQuery?

-Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Moragues
Sent: Tuesday, December 18, 2007 1:14 PM
To: r-help@r-project.org
Subject: [R] R brakes when submitting a query to MySQL

Hello,

I would like to retrieve data stored in a MySQL database, so I installed the RMySQL package. I can successfully connect to my database using the following code:

> dvr<-dbDriver("MySQL")
> con2<-dbConnect(dvr,group="exbardiv")
> mysqlDescribeConnection(con2)
  User: mmorag
  Host: localhost
  Dbname: exbardiv
  Connection type: localhost via TCP/IP
  No resultSet available

I can even see the tables in the database:

> dbListTables(con2)
[1] "agoueb"    "high_ld"   "rescue"    "sjlc_info" "sjlc_ld"   "temp"
[7] "temp_snp1" "temp_snp2"

However, when I try to query the database, R breaks:

res<-dbSendQuery(con,'select * from sjlc_ld')

Can anyone help me tune up the connection between R and MySQL?

Thank you,
Marc.

SCRI, Invergowrie, Dundee, DD2 5DA.
[R] Oddities with RSiteSearch?
[If I knew who to report this to privately, I would. Sorry to embarrass anyone who's just trying to contribute to the R-project.]

There seem to be some oddities with the RSiteSearch web page. When I enter 'RSiteSearch("console")' I'm taken to http://search.r-project.org/cgi-bin/namazu.cgi?query=console&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs.

On this page, the "How to search" link goes to http://finzi.psych.upenn.edu/namazu.html#query, which gives me a '403 Forbidden' error.

At the bottom of the page is "This search system is powered by Namazu v". The text "Namazu" is a link to http://www.namazu.org/, which, when clicked on, starts a download rather than displaying a page. Also, the email address at the bottom, [EMAIL PROTECTED], is suspicious. I realize that these last two errors might be caused by the system RSiteSearch uses to form indices, but we may want to suppress them until that organization gets them working correctly.

Thanks for your efforts in setting up the RSiteSearch system; I use it all the time.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
[R] Any tools for working with US 2000 census data?
I've been given the job of extracting some data from the United States 2000 census (files at http://www2.census.gov/census_2000/datasets/Summary_File_2/Maryland/all_Maryland.zip, 52M). I'm only interested in Census Block Groups (CBGs) located within Baltimore City, Maryland. Additionally, I only have to extract certain data fields. I think I'll be using Summary File 2.

This is my first experience working with US census data. I wasn't successful finding anything using RSiteSearch, although there were some packages with data extracted from the US 2000 census.

Are there any pre-constructed tools in R for working with this data? Does the US 2000 census data itself come packaged in R?

If there are no R tools, I'd welcome suggestions on working with this data from anyone experienced with it.

Thanks for your advice and suggestions.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
Re: [R] R on an eeePC
Doing 'RSiteSearch("eee")' yields some hits. I knew that the ASUS eeePC had come up on r-help.

-Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dr. Walter H. Schreiber
Sent: Monday, January 28, 2008 9:32 AM
To: r-help@r-project.org
Subject: [R] R on an eeePC

Dear list,

I wonder if somebody has succeeded in installing R on an eeePC (Xandros desktop). Searching via Rseek (term eeePC) and in eeePC forums (term Cran) left me without proper hits.

Best wishes,
Walter.
Re: [R] Newbie: Using R to analyse Apache logs
Raj, I've been experimenting with R to compute simple statistics from my web logs, somewhat similar to what you're describing. For instance, I'm trying to classify each unique IP or domain name requestor as 'human' or 'robot' based on the number of seconds between its requests for pages.

I've found that the easiest way to work, given my (elementary) knowledge of R and my (professional) knowledge of perl, is to run my logs through a perl program to pre-process the data before submitting it to R. The output of running my Apache web log through my perl program is tab-delimited and looks like this:

[EMAIL PROTECTED]:~/weblogstats$ ./weblogtimediff.pl access_log.20071130.sorted |head
Date         Time      Source                           TimeDiff  Type
30/Nov/2007  00:00:47  54.100.68.58.sikkanet.com        15        unknown
30/Nov/2007  00:00:48  54.100.68.58.sikkanet.com        1         unknown
30/Nov/2007  00:01:19  54.100.68.58.sikkanet.com        31        unknown
30/Nov/2007  00:01:25  54.100.68.58.sikkanet.com        6         unknown
30/Nov/2007  00:01:29  ip-61-14-181-116.asianetcom.net  15        unknown
30/Nov/2007  00:01:40  54.100.68.58.sikkanet.com        15        unknown
30/Nov/2007  00:01:41  54.100.68.58.sikkanet.com        1         unknown
30/Nov/2007  00:01:44  llf520049.crawl.yahoo.net        14        robot
30/Nov/2007  00:01:46  ip-61-14-181-116.asianetcom.net  17        unknown
[EMAIL PROTECTED]:~/weblogstats$

In this, I also make a preliminary classification into 'robot' (because it identified itself as such in the browser field), 'human' (because it submitted a text string to my internal search engine), or 'unknown'.

Unfortunately, this approach doesn't seem to be working. The distributions of both the 'humans' and the 'robots' seemed to be Poisson by inspection. I therefore created box plots of the log(mean(time intervals)), but the 'humans' and the 'robots' were indistinguishable by inspection. As this is not exactly what I'm paid to do, I just play with this in my spare time, so I haven't tried anything else yet.

If it's of general interest to this group, I'd be happy to publish my program for this. Otherwise, Raj, if you're interested, I'd be happy to send it to you privately.

One oddity I noted is that Apache logs are not always in chronological order. The date/time stamp records when the request occurred, but the entry is written to the log when the request is completed. Thus, during a long download, several shorter, later downloads may have been requested and completed before the earlier, long one is logged. I was confused by negative time differences from my program until I discovered this. I now sort my Apache log into chronological order before passing it through my program.

Hope this helps. Let me know if you have any other questions.

-Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Raj Mathur
Sent: Thursday, January 31, 2008 8:31 AM
To: r-help
Subject: [R] Newbie: Using R to analyse Apache logs

Hi,

I have a requirement to scan Apache logs and discover ``exceptions''. Exceptions can be of two types:

1. A single IP generating a large amount of traffic within a given time frame (for definable values of ``large'' and ``time frame'').

2. A single IP hitting a wide set of URLs on the server (indicates a crawler), again for definable values of ``wide''.

I'm a complete newbie to R (and to statistics), so the questions are:

- Can R help me generate graphs which would help me identify these activities?

- Has someone already done something like this? If so, where could I find it?

- If not, can someone help me with the stats (and R) part to help me achieve these objectives?

Any software that gets created as a result would be released under a FOSS license. Data massaging, tuning, etc. are not an issue. We'd be dealing with a few hundred thousand or a million records a day.

Regards,

-- Raju
--
Raj Mathur  [EMAIL PROTECTED]  http://kandalaya.org/
Freedom in Technology & Software || February 2008 || http://freed.in/
GPG: 78D4 FC67 367F 40E2 0DD5 0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/ || It is the mind that moves
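For readers wondering what the R side of this workflow might look like, here is a minimal sketch of the summary step described above: the mean time difference per source, then its log. It is not Kevin's program. The nine rows are typed in from the sample listing, and the names logs, per_source and LogMean are my own; with a real file, something like read.delim() on the perl output would presumably take the place of the hand-built data frame.

```r
# The nine sample rows from the listing, entered by hand.
sikka <- "54.100.68.58.sikkanet.com"
asia  <- "ip-61-14-181-116.asianetcom.net"
yahoo <- "llf520049.crawl.yahoo.net"

logs <- data.frame(
  Source   = c(sikka, sikka, sikka, sikka, asia, sikka, sikka, yahoo, asia),
  TimeDiff = c(15, 1, 31, 6, 15, 15, 1, 14, 17),
  Type     = c(rep("unknown", 7), "robot", "unknown")
)

# Mean seconds between requests for each source, keeping its classification.
per_source <- aggregate(TimeDiff ~ Source + Type, data = logs, FUN = mean)

# Log of the mean interval, the quantity that was box-plotted by type.
per_source$LogMean <- log(per_source$TimeDiff)
print(per_source)
```

From here, boxplot(LogMean ~ Type, data = per_source) would draw the comparison by type, though nine rows are far too few to show anything.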
Re: [R] Sampling
Would this work:

g<-sample(rep(LETTERS[1:2],12), 24, replace=F)

HTH

-Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Judith Flores
Sent: Tuesday, February 05, 2008 1:52 PM
To: RHelp
Subject: [R] Sampling

Hi there,

I want to generate different samples using the following code:

g<-sample(LETTERS[1:2], 24, replace=T)

How can I specify that I need 12 "A"s and 12 "B"s?

Thank you,

Judith
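The suggestion above works because sampling without replacement from a vector that already holds exactly 12 of each letter can only reorder it. A slightly shorter sketch of the same idea, relying on the fact that sample(x) with no size argument returns a random permutation of x:

```r
# Build the 24 labels first, then shuffle them.
labels <- rep(c("A", "B"), each = 12)
g <- sample(labels)        # random permutation, so the counts are preserved

print(g)
print(table(g))
```

Each call to sample(labels) gives a different ordering, which covers the "different samples" part of the question.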
[R] Dice simulation: Getting rep to re-evaluate sample()?
I'm trying to get R to simulate the sum of the values on 10 fair dice (yes, it's related to a homework problem, but is not the problem itself). I tried to do this:

> rep(sum(sample(1:6,100,replace=T)), times=10)
 [1] 341 341 341 341 341 341 341 341 341 341

and noticed that sum(sample()) seems to be evaluated only once. How can I overcome this, so that I get a vector of values that correspond to independent throws of 10 dice each time?

Thanks for your advice and suggestions.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
Re: [R] Dice simulation: Getting rep to re-evaluate sample()?
Thanks so much, Chuck and Mark. Here's my script to simulate 10,000 rolls of 100 fair dice to demonstrate their conformity to a normal curve:

> x<-replicate(10000, sum(sample(1:6,100,replace=T)))
> sdx<-sd(x)
> sdx
[1] 17.13966
> meanx<-mean(x)
> meanx
[1] 350.0451
> hist(x, freq=FALSE)
> curve(dnorm(x, mean=meanx, sd=sdx), add=TRUE)
>

Thanks, again, for your quick and accurate help.

-Kevin

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 1:56 PM
To: Zembower, Kevin
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Dice simulation: Getting rep to re-evaluate sample()?

See

	?replicate

which I think is what you are after.

Chuck

Charles C. Berry  (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
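The reason rep() repeats one number is that its argument is evaluated once, before rep() ever sees it, whereas replicate() re-evaluates the expression on every iteration. As a rough cross-check on the simulated mean and standard deviation reported above, the sketch below compares a fresh simulation with the theoretical values for the sum of 100 fair dice (mean 100 * 3.5, variance 100 * 35/12). The seed and the names n_dice, theory_mean and theory_sd are my own choices.

```r
n_dice <- 100
faces  <- 1:6

# Theoretical moments of the sum of n_dice independent fair dice.
theory_mean <- n_dice * mean(faces)
theory_sd   <- sqrt(n_dice * (mean(faces^2) - mean(faces)^2))

set.seed(42)  # arbitrary seed, only to make the run repeatable
x <- replicate(10000, sum(sample(faces, n_dice, replace = TRUE)))

print(c(simulated = mean(x), theory = theory_mean))
print(c(simulated = sd(x),   theory = theory_sd))
```

The simulated values should land close to the theoretical ones, in line with the figures near 350 and 17.1 in the message above.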
[R] Calculating confidence in an estimate including number of trials?
[Yes, this is related to a homework problem, but is not the problem itself.]

In my mathematical statistics class, we've just learned about properties of estimators, and I can now manually solve problems like this:

A sample of size n = 16 is drawn from a normal distribution where sigma = 10 but mu is unknown. If mu = 20, what is the probability that the estimator mu hat = Y bar will lie between 19.0 and 21.0? [1]

I solved this by converting to Z scores and using a table of cumulative values under the normal curve, and got an answer of .3108 (someone please tell me if I'm wrong).

Now I'd like to know how to use R to solve this type of problem. In all my other problems using normal curves, I used dnorm or pnorm, but neither of these includes anything regarding the number of trials. I can put the math into R after I've worked out the equation, but I wondered if there was an R function that computed this directly, in the same fashion that pnorm can compute probabilities using parameters of mean and sd.

Using help.search for 'estimator' or 'sample mean' didn't turn up anything that I recognized. Any hints on where to go looking for this?

Thanks for your help and advice.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139

[1] Introduction to Mathematical Statistics and Its Applications, Larsen and Marx, fourth ed., question 5.4.4.
Re: [R] Calculating confidence in an estimate including numberof trials?
Daniel, thanks for your suggestion. So, it's just done like this:

> pnorm(21, mean=20, sd=10/4) - pnorm(19, mean=20, sd=10/4)
[1] 0.3108435
> # OR
> pnorm(21, mean=20, sd=10/sqrt(16)) - pnorm(19, mean=20, sd=10/sqrt(16))
[1] 0.3108435
>

Thanks, again.

-Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Lakeland
Sent: Tuesday, October 16, 2007 4:35 PM
To: r-help@r-project.org
Subject: Re: [R] Calculating confidence in an estimate including numberof trials?

On Tue, Oct 16, 2007 at 04:30:48PM -0400, Zembower, Kevin wrote:

> Now I'd like to know how to use R to solve this type of problem. In all
> my other problems using normal curves, I used dnorm or pnorm, but
> neither of these includes anything regarding the number of trials.

pnorm can be used like your table of area under the normal curve. To account for the size of the sample you have to scale the variance appropriately, according to the theory you have learned in your course.

--
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan
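Since base R has no single function that takes the sample size directly, the calculation above can be wrapped in a small helper. This is only a sketch: prob_mean_between is a hypothetical name, and it assumes a normal population with known sigma, so that the sample mean is normal with standard deviation sigma / sqrt(n).

```r
# Hypothetical helper: P(lower < Ybar < upper) for the mean of n
# independent draws from a normal(mu, sigma) population.
prob_mean_between <- function(lower, upper, mu, sigma, n) {
  se <- sigma / sqrt(n)  # standard deviation of the sample mean
  pnorm(upper, mean = mu, sd = se) - pnorm(lower, mean = mu, sd = se)
}

# The textbook question: n = 16, sigma = 10, mu = 20, interval (19, 21).
p <- prob_mean_between(19, 21, mu = 20, sigma = 10, n = 16)
print(p)  # the thread reports 0.3108435 for the same calculation
```

Increasing n shrinks the standard error, so the same interval captures more probability; with n = 64 the standard error drops to 1.25.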
[R] Homework help: Is this how CI using t dist are constructed?
I'm trying to replicate some of the examples from my textbook in R (my text uses Minitab). In this problem, I'm trying to construct a 95% confidence interval for these distance measurements [1]:

> # Case Study 7.4.1, p. 483
> x <- scan()
1: 62 52 68 23 34 45 27 42 83 56 40
12:
Read 11 items
> alpha<-.95
> mean(x) + qt(c((1-alpha)/2, 1-((1-alpha)/2)), df=length(x)-1) * sd(x) / sqrt(length(x))
[1] 36.21420 60.51307
>

Are confidence intervals with the t distribution constructed using this type of equation, or am I overlooking a more concise, 'canned' approach that's already been programmed? Any suggestions on simplifying this?

Thanks for all your advice and help.

-Kevin

[1] An Introduction to Mathematical Statistics and its Applications, fourth ed., Larsen and Marx.

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
Re: [R] Homework help: Is this how CI using t dist are constructed?
Yes, exactly. In fact, I had already discovered this, too. I don't know why I didn't think of it before asking this question.

Thanks for your patience with me.

-Kevin

-Original Message-
From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 30, 2007 4:54 PM
To: Zembower, Kevin
Cc: r-help@r-project.org
Subject: Re: [R] Homework help: Is this how CI using t dist are constructed?

Zembower, Kevin wrote:
> I'm trying to replicate some of the examples from my textbook in R (my
> text uses Minitab). In this problem, I'm trying to construct a 95%
> confidence interval for these distance measurements [1]:
>
You mean like t.test(x)?

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
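To make the equivalence concrete, the following sketch compares the hand-built interval from the original post with the interval that t.test() returns in its conf.int component. The variable names conf_level, manual_ci and canned_ci are chosen here for illustration only:

```r
# Distance measurements from Case Study 7.4.1 as posted in this thread
x <- c(62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40)
conf_level <- 0.95

# Hand-built interval: mean +/- t quantile * standard error
tail_prob <- (1 - conf_level) / 2
manual_ci <- mean(x) +
  qt(c(tail_prob, 1 - tail_prob), df = length(x) - 1) * sd(x) / sqrt(length(x))

# Canned interval from the one-sample t test
canned_ci <- as.numeric(t.test(x, conf.level = conf_level)$conf.int)

print(manual_ci)  # 36.21420 60.51307
print(canned_ci)  # same two numbers
```

The conf.level argument covers other levels, for example conf.level = 0.99.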
[R] Homework help: Is this how CIs of normal distributions are computed?
I'm looking for a function in R similar to t.test() which was generously pointed out to me yesterday, but which can be used for normally distributed data. To recap yesterday:

> x <- scan()
1: 62 52 68 23 34 45 27 42 83 56 40
12:
Read 11 items
> alpha<- .05
> t.test(x)

        One Sample t-test

data:  x
t = 8.8696, df = 10, p-value = 4.717e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 36.21420 60.51307
sample estimates:
mean of x
 48.36364

What if I now mock-up my data for 100 trials:

> x100<-sample(x, 100, replace=TRUE)

I think that I should be able to use a normal distribution, because of the n>30 rule-of-thumb.

I can compute the 95% CI using:

> mean(x100) - qnorm(alpha/2)*sd(x100)/sqrt(length(x100))
[1] 51.91222
> mean(x100) + qnorm(alpha/2)*sd(x100)/sqrt(length(x100))
[1] 44.80778
> t.test(x100)

        One Sample t-test

data:  x100
t = 26.683, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 44.76383 51.95617
sample estimates:
mean of x
    48.36

>

The critical values I compute manually are close to the t.test values, which is what I expect. As the number of samples increases, the t value approaches the normal distribution value.

I thought I looked at all the other .test functions in the stats package, and didn't find one that computed results like the t.test for normal distributions. Is something similar to my 'manual' computations the way it's done in R, or have I overlooked something again?

Thanks.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
Re: [R] Homework help: Is this how CIs of normal distributions are computed?
Daniel, thanks, I should have remembered this, too; I've seen it and worked with it before.

Thanks.

-Kevin

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Lakeland
Sent: Wednesday, October 31, 2007 4:04 PM
To: r-help@r-project.org
Subject: Re: [R] Homework help: Is this how CIs of normal distributions are computed?

On Wed, Oct 31, 2007 at 03:56:37PM -0400, Zembower, Kevin wrote:
> I'm looking for a function in R similar to t.test() which was generously
> pointed out to me yesterday, but which can be used for normally
> distributed data.
...
> > x100<-sample(x, 100, replace=TRUE)
>
> I think that I should be able to use a normal distribution, because of
> the n>30 rule-of-thumb.
>
> I can compute the 95% CI using:
> > mean(x100) - qnorm(alpha/2)*sd(x100)/sqrt(length(x100))

You can compute quantiles of the particular normal distribution itself rather than transforming from the standardized normal by hand.

qnorm(c(.025,.975),mean=mean(x100),sd=sd(x100)/sqrt(length(x100)))

--
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan
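A small sketch of the two routes side by side, to show that they give the same interval. The seed is arbitrary and only makes the resample reproducible, so the numbers will differ from those in the original post. Note that qnorm(0.025) is negative, so adding it to the mean gives the lower limit:

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible
x <- c(62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40)
x100 <- sample(x, 100, replace = TRUE)

se <- sd(x100) / sqrt(length(x100))  # standard error of the mean

# Route 1: standard normal quantiles, rescaled by hand
by_hand <- mean(x100) + qnorm(c(0.025, 0.975)) * se

# Route 2: quantiles of Normal(mean(x100), se) directly
direct <- qnorm(c(0.025, 0.975), mean = mean(x100), sd = se)

print(by_hand)
print(direct)
```

Both vectors come out in (lower, upper) order because the probabilities are passed in increasing order.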
Re: [R] Homework help: Is this how CIs of normal distributions are computed?
Dan, I didn't realize that the t values were more accurate than the normal approximation for n > about 30. I may have learned (incorrectly) that the normal distribution should be used if n > 30, but now that I'm thinking about it, this may have just been computationally economical before computers.

Thanks for this thought.

-Kevin

Dan Nordlund wrote:

You could probably use the Normal distribution as an approximation under these circumstances, but why would you when you have a more accurate CI using t.test?

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
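One way to see how much the n > 30 rule of thumb gives up is to compare the 97.5% critical values of the t and normal distributions for a few sample sizes. This is only an illustrative sketch; the sample sizes are arbitrary choices:

```r
# Sample sizes to compare (arbitrary choices for illustration)
n <- c(11, 31, 100, 1000)

t_crit <- qt(0.975, df = n - 1)  # t critical value for a 95% two-sided CI
z_crit <- qnorm(0.975)           # normal critical value, the same for every n

comparison <- data.frame(n = n,
                         t_crit = t_crit,
                         z_crit = z_crit,
                         ratio = t_crit / z_crit)
print(comparison)
```

The ratio column shows how much wider the t interval is than the normal one for the same data. It is always above 1 and shrinks toward 1 as n grows, which is why the two intervals in the x100 example are close but not identical.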
Re: [R] problem with log axis
Well, here are two attempts that I would have bet on to work, but don't:

#Doesn't seem to show any line at all:
abline(a=as.numeric(r1$coefficients["(Intercept)"]), b=as.numeric(r1$coefficients["log(x)"]))

#Line doesn't match points:
abline(r1, untf=TRUE)

So much for furthering knowledge and this discussion...

-Kevin

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of R Heberto Ghezzo, Dr
Sent: Thursday, November 01, 2007 1:40 PM
To: r-help@r-project.org
Subject: [R] problem with log axis

Hello, if I do:

x <- c(0.5,1,3,6,10,20,40)
y <- 10-log(x)+rnorm(7,0,0.05)
r1 <- lm(y ~ log(x))
plot(log(x),y)
abline(r1)
# I get a nice plot with the regression line almost over the points.

but:

plot(x,y,log="x")
abline(r1)

gives me exactly the same plot for the points but the regression line is completely off! I would like the plot with the real values of X on the axis, not the log(X). Can somebody tell me why the "abline" is not correct in the second case?

Thanks
H.Ghezzo
McGill University
Montreal - Canada
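A likely explanation, as far as I understand the abline() documentation: on a plot drawn with log="x", a straight line is drawn in the transformed coordinate system, and that system uses base-10 logs, while r1 was fitted on natural logs. The slope is therefore off by a factor of log(10). The sketch below shows two possible workarounds under that assumption. The helper plot_fit and the seed are made up for this illustration:

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
x <- c(0.5, 1, 3, 6, 10, 20, 40)
y <- 10 - log(x) + rnorm(7, 0, 0.05)

r1  <- lm(y ~ log(x))    # natural-log fit, as in the original post
r10 <- lm(y ~ log10(x))  # the same model expressed in base-10 logs

# The two slopes differ by a factor of log(10); the fitted values agree
slope_ratio <- unname(coef(r10)[2] / coef(r1)[2])
print(slope_ratio)

# Hypothetical helper: real x values on a log axis, plus the fitted line
plot_fit <- function() {
  plot(x, y, log = "x")
  # Option 1: draw the fitted curve from predictions on a fine grid
  xs <- exp(seq(log(min(x)), log(max(x)), length.out = 50))
  lines(xs, predict(r1, newdata = data.frame(x = xs)))
  # Option 2: abline() with the base-10 fit (dashed), which should overlap
  abline(r10, lty = 2)
}
if (interactive()) plot_fit()
```

Option 1 does not depend on how abline() treats log axes, so it is the safer of the two if the assumption above turns out to be wrong.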
[R] Homework help: t test hypothesis testing with summarized data?
Is this how a t hypothesis test is done when I don't have the actual data, but just the summarized statistics:

> #Homework 9.2.6 [1]
> n<-31
> xbar<-3.10
> s_x<-1.469
> m<-57
> ybar<-2.43
> s_y<-1.35
> s_pooled<- (((n-1)*s_x^2) + ((m-1)*s_y^2)) / (n + m - 2)
> s_pooled
[1] 1.939521
> t_obs <- (xbar - ybar) / (s_pooled * (sqrt(1/n + 1/m)))
> t_obs
[1] 1.547951
> qt(c(.025, .975), n+m-2)
[1] -1.987934 1.987934
> # Therefore, fail to reject H0 at the 0.05 level of significance
>

Or am I again overlooking a canned procedure or an easier calculation using the t distribution?

Thank you for your continued advice and help.

-Kevin

[1] An Introduction to Mathematical Statistics and its Applications, fourth ed., Larsen and Marx.

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
Re: [R] Homework help: t test hypothesis testing with summarized data?
Peter and Moshe, thank you both for your suggestions and hints. I'm proud to say that it took me less than an hour to find my mistake:

> s_pooled <- (((n-1)*(s_x^2)) + ((m-1)*(s_y^2))) / (n+m-2)
> s_pooled
[1] 1.939521
> t_obs <- (xbar - ybar) / (sqrt(s_pooled) * (sqrt(1/n + 1/m)))
> t_obs
[1] 2.15578
> qt(c(.025, .975), n+m-2)
[1] -1.987934 1.987934
> # Therefore, reject H0 at the 0.05 level of significance.

Just to be clear about the 'homework' aspect of my questions: my homework is to work the problems out 'longhand' with just a calculator and printed tables. (In fact, 10 weeks into a 14 week course, we haven't been asked yet to use a computer.) I do this before I ask any questions regarding homework on this forum. On my own, I'm trying to answer some of the questions and examples in my textbook using R.

My 'Homework help:' subject may have been misleading. I may change it to 'Extra-credit help:' to acknowledge the academic aspect of my question but distinguish it from my homework. I used 'Homework help:' because I didn't want anyone to suspect from the nature of the questions that I was trying to sneak in a homework question without acknowledging it.

Thanks, again, for all your help for this statistics student.

-Kevin

-Original Message-
From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 07, 2007 6:50 PM
To: Zembower, Kevin
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Homework help: t test hypothesis testing with summarized data?
Zembower, Kevin wrote:
> Is this how a t hypothesis test is done when I don't have the actual
> data, but just the summarized statistics:
>
>> #Homework 9.2.6 [1]
>> n<-31
>> xbar<-3.10
>> s_x<-1.469
>> m<-57
>> ybar<-2.43
>> s_y<-1.35
>> s_pooled<- (((n-1)*s_x^2) + ((m-1)*s_y^2)) / (n + m - 2)
>> s_pooled
>>
> [1] 1.939521
>
>> t_obs <- (xbar - ybar) / (s_pooled * (sqrt(1/n + 1/m)))
>> t_obs
>>
> [1] 1.547951
>
>> qt(c(.025, .975), n+m-2)
>>
> [1] -1.987934 1.987934
>
>> # Therefore, fail to reject H0 at the 0.05 level of significance
>>
>
> Or am I again overlooking a canned procedure or an easier calculation
> using the t distribution.
>
I don't know if someone told you last time, but there's an Internet code of honor about helping with homework. Don't expect more than hints.

You're on track but there's a mistake. Here's a way of testing your result:

> x <- scale(rnorm(31))*1.469+3.10
> y <- scale(rnorm(57))*1.35+2.43
> t.test(x,y, var.equal=TRUE)

> Thank you for your continued advice and help.
>
> -Kevin
>
> [1] An Introduction to Mathematical Statistics and its Applications,
> fourth ed., Larsen and Marx.
>
> Kevin Zembower
> Internet Services Group manager
> Center for Communication Programs
> Bloomberg School of Public Health
> Johns Hopkins University
> 111 Market Place, Suite 310
> Baltimore, Maryland 21202
> 410-659-6139
>

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
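The corrected calculation and the scale() check can be combined in one place. What follows is a minimal sketch, not a canned routine from the stats package: pooled_t_from_summary is a hypothetical helper written for this illustration, and the seed is arbitrary. The quantity called s_pooled in the posts is the pooled variance, so its square root is what goes in the denominator:

```r
# Hypothetical helper: two-sample pooled t test from summary statistics
pooled_t_from_summary <- function(xbar, s_x, n, ybar, s_y, m) {
  df <- n + m - 2
  var_pooled <- ((n - 1) * s_x^2 + (m - 1) * s_y^2) / df  # pooled variance
  t_obs <- (xbar - ybar) / (sqrt(var_pooled) * sqrt(1 / n + 1 / m))
  p_value <- 2 * pt(-abs(t_obs), df)                      # two-sided p-value
  list(t = t_obs, df = df, p.value = p_value)
}

res <- pooled_t_from_summary(xbar = 3.10, s_x = 1.469, n = 31,
                             ybar = 2.43, s_y = 1.35, m = 57)
print(res$t)  # 2.15578, as in the corrected calculation above

# Cross-check with the scale() trick: build samples whose mean and sd
# match the summary statistics exactly, then run the canned test
set.seed(1)  # arbitrary; the t statistic does not depend on the seed
x <- as.numeric(scale(rnorm(31))) * 1.469 + 3.10
y <- as.numeric(scale(rnorm(57))) * 1.35 + 2.43
check <- t.test(x, y, var.equal = TRUE)
print(check$statistic)
```

The check works because scale() centers each sample to mean 0 and sd 1 before it is rescaled, so t.test() sees exactly the stated means and standard deviations whatever random numbers were drawn.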