[R] Can't install or upgrade the "PKI" package on a Debian testing system
Dear list,

It seems that recent changes in openssl somehow broke the installation or update of the PKI package. This is probably specific to my setup(s) (Debian testing, updated frequently).

Copy of a mail to Simon Urbanek (PKI maintainer), sent 5 days ago without reply or acknowledgement so far:

It seems that recent changes in openssl rendered the PKI package uninstallable:

--
> install.packages("PKI")
essai de l'URL 'http://cran.univ-paris1.fr/src/contrib/PKI_0.1-3.tar.gz'
Content type 'application/x-gzip' length 31058 bytes (30 KB)
==
downloaded 30 KB
* installing *source* package ‘PKI’ ...
** package ‘PKI’ correctement décompressé et sommes MD5 vérifiées
** libs
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c asn1.c -o asn1.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c init.c -o init.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c pki-x509.c -o pki-x509.o
pki-x509.c: In function ‘PKI_extract_key’:
pki-x509.c:136:26: error: dereferencing pointer to incomplete type ‘EVP_PKEY {aka struct evp_pkey_st}’
  if (EVP_PKEY_type(key->type) != EVP_PKEY_RSA)
  ^~
pki-x509.c: In function ‘get_cipher’:
pki-x509.c:244:40: error: dereferencing pointer to incomplete type ‘EVP_CIPHER_CTX {aka struct evp_cipher_ctx_st}’
  ctx = (EVP_CIPHER_CTX*) malloc(sizeof(*ctx));
  ^~~~
pki-x509.c: In function ‘PKI_RSAkeygen’:
pki-x509.c:550:5: warning: ‘RSA_generate_key’ is deprecated [-Wdeprecated-declarations]
  rsa = RSA_generate_key(bits, 65537, 0, 0);
  ^~~
In file included from /usr/include/openssl/rsa.h:13:0, from pki.h:13, from pki-x509.c:1:
/usr/include/openssl/rsa.h:193:1: note: declared here
 DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e, void
 ^
/usr/local/sage-7/local/lib/R//etc/Makeconf:132 : la recette pour la cible « pki-x509.o » a échouée
make: *** [pki-x509.o] Erreur 1
ERROR: compilation failed for package ‘PKI’
* removing ‘/usr/local/sage-7/local/lib/R/library/PKI’
Les packages source téléchargés sont dans ‘/tmp/Rtmpnmt97E/downloaded_packages’
Warning message:
In install.packages("PKI") :
  l'installation du package ‘PKI’ a eu un statut de sortie non nul
--

This problem blocks the installation of rstanarm, brms, rsconnect and shinystan, among others. Not exactly trivial. As far as I know, my libssl libraries are all at 1.1.0c, as well as openssl.

New data point: it turns out that this problem also impedes the update of PKI: my running R installation still has PKI 0.1-3, which seems enough for rstanarm, brms, rsconnect and shinystan to run and install/upgrade. However, on a new installation of R (installed in Sage), PKI can't be installed. I tried to install PKI 0.1-3 from Rforge, with no success.

- Has anyone already had this problem?
- Has he/she been able to work around it? If so, how?
- Is my mail to Simon Urbanek sufficient as a bug report? If not, what should I do?

Sincerely yours,

Emmanuel Charpentier

PS: If possible, I'd appreciate being CC'd on your answers: I'm not on the list and have had trouble subscribing "reasonably". I'm following the list through the mail archives.
[R] Followup [Can't install or upgrade the "PKI" package on a Debian testing system]
Dear list,

Since I posted my previous message, Simon Urbanek has fixed the problem (in less than 5 hours, phewww...). The version available on Rforge *is* installable with OpenSSL 1.1.

Note that you'll have to install it with

install.packages('PKI',,'https://www.rforge.net/')

NOT

install.packages('PKI',,'http://www.rforge.net/')

which, for some inscrutable reason (runaway proxying?), fails the same way as before.

A big Kudos and Thank You to Simon Urbanek, on behalf of not-so-stable distribution users!

Emmanuel Charpentier
Re: [R] shapiro wilk normality test
This one should (I am tempted to write "must") make its way to fortune ()... Thankyouthankyouthankyou ... Emmanuel Charpentier On Mon, 14 Jul 2008 14:58:13 -0600, Greg Snow wrote : > For those people who feel the need for a p-value to test normality on > large sample sizes, I propose the following test/function: [ Snip ... ] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Possible buglet (wart ?) of odfWeave 0.7.6 (with workaround)
Dear List,

I have had problems inserting some (not all!) figures via odfWeave (using print(someLatticeFunction)...). The figure was correctly displayed in an R device window, but the resulting ODF document displayed the correct space for the figure and an empty frame with a "broken image" icon and a "read error" mention.

Exploration of the odf (.odt, in my case) file showed that the relevant files (.eps, which is a vector format much more apt than bitmaps for cutting/pasting on various media) existed in the Pictures directory of the file and displayed fine.

Then lightning struck: the picture files were named as follows: "content_", serial number, chunk name, ".eps"; it occurred to me that OpenOffice 2.4.1 (as packaged in Debian) displayed a "Read error" *when the chunk name had accented characters in it* (my system works in UTF8). Plain-ASCII chunk names work OK.

I do not know the ODF specification well enough to be sure that this is an odfWeave problem (some chunk-name mangling might be necessary) rather than an OpenOffice "infelicity" (non-implementation of a specification-requested feature): checking with another ODF-compliant word processor might be useful, but I don't have any on hand.

In any case, this problem is not obvious after re-reading the odfWeave documentation, and does not seem to have been already reported to r-help (yes, Pr Ripley, I did my homework...). This report might spare some poor non-English-writer sod some time...

For what it's worth:

> sessionInfo()
R version 2.7.1 (2008-06-23)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] odfWeave_0.7.6 XML_1.96-0 lattice_0.17-13

loaded via a namespace (and not attached):
[1] grid_2.7.1 tools_2.7.1

Hoping this might help,

Emmanuel Charpentier

PS: don't try to answer to my subscribed address (charpent (at) bacbuc (dot) dyndns (dot) org), which is the apparent source of this message: bacbuc (dot) dyndns (dot) org is down and will quite likely remain so for at least two to three more weeks. I'm playing fast and loose with Gmane by putting my registered address as a Reply-to address and using my actual present address (emm (dot) charpentier (at) free (dot) fr) as source. I do not know what will survive Gmane, Mailman and Martin's scripts mauling...
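For readers hitting the same "Read error", a minimal sketch of the workaround described above, i.e. keeping chunk names in plain ASCII (the chunk name and lattice call below are made up for illustration):

<<resume-graphique, fig=TRUE, echo=FALSE>>=
library(lattice)
print(xyplot(y ~ x, data = someData))   # displays fine in the resulting .odt
@

whereas an otherwise identical chunk named <<résumé-graphique>>= (accented characters in the name, hence in the generated picture file name) produced the broken-image frame described above.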
Re: [R] Lattice: problem using panel.superpose and panel.groups
On Sunday 17 August 2008 at 09:36, Dieter Menne wrote:

[ Snip ... ]

> Trellis graphics are a bit like hash functions: you can be close to the
> target, but get a far-off result.

Nice candidate for a fortune() entry...

Emmanuel Charpentier
Re: [R] FYI: APL in R
Le mardi 19 août 2008 à 23:52 -0700, Jan de Leeuw a écrit : > http://idisk.mac.com/jdeleeuw-Public/utilities/apl/apl.R Thank you ! I was beginning to think I was the last of the dinosaurs... ISTR that someone else on the list is also an (un)repentant APLer... > Although this is all prefix and no infix, we could easily > recreate some of the infamous APL one-liners that nobody > can possibly understand or reproduce. :-)!!! Having implemented a (primitive) stats package in APL, I know what you mean... A liberal use of lamp was ... well, enlightening... Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sending "..." to a C external
On Friday 22 August 2008 at 15:16 -0400, John Tillinghast wrote:

> I'm trying to figure this out with "Writing R Extensions" but there's not a
> lot of detail on this issue.
> I want to write a (very simple really) C external that will be able to take
> "..." as an argument.
> (It's for optimizing a function that may have several parameters besides the
> ones being optimized.)

!!! That's a hard one. I have never undertaken this kind of job, but I expect that your "..." argument, if you can reach it from C (which I don't know), will be bound to a Lisp-like structure, notoriously hard to decode in C. Basically, you'll have to create very low-level code (and duplicate a good chunk of the R parser-interpreter...).

I'd rather treat the "..." argument in a wrapper that could call the relevant C function with all arguments interpreted and bound... This wrapper would probably be an order of magnitude slower than C code, but two orders of magnitude easier to write (and maintain!). Since the "..." argument parsing would be done *once* before the "grunt work" is accomplished by C code, the slowdown would (probably) be negligible...

> I got the "showArgs" code (from R-exts) to compile and install, but it only
> gives me error messages when I run it. I think I'm supposed to pass it
> different arguments from what I'm doing, but I have no idea which ones.
>
> What exactly are CAR, CDR, and CADR anyway? Why did the R development team
> choose this very un-C-like set of commands?

'Cause they are fundamental to the Lisp-like language that is S/R. Read on...

> They are not explained much in
> R-exts.

At the risk of incurring the R-help deities' wrath:

"In the beginning, John McCarthy created Lisp 1.5, who begat MACLISP, who begat ..., who begat Scheme ...".

Notwithstanding a long inheritance story, full of sound, fury, holy wars and bastardry (sometimes evoking European royalty lines...), all Lisp-like languages bear one mark (dominant allele) of their common ancestor, which is the list structure. This structure is implemented as a pair of pointers, the first one pointing to the first element of the list, the second pointing to ... the list of the rest of the members (or nothing, aka NIL).

In McCarthy's implementation, which was machine code on an IBM 704 (we were in 1957, mind you...), such a pair was spread across different parts of a CPU word, known in the technical documentation of this dinosaur as the "address register" and the "decrement register" respectively. Thus the mnemonic names of "Content of the Address Register" and "Content of the Decrement Register" (aka "CAR" and "CDR") for these two pointers, names which were used for the (Lisp) functions allowing access to them.

Those names have stuck, mainly for historical (sentimental? hysterical?) reasons. Even when "reasonable" synonyms were introduced (e.g. "first" and "rest" in Common Lisp), all the old hands (and young dummies who wanted to emulate them) kept "car" and "cdr" close to their hearts. So, fifty-one years after McCarthy's stroke of genius, this piece of CS snobbery is still with us (and probably will be 'til 64-bit Linux time counters roll over...).

Despite their C-like syntax, their huge collection of array and array-like structures, and their loops, S and R are fundamentally Lisp-like languages. Most notably, the representation of executable code is accessible by the code itself: one can create a function which computes another function.
Lisp was the first language explicitly created for this purpose, and it is no happenstance that many (most?) such languages build on it or on one of its almost uncountable dialects. Many of R/S's "common ways of doing things" have a Lisp smell: for example, it is no accident that {s|t|l}apply(), outer() and suchlike are *way* more efficient than for(), and they are strongly reminiscent of Lisp's mapcar... BTW, this ability to "compute the language" is probably the most fundamental point distinguishing S/R from all the rest of the "statistical packages" (SPSS, Stata, SAS and the rest of the crowd...).

Now I have to say that I'm not old enough to have first-hand knowledge of this history. I was born, but not weaned, when McCarthy unleashed Lisp on an unsuspecting world ... I learned all that ca. 1978-9, while discovering VLisp.

While I can't really help you (I still think that processing "..." at C level is either hubris of the worst sort, pure folly or a desperate case), I hope to have entertained you.

Sincerely,

Emmanuel Charpentier
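To make the "wrapper" idea above concrete, a minimal sketch: parse "..." once in R, then hand the low-level routine (or an optimizer) a plain one-argument closure. The names "make_objective" and "myC_optim" below are made up for illustration:

make_objective <- function(f, ...) {
  extras <- list(...)                       # "..." is interpreted and bound here, once
  function(par) do.call(f, c(list(par), extras))
}

## usage: the compiled routine only ever sees a one-argument R function
obj <- make_objective(function(par, a, b) sum((par - a)^2) + b, a = 1:3, b = 10)
obj(c(0, 0, 0))
# .Call("myC_optim", obj, c(0, 0, 0))       # hypothetical C entry point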
Re: [R] Bootstrap inference for the sample median?
Le dimanche 30 août 2009 à 18:43 +0530, Ajay Shah a écrit : > Folks, > > I have this code fragment: > > set.seed(1001) > x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931, > 0.82529998629531, > 0.973451006822, 0.92378802164835, 0.996679563355802, > 0.943347739494445, 0.992873542980045, 0.870624707845108, > 0.935917364493788) > range(x) > # from 0.79 to 0.996 > > e <- function(x,d) { > median(x[d]) > } > > b <- boot(x, e, R=1000) > boot.ci(b) > > The 95% confidence interval, as seen with `Normal' and `Basic' > calculations, has an upper bound of 1.0028 and 1.0121. > > How is this possible? If I sample with replacement from values which > are all lower than 1, then any sample median of these bootstrap > samples should be lower than 1. The upper cutoff of the 95% confidence > interval should also be below 1. Nope. "Normal" and "Basic" try to adjust (some form of) normal distribution to the sample of your statistic. But the median of such a small sample is quite far from a normal (hint : it is a discrete distribution with only very few possible values, at most as many value as the sample. Exercise : derive the distribution of median(x)...). To convince yourself, look at the histogram of the bootstrap distribution of median(x). Contrast with the bootstrap distribution of mean(x). Meditate. Conclude... HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
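A minimal sketch of the "look at the histograms" suggestion above, reusing the objects defined in the quoted code (b$t holds the bootstrap replicates; library(boot) is assumed to be loaded):

hist(b$t, breaks = 50,
     main = "Bootstrap distribution of median(x)")   # only a handful of distinct values
length(unique(b$t))                                  # at most as many values as the sample
bm <- boot(x, function(x, d) mean(x[d]), R = 1000)
hist(bm$t, breaks = 50,
     main = "Bootstrap distribution of mean(x)")     # much closer to a normal
boot.ci(b, type = "perc")                            # the percentile interval stays below 1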
[R] (quite possibly OT) Re: how to make R running on a Linux server display a plot on a Windows machine
ools. After all, Unix has been compared to the Tao for a reason... :-). If your major has anything to do with mathematics, science or engineering|technologies, it would be smart to at least consider this seriously. In short, you should discuss your options (memory, X server, VNC/RDP, Linux) with your *local* friendly help. Again, the R help list is *definitely* *NOT* the right place for learning about the care and feeding of computers. HTH, Emmanuel Charpentier preparing to be shot at dawn by the list police... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Use of R in clinical trials
Le samedi 20 février 2010 à 09:44 -0800, Dieter Menne a écrit : > If you check > > http://n4.nabble.com/R-help-f789696.html > > you will note that this thread has the largest number of read since years. > Looks like an encouragement to Mark to keep the mentioned CRAN document > updated. > > To add a more serious note to my sarcastic comments earlier: > > I don't think the FDA or our national agencies in Europe are to blame. I > know that they have eminent statisticians there who know what they are > talking of and are much more flexible than the culprits. > > The people to blame for the "Nobody got fired for using SAS" attitude are > reviewers and bosses with mainly medical background who make decisions. It > the difference in the use of the term "validated" which leads to confusion. > > A method is considered "validated" in medicine when it has been compared > with a gold standard and is non-inferior within reasonable limits. For > example, in the diagnosis of a gastric ulcer gastroscopy is the gold > standard, and cheaper or less invasive tests are measured against that. > > However, sometimes gold standards are the easy way out to avoid litigation, > and statistical evidence against these is brushed aside. Think of year it > needed to accept the Australian bush doctor's evidence that the bacteria > Helicobactor pylori is a main determinant for gastric ulcers; everyone > "knew" that ulcers were caused by stress alone. > > Or gastric emptying: the "gold standard" is (was?) the use of a radioactive > marker that was recorded after a meal. Since radioactivity cannot rise out > of nothing, it was a well known fact that stomach content always goes down > after a meal. After people started measuring the real volume of the liquid > in the stomach with MRI imaging, it came out that the content INCREASED > after the meal due to strong gastric secretion. Despite visible evidence > from the images, the increase was considered "not validated", because is was > in contradiction of the gold standard. > > "Validated" in medicine means: Some well-known person has made a publication > on the subject. He or she may be right, but not always. > > Mention the word three times in a discussion, and my blood pressure is at > 200 mmHg Furthermore, the "gold" standard might have to be replaced by some platinium. The FDA itself seems open to some radical changes. See : http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm Tho only software "products" explicitely mentioned in this documents are ... WinBUGS, BRugs and OpenBugs. Not exactly a common component of your average "validated" toolchain... By the way, a "validated" acceptance of Bayesian methods for regulatory purposes (for devices, admittedly, but FDA"s arguments are *also* applicable to drugs assessment) might well be a "window of opportunity" for R, which offers : - interfaces to WinBUGS - interface to JAGS, an alternative implementation of the BUGS language and model - various packages for alternative MCMC and/or conjugate estimation of posterior distributions - a good system of production of formally validatable documents (Sweave|odfWeave, and possibly RRPM (Bioconductor)). Looks like R might have a head start over SAS for this "new" race... However, Bert Gunter's remarks are *still* valid : a lot of money has already been invested in SAS "production systems", esp. in the pharmaceutical industry. 
This industry will consider switching systems if and only if this switch allows them to save *serious* money or to *seriously* streamline their current process. We all know it's possible, but the economic|financial case still has to be made. An implementation of such a system in a new sector (namely medical devices), using the "Bayesian head start" and the FDA incentive to consider Bayesian methods, might help make this case.

> My message: If you hear "not validated" or "validated", question it.

My message: timeo apothicarios et dona ferentes

Emmanuel Charpentier
Re: [R] R on Linux - a primer
On Sunday 14 March 2010 at 18:04 -0400, Axel Urbiz wrote:

> Hi,
>
> I'm looking to move from Windows into a 64-bit Linux environment. Which is
> the best Linux Flavor to use within R? To install R on this environment, do
> I need to do any compiling?

I'd like to add two cents of folly to the (wise) advice you've received already.

Indeed, Ubuntu is one very good distribution whose management system has made the care and feeding of a Linux system a *lot* easier for the not-so-system-oriented people like yours truly. Whereas my first contacts with a Unix-like system were about 30 years ago (oh my, how time flies, and how far away are Xenix and our 68000 systems ...), I'm *still* not fond of system maintenance for its own sake. Ubuntu added an (almost) fool-proof maintenance system to an excellent distribution called Debian, thus lowering the Linux entry bar to as low as it can be humanely made. Some pretended that "Ubuntu" was a code word for "I'm too stupid to configure Debian"; quite untrue! It only means "I'm too busy|lazy to configure Debian", which is a Good Thing (TM).

But Debian has its strong points, and one of them is *extremely* strong for an R user: Dirk Eddelbuettel (whose name I'm almost surely misspelling (sorry, Dirk!)) has created a marvelous system called cran2deb which routinely creates binary Debian packages from (almost) the 2000+ R packages available nowadays. That might look like small change: the basic tools used for developing/compiling most R packages are small beer (at least by today's standards). But some of them might depend on fiendishly difficult-to-maintain foreign libraries. Dirk's cran2deb takes care of that and creates any information that Debian's dpkg maintenance system needs to automate *your* R maintenance chores by integrating them in Debian's maintenance scheme, which is as automatic as you can get without becoming an incomprehensible beast.

In fact, cran2deb is so good that I'm seriously tempted to go back to Debian (after almost 8 years of Debian use, Ubuntu's ease-of-use, easy access to not-so-exotic hardware drivers (and the then-incessant politically correct yack-yacking on some Debian mailing lists...) made me switch to an early Ubuntu distribution). I did not yet switch back (mostly for not-so-"superficial" hardware support reasons), but I maintain a backup Debian installation "for the hell of it" and to test the waters. So far, they have been a lot less rough than they used to be, but there are still occasional rows (e.g. a recent gotcha with openoffice.org, which would have rendered me unable to work with those d*mn Word files for about a month, or forced me to do a manual repair (which I hate...)).

So consider Debian as a (desirable) alternative to Ubuntu.

HTH,

Emmanuel Charpentier, DDS, MSc
Re: [R] recommendations on use of -> operator
An old (Lisp ? C ?) quote, whose author escapes me now, was : "syntactic sugar causes cancer of the semicolon" .. but nowadays semicolons are rarely used in "real-life" R. Emmanuel Charpentier PS and, BTW, neither C nor Lisp have much use for semicolons... Was that Pascal (or one of its spawns) ? Le jeudi 18 mars 2010 à 12:00 +, Prof Brian Ripley a écrit : > -> is usually only used when you suddenly remember you wanted to keep > result of a computation, especially in the days before command-line > editors were universal. > > fn(a,b) ... oh I'd better keep that ... -> z > > On Thu, 18 Mar 2010, Dan Kelley wrote: > > > I have never used -> but I noticed at > > > > http://github.com/jiho/r-utils/blob/master/beamer_colors.R > > > > that some people do. > > > In fact, the above-named code has a sort of elegance about it > (except > for the use of "=" for assignment...). To my eye, -> calls to mind a type > of assignment that is meant to stand out. For example, perhaps it would make > sense to use -> to assign to things that are not expected to vary through > the rest of the code block. > > > > Q1: is there a convention, informal or otherwise, on when to use the -> > > operator? > > > > Q2: if there is no convention, but if people think a convention might help, > > what would that convention be? > > > > Dan Kelley, PhD > > Professor and Graduate Coordinator > > Dept. Oceanography, Dalhousie University, Halifax NS B3H 4J1 > > kelley@gmail.com (1-minute path on US server) or dan.kel...@dal.ca > > (2-hour path on Cdn server) > > Phone 902 494 1694; Fax 902 494 3877 > > > > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recommendations on use of -> operator
Le jeudi 18 mars 2010 à 15:49 -0400, Duncan Murdoch a écrit : > On 18/03/2010 3:10 PM, Emmanuel Charpentier wrote: > > An old (Lisp ? C ?) quote, whose author escapes me now, was : > > > > "syntactic sugar causes cancer of the semicolon" > > > > .. but nowadays semicolons are rarely used in "real-life" R. > > > > Emmanuel Charpentier > > > > PS and, BTW, neither C nor Lisp have much use for semicolons... Was that > > Pascal (or one of its spawns) ? > > Not sure what you mean about C: it's a statement terminator there. > Pascal also uses it, but as a statement separator. Quite right ! And I know I'm falling in that trap every time I don't code in C for more than a few weeks (it has been three months, now...). That's probably because this "end-of-statement" role desn't strike my psyche as important enough... (And maybe because Lisp and, to a lesser extend, Pascal were my computer "mother tongues", APL a serious crush and Fortran something of an "(almost)-read-only" acquaintance). Nice thinko... Emmanuel Charpentier Still recovering... > > Duncan Murdoch > > > > > Le jeudi 18 mars 2010 à 12:00 +, Prof Brian Ripley a écrit : > >> -> is usually only used when you suddenly remember you wanted to keep > >> result of a computation, especially in the days before command-line > >> editors were universal. > >> > >> fn(a,b) ... oh I'd better keep that ... -> z > >> > >> On Thu, 18 Mar 2010, Dan Kelley wrote: > >> > >>> I have never used -> but I noticed at > >>> > >>> http://github.com/jiho/r-utils/blob/master/beamer_colors.R > >>> > >>> that some people do. > >>> In fact, the above-named code has a sort of elegance about it > >> (except > >> for the use of "=" for assignment...). To my eye, -> calls to mind a > >> type of assignment that is meant to stand out. For example, perhaps it > >> would make sense to use -> to assign to things that are not expected to > >> vary through the rest of the code block. > >>> Q1: is there a convention, informal or otherwise, on when to use the -> > >>> operator? > >>> > >>> Q2: if there is no convention, but if people think a convention might > >>> help, what would that convention be? > >>> > >>> Dan Kelley, PhD > >>> Professor and Graduate Coordinator > >>> Dept. Oceanography, Dalhousie University, Halifax NS B3H 4J1 > >>> kelley@gmail.com (1-minute path on US server) or dan.kel...@dal.ca > >>> (2-hour path on Cdn server) > >>> Phone 902 494 1694; Fax 902 494 3877 > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> __ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANOVA with means and SDs as input
On Thursday 25 June 2009 at 12:15 +0200, Sebastian Stegmann wrote:

> Dear R-community,
>
> I'm struggling with a paper that reports only fragmented results of a
> 2by2by3 experimental design. However, Means and SDs for all cells are given.
>
> Does anyone know a package/function that helps computing an ANOVA with only
> Means and SDs as input (resulting in some sort of effect size and
> significance test)?
>
> The reason why I'm interested is simple: I'm conducting a meta-analysis. If
> I included only what the authors would like to show the world, the results
> would be somewhat biased...

What you aim to do is not, as far as I can tell, possible using already-programmed solutions in R packages. Some packages offer very close solutions (e.g. lme()) but lack some necessary options (e.g. the ability to fix intra-level variance in lme(): varFixed() works as documented, not as what its name suggests...).

Why not use some established meta-analysis package such as meta or rmeta? You may have to recompute some variances or effect sizes, but it can usually be done.

If you aim to go a bit farther (i.e. more than 2-group comparisons), the newly released metafor package seems quite interesting. Wolfgang Viechtbauer aimed to offer a tool for meta-regression, and the current stage of the package (0.5.something IIRC) seems to be quite usable, if somewhat counter-intuitive in places. I'd recommend having a look at it. WV didn't think it useful to include the vignettes and documentation for mima(), the predecessor of metafor, which explains quite nicely the motivation and algorithms he used. Too bad... However, this documentation is still available on his Web site.

Be aware that the current "main interface" to the rma() function (the workhorse of the package) does not (yet?) follow the usual format for regression-like functions common to many R packages (the (formula, data, subset, etc...) format). For example, you may do meta-regression, but you'll have to code the model matrix yourself. Similarly, the necessary bricks for (e.g.) indirect comparisons are here, but you'll have to build them yourself (bring your wheelbarrow and some mortar...).

An alternative to be seriously considered: Bayesian modeling. Meta-analysis is one of the rare medical domains where journal reviewers won't immediately reject a paper at the sight of the B***sian word... As far as I know, psychological journals are not as asinine. Here again, some manual work will be in order. Some tutorial material is readily available (google "Bayesian meta analysis tutorial", which should lead you to the various "Statistics in Medicine" tutorials, among other things).

> I've combed through the web and my various books on R, but it's not that
> easy to find. Or is it?

Variance analysis through manual decomposition of sums of squares? Ugh... I did that when I first learned anova (so long ago that I am uncomfortable thinking of it; in those times, even a pocket calculator was too expensive for a random student (and therefore forbidden in exams), and I learned to cope with a slide rule...). Tiring and messy. Furthermore, "manual" algorithms (i.e. simple, non-iterative algorithms) exist only for the simplest (e.g. one-way anova) or ideal (e.g. balanced two-way anova with equal group sizes) cases; furthermore, modern critiques of the bases of such works cast some doubts on their foundations (e.g. random- and mixed-effect anova)... Therefore, the art is fast forgotten, if not totally lost.
If you insist, I might try to unearth my copy of Winer's handbook (ca. 1959) and look up specific questions. HTH, Emmanuel Charpentier
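To make the "recompute effect sizes and feed them to a meta-analysis package" suggestion above concrete, a minimal sketch with metafor (the means, SDs and group sizes below are made up; escalc() turns cell summaries into standardized mean differences that rma() can then pool):

library(metafor)
dat <- data.frame(m1i = c(5.2, 4.8), sd1i = c(1.1, 1.3), n1i = c(20, 25),
                  m2i = c(4.1, 4.0), sd2i = c(1.0, 1.2), n2i = c(20, 24))
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)
rma(yi, vi, data = dat)    # random-effects pooling of the two (made-up) comparisons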
Re: [R] questions about meta-analysis
On Saturday 27 June 2009 at 13:02 +0800, sdzhangping wrote:

> Dear R users:
>
> In the example of meta-analysis (cochrane, package rmeta), I can not
> found the p-value of Test for overall effect, and some other indices
> (Z, I, weight and et al). How can I get the these indices listed?
>
> > library(rmeta)
> > data(cochrane)
> > cochrane
>           name ev.trt n.trt ev.ctrl n.ctrl
> 1     Auckland     36   532      60    538
> 2        Block      1    69       5     61
> 3        Doran      4    81      11     63
> 4        Gamsu     14   131      20    137
> 5     Morrison      3    67       7     59
> 6 Papageorgiou      1    71       7     75
> 7      Tauesch      8    56      10     71
> > a=meta.MH(n.trt,n.ctrl,ev.trt,ev.ctrl,names=name,data=cochrane)
> > summary(a)
> Fixed effects ( Mantel-Haenszel ) meta-analysis
> Call: meta.MH(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt, pctrl = ev.ctrl,
>     names = name, data = cochrane)
>                 OR (lower 95% upper)
> Auckland      0.58   0.38     0.89
> Block         0.16   0.02     1.45
> Doran         0.25   0.07     0.81
> Gamsu         0.70   0.34     1.45
> Morrison      0.35   0.09     1.41
> Papageorgiou  0.14   0.02     1.16
> Tauesch       1.02   0.37     2.77
>
> Mantel-Haenszel OR = 0.53  95% CI ( 0.39, 0.73 )   (where is Z and p-value ?)
> Test for heterogeneity: X^2( 6 ) = 6.9 ( p-value 0.3303 )

You might easily recompute them: use the confidence interval to guesstimate the standard deviation of the OR, its value and the damn Z and p you long for... Remember that it's log(OR) that is (asymptotically) normally distributed. The weights are proportional to the inverse of the variance and sum to 1. You might peek at the source code for enlightenment...

Alternatively, you may grow lazy and use the "meta" package, whose function metabin() will give you all that. The recent "metafor" package also seems quite interesting.

Another (smarter?) alternative is to ask yourself *why* you need Z and p. Z is just a computing device allowing you to use a known density. And the real meaning of p and its usefulness have been discussed, disputed and fought over at exceedingly great length for the past 80 years. Of course, if you have an instructor to please ...

HTH,

Emmanuel Charpentier
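As an illustration of the "recompute them from the CI" suggestion above, using the Mantel-Haenszel summary printed by rmeta (log(OR) is the asymptotically normal quantity):

or  <- 0.53; lcl <- 0.39; ucl <- 0.73
se  <- (log(ucl) - log(lcl)) / (2 * qnorm(0.975))   # SE of log(OR) from the 95% CI
z   <- log(or) / se
p   <- 2 * pnorm(-abs(z))
c(z = z, p = p)

## or let the meta package do the work, as suggested above:
# library(meta)
# metabin(ev.trt, n.trt, ev.ctrl, n.ctrl, data = cochrane, sm = "OR", method = "MH")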
[R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()
Dear Max, dear list,

I have trouble using odfInsertPlot() to insert plots outside fig=TRUE chunks. My goal is to dump a bunch of (relatively uninteresting, but required) graphs from within a loop, possibly interspersed with some tables. This is not possible within a normal fig=TRUE chunk.

odfInsertPlot() is able to insert a previously saved plot. When used in a fig=TRUE, results=hide chunk, it produces just a blank frame (with a caption if caption=odfFigureCaption() is used). Used in a fig=TRUE, results=xml chunk, it produces an error at file re-assembly (see end of post).

However, when wrapped in a cat(), the same code produces either the figure (without caption) followed by a blank frame with the caption, or just the figure (without caption) if the caption=() argument is absent. The same cat(odfInsertPlot()) construct can be used in a results=xml chunk (no fig=TRUE); however, I found no way to insert the caption.

Therefore: I can programmatically create graphs and insert them via results=xml chunks, but I can't caption them.

Does someone have a workaround (except reprogramming odfFigureCaption() with a new argument taking the output of odfInsertPlot())?

Sincerely,

Emmanuel Charpentier

Annex: log of compilation with fig=TRUE, results=xml.

> odfWeave("Test1Src.odt", "Test1.odt")
Copying Test1Src.odt
Setting wd to /tmp/RtmpenMTgz/odfWeave3023572231
Unzipping ODF file using unzip -o Test1Src.odt
Archive: Test1Src.odt
 extracting: mimetype
  inflating: content.xml
  inflating: layout-cache
  inflating: styles.xml
 extracting: meta.xml
  inflating: Thumbnails/thumbnail.png
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/progressbar/
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/statusbar/
  inflating: settings.xml
  inflating: META-INF/manifest.xml
Removing Test1Src.odt
Creating a Pictures directory
Pre-processing the contents
Sweaving content.Rnw
Writing to file content_1.xml
Processing code chunks ...
 1 : term hide(label=ProlégomènesStandard)
 2 : term xml(label=EssaiGraphe1)
 3 : echo term verbatim(label=TestChunk)
 4 : term xml(label=SecondChunk)
'content_1.xml' has been Sweaved
Removing content.xml
Post-processing the contents
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:frame line 6
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:image line 6
Opening and ending tag mismatch: text:p line 6 and draw:frame
Opening and ending tag mismatch: office:text line 2 and text:p
Opening and ending tag mismatch: office:body line 2 and office:text
Opening and ending tag mismatch: office:document-content line 2 and office:body
Extra content at the end of the document
Erreur : 1: error parsing attribute name
2: attributes construct error
3: xmlParseStartTag: problem parsing attributes
4: Couldn't find end of Start Tag draw:frame line 6
5: error parsing attribute name
6: attributes construct error
7: xmlParseStartTag: problem parsing attributes
8: Couldn't find end of Start Tag draw:image line 6
9: Opening and ending tag mismatch: text:p line 6 and draw:frame
10: Opening and ending tag mismatch: office:text line 2 and text:p
11: Opening and ending tag mismatch: office:body line 2 and office:text
12: Opening and ending tag mismatch: office:document-content line 2 and office:body
13: Extra content at the end of the document
>
Re: [R] NaiveBayes fails with one input variable (caret and klarR packages)
Le mardi 30 juin 2009 à 14:51 -0400, Max a écrit : > I just put a new version on cran... Now, *that's* support !!! bug report at 17:31, acknowledgement at 20:12, (probable) bug fix at 20:51. Bravo, bravissimo, Max ! Try that, SAS support ! Emmanuel Charpentier And I insist that this is not ironic in the least... It's genuinely admirative. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()
Dear Max, Le mardi 30 juin 2009 à 21:36 -0400, Max Kuhn a écrit : > Well, on this one, I might bring my record down to commercial software > bug fix time lines... Having actually looked at the code before wailing, I had such a hunch... [ Snip ... ] > We had to do some interesting things to get figure captions working > with odfWeave. You would think that the figure caption function just > write out some xml and are done (like the insert plot code and the > table caption code). Unfortunately, it doesn't. > > It writes the caption to a global variable (I know, I know) and other > code actually writes the caption. We had good reason to do this, > mostly related to the complexity associated with the context of the > code chunk (e.g. was it inside a table, at paragraph, etc). I've been > thinking about this one for a while since the current implementation > doesn't allow you to pass the figure caption to other functions (it > writes a global variable and returns NULL). I haven't come up with a > good solution yet. ISTR from my Lisp/Scheme days that a good heuristic in this kind of case is to try to build a closure and return that to a wrapper. Unfortunately, R doesn't have anything as simple to use as let/let*, and you have to play fast and loose with environments... > > I've not been as prompt with odfWeave changes in comparison to other > packages (many of you have figured this out). This is mostly due to > the necessary complexity of the code. In a lot of ways, I think that > we let the feature set be too rich and should have put some constrains > on the availability of code chunks (e.g. only in paragraphs). I didn't even think of this. A priori, that's hard to do (whereas (\La)Texw will need "\par" or *two* newlines to end a paragraph and start another, oowriter will take one newline as an end of paragraph and will emit the corresponding XML. Come to think of it, you might try to insert a chunk in a table case, or in a footnote... And I don't know about other ODF tools... > I want > to add more features (like the one that you are requesting), but > seemingly small changes end up being a considerable amount of work > (and I don't time). I've been thinking about simplifying the code (or > writing rtfWeave) to achieve the same goals without such a complex > system. What about something along the line of : foo<-odfTable(something) bar<-odfInsertPlot(somethingelse) odfCaptionize(foo,"Something") odfCaptionize(bar("Something else") ? > The ODF format is a double-edged sword. It allows you to do many more > things than something like RTF, but makes it a lot more difficult to > program for. Take tables as an example. The last time I looked at the > RTF specification, this was a simple markup structure. In ODF, the > table structure is somewhat simple, but the approach to formatting the > table is complex and not necessarily well documented. Indeed... >For example, > some table style elements go into styles.xml whereas others go into > contest.xml (this might be implementation dependent) > > So, I could more easily expand the functionality of odfWeave by > imposing some constraints about where the code chunks reside. The > earlier versions of odfWeave did not even parse the xml; we got a lot > done via regular expressions and gsub. However, we went to a xml > parsing approach when we found out that the context of the code chunk > matters. > > Feedback is welcome: do people actually use code chunks in places > other than paragraphs? > I didn't even think of it. 
But now that you mention it, you've given me a nice idea for a workaround ... :-)

> So, back to your question, the relevant code for captions is in
> odfWeave:::withCaptionXML (odfInsertPlot uses this to write the xml).
> You can try that and I might be able to look at it in the next few
> days.

I did look at that. The problem is that your code has dire warnings not to call it with a caption more than once in a figure=TRUE (sic..) chunk.

> Thanks,

*I* thank *you*, Max,

Emmanuel Charpentier
Re: [R] Unix commands on R
On Wednesday 08 July 2009 at 16:36 -0400, suman Duvvuru wrote:

> I am using R on unix. While in R how do I execute the unix shell commands?
> Is there a way to do it? I am executing a function in R and the matrix
> resulting from the function has to be passed on as an input to unix command.
>
> Any directions will be helpful.

?system
?capture.output
?textConnection

These three functions should allow you to get the standard output of a Unix command line. Consider also capturing your command output in a file (reusable...).

HTH,

Emmanuel Charpentier
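A minimal sketch of the suggestion above (the file name and the Unix command are made up for illustration):

m <- matrix(rnorm(6), nrow = 2)
write.table(m, file = "matrix.txt", row.names = FALSE, col.names = FALSE)
out <- system("wc -l matrix.txt", intern = TRUE)   # intern=TRUE captures stdout as a character vector
out
## capture.output() does the same for *R* output:
# capture.output(summary(m), file = "summary.txt")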
Re: [R] How to perform a Likelihood-ratio test?
On Monday 13 July 2009 at 13:04 +0200, Lars Bergemann wrote:

> Hi.
>
> I would like to use a likelihood-ratio test to compare a linear model (lm())
> to another linear model containing the first one to see if the extra factors
> are needed - but I was unable to find any help on how to do that.
>
> Could you help me and tell me what to do? Or tell me where to find help on
> this topic?

?anova — check the "test" argument ... :-)

Emmanuel Charpentier
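A minimal sketch of the nested-model comparison hinted at above (data frame and variable names are made up):

fit0 <- lm(y ~ x1, data = d)
fit1 <- lm(y ~ x1 + x2 + x3, data = d)   # the larger model containing fit0
anova(fit0, fit1)                        # F test for the extra terms
anova(fit0, fit1, test = "Chisq")        # chi-square (likelihood-ratio-type) version
# logLik(fit0); logLik(fit1)             # or build the LR statistic by hand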
[R] odfWeave : sudden and unexplained error
Dear list, dear Max,

I am currently working on a report. I'm writing it with OpenOffice.org and odfWeave. I'm working incrementally: I write a bit, test (interactively) some ideas, cutting-and-pasting code to the OOo report when satisfied with it. In the process, I tend to recompile the .odt source a *lot*.

Suddenly, odfWeave started to give me an incomprehensible error even before starting the compilation itself (InFile is my input .odt file, OutFile is the resulting .odt file):

> odfWeave(InFile, OutFile)
Copying SrcAnalyse1.odt
Setting wd to /tmp/RtmphCUkSf/odfWeave01225949667
Unzipping ODF file using unzip -o SrcAnalyse1.odt
Archive: SrcAnalyse1.odt
 extracting: mimetype
  inflating: content.xml
  inflating: layout-cache
  inflating: styles.xml
 extracting: meta.xml
  inflating: Thumbnails/thumbnail.png
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/progressbar/
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/statusbar/
  inflating: settings.xml
  inflating: META-INF/manifest.xml
Removing SrcAnalyse1.odt
Creating a Pictures directory
Pre-processing the contents
Erreur : cc$parentId == parentId is not TRUE
>

Perusing the documentation and the r-help list archives didn't turn up anything relevant. This error survived restarting OOo, restarting R, restarting its enclosing Emacs session and even rebooting the damn hardware...

Any idea?

Emmanuel Charpentier
Re: [R] Testing year effect in lm() ***failed first time, sending again
On Thursday 30 July 2009 at 16:41 -0600, Mark Na wrote:

> Dear R-helpers,
>
> I have a linear model with a year effect (year is coded as a factor), i.e.
> the parameter estimates for each level of my year variable have significant
> P values (see some output below) and I am interested in testing:
>
> a) the overall effect of year;
> b) the significance of each year vis-a-vis every other year (the model
> output only tests each year against the baseline year).

install.packages("multcomp")
help.start()

and use the vignettes! They're good (and are a good complement to somewhat incomplete help pages: what in h*ll are valid arguments to mcp() beyond "Tukey"??? Currently, you'll have to dig in the source to learn that...).

HTH,

Emmanuel Charpentier
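A minimal sketch of the multcomp route suggested above (model, data and variable names are made up; see the package vignettes for the details):

library(multcomp)
fit <- lm(response ~ year + other_covariate, data = d)   # year coded as a factor
anova(update(fit, . ~ . - year), fit)                     # (a) overall year effect
summary(glht(fit, linfct = mcp(year = "Tukey")))          # (b) all pairwise year comparisons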
Re: [R] Testing year effect in lm() ***failed first time, sending again
On Wednesday 05 August 2009 at 09:52 -0700, Mark Difford wrote:

> Emmanuel,
>
>>> somewhat incomplete help pages : what in h*ll are valid arguments to
>>> mcp() beyond "Tukey" ??? Currently, you'll have to dig in the source to
>>> learn that...
>
> Not so: they are clearly stated in ?contrMat.

Oops... I overlooked that.

With my apologies,

Emmanuel Charpentier
Re: [R] Multilevel modeling with count variables
Your request might find better answers on the R-SIG-mixed-models list ... Anyway, some quick thoughts:

On Friday 26 March 2010 at 15:20 -0800, dadrivr wrote:

> By the way, my concern with lmer and glmer is that they don't produce
> p-values,

The argumentation of D. Bates is convincing ... A large thread about these issues exists in this (R-help) list's archive. I won't get into the (real) debate about the (dubious) value of hypothesis testing, and I'm aware that there are domains where journal editors *require* p-values. Sigh... This insistence seems to be weakening, however.

> and the techniques used to approximate the p-values with those
> functions (pvals.fnc, HPDinterval, mcmcsamp, etc.) only apply to Gaussian
> distributions.

(Probably?) true for pvals.fnc, false for HPDinterval and mcmcsamp. You can always generate a "sufficient" number of estimates and assess a confidence interval and a p-value from such samples. See any good text on the use of simulation in statistics... The fly in the ointment is that current versions of lmer seem to have problems with mcmcsamp(). Bootstrapping comes to mind, but might be non-trivial to do efficiently, especially in a hierarchical/multilevel context.

> Given that I would likely be working with quasi-poisson
> distributions, is there a good alternative mixed effects modeling approach

glmer, or Bayesian modeling through BUGS (whose "probabilities" interpretation is (quite) different from p-values).

> that would output significance values?

No. You'd have to compute them yourself ... if you're able to derive the (possibly asymptotic) distribution of your test statistic under the hypothesis you aim to refute. This nut might be harder to crack than it appears...

Now, to come back to your original question, you might try 1) to use glm/glmer to model your dependent variable as a (quasi-)Poisson variable, and 2) to use log transformations of the independent count variables; a more general solution is given by the Box-Cox transformation family: see the relevant function in MASS, which also offers the logtrans family. ISTR that John Fox's car package offers a function aiming at finding the optimal simultaneous transformations of a dependent variable and its predictors. Other packages I'm not aware of might offer other solutions; in particular, I'd recommend peeking at Frank Harrell's Hmisc and rms packages documentation, whose wealth I have not yet seriously assessed...

Bayesian modeling with BUGS would also allow you to try to fit any model you might wish to test (provided that it can be written as a directed acyclic graph and that the distributions of your variables are either from a "standard" family available in BUGS or that you are able to express the (log-)density of the non-standard distribution you wish to use). But, again, no p-values in sight. Would you settle for Bayes factors between two models? Or DIC comparisons?

HTH,

Emmanuel Charpentier
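A minimal sketch of the glm/glmer + log-transformation suggestion above (data frame, variable and grouping-factor names are made up):

library(lme4)
fit <- glmer(dep_count ~ log1p(pred_count) + (1 | subject),
             family = poisson, data = d)
summary(fit)                    # Wald z statistics, but no "denominator df" p-values

## Box-Cox, for a Gaussian model of a transformed response:
# library(MASS)
# boxcox(lm((dep_count + 1) ~ pred_count, data = d))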
Re: [R] logistic regression in an incomplete dataset
Dear Desmond, a somewhat analogous question was posed recently (about 2 weeks ago) on the sig-mixed-models list, and I tried (in two posts) to give some elements of information (and some bibliographic pointers).

To summarize tersely :

- a model of "information missingness" (i. e. *why* are some data missing ?) is necessary to choose the right measures to take. Two special cases (Missing At Random and Missing Completely At Random) allow for (semi-)automated compensation. See the literature for further details.

- complete-case analysis may give seriously weakened and *biased* results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also give biased results.

- multiple imputation is currently thought of as a good way to alleviate missing data if you have a missingness model (or can honestly bet on MCAR or MAR), and if you properly combine the results of your imputations.

- A few missing-data packages exist in R to handle this case. My personal selection at this point would be mice, mi, Amelia, and possibly mitools, but none of them is fully satisfying (in particular, accounting for a random effect needs special handling all the way through in all packages...).

- An interesting alternative is to write a full probability model (in BUGS for example) and use Bayesian estimation ; in this framework, missing data are "naturally" modeled in the model used for analysis. However, this might entail *large* work, be difficult and not always succeed (numerical difficulties). Furthermore, the results of a Bayesian analysis might not be what you seek...

HTH,

Emmanuel Charpentier

On Monday 05 April 2010 at 11:34 +0100, Desmond Campbell wrote:

> Dear all,
>
> I want to do a logistic regression.
> So far I've only found out how to do that in R, in a dataset of complete
> cases.
> I'd like to do logistic regression via max likelihood, using all the study
> cases (complete and incomplete). Can you help?
>
> I'm using glm() with family=binomial(logit).
> If any covariate in a study case is missing then the study case is dropped,
> i.e. it is doing a complete cases analysis.
> As a lot of study cases are being dropped, I'd rather it did maximum
> likelihood using all the study cases.
> I tried setting glm()'s na.action to NULL, but then it complained about NA's
> present in the study cases.
> I've about 1000 unmatched study cases and less than 10 covariates so could
> use unconditional ML estimation (as opposed to conditional ML estimation).
>
> regards
> Desmond

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
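For illustration, the multiple-imputation route with mice looks roughly like this (mydata, y, x1, x2 are placeholders for the poster's data set and variables):

library(mice)
imp  <- mice(mydata, m = 5, seed = 1)                    # 5 imputed data sets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial))   # logistic fit on each completed set
summary(pool(fits))                                      # combined by Rubin's rules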
Re: [R] [R-pkgs] SOAR - Stored object caches for R
On Wednesday 07 April 2010 at 18:03 +1000, William Venables wrote:

[ Snip ... ]

> [ ... ] The effect is
> somewhat like that of the use of the .Data directory in S-PLUS, (a
> program not unlike R), though somewhat more manually driven.

I'm not sure that such "old-timer" puns qualify for fortune()ification, but this is a good one...

William : did you consider possible interactions with other caching efforts such as cacheSweave ? It seems that the idea of alleviating memory/CPU time requirements is a concern to many people (of course, the intrinsically interactive development process that S/R allows makes this a big concern). Unifying those efforts, and maybe incorporating them in the base interpreter, might be worthwhile...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
05   1.1565   1.3412
sigma.lpi[4]   0.5304  0.67740   0.8189   1.0067   1.5987

The same model re-fitted with epsilon=0.01 gives :

> summary(LogitModFix[[2]])

Iterations = 1001:2000
Thinning interval = 1
Number of chains = 3
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
deviance      534.2639  4.42164 0.0807278       0.122396
pi[1]           0.6396  0.10877 0.0019859       0.001892
pi[2]           0.2964  0.06592 0.0012036       0.001163
pi[3]           0.5083  0.04120 0.0007522       0.000805
pi[4]           0.5702  0.07170 0.0013091       0.001221
sigma.lpi[1]    3.0387  0.36356 0.0066376       0.008803
sigma.lpi[2]    2.0519  0.22665 0.0041381       0.005358
sigma.lpi[3]    1.0928  0.12531 0.0022879       0.003305
sigma.lpi[4]    0.8699  0.26650 0.0048656       0.010146

2. Quantiles for each variable:

                  2.5%      25%      50%      75%    97.5%
deviance      527.9142 530.9711 533.4347 536.6295 544.7738
pi[1]           0.4130   0.5683   0.6429   0.7195   0.8251
pi[2]           0.1826   0.2511   0.2909   0.3373   0.4390
pi[3]           0.4266   0.4809   0.5093   0.5359   0.5892
pi[4]           0.4183   0.5254   0.5701   0.6150   0.7106
sigma.lpi[1]    2.4362   2.7772   2.9965   3.2600   3.8395
sigma.lpi[2]    1.6606   1.8935   2.0320   2.1811   2.5598
sigma.lpi[3]    0.8803   1.0035   1.0754   1.1711   1.3670
sigma.lpi[4]    0.5022   0.6901   0.8198   0.9906   1.5235

One should also note that the deviance is *extremely* sensitive to the choice of epsilon, which tends to indicate that, at small epsilon values, the "sharp" "impossible" values dominate the evaluation of the coefficients they are involved in.

Beta models (not shown) give similar results, to a lesser extent.

In short, your "sharp" data are essentially incompatible with your model, which can't be fitted unless you get at least a rough idea of their precision.

HTH,

Emmanuel Charpentier, DDS, MSc

> best regards,
> kay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
Addendum to my previous answer :

In that special case, the limited range of the asin(sqrt()) transformation, which is a shortcoming, turns out to be useful. The fixed-effect coefficients seem semi-reasonable (except for stageB) :

> (sin(coef(lm(asin(sqrt(MH.Index))~0+stage, data=similarity))))^2
   stageA    stageB    stageC    stageD
0.6164870 0.3389430 0.5083574 0.5672021

The random effects being nested in the fixed effect, one can't afford to be lazy in the parametrization of the corresponding random effect :

> summary(lmer(asin(sqrt(MH.Index))~stage+(stage|site), data=similarity))
Linear mixed model fit by REML
Formula: asin(sqrt(MH.Index)) ~ stage + (stage | site)
   Data: similarity
   AIC   BIC logLik deviance REMLdev
 155.3   199 -62.65    111.8   125.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 site     (Intercept) 0.043579 0.20876
          stageB      0.033423 0.18282  -0.999
          stageC      0.043580 0.20876  -1.000  0.999
          stageD      0.043575 0.20875  -1.000  0.999  1.000
 Residual             0.128403 0.35833
Number of obs: 136, groups: site, 39

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.93036    0.08431  11.035
stageB      -0.30879    0.10079  -3.064
stageC      -0.13660    0.09981  -1.369
stageD      -0.07755    0.14620  -0.530

Correlation of Fixed Effects:
       (Intr) stageB stageC
stageB -0.836
stageC -0.845  0.707
stageD -0.577  0.482  0.487

> v<-fixef(lmer(asin(sqrt(MH.Index))~stage+(stage|site), data=similarity))
> v[2:4]<-v[1]+v[2:4]
> names(v)[1]<-"stageA"
> (sin(v))^2
   stageA    stageB    stageC    stageD
0.6429384 0.3390903 0.5083574 0.5672021

But again, we're exploiting a shortcoming of the asin(sqrt()) transformation.

HTH,

Emmanuel Charpentier

On Friday 16 April 2010 at 00:15 -0800, Kay Cichini wrote:

> thanks thierry,
>
> i considered this transformations already, but variance is not stabilized
> and/or normality is neither achieved.
> i guess i'll have to look out for non-parametrics?
>
> best regards,
> kay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] glmer with non integer weights
On Monday 19 April 2010 at 03:00 -0800, Kay Cichini wrote:

> hi emmanuel,
>
> thanks a lot for your extensive answer.
> do you think using the asin(sqrt()) transf. can be justified for publishing
> purpose or do i have to expect criticism.

Hmmm ... that depends on your reviewers. But if a half-asleep dental surgeon caught that after a bout of insomnia, you might expect that a fully caffeinated reviewer will. Add Murphy's law to the mix and ... boom !

> naivly i excluded that possibility, because of violated anova-assumptions,
> but if i did get you right the finite range rather posses a problem here.

No. Your problem is that you model a probability as a smooth (linear) finite function of finite variables. Under those assumptions, you can't get a *certitude* (probability 0 or 1). Your model is *intrinsically* inconsistent with your data. In other words, I'm unable to believe both your model (linear whatyoumaycallit regression) and your data (which include certainties) *simultaneously*.

I'd reconsider your 0s and 1s as meaning *censored* quantities (i. e. no farther than some epsilon from 0 or 1), with *hard* data (i. e. not a cooked-up estimate such as the ones I used) to estimate epsilon. There are *lots* of ways to fit models with censored dependent variables.

> why is it in this special case an advantage?

It's bloody hell *not* a specific advantage : if you want to fit a linear model to a probability, you *need* some function mapping R to the open ]0 1[ (i. e. all reals strictly greater than 0 and strictly less than 1 ; I think that's denoted (0, 1) in English/American usage). Asin(sqrt()) does that. However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1], i. e. *including* 0 and 1, *not* (0 1), to R), which *hides* the (IMHO bigger) problem of the inadequacy of your model to your data ! In other words, it lets you shoot yourself in the foot after a nice sciatic nerve articaïne block making the operation painless (but still harmful).

On the other hand, logit (or, as pointed out by Martin Maechler, qlogis) is kind enough to choke on this (i. e. it returns Inf values, which will make the regression program choke).

So please quench my thirst : what exactly is MH.Index supposed to be ? How is it measured, estimated, guessed or divined ?

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
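A two-line illustration of that boundary behaviour:

qlogis(c(0, 0.5, 1))      # -Inf 0 Inf : the regression will choke, loudly
asin(sqrt(c(0, 0.5, 1)))  # 0.0000000 0.7853982 1.5707963 : 0 and 1 slip through silently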
Re: [R] glmer with non integer weights
Sorry for this late answer (I've had a seriously nonmaskable interrupt). Since I have technical questions not related to R, I take the liberty to follow this up by e-mail. I might post a followup summary if another R problem arises...

Emmanuel Charpentier

On Monday 19 April 2010 at 23:40 -0800, Kay Cichini wrote:

> hello,
>
> it's the Morisita Horn Index, which is an ecological index for similarity
> between two multivariate objects (vegetation samples with species and its
> abundance) where a value of one indicates completely same relative
> importance of species in both samples and 0 denotes total absence of any
> same species.
>
> it can be expressed as a probability:
>
>   (prob. that an individual drawn from sample j
>    and one from sample k belong to the same species)
>   --------------------------------------------------- = MH-Index
>   (prob. that two ind. drawn from either sample will
>    belong to the same species)

[ Technical rambling ]

Hmmm ... that's the *ratio* of two probabilities, not a probability.

According to http://www.tnstate.edu/ganter/B412%20Ch%2015&16%20CommMetric.html, which I found on the first page of answers to a simple Google query, it can also be thought of as the ratio of two "distances" between sites : (maximal distance - actual distance) / maximal distance (with a massive (over-?) simplification).

There is no reason to think a priori that the logit transformation (or the asin(sqrt()) transformation) has better properties for this index than any other mapping from [0 1] to R. (BTW, a better mapping might be from [0 1] to [0 Inf] or conversely, but "negative" distances have no (obvious) meaning. Here asin(sqrt()) might make more sense than qlogis().)

[ End of technical rambling ]

But I have trouble understanding how a "similarity" or "distance" index can characterize *one* site... Your data clearly associate an MH.Index with each site : what distance or similarity do you measure at this site ?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] lme4 and incomplete block design
[ ... beginning of the quoted glmer() call, and the tail of its 0/1 data vector, truncated ... ]

   AIC   BIC logLik deviance
 87.33 98.72 -38.67    77.33
Random effects:
 Groups Name        Variance Std.Dev.
 block  (Intercept) 0.86138  0.9281
Number of obs: 72, groups: block, 24

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.9246     0.7117  -2.704  0.00684 **
treatment2    1.3910     0.8568   1.624  0.10446
treatment3    0.4527     0.9163   0.494  0.62124
treatment4    0.4526     0.9163   0.494  0.62131
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr) trtmn2 trtmn3
treatment2 -0.775
treatment3 -0.721  0.598
treatment4 -0.721  0.598  0.558

In the first case (your original "data"), "treatment" is interpreted as a numeric (quantitative) variable, and what lmer estimates is a logistic regression coefficient of the outcome on this numeric variable. Probably nonsensical, unless you have reason to think that your factor is ordered and should be treated as numeric. In the second case, "treatment" is a factor, so you get an estimate for each treatment level except the first, to be interpreted as a difference of means with the first level.

I fell into that trap myself a few times, and took the habit of giving levels to my factors that cannot be interpreted as numbers (such as f<-paste("F", as.character(v))).

> Any help is appreciated!

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
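A sketch of the trap, with placeholder names (d, y, block); the same variable coded two ways gives two very different models:

library(lme4)
d$treat.num <- d$treatment                       # 1, 2, 3, 4 -> a single "slope" coefficient
d$treat.fac <- factor(paste("T", d$treatment))   # "T 1", ... -> one coefficient per level
m.num <- glmer(y ~ treat.num + (1 | block), data = d, family = binomial)
m.fac <- glmer(y ~ treat.fac + (1 | block), data = d, family = binomial)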
Re: [R] Models for Discrete Choice in R
On Sunday 08 November 2009 at 17:07 -0200, Iuri Gavronski wrote:

> Hi,
>
> I would like to fit Logit models for ordered data, such as those
> suggested by Greene (2003), p. 736.
>
> Does anyone suggests any package in R for that?

Look up the polr function in package MASS (and read the relevant pages in V&R4 and some quoted references...), or the slightly more sophisticated (larger range of models) lrm function in F. Harrell's Design (now rms) package (but be aware that Design is a huge beast which carries its own "computing universe", based on Harrell's (strong) views of what a regression analysis should be : reading his book is, IMHO, necessary to understand his choices and agree (or disagree) with them).

If you have a multilevel model (a. k. a. one "random effect" grouping), the "repolr" package aims at that, but I've been unable to use it recently (numerical exceptions).

> By the way, my dependent variable is ordinal and my independent
> variables are ratio/intervalar.

Numeric ? Then maybe some recoding/transformation is in order ... in which case Design/rms might or might not be useful.

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
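Minimal sketches of both suggestions, with placeholder names (data frame d, ordered-factor outcome y, numeric predictors x1 and x2):

library(MASS)
fit1 <- polr(y ~ x1 + x2, data = d, Hess = TRUE)   # proportional-odds logistic regression
summary(fit1)

library(rms)                                  # formerly Design
dd <- datadist(d) ; options(datadist = "dd")  # rms wants a datadist object
fit2 <- lrm(y ~ x1 + x2, data = d)            # same model, rms machinery
fit2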
Re: [R] Models for Discrete Choice in R
Le dimanche 08 novembre 2009 à 19:05 -0600, Frank E Harrell Jr a écrit : > Emmanuel Charpentier wrote: > > Le dimanche 08 novembre 2009 à 17:07 -0200, Iuri Gavronski a écrit : > >> Hi, > >> > >> I would like to fit Logit models for ordered data, such as those > >> suggested by Greene (2003), p. 736. > >> > >> Does anyone suggests any package in R for that? > > > > look up the polr function in package MASS (and read the relevant pages > > in V&R4 and some quoted references...) or the slightly more > > sophisticated (larger range of models) lrm function in F. Harrell's > > Design (now rms) packge (but be aware that Design is a huge beast witch > > carries its own "computing universe", based on (strong) Harrell's view > > of what a regression analysis should be : reading his book is, IMHO, > > necessary to understand his choices and agree (or disgree) with them). > > > > If you have a multilevel model (a. k. a. one "random effect" grouping), > > the "repolr" packge aims at that, but I've been unable to use it > > recently (numerical exceptions). > > > >> By the way, my dependent variable is ordinal and my independent > >> variables are ratio/intervalar. > > > > Numeric ? Then maybe some recoding/transformation is in order ... in > > which case Design/rms might or might not be useful. > > I'm not clear on what recoding or transformation is needed for an > ordinal dependent variable and ratio/interval independent variables, nor > why rms/Design would not be useful. I was thinking about transformations/recoding of the *independent* variables... Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What parts of 'Statistical Models in S' are not applicable to R?
On Wednesday 11 November 2009 at 10:22 +0100, Uwe Ligges wrote:

> Peng Yu wrote:
> > According to Amazon review, 'Statistical Models in S' is a key
> > reference for understanding the methods implemented in several of
> > S-PLUS' high-end statistical functions, including 'lm()', predict()',
> > 'design()', 'aov()', 'glm()', 'gam()', 'loess()', 'tree()',
> > 'burl.tree()', 'nls()' and 'ms()'.
> >
> > But since it is for S, some part of the book may not be applicable to
> > R. Some examples (e.g. interaction.plot()) discussed in this book are
> > not available in R. Without working examples, it is sometimes
> > difficult for me to understand the materials in the book.
> >
> > Besides the functions mentioned in the Amazon review, could somebody
> > give me a hint on what chapters (or sections) in this book are not
> > appropriate to R?
>
> They all are appropriate, but nuances differ these days, as some nuances
> differ for recent S-PLUS versions, 17 years later. It should still be
> fine to learn some relevant concepts.

You could also note that, at least in the 4th (last) edition of the book, the authors have marked passages with differences between R and S+ with a marginal "R".

Now this book has grown a bit out of date since its last edition (2002 ?) : newer R packages implement various things previously not implemented in R (e.g. multiple comparisons after ANOVA, previously available in S+ with the "multicomp" function, and implemented (with a lot more generality) in the "multcomp" package).

A 5th edition might be in order, but that would be a *daunting* task : the R language has grown (e. g. namespaces), the nature and extent of available tasks has grown *enormously*, and I don't think that producing a book that would be to 2009 R what V&R4 was to 2002 R is doable by a two-person team, as talented, dedicated and productive as these two persons might be (modulo Oxford sarcasm :-). Furthermore, these two persons already give an enormous amount of time and effort to other R development (search for the R-help activity of BV and BDR, or meditate on the recently published stats on R source activity...).

Such a document would probably have to be something other than a book to stay up to date and accurate, and even coordinating such a task would need serious time... Even if it excluded anything present in the various packages' help files and limited itself to tutorial introductions, examples and discussions, the sheer volume (1700+ packages last time I looked) and the difficulty of coordination (how do you discuss 5 different packages implementing various means to solve the same problem ?) would involve serious organizational difficulties. So I doubt such a document will get produced in the foreseeable future. Frequent R-help reading and note-taking is the second-best option...

To come back to the R-vs-S+ topic : unless I'm mistaken, R seems to be currently the dominant version of the S language, and most published S material will nowadays (implicitly) be aimed at R. This should reduce the importance of the problem.

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to use SQL code in R
On Monday 16 November 2009 at 13:14 +0800, sdlywjl666 wrote:

> Dear All,
> How to use SQL code in R?

Depends on what you mean by "using SQL code in R"...

If you mean "query, update or otherwise mess with an existing database", look at the RODBC package and/or the various DBI-related packages (hint : look at BDR's nice manual on importing/exporting data in R, which comes with the standard R documentation).

If you mean "use standard SQL code to manipulate R data frames", look at the sqldf package (quite nice, but be aware that it is work in progress).

If you mean "use Oracle's command language or pgsql or ... in order to transfer to R code written for a specific database", I'm afraid you're SOL... In that last case, however, note that you may be able to use R *inside* your external database instead, depending on its ability to use external code (I'm thinking of the PL/R language, which allows running R code in a PostgreSQL function).

HTH,

Emmanuel Charpentier

BTW, there exists an R database Special Interest Group, with a mailing list. Look up their archive, and maybe ask a (slightly less general, if possible) version of your question there...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
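A minimal sqldf sketch (the data frame and the query are made up):

library(sqldf)
df <- data.frame(id = 1:5, x = c(3, 1, 4, 1, 5))
sqldf("select id, x from df where x > 2 order by x desc")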
Re: [R] Help with lme4 model specification
On Wednesday 06 May 2009 at 15:21 -0700, boeinguy2 wrote:

> I am new to R and am trying to specify a model for mixed model analysis.
> When I run the following model I get an error:
>
> AAT<- lmer(Y ~ S + A + (1|S:A/H), data=AT, REML=True)
>
> The error looks like this:
>
> Error in Spar_loc:`:` : NA/NaN argument
> In addition: Warning messages:
> 1: In model.matrix.default(mt, mf, contrasts) :
> variable 'Spar_loc' converted to a factor

This is suspicious : Spar_loc doesn't appear in your statement...

> 2: In Spar_loc:`:` :
> numerical expression has 720 elements: only the first used

Ditto...

> I am having trouble specifying the random component. It should reflect the
> random term as H nested within the interaction of S & A. What am I doing
> wrong?

The precedence of modelling operators is not very clear (and does not seem to be formally defined in the "standard" documentation...). When in doubt, use parentheses. So try ...+(1|(S:A)/H), just in case...

You might also try to define the grouping outside the model by adding variables :

AT <- within(AT, { SA <- S:A ; SAH <- factor(SA:H) })
AAT <- lmer(Y ~ S + A + (1|SAH), data=AT, REML=TRUE)

> :confused:

Enlightened ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Do you use R for data manipulation?
Le lundi 11 mai 2009 à 23:20 +0800, ronggui a écrit : [ Snip... ] > > But, at least in my trade, the ability to handle Excel files is a must > > (this is considered as a standard for data entry. Sigh ...). [ Re-snip... ] > I don't think Excel is a standard tool for data entry. Epidata entry > is much more professional. Irony squared ? This *must* go in the fortunes file ! Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to save R "clean" sessions in BATCH mode?
On Saturday 16 May 2009 at 17:21 +0200, mcnda...@mncn.csic.es wrote:

> Thanks a lot for all of you that have reply me about opening and
> ending R workspaces in BATCH mode. However replies were a king general
> and Im afraid I could not take the entire message from them.
> Therefore I chose to expose here a representative fraction of my work.
>
> I have 50 Rdata files (F1,F2,F3,F4, ,F50) with objects inside.
> I need to:
>
> open F1:
> - perform some simple operations with the objects
> - export the solution with write.table
> - end F1 session
> open F2
> repeat procedures as F1
>
> open F50
> repeat procedures as F1
>
> My difficulty here is to end a workspace and open one from the scratch
> to avoid mixing files from consecutive worksessions, and thus using R
> memory unnecessarily. I could use rm() to delete objects from the
> previous sessions but it seems not an efficient task.

And re-loading R, rebuilding a whole process context, re-allocating memory is an efficient one ? Hah !

> Any suggestions on how to perform this in Batch Mode? An examplified
> help would be nice!

Why not encapsulate your procedures in a function taking the filename as its argument and looping over the list of filenames ? Anything created in the function, being local to the function, will be (efficiently) cleaned up at the function's exit. Magic...

Example :

> ls()
character(0)
> Foo<-runif(10,0,1)
> ls()
[1] "Foo"
> ?save.image
> save.image("Foo1.RData")
> ls()
[1] "Foo"
> rm(list=ls())
> Foo<-letters[round(runif(10,min=1,max=26))]
> Foo
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
> save.image("Foo2.RData")
> rm(list=ls())
> bar<-edit()
Waiting for Emacs...
> bar
function(filename) {
    load(file=filename)
    print(ls())
    print(Foo)
    invisible(NULL)
}
> ls()
[1] "bar"
> bar("Foo1.RData")
[1] "filename" "Foo"
# Note : by default, ls() lists the function's environment, not the
# global one... no "bar" here...
 [1] 0.8030422 0.6326055 0.8188481 0.6161665 0.5917206 0.6631358 0.7290200
 [8] 0.2970315 0.2016259 0.4473244
> ls()
[1] "bar"
# bar is still in the global environment...
> bar("Foo2.RData")
[1] "filename" "Foo"
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
> ls()
[1] "bar"
>

Good enough for you ?

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
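A sketch of that pattern applied to the 50-files problem; the file names, the object "Foo" and the "simple operations" below are placeholders to adapt:

process.one <- function(filename) {
    load(filename)                     # objects land in the function's own environment
    res <- data.frame(file = filename, n = length(Foo), mean = mean(Foo))
    write.table(res, paste(filename, "out.txt", sep = "_"), row.names = FALSE)
    invisible(NULL)                    # everything local is discarded on exit
}
for (f in paste("F", 1:50, ".Rdata", sep = "")) process.one(f)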
Re: [R] How to google for R stuff?
Le mercredi 20 mai 2009 à 09:02 -0400, Kynn Jones a écrit : > Hi! I'm new to R programming, though I've been programming in other > languages for years. > > One thing I find most frustrating about R is how difficult it is to use > Google (or any other search tool) to look for answers to my R-related > questions. With languages with even slightly more distinctive names like > Perl, Java, Python, Matlab, OCaml, etc., usually including the name of the > language in the query is enough to ensure that the top hits are relevant. > But this trick does not work for R, because the letter R appears by itself > in so many pages, that the chaff overwhelms the wheat, so to speak. > > So I'm curious to learn what strategies R users have found to get around > this annoyance. ISTR having this question or very close ones at least thrice in the last two months. Time for a FAQ entry ? (It does not seem to exist : I checked...) Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Regression with Constraints
On Tuesday 26 May 2009 at 14:11 -0400, Stu @ AGS wrote:

> Hi!
> I am a bit new to R.
> I am looking for the right function to use for a multiple regression problem
> of the form:
>
> y = c1 + x1 + (c2 * x2) - (c3 * x3)
>
> Where c1, c2, and c3 are the desired regression coefficients that are
> subject to the following constraints:
>
> 0.0 < c2 < 1.0, and
> 0.0 < c3 < 1.0

Sounds rather like an in-the-closet Bayesian problem (with a very strange prior...). Did you consider submitting it to WinBUGS (or JAGS) ?

If you still want a direct optimization, you could have started with :

RSiteSearch("optimization constraint")

which would have quickly led you to ask :

?constrOptim

> y, x1, x2, and x3 are observed data.
> I have a total of 6 rows of data in a data set.

??? If that's real-life data, I wonder what kind of situation forces you to estimate 3+1 parameters (c1, c2, c3 and the residual, which is not really a parameter) with 6 data points ?

Your problem can be written as a system of 6 linear equations with 3 unknowns (c1, c2, c3), leaving you room to search in (a small piece of) R^3 (the residual is another way to express your objective function, not an independent parameter).

Of course, if it's homework, get lost !

Emmanuel Charpentier

> Is "optim" in the stats package the right function to use?
> Also, I can't quite figure out how to specify the constraints.
> Thank you!
>
> -Stu
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Regression with Constraints
> system.time(D2.unbound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                               fn=objfun,
+                               gr=objgrad,
+                               data=Data2,
+                               method="L-BFGS-B",
+                               lower=rep(-Inf, 3),
+                               upper=rep(Inf, 3)))
utilisateur     système      écoulé
      0.008       0.000       0.006
>
> D2.unbound
$par
        c1         c2         c3
 2.3988370  0.1983261 -0.3141863
# Not very close to the "real" values

$value
[1] 8.213914
# keep in mind

$counts
function gradient
      12       12

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

>
> system.time(D1.bound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                             fn=objfun,
+                             gr=objgrad,
+                             data=Data1,
+                             method="L-BFGS-B",
+                             lower=c(-Inf,0,0),
+                             upper=c(Inf,1,1)))
utilisateur     système      écoulé
      0.004       0.000       0.004
>
> D1.bound
$par
        c1         c2         c3
-0.5052117  0.2478706  0.7626612

$value
[1] 12.87678
# About the same values as for the unbounded case. Okay

$counts
function gradient
       9        9

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

>
> # The real test : what will be found when the "real" objective is out of
> # bounds ?
>
> system.time(D2.bound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                             fn=objfun,
+                             gr=objgrad,
+                             data=Data2,
+                             method="L-BFGS-B",
+                             lower=c(-Inf,0,0),
+                             upper=c(Inf,1,1)))
utilisateur     système      écoulé
      0.004       0.000       0.004
>
> D2.bound
$par
       c1        c2        c3
2.0995442 0.2192566 0.0000000
# The optimizer bangs its pretty little head on the c3 wall. c1 is out of
# the picture, but c2 happens to be not so bad...

$value
[1] 14.4411
# The residual is (as expected) quite a bit larger than in the unbound case...

$counts
function gradient
       8        8

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

# ... and that's not a computing anomaly.

This also illustrates the hubris necessary to fit 3 coefficients on 6 data points...

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can I Compile R with MS Access 2003 Developer Extensions?
Le dimanche 31 mai 2009 à 12:08 -0700, Felipe Carrillo a écrit : > Hi: > I have an application that uses MS Access as front end interacting > with Excel,Word,Sigmaplot and R in the background. I use the programs > to do different tasks but I mainly use R to create graphics on my > Access forms without having to manually open R. > To make this short, I was wondering if R can be compiled from Access > using the Developer Extensions. I know that the MS office programs can > be compiled with it but don't know if it can be done including R. Somehow, I strongly doubt it : R compilation needs tools totally alien to Microsoft's world vision. However, ISTR that there exists some sort of R-WinWorld bridge. Have a look here : http://cran.miroir-francais.fr/contrib/extra/dcom/ Note that I can't vouch for it myself : I didn't touch a Windows computer for years... HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] montly mean temp plot
On Tuesday 02 June 2009 at 16:32, ms.com wrote:

> Dear all
> i got a problem in monthly mean temperature. here i am attaching the data set
> as well as the plot i got with the following command
> plot(month,type='n')
> plot(month,X1999)
>
> this command gave the plot where the month names are in alphabetic order, i
> want the plot in monthly sequence
> could you please suggest me how can i solve my problem?

From the depths of my limbic system :

?factor
?relevel
?ordered

(two different solutions, with varying implications...).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
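A sketch of the factor route, assuming the month column holds full English month names (use month.abb instead of month.name for abbreviated ones):

month <- factor(month, levels = month.name)   # calendar order instead of alphabetical
plot(month, X1999)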
[R] Small mystery : passing a "subset=" argument to lme|lm through "..."
Dear list, I have problems involving passing a "subset=" argument through "...". I'm trying to augment the set of defined analyses for mice (homonymous package) with a call to lme. This package create multiple imputations of missing data in a "mids" object, each completed data set may be obtained through the complete(data, set) function. > sessionInfo() R version 2.9.0 (2009-04-17) x86_64-pc-linux-gnu locale: LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] odfWeave_0.7.10 XML_2.3-0 lattice_0.17-25 loaded via a namespace (and not attached): [1] grid_2.9.0 MASS_7.2-47 nlme_3.1-92 rpart_3.1-44 > # First attempt : foo<-function(formula, data, ...) { library(nlme) call <- match.call() if (!is.mids(data)) stop("The data must have class mids") analyses<-lapply(1:data$m, function(i) lme(formula, data=complete(data,i), ...)) object <- list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses) oldClass(object) <- c("mira", oldClass(object)) return(object) } > bar1<-foo(ttotal~nbpradio, data=Data1.Imp, random=~1|ctr) > class(bar1$analyses[[1]]) [1] "lme" # Fine : the "random=" argument *HAS BEEN* passed to lme() # through "..." ; therefore it's (usually) possible. # Further tests (not shown) show that results are "reasonable" (i. e. # compliant with expectations...). > bar2<-foo(log(ttotal)~log(nbpradio), data=Data1.Imp, random=~1|ctr, subset=nbpradio>0) Erreur dans eval(expr, envir, enclos) : ..2 utilisé dans un mauvais contexte, aucun ... où chercher > Further exploration show that a "subset=" argument *DOES NOT GET PASSED* to lme(), which complaints furthermore not to find a "..." argument. Since lme is a method private to the nlme package, I can't trace the source of error. [Later : Same observation with lm() instead of lme()...] Now, ISTR that an analogous problem *has already been discussed* on this list, but even a prolonged use of RSiteSearch() did not retrieve it. http://finzi.psych.upenn.edu/R/Rhelp02/archive/10139.html hints that "Modelling and graphics functions with formulas all have slightly different nonstandard evaluation rules, and it turns out that lme() doesn't evaluate extra arguments like subset in the parent environment", but this does not tell me why a "..." argument is *not* found. Different frame of evaluation ? May some kind soul point me to the right direction ? (He may even further my shame by telling me how he retrieved the relevant information...). Sincerely, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
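For the record, a workaround sometimes suggested for this kind of non-standard-evaluation trouble is to rebuild the lme() call explicitly, splice in the completed data set, and evaluate it in the caller's frame, so that subset= reaches lme() as a direct argument instead of travelling through "...". A rough, untested sketch (not guaranteed to reproduce mice's mira semantics exactly):

foo <- function(formula, data, ...) {
    library(nlme)
    if (!is.mids(data)) stop("The data must have class mids")
    cl <- match.call()               # the call, with random=, subset=, etc. still unevaluated
    cl[[1]] <- as.name("lme")
    env <- parent.frame()            # where the caller's objects live
    analyses <- lapply(1:data$m, function(i) {
        cl$data <- complete(data, i) # splice in the i-th completed data set
        eval(cl, env)
    })
    structure(list(call = match.call(), call1 = data$call,
                   nmis = data$nmis, analyses = analyses),
              class = c("mira", "list"))
}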
Re: [R] New Meta-Analysis Package (metafor)
Le vendredi 05 juin 2009 à 14:03 +0200, Viechtbauer Wolfgang (STAT) a écrit : > A new package is now available via CRAN, called metafor. > > The metafor package consists of a collection of functions for > conducting meta-analyses in R. Fixed- and random-effects models (with > and without moderators) can be fitted via the general linear > (mixed-effects) model. For 2x2 table data, the Mantel-Haenszel and > Peto's method are also implemented. The package also provides various > plot functions (for example, for forest, funnel, and radial plots) and > functions for assessing the model fit and obtaining case diagnostics. Thank you very much ! I think that this package was expected ... Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
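For those wanting a quick taste, a sketch adapted from the package's own examples (the dat.bcg data set ships with metafor); I have not re-run it here:

library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg)
res <- rma(yi, vi, data = dat)   # random-effects meta-analysis
summary(res)
forest(res)                      # forest plot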
Re: [R] p-values from VGAM function vglm
Dear David, Le vendredi 05 juin 2009 à 16:18 -0400, David Winsemius a écrit : > On Jun 5, 2009, at 3:15 PM, Steven Matthew Anderson wrote: > > > Anyone know how to get p-values for the t-values from the > > coefficients produced in vglm? > > Attached is the code and output — see comment added to output to > > show where I need p-values > > > > > > + print(paste("** Using VGAM function gamma2 > > **")) > > + modl2<- > > vglm(MidPoint~Count,gamma2,data=modl.subset,trace=TRUE,crit="c") > > + print(coef(modl2,matrix=TRUE)) > > + print(summary(modl2)) > > > > > > [1] "** Using VGAM function gamma2 **" > > VGLMlinear loop 1 : coefficients = > > 0.408464609241, 3.255887520104, -0.000220585671 > > VGLMlinear loop 2 : coefficients = > > 2.34723239e-01, 1.28969691e+00, -4.52393778e-05 > > VGLMlinear loop 3 : coefficients = > > 2.19500481e-01, 1.92534895e+00, -3.02160949e-05 > > VGLMlinear loop 4 : coefficients = > > 2.19383151e-01, 2.26845910e+00, -3.00838664e-05 > > VGLMlinear loop 5 : coefficients = > > 2.19383045e-01, 2.34645688e+00, -3.00836087e-05 > > VGLMlinear loop 6 : coefficients = > > 2.19383045e-01, 2.34977070e+00, -3.00836082e-05 > > VGLMlinear loop 7 : coefficients = > > 2.19383045e-01, 2.34977637e+00, -3.00836082e-05 > > VGLMlinear loop 8 : coefficients = > > 2.19383045e-01, 2.34977637e+00, -3.00836082e-05 > > log(mu) log(shape) > > (Intercept) 2.193830e-01 2.349776 > > Count -3.008361e-05 0.00 > > > > Call: > > vglm(formula = MidPoint ~ Count, family = gamma2, data = modl.subset, > >trace = TRUE, crit = "c") > > > > Pearson Residuals: > > Min 1Q Median 3Q Max > > log(mu)-1.7037 -0.82997 0.072275 0.78520 1.72834 > > log(shape) -2.5152 -0.32448 0.254698 0.58772 0.70678 > > > > > > # NEED P-VALUES HERE # > > Perhaps: > > dt(summary( modl2 )@coef3[ , 3], 1) ??? 1) dt() is the density. didn't you mean pt() ? 2) I'd rather quote 2*min(pt(), 1-pt()) ("bilateral tests", y'know...) 3) The real hitch here is : what are the right DoF ? I do not think there is an easy answer to *this* one... Emmanuel Charpentier > > > > Coefficients: > >Value Std. Error t value > > (Intercept):1 2.1938e-01 5.2679e-02 4.16455 > > (Intercept):2 2.3498e+00 1.7541e-01 13.39574 > > Count -3.0084e-05 8.9484e-05 -0.33619 > > > > Number of linear predictors: 2 > > > > Names of linear predictors: log(mu), log(shape) > > > > Dispersion Parameter for gamma2 family: 1 > > > > Log-likelihood: -26.39268 on 123 degrees of freedom > > > > Number of Iterations: 8 > > > > > > Steven Matthew Anderson > > > > Anderson Research, LLC > > Statistical Programming and Analysis > > SAS (R) Certified Professional > > adastr...@mac.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
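A sketch of how one might get two-sided p-values from that table, reusing the coef3 slot mentioned above and the 123 residual degrees of freedom printed in the output (see point 3 about whether these are really the right df):

cf <- summary(modl2)@coef3                            # columns : Value, Std. Error, t value
pvals <- 2 * pt(abs(cf[, 3]), df = 123, lower.tail = FALSE)
cbind(cf, "Pr(>|t|)" = pvals)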
Re: [R] large numbers of observations using ME() of spdep
Le dimanche 07 juin 2009 à 02:50 +0900, lucero mariani a écrit : > Dear All, > We aim to remove the spatial structure of our data using Moran Eigen > Vectors and spdep package . Our data has 3694 samples and 13 > variables. > The computer stop working after almost 4 days of processing (we found > it emitting a sharp sound and with all colors on the screen. No > wories, it was restared without problem!). Cool ! Farm it out to a night club (and get a real computer with the proceedings). Sorry. Couldn't resist ! And twice sorry to be unable to help with your *real* problem... Emmanuel Charpentier > And we are left with > nothing: no result file was produced since the calculations were > interrumpted! > Consequently, I am looking for a way to accelerate calculations with > ME() of spdep and to save the results step by step (for each eigen > value). > I came accross several promissing ideas: 1)make a compiled program in > C and call it in R with the command system(); 2)use muliR. However, I > do not know how to use these tools. I do not understand exactely why > the calculations are so slow and I do know know how to write a program > in C... > I am using Ubuntu 9.04 and R version 2.8.1 . > > The code in R is a fallows: > taimin.xy = taimin[c("dy", "dx")]; > coordinates(taimin.xy) <- ~dy+dx; #dy, dx, name of the respective > coordinate columns > coords<-coordinates(taimin.xy); > library(spdep); > TaiminGabrielGraph<-gabrielneigh(coords, nnmult = 12); > tai.gab.nb<-graph2nb(TaiminGabrielGraph,sym=TRUE); > nbtaim_distsg <- nbdists(tai.gab.nb, coords); > nbtaim_simsg <- lapply(nbtaim_distsg, function(x) (1-((x/(4*50))^2)) ); > MEtaig.listw <- nb2listw(tai.gab.nb, glist=nbtaim_simsg, style="B"); > sevmtaig <- ME(Presabs > ~Age+Curv+Zon2005+ZoneMeiji+Wi+Sol+Slope+Seadist+Elev+Basin,data=taimin, > family=binomial,listw=MEtaig.listw) > > Any help is welcome! > > Thanks > > Lucero Mariani, Yokohama National University, Japan. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] User defined GLM?
On Monday 22 June 2009 at 09:35 -0700, francogrex wrote:

> Hello, I have this generalized linear formula:
> log(x[i]/n[i])=log(sum(x)/sum(n)) + beta[i]

I do not understand the problem as stated. If x[i] and n[i] are known, and unless sum(n)=0, your dataset reduces to a set of nrow(dataset) independent linear equations with nrow(dataset) unknowns (the beta[i]), whose solution is trivially beta[i] = log(x[i]/n[i]) - log(sum(x)/sum(n)), except when n[i]=0, in which case your equation has no solution.

Could you try to re-express your problem ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
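In R, that closed-form solution is a one-liner (x and n being the observed vectors):

beta <- log(x / n) - log(sum(x) / sum(n))   # infinite or NaN wherever x[i] or n[i] is 0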
Re: [R] LME as part of meta-analysis
On Friday 27 March 2009 at 15:30 -0700, mrint wrote:

> Hi, I'm having a problem using LME and hopefully I can get some help. Here
> is what I'm doing:
>
> I have a model like this:
>
> yi = alpha + theta * xi + beta * zi + error, errors are normally distributed
> mean 0, var sigma^2
>
> xi and zi are generated from normal distributions within a specified range.
>
> True values of alpha, theta, beta and sigma^2 are chosen with a specific
> mean and variance tau^2.
>
> I have a function which generates the yi from the other variables, then a
> function that does a linear regression using lm to create the estimates.
>
> I then want to use this data to do a meta-analysis with the above repeated
> between 10-20 times. Within this, I want to use lme to create estimates for
> the average true value, sample mean and average standard error for alpha,
> theta, beta and the respective tau^2 values for each of these. For the lme
> part, I'm using this
>
> a<-summary(lme(alp~1,random=~1|alp, weights=varFixed(~staalp^2)))
>
> This is the one for alpha. This isn't producing the type of results I would
> expect, can anyone see where I'm going wrong?

(I suppose that your simulation aims to assess a specific model.)

This, and closely related subjects, have already been discussed on this very list. To make a long story short : lme doesn't (currently) accept means and variances of groups as an input ; it needs individual data.

Someone (Wolfgang Viechtbauer) has written, specifically for meta-regression purposes, a "mima" function that does what you're requesting. Wolfgang has stated his intention to turn this function into a full-fledged R package (with calling conventions closer to what other regression functions use), but the "mima" function available on his site is still the two-year-old 0.4 version. For further details, look for "mima" or for "meta-regression" in the list archives. RSiteSearch() is your friend...

However, if what you're interested in is strictly speaking a meta-analysis of 2-sample comparisons (i. e. your theta is scalar and your x_i are logicals), (at least) two R packages available on CRAN are built for this purpose : rmeta and meta. Both offer separate analyses for boolean or continuous dependent variables (i. e. y_i logical or continuous). If your theta is scalar but your x_i is continuous (i. e. you're meta-analysing a single regression coefficient), both packages offer a variant for meta-analysis of effects, which might be relevant for you.

A more general solution would be to enhance the forthcoming lme4 package to accept an alternative specification of random-effects variances-covariances, which would allow "general" meta-regression. But I understand that Douglas Bates already has way too much work and not too much time on his hands, and I doubt he might be coaxed to work in this direction right now...

A suggestion : you might forward your question to the "r-mixed-models" SIG mailing list with some profit...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate directions between points
Le samedi 28 mars 2009 à 16:06 +, Wanja Mathar a écrit : > Hi everyone, > > i have a matrix of points and want to calculate the direction and length of > each vector between the points. > > I can easily calculate the length with dist(testdata), but is their a > function that returns a matrix of the direction of the vectors between points > (angles 0-360°)? > > testdata <- matrix(1:8, nrow=4, ncol=2, > dimnames=list(c("A","B","C","D"), c("x","y")) ?atan2 ; ?outer Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
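A sketch of that hint, completing the matrix() call from the question (the missing closing parenthesis is restored):

testdata <- matrix(1:8, nrow = 4, ncol = 2,
                   dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
dx <- outer(testdata[, "x"], testdata[, "x"], "-")   # pairwise x differences
dy <- outer(testdata[, "y"], testdata[, "y"], "-")   # pairwise y differences
(atan2(dy, dx) * 180 / pi) %% 360                    # directions in [0, 360) degrees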
Re: [R] Discriminant Analysis - Obtaining Classification Functions
require(MASS) ; ?predict.lda should enlighten you. Glancing at V&R4 might be a bit more illuminating...

HTH

Emmanuel Charpentier

On Friday 03 April 2009 at 22:29 +0200, Pavel Kúr wrote:

> Hello!
>
> I need some help with the linear discriminant analysis in R.
> I have some plant samples (divided into several groups) on which I
> measured a few quantitative characteristics. Now, I need to infer some
> classification rules usable for identifying new samples.
> I have used the function lda from the MASS library in a usual fashion:
>
> lda.1 <- lda(groups~char1+char2+char3, data=xxx)
>
> I'd like to obtain the classification functions for the particular
> groups, with the aid of which I could classify unknown samples. I know
> I can use predict.lda to classify such samples, but I need to obtain
> some equations into which I could simply put the measured values of an
> unknown sample manually and the result would predict which group the
> sample most probably belongs to (like in eg. STATISTICA).
> I haven't found out how to extract these functions from the lda output.
> Could somebody give me some advice?
>
> Thank you in advance,
>
> Pavel Kur
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
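A sketch reusing the poster's object names: predict() does the classification, and the coefficients of the linear discriminants live in the $scaling component (new.samples is a hypothetical data frame of new measurements):

library(MASS)
lda.1 <- lda(groups ~ char1 + char2 + char3, data = xxx)
lda.1$scaling                                     # coefficients of the linear discriminants
predict(lda.1, newdata = new.samples)$class       # predicted group for each new sample
predict(lda.1, newdata = new.samples)$posterior   # posterior probabilities per group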
[R] [OT ?] rant (was : Re: Conversions From standard to metric units)
Le vendredi 03 avril 2009 à 14:17 -0400, stephen sefick a écrit : > I am starting to use R for almost any sort of calculation that I need. > I am a biologist that works in the states, and there is often a need > to convert from standard units to metric units. US/Imperial units are *not* standard units. The former "metric system" is now called "Système International" (International System) for a reason, which is *not* gallocentrism of a "few" 6e7 frogs, but rather laziness of about 5.6e9 losers who refuse to load their memories with meaningless conversion factors... Emmanuel Charpentier who has served his time with pounds per cubic feet, furlongs per fortnight, BTU and other figments of British/American sadistic imagination, thank you very much... # Again, didn't work the first time... > Is there a package in > R for this already? If not I believe that I am going to write some of > the most often used in function form. My question is should I include > this in my StreamMetabolism package. It is not along the same theme > lines, but could loosely fit. The reason that I ask is that I don't > want to clutter CRAN with a small package containing some conversion > functions because I am to lazy to source them into R every time that I > use them, but I also don't want the StreamMetabolism package to turn > into StephenMisc Fuctions. Thoughts, comments, or suggestions would > be appreciated. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [OT ?] rant (was : Re: Conversions From standard to metricunits)
Le vendredi 03 avril 2009 à 20:01 -0400, Murray Cooper a écrit : > For science yes. For pleasure I'll still take a pint instead of 570ml! Yes, but do you realize that you'll have to pee in fl. oz ? Aie ... Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Basic doubts on the use of several script files in R batch mode
On Friday 03 April 2009 at 22:30 +0200, mcnda...@mncn.csic.es wrote:

> I already searched for information regarding the batch file operations
> within R. But I could not locate the information I need.
>
> Basically I have a doubt regarding the procedures on the batch use of
> several script files (*.R).
> How can I do this?
> How can I define the order of files?
>
> My intention is to programme openings, math operations and savings of
> *.Rdata sessions, but in a sequential manner.

From the top of my limbic system :

$ cat file1 file2 file3 | R --vanilla --no-save
# Of course, file3 should call save.image() at some convenient place

Another, slightly more "correct" solution would be a batch job doing the relevant R CMD BATCH invocations.

See also the littler package, which allows using R as a shell scripting language. I know zilch about it, though...

HTH

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
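A possible driver, with placeholder file names, either as a shell loop over R CMD BATCH or as a single master script run in batch mode that sources the others in order:

$ for f in file1.R file2.R file3.R ; do R CMD BATCH --vanilla "$f" ; done

# or, inside one master script:
for (f in c("file1.R", "file2.R", "file3.R")) source(f)   # runs them in this order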
Re: [R] Cross-platforms solution to export R graphs
On Thursday 09 April 2009 at 15:04 +0200, Philippe Grosjean wrote:

> Hello Rusers,
>
> I have worked on a R Wiki page for solutions in exporting R graphs,
> especially, the often-asked questions:
> - How can I export R graphs in vectorized format (EMF) for inclusion in
> MS Word or OpenOffice outside of Windows?
> - What is the best solution(s) for post-editing/annotating R graphs.
>
> The page is at:
> http://wiki.r-project.org/rwiki/doku.php?id=tips:graphics-misc:export.
>
> I would be happy to receive your comments and suggestions to improve
> this document.

Well, if you insist ...

The PDF import plugin in OpenOffice is still beta, and some reports deem it difficult to install correctly and/or flaky.

Having checked that both MS Word (>=2000) and OpenOffice (>=2.4) import and display EPS files correctly (i. e. vectorially, including fonts), I switched to this format, most notably for use with Max Kuhn's marvellous odfWeave package, which is a *must* for us working in state/administrative/corporate salt mines, where \LaTeX is deemed obscene and plain \TeX causes seizures ...

The point is that this format doesn't need any intermediate step, thus allowing for automation. Be aware, however, that the embedded EPS images are not editable in place by OpenOffice nor, as far as I know, by MS Word. But my point was to *avoid* post-production as much as humanly possible (I tend to be inhumanly lazy...).

HTH,

Emmanuel Charpentier

> All the best,
>
> PhG

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
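The kind of call I use to produce an embeddable EPS file (file name and sizes are arbitrary):

postscript("figure1.eps", onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 6, height = 4)
plot(rnorm(100))          # any plotting code
dev.off()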
Re: [R] F test
On Thursday 16 April 2009 at 14:08 -0300, Mike Lawrence wrote:

> summary(my_lm) will give you t-values, anova(my_lm) will give you
> (equivalent) F-values.

Ahem. "Equivalent", my tired foot...

In simple terms (the "real" real story may be more intricate) :

The "F values" stated by anova are something entirely different from the t values in summary. The latter allow you to assess properties of *one* coefficient in your model (namely, do I have enough support to state that it is nonzero ?). The former allow you to assess whether you have support for stating that *ALL* the coefficients related to the same factor cannot be *SIMULTANEOUSLY* null. Which is a horse of quite another color...

By the way : while your "summary" does give you an unbiased estimate of each coefficient and a (hopefully) unbiased estimate of its standard error, the "F" ratio is the ratio of estimates of the "remaining" variabilities with and without the H0 assumption it tests, namely that *ALL* coefficients of your factor of interest are *SIMULTANEOUSLY* null.

F and t "numbers" will be "equivalent" if and only if your "factor of interest" needs only one coefficient to get expressed, i. e. is a continuous covariate or a two-class discrete variable (such as a boolean). In this case, you can test your factor either by the t value, which, under H0, fluctuates as a Student's t with n_res dof (n_res being the "residual degrees of freedom" of the model), or by the F value, which will fluctuate as a Fisher F statistic with 1 and n_res dof, and which happens (but that's not happenstance...) to be the *square* of a t with n_res dof.

May I suggest consulting a textbook *before* flunking ANOVA 101 ?

Emmanuel Charpentier

> summary() might be preferred because it also
> provides the estimates & SE.
>
> > a=data.frame(dv=rnorm(10),iv1=rnorm(10),iv2=rnorm(10))
> > my_lm=lm(dv~iv1*iv2,a)
> > summary(my_lm)
>
> Call:
> lm(formula = dv ~ iv1 * iv2, data = a)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -1.8484 -0.2059  0.1627  0.4623  1.0401
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -0.4864     0.4007  -1.214    0.270
> iv1           0.8233     0.5538   1.487    0.188
> iv2           0.2314     0.3863   0.599    0.571
> iv1:iv2      -0.4110     0.5713  -0.719    0.499
>
> Residual standard error: 1.017 on 6 degrees of freedom
> Multiple R-squared: 0.3161, Adjusted R-squared: -0.02592
> F-statistic: 0.9242 on 3 and 6 DF, p-value: 0.4842
>
> > anova(my_lm)
> Analysis of Variance Table
>
> Response: dv
>           Df Sum Sq Mean Sq F value Pr(>F)
> iv1        1 1.9149  1.9149  1.8530 0.2223
> iv2        1 0.4156  0.4156  0.4021 0.5494
> iv1:iv2    1 0.5348  0.5348  0.5175 0.4990
> Residuals  6 6.2004  1.0334
>
> On Thu, Apr 16, 2009 at 10:35 AM, kayj wrote:
> >
> > Hi,
> >
> > How can I find the p-value for the F test for the interaction terms in a
> > regression linear model lm ?
> >
> > I appreciate your help
> >
> > --
> > View this message in context:
> > http://www.nabble.com/F-test-tp23078122p23078122.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
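A quick check of that "equivalence" in the single-degree-of-freedom case, on toy data:

set.seed(1)
d   <- data.frame(y = rnorm(20), x = rnorm(20))
fit <- lm(y ~ x, data = d)
summary(fit)$coefficients["x", "t value"]^2   # t^2 ...
anova(fit)["x", "F value"]                    # ... equals F, only because x uses a single df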
Re: [R] F test
Le jeudi 16 avril 2009 à 13:00 -0500, Jun Shen a écrit : > Mike, > > I kind of have the same question. What if for a mixed effect model, say > using lme(), how to specify the interaction effect (between a fixed effect > and a random effect)? With lme, you have to specify a *list* of random effects as the "random=" argument, which, off the top of my (somewhat tired) limbic system, is specified by an "lmList", iirc. But I'm pretty sure that "?lme" will give you much better information... lmer is simpler. Say your basic model is "Z~Y" with another variable X acting as a possible source of random variation. You express random effects by adding (1|X) terms (meaning X is a simple (intercept) random effect) to the model, or (Y|X), which specifies both an intercept random effect of X and an effect on the slope of the (supposed) line binding Y to the independent variable. If you want to specify a possible variation of slopes with no possible intercept effect, say "Z~Y+(0+Y|X)" (unless I'm mistaken : I never used that, because I never had to work on such a model in a context where it would make sense...). > and where to find the result of the interaction? Ah. That's the (multi-)million-dollar question. You're dealing with a fixed*random interaction, which has a totally different meaning from a fixed*fixed interaction. The test of the latter asks "does the (possible) effect of the first factor vary with levels of the second factor ?", whereas the former reads as "does the random factor increase variability about what I know of the effect of the fixed factor ?", with totally different consequences. Pinheiro and Bates' book (2000), which, among other things, describes the care, feeding and drinks of lme, gives explanations about 1) why the problem is computationally hard, 2) how to get an approximate answer and 3) in which proportion previous advice might be misleading. This book is, IMHO, required reading for anybody with more than a passing interest in the subject, and I won't paraphrase it... Since then, Pr Bates started to develop lme4, which has different inner algorithms. In the course of this work, he started to have doubts about the "conventional wisdom" about testing effects in mixed models, and did not (yet) provide these "conventional" means of testing. Instead, he seems to work along the lines of MCMC sampling to assess various aspects of mixed models. But there I suggest walking through the R-SIG-mixed-models archives, which are *very* interesting. Of course, if you want an authoritative answer (i. e. an answer that will pass a medical journal's reviewer unquestioned), you can always use SAS' proc mixed. But I wouldn't swear this answer is "exact", or even sensible, as far as I can judge... Pr Bates seems to answer readily any (sensible) questions on the ME mailing list, where you will also find folks much more authorized than yours truly to answer this and that question... HTH, Emmanuel Charpentier > Thanks. > > Jun > > On Thu, Apr 16, 2009 at 12:08 PM, Mike Lawrence wrote: > > > summary(my_lm) will give you t-values, anova(my_lm) will give you > > (equivalent) F-values. summary() might be preferred because it also > > provides the estimates & SE. > > > > > a=data.frame(dv=rnorm(10),iv1=rnorm(10),iv2=rnorm(10)) > > > my_lm=lm(dv~iv1*iv2,a) > > > summary(my_lm) > > > > Call: > > lm(formula = dv ~ iv1 * iv2, data = a) > > > > Residuals: > >Min 1Q Median 3Q Max > > -1.8484 -0.2059 0.1627 0.4623 1.0401 > > > > Coefficients: > >Estimate Std. 
Error t value Pr(>|t|) > > (Intercept) -0.4864 0.4007 -1.214 0.270 > > iv1 0.8233 0.5538 1.487 0.188 > > iv2 0.2314 0.3863 0.599 0.571 > > iv1:iv2 -0.4110 0.5713 -0.719 0.499 > > > > Residual standard error: 1.017 on 6 degrees of freedom > > Multiple R-squared: 0.3161, Adjusted R-squared: -0.02592 > > F-statistic: 0.9242 on 3 and 6 DF, p-value: 0.4842 > > > > > anova(my_lm) > > Analysis of Variance Table > > > > Response: dv > > Df Sum Sq Mean Sq F value Pr(>F) > > iv1 1 1.9149 1.9149 1.8530 0.2223 > > iv2 1 0.4156 0.4156 0.4021 0.5494 > > iv1:iv2 1 0.5348 0.5348 0.5175 0.4990 > > Residuals 6 6.2004 1.0334 > > > > > > On Thu, Apr 16, 2009 at 10:35 AM, kayj wrote: > > > > > > Hi, > > > > > > > > > How can I find the p-value for the F test for the interaction terms in a > > > regression linear model lm ? > &g
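To make the (1|X) / (Y|X) / (0+Y|X) notation above concrete, a minimal simulated-data sketch (Z, Y and X are placeholder names, as above, not anybody's real variables) :

library(lme4)
set.seed(1)
mydata <- data.frame(X = factor(rep(1:10, each = 5)), Y = rnorm(50))
mydata$Z <- 1 + 0.5 * mydata$Y + rnorm(10)[mydata$X] + rnorm(50)
m1 <- lmer(Z ~ Y + (1 | X),     data = mydata)  # random intercept of X only
m2 <- lmer(Z ~ Y + (Y | X),     data = mydata)  # correlated random intercept and random slope of Y within X
m3 <- lmer(Z ~ Y + (0 + Y | X), data = mydata)  # random slope, no random intercept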
[R] Modelling an "incomplete Poisson" distribution ?
Dear list, I have the following problem : I want to model a series of observations of a given hospital activity on various days under various conditions. Among my "outcomes" (dependent variables) is the number of patients for which a certain procedure is done. The problem is that, when no relevant patient is hospitalized on said day, there is no observation (for which the "number of patients" item would be 0). My goal is to model this number of patients as a function of the "various conditions" described by my independent variables, most of them observed but uncontrolled, some of them unobservable (random effects). I am tempted to model them along the lines of : glm(NoP~X+Y+..., data=MyData, family=poisson(link=log)) or (accounting for some random effects) : lmer(NoP~X+Y+(X|Center), data=MyData, family=poisson(link=log)) While the preliminary analysis suggests that (the right part of) a Poisson distribution might be reasonable for all real observations, the lack of observations with count==0 bothers me. Is there a way to cajole glm (and lmer, by the way) into fitting these data with an "incomplete Poisson" model, i. e. with unobserved "0" values ? Sincerely, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Modelling an "incomplete Poisson" distribution ?
I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant ; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a "zero-deflated Poisson". An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed : if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i. e. justifiably) : out of the top of my head, only the likelihood ration test seems readily applicable to my problem. Testing contrasts in my covariates ... hum ! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still been hard pressed to formally test hypotheses on factors... Any ideas ? Emmanuel Charpentier Le samedi 18 avril 2009 à 19:28 +0200, Emmanuel Charpentier a écrit : > Dear list, > > I have the following problem : I want to model a series of observations > of a given hospital activity on various days under various conditions. > among my "outcomes" (dependent variables) is the number of patients for > which a certain procedure is done. The problem is that, when no relevant > patient is hospitalized on said day, there is no observation (for which > the "number of patients" item would be 0). > > My goal is to model this number of patients as a function of the > "various conditions" described by my independant variables, mosty of > them observed but uncontrolled, some of them unobservable (random > effects). I am tempted to model them along the lines of : > > glm(NoP~X+Y+..., data=MyData, family=poisson(link=log)) > > or (accounting for some random effects) : > > lmer(NoP~X+Y+(X|Center)), data=Mydata, family=poisson(link=log)) > > While the preliminary analysis suggest that (the right part of) a > Poisson distribution might be reasonable for all real observations, the > lack of observations with count==0 bothers me. > > Is there a way to cajole glm (and lmer, by the way) into modelling these > data to an "incomplete Poisson" model, i. e. with unobserved "0" > values ? > > Sincerely, > > Emmanuel Charpentier > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
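To fix ideas, the optim() route alluded to above could look more or less like this — a minimal sketch on simulated counts (not the hospital data), using the probability function written above :

dtpois <- function(x, lambda, log = FALSE) {
  # zero-truncated Poisson : Pr(X=x) = exp(-lambda) lambda^x / (x! (1 - exp(-lambda))), x >= 1
  d <- dpois(x, lambda, log = TRUE) - log(1 - exp(-lambda))
  if (log) d else exp(d)
}
set.seed(1)
x <- rpois(500, 2) ; x <- x[x > 0]       # simulate, then discard the (never observed) zeros
nll <- function(loglambda) -sum(dtpois(x, exp(loglambda), log = TRUE))
fit <- optim(log(mean(x)), nll, method = "BFGS")
exp(fit$par)                             # ML estimate of lambda ; compare mean(x), which is biased upwards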
Re: [R] Modelling an "incomplete Poisson" distribution ?
Dear Ben, Le samedi 18 avril 2009 à 23:37 +, Ben Bolker a écrit : > Emmanuel Charpentier bacbuc.dyndns.org> writes: > > > > > I forgot to add that yes, I've done my homework, and that it seems to me > > that answers pointing to zero-inflated Poisson (and negative binomial) > > are irrelevant ; I do not have a mixture of distributions but only part > > of one distribution, or, if you'll have it, a "zero-deflated Poisson". > > > > An answer by Brian Ripley > > (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar > > question leaves me a bit dismayed : if it is easy to compute the > > probability function of this zero-deflated RV (off the top of my head, > > Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm > > (still) able to use optim to ML-estimate lambda, using this to > > (correctly) model my problem set and test it amounts to re-writing some > > (large) part of glm. Furthermore, I'd be a bit embarrassed to test it > > cleanly (i. e. justifiably) : out of the top of my head, only the > > likelihood ration test seems readily applicable to my problem. Testing > > contrasts in my covariates ... hum ! > > > > So if someone has written something to that effect, I'd be awfully glad > > to use it. A not-so-cursory look at the existing packages did not ring > > any bells to my (admittedly untrained) ears... > > > > Of course, I could also bootstrap the damn thing and study the > > distribution of my contrasts. I'd still been hard pressed to formally > > test hypotheses on factors... > > > > I would call this a truncated Poisson distribution, related > to hurdle models. You could probably use the hurdle function > in the pscl package to do this, by ignoring the fitting to > the zero part of the model. On the other hand, it might break > if there are no zeros at all (adding some zeros would be a > pretty awful hack/workaround). Indeed ... > If you defined a dtpoisson() for the distribution of the > truncated Poisson model, you could probably also use bbmle > with the formula interface and the "parameters" argument. This I was not aware of... Thank you for the tip ! By the way, I never delved into stats4, relying (erroneously) on its one-liner description in CRAN, which is (more or less) an implementation of stats with S4 classes. Therefore mle escaped me also... May I suggest a better one-liner ? (This goes also for bbmle...) > The likelihood ratio test seems absolutely appropriate for > this case. Why not? Few data, therefore a bit far from the asymptotic condition... Anyway, a better look at my data after discovering a bag'o bugs in the original files led me to reconsider my model : I wanted to assess, among other things, the effect of a boolean (really a two-class variable). After the *right* graph, I now tend to think that my distribution is indeed zero-deflated Poisson in one group and ... zero-deflated negative binomial in the other (still might be zero-deflated Poisson with very small mean, hard to say graphically...). Which gives me some insights on the mechanisms at work (yippeee !!) but will require some nonparametric beast for assessing central value differences (yes, this still has a meaning...), such as bootstrap. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
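Following Ben Bolker's bbmle tip, the formula interface with a "parameters" argument could be used roughly as follows — a hedged, toy-data sketch ("group" is a made-up two-class covariate, and dtpois() is the same little zero-truncated density as earlier in this thread) :

library(bbmle)
dtpois <- function(x, lambda, log = FALSE) {
  d <- dpois(x, lambda, log = TRUE) - log(1 - exp(-lambda))
  if (log) d else exp(d)
}
set.seed(2)
dat <- data.frame(group = gl(2, 250))
dat$x <- rpois(500, ifelse(dat$group == "1", 1.5, 4))
dat <- subset(dat, x > 0)                # mimic the "no zero ever recorded" situation
fit <- mle2(x ~ dtpois(lambda = exp(loglambda)),
            parameters = list(loglambda ~ group),
            start = list(loglambda = log(mean(dat$x))),
            data = dat)
summary(fit)                             # Wald tests ; anova() of nested mle2 fits gives the LR test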
Re: [R] graph with 15 combinations
Le lundi 20 avril 2009 à 21:04 +0200, Penner, Johannes a écrit : > Dear R helpers, > > I have a data set with 4 types (W, C, E & S). Now I have values for all > types plus all possible combinations (the order is unimportant): W, C, > WC, E, WE, CE, WCE, S, WS, CS, WCS, ES, WES, CES & WCES. Ideally I would > like to represent everything in one graph and as concise as possible. > Drawing 4 circles and depicting it as overlap just gives me 13 out of > the 15 possibilities needed (as e.g. depicted here > http://www.psy.ritsumei.ac.jp/~akitaoka/classic9e.html in the graph > "Four circles surrounding illusion"). > > Does anybody has a nice solution, ideally with a possible solution in R? Plot on a torus. Should be trivial in R once you've found the torus feeder for your printer... :-) Emmanuel Charpentier > Thanks in advance! > Johannes > > -- > Project Coordinator BIOTA West Amphibians > > Museum of Natural History > Dep. of Research (Herpetology) > Invalidenstrasse 43 > D-10115 Berlin > Tel: +49 (0)30 2093 8708 > Fax: +49 (0)30 2093 8565 > > http://www.biota-africa.org > http://community-ecology.biozentrum.uni-wuerzburg.de > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple imputations : wicked dataset ? Wicked computers ? Am I cursed ? (or stupid ?)
uot; (Stef van Buuren & Karin Groothuis-Oudshoorn) : > system.time(DataTest.mice<-mice(DataTest,m=5)) iter imp variable 1 1 cadredpErreur dans mice.impute.logreg(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, : dims [produit 28] ne correspond pas à la longueur de l'objet [31] De plus : Warning message: In beta + rv %*% rnorm(ncol(rv)) : la taille d'un objet plus long n'est pas multiple de la taille d'un objet plus court Timing stopped at: 0.092 0 0.094 > The "textbook example" (example for mice) runs almost perfectly (warnings in the second example) Package "mix" (Joseph Schafer, maintained by Brian Ripley) : > # mix needs dataset columns sorted and counted > p<-ncol(DTM<-DataTest[sapply(DataTest,is.factor)]) > DTM<-cbind(DTM,DataTest[!sapply(DataTest,is.factor)]) > # Preliminary grouping and sorting > system.time(s<-prelim.mix(DTM,p)) user system elapsed 0.012 0.000 0.009 Warning messages: 1: In storage.mode(w) <- "integer" : NAs introduits lors de la conversion automatique 2: In max(w[!is.na(w[, j]), j]) : aucun argument pour max ; -Inf est renvoyé 3: NAs introduits lors de la conversion automatique 4: In max(w[!is.na(w[, j]), j]) : aucun argument pour max ; -Inf est renvoyé 5: NAs introduits lors de la conversion automatique 6: In max(w[!is.na(w[, j]), j]) : aucun argument pour max ; -Inf est renvoyé 7: NAs introduits lors de la conversion automatique > # Parameter estimation > system.time(thetahat<-em.mix(s,showits=TRUE)) Erreur dans rep(prior, s$ncells) : argument 'times' incorrect Timing stopped at: 0 0 0 ### indeed : s$ncells is NA ! ### Therefore, the rest crashes : > # Data augmentation proper > system.time(newtheta <- da.mix(s, thetahat, steps=100, showits=TRUE)) Erreur dans rep(prior, s$ncells) : argument 'times' incorrect Timing stopped at: 0 0 0 > # Single imputation... > system.time(ximp1 <- imp.mix(s, newtheta)) Erreur dans imp.mix(s, newtheta) : objet 'newtheta' introuvable Timing stopped at: 0.004 0 0.001 > The "textbook example" (example for da.mix) runs perfectly (and fast !). = I might be particularly stupid and misunderstanding manual and the "textbook" examples of one or two packages, but five ! Visual/manual/graphical examination of my dataset does not suggest an explanation. I am not aware of anything "fishy" in my setup (one of the machines is a bit aging, but the one used for the reported tests is almost new. Could someone offer an explanation ? Or must I recourse to sacrifying a black goat at midnight next new moon ? Any hint will be appreciated (and by the way, next new moon is in about 3 days...). Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loess over split data
Dear Luc, I stumbled on your unanswered question only this morning. Sorry for the lateness... Le jeudi 23 avril 2009 à 15:56 -0400, Luc Villandre a écrit : > Dear R users, > > I am having trouble devising an efficient way to run a loess() function > on all columns of a data.frame (with the x factor remaining the same for > all columns) and I was hoping that someone here could help me fix my > code so that I won't have to resort to using a for loop. > > (You'll find the upcoming lines of code in a single block near the end > of the message.) > > Here's a small sample of my data : > > my.data=data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, > 273, 278), final.weight= c(2609.328, 2955.701, 4159.837, 4591.258, > 5144.559, 2877.159, 4384.498, 3348.454, 3323.391, 2918.055)) ; > > "birth.vec" is in days and "final.weight" is in grams. Now, what I'm > trying to do is to output smoothed curves for the 10th, 50th and 90th > percentiles over completed weeks. In other words, I transform > "birth.vec" into a factor vector of completed weeks with > > completed.weeks = as.factor(floor(my.data$birth.vec/7)) ; > > I use these factors to get weight quantiles by completed weeks with > > quan.list = tapply(my.data$final.weight,INDEX = > completed.weeks,FUN=quantile,probs=c(0.1,0.5,0.9)) ; > > I then collapse the list into a data.frame with > > quan.frame = data.frame(do.call(rbind,quan.list)) ; > > Now comes the time to apply the loess() function to each percentile > curve (i.e. to each column in quan.frame). Note that the x values for > all percentile columns (i.e. X10., X50., X90.) > > x.values = as.numeric(rownames(quan.frame)) ; > > Once again, using tapply() (and transposing beforehand): > > t.quan.frame = t(quan.frame) ; > > ## The following command doesn't work > > smooth.result = > tapply(t.quan.frame,INDEX=as.factor(rep(1:nrow(t.quan.frame),each=ncol(t.quan.frame))),FUN=loess) > > ; > > The "formula" argument of loess() is of course missing, since I have no > idea how I could specify it in a way that the function could understand it. Why not tapply(yadda, yadda, function(x)loess(whatever you mean, x is data...)) ? But the whole idea seems ... bizarre : 1) converting time to a *class* variable destroys any ordinal relation between different times ; a "curve" has no meaning. At the minimum, keep this variable numeric. 2) You lose any notion of individual data (e. g. "weight" of each week). 3) Do you have reason to expect empirical quantiles very much different from the asymptotic quantiles (modulo a possible variable transformation) ? That forces you to "coarsen" your data, which is rarely a good idea... If not, the asymptotic 80 % interval curves can be obtained much more cheaply. Modulo some possible typos : my.loess <- loess(final.weight~birth.vec, data=my.data) ; my.pred.times <- with(my.data, seq(min(birth.vec), max(birth.vec))) ; my.pred <- predict(my.loess, newdata=data.frame(birth.vec=my.pred.times), se=TRUE) ; my.curve <- with(my.pred, data.frame(time=my.pred.times, center=fit, lb=fit+qnorm(0.1)*se.fit, ub=fit+qnorm(0.9)*se.fit)) Graphics *will* help : with(my.curve, { plot(x=range(time), y=range(center,lb,ub), type="n", xlab="Achieved time (days)", ylab="Final weight (g)", main="A suspicious loess() fit") ; lines(final.weight~birth.vec, type="p", data=my.data) ; lines(center~time, lty=1) ; lines(lb~time, lty=3) ; lines(ub~time, lty=3) }) Adding your empirical centiles to this graph might help... 
NB : on your example data set, there is a big gap between the earliest point and the rest, thus suspicious predictions in the gap (lower bound is *negative* ! ) ; maybe a variable change (log ?) might be in order ? This can be assessed only on the full data. HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple Imputation in mice/norm
Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit : > I'm trying to use either mice or norm to perform multiple imputation to fill > in some missing values in my data. The data has some missing values because > of a chemical detection limit (so they are left censored). I'd like to use > MI because I have several variables that are highly correlated. In SAS's > proc MI, there is an option with which you can limit the imputed values that > are returned to some range of specified values. Is there a way to limit the > values in mice? You may do that by writing your own imputation function and assign them for the imputation of particular variable (see argument "imputationMethod" and details in the man page for "mice"). > If not, is there another MI tool in R that will allow me to > specify a range of acceptable values for my imputed data? In the function amelia (package "Amelia"), you might specify a "bounds" argument, which allows for such a limitation. However, be aware that this might destroy the basic assumption of Amelia, which is that your data are multivariate normal. Maybe a change of variable is in order (e. g. log(concentration) has usually much better statistical properties than concentration). Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument (TRUE by default) which limits imputations to the range of observed values. But if your left-censored variables are your dependent variables (not covariates), may I suggest to analyze these data as censored data, as allowed by Terry Therneau's "coxph" function (package "survival") ? code your "missing" data as such a variable (use : coxph(Surv(min(x,,na.rm=TRUE), !is.na(x),type="left")~) to do this on-the-fly). Another possible idea is to split your (supposedly x) variable in two : observed (logical), and value (observed value if observed, if not) and include these two data in your model. You probably will run into numerical difficulties due to the (built-in total separation...). HTH, Emmanuel Charpentier > Thanks for the help, > Todd > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
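As an aside, the "code the detection limit as left-censoring" route sketched above has a convenient parametric counterpart in survreg(), which accepts left-censored Surv objects ; a toy, hedged sketch with made-up variable names ("conc", "covar") and a made-up limit of detection :

library(survival)
set.seed(3)
d <- data.frame(covar = rnorm(40))
true.conc <- exp(0.2 + 0.5 * d$covar + rnorm(40, sd = 0.4))
lod <- 1                                          # hypothetical detection limit
d$conc <- ifelse(true.conc < lod, NA, true.conc)  # below the limit -> reported as "missing"
fit <- survreg(Surv(ifelse(is.na(conc), lod, conc), !is.na(conc), type = "left") ~ covar,
               data = d, dist = "lognormal")      # tobit-like fit on the log scale
summary(fit)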
Re: [R] Multiple Imputation in mice/norm
Danke sehr, herr Professor ! This one escaped me (notably because it's a trifle far from my current interests...). Emmanuel Charpentier Le samedi 25 avril 2009 à 08:25 -0500, Frank E Harrell Jr a écrit : > Emmanuel Charpentier wrote: > > Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit : > >> I'm trying to use either mice or norm to perform multiple imputation to > >> fill > >> in some missing values in my data. The data has some missing values > >> because > >> of a chemical detection limit (so they are left censored). I'd like to use > >> MI because I have several variables that are highly correlated. In SAS's > >> proc MI, there is an option with which you can limit the imputed values > >> that > >> are returned to some range of specified values. Is there a way to limit > >> the > >> values in mice? > > > > You may do that by writing your own imputation function and assign them > > for the imputation of particular variable (see argument > > "imputationMethod" and details in the man page for "mice"). > > > >> If not, is there another MI tool in R that will allow me > >> to > >> specify a range of acceptable values for my imputed data? > > > > In the function amelia (package "Amelia"), you might specify a "bounds" > > argument, which allows for such a limitation. However, be aware that > > this might destroy the basic assumption of Amelia, which is that your > > data are multivariate normal. Maybe a change of variable is in order (e. > > g. log(concentration) has usually much better statistical properties > > than concentration). > > > > Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument > > (TRUE by default) which limits imputations to the range of observed > > values. > > > > But if your left-censored variables are your dependent variables (not > > covariates), may I suggest to analyze these data as censored data, as > > allowed by Terry Therneau's "coxph" function (package "survival") ? code > > your "missing" data as such a variable (use : > > coxph(Surv(min(x,,na.rm=TRUE), > >!is.na(x),type="left")~) to do this on-the-fly). > > > > Another possible idea is to split your (supposedly x) variable in two : > > observed (logical), and value (observed value if observed, > limit> if not) and include these two data in your model. You probably > > will run into numerical difficulties due to the (built-in total > > separation...). > > > > HTH, > > > > Emmanuel Charpentier > > > >> Thanks for the help, > >> Todd > >> > >> > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > All see > > @Article{zha09non, >author = {Zhang, Donghui and Fan, Chunpeng and Zhang, > Juan and Zhang, {Cun-Hui}}, >title ={Nonparametric methods for measurements below > detection limit}, >journal = Stat in Med, >year = 2009, >volume = 28, >pages ={700-715}, >annote = {lower limit of detection;left censoring;Tobit > model;Gehan test;Peto-Peto test;log-rank test;Wilcoxon test;location > shift model;superiority of nonparametric methods} > } > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple imputations : wicked dataset. Need advice for follow-up to a possible solution.
Answering to myself (for future archive users' sake), more to come (soon) : Le jeudi 23 avril 2009 à 00:31 +0200, Emmanuel Charpentier a écrit : > Dear list, > > I'd like to use multiple imputations to try and save a somewhat badly > mangled dataset (lousy data collection, worse than lousy monitoring, you > know that drill... especially when I am consulted for the first time > about one year *after* data collection). > > My dataset has 231 observations of 53 variables, of which only a very > few has no missing data. Most variables have 5-10% of missing data, but > the whole dataset has only 15% complete cases (40% when dropping the 3 > worst cases, which might be regained by other means). [ Big snip ... ] It turns out that my problems were caused by ... the dataset. Two very important variables (i. e. of strong influence on the outcomes and proxies) are ill-distributed : - one is a modus operandi (two classes) - the second is center (23 classes, alas...) My data are quite ill-distributed : some centers have contributed a large number of observations, some other very few. Furthermore, while few variables are quite badly known, the "missingness pattern" is such as : - some centers have no directly usable information (= complete cases) under one of the modi operandi - some other have no complete case at all... Therefore, any model-based prediction method using the whole dataset (recommended for multiple imputations, since one should not use for inference a richer set of data than what was imputed (seen this statement in a lot of references)) fails miserably. Remembering some fascinating readings (incl. V&R) and an early (20 years ago) excursion in AI (yes, did that, didn't even got the T-shirt...), I have attempted (with some success) to use recursive partitioning for prediction. This (non-)model has some very interestind advantages in my case : - model-free - distribution-free (quite important here : you should see my density curves... and I won't speak about the outliers !) - handles missing data gracefully (almost automagically) - automatic selection and ranking of the pertinent variables - current implementation in R has some very nice features, such as surrogate splits if a value is missing, auto-pruning by cross-validation, etc ... It has also some drawbacks : - no (easy) method for inference - not easy to abstract (you can't just publish an ANOVA table and a couple of p-values...) - no "well-established" (i. e. acknowledged by journal reviewers) => difficult to publish These latter point do not bother me in my case. So I attempted to use this for imputation. Since mice is based on a "chained equations" approach and allows the end-user to write its own imputation functions, I wrote a set of such imputers to be called within the framework of the Gibbs sampler. They proceed as follow : - build a regression or classification tree of the relevant variable using the rest of the dataset - predict the relevant variable for *all* the dataset, - compute "residuals" from known values of the relevant variable and their prediction - impute values to missing data as prediction + a random residual. It works. It's a tad slower than prediction using normal/logistic/multinomial modelling (about a factor of 3, but for y first trial, I attempted to err on the side of excessive precision ==> deeper trees). It does not exhibit any "obvious" statistical misfeatures. But I have questions : 1) What is known of such imputation by regression/classification trees (aka recursive partitionning) ? 
A quick search didn't turn up very much : the idea has been evoked here and there, but I am not aware of any published solution. In particular, I have no knowledge of any theoretical (i. e. probability) work on their properties. 2) Where could I find published datasets having been used to validate other imputation methods ? 3) Do you think that these functions should be published ? Sincerely yours, Emmanuel Charpentier PS : > Could someone offer an explanation ? Or must I recourse to sacrifying a black goat at midnight next new moon ? > > Any hint will be appreciated (and by the way, next new moon is in about 3 days...). The goat has endured the fright of her life, but is still alive... She will probably start worshipping the Moon, however. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
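For the archives, the skeleton of such a tree-based imputer (continuous-variable case only, following roughly the calling convention mice expects from user-supplied imputation functions — a hedged sketch, not the exact code used above) :

library(rpart)
impute.rpart <- function(y, ry, x) {
  ## y : variable to impute ; ry : TRUE where y is observed ; x : data frame of predictors
  df   <- data.frame(.y = y, x)
  fit  <- rpart(.y ~ ., data = df[ry, ])              # regression tree on the observed part
  pred <- predict(fit, newdata = df)                   # predictions for *all* rows (surrogate splits help here)
  res  <- y[ry] - pred[ry]                             # "residuals" on the observed part
  pred[!ry] + sample(res, sum(!ry), replace = TRUE)    # imputation = prediction + a randomly drawn residual
}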
Re: [R] rounding problem
Seconded ! Emmanuel Charpentier Le lundi 02 mars 2009 à 21:06 -0700, Greg Snow a écrit : > > -Original Message- > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > > project.org] On Behalf Of Prof Brian Ripley > > Sent: Monday, March 02, 2009 12:38 AM > > To: tedzzx > > Cc: r-help@r-project.org > > Subject: Re: [R] rounding problem > > > > I nominate the following for the fortunes package: > > > R is Open Source and so you can modify it to emulate the bugs in > > other software: that is not one of the aims of its developers so > > please don't expect us to do so for you. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Do you use R for data manipulation?
Le mercredi 06 mai 2009 à 00:22 -0400, Farrel Buchinsky a écrit : > Is R an appropriate tool for data manipulation and data reshaping and data > organizing? [ Large Snip ! ... ] Depends on what you have to do. I've done what can be more or less termed "data management" with almost uncountable tools (from Excel (sigh...) to R with SQL, APL, Pascal, C, Basic (in 1982 !), Fortran and even Lisp in passing...). SQL has strong points : join is, to my taste, more easily expressed in SQL than in most languages, and projection and aggregation are natural. However, in SQL, there is no "natural" ordering of table rows, which makes expressing algorithms that use this order difficult. Try for example to express the differences of a time series ... (it can be done, but it is *not* a pretty sight). On the other hand, R has some unique expressive possibilities (reshape() comes to mind). So I tend to use a combination of tools : except for very small samples, I tend to manage my data in SQL and with associated tools (think data editing, for example ; a simple form in OpenOffice's Base is quite easy to create, can handle anything for which an ODBC driver exists, and won't crap out beyond a few hundred lines...). Finer manipulation is usually done in R with native tools and sqldf. But, at least in my trade, the ability to handle Excel files is a must (this is considered a standard for data entry. Sigh ...). So the first task is usually a) import data into an SQL database, and b) prepare some routines to dump SQL tables / R dataframes to Excel for returning them to the original data author... HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
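To illustrate the "differences of a time series" remark above, a toy comparison (sqldf on an ordinary data frame ; column names made up) :

library(sqldf)
d <- data.frame(t = 1:5, x = c(2, 5, 4, 9, 11))
diff(d$x)                                       # the R way : one call
sqldf("select b.t, b.x - a.x as dx
         from d a join d b on b.t = a.t + 1")   # the SQL way : a self-join on the ordering column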
Re: [R] system vs shell wait command
Le dimanche 24 janvier 2010 à 04:25 -0800, robbert blonk a écrit : > Dear All, > > I try to invoke a second program (called "gencont") within a R script. > However this second program needs some time to finish. After finishing the > output should be read into R and used for further calculation. > > Under windows, it perfectly works: > > result<-shell("gencont < yn.txt",intern=TRUE,wait=TRUE) > # R waits untill the program finishes (apporx 3 min) and uses the result as > an object. "yn.txt" contains the answer to a question prompted by the second > program before running > resultf<-result[length(result)] > # reads the last line, which contains the required result > > However, At our Linux server, shell() doesn't work. Here, I'll need > system()... > result<-system("./gencont.exe < yn.txt",intern=TRUE,wait=TRUE) #slightly > different You're using a Windows executable through wine (maybe "transparently" via binfmt_misc) ? And this executable prints its output on the console ? Right ? > resultf<-result[length(result)] #same > > The problem now is that R under Linux does not wait for the second program > to finish, even if wait = TRUE...how come? Might be wine, might be, more probably, gencont.exe : I have encountered some Windows programs whose main purpose was to set things up and launch a second process which did "the real work". Cases in point : the Office 2000 and Office 2003 installers [yuck !], for example. FWIW, system("wineconsole cmd") opens a new window and waits for its user to type exit. It returns 0 as a result. ISTR that Windows 98's COMMAND.COM had an (internal ?) command called "start" launching another process, with an option to wait or not for it to terminate before returning ; this could be true of Windows 2000 and later, but I can't check right now. Wine's own cmd.exe has a start program : wineconsole --backend=curses cmd CMD version 1.1.31 Z:\home\charpent>start Lance un programme, ou ouvre un document dans le programme normalement utilisé avec cette extension. Usage : start [options] fichier_programme [...] start [options] fichier_document Options: /M[inimized] Lance le programme minimisé. /MAX[imized] Lance le programme maximisé. /R[estored] Lance le programme normalement (ni minimisé ni maximisé). /W[ait] Attend que le programme lancé se termine, et termine ensuite avec son code de sortie. /ProgIDOpen Open a document using the following progID. /L Montre la licence d'utilisation. start.exe version 0.2 Copyright (C) 2003, Dan Kegel Start est fourni sans AUCUNE GARANTIE ; pour les détails lancez avec l'option /L . Ceci est un logiciel libre, et vous êtes invité à le redistribuer sous certaines conditions ; lancez 'start /L' pour les détails. Z:\home\charpent> This might help you ... > What should I do? I seriously doubt that system() can read back what is spit to the console. Better redirect that to another text file. Some options come to mind, but beware : I didn't try any of them, not having your "gencont.exe" program. - system("wineconsole cmd gencont.exe < yn.txt > result.txt"), then result<-readLines("result.txt") ? - ftp your input to a Windows box, "ssh -f gencont.exe < yn.txt > results.txt", ftp your results back (or, equivalently, working on a Windows "share") ? - port gencont.exe to Linux ? :-) (this might seem silly, but might be the easiest solution if this program is (as it seems according to your partial explanations) a simple "filter" (read data, munch them, spit output) not using Windows-specific functions or system calls). 
HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
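The first "redirect and read back" option above, spelled out (untested, since "gencont.exe" is not at hand ; file names as in the original post, and this assumes wine itself is on the PATH) :

status <- system("wine gencont.exe < yn.txt > result.txt", wait = TRUE)  # blocks until wine returns
if (status != 0) warning("gencont.exe returned a non-zero exit status")
result  <- readLines("result.txt")
resultf <- result[length(result)]        # last line holds the required result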
Re: [R] What are Type II or III contrast? (contrast() in contrast package)
Le mercredi 03 février 2010 à 00:01 -0500, David Winsemius a écrit : > On Feb 2, 2010, at 11:38 PM, Peng Yu wrote: > > > ?contrast in the contrast package gives me the following description. > > However, I have no idea what Type II and III contrasts are. Could > > somebody explain it to me? And what does 'type' mean here? > > > >*‘type’*: set ‘type="average"’ to average the individual contrasts > >(e.g., to obtain a Type II or III contrast) > > In no particular order: > http://courses.washington.edu/b570/handouts/type3ss.pdf > http://core.ecu.edu/psyc/wuenschk/SAS/SS1234.doc > http://n4.nabble.com/a-kinder-view-of-Type-III-SS-td847282.html > > Don't expect any follow-up questions to be answered or further > citations offered. This is really more in the realm of statistics > education than an R specific question. Notwithstanding David Winsemius' closing remark, I'd like to add something that should be required reading (and maybe hinted at in lm()'s help page) : http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf (BTW, despite its age, MASS *is* required reading, and Bill Venables' exegeses should be part of it). HTH, -- Emmanuel Charpentier, DDS, MsC :-) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What are Type II or III contrast? (contrast() in contrast package)
Le mercredi 03 février 2010 à 09:23 -0600, Peng Yu a écrit : > On Wed, Feb 3, 2010 at 2:12 AM, Emmanuel Charpentier > wrote: > > Le mercredi 03 février 2010 à 00:01 -0500, David Winsemius a écrit : > >> On Feb 2, 2010, at 11:38 PM, Peng Yu wrote: > >> > >> > ?contrast in the contrast package gives me the following description. > >> > However, I have no idea what Type II and III contrasts are. Could > >> > somebody explain it to me? And what does 'type' mean here? > >> > > >> >*‘type’*: set ‘type="average"’ to average the individual contrasts > >> >(e.g., to obtain a Type II or III contrast) > >> > >> In no particular order: > >> http://courses.washington.edu/b570/handouts/type3ss.pdf > >> http://core.ecu.edu/psyc/wuenschk/SAS/SS1234.doc > >> http://n4.nabble.com/a-kinder-view-of-Type-III-SS-td847282.html > >> > >> Don't expect any follow-up questions to be answered or further > >> citations offered. This is really more in the realm of statistics > >> education than an R specific question. > > > > Nonwhistanding David Winsemius' closing remark, I'd like to add > > something that should be requested reading (and maybe hinted at in > > lm()'s help page) : > > > > http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf > > > > (BTW, despite is age, MASS *is* requested reading, and Bill Venables' > > exegeses should be part of it). > > Do you by any means indicate that MASS describes Type I, II, III, IV > contrast? Although MASS describes contrasts, but I don't think it > describes Type I, II, III, IV contrast. it does not. But it explains (tersely) *WHY* one has *serious* logical difficulties when tryng to interpret contrasts not abiding to marginality principle. HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
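For readers who want something runnable next to Venables' exegeses : the marginality issue shows up directly in base R, e.g. by contrasting the sequential anova() table with the marginality-respecting drop1() (toy data) :

set.seed(4)
d <- data.frame(A = gl(2, 20), B = gl(4, 5, 40), y = rnorm(40))
fit <- lm(y ~ A * B, data = d)
anova(fit)               # sequential ("Type I") sums of squares, order-dependent
drop1(fit, test = "F")   # only the A:B interaction is tested : main effects that are
                         # marginal to a retained interaction are (rightly) left alone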
[R] Data views (Re: (Another) Bates fortune?)
. Such an organization is probably impossible with most "modern" algorithms : see Douglas Bates' description of the lmer() algorithms for a nice, big counter-example, or consider MCMC... But coming closer to such an organization *seems* possible : see for example biglm. So I think that data views are a a worthy but not-so-easy possible goal aimed at various data structure problems (including hierarchical data), but not *the* solution to data-representation problem in R. Any thoughts ? Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
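As a small concrete illustration of the "incremental" organisation evoked above (biglm, fed chunk by chunk ; the tiny built-in trees data merely stand in for something genuinely big) :

library(biglm)
chunks <- split(trees, rep(1:3, length.out = nrow(trees)))    # pretend the data arrive in pieces
fit <- biglm(Volume ~ Girth + Height, data = chunks[[1]])
for (ch in chunks[-1]) fit <- update(fit, ch)                 # each pass only ever sees one chunk
summary(fit)                                                  # same coefficients as the all-at-once lm()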
[R] Of possible interest for (Win|Open)BUGS users on Linux platforms
Dear list, It might be of interest to (Win|Open)BUGS users trying to use it from R on a Linux platform, and to Linux R users trying to call (Win|Open)BUGS, to note that two new packages have appeared on the OpenBUGS "User Contributed Code" page (http://openbugs.info/w/UserContributedCode). R2OpenBUGS seems to be an adaptation of the R2WinBUGS package, able to run "native" OpenBUGS without Wine contortions (of late, I have been unable to run WinBUGS from R2WinBUGS correctly without the "debug" option, WinBUGS not opening the script file. Probably yet another wine regression...). A version of BRugs able to use the Linux version of OpenBUGS is also available. Their author (apparently Chris Jackson from MRC Cambridge, whom I thank heartily) states that "[p]ending additional user testing, they will be posted to CRAN". I add that such testing (and reporting !) might hasten this posting, and therefore the general availability of these packages. These packages might also be useful to stress-test the "other" BUGS implementation (JAGS), which gave me pause recently (in the "deviance/pD/DIC" department...). First data point : I have been able to install these two packages on Ubuntu Lucid (64-bit, using the g++-multilib package) and Debian Squeeze (32-bit) and to test BRugs quickly without apparent problems. HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
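For would-be testers, a minimal smoke test along the familiar R2WinBUGS lines — a hedged sketch, from memory and untested here : it assumes R2OpenBUGS keeps the bugs()/write.model() interface of R2WinBUGS and that OpenBUGS is installed where the package can find it.

library(R2OpenBUGS)
normmodel <- function() {
  for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
  mu  ~ dnorm(0, 1.0E-6)
  tau ~ dgamma(0.001, 0.001)
}
write.model(normmodel, "normmodel.txt")
y <- rnorm(50, mean = 3, sd = 2)
fit <- bugs(data = list(y = y, N = 50),
            inits = function() list(mu = 0, tau = 1),
            parameters.to.save = c("mu", "tau"),
            model.file = "normmodel.txt", n.chains = 2, n.iter = 2000)
print(fit)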
[R] (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?
Dear list, Inspired by the original Knuth tools, and for pedagogical reasons, I wish to produce a document presenting some source code with interspersed comments in the source (see Knuth's books rendering the TeX and Metafont sources to see what I mean). I seemed to remember that a code chunk could be defined piecewise, like in Comments... <<Chunk1>>= SomeCode @ Some other comments... <<Chunk2>>= MoreCode @ And finally, <<Chunk3>>= <<Chunk1>> <<Chunk2>> EndOfTheCode @ That works ... as long as SomeCode, MoreCode and EndOfTheCode are self-standing pieces of R code, but *not* code fragments. You can *not* intersperse comments in, say, a function body, or a local() environment this way : when Sweaving, *R* complains of an incomplete source (makes noise about an unexpected end of input at the end of Chunk1, IIRC, and never sees Chunk2). I hoped that Sweave's "alternative" syntax could offer a way out : no such luck. There seems to be no way to delay R evaluation of an R chunk passed by Sweave ; at least, the "eval=FALSE" option of the chunk declaration is not sufficient for that. Am I missing something in the Sweave and odfWeave documentation (that I read till I grew green and moldy) ? Or does this require a fundamental change in the relevant Sweave drivers ? Can you suggest alternative ways of doing what I mean to do ? The only workaround I found is to paste a second copy of my code in a \verbatim environment (or, in the case of odfWeave, in the "text" part), and spice it with \end{verbatim} comments ... \begin{verbatim} chunks. This way, I lose any guarantee of consistency between commented text and effective code. Any other idea ? Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Summary (Re: (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?)
Dear list, see comment at end. On Sat, 11 Dec 2010 22:58:10 +, Emmanuel Charpentier wrote : > Dear list, > > Inspired by the original Knuth tools, and for paedaogical reasons, I > wish to produce a document presenting some source code with interspersed > comments in the source (see Knuth's books rendering TeX and metafont > sources to see what I mean). > > I seemed to remember that a code chunk could be defined piecewise, like > in > > Comments... > > <>= > SomeCode > @ > > Some other comments... > > <>= > MoreCode > @ > > And finally, > > <>= > <> > <> > EndOfTheCode > @ > > That works ... as long as SomeCode, MoreCode and EndOfTheCode are self- > standing pieces of R code, but *not* code fragments. You can *not* > intersperse comments in, say, a function body, or local() environment > this way : when Sweaving, *R* complains of an incomplete source (makes > noise about an unexpected end of input at the end of Chunk1, IIRC, and > never sees Chunk2). > > I hoped that Sweave's "alternative" syntax could offer a way out : no > such luck. > > There seems to be no way to delay R evaluation of a R chunk passed by > Sweave ; at least, the "eval=FALSE" option of chunk declaration is not > sufficient for that. > > Am I missing something in the Sweave nd odfWeve documentations (that I > read till I grew green and moldy) ? Or does this require a fundamental > change in the relevant Sweave drivers ? > > Can you suggest alternative ways of doing what I mean to do ? The only > workaround I found is to paste a second copy of my code in a \verbatim > environment (or, in the case of odfWeave, in the "text" part), and spice > it with \end{verbatim} comments.. \begin{verbatim} chunks. This way, I > lose any guarantee of consistency between commented text and effective > code. > > Any other idea ? > > Emmanuel Charpentier To summarize the answers I got so far, there seems to be no satisfactory solutions to my current problem with either Sweave or odfWeave : - every (Sw|odfW)eave code chunk has to be parseable in itself. One cannot break it in unparseable pieces in home to paste it later ; - the (hypothetical) "parse=FALSE" option, suggested by Duncan Murdoch, is, well, hypothetical ; - the brew package is not integrated in either Sweave or odfWeave ; - R-style comments will get you only so far (where is math markup when you need it ?) ; - subdivising work in small units is not a practical solution, when your "big piece" of software is a collection of small parts and local variable assignments, embedded in a (perforce large) local environment to keep things clean... Therefore, I let this problem to sleep. However, I Cc this answer (with the original question below) to Max Kuhn and Friedrich Leisch, in the (faint) hope that this feature, which does not seem to have been missed by anybody in 8 years, might be considered sufficiently useful to grant addition someday... Sincerely yours, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Second summary [( S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?]
Dear list, The "starting" solution proposed by Duncan Murdoch is an *excellent* start for Sweave'ing. One can trivially redefine Sweave to start using it quasi-transparently (to use it transparently would require patching Sweave, which would be worth some failsafes and possibly some options tinkering). Integrating it into odfWeave is not as simple, because the driver has to spit out correct XML. I suspect that the odfCat() function of odfWeave will be useful, but my (very) cursory glance at the odfWeave source has not (yet) convinced me that one could use the shortcuts that the simple structure of a \TeX{} file allows. XML *needs* some caution, and the Open Document specification is *not* a simple beast. However, I have good hope. I'll keep the list posted about further enhancements, as my (somewhat busy) schedule will allow. Again, kudos and a big thank you to Duncan Murdoch, who was able to give a *correct* solution before I was even able to *correctly* state the problem. Emmanuel Charpentier On Sat, 11 Dec 2010 22:58:10 +, Emmanuel Charpentier wrote : [ Snip... ] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multivariate binary response analysis
Dear Derek, dear list, A couple suggestions (and more) : 0) check that your problem can't be expressed in terms of what glmer() (lme4(a|b)? package(s)) can solve. I use this often, and am quite impressed with its abilities. 1) Why not modeling that directly in BUGS ? That might not be easy, will almost certainly not be fast to converge and will require lots of post- convergence analysis but offers a direct solution. Current BUGS implementations (OpenBUGS 3.1.x, JAGS 2.x) are well- interfaced to R. My current favorite is JAGS + rjags, but this preference might come from years of frustration about (Win|Open)BUG on Wine on Linux, and the current offering of OpenBUGS + BRugs, currently in beta on the "Community" page of the OpenBUGS website, seems quite decent. Memory use might have been a problem in the "(not so) good old days", but in a era of multigigabyte netbooks, I doubt this is still relevant. Computation time is more of a concern : although Lunn & al. (2009)(1) allude to a parallelized version of WinBUGS(2), I am not aware of any publicly-accessible parallelized BUGS interpreter, which is a shame in times where even said netbooks get multicore CPUs ...). 2) The nice MCMCglmm package is able to run a large subclass of DAG Bayesian models expressible as generalizations of (possibly multidependent) multivaried mixed-model GLMs, *way* faster than current BUGS implementations (the author claims an acceleration factor of about 30 on his test problem, and my (limited) experience does not disprove him). It seems (to me) that your problem *might* be expressible in terms palatable to MCMCglm(). Be aware, however, that using this package will *require* some serious understanding of the author's notation used in his (excellent) documentation. Furthermore, if you're interested in the "random" effects values, you're SOL : I have not (yet) been able to retrieve traces of their distributions in the resultant object. But again, my understanding of this package is limited. My personal bias goes to BUGS, as you might have guessed, but the "public health warning" that the Spiegelhalter gang put in front of the WinBUGS manual still apply : "MCMC sampling can be dangerous". To which I add "model checking, convergence assessment and validation might eat more time than modeling itself". HTH, Emmanuel Charpentier -- 1. David Lunn et al., “Rejoinder to commentaries on ‘The BUGS project: Evolution, critique and future directions’,” Statistics in Medicine 28, no. 25 (11, 2009): 3081-3082. 2. http://www.page-meeting.org/pdf_assets/6232-PARALLEL% 20Girgis_POSTER_final.pdf. Accessed on Dec, 14, 2010. On Mon, 13 Dec 2010 15:31:53 -0800, Janszen, Derek B wrote : > Greetings ~ > > I need some assistance determining an appropriate approach to analyzing > multivariate binary response data, and how to do it in R. > > The setting: Data from an assay, wherein 6-hours-post-fertilization > zebrafish embryos (n=20, say) are exposed in a vial to a chemical > (replicated 3 times, say), and 5 days later observed for the > presence/absence (1/0) of defects in several organ systems (yolk sac, > jaw, snout, body axis, pericardial edema, etc.) for each fish. The assay > can be used as a screen (in which case a single response 'any' is first > analyzed) as well as for comparing the responses of different classes of > chemicals (a MANOVA-type analysis). 
Interest is focused on where > response(s) are occurring, any associations among responses, and > ultimately on possible biological mechanisms (the fish are tested for > behavioral activity prior to this assay, and then ground up to provide > RNA for microarray assays. A-lotta-data!). > > What I *wish* I could do is something like glm(response.matrix ~ > treat/vial, family=binomial(logit), data=zf.dat) but I know this can't > be done. I came across the baymvb (Bayesian analysis of multivariate > binary data) package in the R contributed packages archives, but it is > no longer supported and the author is incommunicado. In the baymvb > function the model is specified as single.stacked.response ~ > structure.factor + linear.model.terms. Huh? This looks suspiciously > similar to analyzing repeated measures data in SAS as a univariate > response with a split-plot structure (which forces the response > covariance matrix to have a compound symmetric structure). If this is > what's happening with this function it is definitely not appropriate. > How about a GEE approach (I'm not familiar with the liter > ature, or with any of the R packages that implement it)? Any other > recommendations? (NB: just discovered the bayesm package. Don't know if > this will work for me or not.) > &g
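Regarding check 0) above (glmer), one way to cast "several 0/1 defects per fish" into a single glmer() call is the long format, one row per fish x organ system — a hedged, simulated-data sketch (all names invented, not the poster's actual variables) :

library(lme4)
set.seed(5)
zf <- expand.grid(fish = factor(1:60), system = c("yolk", "jaw", "axis"))
zf$vial   <- factor((as.integer(zf$fish) - 1) %/% 20 + 1)        # 20 fish per vial, 3 vials
zf$treat  <- factor(ifelse(zf$vial == "1", "control", "chem"))
zf$defect <- rbinom(nrow(zf), 1, ifelse(zf$treat == "chem", 0.4, 0.1))
fit <- glmer(defect ~ treat * system + (1 | vial) + (1 | fish),
             family = binomial, data = zf)
summary(fit)   # the treat:system terms ask whether the chemical hits some systems more than others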
Re: [R] Including a text file (JAGS model) in an R package
Sorry for chiming in a tad late (I was incommunicado for two weeks) Le lundi 02 août 2010 à 18:50 +, Ben Bolker a écrit : > Prof Brian Ripley stats.ox.ac.uk> writes: > > > > > Here is how I do it: > > > > write(" > > model { > >for (year in 1:N) { > > D[year] ~ dpois(mu[year]) > > log(mu[year]) <- b[1] + step(year - changeyear) * b[2] > >} > >for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) } > >changeyear ~ dunif(1,N) > > } > > ", "coalmining.jags") > > > > which has the advantage that the model description is right there in > > the .Rd file. > > > > If you want a file to be part of the package, put it in e.g. > > > > /inst/JAGSmodels/model.txt > > > > and in the example use > > > > system.file("JAGSmodels", "model.txt", package="whatever") > > What Prof Ripley said, but also: for JAGS and WinBUGS models you can > actually specify > > coalmining <- function() { > for (year in 1:N) { > D[year] ~ dpois(mu[year]) > log(mu[year]) <- b[1] + step(year - changeyear) * b[2] > } > for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) } > changeyear ~ dunif(1,N) > } > > which is even more nicely integrated into the R code ... ... but will fail with BUGS truncation/censoring notation, which cannot pass for R syntax. The R2WinBUGS package has a write.model() function, which I shoplifted, simplified (JAGS doesn't seem to be as picky as WinBUGS in tne numerical notation department), extended (to take advantage of the JAGS' "data transform" feature) for use with JAGS (which is much more convenient to use nowadays with R on a Unix-like platform than either WinBUGS or OpenBUGS). I planned to submit it to Martyn Plummer, but got interrupted... HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
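Since this thread is about shipping BUGS/JAGS model files around, here is the write.model() route end to end — a hedged sketch using R2WinBUGS's write.model() as-is plus rjags, with fake count data standing in for the coal-mining series quoted above :

library(R2WinBUGS)   # for write.model()
library(rjags)
coalmining <- function() {
  for (year in 1:N) {
    D[year] ~ dpois(mu[year])
    log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
  }
  for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
  changeyear ~ dunif(1, N)
}
write.model(coalmining, "coalmining.jags")
jm <- jags.model("coalmining.jags",
                 data = list(D = rpois(40, 3), N = 40), n.chains = 2)
post <- coda.samples(jm, c("b", "changeyear"), n.iter = 5000)
summary(post)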
[R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear list, This is a copy of a mail sent to Max Kuhn, original author and maintainer of the odfWeave package, who seems not to have received it. It reports a problem that seems to be very implementation specific (reproducible on three Debian testing amd64 machines, does *not* happen on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.06 amd64 machine) and therefore not attributable to odfWeave itself (which is pure R) but to a software component it uses (XML and the underlying libxml2, R itself, etc ...), but I need to know how to track this problem. Apologies for cross-posting to r-help and r-debian-sig, but I think that the issue is ... complicated and might not be as Debian-specific as it seems at first view.

Sincerely, Emmanuel Charpentier

Dear Max, A few days ago, I started to have problems with odfWeave 0.7.17 on a couple of amd64 systems : the compiled files contained numerous copies of the source files, more or less interlaced, and a few copies of the target productions. Then I noticed that using an older 32-bit system resulted in correct files. An attempt with yet another machine (recent i686 netbook) confirmed that 32-bit systems gave okay results. Setup : on all machines, I use Debian testing with updates. My packages are self-compiled (i. e. installed via install.packages()). I enclose a very minimalist source and the resulting targets. Logs of execution on 32- and 64-bit systems are affixed after this message. Since odfWeave is pure R, I doubt that it could be the source of the problem. This leaves us with two obvious targets : R itself (the Debian package is current), or the XML library. Do you have any idea about how to proceed to find the source of the problem (and how to fix it) ?

Sincerely, Emmanuel Charpentier

Execution on a 32-bit system : > library(odfWeave) Le chargement a nécessité le package : lattice Le chargement a nécessité le package : XML > sessionInfo() R version 2.13.0 (2011-04-13) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=fr_FR.utf8 LC_NUMERIC=C [3] LC_TIME=fr_FR.utf8 LC_COLLATE=fr_FR.utf8 [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8 [7] LC_PAPER=fr_FR.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-26 loaded via a namespace (and not attached): [1] grid_2.13.0 > system.time(odfWeave("In1.odt", "Out1-32.odt")) Copying In1.odt Setting wd to /tmp/RtmpS8lBt8/odfWeave11161739126 Unzipping ODF file using unzip -o In1.odt Archive: In1.odt extracting: mimetype creating: Configurations2/statusbar/ inflating: Configurations2/accelerator/current.xml creating: Configurations2/floater/ creating: Configurations2/popupmenu/ creating: Configurations2/progressbar/ creating: Configurations2/toolpanel/ creating: Configurations2/menubar/ creating: Configurations2/toolbar/ creating: Configurations2/images/Bitmaps/ inflating: content.xml inflating: manifest.rdf inflating: styles.xml extracting: meta.xml extracting: Thumbnails/thumbnail.png inflating: settings.xml inflating: META-INF/manifest.xml Removing In1.odt Creating a Pictures directory Pre-processing the contents Sweaving content.Rnw Writing to file content_1.xml Processing code chunks ... 
'content_1.xml' has been Sweaved Removing content.xml Post-processing the contents Removing content.Rnw Removing styles.xml Renaming styles_2.xml to styles.xml Removing manifest.xml Renaming manifest_2.xml to manifest.xml Removing extra files Packaging file using zip -r In1.odt . adding: manifest.rdf (deflated 54%) adding: mimetype (stored 0%) adding: Pictures/ (stored 0%) adding: Configurations2/ (stored 0%) adding: Configurations2/images/ (stored 0%) adding: Configurations2/images/Bitmaps/ (stored 0%) adding: Configurations2/menubar/ (stored 0%) adding: Configurations2/progressbar/ (stored 0%) adding: Configurations2/toolbar/ (stored 0%) adding: Configurations2/floater/ (stored 0%) adding: Configurations2/accelerator/ (stored 0%) adding: Configurations2/accelerator/current.xml (stored 0%) adding: Configurations2/popupmenu/ (stored 0%) adding: Configurations2/toolpanel/ (stored 0%) adding: Configurations2/statusbar/ (stored 0%) adding: content.xml (deflated 75%) adding: META-INF/ (stored 0%) adding: META-INF/manifest.xml (deflated 83%) adding: Thumbnails/ (stored 0%) adding: Thumbnails/thumbnail.png (deflated 60%) adding:
Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear Brian, Thank you very much for this answer. However ... Le samedi 14 mai 2011 à 11:52 +0100, Prof Brian Ripley a écrit : > Note the difference in XML versions. > > odfWeave does not work correctly with XML 3.4-x: this has been > reported to the maintainer (and can be seen on the CRAN package checks > at http://cran.r-project.org/web/checks/check_results_odfWeave.html). > I suggest you try downgrading to an earlier version of XML. Do you mean the XML R package ? Or the libxml2 Debian package ? I am worried by the diference in behaviour of i386 and amd64 versions (and, on amd64, the discrepancies between Ubuntu 11.04 and Debian testing, which are easier to explain : Debian has slightly more recent version). I'll try to downgrade the R xml package (easier) and report results. Again, thank you very much ! Emmanuel Charpentier > On Sat, 14 May 2011, Emmanuel Charpentier wrote: > > > Dear list, > > > > This is a copy of a mail sent to Max Kuhn, original author and > > maintainer of the odfWeave package, which seems not to have received it. > > It reports a problem that seems to be very implementation specific > > (reproductible on three Debian testing amd64 machine, does *not* happen > > on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.06 > > amd64 machine) and therefore not attributable to odfWeave itself (which > > is pure R) but to a software component it uses (XML and the underlying > > libxml2, R itself, etc ...), but I need to know how to track this > > problem. > > > > Apologies fror cross-posting to r-help and r-debian-sig, but I think > > that the issue is ... complicated and might not be as Debian-specific as > > it seems at first view. > > > > Sincerely, > > > > Emmanuel Charpentier > > > > Dear Max, > > > > A few days ago, I started to have problems with odfWeave 0.7.17 on a > > couple of amd64 systems : the compiled files contained numerous copies > > of the source files, more or less interlaced, "and a few copies of the > > target productions. > > > > Then I noticed that using an older 32-bit system resulted in correct > > files. An attempt with yet another machine (recent i686 netbook) > > confirmed that 32-bit systems gave okay results. > > > > Setup : in all machines, I use Debian testing with updates. My packages > > are self-compiled (i. e. installed via install.packages()). > > > > I enclose a very minimalist source and the resulting targets. Logs of > > execution on 32- and 64-bit systems are affixed after this message. > > > > Since odfWeave is pure R, I doubt that it could be the source of the > > problem. This leaves us with two obvious targets : R itself (the Debian > > package is current), or the XML library. > > > > Do you have any idea about how to proceed to find the source of the > > problem (and how to fix in) ? 
> > > > Sincerely, > > > >Emmanuel Charpentier > > > > Execution on a 32-bit system : > > > >> library(odfWeave) > > Le chargement a nécessité le package : lattice > > Le chargement a nécessité le package : XML > >> sessionInfo() > > R version 2.13.0 (2011-04-13) > > Platform: i486-pc-linux-gnu (32-bit) > > > > locale: > > [1] LC_CTYPE=fr_FR.utf8 LC_NUMERIC=C > > [3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8 > > [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8 > > [7] LC_PAPER=fr_FR.utf8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-26 > > > > loaded via a namespace (and not attached): > > [1] grid_2.13.0 > >> system.time(odfWeave("In1.odt", "Out1-32.odt")) > > Copying In1.odt > > Setting wd to /tmp/RtmpS8lBt8/odfWeave11161739126 > > Unzipping ODF file using unzip -o In1.odt > > Archive: In1.odt > > extracting: mimetype > > creating: Configurations2/statusbar/ > > inflating: Configurations2/accelerator/current.xml > > creating: Configurations2/floater/ > > creating: Configurations2/popupmenu/ > > creating: Configurations2/progressbar/ > > creating: Configurations2/toolpanel/ > > creating: Configurations2/menubar/ > > creating: Configurati
Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear Brian, You are right (as usual...) : downgrading XML to 3.2.0 (the next-to-most-recent version) enables me to compile my minimalistic example correctly. And yes, there is no other XML than XML. Silly me... Now, I have to understand why the i386 and Ubuntu machines were *not* "upgraded" to 3.4.0 by the "update.packages(ask=FALSE)" I'm used to do ... Thank you again ! Now I can again use a multiprocessor eficiently to run the same analyses on 18 datasets (thanks to mclapply (multicore)). A serious win in my case... Emmanuel Charpentier Le samedi 14 mai 2011 à 12:59 +0100, Prof Brian Ripley a écrit : > On Sat, 14 May 2011, Emmanuel Charpentier wrote: > > > Dear Brian, > > > > Thank you very much for this answer. However ... > > > > Le samedi 14 mai 2011 à 11:52 +0100, Prof Brian Ripley a écrit : > >> Note the difference in XML versions. > >> > >> odfWeave does not work correctly with XML 3.4-x: this has been > >> reported to the maintainer (and can be seen on the CRAN package checks > >> at http://cran.r-project.org/web/checks/check_results_odfWeave.html). > >> I suggest you try downgrading to an earlier version of XML. > > > > Do you mean the XML R package ? Or the libxml2 Debian package ? I am > > Only the R package is called XML. Why would I write XML if I meant > libxml2? > > > worried by the diference in behaviour of i386 and amd64 versions (and, > > on amd64, the discrepancies between Ubuntu 11.04 and Debian testing, > > which are easier to explain : Debian has slightly more recent version). > > > > I'll try to downgrade the R xml package (easier) and report results. > > > > Again, thank you very much ! > > > > Emmanuel Charpentier > > > > > >> On Sat, 14 May 2011, Emmanuel Charpentier wrote: > >> > >>> Dear list, > >>> > >>> This is a copy of a mail sent to Max Kuhn, original author and > >>> maintainer of the odfWeave package, which seems not to have received it. > >>> It reports a problem that seems to be very implementation specific > >>> (reproductible on three Debian testing amd64 machine, does *not* happen > >>> on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.06 > >>> amd64 machine) and therefore not attributable to odfWeave itself (which > >>> is pure R) but to a software component it uses (XML and the underlying > >>> libxml2, R itself, etc ...), but I need to know how to track this > >>> problem. > >>> > >>> Apologies fror cross-posting to r-help and r-debian-sig, but I think > >>> that the issue is ... complicated and might not be as Debian-specific as > >>> it seems at first view. > >>> > >>> Sincerely, > >>> > >>> Emmanuel Charpentier > >>> > >>> Dear Max, > >>> > >>> A few days ago, I started to have problems with odfWeave 0.7.17 on a > >>> couple of amd64 systems : the compiled files contained numerous copies > >>> of the source files, more or less interlaced, "and a few copies of the > >>> target productions. > >>> > >>> Then I noticed that using an older 32-bit system resulted in correct > >>> files. An attempt with yet another machine (recent i686 netbook) > >>> confirmed that 32-bit systems gave okay results. > >>> > >>> Setup : in all machines, I use Debian testing with updates. My packages > >>> are self-compiled (i. e. installed via install.packages()). > >>> > >>> I enclose a very minimalist source and the resulting targets. Logs of > >>> execution on 32- and 64-bit systems are affixed after this message. > >>> > >>> Since odfWeave is pure R, I doubt that it could be the source of the > >>> problem. 
This leaves us with two obvious targets : R itself (the Debian > >>> package is current), or the XML library. > >>> > >>> Do you have any idea about how to proceed to find the source of the > >>> problem (and how to fix in) ? > >>> > >>> Sincerely, > >>> > >>>Emmanuel Charpentier > >>> > >>> Execution on a 32-bit system : > >>> > >>>> library(odfWeave) > >>> Le chargement a nécessité le package : lattice > >>> Le chargem
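For reference, one way to perform the downgrade is to install the older source tarball from the CRAN archive; the URL layout and the 3.2-0 version below are assumptions to adapt as needed.

## Fetch the last known-good XML release from the CRAN archive (URL assumed)
## and install it from source, replacing the 3.4.x version.
oldxml <- "https://cran.r-project.org/src/contrib/Archive/XML/XML_3.2-0.tar.gz"
tf <- file.path(tempdir(), basename(oldxml))
download.file(oldxml, tf, mode = "wb")
install.packages(tf, repos = NULL, type = "source")
packageVersion("XML")   # check which version is now installed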
Re: [R] Dataframe horizontal scrolling
Under Unix/X11 (and IIRC under Windows), edit(some.data.frame) will invoke a smallish spreadsheet-like data editor, which will allow you to move around your data with no fuss. I probably wouldn't use it for any serious data entry, but I have found it damn useful in a lot of data-debugging situations (you know, the sort of ... thing ... that hits the fan when you've been coaxed into accepting Excel spreadsheets as "data sets").

HTH, Emmanuel Charpentier

Le lundi 10 mai 2010 à 16:06 -0400, Michael H a écrit : > R experts, > > I am working with large multivariable data frames (> 50 variables) > and I would like to scroll horizontally across my output > to view each data frame rather than having to scroll down vertically- > wrapped data frames. I have been using R Commander as a programming > interface. If I assign a long character string to a vector I can scroll > across its output easily but not a dataframe of equivalent width. I > just want my data frame variables to fully output horizontally rather > than partially with vertical wrapping, if that is possible. Any help > would be appreciated. > > Thank you, > > Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
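A tiny illustration, on a deliberately wide (made-up) data frame:

wide <- as.data.frame(matrix(rnorm(5 * 60), nrow = 5))   # 60 columns, 5 rows
invisible(edit(wide))    # spreadsheet-like editor; scroll freely, discard on close
utils::View(wide)        # read-only browser, if editing is not wanted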
[R] FYI : XML 3.4.2 still breaks odfWeave 0.7.17
Dear list, perusing the GMane archive shows that the issue with XML 3.4.x still bugs odfWeave users. I just checked that the newer XML 3.4.2 version still gives the same problem. Using it to weave a bit of documentation written with LibreOffice 3.3.3 (current in Debian testing) leads me to a 192 Kb, 590-page document, whereas reinstalling XML 3.2.0 gives me a 4-page, 24 Kb file (not entirely correct, though, but I haven't yet debugged it). I can send the relevant files (.RData, source and "good" and "bad" compilations) if this can help. Sincerely yours, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
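A small guard one can put at the top of a weaving script, assuming (per the reports above) that the whole 3.4.x series is affected:

if (packageVersion("XML") >= "3.4.0") {
  warning("XML ", packageVersion("XML"),
          " is reported to make odfWeave duplicate content; consider downgrading to 3.2-0")
}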
Re: [R] Graphics device storable in a variable
Just a couple of ideas that might or might not work ... Josh Tolley a écrit : > I'm using R embedded in PostgreSQL (via PL/R), and would like to use > it to create images. It works fine, except that I have to create every > image in a file (owned by and only readable by the PostgreSQL server), > and then use PostgreSQL to read from that file and return it to the > client. It would be much nicer if I could plot images into an R > variable (for instance, a matrix), and return that variable through > the PostgreSQL client. Leaving aside the details of returning the data > to PostgreSQL (which I realize are beyond the scope of this list), I > envision R code along the lines of the following: > > # Create a png device, a handle to which is stored in myDevice >> myDevice <- png.variable(height = 1280, width = 1024, bg = "white") > # Plot some data into the current device >> plot(myX, myY, col="red") > # Finalize >> dev.off() > # Print out the contents of myDevice >> myDevice > > ...and the data would print out just like any other matrix or class or > whatever type myDevice actually is. > > So my question is does such a thing already exist? I know about > piximage, but apparently I have to load image data from somewhere for > it to work; I can't use plot(), etc. to create the piximage data. GDD > would be a nice way to do it, because GD libraries are widely > available for use with other languages I might use with PostgreSQL and > PL/R (for instance, Perl would talk to PostgreSQL, call a PL/R > function to use R to return an image, which Perl would then process > further), but GDD requires temporary files as well, as far as I can > see. Thanks for any help anyone can offer.

The documentation for the png device mentions a "filename" argument. That might be a temporary file that another process (or an R or plpgsql function, btw) could read and store in the correct Postgres table. The package odfWeave uses something like this to insert graphs in an Open Document file. # What follows might work on unix and unix-alikes, possibly on Mac OS X # (BSD-like). I do not know about Windows... Furthermore, the png device doc cross-references the postscript device for details ; this latter doc mentions that a filename of the form "|cmd" will effectively pipe device output to cmd, which is a possibility that avoids file creation (but may well be as slow as with temporary file creation, since one would have to go to the filesystem to create a new process for cmd...). Furthermore, you'd have to pass additional arguments to cmd (some reference for the graph, where it is to be stored, and so on...). I do not know if the "| cmd" syntax would pass additional arguments to cmd... However, I wonder if the png device supports the "|cmd" syntax. It *should*, but I've never tried that...

A third possibility would be to write to a named pipe whose output is consumed by a process storing data in the relevant table. This process should be somehow controlled (passing an identifier, telling it what to store and when, etc...), possibly from *another* named pipe.

A fourth possibility would be to force the output (of plot and such) to stdout and capture stdout with capture.output (package utils). However, I have no idea on how to do that (from the outside, it seems theoretically possible, but one might need special hooks in the interpreter...).

Of all those possibilities, the first one seems reasonable, easy to implement, but has the drawback of using temporary files, which might be a resource hog. 
The second possibility is also a probable resource hog. The third might be much more economical (a daemon might be set up with its input and control pipes once and for all), but might be difficult to set up in PL/R (does this implementation allow I/O from/to external files ?). The fourth is much more problematic... HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
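A minimal sketch of the first possibility : plot into a temporary PNG file, then slurp the bytes back into a raw vector that PL/R could hand to PostgreSQL as a bytea. The helper name and the assumption that PL/R can return such a raw vector are illustrative.

make_png_blob <- function(plot_expr, width = 1024, height = 1280) {
  tf <- tempfile(fileext = ".png")
  on.exit(unlink(tf), add = TRUE)          # clean up the temporary file
  png(tf, width = width, height = height, bg = "white")
  ok <- try(eval(plot_expr), silent = TRUE)
  dev.off()
  if (inherits(ok, "try-error")) stop(ok)
  readBin(tf, what = "raw", n = file.info(tf)$size)   # the PNG byte stream
}

img <- make_png_blob(quote(plot(1:10, (1:10)^2, col = "red")))
length(img)   # number of bytes in the PNG stream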
Re: [R] Writing a helper function that takes in the dataframe and variable names and then does a subset and plot
Dear Daniel, May I point you to Thomas Lumley's paper in R News 2001-3 ("Programmer's Niche: Macros in R, Overcoming R's virtues") and to the defmacro utility of the gtools package ? HTH Emmanuel Charpentier

Daniel Myall a écrit : > Hi, > > I have a large dataframe than I'm writing functions to explore, and to > reduce cut and paste I'm trying to write a function that does a subset > and then a plot. > > Firstly, I can write a wrapper around a plot: > > plotwithfits <- function(formula, data, xylabels=c('','')) { > xyplot(formula, data, panel = > function(x,y, ...) { > panel.xyplot(x,y, ...) > panel.abline(lm(y~x),lwd=2, ...) > panel.loess(x,y,lwd=2,col.line='red', ...) > }, > xlab = xylabels[1], ylab= xylabels[2]) > } > > plotwithfits(Latency ~ Stimulus.number | Subject.ID,eye_subsetted_data) > # Works > > > > However, I can't get it working if I try the same for a subset and plot: > > explorebysubject <- > function(xvar,yvar,data,condition=TRUE,xlim=c(-Inf,Inf),ylim=c(-Inf,Inf)) { > > temp_subset <- subset(data, > subset=condition & xvar>xlim[1] & xvar<xlim[2] & yvar>ylim[1] & yvar<ylim[2], > select=c(Group,Subject.ID,xvar,yvar) > ) > > plotwithfits(xvar~yvar | Subject.ID,temp_subset) > } > > explorebysubject(Latency,Primary.gain,eye,Analysis.type == 'reflexive') > # Doesn't work as can't find 'Analysis.type', 'Latency', etc > > I can see why it doesn't work, however, I've looked at substitute, > deparse, etc, without much luck. > > Is there a something simple I could do to fix this up? Is there any > material I should be reading? > > Using the arguments eye$Latency, eye$Primary.gain, etc would partially > solve this problem, however I would prefer to avoid this if possible. > > One unclean way that does work is constructing strings and then > evaluating them. However, as I am planning to write several similar > functions I would prefer to avoid this: > > explorebysubject <- > function(xvar,yvar,dataset,condition,xlim=c(-Inf,Inf),ylim=c(-Inf,Inf)) { > # xvar - variable to plot along x-axis > # yvar - variable to plot along y-axis > # the dataset to use > # condition - used to provide extra conditions on the data selected > # xlim, ylim - optional limits of data to select. i.e. xlim=c(0,1) > > # Generate command to select appropriate data > cmd <- > paste("subset(",dataset,",",condition,"&",xvar,">",xlim[1],"&",xvar,"<",xlim[2],"&",yvar,">",ylim[1],"&",yvar,"<",ylim[2],",select=c(",xvar,",",yvar,",Group,Subject.ID))") > > > temp_subset <- eval(parse(text=cmd)) > > #generate plot command > cmd <- > paste("plotwithfits(",xvar,"~",yvar,"|Subject.ID,temp_subset)") > eval(parse(text=cmd)) > > } > > explorebysubject('Latency','Primary.gain','eye',"Analysis.type == > 'reflexive'",xlim=c(90,500),ylim=c(0.5,1.5)) # Works > > Thanks. > > Cheers, > Daniel > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
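As a concrete illustration of the defmacro() suggestion, here is a minimal sketch. Subject.ID and the commented-out call reuse the column names from the question; everything else is an assumption, not a drop-in replacement for the original function.

library(gtools)
library(lattice)

## defmacro() substitutes the *unevaluated* arguments into the body, so bare
## column names and conditions survive down to subset() and xyplot().
explorebysubject <- defmacro(xvar, yvar, data, condition, expr = {
  tmp <- subset(data, condition)
  xyplot(yvar ~ xvar | Subject.ID, data = tmp)
})

## Hypothetical call, mirroring the original attempt:
## explorebysubject(Stimulus.number, Latency, eye, Analysis.type == "reflexive")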
Re: [R] sample nth day data in each month
Carles Fan a écrit : > Dear all > > i have a time series containing trading dates and historical stock prices: > Date Price > 10-Jan-2007 100 > 11-Jan-2007 101 > 13-Jan-2007 99 > .. > .. > .. > 10-Nov-2007 200 > > i want to sample every 21st data of each month: > 21-Jan-2007 101 > 21-Feb-2007 111 > 21-Mar-2007 131 > .. > .. > .. > 21-Oct-2007 140 > > 1) how can i do that?

YourDataFrame[strptime(YourDataFrame$Date,"%d-%b-%Y")$mday==21,] # beware your locale !

> 2) if some of the dates are non-trading day, how can i tell "R" to use > "modified following" or "following" data?

Dunno : what is a "non-trading day" ? HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
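A self-contained toy version of the above, plus one crude way to approximate a "following" rule; both the data and the rule are illustrative assumptions.

## Note %d-%b-%Y for dates like "21-Jan-2007", and that %b is locale-dependent.
prices <- data.frame(Date  = c("19-Jan-2007", "21-Jan-2007", "22-Jan-2007",
                               "20-Feb-2007", "22-Feb-2007", "21-Mar-2007"),
                     Price = c(100, 101, 103, 110, 111, 131))
d <- strptime(prices$Date, "%d-%b-%Y")
prices[d$mday == 21, ]                  # exact 21st only

## Crude "following" rule: within each month, take the first available
## date on or after the 21st (months without one are dropped).
month <- format(as.Date(d), "%Y-%m")
pick  <- tapply(seq_along(d), month, function(i) {
  j <- i[d$mday[i] >= 21]
  if (length(j)) j[which.min(d$mday[j])] else NA_integer_
})
prices[na.omit(unlist(pick)), ]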
Re: [R] odf and unzip: unzip not found
Steve Powell a écrit : > hi list members > I am trying to use odfWeave with R 2.5.1 on Windows XP. > > however when running e.g. > odfWeave(demoFile, outputFile) > > I get: > Error in odfWeave(demoFile, outputFile) : Error unzipping file > In addition: Warning message: > unzip not found in: system(zipCmd[2], invisible = TRUE) > > presumably my zip and unzip are not set up correctly but I dont know how to > do that. I installed zip and unzip from info-zip.org as suggested in the help > file, and think I managed set my Windows path to include the folders where > they are installed, but still no luck. Any ideas? > thanks ?odfWeaveControl It'll give you, among other things : [ ... ] Usage: odfWeaveControl( zipCmd = c("zip -r $$file$$ .", "unzip -o $$file$$"), cleanup = TRUE, verbose = TRUE) Arguments: zipCmd: a string for the zipping/unzipping the 'odt' file via a system call. The token '$$file$$' will be gsub'ed with the file name. [ ... ] HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
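A hedged example of pointing odfWeave at explicit zip/unzip locations on Windows; the install paths and the input/output file names are assumptions to adapt, and the $$file$$ token must be kept as-is for odfWeave to substitute.

library(odfWeave)
ctrl <- odfWeaveControl(
  zipCmd = c("\"C:/Tools/info-zip/zip.exe\" -r $$file$$ .",
             "\"C:/Tools/info-zip/unzip.exe\" -o $$file$$"))
odfWeave("simple.odt", "simpleOut.odt", control = ctrl)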
Re: [R] analysis of large data set
Prof Brian Ripley a écrit : [ ... ] > What is strange is that no one has ever thanked us for finding out > (despite most Microsoft documentation) that you can get up to 3.5Gb in a > 32-bit process on certain 64-bit versions of Windows and enabling you to > use it in recent versions of R.

Maybe because most people seeking to use large memory spaces gave up on Windows and went Unix/Linux instead ? Don't get me wrong : maintaining and extending R's abilities is an amazing feat, and is extremely appreciated by lots of users. However, many people find with time that the Unix/Linux environment, facilities, etc... are much better suited to statistical workflow than their Windows counterparts and switch to it (at least for this part of their workload). Afterwards, the Windows-specific parts of a "New features and enhancements" section in an R release announcement tend to be a bit ... overlooked (seen, read and almost immediately forgotten, I confess...).

Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Getting Annual (Conditional) Averages
Dear Lucia, lucia a écrit : > Hello, > I'm very new to R, and so my question is simple. > > I have data record with 80 years of daily temperatures in one long > string. The dates are also recorded, in YYMMDD format. I'd like to > learn an elegant simple way to pull out the annual averages. > (Obviously, every 4th year has 366 days.) > > I know I can set up a formal loop to create annual records and then > average. But R seems to have such neat methods, is there some better > way to do this?

For the sake of simplicity, let's say you managed to store your data in a two-column dataframe df, with columns date and temperature. The first step is to know how to extract the "year" part of the dates. The obvious solution is of course as.numeric(substr(date,1,2)), but I'd rather transform your date variable into a genuine R Date-class variable, by as.Date(date,format="%y%m%d") or as.POSIXlt(date,format="%y%m%d"), the latter allowing easy year extraction by reading the "year" component. The second step is of course to apply the "mean" function to each of your relevant subtables ; that's what tapply() is meant for. So, a one-liner for your problem might be :

Means<- tapply(df$temperature, as.POSIXlt(df$date,format="%y%m%d")$year, FUN=mean, na.rm=TRUE)

However, this is only a very crude way to work with time series. I'd consider converting your date variable permanently to a suitable datetime representation. There is more than one way to do it : Julian dates, chron objects and, more recently introduced, DateTime classes, which seem to be "the standard way" to represent dates and times. Unfortunately, different packages seem to expect different representations... . HTH __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
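A self-contained toy run of the tapply() one-liner on two years of fake daily data. Note the 1900 offset of the $year component, and that %y maps 00-68 to 20xx and 69-99 to 19xx, which is worth checking against the oldest records of an 80-year series.

set.seed(1)
dates <- seq(as.Date("1971-01-01"), as.Date("1972-12-31"), by = "day")
df <- data.frame(date        = format(dates, "%y%m%d"),         # YYMMDD strings
                 temperature = rnorm(length(dates), mean = 12, sd = 8))
lt   <- as.POSIXlt(as.character(df$date), format = "%y%m%d")
year <- lt$year + 1900       # $year counts from 1900, hence the offset
tapply(df$temperature, year, mean, na.rm = TRUE)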
Re: [R] Equal confidence interval arrowhead lengths across multiple-paneled lattice plots with free y-scales
Hmmm... Isn't this : Deepayan Sarkar a écrit : > On Nov 16, 2007 5:26 AM, Bob Farmer <[EMAIL PROTECTED]> wrote: >> Yep, that did it, thanks. > > Great. > > [...] > >>> The obvious approach would be to use a relative unit like "cm" or >>> "inch" instead of an absolute one like "native". Does that not work? > > Of course, that should have had 'absolute' instead of 'relative' and vice > versa. a good candidate for the "fortune()" hall of fame ? Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple comparisons/tukey kramer
Tyler Smith a écrit : > Hi, > > I'm trying to make sense of the options for multiple comparisons > options in R. I've found the following options: [ Snip ... ] > As I understand it, there is no universal consensus as to which test > is best.

There is no such thing. Each of the procedures is aimed at some subset of the possible contrasts you may want to test. From my limited knowledge, with some (I hope not too gross) simplifications : - Tukey HSD will enable you to test the p(p-1)/2 pairwise differences one can create with p groups ; - Dunnett's procedure is made to compare (p-1) "treatments" to a common control ; - Scheffé's procedure is applicable to *any* ("reasonable") set of contrasts you can form ; - Newman-Keuls : aims to create separate subsets of groups (but has serious conceptual and technical flaws ! Don't do that unless you know what you're doing...). - etc ... Those procedures have the same assumptions as the base ANOVA : homoscedasticity, normality of residuals, etc ... Their robustness to small departures from these conditions varies. As a last resort, a set of non-parametric comparisons with the (overly conservative) Bonferroni adjustment may help (but with less power !). You'll have to refer to the subject matter to make a choice.

> TukeyHSD appears to be appropriate for > balanced designs. I > have an unbalanced design to analyze.

Therefore, no aov(..) (hence no TukeyHSD(aov(...))) for you...

>I can use glht, but can someone > tell me what each option is actually calculating? A reference to a > paper that describes the procedures would be great, but I'm afraid I > the one offered in ?glht[1] is beyond me.

Google ("multiple comparisons") will offer you some dubious and quite a few good references... HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
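A minimal sketch of the glht() route on a deliberately unbalanced one-way layout; the data are simulated, so purely illustrative.

set.seed(42)
## 12 controls, 9 "low", 7 "high" : unbalanced on purpose
d <- data.frame(dose = factor(rep(c("ctl", "low", "high"), times = c(12, 9, 7))))
d$y <- rnorm(nrow(d)) + 0.5 * (d$dose == "low") + 1.0 * (d$dose == "high")

fit <- aov(y ~ dose, data = d)
library(multcomp)
summary(glht(fit, linfct = mcp(dose = "Tukey")))    # all pairwise comparisons
summary(glht(fit, linfct = mcp(dose = "Dunnett")))  # each dose vs. "ctl" (first level)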
Re: [R] Unweighted meta-analysis
Roy Sanderson a écrit : > Hello > > I'm very much a beginner on meta-analysis, so apologies if this is a > trivial posting. I've been sent a set data from separate experimental > studies, Treatment and Control, but no measure of the variance of effect > sizes, numbers of replicates etc. Instead, for each study, all I have > is the mean value for the treatment and control (but not the SD).

With possibly three very special kinds of exceptions, what you've been sent is insufficient for any kind of analysis (meta- or otherwise) : no way to assess variability, hence no way to assess the relative importance of noise to data or the relative importance of different sets of data.

One possible exception is when the very nature of the experiment implies that your data come from a truly one-parameter distribution. I'm thinking, for example, of count data of rare events, which might, under not-so-unreasonable(-in-special-circumstances) conditions, come from a Poisson distribution.

Another possible kind of exception is when the second (and following) parameter(s) can be deduced from "obvious" general knowledge. For example (set in a semi-imaginary setup), one may give you the number of people using a given service at least once during a specified period, *provided* that in order to use this service, people have to register with the service provider first. The data might be a simple number (no valid indication of variability, if service use is too frequent to be modeled by a Poisson distribution), but looking up the number of registered users in some data bank might provide you with a valid proportion and population size, which is enough to meta-analyze.

But the third possibility is of course that your "means" are indeed the result of experimenting on *ONE* experimental unit in each group. This is highly dependent on what is measured and how (an example of this might be industrial production per unit time with two different sets of tools/machines in various industrial setups : here, the experimental unit is the industrial setup, and your "mean" is *one* measure of speed). Then, you have *individual* data, that you should analyze accordingly (e. g. t-test or Wilcoxon test if there is no relationship between "treated" and "control" experimental units, paired t-test or paired Wilcoxon test if you are told that the "means" may be related, etc ...). This is not a "meta-analysis", but an analysis.

Outside these possibilities, I see no point in "meta-analysing" anything that isn't analysable by itself.

> As > far as I can tell, this forces me into an unweighted meta-analysis, with > all the caveats and dangers associated with it.

As far as I can tell, you're forced to tell your sponsor/tutor/whatever either that he doesn't know what he is asking for or that he's trying to fool you (and you saw it !) ; which might lead you to ask him to rethink his question, give you more information about the measurements and experimental setup, to provide you with (or help you find) the missing data, to stop playing silly games or to go fly a kite...

> Two possible approaches > might be: > > a) Take the ln(treatment/control) and perform a Fisher's randomisation > test (and also calculate +/- CI). > b) Regress the treatment vs control values, then randomise (with or > without replacement?) individual values, comparing the true regression > coefficient with the distribution of randomisation regression > coefficients. 
I haven't the foggiest idea of what you're trying to do here : introducing artificial variability in order to separate it from variation between groups ? Unless you are in the case of "one experiment = one experimental unit per group" (see above) with no information about variability, the only information you can use is the *sign* of the difference "Experimental"-"Control" : if all or "almost all" of them go "in the same direction", one might be tempted to conclude that this direction is not random (that's the sign test). But this is only valid if the hypothesis "Direction is 50/50 random under H0" has validity under the experimental setup, which your story doesn't tell...

> Both approaches would appear to be fraught with risks; for example in > the regression approach, it is probable that the error distribution of > an individual randomised regression might not be normal - would this > then invalidate the whole set of regressions?

Again, you'd be working against an artificial variability that you'd have introduced yourself : what is the point ? HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
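A minimal illustration of the sign-test idea, on made-up per-study means (one value per study and arm, nothing else known):

trt <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1)
ctl <- c(4.7, 4.9, 5.2, 5.0, 4.4, 5.1, 5.3, 5.6)
binom.test(sum(trt > ctl), length(trt))   # sign test: only the direction of each difference is used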
Re: [R] pass lm( ) a char vector as the variables to be included
Gavin Simpson a écrit : > On Mon, 2007-11-26 at 14:17 +, [EMAIL PROTECTED] wrote: >>> Here are the codes of the example of lm( ): >>> >>> ## Annette Dobson (1990) "An Introduction to >>> Generalized Linear Models". >>> ## Page 9: Plant Weight Data. >>> ctl <- >>> c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) >>> trt <- >>> c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) >>> group <- gl(2,10,20, labels=c("Ctl","Trt")) >>> weight <- c(ctl, trt) >>> anova(lm.D9 <- lm(weight ~ group)) >>> lm.D90 <- lm(weight ~ group - 1) # omitting intercept >>> >>> What I am doing is let the variable name "group" >>> stored in a vector, say, g <- "group". The question is >>> how to strip the quotation marks when we call lm( ) >>> through g? >> Try: >> w = "weight" >> g = "group" >> form = as.formula(paste(w,g,sep="~")) >> lm(form) >> >> Regards, >> Richie. > > For more complicated automation, the ideas and examples from Bill > Venables Programmer Niche article in the R newsletter from a few years > ago might be of use: > > [39] Bill Venables. Programmer's niche. R News, 2(2):24-26, June 2002. > [ bib | PDF | http ] > > The PDF is available here: > > http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf

Another possibility is to create a macro (library(gtools) ; ? defmacro). See Thomas Lumley's paper in R News 2001-3 ("Programmer's Niche: Macros in R, Overcoming R's virtues"). HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
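reformulate() is another small convenience for the same job; this sketch reuses the data from the example quoted above.

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group  <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

w <- "weight"; g <- "group"
form <- reformulate(termlabels = g, response = w)   # weight ~ group, no paste() needed
lm(form)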
Re: [R] newbie polr() question
Max a écrit : > Prof Brian Ripley explained : >> On Mon, 26 Nov 2007, Max wrote: >> >>> Hi everyone, I'm trying to understand some R output here for ordinal >>> regression. I have some integer data called "A" split up into 3 ordinal >>> categories, top, middle and bottom, T, M and B respectively. >>> >>> I have to explain this output to people who have a very poor idea about >>> statistics and just need to make sure I know what I'm talking about >>> first. >>> >>> Here's the output: >>> >>> Call: >>> polr(formula = Factor ~ A, data = a, Hess = TRUE, method = "logistic") >>> >>> Coefficients: >>> ValueStd. Error t value >>> A -0.1259028 0.04758539 -2.645829 >>> >>> Intercepts: >>> Value Std. Error t value >>> B|M -2.5872 0.5596 -4.6232 >>> M|T 0.3044 0.4864 0.6258 >>> >>> Residual Deviance: 204.8798 >>> AIC: 210.8798 >>> >>> I really am not sure what the intercepts mean at all. However, my >>> understanding of the coefficient of A is that as the category >>> increases, A decreases? If I have an A value of 10, how to I figure out >>> the estimated probability that this score is in one of the three >>> categories? >> Use predict(): see the book polr supports for examples (and the theory). > > I appreciate the reply, but have difficulty understanding what you mean > by "the book polr supports"? :-? > > The manuals in R don't reference the polr() command, nor do they write > about ordinal regression in R. (from what I can tell) The documentation > of the polr() doesn't explain the output or the theory... I've done web > searches on polr() and the MASS library and have found little of direct > help to my question. Brian Ripley probably means "Modern Applied Statistics with S", W Venables and B. Ripley (4th edn), Springer, 2002. I'd also add "Categorical Data Analysis", Alan Agresti, Wiley (2000). HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
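To make the link between the printed coefficients and the predicted probabilities explicit, here is the hand computation for A = 10 using the numbers shown in the output above; polr parameterizes logit P(Y <= j) = zeta_j - eta, with eta the linear predictor. predict() on the fitted object should agree, assuming 'fit' is the polr object from the call shown.

b    <- -0.1259028                  # coefficient of A from the output above
zeta <- c(-2.5872, 0.3044)          # the B|M and M|T intercepts
eta  <- b * 10                      # linear predictor at A = 10
cum  <- plogis(zeta - eta)          # P(Y <= B), P(Y <= M)
round(c(B = cum[1], M = cum[2] - cum[1], T = 1 - cum[2]), 3)
## predict(fit, newdata = data.frame(A = 10), type = "probs") gives the same
## three probabilities directly.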
Re: [R] meta-analysis on diagnostic tests (bivariate approach)
Pedro Emmanuel Alvarenga Americano do Brasil a écrit : > Hello friends of R list, > > Im a physician and Im not that good in statistics. I have posted similar > email in the epi-sig list but no one aswered so far. Im cunducting a > systematic review on two diagnostic test for a particular tropical disease. > I found a free software that make most of the analysis called MetaDiSc. > However there is a paticular analysis that I wuould like to try that it is > not implemented in this software. I looked in R for meta-analysis functions > but the packeges available dont work with diagnostic tests. > > Im trying for a while to adapt in R a function develped in SAS and published > in "J.B. Reitsma et al. / Journal of Clinical Epidemiology 58 (2005) > 982?990", wich is a bivariate approach to meta-analysis of diagnotic tests.

lmer (in package lme4, IIRC), which *does* treat logistic regression (among other GLMs) directly, should do the trick. Probably with more ease...

> Since I do not have any experience with SAS, Im having a hard time trying to > do so. > > There is an appendix at the original text with some step by step (in SAS > syntax) on how to do it but Im stuck on a more advanced stuff which I cant > find a solution. In a step it is mentioned a Van Houwelingen (Statist. Med. > 2002; 21:589–624) approach to incorporate between and within study variances > and using the restricted maximum likelihood estimation.

This is a large tutorial paper, which goes well beyond your goals.

> This is the script (or function) so far > > bivar<-function(TP,FP,FN,TN){ > ># find a way to sum 0.5 to each null cell if any > > Se<-TP/(TP+FN) > Sp<-TN/(TN+FP) > logSe<-log(Se/(1-Se)) > logSp<-log(Sp/(1-Sp)) > vlogSe<-1/(Se*(1-Se)*(TP+FN)) > vlogSp<-1/(Sp*(1-Sp)*(TN+FP)) > bsvlogSp<-0 > cblog<-0 > bsvlogSp<-0 > sdata<-rbind(bsvlogSp,cblog,bsvlogSp,vlogSe,vlogSp) > > Does anyone know if there is any package or function with this bivariate > random effect model that I cuold try to use? Neither a I could find any > apckage or function to aply the restricted maximum likelihood estimation. > > Any tip that helps me go on or hand in the development of this script > (function) would be most welcome.

I'll try to have a look at this problem one of these days (I already have a serious backlog in daily life...). Out of the top of (what I use as) my mind (assuming that technique and study are factors correctly identifying studies and compared techniques (and forgetting +0.5 in empty cells which can be done along the lines of TP<-TP+0.5*(TP==0), etc ...)) :

data<-rbind(data.frame(suc=TP, fai=FN, situation=rep("Diseased",length(TP)), study=study, technique=technique), data.frame(suc=TN, fai=FP, situation=rep("NonDiseased",length(TN)), study=study, technique=technique)) data$situation<-as.factor(as.character(data$situation))

Now, model<-lmer(cbind(suc,fai)~situation:technique-1+(1|situation %in% study), data=data, family=binomial(link=logit)) should be close to what you're aiming at. But asking for Douglas Bates' advice (in the R-SIG mixed model list) would be much better...

However, I am not as sanguine as Reitsma & al about the whole ROC methodology. It has sound foundations for one specific problem : modeling (a simplified type of) signal perception. Its use in medical diagnostic evaluation is perfectly justified for problems of this kind (e. g. choosing, for example, a cutoff point to interpret a biological measurement in clinical terms, or comparing two numerical diagnostic indices). 
However, this approach ignores the "costs" and "utilities" of a false positive and a false negative, which may lead to very different choices and judgement criteria (I think that Frank Harrell has much better credentials than mine to explain this point). Furthermore, I have reservations about the (ab?)use of this model in situations where there is no numerical measure for which a cutoff point is to be chosen (e. g. ROC methodology in radiology) : while this is current practice, I am not sure what "real world" meaning such a ROC curve has : as far as I can tell, a physician does *not* choose (reliably) his/her sensitivity level, and therefore cannot "move" his/her operating point along this hypothetical ROC curve. If one accepts the hypothesis that sensitivities and specificities are the only relevant data, the DOR estimation makes more sense (fewer untestable hypotheses...). Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] odfWeave error
Cleber N. Borges a écrit : > hello all, > I trying to use the package 'odfWeave' > and I get the follow error: > > > > ### error message > # > ... > Removing content.xml > > Post-processing the contents > Error in .Call("RS_XML_Parse", file, handlers, as.logical(addContext), : > Error in the XML event driven parser for content_1.xml: error parsing > attribute name > > > > The piece of the code is: > > ### code > > ... > > <>= > fileplot='C:\\DADOS\\tmp\\plotx.emf' > win.metafile(fileplot) > plot( rnorm(300), col='red', t='l', lwd=2 ); grid() > dev.off() > @

This chunk is pure R code and shouldn't output anything directly into the output file. Since you are telling "echo=T", whatever is output by your code is sent to your output file, and since you assert "results=xml", this output is interpreted as XML ; since this isn't XML, you're getting guff from the XML interpreter. I'd rather do : <>= invisible({ # Whatever you did ... }) @ instead.

Your second chunk : > <>= > odfInsertPlot(file=fileplot, height=5, width=5 ) > @ should insert your plot into the output file.

[Snip ... ]

BTW : Windows (enhanced) metafiles are Windows-specific. As far as I can tell, recent versions of Office and OpenOffice.org correctly render Encapsulated PostScript files, thus freeing you from another Windows dependency. Unless you *have* to have EMF output (it happens, I know ...), you'd better use this format directly. HTH Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
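For what it's worth, a sketch of the two-chunk pattern suggested above. The chunk options are assumptions (the original chunk headers were mangled in transit; the reply suggests the first chunk carried echo=T and results=xml), and png() replaces win.metafile() to stay platform-neutral.

<<make-plot, echo=FALSE, results=hide>>=
fileplot <- "plotx.png"
png(fileplot, width = 480, height = 480)
plot(rnorm(300), col = "red", type = "l", lwd = 2); grid()
invisible(dev.off())
@

<<insert-plot, echo=FALSE, results=xml>>=
odfInsertPlot(file = fileplot, height = 5, width = 5)
@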
Re: [R] Which Linux OS on Athlon amd64, to comfortably run R?
Prof Brian Ripley a écrit : > Note that Ottorino has only 1GB of RAM installed, which makes a 64-bit > version of R somewhat moot. See chapter 8 of > > http://cran.r-project.org/doc/manuals/R-admin.html

Thank you for this reminder/tip ! I hadn't re-read this document since ... oh, my ... very early (1.x ?) versions, at which time my favorite peeve against R was the fixed memory allocation scheme. I would have thought that 64-bit machines could take advantage of a wider bus and (marginally ?) faster instructions to balance larger pointers. Am I mistaken ? Thanks again ! Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] simple problems
Gabor Grothendieck a écrit : > On Dec 6, 2007 3:26 PM, marciarr <[EMAIL PROTECTED]> wrote: [ Snip ] >> 2- how do I solve a simple equation? Considering the equation y= exp(-x)^12, >> I would like to find the values of x for, for example, y=0.01, so >> exp(-x)^12=0.01. How do I do that using R? >> I know those a probably very, very simple questions, but for which I do not >> seem to find the answer. > > Search between 0 and 1 for the root of the indicated function: > > uniroot(function(x) 0.1 - exp(-x)^12, 0:1)

I beg your pardon ? I'd rather use high-school algebra/analysis :

log(exp(-x)^12) = log(0.01)
12*log(exp(-x)) = log(0.01)
-12*x = log(0.01)
x = -log(0.01)/12 = log(100)/12

Rushing for a sophisticated numerical tool without looking for an explicit solution is easy, fast, lazy (I do that every day ...), but deprives you of the process of understanding the problem. Which was probably the point of this (probable) homework... Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
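A quick numerical check of the closed-form answer, which also shows what the uniroot() call should be searching for:

x <- log(100) / 12
exp(-x)^12                                            # 0.01, as required
uniroot(function(x) exp(-x)^12 - 0.01, c(0, 1))$root  # agrees with log(100)/12, about 0.384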
Re: [R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Thomas Pujol a écrit : > R-help users, > Thanks in advance for any assistance ... I truly appreciate your expertise. > I searched help and could not figure this out, and think you can probably > offer some helpful tips. I apologize if I missed something, which I'm sure I > probably did. > > I have data for many "samples". (e.g. 1950, 1951, 1952, etc.) > > For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, > gdp.1952, etc.) > > (Because the data is rather "large" (and for other reasons), I have chosen > to store the data as individual files, as opposed to a list of data frames.) > > I wish to write a function that enables me to "run" any of many custom > "functions/processes" on each sample of data. > > I currently accomplish this by using a custom function that uses: > "eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample > number" for each line of text I submit to "eval(parse(t=text.i2))" ). > > Is there a better/preferred/more flexible way to do this? Beware : what follows is the advice of someone used to use RDBMS and SQL to work with data ; as anyone should know, everything is a nail to a man with a hammer. Caveat emptor... Unless I misunderstand you, you are trying to treat piecewise a large dataset made of a large number of reasonably-sized independent chunks. What you're trying to do seems to me a bit reinventing SAS macro language. What's the point ? IMNSHO, "large" datasets that are used only piecewise are much better handled in a real database (RDBMS), queried at runtime via, for example, Brian Ripley's RODBC. In your example, I'd create a table births with all your data + the relevant year. Out of the top of my mind : # Do that ONCE in the lifetime of your data : a RDBMS is probably more # apt than R dataframes for this kind of management library(RODBC) channel<-odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS) sqlSave(channel, tablename="Births", rbind(cbind(data.frame(Year=rep(1952,nrow(births.1952))), births.1952), cbind(data.frame(Year=rep(1953,nrow(births.1953))), births.1953), # ... ^W^Y ad nauseam ... )) rm(births.1951, births.1952, ...) # get back breathing space Beware : certain data types may be tricky to save ! I got bitten by Dates recently... See RODBC documentation, your DBMS documentation and the "R Data Import/Export guide"... At analysis time, you may use the result of the relevant query exactly as one of your dataframes. instead of : foo(... data=birth.1952, ...) type : foo(... data=sqlQuery(channel,"select * from \"Births\" where \"Year\"=1952;", ...) # Syntax illustrating talking to a "picky" DBMS... Furthermore, the variable "Year" bears your "d" information. Problem (dis)solved. You may loop (or even sapply()...) at will on d : for(year in 1952:1978) { query<-sprintf("select * from \"Births\" where \"Year\"=%d;",year) foo(... data=sqlQuery(channel,query), ...) ... } If you already use a DBMS with some connection to R (via RODBC or otherwise), use that. If not, sqlite is a very lightweight library that enables you to use a (very considerable) subset of SQL92 to manipulate your data. I understand that some people of this list have undertaken the creation of a sqlite-based package dedicated to this kind of large data management. HTH, Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.