created when you used one of the read.*() functions. Use str(samples) to see
what you are dealing with.
0.67403447 1.79825459
The decimal point is at the |
-2 | 5
-2 | 4
-1 |
-1 | 432000
-0 | 87755
-0 | 442110
0 | 001244
0 | 556789
1 | 113
1 | 5788
> # Success
Depending on your operating system, you may also be able to save the output
with File | S
Please let me know if I have used the function in the right way.
Thank you
Let's try a simple example.
> # Create a script file of commands
> # Note we must print the results of quantile explicitly
dates[3] - dates[1])
Time difference of 2 days
# But
> with(myData, p_dates[3] - p_dates[1])
Error in p_dates[3] - p_dates[1] :
non-numeric argument to binary operator
, but
ks.test() just sees that you have provided two samples, not one sample and
values along a cumulative distribution.
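A minimal sketch of the two calling conventions, using simulated data (the vectors x and y and the choice of pnorm are assumptions for illustration, not the poster's data). A distribution is named by its cumulative distribution function; a second numeric vector is always read as another sample.

```r
set.seed(1)
x <- rnorm(50)

# One-sample test: compare x against a named cumulative distribution function
one <- ks.test(x, "pnorm", mean = 0, sd = 1)

# Two-sample test: a second numeric vector is treated as another sample,
# even if you meant it as values along a cumulative distribution
y <- rnorm(50, mean = 1)
two <- ks.test(x, y)

one$method
two$method
```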
s a symmetric matrix. Just like
There is also a way to do this without a loop:
> strsplit(x, "")
[1] "t" "e" "s" "t" "i" "n" "g"
# Or if you just want the vector
> strsplit(x, "")[[1]]
[1] "t" "e" "s&q
In addition to stem() in the graphics package, there are other implementations
of stem-and-leaf plots that add features, such as stem.leaf() in package
aplpack, which also includes a function to produce back-to-back stem-and-leaf
plots.
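For reference, a short sketch of the base function on simulated values (the data are made up for illustration). The scale argument controls how many stems are used.

```r
set.seed(42)
x <- rnorm(30)

stem(x)             # default display
stem(x, scale = 2)  # roughly twice as long, so the stems are split further
```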
ble: Petal.Width
            mean    sd  n
setosa     0.246 0.105 50
versicolor 1.326 0.198 50
virginica  2.026 0.275 50
         3    4    5    6    7    8    9   10    11
[3,]    3    4    5    6    7    8    9   10    11    12
[4,]    4    5    6    7    8    9   10   11    12    13
[5,]    5    6    7    8    9   10   11   12    13    14
pace) == 0 && !missing(namespace),
fixNamespaces = c(dummy = TRUE, default = TRUE))
Try the help files:
useful to
know if the missing values are concentrated in particular rows or columns so
that eliminating a few rows and columns could substantially reduce the
percentage of missing values.
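One way to check this, sketched on a made-up matrix (the object m and the cutoff of 3 are assumptions for illustration): count the missing values by row and by column, then see how much is left after dropping the worst offenders.

```r
set.seed(1)
m <- matrix(rnorm(40), nrow = 8, ncol = 5)
m[3, ] <- NA          # one row entirely missing
m[c(1, 6), 4] <- NA   # two more missing values in column 4

row_na <- rowSums(is.na(m))   # missing values per row
col_na <- colSums(is.na(m))   # missing values per column
mean(is.na(m))                # overall proportion missing

# Drop rows and columns with 3 or more missing values
m2 <- m[row_na < 3, col_na < 3]
mean(is.na(m2))               # proportion missing after trimming
```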
Dear Prof Carlson,
Thank you for your reply. I'm using 'vegan' with 'vegdist' and 'bray'. I have a
selection of datasets that cover different time periods (converted to
z-scores), so a record that start
la to BloodPressure~Age (which makes more
sense than predicting age from blood pressure) or change the plot command to
plot(BloodPressure, Age, ...) and change the xlab to ylab and ylab to xlab).
: 5 obs. of 3 variables:
# $ Sites : Factor w/ 5 levels "Site1","Site2",..: 1 2 3 4 5
# $ temp : num 14 15 12 12.5 17
# $ precip: Factor w/ 4 levels "20","high","low",..: 2 3 4 1 1
Or add the type column first and then rbind:
x <- list(A=data.frame(x=1:2, y=3:4),B=data.frame(x=5:6,y=7:8))
x2 <-, lapply(names(x), function(z)
data.frame(type=z, dat[[z]])))
idxa <- rep(1:4, each=2)
idxb <- rep(1:2, 4)
ab <- (a[idxa, ] & b[idxb, ]) == b[idxb, ]
c <- cbind(idxa, idxb)[apply(ab, 1, all), ]
#      idxa idxb
# [1,]    2    1
# [2,]    2    2
# [
Typo: dat[[z]] should be x[[z]]:
x2 <- do.call(rbind, lapply(names(x), function(z)
data.frame(type=z, x[[z]])))
  type x y
1    A 1 3
2    A 2 4
3    B 5 7
4    B 6 8
clude all of the R
The tutorial is from the website, but this post is missing.
Have you contacted the author at
"West")), .Names = c("Year", "Product", "Sales", "Region"), row.names = c(NA,
-15L), class = "data.frame")
It is not clear what you want in your new data frame. This one has 5 years of
data for each tape brand and you seem to want o
sent reasonably close?
What should it look like after it is transformed?
This is not an assignment for school.
This is a project at WORK
The data frame group_PrivateLabel does not contain variables called
Product_Name or Region.
I think the OP does not realize that head() and tail() do not print anything.
They extract the first or last values/rows and if they are not assigned to an
object, they automatically go to print().
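A small sketch of the difference (the vector x and the function f are hypothetical). At the top level an unassigned result is auto-printed; inside a function the same call just returns a value that is thrown away.

```r
x <- 1:20

h <- head(x, 3)   # assigned: nothing is printed
head(x, 3)        # not assigned at top level: R auto-prints the result

f <- function(v) {
  head(v, 3)      # inside a function this value is discarded, not printed
  invisible(NULL)
}
f(x)              # prints nothing
```

Wrap the call in print() if you want to see it from inside a function or loop.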
Redefining would also fix that problem.
Also look at the DescTools package for functions KendallTauA, KendallTauB,
larger magnitudes will
determine the groups more than the variables with the smaller magnitudes.
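A quick illustration with simulated data (the variables small and large are made up). The distances on the raw data are almost entirely a function of the large-magnitude variable; standardizing with scale() first puts the variables on an equal footing.

```r
set.seed(1)
dat <- data.frame(small = rnorm(20), large = rnorm(20, sd = 1000))

d_raw <- dist(dat)          # distances on the raw values
d_std <- dist(scale(dat))   # distances after standardizing each column

# Raw distances are nearly identical to distances on 'large' alone
r_raw <- cor(as.vector(d_raw), as.vector(dist(dat$large)))
r_raw

# After scaling, both columns have standard deviation 1
apply(scale(dat), 2, sd)
```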
#           A         B
# 1 0.9148060 0.4577418
# 2 0.9370754 0.7191123
# 4 0.8304476 0.2554288
# 5 0.6417455 0.4622928
# 8 0.134 0.1174874
# 9 0.6569923 0.4749971
# 10 0.7050648 0.5603327
Notice that whichever one we use, the row numbers match the original data frame.
This is your answer:
> str(hold)
Classes 'summaryDefault', 'table' Named num [1:6] -2.602 0.636 1.514 1.54
2.369 ...
..- attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
hold is a table of named numbers, i.e. a vector with a names attribute. It is
not a data.frame so it does
41083 1.000 0.9342174
disp -0.8138289 0.9342174 1.000
> Y$n
     mpg cyl disp
mpg    5   5    5
cyl    5   5    5
disp   5   5    5
> Y$P
            mpg        cyl       disp
mpg NA 0.02010207 0.09368854
cyl 0.02010207 NA 0.02005248
disp 0.09368854
(m), type.convert,
# Ulrik's solution but without the pipes. Shows why you need 2 as_tibbles()
After creating ppdat and ppdat$Valbin, aggregate() will get you the churn
> aggregate(Churn~Valbin, ppdat, mean)
Valbin Churn
1 (20.9,43.7] 0.833
2 (43.7,66.3] 0.000
3 (66.3,89.1] 0.500
important information.
I need help with R,
could also use
Cum_RespRate <- (cum_R/cum_n)*100
Subject: Re: [R] Looking for a package to replace xtable
All is perfect, almost - after I ran your corrections.
Is there a way I can have more control of the column names, i.e.,
not be restrict
1's and 0's and those need to be randomized, sample(data) will do it for you.
Then those numbers are replicated 10 times. Why not just select 500 values
using rbinom() initially?
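Something along these lines, assuming you want an even chance of 0 and 1 (the probability 0.5 is my assumption, not something stated in your script):

```r
set.seed(1)
draws <- rbinom(500, size = 1, prob = 0.5)  # 500 independent 0/1 values
table(draws)
```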
I've attached a modification of your script file (called .txt so it doesn't get
stripped). See if this does what you want.
u wish to include
# Your list of variables will be the vector mycols
mycols <- choose$cols[choose$select==1]
6    z    y   NA
7    x    z   67
8    y    z   23
> moredata.df[order(moredata.df$Freq, decreasing=TRUE), ]
  Var1 Var2 Freq
7    x    z   67
8    y    z   23
2    y    x   NA
6    z    y   NA
7    x    z   67
8    y    z   23
$ z: int 67 23 0
> data.frame(as.table(as.matrix(mydf)))
  Var1 Var2 Freq
2    y    x   NA
3    z    x   NA
6    z    y   NA
No. You are not using the correct command. Time to read the manual:
You will find the answer to your question by looking at the alternate forms of
formatted and contains the number of records you think it does?
You could try installing package ExtDist and using distribution Beta_ab in that
even though it
does not appear in the data:
> set.seed(42)
> x <- sample(1:6, 10, replace=TRUE)
> table(x)
1 2 4 5 6
1 1 3 3 2
> y <- factor(x, levels=1:6)
> y
[1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6
What Rui said, but as important, you have four columns in your data called
"town", "year", "revenue", and "supply". You do not have a column called
Actually, r is a vector, not an index value. You need
apply(compare_data, 1, function(r) cor(r, t(test_data)))
Actually, not using apply() would be faster and simpler
cor(t(compare_data), t(test_data))
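A check on simulated matrices that the two forms agree (the dimensions and values are made up; only the object names come from the post). The apply() version returns its result transposed relative to the direct call.

```r
set.seed(1)
compare_data <- matrix(rnorm(30), nrow = 5)  # 5 rows, 6 columns
test_data <- matrix(rnorm(12), nrow = 2)     # 2 rows, 6 columns

# Row by row: each r is a vector of 6 values
via_apply <- apply(compare_data, 1, function(r) cor(r, t(test_data)))

# All at once: rows of compare_data against rows of test_data
direct <- cor(t(compare_data), t(test_data))

dim(via_apply)  # one column per row of compare_data
dim(direct)     # one row per row of compare_data
all.equal(t(via_apply), direct, check.attributes = FALSE)
```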
M F 52 18 28
# 26 4 5 F F 33 73 22
# 27 4 6 F F 33 66 29
# 28 4 7 F F 33 18 47
# 34 5 6 F F 73 66 55
# 35 5 7 F F 73 18 7
# 42 6 7 F F 66 18 14
dta12 <- data.frame(dta12, dsim=as.vector(dsim)) # Typo was here
dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "dsim")]
64 65 66 67 68 69 70
$ X8 : int [1:2, 1:5] 71 72 73 74 75 76 77 78 79 80
$ X9 : int [1:2, 1:5] 81 82 83 84 85 86 87 88 89 90
$ X10: int [1:2, 1:5] 91 92 93 94 95 96 97 98 99 100
est[, "id"]), test[, "id"], sample, size=1)
test[indx, ]
#      xcor ycor id
# [1,]    4    6  1
# [2,]    4    2  2
[5,] 48.28083 51.24702 44.78204
[6,] 45.69531 45.71741 48.25982
[7,] 47.42731 43.86328 55.15668
[8,] 54.55450 55.67621 47.28236
[9,] 56.42899 47.26354 51.90019
[10,] 50.89833 41.99718 50.46564
[11,] 55.81824 51.63207 53.83847
[12,] 50.88440 53.68807 44.30
] [,3] [,4] [,5]
# 25 - 34 11 15 NA NA NA
# 25 - 77 15 85 NA NA NA
# 34 - 39 11 NA NA NA NA
lts) <- c("mealAcode", "mealBcode", "id")
This pre-allocates space for a million rows so it should be even faster, but it
will fail if there are more rows, so guess high.
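The general pattern, sketched on a toy problem (the loop contents and the size n_max are invented for illustration; only the column names come from the code above): allocate once, fill by index, and trim the unused rows at the end.

```r
n_max <- 100L   # guess high; writing past this row would fail
results <- matrix(NA_integer_, nrow = n_max, ncol = 3)
colnames(results) <- c("mealAcode", "mealBcode", "id")

k <- 0L
for (id in 1:4) {
  for (a in 1:2) {
    k <- k + 1L
    results[k, ] <- c(a, a + 1L, id)  # fill the next free row in place
  }
}

# Keep only the rows that were actually filled
results <- results[seq_len(k), , drop = FALSE]
nrow(results)
```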
There are some specialized packages such as data.table and dplyr in R that
might b
19 50 21
# 5 1981 2 1 17 49 25
# 6 1981 2 2 20 47 23
# 7 1981 2 3 21 52 27
The attached .png image file shows you how to send plain text emails to r-help
using gmail.
How about?
Trade <- xtabs(FLOW ~ iso_o + iso_d + year, dta)
Gives you a 3d table with FLOW as the cell entry. Then
apply(Trade, 1:2, sum, na.rm=TRUE)
Gives you a 2d table with the total flow
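Here is the same thing on a small made-up data frame (the country codes, years, and flows are invented; the column names follow the formula above):

```r
dta <- data.frame(
  iso_o = c("A", "A", "B", "B", "A"),
  iso_d = c("B", "B", "A", "C", "C"),
  year  = c(2000, 2001, 2000, 2001, 2001),
  FLOW  = c(10, 20, 5, 7, 3)
)

Trade <- xtabs(FLOW ~ iso_o + iso_d + year, dta)
dim(Trade)   # origins x destinations x years

# Sum over years, keeping origin (1) and destination (2)
Total <- apply(Trade, 1:2, sum, na.rm = TRUE)
Total
```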
Refer to columns by position rather than name and everything is simpler:
for (i in 2:4) {
test[, i] <- test[, i] + test[, i-1]
}
Note your approach fails on the first line since you start with i=1 and there
is no Day0. Another approach that is simpler:
t(apply(test, 1, cumsum))
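A check on a toy data frame that the two approaches give the same numbers (the column names Day1 to Day4 and the values are assumptions for illustration):

```r
test <- data.frame(Day1 = c(1, 2), Day2 = c(3, 4), Day3 = c(5, 6), Day4 = c(7, 8))

# Loop version: add the previous column to each column in turn
loop <- test
for (i in 2:4) {
  loop[, i] <- loop[, i] + loop[, i - 1]
}

# Vectorized version: cumulative sum along each row, transposed back
vec <- t(apply(test, 1, cumsum))

all.equal(as.matrix(loop), vec, check.attributes = FALSE)
```

Note that the apply() version returns a matrix, not a data frame.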
# We get 8 groups, 4x2
al page for function read.csv(). One of the problems with
spreadsheets is that these extra spaces are not readily apparent.
It looks like your printouts are based on the R summary() function? The
function lists the number of cases in the 5 largest categories when the
variable is coded as a FACTOR.
f2(z, 2)
all.equal(z1, z2)
# [1] TRUE
erence: 0.444 >"
[2] "Mean relative difference: 0.1262209"
> all.equal(f(z,4),f2(z,4))
[1] "Attributes: < Component “dim”: Mean relative difference: 0.5714286 >"
[2] "Mean relative difference: 0.5855162"
David C
-Original Message-
have it appeared to do what I want.
Thanks again,
My error. Clearly I did not do enough testing.
z <- array(1:24,dim=2:4)
[1] TRUE
[1] TRUE
[1] &qu
(DFM <- data.frame(DFM, tmat[idx, ]))
#   obs      start        end   D       bin t1 t2 t3 t4 t5
# 1 1 2015-02-01 2017-01-01 700 [500,Inf) 0 0 0 0 0
# 2 2 2010-04-11 2011-01-01 265 [200,300) 0 0 1 -1 -1
# 3 3 2006-01-04 2007-05-03 484 [400,500) 0 0 0 0 1
# 4 4 2007-10-
m1 100 300 - -
# m2 - - - -
# m3 - - 400 -
# m4 - - - -
# m5 - - - -
# [1] "xtabs" "table"
# MN is a table. If you want a data.frame
MN <-
# [1] "data.frame"
that index would be to use by():
idx <- as.vector(by(Daily, Daily$wyr, function(x) rownames(x)[which.max(x$Q)]))
Daily[idx, ]
", diag=F)
text(1, dim(M)[1] - .1, "mpg", srt=90, xpd=TRUE)
# Replace first row/colnames if you will be using M later
colnames(M)[1] <- "mpg"
rownames(M)[1] <- "mpg"
> str(Test)
'data.frame': 3 obs. of 2 variables:
$ TransitDate: Date, format: "2013-04-01" "2013-06-01" ...
$ CargoTons : int 50 40 30
> Test
  TransitDate CargoTons
1  2013-04-01        50
2  2013-06-01        40
3  2013-07-01        30
"ID" "Ageclass"
> levels(ind.davis$Ageclass) <- c("Adult", "Juvenile", "Sub-adult")
> levels(ind.davis$Ageclass)
[1] "Adult" "Juvenile" "Sub-adult"
> str(ind.davis)
'data.frame': 10 obs. of 2 va
dnom # These are the alternating denominators
[1] 150 200
> for (i in res) {
+ r <- i %% 2 + 1
+ s <- seq_len(i-1)
+ L[i] <- abs(sum(L[s] * rows[r, s]))/ dnom[r]
+ }
> L
[1] 0.1000 0.1333 0.0667 0.0889 0. 0.14814815
[8] 0.24691358 0.3086
How about
> difftime(LAI_simulation$Date, LAI_simulation$Date[1], units="days")
Time differences in days
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13
.4519231NA NANA
n are limited to the resolution of a computer screen,
pretty low. You probably want to plot to a graphics device that saves the file
so that you can specify a higher resolution. The R command
should get you started.
re more columns than rows.
at only one site. I suspect there is a problem with your data or with the
way you have coded the data.
it, but no one seems to be able to provide an authoritative citation
before proceeding to demonstrate that it is false.
but would use
some measure of association/correlation.
y are still character strings.
3rd   No     35  387   17   89
      Yes    13   75   14   76
Crew  No      0  670    0    3
      Yes     0  192    0   20
The default margins are set in lines for the bottom, left, top, and right sides
using mar=c(5.1, 4.1, 4.1, 2.1). Just change the top margin to something like 1.1:
par(mfrow=c(1,2), mar=c(5.1, 4.1, 1.1, 2.1))
dom number:
> rnd()
[1] 4.036111
> rnd()
[1] 3.88048
> rnd()
[1] 3.984268
> rnd()
[1] 3.808441
> rnd()
[1] 4.219925
Just change the separator:
Titanic.df <- as.data.frame(Titanic)
boxplot(Freq~Class*Sex, Titanic.df, cex.axis=.6, sep="\n")
See attached .png.
Minor modification:
fff <- function(x) as.numeric(chartr(",", ".", x))
BX <- sapply(AX, fff)
Or this keeps the original data frame:
AX[, 1:2] <- sapply(AX[, 1:2], fff)
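To see both forms in action, here is a sketch on a small data frame with decimal commas (the contents of AX are made up for illustration):

```r
AX <- data.frame(a = c("1,5", "2,25"), b = c("3,0", "4,75"),
                 stringsAsFactors = FALSE)

# Swap the decimal comma for a period, then convert to numeric
fff <- function(x) as.numeric(chartr(",", ".", x))

BX <- sapply(AX, fff)                # a numeric matrix
AX[, 1:2] <- sapply(AX[, 1:2], fff)  # still a data frame, now numeric
str(AX)
```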
e tables by hand
> write.csv(dat1[Samples$A[ , 1:10], ], row.names=FALSE, file="Test.csv")
4 1
20014 1
20050 0
4391076 19990 0
20000 0
20010 0
20059 1
You should read these manual pages:
> sapply(test, gsub, pattern=",", replacement=";")
"a;b;c;d" "g;h;f"
+ ylab=lbls[colnos[i, 2]])
+ }
Plots all of the unique plots.
ts 3 groups as well:
> plot(density(data_mat))
"a" "A"
1.1 "a" "A"
2 "b" "A"
2.1 "b" "A"
3 "c" "A"
4 "d" "A"
> rownames( <- NULL # Optional - get rid of row names
> head(
a b
[1,] "a&qu
8 4.905341 6.035104 5.089833
9 7.018424 4.391074 2.006910
14 4.721211 2.585792 6.399737
16 5.635950 5.205999 6.302543
18 2.343545 5.758163 6.038506
19 2.559533 4.273295 5.920729
20 6.320113 3.631719 5.720878
23 4.828083 6.444101 5.623518
directory you want and then click the More tab and select "Set As
Working Directory."
Warning message:
In chisq.test(rbind(c(transitions1), c(transitions2))) :
Chi-squared approximation may be incorrect
Running this code will create the function.
50), ylabs=rep("", 10))
s. But labeling them is not
easy since the coordinates are based on the columns:
> par("usr")
[1] -6.705729 7.179791 -6.705729 7.179791
