I have two data frames of flight data, but they have very different
numbers of rows. They come from different sources, so the data are not
> names(oooi)
[1] "FltOrigDt" "MkdCrrCd"
[3] "MkdFltNbr" "DprtTrpnStnCd"
[5] "ArrTrpnStnCd"
On 3/20/2010 11:52 AM, Daniel Malter wrote:
If the flight identifiers runway$Flight and oooi$Flight are unique (i.e.
only one observation has the same identifier in each dataset), you could use
merge() to bind together the datasets based on matching the two. See,
Also, I see an OnDate vari
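A minimal sketch of the merge() approach Daniel describes. The two data frames here are small hypothetical stand-ins for `runway` and `oooi` that share a unique `Flight` column; the other column names and all values are invented for illustration:

```r
# Hypothetical stand-ins for the two sources; only the Flight key is shared
runway <- data.frame(Flight = c("AAL1", "BTA2", "COA3"),
                     Runway = c("22L", "04R", "22R"))
oooi <- data.frame(Flight = c("BTA2", "COA3", "DAL4"),
                   TaxiTime = c(12, 8, 15))

# Inner join: keep only flights present in both data frames
both <- merge(runway, oooi, by = "Flight")

# Left join: keep every runway record, with TaxiTime NA where there is no match
allRunway <- merge(runway, oooi, by = "Flight", all.x = TRUE)
print(both)
```

If a Flight value occurs more than once in either frame, merge() returns every matching pair, so it is worth checking uniqueness first, for example with `anyDuplicated(runway$Flight)`.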
I have a date/time column imported from Excel in my data frame oooi (with
several hundred thousand rows). For example, the input data near row 3100 are:
11/12/2008 21:35
11/12/2008 22:03
11/12/2008 22:12
11/12/2008 22:38
11/12/2008 23:16
11/12/2008 23:23
11/13/2008 7:00
11/13/2008 7:03
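As a sketch, assuming the column arrives as character strings laid out exactly like those above, as.POSIXct() with an explicit format parses them, including single-digit hours such as 7:00 (leading zeros are optional on input). The time zone is an assumption chosen to avoid daylight-saving effects:

```r
# Strings in the month/day/year hour:minute layout shown above
x <- c("11/12/2008 21:35", "11/12/2008 23:23", "11/13/2008 7:00")

# tz = "UTC" is an assumption here, used to sidestep daylight-saving shifts
when <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Elapsed minutes between consecutive readings
gaps <- as.numeric(diff(when), units = "mins")
print(gaps)
```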
I need to write out the result of a tapply
avtaxi = tapply(mdf$TaxiTime, list(mdf$Runway, mdf$OnHour,
mdf$ArrivalGate), FUN=mean, na.rm = TRUE)
to a data file that I can import into Excel.
[1] 10 24 100
dput(avtaxi, file = outfile, control = c("keepNA", "keepInteger",
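dput() writes R source code, which Excel cannot read. One way to get a three-way tapply() result into Excel, sketched here with a tiny made-up `mdf`, is to name the grouping factors, flatten the array into long format with as.data.frame.table(), and write that with write.csv(). The output path is a temporary placeholder:

```r
# Tiny hypothetical stand-in for mdf
mdf <- data.frame(
  TaxiTime    = c(10, 12, 8, 9, 15, 11),
  Runway      = c("4R", "4R", "22L", "22L", "4R", "22L"),
  OnHour      = c(7, 7, 8, 8, 8, 7),
  ArrivalGate = c("A1", "A1", "B2", "A1", "B2", "B2")
)

# Naming the INDEX list labels the dimensions of the result
avtaxi <- tapply(mdf$TaxiTime,
                 list(Runway = mdf$Runway, OnHour = mdf$OnHour,
                      ArrivalGate = mdf$ArrivalGate),
                 FUN = mean, na.rm = TRUE)

# One row per Runway/OnHour/ArrivalGate cell; cells with no data are NA
long <- as.data.frame.table(avtaxi, responseName = "MeanTaxi")
long <- long[!is.na(long$MeanTaxi), ]

outfile <- tempfile(fileext = ".csv")  # placeholder path
write.csv(long, outfile, row.names = FALSE)
```

The long layout (one column per factor plus the mean) is easy to turn back into a pivot table inside Excel.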
I have a data frame with some rows that are almost, but not quite
duplicates of each other.
By using duplicated(key) on one column and on the column reversed, I can
get 2 columns in my data frame that have adjacent rows with TRUE. For
keyvalue ddi
On 3/24/2010 10:58 PM, Peter Alspach wrote:
> jim
key value
1 1 1
2 2 0
3 2 2
4 3 0
5 4 0
6 5 1
7 6 3
8 6 2
9 7 0
> > tt <- rle(jim$key)$lengths
> > ttJim <- jim[cumsum(tt)-tt+tapply(jim$value, jim$key, which.max),]
> > ttJim
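The rle()/which.max() trick relies on rows with the same key being adjacent. As a sketch of an alternative that does not need that, sort by key and by descending value, then keep the first row of each key with duplicated(). The `jim` data are copied from the printout above:

```r
jim <- data.frame(key   = c(1, 2, 2, 3, 4, 5, 6, 6, 7),
                  value = c(1, 0, 2, 0, 0, 1, 3, 2, 0))

# Put the largest value first within each key (ties keep their original order)
o <- jim[order(jim$key, -jim$value), ]

# The first occurrence of each key is then its maximum-value row
best <- o[!duplicated(o$key), ]
print(best)
```

Like which.max(), this keeps the first row among tied maxima, and it selects the same rows (1, 3, 4, 5, 6, 7, 9) as the rle() version for these data.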
What is the definition of the whiskers in the ggplot2 qplot with
Why is it so hard to find?
Jim Rome
In general, one should be able to turn off the legend entirely.
I did a plot with geom_jitter() and then overlaid it with geom_boxplot(), and
I got a legend with a sort of box drawn in it that was meaningless, since there
was no factor involved.
I have data that are collected in two different time zones:
z$OnDateTime <- as.POSIXct(runway$OnDateTime, tz = "EST5EDT",
format="%m/%d/%Y %H:%M")
is in Eastern time, and
zi$ActualOnLocal <- as.POSIXct(oooi$ActualOnLocal, tz="MST7MDT",
format="%m/%d/%Y %H:%M")
is in Mountain time.
I converted the runway
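A POSIXct value stores an absolute instant, so readings parsed in different zones can be compared directly; only the display depends on the `tzone` attribute. A sketch reusing the EST5EDT and MST7MDT zone names from the post, with a made-up timestamp:

```r
# The same wall-clock string read in each zone
mountain <- as.POSIXct("11/12/2008 21:35", tz = "MST7MDT",
                       format = "%m/%d/%Y %H:%M")
eastern  <- as.POSIXct("11/12/2008 21:35", tz = "EST5EDT",
                       format = "%m/%d/%Y %H:%M")

# Display the Mountain reading in Eastern time without changing the instant
mountainAsEastern <- mountain
attr(mountainAsEastern, "tzone") <- "EST5EDT"
print(format(mountainAsEastern, "%m/%d/%Y %H:%M"))

# Differences use the underlying instants: 21:35 Mountain is two hours after 21:35 Eastern
print(difftime(mountain, eastern, units = "hours"))
```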
I need to make a bunch of PDF files of histograms. I tried
gatelist = unique(mdf$ArrivalGate)
for( gate in gatelist) {
outfile = paste("../", airport, "/", airport, "taxiHistogram", gate,
".pdf", sep="")
pdf(file = outfile, width = 10, height=8, par(lwd=1))
title=paste("Taxi tim
On 3/31/2010 10:01 PM, Berwin A Turlach wrote:
G'day James,
On Wed, 31 Mar 2010 21:44:31 -0400
James Rome wrote:
> I need to make a bunch of PDF files of histograms.
> What am I doing wrong?
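Berwin's reply is cut off above, so this is only a sketch of a pattern that works, not his answer: open one pdf() device per gate, draw, and close it with dev.off() before the next iteration. Note that in the original call `par(lwd=1)` is passed positionally to pdf(), where it is matched to the `onefile` argument rather than setting a graphics parameter. The data, the `airport` value, and the output directory below are hypothetical:

```r
set.seed(1)
# Hypothetical stand-ins for the objects in the post
airport <- "EWR"
mdf <- data.frame(ArrivalGate = rep(c("A1", "B2"), each = 50),
                  TaxiTime = c(rnorm(50, 10, 2), rnorm(50, 14, 3)))
outdir <- tempdir()  # placeholder for "../<airport>/"

gatelist <- unique(mdf$ArrivalGate)
for (gate in gatelist) {
  outfile <- file.path(outdir, paste0(airport, "taxiHistogram", gate, ".pdf"))
  pdf(file = outfile, width = 10, height = 8)  # one device per file
  par(lwd = 1)                                  # set graphics parameters after the device opens
  hist(mdf$TaxiTime[mdf$ArrivalGate == gate],
       main = paste("Taxi time, gate", gate), xlab = "Taxi time (min)")
  dev.off()                                     # close so the file is complete
}
```

If the histograms are lattice plots instead, they must be wrapped in print() inside the loop, because lattice objects are only drawn automatically at top level.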
I am drawing a density histogram, and want to label the plots with the
mean using ltext(). But I need the x,y coordinates to feed into ltext,
and I can't calculate them easily from my data. Is there a way to get
the x and y ranges being used for the plot, so I can put the text at the
correct positi
The key was to use grid.text() inside the panel function. It allows you
to specify things in 0-1 "npc" units.
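A sketch of that approach for a lattice density histogram, labelling each panel with its mean in the upper-right corner. The data are simulated and the label format is an arbitrary choice:

```r
library(lattice)
library(grid)

set.seed(1)
taxi <- data.frame(tt = c(rnorm(100, 10, 2), rnorm(100, 15, 3)),
                   Runway = rep(c("04R", "22L"), each = 100))

p <- histogram(~ tt | Runway, data = taxi, type = "density",
  panel = function(x, ...) {
    panel.histogram(x, ...)
    # npc units run from 0 to 1 across the panel, whatever the data range
    grid.text(sprintf("mean = %.1f", mean(x, na.rm = TRUE)),
              x = unit(0.95, "npc"), y = unit(0.95, "npc"),
              just = c("right", "top"))
  })

# Render to a throwaway file to check that the panel function runs
f <- tempfile(fileext = ".pdf")
pdf(f)
print(p)
dev.off()
```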
On 4/1/10 12:23 PM, David Winsemius wrote:
> On Apr 1, 2010, at 11:53 AM, James Rome wrote:
>>> I am drawing a density histogram, and want to label the plo
I am trying to calculate quantiles of a data frame column split up by
two factors:
# Calculate the quantiles
quarts = tapply(gdf$tt, list(gdf$Runway, gdf$OnHour), FUN=quantile,
na.rm = TRUE)
This does not work:
> quarts
04L 04R 15R 22L 22R 2732
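Because quantile() returns five numbers per group, tapply() can only return them as a matrix whose cells are list elements. A sketch of an alternative, assuming `gdf` has the `tt`, `Runway`, and `OnHour` columns from the post (the values here are invented): aggregate() with a formula stores the quantiles as a five-column matrix, one row per Runway/OnHour pair. Its default `na.action = na.omit` drops rows with missing values before the quantiles are computed.

```r
gdf <- data.frame(tt     = c(5, 7, 9, 11, 6, 8, 10, 12),
                  Runway = rep(c("04L", "22R"), each = 4),
                  OnHour = rep(c(7, 8), 4))

# One row per Runway/OnHour pair; q$tt is a matrix with columns 0%, 25%, ...
q <- aggregate(tt ~ Runway + OnHour, data = gdf, FUN = quantile)

# Flatten into an ordinary data frame (columns Runway, OnHour, 0%, 25%, ...)
quarts <- cbind(q[c("Runway", "OnHour")], q$tt)
print(quarts)
```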
I would like to make a series of bwplots with scales that are the same
on each plot. x is hours of the day, so I chose
hrs = seq(0, 24, 4)
hrlabs = c("0","4","8","12","16","20","")
g = bwplot((gdf$tt)~gdf$OnHour | gdf$Runway, data=gdf,
ylab="Taxi time (min
Dear List,
I am having problems getting my box and whisker plots to put boxes on
the right x-axis values.
x is the hour of the day from 0 to 23.
> unique(mdf$OnHour)
[1] 5 4 6 7 11 12 9 8 19 14 13 21 20 10 18 17 15 16 23 22 0 1 3 2
mdf$OnHFact = factor(mdf$OnHour, levels=seq(0,23),
Alas, no one answered my last post about my bwplot problem.
I think my problem is related to the fact that there is a value missing
in my data:
[1] "0" "1" "2" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
"14" "15"
[16] "16" "17" "18" "19" "20" "21
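A small sketch of why the explicit levels in `factor(mdf$OnHour, levels=seq(0,23))` matter when an hour (here 3) never occurs: without them the empty hour has no level, so every later hour moves one position to the left in a plot that places boxes by level index. The hours vector is invented:

```r
# Invented hours with no observations at hour 3
hours <- c(0, 1, 2, 4, 5, 5, 23)

asIs   <- factor(hours)                 # levels only for the hours present
padded <- factor(hours, levels = 0:23)  # one level per hour, 3 included

print(nlevels(asIs))                # 6 levels: hour 4 is the 4th level
print(nlevels(padded))              # 24 levels: hour h is level h + 1
print(table(padded)[["3"]])         # the empty hour is kept with a count of 0
```

Lattice drops unused factor levels by default, so with bwplot() the empty hour is only kept if `drop.unused.levels = FALSE` is also given (see ?xyplot).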
Dear R-Help,
With the attached data set, I am still getting incorrect bwplots
> xyplot(gdf$tt~gdf$OnHour |gdf$Runway, data=gdf) # Is correct
> bwplot(gdf$tt~gdf$OnHour |gdf$Runway, data=gdf, horizontal=FALSE) #
Puts the boxes on the wrong x-axis values
# look especially at 0 and 3. How do
On 4/16/2010 8:27 PM, Jun Shen wrote:
Jim,
Try this,
Jun Shen from Millipore
I already tried using a factor, and the data set I enclosed had
gdf$OnHFact which was already a factor. It gave the same wrong plot.
What did work was to call xyplot instead.
The data are at http://dl.dropbox.com/u/537118/gdf.zip
On 4/17/2010 1:42 AM, Deepayan Sarkar wrote:
On Fri, Apr 16, 2010 at 1:54 PM, James Rome wrote:
> Dear R-Help,
> With the attached data set, I am still getting incorrect bwplots
None of your attachments came through
I have a large data set of airport data and wish to analyze it by hour
and day of the week. hour and day of the week are factors.
I can do something such as:
histogram(~(Arrival.Val) | DAY*Hour, type="count", breaks=60)
which displays the data the way I want it in principle, but the plots
are too
Thanks for the help.
I tried making the pdf file as suggested. Acrobat said it was damaged
and could not be opened. Is this an R bug?
It did make a PostScript file that I was able to distill into PDF, but
it was in grayscale. How do I get the color back?
And yes, I did do the layout I wanted so I c
I did
histogram(~(Arrival4) | as.factor(Hour), type="count",
breaks=16,ylab="Arrival Count",
xlab="Arrival Rate/4",main="Friday EWR A22R D22L Configuration",
layout=c(6,4), par.strip.text=list(cex=0.7))
Why do I get plots with different bar widths? See
I am sorry to be bothering the list so much.
I made a table of counts of flight arrivals by hour:
cnts=tapply(Arrival4,list(Hour),table). There are up to 15 arrivals in a
> cnts
1 2 3 4 5 6 7 8 9 10 13
1 2 5 9 2 7 5 4 2 4 1
1 2 3 4
3 2 2 1
1 3
2 2
. .
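Instead of a list of per-hour tables from tapply(), a two-way table() gives one matrix with a row per hour and a column per arrival count, with zeros where a count never occurs. A sketch with invented vectors named after those in the post:

```r
# Invented data: the hour of each 15-minute bin and the arrivals in it
Hour     <- c(0, 0, 0, 1, 1, 2)
Arrival4 <- c(1, 2, 2, 1, 3, 3)

cnts <- table(Hour, Arrival4)
print(cnts)

# A plain data frame form, e.g. for writing out
cntsDf <- as.data.frame.matrix(cnts)
```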
I had an NA in one row of my data frame, so I called na.omit(). But I do
not understand where that row disappeared to.
> fri
Date.OnlyDAY Hour Min15 Quarter Arrival.Val Arrival4
109/05/2008 Friday833 3 328
210/24/2008 Friday 21
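na.omit() drops the incomplete row but keeps the original row names, so the numbering simply skips the removed row; the numbers of the dropped rows are recorded in the `na.action` attribute. A sketch with a made-up frame:

```r
fri <- data.frame(Hour = c(8, 21, 9), Arrival4 = c(8, NA, 5))

clean <- na.omit(fri)
print(rownames(clean))                      # "1" "3": row 2 is gone, names kept
print(as.vector(attr(clean, "na.action")))  # 2

# Renumber the rows 1..n if the gap is confusing
rownames(clean) <- NULL
```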
Dear kind list people:
I have the following code:
[1] "0" "1" "2" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
"14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23"
> alist
[1] 3 10 10 6 5 6 4 8 9 3 7 5 8 3 6 7 2 6 6 1 4 8
10 4 10
[26] 13 6 2 8 4 7
rix(length(alist), 1000)
> and then within the loop:
> mn[i,j] <- value
I am creating a lattice plot with
with(ordgdp, xyplot(delivery~AAR | GDP_ID, xlab="AAR", ylab="Actual
Arrival Rate"))
which works, and gives me 48 plots, one for each GDP_ID.
But I would like to put the number of hours that each GDP lasted on the
relevant plot. This is given by the following table
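One way to put a per-panel number on an xyplot is to look up the current panel's conditioning level inside the panel function with which.packet(). In this sketch `gdpHours` is a hypothetical named vector standing in for the table of hours per GDP_ID, and the data are invented. If unused GDP_ID levels are dropped, which.packet() indexes the remaining levels, so the lookup assumes every level is present.

```r
library(lattice)
library(grid)

ordgdp <- data.frame(AAR      = c(30, 32, 36, 28, 30, 34),
                     delivery = c(29, 31, 33, 27, 30, 31),
                     GDP_ID   = factor(rep(c("G1", "G2"), each = 3)))
gdpHours <- c(G1 = 4, G2 = 7)  # hypothetical hours per GDP

p <- xyplot(delivery ~ AAR | GDP_ID, data = ordgdp,
            xlab = "AAR", ylab = "Actual Arrival Rate",
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    # which.packet() gives this panel's level index of GDP_ID
    id <- levels(ordgdp$GDP_ID)[which.packet()[1]]
    grid.text(paste(gdpHours[[id]], "h"),
              x = unit(0.05, "npc"), y = unit(0.95, "npc"),
              just = c("left", "top"))
  })

f <- tempfile(fileext = ".pdf")
pdf(f)
print(p)
dev.off()
```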
I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.
[1] "Date" "weekday" "hour" "month" "minute"
[6] "q
> arr$Date <- as.Date(as.character(arr$Date),format=something)
> Then again, it may be possible to do the actual merging using merge().
> Stephan
> James Rome schrieb:
>> I have two data frames. One (arr) has all arrivals to an airport for a
On 1/17/10 1:06 PM, David Winsemius wrote:
> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>> I don't think it is that simple because it is not a one-to-one match. In
>> the arr data frame, there are many arrivals in a quarter hour with good
>> weather on
L842", "AWE307", "BTA1234",
> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166",
ction) would do that
> or you could construct a matrix whose entries were the categories good
> /bad. The table function could create the matrix for the purpose of
> using an indexed solution if you are dead-set against the merge concept.
> On Jan 17, 2010, at 4:47
ched to those that are in
> "arr", which are also apparently not so defined. Let's see a better
> codebook or description of these variables.
> On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>> Here are some sample data sets.
>>> I also tried m
[1] 11269
>>> table(arr2$gw)
>>> 0 1
>>> 661 465
>>> with(arr2, table(Date, gw))
>>> gw
>>> Date 0 1
>>> 2009-01-01 368 99
>>> 2009-01-02 266 348
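Since each (Date, quarter) pair appears at most once in `gw`, a many-to-one merge() does handle many arrivals in the same good-weather quarter hour. A sketch with invented frames, assuming `gw` lists only the good-weather pairs and that dates arrive as month/day/year strings:

```r
arr <- data.frame(Date    = c("01/01/2009", "01/01/2009", "01/01/2009", "01/02/2009"),
                  quarter = c(1, 1, 2, 1),
                  Flight  = c("A1", "A2", "A3", "A4"))
gw  <- data.frame(Date = c("01/01/2009", "01/02/2009"), quarter = c(1, 3))

# Use real Date objects on both sides so the keys compare correctly
arr$Date <- as.Date(arr$Date, format = "%m/%d/%Y")
gw$Date  <- as.Date(gw$Date, format = "%m/%d/%Y")

gw$gw <- 1
arr2 <- merge(arr, gw, by = c("Date", "quarter"), all.x = TRUE)
arr2$gw[is.na(arr2$gw)] <- 0  # no match means not good weather

print(table(arr2$gw))
```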
I successfully combined my data frames, and am now on my next hurdle.
I had combined the date and quarter, and used tapply to count the
entries for each unique date/quarter pair.
ar= tapply(ewrgnd$gw, list(ewrgnd$dq), sum) # for each date/quarter
# combination, sums the gw (which are all 1)
ames(x) <- c('Date', 'quarter')
> x$Date <- as.Date(x$Date)
> x$quarter <- as.numeric(x$quarter)
> do.call() takes a function as its first argument and a list as its
> second argument.
> HTH,
> Dennis
> On Mon, Jan 18, 2010 at 1:48 PM
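Dennis's reply is only partly shown. As a sketch of the general idea, assuming the `dq` labels were built by pasting the date and quarter with a space (the separator and data are assumptions), the names of the tapply() result can be split back into columns with strsplit() and stacked with do.call(rbind, ...):

```r
# Hypothetical date/quarter labels and a good-weather flag of 1 per arrival
ewrgnd <- data.frame(dq = c("2009-01-01 3", "2009-01-01 3",
                            "2009-01-01 4", "2009-01-02 3"),
                     gw = 1)

ar <- tapply(ewrgnd$gw, list(ewrgnd$dq), sum)  # counts per date/quarter

# strsplit() returns a list of c(date, quarter) pairs; do.call(rbind, ...) stacks them
parts <- do.call(rbind, strsplit(names(ar), " "))
x <- data.frame(Date    = as.Date(parts[, 1]),
                quarter = as.numeric(parts[, 2]),
                count   = as.vector(ar))
print(x)
```

If Date and quarter are still separate columns, `aggregate(gw ~ Date + quarter, data = ..., FUN = sum)` avoids the paste-and-split round trip.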
On my Mac (Snow Leopard and R64), I had been using Rcmdr nicely. But now
when I do
> library(Rcmdr)
the tk libraries load, but the Rcmdr window never appears.
> library(Rcmdr)
Loading required package: tcltk
Loading Tcl/Tk interface ...
And if I try to do anything else, R goes to 100% cpu and ha
Apparently tcltk is the issue. That hung my R. How do I reinstall tcltk?
On 1/27/10 3:06 PM, John Fox wrote:
> library(tcltk
> tk_choose.dir(getwd(), "Choose folder")
Yes, reinstalling it worked. I wonder what changed to cause this,
especially if it happened to someone else?
And I was a bit worried because I think tcl/tk is installed elsewhere
too. In /usr/local/lib, there is a tcl 8 and tcl 8.5. How do I clean up
this install?
On 1/27/10 5:07 PM,
Is there a way to allow a user to select multiple items in a menu()?
Jim Rome
and prov
Dear kind R helpers,
I have a vector of runway names in rwy ("31R", "31L",... the number is
user selectable)
arrgnd is a data frame with data for all flights and all runways, with a
Runway column.
I am trying to subset arrgnd into a data frame for each selected runway,
and then combine them b
AircraftType Tail Arrived STA Runway FromTo Delay Operator dq gw
Jim Rome
On Feb 1, 2010, at 5:30 PM, David Winsemius wrote:
On Feb 1, 2010, at 5:16 PM, James Rome wrote:
Dear kind R helpers,
I have a vec
, and had to use rbind. If the help for merge said
"To merge two dataframes (datasets) horizontally" I would have known
right away that it was the wrong function to use.
Thanks for the help,
Jim Rome
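For the runway-subsetting question, a sketch with an invented `arrgnd`: %in% selects all the chosen runways in one step, and when separate per-runway frames really are needed, do.call(rbind, ...) stacks them back together vertically:

```r
arrgnd <- data.frame(Runway = c("31R", "31L", "22L", "31R", "4R"),
                     Delay  = c(5, 0, 12, 3, 7))
rwy <- c("31R", "31L")  # user-selected runways

# All selected runways at once
sel <- arrgnd[arrgnd$Runway %in% rwy, ]

# Or one data frame per runway, then stacked vertically
perRunway <- lapply(rwy, function(r) arrgnd[arrgnd$Runway == r, ])
stacked <- do.call(rbind, perRunway)
```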
t someone on the list can actually change this file for
the benefit of others.
On 2/2/2010 2:00 PM, Erik Iverson wrote:
James Rome wrote:
> On 2/1/2010 5:51 PM, David Winsemius wrote:
> I figured this out finally. I really believe that the R help write-ups
> are sorely lackin
In my code, I calculate the maximum value for each combination of two factors using
maxr=with(arrdf, tapply(rate,list(weekday,quarter), max, na.rm=T))
and I want to write out the file so that Excel can read it.
I used
write.table(maxr, fname, sep=",", col.names=TRUE, row.names=TRUE,
quote=TRUE, na="0"
> "1",1,4,7
> "2",2,5,8
> "3",3,6,9
>> write.csv(x.df)
> "","V1","V2","V3"
> "1",1,4,7
> "2",2,5,8
> "3",3,6,9
> write.csv makes it ex
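As the output above shows, write.csv() writes the row names as a first column whose header is an empty string, which keeps Excel's columns aligned with the data. A sketch with a made-up `arrdf` that instead turns the row names into a named first column (the column name `weekday` is a choice, not something the post requires):

```r
arrdf <- data.frame(weekday = c("Mon", "Mon", "Tue", "Tue"),
                    quarter = c(1, 2, 1, 2),
                    rate    = c(10, 12, 9, 14))

maxr <- with(arrdf, tapply(rate, list(weekday, quarter), max, na.rm = TRUE))

fname <- tempfile(fileext = ".csv")  # placeholder path
# Turn the row names into a real first column with a header
out <- data.frame(weekday = rownames(maxr), maxr, check.names = FALSE)
write.csv(out, fname, row.names = FALSE, na = "0")
print(readLines(fname))
```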
I am trying to get hourly totals, given 15-minute bins.
s = seq(0, 95, 1)
s = floor(s/4) # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 . . .
> s
[1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5
5 5 6
[26] 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11
"28", "29", "30", "31", "32", "33", "34",
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45",
"46",
On 2/7/2010 1:32 PM, David Winsemius wrote:
You have a dataframe with 96 columns and a single row named "Sunday".
My guess is that was not your intent. How did "d" come to exist?
I was trying to make a simpler example. The actual code uses a data
frame maxrdf:
> dput(maxrdf)
structure(list(`0` = c(
On 2/7/2010 1:35 PM, David Winsemius wrote:
But to answer your question:
> apply(d, 1, function(z) aggregate(z, by=list(s), FUN=sum) )
That works, but I do not understand why I could not use aggregate
directly. And the answer comes out as a list, which thus far baffles me.
How do I get the
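aggregate() groups rows, not columns, and apply() returns a list here because each call to aggregate() returns a data frame. A sketch assuming, as in the post, one row per day and 96 quarter-hour columns: transpose, sum rows by the hour index `s` with rowsum(), and transpose back to get a plain matrix. The data are invented; a data frame such as maxrdf would first go through as.matrix().

```r
s <- floor(seq(0, 95, 1) / 4)  # hour index for each quarter-hour column

# Invented data: two days, 96 quarter-hour counts each
d <- matrix(1:192, nrow = 2, byrow = TRUE,
            dimnames = list(c("Sunday", "Monday"), NULL))

# rowsum() adds rows that share a group label; transposing makes columns into rows
hourly <- t(rowsum(t(d), s))
print(dim(hourly))  # 2 rows, 24 hourly columns
```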
I have two dataframe columns of POSIXct date/times that include seconds.
I got them into this format using for example
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")
My problem is that when I subtract the two columns, sometimes the
difference is given in seconds, and
(when you don't specify, it chooses the units according to some rules)
> -Don
> At 4:24 PM -0400 6/14/10, James Rome wrote:
>> I have two dataframe columns of POXIXct data/times that include seconds.
>> I got them into this format using for example
>>> zsort$ETA &
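Subtracting two POSIXct values lets R choose the units: it picks the largest unit in which the smallest absolute difference is at least one, which is why the result is sometimes in seconds and sometimes in minutes. Fixing the units with difftime() and then converting with as.numeric() gives consistent numbers. The timestamps below are invented:

```r
fmt <- "%m/%d/%Y %H:%M:%S"
ETA    <- as.POSIXct(c("06/14/2010 16:24:30", "06/14/2010 18:00:00"),
                     format = fmt, tz = "UTC")
Actual <- as.POSIXct(c("06/14/2010 16:25:00", "06/14/2010 19:30:00"),
                     format = fmt, tz = "UTC")

auto   <- Actual - ETA                           # units chosen automatically
err    <- difftime(Actual, ETA, units = "mins")  # always minutes
errMin <- as.numeric(err)                        # plain numbers for arithmetic
print(units(auto))
print(errMin)
```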
On a Windows platform I am trying to count the number of lines in a file.
In a DOS window, the following works:
C:\Users\jar>findstr /R /N "^" D:\my_dir\my_file | find /C ":"
(it works with double \\ also)
But in R, I need to make this string up with the file name I get from
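If the goal is only the line count, it may be simpler to skip the shell entirely. A sketch that counts lines from R by reading the file in chunks, so a very large file does not have to fit in memory at once; the function name and chunk size are arbitrary choices:

```r
countLines <- function(path, chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk, warn = FALSE)  # warn = FALSE: no complaint about a missing final newline
    if (length(lines) == 0L) break
    n <- n + length(lines)
  }
  n
}

# Example on a temporary three-line file
tmp <- tempfile()
writeLines(c("a", "b", "c"), tmp)
print(countLines(tmp))
```

If the shell pipeline is still wanted, `shQuote(path, type = "cmd")` quotes a file path for cmd.exe before it is pasted into the command string.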
I have a data set where the lines look like:
2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
Some lines are missing the field before and after the NON:
2011-05-13 00:00:05 EONBHS229 mia13001621NON
I read them into R using
df = read.fwf(file, wi
I have a data frame as follows:
MsgType eotpd fn
FI 2011-05-13 01:40:00 0
FF 2011-05-13 01:39:53 0
TC 2011-05-13 01:39:45 0
FI 2011-05-14 00:58:46 1
FF 2011-05-14 00:58:46 1
FI 2011-05-15 00:48:32
I have a data frame (attached) that has interpolated EOT errors for
each minute before flight landing. It also has the runway and an index
for the flight:
> > times[1:4,]
  time       error runway flight
1    0 -0.02206235    04R      1
2    1 -0.07961631    04R      1
3    2 -0.13795380    04R
ror' not found
So I am confused.
On 6/7/2011 4:12 PM, Ista Zahn wrote:
Hi James,
Specify data = times in the qplot call and get rid of times$
everywhere. For example, do
pp2 = qplot(time, error, data = times)
pp2 + facet_wrap(~ runway)
On Tue, Jun 7, 2011 at 4:01 PM, James Rome wrote:
> I have a data frame (attached) that has interpolated EOT errors for
> each minute before flight la
I am using ggplot2 to make a boxplot that overlays a scatterplot:
pp = qplot(time, error, data=times, size=I(1), geom="jitter", main=title,
ylab="Error (min)", xlab="Time before ON (min)", alpha=I(1/10),
pp2 = pp + with(times, facet_wrap(~ run
print(pp2 + geom_boxplot(alpha=.5, color="blue",
outlier.colour="green", outlier.size=1))
But this ruins the boxplot: I get one box instead of a box at every minute.
On 6/8/2011 3:59 PM, Ista Zahn wrote:
Hi James,
It's hard for me to see where the problem might be. Pleas
I am trying to overlay raw data with a boxplot as follows:
pp = qplot(factor(time, levels=0:60, ordered=TRUE),
error, data=dfsub, size=I(1), main =" title", ylab="Error
xlab="Time before ON (min)", alpha=I(1/10),
ylim=c(-30,40),geom="jitter") +
In my main R program, I have
# ... code to read the data and calculate ads ...
sapply(ads,function(x) {doAirport(x, base)} )
And doAirport has
# analyze the flights for a given airport
doAirport = function(df, base) {
# Get rid of unused runway factor
df is a very large data frame with arrival estimates for many flights
(df$flightfact) at random times (df$PredTime). The error of the estimate
is df$dt.
My problem is that I want to know the prediction error at each minute
before landing. This code works, but is very slow, and dominates
ndexing. A little more description of the problem you are
> trying to solve would also be useful. I tend to ask people "tell me
> what you want to do, not how you want to do it".
> On Sun, Jul 17, 2011 at 1:30 PM, James Rome wrote:
>>> df is a very large data fram
> [45,] 44 5.54579946 2 1
> [46,] 45 5.54986450 2 1
> [47,] 46 5.55392954 2 1
> [48,] 47 5.55799458 2 1
> [49,] 48 5.56205962 2 1
> [50,] 49 5.56612466 2 1
> [51,] 50 5.57018970 2 1
> [52
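A vectorised sketch for the per-minute error question (the thread above is truncated, so this is not the solution given there): split by flight and let approx() interpolate each flight's errors at whole minutes. It assumes `PredTime` is already expressed in minutes before landing and that each flight has at least two non-missing points; the data are invented:

```r
df <- data.frame(flightfact = factor(c(1, 1, 1, 2, 2)),
                 PredTime   = c(0, 10, 20, 0, 20),  # assumed: minutes before landing
                 dt         = c(0, 2, 4, 1, 5))     # prediction error

mins <- 0:30

perFlight <- function(d) {
  ok <- !is.na(d$PredTime) & !is.na(d$dt)
  # rule = 1 returns NA outside the range of observed PredTime
  a <- approx(d$PredTime[ok], d$dt[ok], xout = mins, rule = 1)
  data.frame(time = mins, error = a$y, flight = d$flightfact[1])
}

times <- do.call(rbind, lapply(split(df, df$flightfact, drop = TRUE), perFlight))
head(times)
```

Each flight is handled once as a whole vector, instead of looping over individual minutes, which is usually where such code spends its time.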