On Oct 21, 2011, at 8:14 PM, Rich Shepard wrote:
On Fri, 21 Oct 2011, David Winsemius wrote:
The only variable in that dataframe with what appears to be a continuous
value (which is how I would expect "total dissolved solids" to be
measured) is 'quant'. Are you saying that the value of 'quant' is measuring
something with different units depending on the value of 'param', and that
'site' and 'date' should be used to identify associated measurements? This
would appear to be the case based on what you are saying below.
David,
'Quant' is the measured concentration of the different chemicals
identified in 'param'. I want to plot (and model) the quant values
associated with 'TDS' and other chemicals, preferably from samples at the
same location and date. Units are mg/L except for pH (standard units) and
specific conductance (microSiemens/cm).
What I'm not understanding is how to specify the 'quant' values for the
params 'TDS' and 'Cond' (for example) for an xyplot() or lm().
If this is so, the problem is to break apart the dataframe by type
of measurement ('param'). One way would be to split into separate
dataframes, then merge them back together by an appropriate linkage on
site and date. I'm guessing that 'stream' and 'basin' are
superfluous for the matching and can be later associated with 'site'?
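As a rough illustration of the split-and-merge idea, here is a minimal sketch on a made-up stand-in for 'chemdata'. The toy values, the character 'sampdate' column, and the object names tds, cond and paired are all assumptions for illustration, not the real data.

```r
# Hypothetical toy data in the long layout described in this thread
chemdata <- data.frame(
  site     = c("S1", "S2", "S1", "S1", "S2", "S2"),
  sampdate = c("2011-05-01", "2011-05-01", "2011-06-01",
               "2011-05-01", "2011-05-01", "2011-06-01"),
  param    = c("TDS", "TDS", "TDS", "Cond", "Cond", "Cond"),
  quant    = c(250, 310, 265, 410, 520, 500),
  stringsAsFactors = FALSE
)

# Split: one dataframe per parameter, keeping only the key columns and quant
tds  <- subset(chemdata, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(chemdata, param == "Cond", select = c(site, sampdate, quant))

# Rename quant so the two measurements stay distinguishable after merging
names(tds)[names(tds) == "quant"]   <- "TDS"
names(cond)[names(cond) == "quant"] <- "Cond"

# Merge on site and date; all = TRUE keeps unmatched samples as NA
paired <- merge(tds, cond, by = c("site", "sampdate"), all = TRUE)
print(paired)
```

With all = TRUE nothing is dropped at this stage, so samples that have only one of the two measurements show up with NA in the other column.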
Yes, stream and basin are supersets of site. I used subset() to create
separate dataframes from a set I called 'streamdata' (which aggregated the
sites in an individual stream into one), but I'm not satisfied with how I
did that and would rather learn to work with the overall 'chemdata' set.
The goal would be a dataframe with 7 renamed 'param' columns
('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', and 'Ca') and two
identifier columns ('site' and 'sampdate'). For the moment I would
think you would want all the data together and not make any
decisions about excluding NA values until you get an overall
picture of the situation.
I agree that's what I want.
The first thing I would try would be:
with(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
                                   'Cl', 'Na', 'Ca'), 1:4),
     xtabs(quant ~ site + sampdate + param))
You would get 7 tables, one for each 'param', with up to 143 rows and
as many columns as you have sampdates.
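To make the shape of that result concrete, here is a small sketch of the xtabs() call on made-up data with only two parameters. The toy values and the name tab are assumptions for illustration.

```r
# Hypothetical toy data: two sites, two dates, two parameters
chemdata <- data.frame(
  site     = c("S1", "S2", "S1", "S1", "S2", "S2"),
  sampdate = c("2011-05-01", "2011-05-01", "2011-06-01",
               "2011-05-01", "2011-05-01", "2011-06-01"),
  param    = c("TDS", "TDS", "TDS", "Cond", "Cond", "Cond"),
  quant    = c(250, 310, 265, 410, 520, 500),
  stringsAsFactors = FALSE
)

# Sum quant within each site x sampdate x param cell
tab <- with(subset(chemdata, param %in% c("TDS", "Cond")),
            xtabs(quant ~ site + sampdate + param))

dim(tab)          # one "slice" per param along the third dimension
tab[, , "TDS"]    # the site-by-date table for TDS alone
```

One thing to watch: xtabs() sums, so a site/date combination with no sample appears as 0 rather than NA, and replicate samples on the same site and date would be added together.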
This might be a good use for package reshape2 since it generally
returns a dataframe. The above operation would return an array with
3 dimensions. You might get immediate success with something like:
dcast(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
                                    'Cl', 'Na', 'Ca'), 1:4),
      site + sampdate ~ param)
# the omitted variable name should end up in the value columns
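A minimal sketch of how the wide dataframe could then feed lm(), again on made-up data. It assumes the reshape2 package is installed; the toy values and the names wide and fit are hypothetical. Here value.var is given explicitly rather than left for dcast() to guess, which is a choice made for this sketch and not part of the call above.

```r
library(reshape2)  # assumes reshape2 is installed

# Hypothetical toy data: every site/date has both TDS and Cond
chemdata <- data.frame(
  site     = rep(c("S1", "S2", "S1", "S2"), times = 2),
  sampdate = rep(c("2011-05-01", "2011-05-01",
                   "2011-06-01", "2011-06-01"), times = 2),
  param    = rep(c("TDS", "Cond"), each = 4),
  quant    = c(250, 310, 265, 300, 410, 520, 430, 500),
  stringsAsFactors = FALSE
)

# One row per site/date, one column per parameter
wide <- dcast(chemdata, site + sampdate ~ param, value.var = "quant")
print(wide)

# The parameters are now ordinary columns that a model formula can name
fit <- lm(TDS ~ Cond, data = wide)
coef(fit)
```

The same formula style should work for plotting, for example lattice::xyplot(TDS ~ Cond | site, data = wide), since each parameter is a column of its own.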
To do your testing it might be wise to apply more selective use of
subset. Perhaps only go for a few sites and dates.
OK. I need to read and increase my understanding of with() and learn
dcast(). May not get to all this over the weekend, but I'll be back with
results.
`with` is not what's doing the work. I just use `with` to simplify the
code. It is like a local version of `attach`. Within the
"parenthetical" enclosure of the `with` function you can refer to the
(unquoted) column names as objects. I could have referred to them with
chemdata[["param"]] instead of with(chemdata, ..... param ...).
These are all equivalent:
with(chemdata, table(site, param))
table(chemdata$site, chemdata$param)
table(chemdata[["site"]], chemdata[["param"]])
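A quick check of that equivalence on made-up data (the toy values and the names t1, t2 and t3 are assumptions for illustration):

```r
# Hypothetical toy data with just the two columns being tabulated
chemdata <- data.frame(
  site  = c("S1", "S1", "S2", "S2", "S2"),
  param = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  stringsAsFactors = FALSE
)

t1 <- with(chemdata, table(site, param))
t2 <- table(chemdata$site, chemdata$param)
t3 <- table(chemdata[["site"]], chemdata[["param"]])

# The counts agree cell for cell across all three forms
all(t1 == t2)
all(t1 == t3)

# The with() form also labels the two dimensions of the table
names(dimnames(t1))
```

The counts are the same in all three; the with() form additionally carries the column names along as dimension labels, which makes the printed table easier to read.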
--
David.
Thanks very much,
Rich
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT