On Oct 21, 2011, at 8:14 PM, Rich Shepard wrote:

On Fri, 21 Oct 2011, David Winsemius wrote:

The only variable in that dataframe with what appears to be a continuous
value (which is how I would expect "total dissolved solids" to be
measured) is "quant". Are you saying that the value of quant is measuring
something with different units depending on the value of 'param', and that
'site' and 'date' should be used to identify associated measurements? This
would appear to be the case based on what you are saying below.

David,

 'Quant' is the measured concentration of the different chemicals
identified in 'param'. I want to plot (and model) the quant values
associated with 'TDS' and other chemicals, preferably from samples at the same location and date. Units are mg/L except for pH (standard units) and
specific conductance (microSiemens/cm).

What I'm not understanding is how to specify the 'quant' values for the
params 'TDS' and 'Cond' (for example) for an xyplot() or lm().

If this is so, the problem is to break apart the dataframe by type of measurement ('param'). One way would be to split into separate dataframes, then merge back together by an appropriate linkage on site and date. I'm guessing that 'stream' and 'basin' are superfluous for the matching and can be later associated with 'site'?
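A minimal sketch of that split-then-merge idea in base R. The toy 'chemdata' below is hypothetical (invented values, only two params), and only the column names site, sampdate, param and quant are taken from this thread:

```r
# Hypothetical toy data in the long layout described in this thread;
# the values are invented purely for illustration.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "A", "A"),
  sampdate = as.Date(c("2011-01-05", "2011-01-05", "2011-01-05",
                       "2011-01-05", "2011-02-07", "2011-02-07")),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS", "Cond"),
  quant    = c(250, 410, 310, 520, 270, 440),
  stringsAsFactors = FALSE
)

# One data frame per parameter, keeping the linking columns and the value
tds  <- subset(chemdata, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(chemdata, param == "Cond", select = c(site, sampdate, quant))
names(tds)[3]  <- "TDS"
names(cond)[3] <- "Cond"

# Link measurements taken at the same site on the same date
# (merge() does an inner join by default, so unmatched rows are dropped)
paired <- merge(tds, cond, by = c("site", "sampdate"))

# With the two params side by side, a model formula is straightforward
fit <- lm(TDS ~ Cond, data = paired)
```

Once the values sit in separate columns, the same formula (TDS ~ Cond) can be handed to xyplot() from lattice as well.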

Yes, stream and basin are supersets of site. I used subset() to create separate dataframes from a set I called 'streamdata' (which aggregated the sites in an individual stream into one), but I'm not satisfied with how I did that and would rather learn to work with the overall 'chemdata' set.

The goal would be a dataframe with 7 renamed 'param' columns ('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', and 'Ca') and two identifier columns ('site' and 'sampdate'). For the moment I would think you would want all the data together and not make any decisions about excluding NA values until you get an overall picture of the situation.

 I agree that's what I want.

The first thing I would try would be

with(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', 'Ca'), 1:4),
  xtabs(quant ~ site + sampdate + param))

You would get 7 tables, one for each 'param', with up to 143 rows and as many columns as you have sampdates.
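To see the shape of that result, here is a small sketch on hypothetical toy data (invented values, two params instead of seven). One thing worth noticing is that xtabs() sums quant within each cell, so a site/date/param combination with no measurement shows up as 0 rather than NA:

```r
# Hypothetical toy data; one combination (site A, second date, Cond) is
# deliberately missing, and site B has no sample on the second date.
toy <- data.frame(
  site     = c("A", "A", "B", "B", "A"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-01-05",
               "2011-01-05", "2011-02-07"),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(250, 410, 310, 520, 270),
  stringsAsFactors = FALSE
)

# Three-way cross-tabulation: site x sampdate x param
tab <- with(toy, xtabs(quant ~ site + sampdate + param))

dim(tab)          # sites, dates, params
tab[, , "TDS"]    # the site-by-date table for a single param
tab["A", "2011-02-07", "Cond"]   # an unmeasured cell is filled with 0
```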

This might be a good use for package reshape2 since it generally returns a dataframe. The above operation would return an array with 3 dimensions. You might get immediate success with something like:

library(reshape2)
dcast(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', 'Ca'), 1:4),
  site + sampdate ~ param)
# the omitted variable name should end up in the values columns
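A hedged sketch of the dcast() step on hypothetical toy data (invented values). It assumes reshape2 is installed, and it names the value column explicitly through the value.var argument instead of relying on dcast() to guess it:

```r
# Hypothetical toy data; site A has no Cond measurement on the second date.
toy <- data.frame(
  site     = c("A", "A", "B", "B", "A"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-01-05",
               "2011-01-05", "2011-02-07"),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(250, 410, 310, 520, 270),
  stringsAsFactors = FALSE
)

# Guarded so the sketch does not fail where reshape2 is not installed
if (requireNamespace("reshape2", quietly = TRUE)) {
  # One row per site/sampdate, one column per param, filled with quant
  wide <- reshape2::dcast(toy, site + sampdate ~ param, value.var = "quant")
  print(wide)
}
```

Unlike the xtabs() route, a site/date with no measurement for a param comes back as NA here, which keeps the decision about excluding incomplete rows for later.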

To do your testing it might be wise to apply more selective use of subset. Perhaps only go for a few sites and dates.

 OK. I need to read and increase my understanding of with() and learn
dcast(). May not get to all this over the weekend, but I'll be back with
results.

`with` is not what's doing the work. I just use `with` to simplify the code. It is like a local version of `attach`. Within the parenthetical enclosure of the `with` function you can refer to the (unquoted) column names as objects. I could have referred to them with chemdata[["param"]] instead of with(chemdata, ..... param ...).

These are all equivalent:

with(chemdata, table(site, param))

table(chemdata$site, chemdata$param)

table(chemdata[["site"]], chemdata[["param"]])
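A quick way to convince yourself that the with(), $ and [[ ]] forms count the same thing, sketched on a hypothetical three-row data frame:

```r
# Hypothetical toy data, just enough rows to get a non-trivial table
toy <- data.frame(
  site  = c("A", "A", "B"),
  param = c("TDS", "Cond", "TDS"),
  stringsAsFactors = FALSE
)

t1 <- with(toy, table(site, param))          # column names used as objects
t2 <- table(toy$site, toy$param)             # explicit $ extraction
t3 <- table(toy[["site"]], toy[["param"]])   # explicit [[ ]] extraction

# The counts agree cell by cell; the with() version additionally labels
# its dimensions 'site' and 'param'
all(t1 == t2)
all(t2 == t3)
names(dimnames(t1))
```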

--
David.

Thanks very much,

Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
