[R] Setting up 3D tensor product interactions in mgcv

2013-08-23 Thread Mark Payne
Hi,

I am trying to fit a smoothing model where there are three dimensions
over which I can smooth (x,y,z). I expect interactions between some,
or all, of these terms, and so I have set up my model as

mdl <- gam(PA ~ s(x) + s(y) + s(z) + te(x,y) + te(x,z) + te(y,z) +
te(x,y,z),...)

I have recently read about the ti(), "tensor product interaction
smoother", which takes care of these interaction terms elegantly and
does the nesting properly. The help file says "This is much better
than attempting the same thing with ‘s’or ‘te’ terms representing the
interactions (although mgcv does not forbid it)." There is a 2D
example there also. But I don't understand how I should set this up
for my 3D example. Do I simply replace the te's above with ti? Or is
there more to it than that?

Does anyone have experience with this, and can explain how I should do
it properly?

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Seasonal smoothing of data with large gaps (mgcv)

2012-10-26 Thread Mark Payne
Hi,

I have a set of measurements that are made on a daily basis over many
years. I would like to produce a *non-parametric* smooth of these data to
estimate the seasonal cycle - to achieve this, I have been using the cyclic
cubic splines from the mgcv package. This works superbly in most
situations, but not all.

The problem is that for various practical reasons the data is not available
all year around at some sites - in some cases only in summer for example.
The code below (hopefully) should give you an indication of the problem.

#Setup
set.seed(42)
n <- 500
#Generate some randomly distributed observations during (southern) summer
t <- rnorm(n,mean=0,sd=25)%%365
logchl <- cos(t/365*2*pi)^2 +rnorm(n,sd=0.025)
#Now add a seasonal GAM
library(mgcv)
mdl <- gam(logchl ~ s(t,bs="cc"),knots=list(t=c(0,365)))
#Plot
plot(t,logchl,xlab="Day of year",ylab="Log Chl",xlim=c(0,365),xaxs="i")
t.grid <- data.frame(t=seq(0,365,len=500))
lines(t.grid$t,predict(mdl,newdata=t.grid),lwd=2,col="red",xpd=NA)


When you run it, you can see that the spline smoother is extrapolating well
outside the range of the data during the data-sparse middle of the year.
This is, after all, consistent with how a spline is defined. But in my case
it causes a lot of problems - in some cases the extrapolations can be
really extreme e.g. reaching +/-1e10. This is a pain.

I agree that trying to produce an estimate of the seasonal cycle in this
case is a bit silly - I'm doing it more for consistency with the rest of
the analysis. Ideally I would like to produce something that combines the
features  of a gam with linear interpolation e.g. still smoothes where
there is data, but in the absence of data (or low data density regions),
essentially does a linear interpolation between the high-density regions.
Is there anyway that this can be achieved, either in mgcv or other
packages?

An alternative approach might be to make the weight given to an observation
proportional to the density of data in that region - I haven't tried this
as yet.

I'd love to hear peoples opinions these two approaches matter, or
suggestions for other approaches.

Best wishes,

Mark

List of approaches tried thus far

Removing all spline knots in the data-poor regions
Adaptive smoother bases ie s(t,bs="ad")
Loess smoothers
Non-parametric kernel regressions (e.g. ksmooth())

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Named list of data.frames to data.frame with names

2012-10-30 Thread Mark Payne
Hi,

I very frequently end up in a situation where I have a named list of
data.frames that I wish to combine. e.g.

l <- list(A=data.frame(x=rnorm(5),
y=rnorm(5)),
  B=data.frame(x=rnorm(3),y=rnorm(3)),
  C=data.frame(x=rnorm(4),y=rnorm(4)),
  D=data.frame(x=rnorm(7),y=rnorm(7)))

I would like to combine these data.frames into a single data.frame,
with the column-names. This is easy with rbind and do.call

l2 <- do.call(rbind,l)

However, I would also like l2 to contain a column containing the names
in the original list? Is there a more elegant way to do this than:

l2 <- do.call(rbind,l)
l2$name <- rep(names(l),times=sapply(l,nrow))


Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MRF smoothers in MGCV - specifying neighbours

2014-05-07 Thread Mark Payne
Hi,

Does anyone have an example of a Markov Random Field smoother (MRF) in MGCV
where they have specified the neighbourhood directly, rather than supplying
polygons? Does anyone understand how the rules should be? Based on the
columb example, I have setup my data set and neighbourhood like so:

> head(nb.l)
$`10/10`
[1] 135 155 153

$`10/2`
[1] 27  8  6

$`10/3`
[1] 48  7 28 26

$`10/4`
[1] 69 27 49 47

$`10/5`
[1] 48 70 68

$`10/7`
[1] 115  95  93

> head(obs)
x  y  truth x.idx y.idx xy.idx
24  1.4835147  0.8026673  2.36052041310  13/10
26  1.0452111  0.4673685  1.831674111 8   11/8
43  2.1514977 -0.2640058 -2.881202617 5   17/5
46  2.8473951  0.5445714  3.634779920 9   20/9
53  1.7983253 -0.6905912 -2.547398415 3   15/3
86 -0.1839814 -0.7824026 -0.5776616 5 25/2
>

but get the following error:

> mdl <- gam(truth ~
s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")
Error in smooth.construct.mrf.smooth.spec(object, dk$data, dk$knots) :
  mismatch between nb/polys supplied area names and data area names

However, there is a perfect match between the nb list names and the data
area names:
> all(levels(obs$xy.idx) %in% names(nb.l))
[1] TRUE
>

Any suggestions where to start?

Mark

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MRF smoothers in MGCV - specifying neighbours

2014-05-08 Thread Mark Payne
Hi Roger and Simon,

Thanks for the replies. Simon's suggestion of an isolated or missing
neighbourhood doesn't hold either.

I've attached the code below - its my attempt to solve the FELSPLINE
sausage using mrf rather than a soap smoother. Its a bit convoluted, but
should run ok. I thought this would be a good starting example to get a
GMRF running, but then hit the problem mentioned. My attempts to track the
bug suggest that there is something wierd in the knots argument that is
being supplied to smooth.construct.mrf.smooth.spec() - but haven't come so
much further than that.

Code follows.

Mark

#GMRF Example
#Solves the classic FELSPINE problem using
#GMRF in mgcv. This example is a modified version of the
#example from smooth.construct.so.smooth.spec()
#in the mgcv package

rm(list=ls())
library(mgcv)

#Extract boundary
fsb <- fs.boundary()

#Create an underlying grid and evaluate the function on it
#Based on mgcv::fs.boundary() example
dx<-0.2;dy<-0.2#Grid steps
id.fmt <- "%i/%i"
x.vals <- seq(-1,4,by=dx)
y.vals<-seq(-1,1,by=dy)
grd <- expand.grid(x=x.vals,y=y.vals)
tru.mat <- matrix(fs.test(grd$x,grd$y),length(x.vals),length(y.vals))
grd$truth <- as.vector(tru.mat)
grd$x.idx <- as.numeric(factor(grd$x,x.vals))
grd$y.idx <- as.numeric(factor(grd$y,y.vals))
grd$xy.idx <- sprintf(id.fmt,grd$x.idx,grd$y.idx)
grd <- subset(grd,!is.na(truth))

## Simulate some fitting data, inside boundary...
n.samps<-250
x <- runif(n.samps)*5-1
y  <- runif(n.samps)*2-1
obs <- data.frame(x=x,y=y)
obs$truth <- fs.test(obs$x,obs$y,b=1)
obs$z <- obs$truth + rnorm(n.samps)*.3 ## add noise
pt.inside <- inSide(fsb,x=x,y=y) ## remove outsiders

## Associate observation with grid cell
obs$x.rnd <- round(obs$x/dx)*dx
obs$y.rnd <- round(obs$y/dy)*dy
obs$x.idx <- as.numeric(factor(obs$x.rnd,x.vals))
obs$y.idx <- as.numeric(factor(obs$y.rnd,y.vals))
obs$xy.idx <- sprintf(id.fmt,obs$x.idx,obs$y.idx)
obs$xy.idx <- factor(obs$xy.idx,levels=grd$xy.idx)

#Filter observations that are outside or don't have an associated grid cell
obs <- subset(obs,pt.inside & xy.idx %in% grd$xy.idx )

## plot boundary with truth and data locations
par(mfrow=c(1,2))
image(x.vals,y.vals,tru.mat,col=heat.colors(100),xlab="x",ylab="y")
contour(x.vals,y.vals,tru.mat,levels=seq(-5,5,by=.25),add=TRUE)
lines(fsb$x,fsb$y);
points(obs$x,obs$y,pch=3);

#Plot grid
plot(y~x,grd)
lines(fsb$x,fsb$y);

#Setup neighbourhood adjancey
nb <- grd[,c("x.idx","y.idx","xy.idx")]
nb$N <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx+1),levels=nb$xy.idx)
nb$S <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx-1),levels=nb$xy.idx)
nb$E <- factor(sprintf(id.fmt,nb$x.idx+1,nb$y.idx),levels=nb$xy.idx)
nb$W <- factor(sprintf(id.fmt,nb$x.idx-1,nb$y.idx),levels=nb$xy.idx)
nb.mat <- sapply(nb[,c("N","S","E","W")],as.numeric)
nb.l <- lapply(split(nb.mat,nb$xy.idx),function(x) x[!is.na(x)])

#Fit MRF gam
mdl <- gam(z ~ s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")










On 8 May 2014 15:15, Simon Wood  wrote:

> Hi Mark,
>
> I'm not sure what is happening here - there is no chance that nb.l
> contains a neighbourhood not in the levels of obs$xy.idx, I suppose? i.e. is
>
> all(names(nb.l)%in%levels(obs$xy.idx))
>
> also TRUE? Here is some code illustrating what nb should look like (and in
> response to Roger Bivand's suggestion I also tried this replacing all the
> labels with things like "x/y", but it still works).
>
>
> ## example mrf fit using polygons
> library(mgcv)
> ## Load Columbus Ohio crime data (see ?columbus for details and credits)
> data(columb)   ## data frame
> data(columb.polys) ## district shapes list
> xt <- list(polys=columb.polys) ## neighbourhood structure info for MRF
> par(mfrow=c(2,2))
> ## First a full rank MRF...
> b0 <- gam(crime ~ s(district,bs="mrf",xt=xt),data=columb,method="REML")
>
> ## same fit based on direct neighbour spec...
> nb <- mgcv:::pol2nb(columb.polys)$nb
> xt <- list(nb=nb)
> b <- gam(crime ~ s(district,bs="mrf",xt=xt),data=columb,method="REML")
>
> best,
> Simon
>
>
>
>
> On 08/05/14 01:58, Mark Payne wrote:
>
>> Hi,
>>
>> Does anyone have an example of a Markov Random Field smoother (MRF) in
>> MGCV
>> where they have specified the neighbourhood directly, rather than
>> supplying
>> polygons? Does anyone understand how the rules should be? Based on the
>> columb example, I have setup my data set and neighbourhood like so:
>>
>>  head(nb.l)
>>>
>> $`10/10`
>> [1] 135 155 153
>>
>> $`10/2`
>> [1] 27  8  6

[R] Negative binomial parameterisation in mgcv

2013-11-13 Thread Mark Payne
Dear R-help,

The negative binomial distribution has several different
parameterisations, but I can't seem to figure out what the exact one
used in mgcv's negbin family is? negbin() requires a theta argument,
but its not clear anywhere in the documentation (that I can find), how
this parameter should be interpreted - for example, can it be less
than 1?

Best wishes,

MArk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatter.smooth() line colour

2012-06-22 Thread Mark Payne
Hi David,

Thanks for the reply. I agree - it just seemed that it was something
fairly obvious that was missing from an otherwise handy little
function Would it be appropriate to file it as a "request for
improvement" in the bug tracking system?

Mark

On 22 June 2012 16:30, David L Carlson - dcarl...@tamu.edu
<+r-help+trevva+3b274928d0.dcarlson#tamu@spamgourmet.com> wrote:
> You are correct about scatter.smooth, but loess.smooth
> (listed along with scatter.smooth on the same
> help page) gives you the way to get what you want:
>
> x <- rnorm(25)
> y <- rnorm(25)
> plot(x, y)
> lines(loess.smooth(x,y), col="red", lty=2, lwd=2)
>
> --
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
>> project.org] On Behalf Of r-help.20.tre...@spamgourmet.com
>> Sent: Friday, June 22, 2012 6:04 AM
>> To: r-help@r-project.org
>> Subject: [R] scatter.smooth() line colour
>>
>> Hi,
>>
>> I really like the scatter.smooth() function - its extremeley useful.
>> However, as far as I understand it, there is no way to change the
>> properties of the smoothing line e.g. col, lty, lwd. The
>> scatter.smooth function accepts a ... argument, but these are not
>> passed to the plotting function that actually plots the smoother -
>> only to the function that plots the points. Could I please therefore
>> request that an argument be added to this function to give easier
>> control over line properties? e.g. line.par=list()
>>
>> Best wishes,
>>
>> Mark Payne
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Manually modifying an hclust dendrogram to remove singletons

2012-05-25 Thread Mark Payne
Hi,

Thanks for the replies - they have helped shaped my thinking and are
starting to push me in a better direction. Maybe I should explain a
little more about what I'm trying to achieve.

I am analysing satellite data across the global ocean, and am
interested in trying to classify areas of the ocean according to the
similarity between the pixels. Singletons in this case therefore
represent individual pixels that are different to the rest in terms of
the similarity metric, but aren't really all that interesting in terms
of the broad picture - I consider them "outliers" or "noise". However,
they are annoying when it comes to splitting up the dendrogram,
because I'm mainly interested in the reclassification of large areas
of ocean at each step, rather than changes in the similarity.

The dynamic tree-cut approach looks like a promising and sensible
solution to the problem - I'll see if I can get something out of it.

However,  this discussion has started me wondering how I can use the
spatial proximity of the pixels in the analysis - does anyone have any
insights? Can the WGCNA approach be used in such a context?

Best wishes,

Mark Payne

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.