Re: [R] Using PCA to filter a series

David L Carlson Fri, 03 Oct 2014 08:35:40 -0700

You can reconstruct an approximation of the data from the first component. Here's an example using 
singular value decomposition on the original data matrix:


> d <- cbind(d1, d2, d3, d4)
> d.svd <- svd(d)
> new <- d.svd$u[,1] * d.svd$d[1]

new plays the role of your cp1: it is the data projected onto the first component, 
d %*% d.svd$v[,1]. If we multiply it by each of the loadings, we can 
create reconstructed values based on the first component:

> dnew <- sapply(d.svd$v[,1], function(x) new * x)
> round(head(dnew), 1)
      [,1]  [,2]  [,3]  [,4]
[1,] 119.3 134.1 135.7 134.6
[2,] 104.2 117.2 118.6 117.6
[3,] 109.7 123.3 124.8 123.8
[4,] 109.3 122.9 124.3 123.3
[5,] 105.8 119.0 120.4 119.4
[6,] 111.5 125.4 126.9 125.8
> head(d)
      d1  d2  d3  d4
[1,] 113 138 138 134
[2,] 108 115 120 115
[3,] 105 127 129 120
[4,] 103 127 129 120
[5,] 109 119 120 117
[6,] 115 126 126 123

> diag(cor(d, dnew))
[1] 0.9233742 0.9921703 0.9890085 0.9910287

Since you want a single variable to stand for all four, you could scale new by 
the mean of the loadings, which gives the row means of dnew:

> newd <- new*mean(d.svd$v[,1])
> head(newd)
[1] 130.9300 114.3972 120.3884 119.9340 116.1588 122.3983
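For reference, the reconstruction above is the rank-1 SVD approximation of d (svd() factors d as u %*% diag(d) %*% t(v)), so the sapply() call can also be written with outer(). This is a minimal sketch that reuses the data vectors from the original post (quoted further down) and the same object names. The two all.equal() checks only confirm the algebra: new equals d projected onto the first right singular vector, and newd is simply the row mean of dnew.

```r
# Data from the original post: four series of 52 observations each
d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110,
        110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165,
        173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101,
        114, 114, 107, 108, 108, 131, 131, 117, 113)
d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120,
        115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181,
        181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145,
        147, 147, 124, 124, 120, 128, 141, 141, 138)
d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122,
        115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181,
        181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149,
        156, 156, 125, 125, 115, 139, 140, 140, 138)
d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121,
        114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182,
        182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149,
        149, 149, 124, 124, 119, 131, 135, 135, 134)

d <- cbind(d1, d2, d3, d4)
d.svd <- svd(d)

# First-component scores: u1 * sigma1, which is the same as d %*% v1
new <- d.svd$u[, 1] * d.svd$d[1]
all.equal(new, drop(d %*% d.svd$v[, 1]))   # TRUE

# Rank-1 reconstruction; equivalent to sapply(d.svd$v[,1], function(x) new * x)
dnew <- outer(new, d.svd$v[, 1])

# Scaling new by the mean loading is the same as averaging the reconstructed columns
newd <- new * mean(d.svd$v[, 1])
all.equal(newd, rowMeans(dnew))            # TRUE
```

Note that svd() is applied here to the uncentered data, so the first component also carries the overall level of the series, which is why the reconstructed values are already on the original scale.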

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-----Original Message-----
From: Jonathan Thayn [mailto:jth...@ilstu.edu] 
Sent: Thursday, October 2, 2014 11:11 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Using PCA to filter a series

I suppose I could calculate the eigenvectors directly and not worry about 
centering the time-series, since they have essentially the same range to begin with:

vec <- eigen(cor(cbind(d1, d2, d3, d4)))$vectors
cp <- cbind(d1,d2,d3,d4)%*%vec
cp1 <- cp[,1]

I guess there is no way to reconstruct the original input data using just the 
first component, though, is there? Not the original data in its entirety, just 
one time-series that is representative of the general pattern. Possibly 
something like the following, but with just the first component:

o <- cp%*%solve(vec)
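A sketch of the first-component-only version of that back-rotation, assuming the d1 to d4 vectors from the original post. Because vec comes from eigen() on a symmetric matrix, its columns are orthonormal, so solve(vec) equals t(vec); keeping only the first column of cp and of vec gives a rank-1 approximation. One caveat: vec is computed from cor(), i.e. from centered and standardized data, while cp is computed from the raw data, so o1 is a valid rank-1 approximation but not necessarily the least-squares best one for the raw matrix (the SVD in the reply above gives that one). The names o1 and o1.mean are hypothetical.

```r
# Data from the original post: four series of 52 observations each
d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110,
        110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165,
        173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101,
        114, 114, 107, 108, 108, 131, 131, 117, 113)
d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120,
        115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181,
        181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145,
        147, 147, 124, 124, 120, 128, 141, 141, 138)
d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122,
        115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181,
        181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149,
        156, 156, 125, 125, 115, 139, 140, 140, 138)
d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121,
        114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182,
        182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149,
        149, 149, 124, 124, 119, 131, 135, 135, 134)

d <- cbind(d1, d2, d3, d4)

# Eigenvectors of the correlation matrix, applied to the raw (uncentered) data
vec <- eigen(cor(d))$vectors
cp  <- d %*% vec

# vec is orthonormal, so t(vec) can replace solve(vec); all components recover d
o <- cp %*% t(vec)
all.equal(o, d, check.attributes = FALSE)   # TRUE

# Hypothetical names: keep only the first component to get a rank-1 approximation
o1 <- outer(cp[, 1], vec[, 1])
colnames(o1) <- colnames(d)

# One representative series: the row means of the rank-1 approximation
o1.mean <- rowMeans(o1)
```

The sign of vec[, 1] is arbitrary, but it cancels here because it appears in both cp[, 1] and vec[, 1].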

Thanks for your help. It's been a long time since I've played with PCA.

Jonathan Thayn




On Oct 2, 2014, at 4:59 PM, David L Carlson wrote:

> I think you want to convert your principal component to the same scale as d1, 
> d2, d3, and d4. But the "original space" is a 4-dimensional space in which 
> d1, d2, d3, and d4 are the axes, each with its own mean and standard 
> deviation. Here are a couple of possibilities:
> 
> # plot original values for comparison
>> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
> # standardize the pc scores to the grand mean and sd
>> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, d4))
>> lines(new1)
> # Use least squares regression to predict the row means of the original four variables
>> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
>> lines(new2, col="red")
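The two options above can be checked numerically. This is a minimal sketch that assumes the d1 to d4 vectors from the original post and omits the plotting; all.vals, row.means, and pc1 are hypothetical helper names. One added step that is not in the code above: the sign of a principal component is arbitrary, so the sketch flips PC1 if it is negatively correlated with the row means, since otherwise new1 could come out upside down. The regression version does not need this, because the fitted slope absorbs the sign.

```r
# Data from the original post: four series of 52 observations each
d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110,
        110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165,
        173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101,
        114, 114, 107, 108, 108, 131, 131, 117, 113)
d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120,
        115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181,
        181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145,
        147, 147, 124, 124, 120, 128, 141, 141, 138)
d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122,
        115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181,
        181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149,
        156, 156, 125, 125, 115, 139, 140, 140, 138)
d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121,
        114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182,
        182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149,
        149, 149, 124, 124, 119, 131, 135, 135, 134)

d <- cbind(d1, d2, d3, d4)
pca <- princomp(d)

# Hypothetical helper names: all.vals, row.means, pc1
all.vals  <- c(d1, d2, d3, d4)
row.means <- rowMeans(d)
pc1 <- pca$scores[, 1]
# PC signs are arbitrary; orient PC1 so it rises with the row means
if (cor(pc1, row.means) < 0) pc1 <- -pc1

# Option 1: standardize PC1, then give it the grand mean and sd of all four series
new1 <- as.vector(scale(pc1)) * sd(all.vals) + mean(all.vals)

# Option 2: regress the row means on PC1 and keep the fitted values
new2 <- unname(predict(lm(row.means ~ pc1)))

c(mean(new1), mean(all.vals))   # equal by construction
c(mean(new2), mean(row.means))  # equal: least-squares fitted values keep the response mean
```

new1 takes on the spread of all 208 pooled values, while the fitted values in new2 can be no more variable than the row means themselves, so new1 will usually swing somewhat more widely than new2.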
> 
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> 
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Don McKenzie
> Sent: Thursday, October 2, 2014 4:39 PM
> To: Jonathan Thayn
> Cc: r-help@r-project.org
> Subject: Re: [R] Using PCA to filter a series
> 
> 
> On Oct 2, 2014, at 2:29 PM, Jonathan Thayn <jth...@ilstu.edu> wrote:
> 
>> Hi Don. I would like to "de-rotate" the first component back to its original 
>> state so that it aligns with the original time-series. My goal is to create 
>> a "cleaned", or a "model", time-series from which noise has been removed. 
> 
> Please cc the list with replies. It's considered courtesy plus you'll get 
> more help that way than just from me.
> 
> Your goal sounds almost metaphorical, at least to me. Your first axis 
> "aligns" with the original time series already, in that it captures the 
> dominant variation across all four. Beyond that, there are many approaches 
> to signal/noise relations within time-series analysis. I am not a good 
> source of help on these, and you probably need a statistical consult 
> (locally?), which is not the function of this list.
> 
>> 
>> 
>> Jonathan Thayn
>> 
>> 
>> 
>> On Oct 2, 2014, at 2:33 PM, Don McKenzie <d...@u.washington.edu> wrote:
>> 
>>> 
>>> On Oct 2, 2014, at 12:18 PM, Jonathan Thayn <jth...@ilstu.edu> wrote:
>>> 
>>>> I have four time-series of similar data. I would like to combine these 
>>>> into a single, clean time-series. I could simply find the mean of each 
>>>> time period, but I think that using principal components analysis should 
>>>> extract the most salient pattern and ignore some of the noise. I can 
>>>> compute components using princomp:
>>>> 
>>>> 
>>>> d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110, 
>>>> 110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165, 
>>>> 173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101, 
>>>> 114, 114, 107, 108, 108, 131, 131, 117, 113)
>>>> d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120, 
>>>> 115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181, 
>>>> 181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145, 
>>>> 147, 147, 124, 124, 120, 128, 141, 141, 138)
>>>> d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122, 
>>>> 115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181, 
>>>> 181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149, 
>>>> 156, 156, 125, 125, 115, 139, 140, 140, 138)
>>>> d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121, 
>>>> 114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182, 
>>>> 182, 182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149, 
>>>> 149, 149, 124, 124, 119, 131, 135, 135, 134)
>>>> 
>>>> 
>>>> pca <- princomp(cbind(d1,d2,d3,d4))
>>>> plot(pca$scores[,1])
>>>> 
>>>> This seems to have created the clean pattern I want, but I would like to 
>>>> project the first component back onto the original axes. Is there a simple 
>>>> way to do that?
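A minimal sketch of one standard approach, the rank-1 PCA reconstruction, assuming the d1 to d4 vectors above. princomp() centers each column and then rotates it, so its scores equal the centered data times the loadings; multiplying the PC1 scores by the PC1 loadings and adding the column means back gives an approximation of the data in the original units. L, centred, d.hat, and rep.series are hypothetical names chosen for illustration.

```r
# Data from the original post: four series of 52 observations each
d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110,
        110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165,
        173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101,
        114, 114, 107, 108, 108, 131, 131, 117, 113)
d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120,
        115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181,
        181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145,
        147, 147, 124, 124, 120, 128, 141, 141, 138)
d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122,
        115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181,
        181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149,
        156, 156, 125, 125, 115, 139, 140, 140, 138)
d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121,
        114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182,
        182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149,
        149, 149, 124, 124, 119, 131, 135, 135, 134)

d <- cbind(d1, d2, d3, d4)
pca <- princomp(d)

# Hypothetical names: L (plain loadings matrix) and centred (mean-removed data)
L <- unclass(pca$loadings)
centred <- sweep(d, 2, pca$center)   # subtract the column means princomp() used
all.equal(pca$scores, centred %*% L, check.attributes = FALSE)   # TRUE

# Back-rotate using PC1 only, then add the column means back (hypothetical d.hat)
d.hat <- sweep(outer(pca$scores[, 1], L[, 1]), 2, pca$center, "+")

# One representative series: the row means of the rank-1 reconstruction
rep.series <- rowMeans(d.hat)
```

Using all four components instead of only the first (pca$scores %*% t(L), plus the column means) returns the original data exactly, which is a handy sanity check on the back-rotation.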
>>> 
>>> Do you mean that you want to scale the scores on Axis 1 to the mean and 
>>> range of your raw data?  Or their mean and variance?
>>> 
>>> See
>>> 
>>> ?scale
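Matching the range can be done with a linear min-max rescaling, since scale() itself only centers and divides. A sketch, assuming the d1 to d4 vectors above; raw, pc1, and rescaled are hypothetical names. A linear map a*x + b has only two parameters, so it can match either the minimum and maximum or the mean and standard deviation of the raw data, but generally not both at once. The sign of PC1 is arbitrary, so the sketch first orients it to correlate positively with the row means.

```r
# Data from the original post: four series of 52 observations each
d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110,
        110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165,
        173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101,
        114, 114, 107, 108, 108, 131, 131, 117, 113)
d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120,
        115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181,
        181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145,
        147, 147, 124, 124, 120, 128, 141, 141, 138)
d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122,
        115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181,
        181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149,
        156, 156, 125, 125, 115, 139, 140, 140, 138)
d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121,
        114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182,
        182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149,
        149, 149, 124, 124, 119, 131, 135, 135, 134)

d <- cbind(d1, d2, d3, d4)
pca <- princomp(d)

# Hypothetical names: raw (pooled values), pc1, rescaled
raw <- c(d1, d2, d3, d4)
pc1 <- pca$scores[, 1]
# PC signs are arbitrary; orient PC1 so it rises with the row means
if (cor(pc1, rowMeans(d)) < 0) pc1 <- -pc1

# Linear min-max rescaling of PC1 onto the range of the raw data
rescaled <- (pc1 - min(pc1)) / (max(pc1) - min(pc1)) * diff(range(raw)) + min(raw)
range(rescaled)   # 101 185, the same as range(raw)
```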
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Jonathan B. Thayn
>>>>       
>>>> 
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> Don McKenzie
>>> Research Ecologist
>>> Pacific Wildland Fire Sciences Lab
>>> US Forest Service
>>> 
>>> Affiliate Professor
>>> School of Environmental and Forest Sciences 
>>> College of the Environment
>>> University of Washington
>>> d...@uw.edu
>> 
> 
> Don McKenzie
