Re: [R] LM with summation function

Peter Ehlers Tue, 22 May 2012 18:38:47 -0700

Robbie,

Here's what I *think* you are trying to do:


1. y is a cubic function of x with no intercept term:

  y = b1*x + b2*x^2 + b3*x^3

2. s is the cumsum of y:

  s_i = y_1 + ... + y_i

3. Given a subset of x = 1:n and the corresponding
   values of s, estimate the coefficients of the cubic.
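The three steps can be simulated directly in R. This is a minimal sketch that assumes the coefficients b1 = 100, b2 = 10, b3 = -1 from Robbie's df example and the observed subset x = 1, 4, 9, 12; the names b and keep are illustrative only.

```r
# Assumed coefficients (from Robbie's df example): b1 = 100, b2 = 10, b3 = -1
b <- c(100, 10, -1)
x <- 1:12                                  # full grid x = 1:n with n = 12

# Step 1: cubic in x with no intercept
y <- b[1] * x + b[2] * x^2 + b[3] * x^3

# Step 2: s is the running total of y
s <- cumsum(y)

# Step 3: only a subset of the (x, s) pairs is observed
keep <- c(1, 4, 9, 12)                     # hypothetical choice matching the sample data
d <- data.frame(x = x[keep], s = s[keep])
print(d)
```

The resulting d is the sample data frame used throughout the thread.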

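If that is the correct understanding, then you should
be able to estimate the coefficients as follows.

Since s_i = b1 * (sum of x_k for k = 1, ..., i)
          + b2 * (sum of (x_k)^2 for k = 1, ..., i)
          + b3 * (sum of (x_k)^3 for k = 1, ..., i),

s is linear in the cumsums of x, x^2 and x^3. Because x runs over
1:n, those cumsums can be computed at every x, including the ones
where s was not observed, so we can regress s on them (no intercept).

Using your sample data: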
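  d <- data.frame(x = c(1, 4, 9, 12),
                  s = c(109, 1200, 5325, 8216))

  # full grid x = 1:12; s is NA where it was not observed
  e <- data.frame(x = 1:12)
  e <- merge(e, d, all.x = TRUE)

  # running sums of x, x^2 and x^3 over the full grid
  e <- within(e,
             {z3 <- cumsum(x^3)
              z2 <- cumsum(x^2)
              z1 <- cumsum(x)})

  # no intercept; by default lm() drops the rows where s is NA
  coef(lm(s ~ 0 + z1 + z2 + z3, data = e))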

#  z1  z2  z3
# 100  10  -1
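Because x runs over the integers 1:n, the three cumsums also have closed forms: sum of k = i(i+1)/2, sum of k^2 = i(i+1)(2i+1)/6 and sum of k^3 = (i(i+1)/2)^2. The sketch below uses them to build the design matrix only at the observed x values, skipping the merge, and solves the least-squares problem with qr.solve(). It assumes x is exactly the index on the grid 1:n, as in the sample data; the names i, X and b_hat are illustrative.

```r
d <- data.frame(x = c(1, 4, 9, 12),
                s = c(109, 1200, 5325, 8216))

i <- d$x   # assumes x is the index on the grid 1:n
# Closed forms of cumsum(x), cumsum(x^2) and cumsum(x^3), evaluated at i
X <- cbind(z1 = i * (i + 1) / 2,
           z2 = i * (i + 1) * (2 * i + 1) / 6,
           z3 = (i * (i + 1) / 2)^2)

# Least-squares solution of X %*% b = s with no intercept
# (4 equations, 3 unknowns; the sample data fit exactly)
b_hat <- qr.solve(X, d$s)
b_hat
```

At x = 4, for example, the row of X is (10, 30, 100), matching z1, z2 and z3 from the merge-based version.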


Peter Ehlers

On 2012-05-22 09:43, Robbie Edwards wrote:

I don't think I can.

For the sample data

  d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

When x = 4, s = 1200. However, s4 is the sum y1 + y2 + y3 + y4.
Wouldn't I have to know the y values for x = 2 and x = 3 to get the value
of y for x = 4?

In the previous message, I created two sample data frames. d is what I'm
trying to use to create df. I only know what's in d; df is just there to
illustrate what I'm trying to get from d.

robbie
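Robbie's objection can be checked numerically. A minimal sketch, assuming the cubic from his df example (b1 = 100, b2 = 10, b3 = -1): differencing the observed s values gives sums of several y values, not individual y values. The names obs, gaps and blocks are illustrative.

```r
# Assumed cubic from Robbie's df example: y = 100*x + 10*x^2 - x^3
x <- 1:12
y <- 100 * x + 10 * x^2 - x^3
s <- cumsum(y)

obs  <- c(1, 4, 9, 12)        # the x values where s is observed
gaps <- diff(s[obs])          # 1091 4125 2891

# Each gap is the sum of several y values, e.g. 1091 = y[2] + y[3] + y[4]
blocks <- c(sum(y[2:4]), sum(y[5:9]), sum(y[10:12]))
all.equal(gaps, blocks)       # TRUE
```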





On Tue, May 22, 2012 at 12:30 PM, R. Michael Weylandt
<michael.weyla...@gmail.com> wrote:

But if I understand your problem correctly, you can get the y values
from the s values. I'm relying on your statement that "s is sum of the
current y and all previous y (s3 = y1 + y2 + y3)." E.g.,

y <- c(1, 4, 6, 9, 3, 7)

s1 = 1
s2 = 4 + s1 = 5
s3 = 6 + s2 = 11

More generally,

s <- cumsum(y)

Then if we only see s, we can get back the y vector by doing

c(s[1], diff(s))

which is identical to y.

So for your data, the underlying y must have been c(109, 1091, 4125, 2891),
right?

Or have I completely misunderstood your problem?

Michael
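Michael's round trip, assembled into one runnable snippet. It relies on s being observed at every position; the name y_back is illustrative.

```r
y <- c(1, 4, 6, 9, 3, 7)
s <- cumsum(y)                 # 1 5 11 20 23 30

# First differences undo cumsum(), provided every s value is observed
y_back <- c(s[1], diff(s))
identical(y_back, y)           # TRUE
```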

On Tue, May 22, 2012 at 12:25 PM, Robbie Edwards
<robbie.edwa...@gmail.com>  wrote:

Actually, I can't.  I don't know the y values.  Only the s and only for a
subset of the data.

Like this.

d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))



On Tue, May 22, 2012 at 11:57 AM, R. Michael Weylandt
<michael.weyla...@gmail.com>  wrote:


You can reconstruct the y values by taking first differences of the s
vector, no? Then it sounds like you're good to go.

Best, Michael

On Tue, May 22, 2012 at 11:40 AM, Robbie Edwards
<robbie.edwa...@gmail.com>  wrote:

Hi all,

Thanks for the replies, but I realize I've done a bad job explaining my
problem. To help, I've created some sample data to explain the problem.


df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 y = c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912),
                 s = c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216))

In this data frame, y results from y = x * b1 + x^2 * b2 + x^3 * b3, and s
is the sum of the current y and all previous y (s3 = y1 + y2 + y3).

I know I can find b1, b2 and b3 using:
lm(y ~ 0 + x + I(x^2) + I(x^3), data=df)

yielding...
Coefficients:
     x  I(x^2)  I(x^3)
   100      10      -1

However, I need to find b1, b2 and b3 using the s column, because I don't
know the values of y in the actual data set, and I only have a few of the
s values. Imagine this data is being used as a reward schedule for something
like a loyalty points program: y is the number of points needed for each
level, while s is the total number of points needed to reach that level. In
the real problem, my data looks more like this:

d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

I need to use these few sample points to estimate the parameters of the curve.

Thanks again, and hopefully this makes the problem a bit clearer.

robbie
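When every row of df is available, the same coefficients can be recovered from s alone by regressing it on the running sums of x, x^2 and x^3, computed inside the formula. A minimal sketch reusing Robbie's df; the name fit is illustrative.

```r
df <- data.frame(x = 1:12,
                 y = c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912),
                 s = c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216))

# s_i is linear in the running sums of x, x^2 and x^3 (no intercept);
# inside cumsum(), ^ is ordinary arithmetic, not a formula operator
fit <- lm(s ~ 0 + cumsum(x) + cumsum(x^2) + cumsum(x^3), data = df)
round(coef(fit), 6)
```

The coefficients match the fit of y on x, x^2 and x^3 (100, 10, -1).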



On Fri, May 18, 2012 at 7:40 PM, David Winsemius
<dwinsem...@comcast.net> wrote:


On May 18, 2012, at 1:44 PM, Robbie Edwards wrote:

  Hi all,


I'm trying to model some data where the y is defined by

y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3

Hopefully that reads clearly for email.

cumsum( rowSums( cbind(B1 * x,  B2 * x^2, B3 * x^3)))
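David's expression reads the summation as a running total of the cubic. A quick check, assuming the coefficients B1 = 100, B2 = 10, B3 = -1 that appear later in the thread and x = 1:12: the result reproduces the s values of the sample data.

```r
# Assumed coefficients (from Robbie's later df example) and grid x = 1:12
B1 <- 100; B2 <- 10; B3 <- -1
x <- 1:12

# cbind() puts the three terms side by side, rowSums() adds them to give y,
# and cumsum() accumulates y into the running total s
s <- cumsum(rowSums(cbind(B1 * x, B2 * x^2, B3 * x^3)))
s[c(1, 4, 9, 12)]              # 109 1200 5325 8216
```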



Anyway, if it weren't for the summation, I know I would do it like this:

lm(y ~ x + x2 + x3)

Where x2 and x3 are x^2 and x^3.

However, since each value of x is related to the previous values of x, I
don't know how to do this. Any help is greatly appreciated.
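On the lm(y ~ x + x2 + x3) form: in an R formula ^ has a special meaning, so the powers can be written inline with I(), or generated by poly(x, 3, raw = TRUE), with no need for separate x2 and x3 columns. A sketch on simulated data, assuming the thread's cubic; both fits keep the intercept, as in the call above, and the names fit1 and fit2 are illustrative.

```r
# Simulated data, assuming the thread's cubic y = 100*x + 10*x^2 - x^3
x <- 1:12
y <- 100 * x + 10 * x^2 - x^3

# I() makes ^ arithmetic rather than a formula operator
fit1 <- lm(y ~ x + I(x^2) + I(x^3))

# poly(..., raw = TRUE) generates the same raw power columns in one term
fit2 <- lm(y ~ poly(x, 3, raw = TRUE))

all.equal(unname(coef(fit1)), unname(coef(fit2)))   # TRUE
```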


David Winsemius, MD
West Hartford, CT



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

