> On Jul 13, 2016, at 6:48 AM, stn021 <[email protected]> wrote:
>
> Hello,
>
> so here is a numerical example in R code. The code is appended below.
>
> The output should be
> 1) the numerical values of the abilities of the persons
> 2) the multiplier
>
>
> Please note that
>
> 1) I have used non-linear optimization to solve this problem and got
> the expected result, though not with R but other software.
>
> 2) I have applied lm() to this problem, even before I posted the
> question. I am well aware of the syntax of formulas. In my last posting
> I wrote the formula "freehand", so I made the previously mentioned
> errors. Sorry about that.
>
>
>
> Unfortunately, neither the formulas with I() nor multiplying variables
> before running R works here. I() does not apply to factors (R
> tells me), and multiplying in advance works only for continuous
> variables, not for factors, because there is no known numerical value
> to multiply.
>
> The latter is actually what my question is about, along with the
> question on how to get R to treat two columns as two instances of the
> same factor.
>
>
> Just to be sure I used R to check if the data really counts as a
> factor according to R-terminology. It really is a factor, see code
> below.
>
>
>
> This is the code for generating the example-data:
>
> # --------------------------------------------------------------- #
> pnames = c( "alice" , "bob" , "charlie" , "don" , "eve" , "freddy"
> , "grace" , "henry" )
> pcount = length( pnames )
>
> # abilities = runif( pcount )
> abilities = (1:pcount) / 10
>
> # stringsAsFactors = TRUE keeps name a factor in R >= 4.0 as well
> persons = data.frame( name = pnames , ability = abilities , stringsAsFactors = TRUE )
> persons
>
> # random subset of possible combinations and extra df
> combinations = combn( nrow( persons ) , 2 ) ;
> combinations = cbind( combinations,combinations,combinations,combinations )
> combinations = combinations[ , runif(ncol(combinations))<0.5 ]
> ccount = ncol( combinations )
>
> observed_data = data.frame(
> idx1 = combinations[1,]
> , idx2 = combinations[2,]
> , p1 = ( persons$name[ combinations[1,] ] )
> , p2 = ( persons$name[ combinations[2,] ] )
> )
>
> abilities_data = data.frame(
> a1 = persons$ability[ combinations[1,] ]
> , a2 = persons$ability[ combinations[2,] ]
> )
>
> # y = result of cooperation of each pair
> multiplyer = runif(1) + 1
> offset = 1
> cat( "multiplyer = " , multiplyer , "\n" )
> cat( "offset = " , offset , "\n" )
>
> y0 = multiplyer * ( offset - ( abilities_data$a1 - abilities_data$a2 ) ^ 2 )
> noise = .05 * rnorm( ccount )
>
> # check variables are really factors :
> str( observed_data$p1 )
> dput( observed_data$p1 )
>
> observed_data = data.frame( y = round( y0+noise,3 ) , observed_data )
> observed_data
>
> # --------------------------------------------------------------- #
Is this what is intended?
> observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) ]
> observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) ]
> head(observed_data)
      y idx1 idx2    p1      p2 p1ab p2ab
1 1.149    1    6 alice  freddy  0.1  0.6
2 1.006    1    7 alice   grace  0.1  0.7
3 1.529    2    3   bob charlie  0.2  0.3
4 1.404    2    5   bob     eve  0.2  0.5
5 1.205    2    6   bob  freddy  0.2  0.6
6 1.187    2    7   bob   grace  0.2  0.7
> lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data)

Call:
lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data)

Coefficients:
       (Intercept)  I((p1ab - p2ab)^2)
             1.506              -1.435

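The I() wrapper matters in the call above. Inside a model formula, `-` and `^` act as formula operators (term removal and crossing) rather than arithmetic, so the unwrapped expression describes a different model. A small sketch, using only the variable names from the call above, that inspects the term labels; no data are needed for this, and the object names `plain` and `wrapped` are just illustrative:

```r
# Without I(): '-' removes p2ab and '^2' crosses p1ab with itself,
# so the formula should reduce to the single term p1ab.
plain <- attr(terms(y ~ (p1ab - p2ab)^2), "term.labels")

# With I(): the whole expression is kept as one arithmetic term.
wrapped <- attr(terms(y ~ I((p1ab - p2ab)^2)), "term.labels")

print(plain)
print(wrapped)
```

This is also why I() cannot help when p1 and p2 are factors: the arithmetic inside I() needs numeric values, and those are exactly what is unknown.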
> separate_term <- lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data)
> summary(separate_term)

Call:
lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.116249 -0.030996  0.002633  0.032765  0.136282

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.50589    0.01067  141.08   <2e-16 ***
I((p1ab - p2ab)^2) -1.43527    0.05863  -24.48   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05304 on 44 degrees of freedom
Multiple R-squared:  0.9316,  Adjusted R-squared:   0.93
F-statistic: 599.2 on 1 and 44 DF,  p-value: < 2.2e-16

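To connect these coefficients to the model y = f * ( o - ( p1 - p2 )^2 ): expanding gives y = f*o - f*(p1 - p2)^2, so the intercept estimates f*o and the slope estimates -f. A short sketch of the back-calculation, with the two estimates typed in from the summary above (the names `f_hat` and `o_hat` are only illustrative):

```r
# Estimates copied from the summary output above
intercept <- 1.50589    # estimates f * o
slope     <- -1.43527   # estimates -f

f_hat <- -slope             # multiplier
o_hat <- intercept / f_hat  # offset

print(round(c(multiplier = f_hat, offset = o_hat), 3))
# multiplier     offset
#      1.435      1.049
```

With the fitted object at hand, coef(separate_term) returns the same two numbers without retyping them.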
You could also have compared two models differing only with respect to the
inclusion of an interaction term that was the squared difference in abilities:
> full <- lm( y ~ p1ab + p2ab + I( (p1ab -p2ab)^2 ), data=observed_data)
> reduced <- lm( y ~ p1ab + p2ab , data=observed_data)
> anova(full,reduced)
Analysis of Variance Table

Model 1: y ~ p1ab + p2ab + I((p1ab - p2ab)^2)
Model 2: y ~ p1ab + p2ab
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     42 0.11823
2     43 0.17315 -1  -0.05492 19.509 6.892e-05 ***
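
The fits above use the known abilities. If the abilities themselves are to be estimated from the names alone, as in the original question, one possible route in R is direct least squares with optim(). The following is only a sketch under stated assumptions, not a tested solution: the data are simulated here, and the names `rss`, `starts`, `c0_hat` and `b_hat` are hypothetical. Note that y = f*o - (sqrt(f)*p1 - sqrt(f)*p2)^2, so as far as I can tell only c0 = f*o and the scaled abilities b = sqrt(f)*ability are identifiable, and those only up to a common shift and a sign flip. The sketch fixes the first person at 0 to remove the shift.

```r
# Simulated data in the spirit of the posted example (assumed values)
set.seed(42)
pnames  <- c("alice", "bob", "charlie", "don", "eve", "freddy", "grace", "henry")
ability <- seq_along(pnames) / 10
pairs   <- t(combn(length(pnames), 2))             # one row per pair of persons
pairs   <- pairs[rep(seq_len(nrow(pairs)), 2), ]   # each pair observed twice
y <- 1.5 * (1 - (ability[pairs[, 1]] - ability[pairs[, 2]])^2) +
  0.05 * rnorm(nrow(pairs))

# par[1] is c0 = f * o; par[-1] are the scaled abilities of persons 2..n,
# with person 1 fixed at 0 to remove the arbitrary shift.
rss <- function(par) {
  b <- c(0, par[-1])
  sum((y - (par[1] - (b[pairs[, 1]] - b[pairs[, 2]])^2))^2)
}

# Several random starts, because the surface can have local minima
starts <- replicate(20, c(mean(y), rnorm(length(pnames) - 1, sd = 0.3)),
                    simplify = FALSE)
fits   <- lapply(starts, function(s) optim(s, rss, method = "BFGS"))
best   <- fits[[which.min(vapply(fits, function(z) z$value, numeric(1)))]]

c0_hat <- best$par[1]
b_hat  <- setNames(c(0, best$par[-1]), pnames)
print(round(c0_hat, 3))
print(round(b_hat, 3))
```

Whether the best of the random starts is the global minimum is not guaranteed, so the result should be checked against the residual sum of squares of a few more runs. Calling optim() with hessian = TRUE would give a starting point for approximate standard errors.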
--
David
>
>
> 2016-07-11 19:16 GMT+02:00 Jeff Newmiller <[email protected]>:
>> Your clarification is promising. A reproducible example is always
>> preferred, though never a guarantee. I expect to be somewhat preoccupied
>> this week so responses may be rather delayed, but the less setup we have to
>> do, the more likely that someone on the list will tackle it.
>>
>> Re an answer: If you can make the example simple enough that you can tell us
>> what the right numerical result will be, we will have a better chance of
>> understanding what you are after. E.g. if you start with a solution and use
>> it to create sample input data, then you don't need to actually solve it
>> to illustrate what you are after. [1]
>>
>> Note that I am not aware of any package dedicated to this type of problem,
>> so unless someone else responds otherwise then you will likely have to use
>> bootstrapping or your own statistical analysis (Bayesian?) of the result.
>>
>> [1]
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On July 11, 2016 7:28:41 AM PDT, stn021 <[email protected]> wrote:
>>> Hello,
>>>
>>> thank you for the replies. Sorry about the html-email, I forgot.
>>> Should be OK with this email.
>>>
>>>
>>> Don't be fooled by the apparent simplicity of the problem. I have
>>> tried to reduce it to only a single relatively simple question.
>>>
>>> The idea here is to model cooperation of two persons. The model is
>>> about one specific aspect of that cooperation, namely that two persons
>>> with similar abilities may be able to produce better results than two
>>> very different persons.
>>>
>>> That is only one part of the model with other parts modeling for
>>> example the fact that of course two persons with a higher degree of
>>> ability will produce better results per se.
>>>
>>>
>>> It is not classic regression with factors. That can be easily done by
>>> something like lm( y ~ (p1-p2)^2 ).
>>>
>>> This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ). This contains a
>>> multiplication, and for lm() this implies interactions between the
>>> factor-levels and produces one parameter for each combination of
>>> factor-levels that occurs in the data. That is not what the question
>>> is about.
>>>
>>> Also p1 and p2 are different levels of the same factor, while for lm()
>>> it would be two different factors with different levels.
>>>
>>>
>>> As for the sensical part: this has a real world application therefore
>>> it makes sense.
>>>
>>> Also it is not so difficult to solve with non-linear optimization. I
>>> was hoping to be able to use R for that purpose because then the
>>> results could easily be checked with statistical tests.
>>>
>>> So my question is not "how to solve" but "how to solve with R".
>>>
>>>
>>> As for the excess degrees of freedom, in real observations there would
>>> of course be added noise due to either random variations or factors
>>> not included in the model. So to generate a more reality-conforming
>>> example I could add some random normal-distributed noise to the
>>> dependent variable y. I previously left that part out because to me it
>>> did not seem relevant.
>>>
>>>
>>> Would you like me to make a complete example dataset with more records
>>> and noise ?
>>>
>>>
>>> The answer I look for would be the numerical values of the
>>> factor-levels and numerical values for the multiplier (f) and the
>>> offset (o), with p1 and p2 given as names (here: persons) and y given
>>> as some level of achievement they reach by cooperating.
>>>
>>> y = f * ( o - ( p1 - p2 )^2 )
>>>
>>> Is that what you meant by "answer" ?
>>>
>>>
>>> THX
>>> stefan
>>>
>>>
>>>
>>>
>>> 2016-07-10 2:27 GMT+02:00 Jeff Newmiller <[email protected]>:
>>>>
>>>> I have seen less sensical questions.
>>>>
>>>> It would be nice if the example were a bit more complete (as in it
>>>> should have excess degrees of freedom and an answer) and less like a
>>>> homework problem (which are off topic here). It would of course also be
>>>> helpful if the OP were to conform to the Posting Guide, particularly in
>>>> respect to using plain text email.
>>>>
>>>> It looks like the kind of nonlinear optimization problem that
>>>> evolutionary algorithms are often applied to. It doesn't look (to me)
>>>> like a typical problem that factors get applied to in formulas though,
>>>> because multiple instances of the same factor variable are present.
>>>> --
>>>> Sent from my phone. Please excuse my brevity.
>>>>
>>>> On July 9, 2016 4:59:30 PM PDT, Rolf Turner <[email protected]> wrote:
>>>>> On 09/07/16 20:52, stn021 wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I would like to analyse a model like this:
>>>>>>
>>>>>> y = 1 * ( 1 - ( x1 - x2 ) ^ 2 )
>>>>>>
>>>>>> x1 and x2 are not continuous variables but factors, so the observations
>>>>>> contain the level.
>>>>>> Its numerical value is unknown and is to be estimated with the model.
>>>>>>
>>>>>>
>>>>>> The observations look like this:
>>>>>>
>>>>>> y     x1     x2
>>>>>> 0.96  Alice  Bob
>>>>>> 0.84  Alice  Charlie
>>>>>> 0.96  Bob    Charlie
>>>>>> 0.64  Dave   Alice
>>>>>> etc.
>>>>>>
>>>>>> Each person has a numerical value. Here for example Alice = 0.2 and
>>>>>> Bob = 0.4
>>>>>>
>>>>>> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.
>>>>>>
>>>>>> How can this be done in R ?
>>>>>
>>>>>
>>>>> This question makes about as little sense as it is possible to imagine.
>>>>>
>>>>> cheers,
>>>>>
>>>>> Rolf Turner
>>>>
>>
>
> ______________________________________________
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA