Re: (MATH-607) Current Multiple Regression Object

2011-07-09 Thread Greg Sterijevski
All,

I pushed updated patches a couple of days ago. Please review.

-Greg

On Wed, Jul 6, 2011 at 1:38 PM, Ted Dunning wrote:
> Indeed.
>
> On Wed, Jul 6, 2011 at 11:36 AM, Phil Steitz wrote:
> > Up to whoever is doing the patching :)

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Ted Dunning
Indeed.

On Wed, Jul 6, 2011 at 11:36 AM, Phil Steitz wrote:
> Up to whoever is doing the patching :)

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Phil Steitz
On 7/6/11 11:12 AM, Ted Dunning wrote:
> In the simplest instance, an operator based solver might be able to advertise an incremental accumulation interface if it maintains an internal representation that could be used as a linear operator. As more data is added, the internal representation

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Ted Dunning
In the simplest instance, an operator based solver might be able to advertise an incremental accumulation interface if it maintains an internal representation that could be used as a linear operator. As more data is added, the internal representation could be augmented and then whatever form exist
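One way to picture the idea in this message is the sketch below. It is only an illustration under stated assumptions: the internal representation is taken to be the normal-equation sums X'X and X'y, and the class and method names (IncrementalNormalEquations, addObservation, operate, rightHandSide) are hypothetical, not taken from Commons Math or Mahout.

```java
// Hypothetical sketch; none of these names come from Commons Math or Mahout.
final class IncrementalNormalEquations {
    private final int p;          // number of regressors
    private final double[][] xtx; // running sum X'X
    private final double[] xty;   // running sum X'y
    private long n;               // observations folded in so far

    IncrementalNormalEquations(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    /** Incremental accumulation: fold one observation into the internal sums. */
    void addObservation(double[] x, double y) {
        if (x.length != p) {
            throw new IllegalArgumentException("expected " + p + " regressors, got " + x.length);
        }
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                xtx[i][j] += x[i] * x[j];
            }
            xty[i] += x[i] * y;
        }
        n++;
    }

    /** Linear-operator view: returns (X'X) v without exposing the matrix itself. */
    double[] operate(double[] v) {
        if (v.length != p) {
            throw new IllegalArgumentException("expected a vector of length " + p);
        }
        double[] out = new double[p];
        for (int i = 0; i < p; i++) {
            double sum = 0.0;
            for (int j = 0; j < p; j++) {
                sum += xtx[i][j] * v[j];
            }
            out[i] = sum;
        }
        return out;
    }

    /** Copy of X'y, the right-hand side of the normal equations. */
    double[] rightHandSide() {
        return xty.clone();
    }

    long getN() {
        return n;
    }
}
```

An operator-based solver could then work from operate and rightHandSide alone, while callers keep adding observations through addObservation. Forming X'X explicitly squares the condition number of the problem, so this is meant to show the shape of the interface, not a recommended numerical method.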

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Phil Steitz
On 7/6/11 10:37 AM, Greg Sterijevski wrote:
> I like the following:
> It is conceivable that the interface you designed could be a facade over a lower level linear operator interface where that makes sense. If so, that is great.
>
> Looking through commons there is public interface Decompositi

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Greg Sterijevski
I like the following: It is conceivable that the interface you designed could be a facade over a lower level linear operator interface where that makes sense. If so, that is great. Looking through commons there is public interface DecompositionSolver. Perhaps an extension of this interface is wha

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Ted Dunning
It isn't really necessary to commingle approaches at all. It is just nice to think about the alternatives at once to get better designs. It is conceivable that the interface you designed could be a facade over a lower level linear operator interface where that makes sense. If so, that is great.

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Greg Sterijevski
I see. Why would it be a good reason to commingle functionality? Aside from diagnostics like condition numbers and maybe eigenvalues, these approaches don't seem to share much commonality. I could be wrong since my knowledge of Mahout style problems is a bit spotty. On Wed, Jul 6, 2011 at 11:34 AM

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Ted Dunning
The other way that regression is done at scale is with a linear operator. This linear operator is often defined by the behavior of some external system that is not susceptible to incremental construction. A good example is a large text retrieval system. It would be useful to support that style i

Re: (MATH-607) Current Multiple Regression Object

2011-07-06 Thread Greg Sterijevski
At Ted's suggestion I looked at LSMR in Mahout. In general, I have no complaints about the algorithm or how it is coded up. I have seen the algorithm in the PrimalDual solver that Michael Saunders et al cooked up. I believe the solver is part of the COIN project. I have nothing but praise for it.

Re: (MATH-607) Current Multiple Regression Object

2011-07-05 Thread Ted Dunning
If it helps, there is a new LSMR implementation in Mahout. Steal it at will. We are moving to having a LinearOperator interface specifically for conjugate gradient methods like LSMR and power methods like Lanczos or stochastic projection decompositions. On Tue, Jul 5, 2011 at 7:08 PM, Greg Ster
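To make the operator idea concrete, here is a minimal sketch of a solver that touches the system only through matrix-vector products. It is an assumption-laden illustration: the LinearOperator interface below is a hypothetical one-method stand-in, not Mahout's actual interface, and the solver is plain conjugate gradient for a symmetric positive definite operator, not LSMR.

```java
// Hypothetical stand-in for an operator interface; not Mahout's actual API.
interface LinearOperator {
    /** Returns A v; the solver never sees A itself. */
    double[] operate(double[] v);
}

// Plain conjugate gradient for symmetric positive definite operators (not LSMR).
final class ConjugateGradient {
    private ConjugateGradient() { }

    static double[] solve(LinearOperator a, double[] b, int maxIterations, double tolerance) {
        int n = b.length;
        double[] x = new double[n]; // initial guess x = 0
        double[] r = b.clone();     // residual r = b - A x = b
        double[] p = b.clone();     // first search direction
        double rs = dot(r, r);
        for (int k = 0; k < maxIterations && Math.sqrt(rs) > tolerance; k++) {
            double[] ap = a.operate(p);
            double alpha = rs / dot(p, ap); // step length along p
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rsNew = dot(r, r);
            double beta = rsNew / rs;       // keeps directions A-conjugate
            for (int i = 0; i < n; i++) {
                p[i] = r[i] + beta * p[i];
            }
            rs = rsNew;
        }
        return x;
    }

    private static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }
}
```

Because the interface has a single method, an operator can be supplied as a lambda, for example `v -> new double[] {4 * v[0] + v[1], v[0] + 3 * v[1]}` for a small 2x2 system. The operator could equally be backed by an external system that computes the product on demand.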

(MATH-607) Current Multiple Regression Object

2011-07-05 Thread Greg Sterijevski
Hello All, I have an open issue with respect to improving the linear regression techniques in Commons. The current regression technique has some limitations. Before introducing new implementations, I thought it better to suggest two interfaces. One interface would define an API each regression imp