Hi Shiva,

You may be interested in the following paper:

G. Weiss and F. Provost (2003). Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction. Journal of Artificial Intelligence Research 19, 315-354.

For validating models in those environments:

William Elazmeh, Nathalie Japkowicz, Stan Matwin (2006). A Framework for Comparative Evaluation of Classifiers in the Presence of Class Imbalance. Proceedings of the Third Workshop on ROC Analysis in Machine Learning, Pittsburgh, USA.

Regards,
Pedro

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu
Sent: Wednesday, October 01, 2008 7:20 PM
To: [EMAIL PROTECTED]
Cc: r-help@r-project.org
Subject: Re: [R] Bias in sample - Logistic Regression

Hi Shiva,

The idea of reject inference is very simple. Let's assume a credit card environment. There are 100 applicants, of which 50 will be approved and booked. Therefore, we can only observe adverse behavior, such as default and delinquency, for the 50 booked accounts. Again, let's assume that 5 of the 50 booked cards are bad (default or delinquency).

The natural approach is to build a model that "cherry picks" the bad accounts and then apply the same model to all applicants. However, we can only observe the behavior of the booked applicants, which is 50, and not of all applicants, which is 100. The model is therefore fitted to a sample that has already been filtered by the approval decision, so its results look better than they really are. This is the so-called 'sample bias'. The same thing can happen in healthcare or direct marketing as well.

Fortunately, many people have done excellent work on this problem. Please read some of Heckman's work. Greene at NYU has a paper in this area as well. I believe there is also an implementation in R. If you use SAS (widely used in industry), take a look at PROC QLIM.

HTH.
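
The sample bias described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a method from this thread: the function name simulate_bad_rates, the uniform latent risk, the 0.4 * risk default probability, and the rule of approving the lowest-risk half are all hypothetical choices made for illustration. It is written in Python with the standard library only.

```python
import random


def simulate_bad_rates(n_applicants=10000, approve_rate=0.5, seed=0):
    """Compare bad rates among booked, rejected, and all applicants.

    Assumptions (hypothetical, for illustration only):
    - each applicant has a latent risk drawn uniformly from [0, 1)
    - the probability of going bad is 0.4 * risk
    - the lender approves the lowest-risk share of applicants
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        risk = rng.random()
        is_bad = rng.random() < 0.4 * risk  # higher risk, more likely bad
        applicants.append((risk, is_bad))

    # Approval step: only the lowest-risk applicants are booked.
    applicants.sort(key=lambda a: a[0])
    n_booked = int(n_applicants * approve_rate)
    booked = applicants[:n_booked]
    rejected = applicants[n_booked:]

    def bad_rate(group):
        return sum(is_bad for _, is_bad in group) / len(group)

    return {
        "booked": bad_rate(booked),      # what the lender can observe
        "rejected": bad_rate(rejected),  # never observed in practice
        "all": bad_rate(applicants),     # the population the model is applied to
    }


if __name__ == "__main__":
    for group, rate in simulate_bad_rates().items():
        print(f"{group}: {rate:.3f}")
```

Under these assumptions the bad rate seen on booked accounts understates the bad rate among all applicants, which is why a model built only on booked accounts can look better than it would on the full applicant pool.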
-- 
===============================
WenSui Liu
Acquisition Risk, Chase
Email : [EMAIL PROTECTED]
Blog : statcompute.spaces.live.com
===============================

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.