Setting: 200 input variables and 1 binary target variable. I run a principal component analysis (PCA) on the data, then use the output of the PCA (the generated factors) as input to a neural network. Before training, I partition the PCA output into training and testing sets, so that the neural network can be trained on the first partition and tested on the second.
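The workflow above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the poster's code: the data are synthetic (hypothetical latent factors `Z` generate the 200 inputs and the binary target), the number of kept components `k = 10`, the 400/200 split, and the helper names `pca_scores`, `train_mlp` and `predict` are all assumptions. As described in the post, the PCA is run on all records before partitioning; in this variant only the 200 inputs go into the PCA.

```python
import numpy as np

def pca_scores(X, k):
    """Standardize the columns of X and return the first k principal component scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    # Rows of Vt are the principal directions (loadings), ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:k].T

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One hidden layer (tanh), sigmoid output, full-batch gradient descent on mean cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))
        g = (p - y) / n                       # gradient of the mean loss w.r.t. the output logit
        gH = np.outer(g, w2) * (1.0 - H**2)   # backpropagate through the tanh layer
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, w2, b2

def predict(params, X):
    """Class labels (0/1) from the trained network."""
    W1, b1, w2, b2 = params
    return (np.tanh(X @ W1 + b1) @ w2 + b2 > 0).astype(int)

rng = np.random.default_rng(42)
n, p, k = 600, 200, 10
Z = rng.normal(size=(n, 5))                          # hypothetical latent factors
X = Z @ rng.normal(size=(5, p)) + 0.5 * rng.normal(size=(n, p))
y = (Z[:, 0] > 0).astype(float)                      # hypothetical binary target

scores = pca_scores(X, k)          # PCA on all records; the target is not passed in
scores /= scores.std(axis=0)       # put the components on a common scale for the network

idx = rng.permutation(n)           # partition the PCA output into training and testing sets
train, test = idx[:400], idx[400:]
params = train_mlp(scores[train], y[train])
acc = (predict(params, scores[test]) == y[test]).mean()
print(f"test accuracy: {acc:.3f}")
```

A full neural-network library is not needed for the sketch; the small hand-written network only exists to make the pipeline (PCA, then partition, then train on one part and test on the other) explicit.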
I was told that it is not logically sound to include the target variable as an input to the principal component algorithm. Normally that sounds correct: you never want to include the target variable as an input variable in your model. However, I argued that it is OK here, because I am only using the target variable to build the principal components, not the model. Each record then has a value for each of the principal components. I use only the training partition to build the neural network, and then test the neural network on the testing partition. Is this wrong?

--
View this message in context: http://www.nabble.com/logic-question-tp24369772p24369772.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
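To make the objection concrete, the sketch below (Python with NumPy, on hypothetical random data; `n`, `k`, `project` and the other names are illustrative assumptions) fits the PCA once with the target appended as a 201st column, then scores the same record twice through that fixed transform, once with target 0 and once with target 1. Because the target is one of the PCA inputs, a record's component scores are a function of its own target value, and this holds for records in the testing partition too.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 200, 10
X = rng.normal(size=(n, p))                     # hypothetical inputs
y = rng.integers(0, 2, size=n).astype(float)    # hypothetical binary target

# The proposed variant: the target is appended as an extra column before the PCA.
D = np.column_stack([X, y])
mu, sd = D.mean(axis=0), D.std(axis=0)
_, _, Vt = np.linalg.svd((D - mu) / sd, full_matrices=False)
V_k = Vt[:k].T                                  # (201, k) loadings of the kept components

def project(row):
    """Scores of one record under the already-fitted PCA (fixed mu, sd and V_k)."""
    return ((row - mu) / sd) @ V_k

# Same input values, different target value.
row0 = np.append(X[0], 0.0)
row1 = np.append(X[0], 1.0)
shift = project(row1) - project(row0)

# The change in the scores is exactly the target's loadings divided by its standard deviation.
print(shift)
```

The size of the shift depends on how heavily the target loads on the kept components, which in turn depends on the data and on `k`; the sketch only shows that the dependence exists, not how much it affects the network's test accuracy.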