On Saturday, January 27, 2018 at 5:21:15 PM UTC-8, Dan Stromberg wrote:
> On Sat, Jan 27, 2018 at 1:05 PM, qrious wrote:
> > I am attempting to understand how scikit learn works for sentiment analysis
> > and came across this blog post:
> >
> > https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn
> >
> > The corresponding code is at this location:
> >
> > https://gist.github.com/bonzanini/c9248a239bbab0e0d42e
> >
> > My question is while trying to predict, why does the curr_class in Line 44
> > of the code need a classification (pos or neg) for the test data? After
> > all, am I not trying to predict it? Without any initial value of
> > curr_class, the program has a run time error.
>
> I'm a real neophyte when it comes to modern AI, but I believe the
> intent is to divide your inputs into "training data" and "test data"
> and "real world data".
>
> So you create your models using training data including correct
> classifications as part of the input.
>
> And you check how well your models are doing on inputs they haven't
> seen before with test data, which also is classified in advance, to
> verify how well things are working.
>
> And then you use real world, as-yet-unclassified data in production,
> after you've selected your best model, to derive a classification from
> what your model has seen in the past.
>
> So both the training data and test data need accurate labels in
> advance, but the real world data trusts the model to do pretty well
> without further labeling.
Dan,

Thanks. I was thinking along the same lines: 'So both the training data and test data need accurate labels in advance'. That makes sense to me.

As for this part: 'the real world data trusts the model to do pretty well without further labeling', my question is: how do I do this using sklearn library functions? Is there a code example that runs the trained model on actual, unlabeled data that needs prediction?
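For what it's worth, here is a minimal sketch of that last step. It assumes a TF-IDF vectorizer feeding a linear SVM, which is similar in spirit to the blog post but not a copy of its code. The toy sentences and the names train_docs, train_labels and new_docs are made up for illustration. The point is that the unlabeled documents go through the already-fitted vectorizer with transform() and then into predict(), and no label is supplied for them anywhere. Labels for held-out data are only needed if you want to score the predictions afterwards, for example with sklearn.metrics.classification_report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy training data; a real run would load labeled reviews from disk.
train_docs = [
    "a wonderful, moving film with great acting",
    "brilliant story and a superb cast",
    "dull, boring and far too long",
    "a terrible script and awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Unlabeled "real world" documents: there is deliberately no label list for these.
new_docs = [
    "great cast and a wonderful story",
    "boring and awful",
]

# Learn the vocabulary and IDF weights from the training documents only.
vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_docs)

# Train the classifier on the labeled training vectors.
classifier = LinearSVC()
classifier.fit(train_vectors, train_labels)

# Use transform(), not fit_transform(), so the new documents are mapped
# onto the same feature space the classifier was trained on.
new_vectors = vectorizer.transform(new_docs)
predictions = classifier.predict(new_vectors)

for doc, label in zip(new_docs, predictions):
    print(f"{label}: {doc}")
```

With real data the only change should be where train_docs, train_labels and new_docs come from; the fit / transform / predict sequence stays the same.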