I'm writing a program to analyse the profiles of the 15500 users of my forum. I have the profiles stored locally as HTML files, and I'm using ClientForm to extract the various details from the HTML form in each file.
My goal is to identify lurking spammers, but also to learn how to spot spammers better by calculating statistical correlations in the data against known spammers.
I need advice on how to organise my data. There are 50 fields in each profile, and some fields will be much more useful than others, so I thought about creating, say, 10 files to start off with, each containing a dictionary that maps userid to field value. That way I'm dealing with 10 to 50 files instead of 15500.
Also, I am inexperienced with classes but eager to learn, and I wonder if they would be any help in this case.
Any advice is much appreciated, and thanks in advance.
Thomas
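P.S. To make the question more concrete, here is a rough sketch of the layout I have in mind: one dictionary per field, each mapping userid to that field's value, wrapped in a small class and saved as one file per field. It is only a sketch under assumptions. The class name `ProfileStore`, the field names (`homepage`, `post_count`, `signature`) and the sample values are all made up for illustration, and I have assumed JSON files and a current Python 3. It does not touch ClientForm at all; it starts from profiles that have already been turned into plain dictionaries.

```python
import json
import os
import tempfile
from collections import defaultdict


class ProfileStore:
    """Holds one dictionary per profile field, each mapping userid -> value."""

    def __init__(self):
        # field name -> {userid: value}
        self.fields = defaultdict(dict)

    def add_profile(self, userid, profile):
        """Spread one user's {field: value} dictionary across the per-field dicts."""
        for field, value in profile.items():
            self.fields[field][userid] = value

    def save(self, directory):
        """Write one JSON file per field, e.g. <directory>/post_count.json."""
        os.makedirs(directory, exist_ok=True)
        for field, values in self.fields.items():
            path = os.path.join(directory, field + ".json")
            with open(path, "w", encoding="utf-8") as handle:
                json.dump(values, handle)

    @classmethod
    def load(cls, directory):
        """Rebuild a store from the per-field JSON files in a directory."""
        store = cls()
        for name in os.listdir(directory):
            if name.endswith(".json"):
                path = os.path.join(directory, name)
                with open(path, encoding="utf-8") as handle:
                    store.fields[name[:-len(".json")]] = json.load(handle)
        return store

    def profile(self, userid):
        """Collect every stored field for a single user."""
        return {
            field: values[userid]
            for field, values in self.fields.items()
            if userid in values
        }


# Hypothetical sample data. The userids are strings because JSON object
# keys are always strings, so they survive a save/load round trip unchanged.
store = ProfileStore()
store.add_profile("1001", {"homepage": "", "post_count": 0})
store.add_profile(
    "1002",
    {"homepage": "http://example.com", "post_count": 57, "signature": "Hi!"},
)

with tempfile.TemporaryDirectory() as folder:
    store.save(folder)
    saved_files = sorted(os.listdir(folder))
    reloaded = ProfileStore.load(folder)
```

With this layout, `reloaded.fields["post_count"]` gives the whole userid-to-value dictionary for one field, which seems convenient for the correlations, while `reloaded.profile("1002")` pulls one user's details back together. Whether the class is worth having over plain dictionaries is part of what I am unsure about.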
