On 3/17/2011 6:01 AM, phuong hanu wrote:
> Hi,
>
> I've just read your post on Nabble.
>
> I am sending this message to ask about a problem that I have to solve
> now. I have a database of email in my Linux virtual machine. The table
> includes fields such as ID, Spam, Data, Time, Sender_add, sender_ip,
> sender_domain, ...
>
> Since I am doing a project on automatic whitelisting, the data
> preprocessing is very important. My problem is that I still don't know
> how to generate a database for my whitelist from that database, because
> one domain can include many IP addresses. My job is to group them all
> (by a script, maybe).
>
> For example: gmail.com: 38.98.127.148, 74.125.46.29, 74.125.46.30, ...
>
> From those IP-domain pairs, I have to find a threshold to figure out
> which IPs are used for sending spam. The threshold can be "3 days" (for
> example), because spammers will only use an IP to spread spam for such a
> short time. After removing the illegal IPs, we have the final whitelist
> to apply in the email system.
>
> So all I care about are sender_ip and sender_domain. When I use a MySQL
> command to list the rows in the table, the result is more than 46,000
> rows (SELECT sender_ip, sender_domain FROM emailsl;), so I cannot do it
> manually by reading each line and noting down on paper which domain has
> which IP.
>
> That is why I am asking you for a method to solve this preliminary
> problem. This data preprocessing step is very important because it
> creates the DB for my whitelist in any email system. After that, I'll
> create a plugin for SpamAssassin to whitelist automatically, based on
> the list that I preprocessed.
>
> What I have: email DB, Linux virtual machine, MySQL
>
> What I want: to build a DB that shows the pairs of sender domain and
> legal IP (crossing out domain-illegal IP pairs based on the threshold)
>
> I hope you see my point and can help me with that.

That's a very open-ended question, and other than the comment about creating a plugin, it is mostly off-topic here. Questions about pulling data from your DB would probably be better asked in a MySQL forum. Once you get a bit farther and are trying to integrate with SA, we can help with that.

In either case, try to ask simple, specific questions. You will get far more responses that way.

-- 
Bowie
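
For what it's worth, the grouping step described in the quoted message can be sketched in a few lines of Python. This is only a rough illustration under several assumptions that are not in the original post: the rows have already been fetched as (sender_domain, sender_ip, time) tuples with the time as a datetime, the "3 days" threshold is read as "drop any IP whose first and last sightings for a domain are less than 3 days apart", and the function name `build_whitelist` and the sample timestamps are made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_whitelist(rows, min_span=timedelta(days=3)):
    """Group sender IPs by domain, keeping IPs active for at least min_span.

    rows: iterable of (sender_domain, sender_ip, seen_at) tuples, where
    seen_at is a datetime. The threshold interpretation is an assumption.
    """
    # Track the first and last time each (domain, ip) pair was seen.
    first_last = {}
    for domain, ip, seen_at in rows:
        key = (domain, ip)
        if key in first_last:
            first, last = first_last[key]
            first_last[key] = (min(first, seen_at), max(last, seen_at))
        else:
            first_last[key] = (seen_at, seen_at)

    # Keep only the pairs whose activity window reaches the threshold.
    whitelist = defaultdict(set)
    for (domain, ip), (first, last) in first_last.items():
        if last - first >= min_span:
            whitelist[domain].add(ip)
    return {domain: sorted(ips) for domain, ips in whitelist.items()}


# Hypothetical sample rows; the timestamps are invented for illustration.
rows = [
    ("gmail.com", "74.125.46.29", datetime(2011, 3, 1, 9, 0)),
    ("gmail.com", "74.125.46.29", datetime(2011, 3, 10, 9, 0)),
    ("gmail.com", "38.98.127.148", datetime(2011, 3, 5, 12, 0)),
    ("gmail.com", "38.98.127.148", datetime(2011, 3, 6, 12, 0)),
]
whitelist = build_whitelist(rows)
print(whitelist)
```

With these sample rows, only the IP seen over a nine-day window survives; the one seen for a single day is dropped. The same grouping could probably be done directly in SQL, which is where a MySQL forum would be more help.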