Re: Corpus of Spam/Ham headers(Source IP) for research

Shivram Krishnan Wed, 29 Jun 2016 08:38:55 -0700

Hello Bill,

There has been a substantial amount of research in this field where the
authors obtained their data from network operators. This paper from UPenn
<http://repository.upenn.edu/cgi/viewcontent.cgi?article=1962&context=cis_reports>,
for instance, collected over 31 million mail headers (not only IP
addresses) to validate its method.


We are trying to obtain ham/spam lists from different networks to validate
our technique, which curates blacklists for a specific network.




On Wed, Jun 29, 2016 at 8:02 AM, Bill Cole <
sausers-20150...@billmail.scconsult.com> wrote:

> On 29 Jun 2016, at 1:00, Shivram Krishnan wrote:
>
>> Hello Bill,
>>
>> Thank you so much for your views. I agree that your customers would not
>> like it if you shared their information. But as Oliver suggested, I need
>> only the source IP addresses of the spam and ham emails, which can even
>> be anonymized in the last octet.
>>
>> Will that still be a privacy concern?
>>
>
> No, but there would still be a substantial data collection and
> preparation cost, as well as a fundamental study design problem: you have
> no controls for data validity or sampling issues.
>
> In total honesty: if your approach to this research has been cleared by
> your faculty advisor and not stopped, that advisor is either incompetent or
> is intentionally sabotaging you. You cannot gather a valid data set this
> way and the data you are asking for cannot even be verified to be anything
> other than pure invention. If your advisor does not see that, they are in
> the wrong profession.
>
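For reference, the last-octet anonymization mentioned in the thread can be sketched in Python as below. This is a minimal illustration that assumes IPv4 addresses and a hypothetical list of (source IP, label) pairs; the function names are made up for this example and do not come from any tool discussed here. As Bill points out, this addresses privacy but does nothing to make the submitted data verifiable.

```python
import ipaddress

def anonymize_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address, keeping only its /24 prefix."""
    addr = ipaddress.IPv4Address(ip)
    # Mask off the low 8 bits, i.e. the last octet.
    masked = int(addr) & 0xFFFFFF00
    return str(ipaddress.IPv4Address(masked))

def anonymize_labeled(records):
    """Anonymize hypothetical (source_ip, label) pairs, label being 'spam' or 'ham'."""
    return [(anonymize_last_octet(ip), label) for ip, label in records]

if __name__ == "__main__":
    sample = [("203.0.113.57", "spam"), ("198.51.100.9", "ham")]
    for ip, label in anonymize_labeled(sample):
        print(label, ip)
```

Keeping the /24 prefix preserves the network-level information a blacklist study would use while hiding the individual host.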
