[SAtalk] Distributed HTTP server blocklist system [dhttp-bl]

Matthew Cline Fri, 02 Jan 2004 19:01:41 -0800

An interesting proposal for a distributed blocklist system.  Text found at 
http://www.sysdesign.ca/dhttp-bl.txt


=-=-=-=-=-=-=-=-=

Distributed HTTP server blocklist system [dhttp-bl]
Copyright (C) 2003 Jem Berkes

Posted 2003-09-29

It's clear from spammers' distaste of (and aggression towards) blocklists
that these things really are effective in blocking spam :)

I'm thinking that the future successful blocklists will be distributed
in nature. I've read some good ideas so far, like using existing USENET
infrastructure or moderated mailing lists. But I had this thought and
wanted to share it, and see what people think.

NOTE: this is simply a data distribution system. The maintainers can be
the same people we have today, and they still have to add/remove listings
by getting feedback through the Internet via some other means. Also, this
infrastructure could support multiple distinct blocklists, each with its
own service ID. So one strong infrastructure could support different
flavours of blocklists (e.g. one run by the spamhaus folk, another by
SPEWS) and people could participate in whatever network they wish.

==== Thinking aloud: a distributed HTTP server BL system ====

A. Features
---
+ one authority retains full control of list, without investing resources
+ efficient, built in caching, won't burden existing USENET or mail lists
+ uses existing HTTP servers, which are easy to set up
+ anybody can contribute by running the CGI, even small home user
+ HUGE total capacity to serve and grow
+ very resistant to DDOS attacks
+ completely resistant to poisoning
+ some elements of bittorrent, freenet, gnutella

B. The <entities> that make up the system:
---
1) A select few "maintainers" that will make ALL decisions about what
netblocks will be listed. Since these people have to convince others to
participate in their project, they have to be trustworthy and already
well known. They will widely distribute their PGP public keys.

2) A large number of "participants". Each runs the system's software on
their private/corporate/educational HTTP servers. These people may be
friendly or malicious.

3) The basic data unit, a "package" that stores the blocklist for a
specific class B (a.b.*.*), and is PGP signed by a maintainer. I estimate
that such a compressed, signed file might be about 10KB -- a nice unit
to throw around. So for instance, to see if 200.60.243.224 is listed one
must seek out the 200.60 package, and examine the contents. They might
then find that 200.60.243.0/24 is listed.
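
A minimal sketch of that lookup in Python, assuming a package's contents
have already been verified and decoded into a list of CIDR strings. The
function names and the list-of-strings package format are illustrative
assumptions, not part of the proposal:

```python
import ipaddress
from typing import Iterable


def package_id(ip: str) -> str:
    """Return the package key for an IPv4 address: its first two octets."""
    octets = ipaddress.IPv4Address(ip).packed
    return f"{octets[0]}.{octets[1]}"


def is_listed(ip: str, entries: Iterable[str]) -> bool:
    """Check an address against the CIDR entries of one decoded package.

    strict=False tolerates entries written with host bits set.
    """
    addr = ipaddress.IPv4Address(ip)
    return any(addr in ipaddress.IPv4Network(e, strict=False) for e in entries)


# Hypothetical decoded contents of the "200.60" package.
entries = ["200.60.243.0/24"]
print(package_id("200.60.243.224"), is_listed("200.60.243.224", entries))
```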

C. Jobs for each participant's HTTP server:
---
o Stores the maintainers' PGP public keys
o Stores a few other participants' URLs
o Stores up to X packages, covering some % of IPv4 address space
o Answers public queries with the appropriate package if available
o Refers public queries to other participants if package unavailable
o Accepts an incoming newer package, only if its signature is valid
o Always drops stale packages in favour of newer packages
o Expires packages with time (TTL)
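
The storage rules above could be sketched roughly as follows. Everything
here is an assumption made for illustration: the Package fields, the
"serial" counter used to decide which package is newer, the policy of
refusing new package IDs once X packages are held, and the verify callable,
which stands in for a real PGP signature check against the maintainers'
public keys.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Package:
    package_id: str   # e.g. "200.60"
    serial: int       # hypothetical version counter set by the maintainer
    payload: bytes    # compressed blocklist data
    signature: bytes  # maintainer's signature over the payload


class PackageStore:
    def __init__(self, verify: Callable[[Package], bool], max_packages: int,
                 ttl_seconds: float, clock: Callable[[], float] = time.time):
        self._verify = verify          # placeholder for a PGP check
        self._max = max_packages       # the "X" in "up to X packages"
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[Package, float]] = {}

    def accept(self, pkg: Package) -> bool:
        """Store pkg only if it is validly signed and newer than what we hold."""
        if not self._verify(pkg):
            return False
        current = self._items.get(pkg.package_id)
        if current is not None and current[0].serial >= pkg.serial:
            return False               # incoming package is stale or a duplicate
        if current is None and len(self._items) >= self._max:
            return False               # full: refuse new IDs (one simple policy)
        self._items[pkg.package_id] = (pkg, self._clock())
        return True

    def get(self, package_id: str) -> Optional[Package]:
        """Return the stored package, expiring it first if its TTL has passed."""
        entry = self._items.get(package_id)
        if entry is None:
            return None
        pkg, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[package_id]
            return None
        return pkg
```

When get() returns None, the participant would fall back to referring the
query to other participants' URLs.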

D. How a user queries the BL
---
Ideally, a user wishing to make use of the BL would also be a participant.
Here is how the user/participant would query the BL status of any given IP:

1) From the first two octets of IP, determine the package required.
   Note that there are 2^16 = 65,536 unique packages in total.
2) Check local storage for package, maybe we already have the data
3) Query other participants' URLs for the package (may get referred)
4) Verify signature on downloaded package, and store (cache) it locally

You can see from the relatively small number of unique packages, and the
referring nature of queries that it doesn't take long to find a package.
Once found, that package is cached. Optimizations can let participants
keep track of who to query in the future (like freenet...)
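
As a rough sketch of steps 2 to 4, the referral chase might look like the
following. The fetch callable and its ("package" | "refer" | "none", value)
return convention are invented here for illustration, since the proposal
does not define a wire format, and verify again stands in for a real PGP
signature check.

```python
from collections import deque
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Fetch = Callable[[str, str], Tuple[str, Any]]


def find_package(package_id: str, cache: Dict[str, Any], seeds: Iterable[str],
                 fetch: Fetch, verify: Callable[[Any], bool],
                 max_queries: int = 20) -> Optional[Any]:
    """Look in the local cache, then query participants, following referrals."""
    if package_id in cache:                  # step 2: local storage
        return cache[package_id]
    queue = deque(seeds)
    seen = set(queue)
    queries = 0
    while queue and queries < max_queries:   # step 3: query participants
        url = queue.popleft()
        queries += 1
        kind, value = fetch(url, package_id)
        if kind == "package":
            if verify(value):                # step 4: verify, then cache
                cache[package_id] = value
                return value
            # Bad signature: ignore this answer and keep looking.
        elif kind == "refer":
            for referred in value:
                if referred not in seen:
                    seen.add(referred)
                    queue.append(referred)
    return None
```

The max_queries bound is an added safeguard so that a malicious participant
cannot keep a client chasing referrals indefinitely.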

E. How maintainers update BL data
---
The maintainers, who privately maintain the master list, will release
updated data packages signed with their keys. The maintainers can inject
the new data into the system by uploading their data to any participant.
Updated blocklists can even be sent using dial-up modem; it doesn't matter.

Participants will accept only validly signed data, thanks to the PGP signatures.
Newer data invalidates any older versions of the package, and the new data
will propagate throughout the distributed network.

F. Initial deployment
---
Current well known blocklist maintainers would come forward and say they
will distribute their lists via dhttp-bl, posting a service ID and their
PGP public keys. Internet users and admins who want to participate in this
person's blocklist/service then configure their HTTP server to run the
necessary CGI scripts, using the service ID and maintainers' keys. These
participants with working installations can then advertise their URLs
anywhere (search engines, USENET, mailing lists). Maintainers will start
uploading their blocklist, partitioned into multiple packages, across many
participants. Lesser known participants will pick up the data as necessary.

G. DDoS response scenario
---
Let's say that the 10 most popular high-bandwidth participants (that
everyone uses by default install :) get DDoS'd out of existence. The mildly
worried system admin searches google or phones a couple friends, and finds
out about other known participants. That's it -- because all participants
store equally reliable data. And a participant who has limited resources
might fall back to the task of sending referrals to the numerous lesser
known participants, which is also a useful job.

H. Extensions
---
Although the unit package I'm suggesting is identified by the first two
octets of IP, other keyed approaches that can segment address space would
work equally well. For example, package id = first N bits of hash of IP
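
One way the hash-keyed variant could look, as a sketch only: SHA-256 and
the default of 16 bits are arbitrary choices made here, not something the
proposal specifies.

```python
import hashlib
import ipaddress


def hashed_package_id(ip: str, bits: int = 16) -> int:
    """Return the first `bits` bits of SHA-256(IP) as an integer package id."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hashlib.sha256(ipaddress.IPv4Address(ip).packed).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)
```

With bits=16 this yields the same number of packages as the two-octet
scheme, and changing N changes the package count without touching the
rest of the design.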

Participants could run local DNSBL front-ends to the networked database
to use with MTAs or within their organization.
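
For such a front-end, the main translation step is turning a DNSBL query
name back into an IP address. DNSBLs conventionally encode the address
with its octets reversed in front of the list's zone name. A small sketch,
where the zone "bl.example.org" and the function name are placeholders:

```python
import ipaddress
from typing import Optional


def ip_from_dnsbl_query(qname: str, zone: str) -> Optional[str]:
    """Extract the IPv4 address from a query like 224.243.60.200.<zone>."""
    qname = qname.rstrip(".").lower()
    suffix = "." + zone.rstrip(".").lower()
    if not qname.endswith(suffix):
        return None
    labels = qname[:-len(suffix)].split(".")
    if len(labels) != 4:
        return None
    try:
        # Octets appear in reverse order in the query name.
        return str(ipaddress.IPv4Address(".".join(reversed(labels))))
    except ValueError:
        return None


print(ip_from_dnsbl_query("224.243.60.200.bl.example.org", "bl.example.org"))
```

The recovered address would then be looked up in the locally stored or
fetched package, and the front-end would answer the DNS query accordingly.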

Participants can locally store as much or little of the total blocklist
as they want. A major ISP can store 100% of the blocklist, meaning that BL
lookups would be instant unless a package is outdated.
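
For a sense of scale, a back-of-the-envelope estimate of what storing 100%
would cost, using the rough 10KB-per-package figure from section B and
assuming every one of the 2^16 packages exists at that size (an upper
bound, since unlisted ranges would presumably need little or no data):

```python
PACKAGES = 2 ** 16   # one package per a.b.*.* block
PACKAGE_KB = 10      # rough per-package estimate from section B

total_mb = PACKAGES * PACKAGE_KB / 1024
print(f"{PACKAGES} packages x {PACKAGE_KB} KB = {total_mb:.0f} MB")
```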



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
