Download URLs of plain text files
Here are plain text versions of our blacklists. The domain blacklist consists of two files; the 419 blacklist consists of one file:
- Domain blacklist base (spamvertised domains):
This file contains the bulk of the spam domains. It is updated infrequently and therefore need not be downloaded more than once a week. You must not download it more than once a day, or your IP address may be blocked without notice:
https://joewein.net/dl/bl/dom-bl-base.txt
- Domain blacklist new (recently added spamvertised domains):
This file contains only the additions made during the last week or two. You can download it once per day or more often, but we currently don't recommend downloading it more often than once an hour:
https://joewein.net/dl/bl/dom-bl.txt
- Email sender blacklist (419 scam and other spam senders):
https://joewein.net/dl/bl/from-bl.txt
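The download limits above can be enforced in a fetch script. The following is a minimal sketch, not an official client. It assumes you track the time of the last successful fetch of each file yourself. The function names `is_download_due` and `fetch` are hypothetical. Because this page gives no interval for from-bl.txt, the one-day interval used for it is an assumption.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone

# Minimum gap between downloads of each file, following the guidance above.
# No interval is stated for from-bl.txt; one day is an assumed, cautious default.
MIN_INTERVAL = {
    "https://joewein.net/dl/bl/dom-bl-base.txt": timedelta(weeks=1),  # weekly suffices; never more than daily
    "https://joewein.net/dl/bl/dom-bl.txt": timedelta(hours=1),       # daily or more, but not more than hourly
    "https://joewein.net/dl/bl/from-bl.txt": timedelta(days=1),       # assumption
}


def is_download_due(url, last_fetched, now=None):
    """True if `url` was never fetched or its minimum interval has elapsed."""
    if last_fetched is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_fetched >= MIN_INTERVAL[url]


def fetch(url, dest_path):
    """Download `url` to `dest_path` (no retries; call only when due)."""
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

A cron job could call `is_download_due` for each URL and call `fetch` only when it returns True. Keeping the interval table in one place makes it easy to loosen the schedule if the guidance changes.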
MD5 checksums
The following very small files contain MD5 hash codes computed from the above files. You can download them every hour or even every 15 minutes (make sure your script works properly before you try this rate!) and then run "md5sum -c filename" on each one. If the check fails, your local copy of the corresponding data file no longer matches the current version, so it's time to download that file as well. That way you never download the actual data files unless they have changed.
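The check-then-download step can also be done without shelling out to md5sum. The following is a minimal Python sketch. It assumes the checksum files use the standard md5sum line format: a hex digest, whitespace, then the file name, optionally prefixed with `*`. The function names are hypothetical. One deliberate difference from `md5sum -c`: names are resolved relative to the checksum file's directory, whereas md5sum uses the current directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path):
    """Hex MD5 digest of a file, read in chunks to keep memory use low."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_needing_download(checksum_file):
    """Return names listed in `checksum_file` whose local copy is missing or differs."""
    base_dir = Path(checksum_file).parent
    stale = []
    for line in Path(checksum_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        local = base_dir / name
        if not local.exists() or md5_of_file(local) != expected.lower():
            stale.append(name)
    return stale
```

Any name returned by `files_needing_download` is a candidate for re-download, subject to the interval limits given for each data file.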
Blacklisting policy
We are aiming primarily at blacklisting domains that have no legitimate uses. We do not list a number of domains that have questionable privacy policies or no confirmed opt-in (closed loop) subscription process, even though they are often reported as spam, because some people do indeed subscribe to their sites.
The current blacklisting procedure has been in place since December 2003.
All entries added to the list before that have been purged. Our false
positive rate is less than one per month, which means an error rate below
0.01%. None of the falsely listed domains has been widely used. Here are
the main points about our process:
- We are trying to be conservative in our blacklisting. We recognize that
false positives are far more painful and costly than false negatives. That
means: if in doubt, don't blacklist. We use built-in checks and double-check
whatever we can.
- We don't blacklist on hearsay. Every entry is backed by at least one
evidence email originally sent to our mailboxes or to customers of an ISP
we're cooperating with. We recognize that there are Joe jobs, fake sender
addresses and innocent bystanders mentioned in spam. We make efforts to
detect these cases.
- In order to minimize false positives, we start out with a pre-selected
set of messages. Many of the mails we receive at our domains go to largely
or completely unused accounts that have never been used to sign up for
anything. Furthermore, unless these mails meet certain criteria, our spam
filter won't even look at the embedded domain names. At our partner ISP,
every mail has to reach a certain SpamAssassin score before our filter gets
to take a look at it.
- Every mail then goes through our in-house spam filter, which extracts
domain names, makes WHOIS queries and stores the results in a database
together with other data about the original mails. It sorts domains by
perceived spamminess, taking into account factors such as domain age,
registrar, supporting name servers, Spamhaus SBL records for related
servers, etc.
- Domains registered by a small, fixed set of hardcore spammers, such as
those behind many of the "OEM software" and pharmaceutical spams, are
automatically detected and blacklisted.
- Other domains get sorted into several bins for manual inspection. This is
where it gets labour-intensive. We generally discard the least suspicious
domains because there's too much of the more interesting stuff.
- For the more suspicious ones, we look at the reasons the filter didn't
like the mail, the sender and the subject; we check the WHOIS info and the
actual message itself; and we perform Google web searches, Google NANAS
lookups, etc. To determine whether a sender may be a legitimate newsletter,
we look for signs that mail to third parties might have been legitimate and
subscribed to or, conversely, for signs of obfuscation designed to defeat
filters. This is not always easy if the recipient is a third party, but
there are certain patterns that can be detected.
- The older a domain, the more evidence we require to list it. SBL listings
are a strong indicator but not the sole determinant of spamminess. We always
judge several factors in combination.
- We don't currently have a process for purging discarded spam domains, but
are working on that.
- If a listing is challenged, we provide information about the email that
triggered the listing, but without identifying the mailbox it was sent to.
If the listing appears to have been a mistake, or if we think the domain is
unlikely to appear in spam again, we remove it.
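The triage steps described above might fit together roughly as shown below. This is a hypothetical sketch, not our actual in-house filter, whose rules are not published here. The SpamAssassin threshold, the factor weights, the age-based evidence requirement, the use of name servers as the "hardcore spammer" signature, and all names are invented for illustration. The sketch keeps the stated policy shape: pre-filtering, automatic listing only for known hardcore spammers, more evidence required for older domains, SBL never sufficient on its own, and the least suspicious domains discarded.

```python
import re
from dataclasses import dataclass

# Every threshold, weight and signature below is a hypothetical value
# chosen for illustration only.
SA_THRESHOLD = 5.0  # hypothetical minimum SpamAssassin score before triage

# Rough URL-based extraction; real spam often obfuscates links more than this.
URL_DOMAIN_RE = re.compile(r"https?://([a-z0-9.-]+\.[a-z]{2,})", re.IGNORECASE)


@dataclass
class DomainInfo:
    name: str
    age_days: int          # from the WHOIS creation date
    registrar: str         # from WHOIS
    nameservers: tuple     # from WHOIS / DNS
    sbl_related: bool      # related servers have Spamhaus SBL records


def extract_domains(body: str) -> list:
    """Return the distinct host names found in http(s) links, lower-cased."""
    return sorted({host.lower() for host in URL_DOMAIN_RE.findall(body)})


def spamminess(info: DomainInfo, bad_registrars: set, bad_nameservers: set) -> float:
    """Combine several weak signals into one score (hypothetical weights)."""
    score = 0.0
    if info.age_days < 30:
        score += 2.0  # freshly registered domain
    if info.registrar in bad_registrars:
        score += 1.0
    if any(ns in bad_nameservers for ns in info.nameservers):
        score += 1.0
    if info.sbl_related:
        score += 2.0  # strong indicator, but never enough alone for 'review-high'
    return score


def triage(info: DomainInfo, sa_score: float, hardcore_nameservers: set,
           bad_registrars: set, bad_nameservers: set) -> str:
    """Return 'skip', 'auto-list', 'review-high', 'review-low' or 'discard'."""
    if sa_score < SA_THRESHOLD:
        return "skip"  # the filter never looks at this mail's domains
    if any(ns in hardcore_nameservers for ns in info.nameservers):
        return "auto-list"  # hypothetical hardcore-spammer signature
    score = spamminess(info, bad_registrars, bad_nameservers)
    required = 3.0 + info.age_days / 365  # older domains need more evidence
    if score >= required:
        return "review-high"  # still goes to manual inspection
    if score >= required / 2:
        return "review-low"
    return "discard"  # least suspicious: dropped for lack of time
```

In this sketch, only the "auto-list" bin bypasses a human. Both review bins feed the manual checks (WHOIS, the message itself, web and NANAS searches) described in the list.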