Preventing SPAM
Submitted by Hendry Matthew

Let us talk about spam. Yes, again! Spam is unsolicited bulk email: identical or nearly identical messages, usually containing advertising, sent to addresses that have often been culled from network traffic or databases without the consent of the recipients.
However, this time we are going to look at ways to prevent spam. First, we will see various methods to filter spam once it reaches the MTA. Then we will look at more robust technologies that stop spam before it enters the mail server.

Filtering Spam
Spam filtering is done using rule-based methods and statistical methods. Rule-based methods match regular expressions, or combinations of them, against the message to decide whether it is spam. Statistical spam filtering, on the other hand, is more robust and uses the probability that certain tokens occur in the mail.

Rule based Filtering - Server side
This is the basic method for filtering spam. The filter searches for specific words that are commonly found in spam. The probability of false positives (marking legitimate mail as spam) is very high, so it is best to combine this method with the methods described in the sections below. It is advisable to filter at the server itself, so that you avoid wasting bandwidth transporting unwanted spam to your local inbox.
Using .procmailrc + Sendmail or Postfix
Both Sendmail and Postfix can use procmail as the Mail Delivery Agent (MDA). Procmail can filter spam using rules (recipes) defined in .procmailrc. A few examples are given below. These can be expanded to suit your environment.
:0
* ^(Received:.*spam\.net|From:.*spammer@nigerian-scam\.com)
/dev/null
It is a crude way to filter, but it works if you often get spam from spam.net or from spammer@nigerian-scam.com. The two patterns are joined with | because separate condition lines in one recipe must all match before the recipe fires. All mail from these sources is delivered to /dev/null, that is, discarded. To keep the spam for inspection instead, replace /dev/null with a file name.
:0
* ^Content-Type:.*multipart/
* 1^1 B ?? ^Content-Type:.*application/x-msdownload
* 1^1 B ?? ^Content-Type:.*name=.*\.(exe|scr|pif|com|bat)
/dev/null
This sends executable attachments, the usual virus carriers, to /dev/null. Let them live in the time-space void forever.
:0
* ^(From|To|Sender|Reply-To):[    ]*.STRING-ADDED-BY-MAILSERVER
/dev/null
This catches unqualified addresses (addresses without a domain part), which are used almost exclusively by spammers. STRING-ADDED-BY-MAILSERVER must be replaced by the string that your mail server adds to such addresses.
:0
* ^Subject:.*(REMOVE ME|viagra)
/dev/null
This is the simplest filter, based on the subject line.
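For readers who prefer to experiment outside procmail, the header recipes above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the RULES table and the is_spam function are hypothetical names, the patterns simply mirror the example recipes, and matching is case-insensitive because procmail conditions are case-insensitive by default.

```python
import re
from email import message_from_string

# (header name, pattern) pairs mirroring the example recipes; illustrative only.
RULES = [
    ("Received", re.compile(r"spam\.net", re.IGNORECASE)),
    ("From", re.compile(r"spammer@nigerian-scam\.com", re.IGNORECASE)),
    ("Subject", re.compile(r"REMOVE ME|viagra", re.IGNORECASE)),
]


def is_spam(raw_message: str) -> bool:
    """Return True if any rule matches any occurrence of its header."""
    msg = message_from_string(raw_message)
    for header, pattern in RULES:
        # A header such as Received can occur several times in one message.
        for value in msg.get_all(header, []):
            if pattern.search(value):
                return True
    return False
```

As with the recipes, a legitimate message that happens to mention a listed word in its subject is flagged too, which is exactly the false-positive problem of rule-based filtering.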
.procmailrc + Exim
When Exim is used as the MTA, filtering can be done by asking Exim to invoke procmail via the .forward file. To enable this, add the following single line to the .forward file.
"|IFS=' ' && p=`which procmail` && test -f $p && exec $p -Yf- || exit 75 #username"
Once this is added to .forward, the procmail recipes described above will start working.
Drawbacks of rule based filtering
•    The percentage of false positives is high (about 5-10%).
•    Filtering is mostly English-specific.
•    The process is very difficult to automate and becomes impractical in large establishments.

Statistical method for filtering
Bayesian Filtering
Bayesian spam filters calculate the spam score of a message from the probabilities of the tokens found in the mail. Each token is assigned a value that may be positive (the token is more common in spam) or negative (it is more common in legitimate mail); when the message has been completely processed, these values are added up. The resulting value may also be positive or negative. If it is equal to or greater than the threshold for spam, the mail is marked as spam.
Unlike blind rule-based filters, a Bayesian spam filter learns from all the mail it sees. As time passes, the system adapts and matures, and the number of false positives decreases.
In general, Bayesian filtering sorts mail into good and bad, ham and spam, rather than looking only for spam. Hence, even if a ham message contains words or tokens found in spam, it will still be recognized as ham based on the overall probability of its tokens. The effect of "bad" tokens is nullified by innocent tokens. Spammers can try to exploit this by inserting innocent tokens into their spam.
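The token scoring described above can be sketched in Python as a small naive Bayes classifier. This is a minimal illustration, not how any particular filter is implemented: the BayesFilter class, its tokenizer, the add-one smoothing and the threshold of 0 are all assumptions made for the example, and equal prior probabilities for spam and ham are assumed.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy Bayesian filter: sums per-token log-odds (spam versus ham)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokenize(text))

    def token_score(self, token):
        # Add-one smoothing keeps unseen tokens from producing log(0).
        p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return math.log(p_spam / p_ham)  # positive: spammy, negative: hammy

    def score(self, text):
        return sum(self.token_score(t) for t in self.tokenize(text))

    def is_spam(self, text, threshold=0.0):
        return self.score(text) >= threshold
```

After training on a handful of spam and ham messages, a message such as "cheap lunch meeting" scores as ham when "lunch" and "meeting" are common in the ham corpus: the innocent tokens outweigh the single spammy one.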
The advantages of statistical spam filters are given below.
1.    They are effective.
2.    They generate few false positives.
3.    They learn.
4.    They let each user define spam.
5.    They are hard to trick.
SpamAssassin: The most popular Bayesian Filter
SpamAssassin is developed under the Apache Software Foundation, and the latest version (version 3.0) uses Bayesian filtering to filter spam. SpamAssassin comes in two main flavors: an on-demand scanner and a daemon. The former is invoked every time a message comes in; the latter runs continuously in memory and scans all incoming messages. This article focuses on the latter approach.
SpamAssassin is a complete set of tools that fights spam in several ways. It comprises three executables: spamassassin and spamd (Perl scripts) and spamc (a C program).
The Perl scripts run in taint mode for security reasons. The C program spamc is intended to be called from other programs. The core Perl module is Mail::SpamAssassin (spam detector and markup engine), with the Bayes classifier in Mail::SpamAssassin::Bayes; its functionality can be extended via plugins and add-on modules. SpamAssassin comes with a Bayes algorithm which "learns" to recognize new spam on the basis of old messages (both spam and ham). This makes it possible for the software to adapt automatically and identify spam even in the absence of specific header or body tests.
An (automatic) whitelist system makes it easy to list e-mail addresses that you already know or have verified as valid; messages from these senders are exempted from further filtering and are routed directly to your mailbox.
How does it work?
SpamAssassin works by performing a range of tests on every message it sees. A large number of tests are provided, including checks on whether the sender address and IP, recipient address, message dates and so on are valid, whether the message body contains any words from a locally stored list of forbidden words, and whether any of the sending servers are blacklisted. Each test that matches adds to the message's overall spam score, and messages whose score exceeds a user-defined threshold are treated as spam; they can either be deleted or marked with a special spam header for further processing by other programs.
SpamAssassin can be configured on a mail server with an MTA such as Postfix, Exim or Sendmail. In such a setup, the MTA accepts the mail and passes it to the MDA, which gives SpamAssassin control over the mail before completing the processing. SpamAssassin checks the headers and body of the message and then modifies the message headers or takes another action, depending on the configuration file. Whenever a message comes in, SpamAssassin scans it using the rules in /etc/spamassassin.rules or $HOME/.spamassassin/user_prefs. Finally, the MDA delivers the mail to the correct location.
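The additive scoring described above can be illustrated with a short Python sketch. It is not SpamAssassin code: the test names, the individual scores, the BLACKLIST set and the message dictionary layout are all hypothetical, chosen only to show how independent tests contribute to one score that is compared with a threshold. The threshold of 5.0 matches SpamAssassin's default required score.

```python
import re

# Hypothetical blacklist of sending IPs (documentation address range).
BLACKLIST = {"203.0.113.9"}

# (name, score, predicate) triples; names and scores are made up for illustration.
TESTS = [
    ("FORBIDDEN_WORD", 2.5,
     lambda m: bool(re.search(r"\bviagra\b", m["body"], re.IGNORECASE))),
    ("SENDER_BLACKLISTED", 3.0, lambda m: m["sender_ip"] in BLACKLIST),
    ("MISSING_DATE", 1.0, lambda m: not m.get("date")),
]


def score_message(message):
    """Run every test and return (total score, names of the tests that hit)."""
    hits = [(name, score) for name, score, test in TESTS if test(message)]
    return sum(score for _, score in hits), [name for name, _ in hits]


def classify(message, threshold=5.0):
    total, hits = score_message(message)
    return ("spam" if total >= threshold else "ham"), total, hits
```

A single weak hit (for example a missing date) leaves a message well below the threshold; only the combination of several tests pushes it over, which is why this approach produces fewer false positives than a single keyword rule.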
The various actions that SpamAssassin can perform are:
•    submit to a distributed spam-detection network,
•    mark as spam,
•    move to a separate spam box,
•    write the information collected in the form of tokens to various databases.
SpamAssassin is designed so that plugins can be added to it very easily, and it can be combined with various mail clients such as Mutt and Outlook Express. In addition to the functionality mentioned above, SpamAssassin supports Hashcash and SPF; it can submit mail marked as spam to distributed spam-filtering networks such as Vipul's Razor and perform lookups in various DNSBLs. Hashcash and SPF are discussed in later sections.

Integrating with mail servers
In mail servers where the same binary handles both MTA and MDA functions (e.g. Exim), configuration is done in the main configuration file of the SMTP daemon. With SMTP servers that use an MDA such as procmail, we can have the MDA forward the mail to SpamAssassin.
For example, in the case of Postfix and Sendmail, add the following to the user's .procmailrc.
# Pipe each message through spamc; correct the path to spamc for your system
:0fw
| /usr/bin/spamc
However, this is a very basic configuration and is impractical on servers with heavy traffic. We can use a program called amavisd-new as an interface between the MTA and content filters (antivirus software, spam filters and so on). A combination of Postfix, amavisd-new and SpamAssassin is claimed to be the best method to block both spam and viruses. More about the topic can be found here: http://www.linuxplanet.com/linuxplanet/tutorials/5561/3/.

Blacklists – RBL
RBLs are of practically no use. They have been in existence for years, but they have never worked beyond a certain level. Spammers are too fast and too smart to be blocked by RBLs. Moreover, blacklists can be used as a weapon against web hosting providers or companies by adding their IPs, or domains hosted with them, to the lists. The chance of generating false positives and causing harm is higher with such blacklists. So, we do not discuss them here.

Filtering is brain dead
I believe that spam "filtering" is a brain-dead method of fighting spam. It is far better to prevent spam from being sent than to fight it by filtering. Prevention saves a lot of the network bandwidth and CPU time otherwise spent processing spam. It is believed that more than 30% of the email sent in a day is spam. Hence, in the following sections we will discuss methods other than filtering to prevent spam.

HashCash
Hashcash is a denial-of-service countermeasure tool. Its main current use is to help Hashcash users avoid losing email to content-based and blacklist-based anti-spam systems. Hashcash stamps each message with an X-Hashcash: header, and filtering systems and blacklists are encouraged to exempt mail that carries a valid stamp.
How does it work?
The basic idea is that clients must do some work before they can send mail (proof-of-work). They spend this proof of labor like money to get service. Hashcash creates the stamp in much the same way as an md5sum, but it uses SHA-1: the sender searches for a stamp whose SHA-1 hash begins with a required number of zero bits. The work required to compute the stamp can be made arbitrarily expensive (from fractions of a second to hours). This process is called minting. At the receiving end, the receiver checks the stamp using the checking function; if the proof-of-work value is too low or bogus, it simply rejects the mail. The validity of a stamp is set to 28 days by default; after this period, the stamp expires. Expiry is necessary because receivers must remember the stamps they have already accepted in order to reject reuse; without it, that record would grow without bound.
Since each mail requires a considerable amount of work, sending mail in bulk becomes very expensive for spammers, and as a result the total volume of spam decreases considerably.
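Minting and checking can be sketched in Python as follows. This is a simplified illustration assuming the version 1 stamp layout (version, bits, date, resource, extension, random salt, counter, separated by colons) with a six-digit date; the function names are hypothetical, and a real checker would also keep a database of spent stamps to reject reuse, which is omitted here.

```python
import hashlib
import secrets
from datetime import date, datetime, timedelta
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int = 12, day: date = None) -> str:
    """Search for a counter that gives the stamp's SHA-1 `bits` leading zero bits."""
    day = day or date.today()
    salt = secrets.token_hex(8)
    prefix = f"1:{bits}:{day:%y%m%d}:{resource}::{salt}:"
    for counter in count():
        stamp = f"{prefix}{counter:x}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check(stamp: str, resource: str, min_bits: int = 12,
          today: date = None, max_age_days: int = 28) -> bool:
    """Verify the recipient, the age and the proof-of-work of a stamp."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or not fields[1].isdigit():
        return False
    claimed_bits = int(fields[1])
    if claimed_bits < min_bits or fields[3] != resource:
        return False
    try:
        minted = datetime.strptime(fields[2], "%y%m%d").date()
    except ValueError:
        return False
    if (today or date.today()) - minted > timedelta(days=max_age_days):
        return False  # stamp has expired
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= claimed_bits
```

Each extra bit doubles the average minting work while checking stays a single hash, which is the asymmetry the scheme relies on. The small default of 12 bits is chosen only so that the example runs instantly.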

SPF: Sender Policy Framework
This technology works by keeping a record of the locations (IPs) from which a domain's users send mail, so forging the sender's domain becomes much harder. If spammers want to send mail, they have to use their own identity to do so, and once the source of the spam is detected we can simply block it. SPF works by having domains publish "reverse MX" records that tell the world which machines send mail for the domain. When receiving a message from a domain, the recipient can check those records to make sure the mail is coming from where it should be coming from. With SPF, these records can be published with just one line in the domain's DNS zone.
How to do it?
Add a single record of type TXT to your DNS zone in the format given below.
domainname.com. TXT "v=SPF_version_identifier default_mechanism"
This announces which computers are allowed to deliver e-mail from your domain. It is checked by the receiving SMTP server even before the content is accepted, which significantly reduces bandwidth usage. Detailed information about configuration for various scenarios such as shared hosting can be found at the implementer's site: http://spf.pobox.com
The working of the SPF protocol can be described in 3 steps.
1.    A user sends mail from sender.com or a spammer forges from sender.com to a user at receiver.com.
2.    The SMTP server at receiver.com checks sender.com's SPF record.
3.    If the origin is not listed, receiver.com gives the message a fail.
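The check in steps 2 and 3 can be sketched in Python. This is a deliberately reduced illustration: the evaluate_spf function is a hypothetical name, the record is passed in as a string instead of being fetched from DNS, and only the ip4, ip6 and all mechanisms are evaluated. Mechanisms that need further DNS lookups (such as a, mx and include) are skipped.

```python
import ipaddress

# Qualifier prefixes and the result each one produces when its mechanism matches.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, sender_ip: str) -> str:
    """Evaluate a simplified SPF record against the connecting IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return RESULTS[qualifier]
        elif name == "all":
            return RESULTS[qualifier]  # catch-all, normally the last term
        # Other mechanisms require DNS lookups and are ignored in this sketch.
    return "neutral"
```

With the record "v=spf1 ip4:192.0.2.0/24 -all", mail arriving from 192.0.2.10 passes, while mail from any other address hits the final -all and fails, as in step 3.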
SPF is probably not the end of spam altogether, but it might be the end of spam as we know it today: arriving in masses, spoofing e-mail addresses and, most of all, severely annoying.
Possible flaw in SPF
Recently invented techniques such as invisible bulletproof hosting, in which certain hosting companies provide untraceable domains on dynamically changing web space, can weaken SPF to a certain extent. If the content-filtering software installed on the server marks incoming mail as spam and adds the originating IP to spam blocking lists, innocent end users will suffer. In the worst case, this could bring the entire Internet to its knees by corrupting the whole IP address space. (Note that passing an SPF check does not mean that the mail is not spam.)
An example: gmail.com
You can simply type "dig gmail.com txt" to see the SPF entries. Mail that has been checked contains a Received-SPF: field in its header. A list of domains supporting SPF can be found here: http://personal.telefonica.terra.es/web/news/spf/

Domain Keys
This is a method proposed by Yahoo.com to fight spam. It is fairly easy to understand, there is no centralized authority, and there is no need to change existing protocols. The mail server generates a public/private key pair and publishes the public key as part of its DNS records. Each outgoing mail is signed with the secret private key. The signature can then be used to verify that the mail is not forged. In this way, the presence or lack of a valid signature can be used to help classify mail as spam or ham.
How does it work?
A rough overview of how Domain Keys works is given below.
1.    Generate a private key/public key pair and add the public key to a TXT record in the domain's DNS.
2.    Each outgoing mail is signed with the private key (rsa-sha1), and the signature is added to the e-mail header.
3.    The receiving SMTP server verifies the sender by checking the signature against the public key available in the TXT record of the sending domain.
4.    If the signature verifies against the sending domain's public key, the message is accepted.
5.    If the test fails, the message is rejected.

An Example: gmail.com
Gmail launched Domain Keys around 15th September 2004, and it is still in the beta stage. The headers of mail coming from gmail.com look something like this:
DomainKey-Signature: a=rsa-sha1; c=nofws;
    s=beta; d=gmail.com;
    h=received:message-id:date:from:reply-to:to:subject:
    mime-version:content-type:content-transfer-encoding;
    b=UHWIvAn9.....jw5mJ7H+A
The various tokens in the header are:
•    s=beta is the selector, i.e. "beta" names the key to look up under _domainkey in the sending domain's DNS.
•    d=gmail.com is the sending domain's name.
•    a=rsa-sha1 is the algorithm used to generate the signature.
•    b=UHWIvA...7H+A is the signature of the message.
The following DNS query can be used to get the Domain Keys information for a domain, where "selector" is the value of the s= tag.
dig selector._domainkey.domain.com TXT
The Domain Keys information for gmail can be found by issuing "dig beta._domainkey.gmail.com TXT".
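Splitting the header into its tags and deriving the DNS name to query can be sketched in Python. The two function names are hypothetical, and the sketch covers only parsing; verifying the RSA signature itself would need a cryptography library and is left out.

```python
def parse_domainkey_signature(header_value: str) -> dict:
    """Split a DomainKey-Signature header value into a tag -> value mapping."""
    tags = {}
    for part in header_value.split(";"):
        # Split on the first "=" only, since base64 values may end in "=".
        name, sep, value = part.partition("=")
        if sep:
            # Remove the whitespace introduced by header folding.
            tags[name.strip()] = "".join(value.split())
    return tags


def key_query_name(tags: dict) -> str:
    """Build the DNS name that holds the public key for this signature."""
    return f"{tags['s']}._domainkey.{tags['d']}"
```

For the gmail.com header shown earlier, the tags s=beta and d=gmail.com yield beta._domainkey.gmail.com, the same name used in the dig command.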

Conclusion
These are some of the general methods used for spam prevention. New protocols such as Sender ID from Microsoft are in development, and some time in the near future we can hope to live in a world without spam and spammers.

Addendum
1.    Content filtering and Bayesian logic can also be used for CRM (Customer Relationship Management). A company usually manages communication via various mail addresses such as info@company.com and careers@company.com. By using Bayesian filtering, we can avoid using multiple mail addresses or sorting mail manually based on the subject line. The system requires initial training; after that, it does the work for us automatically. Spam can be deleted automatically. If there are any unclassified mails, we can sort them manually.
2.    Popmail is an excellent program that can be used on the client side to filter spam. The main advantage of the program is that it performs the filtering against the server and deletes spam from the server itself. It can be used in combination with programs like fetchmail. The popmail project is hosted on Sourceforge.net.


© Copyright 2006 , The Web Hosting Herald. All rights reserved.