22 March 2011

AntiSamy

By Andrew Clifford

AntiSamy is a very effective open-source library for making web sites more secure.

Cross-site scripting (XSS) describes a broad category of web site security problems, in which malicious code is inserted into web pages. The malicious code then runs on the user's browser under the same security profile as the original website. This can allow, for example, the malicious code to steal the user's password or cause other disruption.

XSS is a serious problem, and accounts for the majority of web-based security threats. One of the most notorious XSS attacks was the Samy worm inflicted on MySpace in October 2005, which infected 1,000,000 MySpace users in less than 24 hours.

The most important part of avoiding XSS attacks is to prevent users from adding JavaScript to pages, so that they cannot get code to run in other users' browsers. For most websites, this is not a problem, because users do not need to enter any data. However, if a site allows users to enter content that is redisplayed on the website, such as comments, then there is potential for problems.

There are various approaches to preventing users from adding JavaScript to content.

The easiest is simply to prevent users from entering anything other than plain text, and to use standard HTML escape codes to represent any special characters in the content. This works well, but it means that your users cannot add any formatting to their content.
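As a rough illustration of the plain-text approach, the sketch below escapes the characters that are special in HTML before the content is redisplayed. The class name HtmlEscaper and its escape method are hypothetical names chosen for this example, and it uses only the Java standard library.

```java
// Hypothetical helper for illustration; not part of AntiSamy.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Replaces the five characters that are special in HTML text and attribute values. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = "<script>alert('hi')</script>";
        // prints: &lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;
        System.out.println(escape(comment));
    }
}
```

The browser then displays the escaped comment as literal text rather than running it, which is why this approach is safe but leaves no room for user formatting.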

Another method is to use your own set of limited formatting codes, for example using simplified "bulletin-board code" markup such as *bold* and /italic/. This can work, but the codes are non-standard and the conversion is difficult to implement well.
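A minimal sketch of this idea, assuming only the two codes mentioned above, might escape the input first and then convert the codes in a single pass. The class SimpleMarkup, its methods and the regular expression are all assumptions made for this example, not an established format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for illustration; the markup rules are assumptions.
final class SimpleMarkup {
    // Matches *bold* or /italic/ spans that contain no further '*' or '/'.
    private static final Pattern SPAN =
            Pattern.compile("\\*([^*/]+)\\*|/([^*/]+)/");

    private SimpleMarkup() {}

    /** Escapes HTML special characters so user input cannot introduce tags. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    /** Escapes the input, then turns *x* into bold and /x/ into italic. */
    static String render(String input) {
        Matcher m = SPAN.matcher(escape(input));
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String html = (m.group(1) != null)
                    ? "<b>" + m.group(1) + "</b>"
                    : "<i>" + m.group(2) + "</i>";
            m.appendReplacement(out, Matcher.quoteReplacement(html));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <b>bold</b> and <i>italic</i>
        System.out.println(render("*bold* and /italic/"));
    }
}
```

Even this small sketch shows why the approach is hard to get right: it does not support nesting, and ordinary text that happens to contain two slashes, such as a URL, would be partly italicised.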

Another option is to allow users to enter a limited set of HTML markup, and then to filter the HTML that they enter to remove any JavaScript or other malicious code. However, as the technical explanation of Samy demonstrates, there are all sorts of ways to get around filtering.
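To give a feel for why simple filtering is fragile, the sketch below shows one well-known way around a filter that just deletes script tags. It is a generic illustration, not the technique used by Samy, and the class NaiveFilterDemo and its method are hypothetical.

```java
// Hypothetical naive filter, shown only to demonstrate how it can be bypassed.
final class NaiveFilterDemo {
    private NaiveFilterDemo() {}

    /** Deletes literal opening and closing script tags in a single pass. */
    static String stripScriptTags(String html) {
        return html.replaceAll("(?i)</?script>", "");
    }

    public static void main(String[] args) {
        // The attacker hides one script tag inside another.
        String attack = "<scr<script>ipt>alert(1)</scr</script>ipt>";

        // Removing the inner tags joins the outer fragments back together.
        // prints: <script>alert(1)</script>
        System.out.println(stripScriptTags(attack));
    }
}
```

The filter removes exactly what it was told to remove, and in doing so it assembles the very tag it was meant to block. Real attacks use many more variations than this one.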

This was the background to AntiSamy, part of The Open Web Application Security Project (OWASP).

AntiSamy is an open-source code library that you can add to web applications to filter user-provided HTML content and remove intricate XSS attacks such as Samy. The main version is written in Java, and a version is also available for .NET.

You only need a few lines of code to add AntiSamy to a web application. It takes a string of text as input, mends invalid HTML, and removes everything other than the allowed markup. It returns the filtered HTML as a string or as XML. It returns friendly error messages to help users understand how their input has been interpreted.

AntiSamy is configured using a policy file, which describes what markup should be allowed. A selection of pre-built policy files is available.

The documentation for AntiSamy is brief, and I found it took a while to work out what other libraries it depends on. However, the code works very well, it is easy to use, and it is fast.

If you are responsible for websites that take formatted content from users, then I recommend you look at AntiSamy.
