Research, training, consultancy and software to reduce IT costs
AntiSamy is a very effective open-source library for making web sites more secure.
Cross-site scripting (XSS) describes a broad category of website security problems in which malicious code is inserted into web pages. The malicious code then runs in the user's browser under the same security profile as the original website. This can, for example, allow the malicious code to steal the user's password or cause other disruption.
XSS is a serious problem, and is one of the most common classes of web security vulnerability. One of the most notorious XSS attacks was the Samy worm inflicted on MySpace in October 2005, which infected over one million MySpace users in less than 24 hours.
There are several ways to defend against XSS. The easiest is simply to treat everything users enter as plain text, and use standard HTML escape codes to represent any special characters in the content. This works well, but it means that your users cannot apply any formatting to their content.
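The escaping approach can be sketched in a few lines of Java. This is a minimal illustration rather than a complete defence: the class name HtmlEscaper and the method escape are made up for this example, and the sketch assumes the escaped text is placed in ordinary HTML element content or a quoted attribute value.

```java
// Minimal sketch: replace the five characters that are significant in HTML
// with their standard escape codes. HtmlEscaper is a hypothetical name.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Returns the text with HTML-significant characters escaped. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
        System.out.println(escape("<script>alert('xss')</script>"));
    }
}
```

Because every angle bracket is escaped, the browser displays the markup as text instead of interpreting it, which is also why users lose the ability to format their content.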
Another method is to use your own set of limited formatting codes, for example using simplified "bulletin-board code" markup such as *bold* and /italic/. This can work, but the codes are non-standard and it is difficult to do well.
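As a rough illustration of the custom-markup approach, and of why it is hard to do well, here is a minimal Java sketch. The class name SimpleMarkup, the method toHtml, and the exact matching rules are assumptions made for this example; they do not come from any standard or existing library.

```java
import java.util.regex.Pattern;

// Hypothetical converter for *bold* and /italic/ codes. All user input is
// escaped first, so the only tags in the output are the ones added here.
final class SimpleMarkup {
    // A delimiter only counts if it does not touch a word character (or a
    // second delimiter) on the outside, so paths such as /usr/local are kept.
    private static final Pattern ITALIC =
            Pattern.compile("(?<![\\w/])/([^/\\n]+)/(?![\\w/])");
    private static final Pattern BOLD =
            Pattern.compile("(?<![\\w*])\\*([^*\\n]+)\\*(?![\\w*])");

    private SimpleMarkup() {}

    static String toHtml(String text) {
        // Escape '&' first so the other replacements are not double-escaped.
        String safe = text.replace("&", "&amp;")
                          .replace("<", "&lt;")
                          .replace(">", "&gt;")
                          .replace("\"", "&quot;")
                          .replace("'", "&#39;");
        // Italic runs before bold so the '/' in an inserted </b> is never
        // mistaken for an italic delimiter.
        String italic = ITALIC.matcher(safe).replaceAll("<i>$1</i>");
        return BOLD.matcher(italic).replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        // prints: This is <b>bold</b> and <i>italic</i> text
        System.out.println(toHtml("This is *bold* and /italic/ text"));
    }
}
```

Even with these guards, ordinary text can be misread: the input `3 / 4 / 5` comes out with the 4 in italics. Edge cases like this are what make home-grown formatting codes difficult to get right.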
AntiSamy is an open-source library which you can add to web applications to filter user-provided HTML content and strip out even intricate XSS attacks such as the Samy worm. The main version is written in Java, and a version is also available for .NET.
You only need a few lines of code to add AntiSamy to a web application. It takes a string of text as input, mends invalid HTML, and removes everything other than the allowed markup. It returns the filtered HTML as a string or as XML, along with friendly error messages to help users understand how their input has been interpreted.
AntiSamy is configured using a policy file which describes what markup should be allowed. A selection of pre-built policy files is available.
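To make the idea of keeping only allowed markup concrete, here is a toy Java sketch that uses nothing beyond the standard library. It is not AntiSamy's API or algorithm: AntiSamy parses and repairs the HTML and applies a policy file, whereas this sketch escapes everything and then restores a fixed list of attribute-free tags. The class name ToyAllowlistFilter, the method filter, and the tag list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy allowlist filter, for illustration only. It does not parse HTML,
// balance tags, or support attributes; a real application should rely on a
// maintained library such as AntiSamy instead.
final class ToyAllowlistFilter {
    // Hypothetical stand-in for a policy: the only tags let through.
    private static final List<String> ALLOWED_TAGS =
            Arrays.asList("b", "i", "em", "strong", "p");

    private ToyAllowlistFilter() {}

    static String filter(String html) {
        // Step 1: neutralise everything by escaping it ('&' goes first).
        String safe = html.replace("&", "&amp;")
                          .replace("<", "&lt;")
                          .replace(">", "&gt;")
                          .replace("\"", "&quot;")
                          .replace("'", "&#39;");
        // Step 2: restore only exact, attribute-free allowed tags.
        for (String tag : ALLOWED_TAGS) {
            safe = safe.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                       .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return safe;
    }

    public static void main(String[] args) {
        // prints: <b>Hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(filter("<b>Hi</b><script>alert(1)</script>"));
    }
}
```

The sketch is deliberately strict: a tag carrying any attribute, such as `<b onclick="...">`, stays escaped, and its closing tag is left dangling because nothing here mends the markup. Repairing invalid HTML and reporting friendly error messages are exactly the parts that a library like AntiSamy adds.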
The documentation for AntiSamy is brief, and it took me a while to work out which other libraries it depends on. However, the code works very well, is easy to use, and is fast.
If you are responsible for websites that take formatted content from users, then I recommend you look at AntiSamy.
Minimal IT: research, training, consultancy and software to reduce IT costs.