PHP: Preventing Cross-Site Scripting Attacks

Fortunately, as easily as an XSS attack can carried out against an unprotected website, protecting against them are just as easy. Prevention must always be in your thoughts, though, even before you write a single line of code.

The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never trust data coming from the user or from any other third party sources. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.

In order to implement solid security measures which prevents XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.

Data Validation

Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data would be discarded. Every piece of user data must be validated when it is received to ensure it is of the corrected type, and discarded if it doesn’t pass the validation process.

If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters such as plus, parenthesis, and dashes which are often used in formatting phone numbers specific to your intended locale.

<?php
// validate a US phone number
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/'$phone)) {
echo $phone " is valid format.";
}

Data Sanitization

Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.

<?php
// sanitize HTML from the comment
$comment strip_tags($_POST["comment"]);
?>

Sometimes, data validation and sanitization/normalization can go hand in hand.

<?php
// normalize and validate a US phone number
$phone = preg_replace('/[^\d]/'""$phone);
$len strlen($phone);
if ($len == 7 || $len == 10 || $len == 11) {
echo $phone " is valid format.";
}
?>

Output Escaping

In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.

<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);

All Together Now!

To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to thecomments.txt file which is then displayed directly to the user. To secure it, the$_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.

<?php
// validate comment
$comment = trim($_POST["comment"]);
if (empty($comment)) {
 exit("must provide a comment");
}

// sanitize comment
$comment strip_tags($comment);
// comment is now safe for storage
file_put_contents("comments.txt"$comment, FILE_APPEND);
// escape comments before display
$comments file_get_contents("comments.txt");
echo htmlspecialchars($comments);

The script first validates the incoming comment to make sure a non-zero length string as been provided by the user. After all, a blank comment isn’t very interesting.

Data validation needs to happen within a well defined context, meaning that if I expect an integer back from the user, then I validate it accordingly by converting the data into an integer and handle it as an integer. If this results in invalid data, then simply discard it and let the user know about it.

Then the script sanitizes the comment by removing any HTML tags it may contain.

And finally, the comments are retrieved, filtered, and displayed.

Generally the htmlspecialchars() function is sufficient for filtering output intended for viewing in a browser. If you’re using a character encoding in your web pages other than ISO-8859-1 or UTF-8, though, then you’ll want to use htmlentities(). For more information on the two functions, read their respective write-ups in the official PHP documentation.

Bare in mind that no single solution exists that is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up to date XSS test vectors. Using the test data from the following sources should reveal if your code is still prone to XSS attacks.