Fortunately, as easily as an XSS attack can be carried out against an unprotected website, protecting against one is just as easy. Prevention must always be in your thoughts, though, even before you write a single line of code.
The first rule, which needs to be “enforced” in any web environment (be it development, staging, or production), is to never trust data coming from the user or from any other third-party source. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.
To implement solid security measures that prevent XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.
Data Validation
Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data should be discarded. Every piece of user data must be validated when it is received to ensure it is of the correct type, and discarded if it doesn’t pass the validation process.
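To make the integer case concrete, here is a minimal sketch that uses PHP's built-in filter_var() with FILTER_VALIDATE_INT. The helper name validate_int_input() and the 1 to 100 range are assumptions made for illustration, not part of the original example.

```php
<?php
// Hypothetical helper: returns the validated integer, or null when the
// input is not an integer within the allowed range.
function validate_int_input($raw, int $min, int $max): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        "options" => ["min_range" => $min, "max_range" => $max],
    ]);

    // filter_var() returns false on failure; map that to null.
    return $value === false ? null : $value;
}

var_dump(validate_int_input("42", 1, 100));       // accepted
var_dump(validate_int_input("42abc", 1, 100));    // rejected: not an integer
var_dump(validate_int_input("<script>", 1, 100)); // rejected: not an integer
```

Anything that comes back as null would be discarded, and the user told that the value was not accepted.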
If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters such as plus signs, parentheses, and dashes, which are often used in formatting phone numbers specific to your intended locale.
<?php
// validate a US phone number
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
    echo $phone . " is valid format.";
}
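The more permissive approach described above could be sketched as follows. The function name is_plausible_us_phone() is hypothetical, and allowing spaces in addition to plus signs, parentheses, and dashes is an assumption on my part; adjust the character set to your intended locale.

```php
<?php
// Hypothetical helper: permissive format check for a US-style phone number.
function is_plausible_us_phone(string $phone): bool
{
    // Allow an optional leading plus, then only digits, parentheses,
    // spaces, and dashes. \z anchors at the very end of the string.
    if (!preg_match('/^\+?[\d() -]+\z/', $phone)) {
        return false;
    }

    // Ignore the formatting characters and count the digits only.
    $digits = preg_replace('/\D/', '', $phone);
    $len = strlen($digits);

    // 10 digits, or 11 digits starting with the US country code 1.
    return $len === 10 || ($len === 11 && $digits[0] === '1');
}

var_dump(is_plausible_us_phone("(555) 123-4567"));  // bool(true)
var_dump(is_plausible_us_phone("+1 555-123-4567")); // bool(true)
var_dump(is_plausible_us_phone("555-CALL-NOW"));    // bool(false)
```

Checking the character set first and the digit count second keeps the regular expression short and easy to review.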
Data Sanitization
Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.
<?php
// sanitize HTML from the comment
$comment = strip_tags($_POST["comment"]);
?>
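To see what this kind of sanitization actually does to a string, here is a small sketch. The helper name sanitize_plain_text() and the whitespace normalization step are assumptions added for illustration. Note that strip_tags() removes the tags themselves but keeps the text between them.

```php
<?php
// Hypothetical helper: reduce user input to normalized plain text.
function sanitize_plain_text(string $input): string
{
    // Remove HTML tags; the text between the tags is kept.
    $text = strip_tags($input);

    // Normalize: collapse runs of whitespace into a single space.
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}

echo sanitize_plain_text("  Hello <b>world</b>\n<script>alert(1)</script>  ");
// prints: Hello world alert(1)
```

The leftover alert(1) is harmless plain text at this point, but it shows why sanitization alone is not a substitute for escaping on output.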
Sometimes, data validation and sanitization/normalization can go hand in hand.
<?php
// normalize and validate a US phone number
$phone = preg_replace('/[^\d]/', "", $phone);
$len = strlen($phone);
if ($len == 7 || $len == 10 || $len == 11) {
    echo $phone . " is valid format.";
}
?>
Output Escaping
In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.
<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);
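When the value ends up inside an HTML attribute rather than in body text, quotes matter as well. The following is a minimal sketch, assuming UTF-8 pages; the wrapper name e() is hypothetical, while the ENT_QUOTES and ENT_SUBSTITUTE flags and the encoding argument are standard htmlspecialchars() parameters.

```php
<?php
// Hypothetical wrapper so every template escapes the same way.
function e(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes;
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// A value that tries to break out of the attribute.
$query = '"><script>alert(1)</script>';

echo '<input type="text" name="query" value="' . e($query) . '">';
```

Because the double quote is turned into an entity, the injected markup stays inside the value attribute as inert text.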
All Together Now!
To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to the comments.txt file, which is then displayed directly to the user. To secure it, the $_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.
<?php
// validate comment (a missing field is treated as an empty comment)
$comment = trim($_POST["comment"] ?? "");
// compare against "" rather than using empty(), which would also reject "0"
if ($comment === "") {
    exit("must provide a comment");
}

// sanitize comment
$comment = strip_tags($comment);

// comment is now safe for storage; the newline keeps entries separate
file_put_contents("comments.txt", $comment . PHP_EOL, FILE_APPEND);

// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);
The script first validates the incoming comment to make sure a non-zero length string has been provided by the user. After all, a blank comment isn’t very interesting.
Data validation needs to happen within a well-defined context: if you expect an integer back from the user, validate it accordingly by converting the data into an integer and handling it as an integer. If this results in invalid data, simply discard it and let the user know about it.
Then the script sanitizes the comment by removing any HTML tags it may contain.
And finally, the comments are retrieved, escaped, and displayed.
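The same three steps can also be wrapped in functions, which makes them easier to test in isolation. This is only a sketch: the names store_comment() and render_comments(), the one-comment-per-line file format, and the list-item markup are all assumptions, not part of the original comment system.

```php
<?php
// Hypothetical helper: validate and sanitize a comment, then append it.
// Returns false when there is nothing worth storing or the write fails.
function store_comment(string $file, string $raw): bool
{
    // Sanitize: drop HTML tags, then collapse whitespace (including line
    // breaks) so each comment occupies exactly one line in the file.
    $comment = trim(preg_replace('/\s+/', ' ', strip_tags($raw)));

    // Validate: refuse empty comments.
    if ($comment === '') {
        return false;
    }

    return file_put_contents($file, $comment . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Hypothetical helper: read the stored comments and escape each on output.
function render_comments(string $file): string
{
    if (!is_file($file)) {
        return '';
    }

    $html = '';
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $html .= '<li>' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return $html;
}
```

A request handler would call store_comment("comments.txt", $_POST["comment"] ?? "") and then echo render_comments("comments.txt") inside a list element.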
Generally, the htmlspecialchars() function is sufficient for escaping output intended for viewing in a browser. If you’re using a character encoding in your web pages other than ISO-8859-1 or UTF-8, though, then you’ll want to use htmlentities(). Both functions accept an encoding argument, which should match the encoding of your pages. For more information on the two functions, read their respective write-ups in the official PHP documentation.
Bear in mind that no single solution exists that is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up-to-date XSS test vectors. Using the test data from the following sources should reveal if your code is still prone to XSS attacks.
- RSnake XSS cheatsheet (a pretty comprehensive list of XSS vectors you can use to test your code)
- Zend Framework’s XSS test data
- XSS cheatsheet (makes use of HTML5 features)
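A test run against such vectors can be as simple as the sketch below. The handful of probe strings are generic, widely known examples rather than entries copied from the lists above, and the function names escape_html() and find_unescaped() are hypothetical. The check only covers output placed in HTML body text or quoted attributes.

```php
<?php
// A few generic XSS probe strings; real cheat sheets contain far more.
$vectors = [
    '<script>alert(1)</script>',
    '"><img src=x onerror=alert(1)>',
    "'><svg onload=alert(1)>",
    '<a href="javascript:alert(1)">x</a>',
];

// The escaping routine under test (assumes UTF-8 pages).
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Returns the vectors whose escaped form still contains a raw
// angle bracket or quote character.
function find_unescaped(callable $escape, array $vectors): array
{
    $failures = [];
    foreach ($vectors as $vector) {
        if (preg_match('/[<>"\']/', $escape($vector))) {
            $failures[] = $vector;
        }
    }
    return $failures;
}

$failures = find_unescaped('escape_html', $vectors);
echo count($failures) === 0
    ? "all vectors escaped\n"
    : count($failures) . " vector(s) slipped through\n";
```

Passing your own output function as the first argument lets you run the same vectors against whatever escaping your templates really use.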