Sanitizing User Input: Preventing XSS Attacks

Template strings are a useful feature in JavaScript that allow you to build strings with embedded, dynamic content. However, when you use template strings to generate HTML dynamically, you risk introducing security vulnerabilities such as cross-site scripting (XSS). This happens when user-generated content is inserted into the HTML without first being escaped or sanitized.

To understand the risk of XSS attacks, imagine that you have a website that allows users to submit comments. If you use template strings to generate HTML for these comments and insert it directly into your page, a malicious user could inject JavaScript into a comment that then runs in the browser of every user who views it. This could allow the attacker to steal sensitive information from those users or perform actions on their behalf.
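To make the risk concrete, here is a minimal sketch of the vulnerable pattern in plain JavaScript (it runs in Node or a browser console). The renderComment function and the sample payload are hypothetical, chosen only for illustration. The payload uses an onerror attribute rather than a <script> tag because browsers don't execute <script> elements inserted through innerHTML, but they do fire event handlers on elements inserted that way.

```javascript
// Hypothetical helper that builds a comment's markup with a plain template string.
// Nothing escapes `user` or `text`, so any markup they contain is copied verbatim.
function renderComment(user, text) {
  return `<li><strong>${user}:</strong> <span>${text}</span></li>`;
}

// A comment an attacker might submit: when the image fails to load, onerror runs.
const payload = '<img src="x" onerror="alert(document.cookie)">';
const unsafeMarkup = renderComment("Mallory", payload);

console.log(unsafeMarkup);
// <li><strong>Mallory:</strong> <span><img src="x" onerror="alert(document.cookie)"></span></li>

// In a browser, a line like this would inject a live <img> element and run the script:
// commentsList.insertAdjacentHTML("beforeend", unsafeMarkup);
```

The template string does exactly what it is told: it splices the attacker's markup into the page's markup, and the browser has no way to tell the two apart.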

To prevent XSS attacks, you need to properly sanitize any user-generated content that gets included in the HTML. Sanitization means checking the content for potentially harmful code, such as <script> elements, event-handler attributes like onerror, or javascript: URLs, and removing or escaping it so that the browser can't execute it.
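Escaping can be sketched in a few lines. The escapeHTML function below replaces the five characters that are significant in HTML with entities, and the html tag function applies it to every ${...} value in a template string, so the developer's own markup stays intact while user input is displayed as text. Both names are illustrative assumptions, not a standard API. Escaping like this covers element text and quoted attribute values; it does not make a URL safe to place in href, since a javascript: URL contains none of these characters.

```javascript
// Entity for each character that is significant in HTML.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape a value so the browser displays it instead of parsing it as markup.
function escapeHTML(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical tag function: the literal parts of the template (written by the
// developer) are kept as-is, and every interpolated ${...} value is escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, literal, i) => out + literal + (i < values.length ? escapeHTML(values[i]) : ""),
    ""
  );
}

const user = "Mallory";
const comment = '<img src="x" onerror="alert(1)">';
const item = html`<li><strong>${user}:</strong> <span>${comment}</span></li>`;

console.log(item);
// <li><strong>Mallory:</strong> <span>&lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;</span></li>
```

A tagged template keeps the convenience of template strings while making escaping the default instead of something each call site has to remember.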

Here's a simple example of how you could sanitize user-generated content using the DOMPurify library:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Sanitizing User Input</title>
  <!-- Include the DOMPurify library from a CDN -->
  <script src="https://cdn.jsdelivr.net/npm/dompurify@2.3.2/dist/purify.min.js"></script>
</head>
<body>
  <!-- Display existing comments -->
  <h1>Comments</h1>
  <ul id="comments-list">
    <li><strong>User 1:</strong> <span id="comment-1"></span></li>
    <li><strong>User 2:</strong> <span id="comment-2"></span></li>
  </ul>
  <!-- Form for submitting new comments -->
  <form id="comment-form">
    <label for="comment-input">Leave a comment:</label>
    <input type="text" id="comment-input" name="comment">
    <button type="submit">Submit</button>
  </form>
  <script>
    // Get references to the comments list, comment form, and comment input
    const commentsList = document.getElementById("comments-list");
    const commentForm = document.getElementById("comment-form");
    const commentInput = document.getElementById("comment-input");

    // Add an event listener for when the comment form is submitted
    commentForm.addEventListener("submit", (event) => {
      // Prevent the default form submission behavior
      event.preventDefault();

      // Get the value of the comment input
      const comment = commentInput.value;

      // Sanitize the comment input using DOMPurify
      const sanitizedComment = DOMPurify.sanitize(comment);

      // Create HTML elements to display the new comment
      const li = document.createElement("li");
      const strong = document.createElement("strong");
      const span = document.createElement("span");
      strong.textContent = "You:";
      span.innerHTML = sanitizedComment;
      li.appendChild(strong);
      li.appendChild(span);

      // Add the new comment to the comments list and clear the input
      commentsList.appendChild(li);
      commentInput.value = "";
    });
  </script>
</body>
</html>

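In this example, we're using the DOMPurify library to sanitize the user-generated content before inserting it into the page. DOMPurify.sanitize() parses the comment and returns a copy with anything that can run script removed, such as <script> elements, event-handler attributes like onerror, and javascript: URLs, while keeping harmless formatting such as <b> or <em>. The cleaned string is what gets assigned to innerHTML, so malicious markup in a comment is displayed without its script ever running. If comments should never contain HTML at all, assigning the raw input to span.textContent is simpler still, because textContent doesn't parse markup.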
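If you only want to allow a handful of formatting tags, DOMPurify accepts a configuration object as its second argument. ALLOWED_TAGS and ALLOWED_ATTR are documented DOMPurify options; the particular policy below is an assumed example for a comment box, not a recommendation from the library.

```javascript
// Hypothetical policy for comments: a few inline formatting tags plus links.
// Anything not listed here is stripped by DOMPurify.
const commentPolicy = {
  ALLOWED_TAGS: ["b", "i", "em", "strong", "a"],
  ALLOWED_ATTR: ["href"],
};

// In the page's submit handler, pass the policy as the second argument:
// const sanitizedComment = DOMPurify.sanitize(comment, commentPolicy);
```

A tight allowlist like this limits what users can style, but it also shrinks the amount of markup the sanitizer has to get right.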

Here's a more complex example of how you could manually sanitize user-generated content using regular expressions:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Sanitizing User Input</title>
</head>
<body>
  <!-- A heading and unordered list to display comments -->
  <h1>Comments</h1>
  <ul id="comments-list">
    <!-- Two list items with placeholders for comments from User 1 and User 2 -->
    <li><strong>User 1:</strong> <span id="comment-1"></span></li>
    <li><strong>User 2:</strong> <span id="comment-2"></span></li>
  </ul>
  <!-- A form for submitting comments -->
  <form id="comment-form">
    <label for="comment-input">Leave a comment:</label>
    <!-- An input field for entering a new comment -->
    <input type="text" id="comment-input" name="comment">
    <!-- A button to submit the comment -->
    <button type="submit">Submit</button>
  </form>
  <script>
    // Get references to the comments list, comment form, and input field
    const commentsList = document.getElementById("comments-list");
    const commentForm = document.getElementById("comment-form");
    const commentInput = document.getElementById("comment-input");

    // Add an event listener for the comment form submission
    commentForm.addEventListener("submit", (event) => {
      // Prevent the default form submission behavior
      event.preventDefault();
      // Get the value of the input field
      const comment = commentInput.value;
      // Sanitize the comment to remove potentially harmful HTML code
      const sanitizedComment = sanitize(comment);
      // Create a new list item, strong element for the username, and span element for the comment text
      const li = document.createElement("li");
      const strong = document.createElement("strong");
      const span = document.createElement("span");
      // Set the text content of the strong element to "You:"
      strong.textContent = "You:";
      // Insert the result as plain text: textContent never parses HTML, so
      // anything the regular expressions miss is displayed rather than executed
      span.textContent = sanitizedComment;
      // Append the strong and span elements to the list item, and the list item to the comments list
      li.appendChild(strong);
      li.appendChild(span);
      commentsList.appendChild(li);
      // Clear the input field
      commentInput.value = "";
    });

    // Sanitize the input by removing potentially harmful HTML tags and attributes
    function sanitize(input) {
      const tagsRegex = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;
      const attributesRegex = /([a-z][a-z0-9]*)="[^"]*"/gi;
      const protocols = ["http", "https", "mailto", "tel"];
      const hrefRegex = /href="(.*?)"/gi;
      const srcRegex = /src="(.*?)"/gi;

      // Build a replacer that keeps name="url" only when the URL's protocol is allowed
      const keepAllowedUrl = (name) => (match, url) => {
        const protocol = url.split(":")[0].trim().toLowerCase();
        return protocols.includes(protocol) ? `${name}="${url}"` : "";
      };

      // Remove HTML tags, repeating until nothing changes; a single pass would turn
      // "<<img>img src=x onerror=alert(1)>" into a working <img> tag
      let withoutTags = input;
      let previous;
      do {
        previous = withoutTags;
        withoutTags = withoutTags.replace(tagsRegex, "");
      } while (withoutTags !== previous);

      // Check each name="value" pair: href and src keep only allowed protocols,
      // every other pair is returned unchanged
      return withoutTags.replace(attributesRegex, (match, attribute) => {
        const name = attribute.toLowerCase();
        if (name === "href") {
          return match.replace(hrefRegex, keepAllowedUrl("href"));
        }
        if (name === "src") {
          return match.replace(srcRegex, keepAllowedUrl("src"));
        }
        return match;
      });
    }
  </script>
</body>
</html>

This is an HTML file that displays a list of comments and allows users to submit new comments. The comments are sanitized to remove any potentially harmful HTML tags and attributes before being displayed on the page.

The file starts with a doctype declaration and an HTML tag that contains a head and body section. The head section includes a title tag and a meta tag that sets the character encoding to UTF-8.

In the body section, there is a heading and an unordered list with two list items that serve as placeholders for comments from User 1 and User 2. Below that is a form for submitting comments, with an input field and a submit button.

The JavaScript code begins with three variables that get references to the comments list, comment form, and input field using the document.getElementById() method. An event listener is added to the comment form that listens for a submit event and prevents the default form submission behavior using event.preventDefault().

When a user submits a comment, the value of the input field is retrieved and passed through the sanitize() function, which uses regular expressions to strip HTML tags and to screen the protocols of href and src values. The sanitized comment is then used to create a new list item with a strong element for the username and a span element for the comment text. These elements are appended to the list item, which is then appended to the comments list. Finally, the input field is cleared.

The sanitize() function uses several regular expressions to remove HTML tags and attributes from the input string. The tagsRegex variable matches opening and closing HTML tags whose names start with a letter, while the attributesRegex variable matches any attribute name and value pair written as name="value". The protocols variable is an array of allowed protocols for href and src attributes. The hrefRegex and srcRegex variables match href and src attribute values, respectively.
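The split(":") check works as an allowlist, but it also discards relative links such as /about, because their first segment isn't a protocol. A sketch of an alternative, assuming a runtime with the standard URL class (modern browsers and Node), lets the URL parser determine the scheme. The isAllowedUrl name and the base URL are placeholders.

```javascript
// URL.protocol includes the trailing colon, e.g. "https:"
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:", "tel:"]);

// Hypothetical helper: true if `value` resolves to an allowed scheme.
// Relative URLs are resolved against `base` (a placeholder origin here).
function isAllowedUrl(value, base = "https://example.com/") {
  try {
    return ALLOWED_PROTOCOLS.has(new URL(value, base).protocol);
  } catch {
    // The URL parser rejected the value outright
    return false;
  }
}

console.log(isAllowedUrl("https://jonchristie.net")); // true
console.log(isAllowedUrl("/about"));                  // true (resolved against base)
console.log(isAllowedUrl("JavaScript:alert(1)"));     // false (scheme is lowercased)
```

Because the parser normalizes the scheme the same way a browser does, mixed-case tricks like JavaScript: are caught without any extra string handling.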

Calling replace() with tagsRegex deletes the HTML tags from the input string. A second replace() call passes a function as its second argument, so each attribute name and value pair is handled individually. If the attribute is href or src, the function checks whether the URL uses an allowed protocol, keeping the attribute if it does and removing it otherwise. Any other attribute is returned unchanged.
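Tag stripping is the most fragile step. A single replace() pass can leave behind pieces that join up into a new tag, so one common fix is to repeat the pass until the string stops changing. The sketch below reuses the tagsRegex pattern from the example; the two function names are hypothetical.

```javascript
// Same tag pattern as tagsRegex in the example above.
const tagsRegex = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;

// One pass: removes each tag it finds, but never re-scans the result.
function stripTagsOnce(input) {
  return input.replace(tagsRegex, "");
}

// Repeat the pass until the string stops changing.
function stripTagsUntilStable(input) {
  let current = input;
  let previous;
  do {
    previous = current;
    current = current.replace(tagsRegex, "");
  } while (current !== previous);
  return current;
}

const payload = "<<img>img src=x onerror=alert(1)>";
console.log(stripTagsOnce(payload));        // <img src=x onerror=alert(1)>
console.log(stripTagsUntilStable(payload)); // prints an empty line
```

Removing the inner <img> in one pass reassembles the outer characters into a brand-new, working tag; looping until the output is stable closes that particular gap, though it is a reminder of how easily regex-based sanitizers can be bypassed.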

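Overall, this file demonstrates the idea behind sanitizing user input before it reaches the page. Hand-written regular expressions are easy to get wrong, though, because HTML has many parsing edge cases, so in production it is safer to insert plain text with textContent or to rely on a maintained library such as DOMPurify, as in the first example.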

Jon Christie

jonchristie.net

Template Strings: Some Risks and How to Fix 'em

Some simple JS to take care of potentially malicious attacks involving user-created messages.
