What Are HTML Entities and Why Do They Matter?
HTML uses a handful of characters as part of its own syntax. The angle brackets < and > delimit tags, the ampersand & begins entity references, and double quotes " wrap attribute values. If any of these characters appear unescaped inside your markup, the browser will try to interpret them as HTML structure rather than display them as visible text.
HTML entities are special sequences that tell the browser to render a specific character literally. For example, writing &lt; in your source code produces a visible < on the page instead of opening a tag. Entities come in two main forms: named entities like &amp; that use a human-readable label, and numeric entities like &#38; that reference the Unicode code point directly. Both produce the same rendered output, but they serve different use cases depending on readability needs and character support.
Without proper encoding, even a simple blog comment containing <b>bold</b> could inject unwanted markup into the page. Encoding these characters is not optional — it is a fundamental requirement for producing valid, predictable HTML.
HTML Encoding and XSS Prevention
Cross-site scripting (XSS) is one of the most common web security vulnerabilities, and improper handling of HTML special characters is its primary enabler. When a web application takes user-supplied input and inserts it directly into the page without encoding, an attacker can inject executable code. Consider a search page that displays the query back to the user: if someone searches for <script>alert('xss')</script> and the value is reflected unescaped, the browser will execute that JavaScript in the context of your domain.
HTML encoding neutralizes this attack vector by converting every < into &lt; and every > into &gt;. The encoded string renders as harmless visible text rather than being parsed as HTML structure. This is why most modern server-side frameworks and templating engines apply HTML encoding to dynamic output by default; it is the first line of defense against injection attacks.
The five characters you must always encode in HTML content are <, >, &, ", and '. Encoding just angle brackets is not enough — an attacker can break out of an attribute value using an unescaped quote and inject event handlers like onmouseover without ever needing a script tag.
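As a minimal sketch of the encoding step described above, Python's standard-library html.escape replaces all five characters when quote=True (its default). The helper names render_search_result and render_input are hypothetical and exist only for illustration; a real application would normally rely on its templating engine's auto-escaping instead.

```python
from html import escape


def render_search_result(query: str) -> str:
    """Hypothetical helper: reflect a search query back as visible text."""
    # escape() converts &, < and >, and with quote=True (the default) also " and '.
    return f"<p>You searched for: {escape(query)}</p>"


def render_input(value: str) -> str:
    """Hypothetical helper: place untrusted text inside a double-quoted attribute."""
    return f'<input type="text" value="{escape(value, quote=True)}">'


# A script-tag payload becomes inert text once the angle brackets are encoded.
payload = "<script>alert('xss')</script>"
print(render_search_result(payload))

# An attribute-breakout payload cannot close the attribute once " is encoded.
attr_payload = '" onmouseover="alert(1)'
print(render_input(attr_payload))
```

Note that html.escape emits the numeric form &#x27; for the single quote rather than the named &apos;. Both decode to the same character.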
Named vs Numeric vs Hexadecimal Entities
HTML supports three different syntaxes for encoding the same character, each with its own strengths. Taking the ampersand as an example:
- Named entity: &amp;. Human-readable and easy to remember. The HTML specification defines roughly 2,200 named entities, but only a few dozen are commonly used.
- Decimal numeric entity: &#38;. Uses the decimal Unicode code point (U+0026 = 38). Works for any Unicode character, even those without a named entity.
- Hexadecimal numeric entity: &#x26;. Same as the decimal form but uses the hex value, which maps directly to the Unicode code chart and is preferred in some codebases.
Named entities are the best choice for the common five (<, >, &, ", ') because they are instantly recognizable in source code. Numeric entities become useful when a character has no named form, or when an invisible character such as the zero-width joiner (&#8205;) or right-to-left mark (&#8207;) needs to be easy to spot in source; these two also have the named forms &zwj; and &rlm;. Hexadecimal entities are commonly seen in generated markup and WAF (web application firewall) bypass testing because their digits match the U+XXXX notation used in Unicode charts.
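The equivalence of the three syntaxes can be checked with a short sketch using Python's standard html.unescape. The helpers to_decimal_entity and to_hex_entity are hypothetical names introduced here for illustration, and each expects a single character.

```python
from html import unescape


def to_decimal_entity(ch: str) -> str:
    """Hypothetical helper: build the decimal numeric entity for one character."""
    return f"&#{ord(ch)};"


def to_hex_entity(ch: str) -> str:
    """Hypothetical helper: build the hexadecimal numeric entity for one character."""
    return f"&#x{ord(ch):X};"


# All three syntaxes decode to the same ampersand.
forms = ["&amp;", "&#38;", "&#x26;"]
decoded = [unescape(form) for form in forms]
print(decoded)  # ['&', '&', '&']

# Numeric entities for two invisible characters.
print(to_decimal_entity("\u200d"))  # &#8205;  (zero-width joiner)
print(to_hex_entity("\u200f"))      # &#x200F; (right-to-left mark)
```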
Common HTML Entities Reference
Below are the most frequently used HTML entities that every developer should know:
- &lt;: Less-than sign (<). Essential for displaying HTML tags as text.
- &gt;: Greater-than sign (>). Closes the tag pair in documentation and code samples.
- &amp;: Ampersand (&). Must be encoded even in URLs within href attributes.
- &quot;: Double quote ("). Required inside double-quoted attribute values.
- &apos;: Single quote / apostrophe ('). Not defined in HTML4 but fully supported in HTML5 and XML.
- &nbsp;: Non-breaking space. Prevents line breaks between words and is commonly used for layout spacing in email templates.
- &mdash;: Em dash (—). Used in punctuation as an alternative to parentheses or colons.
- &copy;: Copyright symbol (©). Often seen in website footers.
- &trade;: Trademark symbol (™). Used alongside brand names in legal text.
While modern UTF-8 encoded pages can include most of these characters directly, using named entities in your source code makes the intent explicit. It also avoids ambiguity when the file encoding is uncertain — a common issue with legacy systems, email HTML, and third-party content feeds. This tool handles both encoding (converting raw characters to entities) and decoding (converting entities back to their original characters), so you can work in whichever direction your task requires.
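The two directions described above can be sketched with Python's standard html module. This illustrates the general idea under that assumption and does not describe how this tool is implemented; the sample strings are invented for the example.

```python
from html import escape, unescape

# Encoding: raw characters to entities.
raw = 'Tom & Jerry said "hi" <b>loudly</b>'
encoded = escape(raw)
print(encoded)

# Decoding: entities back to the original characters.
decoded = unescape(encoded)
assert decoded == raw  # the round trip is lossless

# unescape() also understands named entities beyond the common five.
footer = "&copy; Example Corp&trade;&nbsp;&mdash; all rights reserved"
print(unescape(footer))

# Encoding twice is a common mistake: the & of an existing entity is escaped again.
print(escape(escape("&")))  # &amp;amp;
```

Decoding once and encoding once, at the point of output, avoids the double-encoding problem shown in the last line.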
Related Tools
HTML encoding is one of several encoding and escaping schemes used in web development. If you are working with query strings or path segments, the URL Encode/Decode tool handles percent-encoding for URI components — a different escaping context with its own reserved character set.
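To make the difference between the two contexts concrete, the sketch below encodes the same value for a query string and for HTML text, then combines both steps for a link. It uses Python's standard urllib.parse.quote and html.escape. The /search path and the q and page parameters are hypothetical.

```python
from html import escape
from urllib.parse import quote

value = "fish & chips <fresh>"

# HTML context: special characters become entities.
print(escape(value))          # fish &amp; chips &lt;fresh&gt;

# URL context: reserved characters become percent-escapes.
print(quote(value, safe=""))  # fish%20%26%20chips%20%3Cfresh%3E

# A URL placed in an href needs both: percent-encode the component first,
# then HTML-encode the whole URL so the & separating parameters is escaped.
url = "/search?q=" + quote(value, safe="") + "&page=2"
link = f'<a href="{escape(url)}">results</a>'
print(link)
```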
For embedding binary data directly in HTML (such as inline images via data URIs or embedding files in JSON payloads), the Base64 Encode/Decode tool converts between raw bytes and an ASCII-safe representation that can be safely placed inside attribute values.
If you have encoded HTML that you need to inspect or clean up, the HTML Beautifier can format and indent the decoded markup so its structure is easier to read and debug.