Quick Summary
- PII includes any data that can identify a person: email, name, IP, device ID.
- Common leak vectors: URLs, page titles, form actions, and tracker payloads.
- Leaked PII can reach third-party analytics, ad networks, and session replay tools.
- Under GDPR, uncontrolled PII exposure to third parties can be a personal data breach that must be reported to the supervisory authority.
Introduction
Most PII exposure on websites is accidental. Developers put user email addresses in URLs for convenience, without realizing that most analytics scripts and advertising pixels on the page capture the full URL by default, including that personal data.
This is not a theoretical risk. Google has warned, and in some cases suspended, Analytics accounts for receiving PII. European regulators have fined companies for leaking personal data to third-party trackers. And session replay tools such as Hotjar and Microsoft Clarity can record keystrokes, form inputs, and even passwords if masking is not properly configured.
This guide explains what counts as PII under the GDPR, the most common ways websites accidentally expose it, and practical steps to detect and fix these leaks before regulators find them.
What Counts as PII?
Defining Identifiable Information
Personally Identifiable Information (PII) is any data that can directly or indirectly identify a specific individual. The GDPR uses the term "personal data" and defines it deliberately broadly: even data that seems anonymous can qualify if it can be linked back to a person. See GDPR Article 4(1) for the legal definition of personal data.
| Data Type | PII? | Why |
|---|---|---|
| Email address | Yes | Directly identifies a person |
| Phone number | Yes | Directly identifies a person |
| IP address | Yes (GDPR) | Can be linked to an individual with ISP records |
| Cookie identifier | Yes (GDPR) | Uniquely tracks individual browsing behavior |
| Browser fingerprint | Yes (GDPR) | Creates a unique device profile |
| Pseudonymized ID | Yes (GDPR) | Can be re-identified with additional data |
| Full name | Yes | Directly identifies a person |
| Location data | Yes (GDPR) | Can identify a person through movement patterns |
Why It Matters
Regulatory & Reputational Damages
- GDPR fines: Uncontrolled PII leaks to third parties can constitute processing without a lawful basis, a violation that can result in fines of up to €20 million or 4% of total worldwide annual turnover, whichever is higher.
- Analytics account suspension: Google's Analytics terms prohibit sending PII, and Google checks incoming data for PII patterns. Persistent violations can result in data deletion and account suspension.
- Data breach obligation: Under GDPR Article 33, if PII is accidentally exposed to unauthorized third parties, you must notify your supervisory authority within 72 hours of becoming aware of the breach, unless it is unlikely to result in a risk to individuals' rights and freedoms.
- User trust: If users discover their personal data is being sent to ad networks without their knowledge, the reputational damage can be severe.
- Security risk: PII in URLs gets logged in web server access logs, CDN logs, browser history, and proxy servers, creating multiple points of potential unauthorized access.
How PII Gets Exposed on Websites
Common Vectors
The most common exposure vectors are:
| Vector | Example | Risk |
|---|---|---|
| URL query parameters | ?email=user@example.com | Readable by every third-party script; can leak via the Referer header |
| URL path segments | /user/john-doe | Captured by analytics and CDN logs |
| Page titles | <title>Dashboard, john@example.com</title> | Sent to analytics as page title |
| Forms submitted via GET | <form method="get" action="/search"> submits to /search?name=John | Appears in server logs and browser history |
| Error messages | "User john@example.com not found" | Captured by session replay tools |
| Custom event data | analytics.track({email: 'john@example.com'}) | Sent directly to analytics server |
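To see how the first two vectors show up in practice, the sketch below checks a single URL's path segments and query parameters for email-like and phone-like values. It is a minimal illustration built on simple regex heuristics: the function name findPiiInUrl and both patterns are hypothetical, and a long numeric order ID would be flagged as a phone number, so tune the patterns for your own URLs.

```typescript
// Heuristic patterns only (hypothetical): they will miss some PII and flag
// some non-PII (for example, long numeric order IDs look like phone numbers).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const PHONE_RE = /^\+?[0-9][0-9 ().-]{7,}[0-9]$/;

type PiiKind = "email" | "phone";

interface PiiFinding {
  location: "path" | "query";
  key: string; // path segment index, or query parameter name
  kind: PiiKind;
  value: string;
}

function classify(value: string): PiiKind | null {
  if (EMAIL_RE.test(value)) return "email";
  if (PHONE_RE.test(value)) return "phone";
  return null;
}

// Path segments may be percent-encoded; fall back to the raw text if a
// segment contains an invalid escape sequence.
function safeDecode(segment: string): string {
  try {
    return decodeURIComponent(segment);
  } catch {
    return segment;
  }
}

function findPiiInUrl(rawUrl: string): PiiFinding[] {
  const url = new URL(rawUrl);
  const findings: PiiFinding[] = [];

  // Vector: URL path segments such as /orders/jane@example.com
  url.pathname.split("/").forEach((segment, index) => {
    const value = safeDecode(segment);
    const kind = classify(value);
    if (kind) findings.push({ location: "path", key: String(index), kind, value });
  });

  // Vector: query parameters. URLSearchParams decodes names and values.
  url.searchParams.forEach((value, name) => {
    const kind = classify(value);
    if (kind) findings.push({ location: "query", key: name, kind, value });
  });

  return findings;
}
```

For example, findPiiInUrl("https://shop.example/orders/jane@example.com?ref=nav") reports one email finding in path segment 2, while an opaque URL such as /user/abc-123 produces no findings.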
Real-World Examples
Major Leak Incidents
- Healthcare data breach: A major healthcare provider included patient IDs in URL query parameters. Analytics scripts captured these IDs and transmitted them to Google's servers, triggering both HIPAA and GDPR investigations.
- E-commerce PII leak: An online retailer used customer email addresses in URL paths for order tracking. The Referer header leaked these to 23 different third-party services loading on the page.
- Session replay capture: A financial services company used Hotjar without configuring input masking. The tool recorded credit card numbers and passwords typed into forms, storing them on external servers without encryption.
- Meta Pixel lawsuits: In 2022, Meta (Facebook) faced class-action lawsuits after an investigation by The Markup found the Meta Pixel on hospital websites sending patient appointment data, including details about medical conditions, to Facebook.
How to Detect PII Leaks
- Run an automated PII scan: Use the PII Leak Checker to scan your pages for exposed personal data in URLs, titles, and form fields.
- Audit your URL structure: Review all routes in your application for patterns that include PII, email addresses, names, phone numbers, or user-specific tokens.
- Check Referer header leaks: Use the Referrer Policy Checker to verify your URLs are not leaking to third-party domains.
- Review analytics data: Check your Google Analytics or Mixpanel reports for URLs containing email-like patterns. Search for "@" in page URLs and event data.
- Inspect session replay recordings: If you use session replay tools, review recordings for captured password fields, credit card inputs, or other sensitive data.
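When checking for Referer leaks, it helps to know exactly what a third party receives. The sketch below models the Referer value for four Referrer-Policy values; refererFor is a hypothetical helper and a simplification of the Referrer Policy specification, not a browser implementation. It covers only the Referer header: a third-party script running on your page can still read the full URL from location.href whatever the policy.

```typescript
// Only four policies are modelled here; the spec defines several more.
type ReferrerPolicyValue =
  | "no-referrer"
  | "origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

// Returns the Referer value a request to requestUrl would carry when made
// from pageUrl, or null when no Referer header would be sent.
function refererFor(pageUrl: string, requestUrl: string, policy: ReferrerPolicyValue): string | null {
  const page = new URL(pageUrl);
  const target = new URL(requestUrl);

  // The fragment and any username/password are never sent as the referrer.
  page.hash = "";
  page.username = "";
  page.password = "";

  const fullUrl = page.href; // includes path and query string
  const originOnly = page.origin + "/";
  const sameOrigin = page.origin === target.origin;
  const downgrade = page.protocol === "https:" && target.protocol === "http:";

  if (policy === "no-referrer") return null;
  if (policy === "origin") return originOnly;
  if (policy === "unsafe-url") return fullUrl;

  // strict-origin-when-cross-origin, the default in current major browsers:
  // full URL same-origin, origin only cross-origin, nothing on HTTPS->HTTP.
  if (sameOrigin) return fullUrl;
  return downgrade ? null : originOnly;
}
```

For a page at https://shop.example/orders?email=jane%40example.com, a tracker request under unsafe-url receives the full URL including the email, while under strict-origin-when-cross-origin it receives only https://shop.example/.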
How to Fix PII Leaks
Application Fixes
- Use POST for sensitive forms: Never pass PII through GET query parameters. POST requests keep data in the request body, where it is not captured by analytics or the Referer header.
- Use opaque IDs: Replace email-based URLs with opaque identifiers such as UUIDs (/user/abc-123 instead of /user/john@example.com).
- Set Referrer-Policy: Configure strict-origin-when-cross-origin so cross-origin requests receive only your origin in the Referer header, not the full path and query string.
- Mask replay fields: Configure session replay tools to mask all input fields by default. Most tools (Hotjar, Clarity) support CSS-based masking selectors.
- Sanitize page titles: Never include user-specific data in document titles. Use generic titles: “Dashboard” instead of “Dashboard, john@example.com”.
- Filter analytics payloads: Configure your analytics to strip PII before transmission. GA4 offers data redaction for email addresses and query parameters; check your web data stream settings to confirm it is turned on.
- Implement CSP: Use a Content Security Policy (for example, its connect-src and img-src directives) to restrict which domains your page can send data to.
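If you want to filter analytics payloads in your own code, one approach is to clean the page URL before passing it to your analytics library. The sketch below drops query parameters whose names commonly carry PII and replaces any path segment or remaining parameter value that looks like an email address. redactUrl, the SENSITIVE_PARAMS list, and the REDACTED token are hypothetical choices; adapt them to your own URL scheme.

```typescript
// Example list only (hypothetical): add the parameter names your app uses.
const SENSITIVE_PARAMS = ["email", "name", "phone", "token"];
const EMAIL_LIKE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const REDACTED = "REDACTED";

// Decode a percent-encoded segment, keeping the raw text if it is malformed.
function decodeOrRaw(text: string): string {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}

function redactUrl(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Replace any path segment that contains an email-like string.
  url.pathname = url.pathname
    .split("/")
    .map((segment) => (EMAIL_LIKE.test(decodeOrRaw(segment)) ? REDACTED : segment))
    .join("/");

  // Keep non-sensitive parameters, masking email-like values among them.
  const kept: Array<[string, string]> = [];
  url.searchParams.forEach((value, name) => {
    if (SENSITIVE_PARAMS.indexOf(name.toLowerCase()) !== -1) return;
    kept.push([name, EMAIL_LIKE.test(value) ? REDACTED : value]);
  });

  // Rebuild the query string from the filtered list.
  url.search = "";
  for (const [name, value] of kept) {
    url.searchParams.append(name, value);
  }
  return url.href;
}
```

You would then send the result in place of the raw URL; with GA4's gtag.js, for example, that is typically the page_location parameter. Redacting whole segments rather than substrings keeps the logic simple and avoids leaving partial addresses behind.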
Best Practices
- Treat URLs as public data: anything in a URL should be considered visible to third parties, logs, and browser extensions.
- Default to masking in session replay: configure tools to mask all inputs by default, then selectively unmask non-sensitive fields.
- Establish URL hygiene guidelines: create a team standard that prohibits PII in URLs, and enforce it through code review.
- Run PII scans in CI/CD: add automated URL pattern scanning to your deployment pipeline to catch PII before it reaches production.
- Layer your defenses: combine Referrer-Policy, CSP, and URL design to create multiple barriers against PII leakage.
- Document your data flows: maintain a record of which third-party services receive data from your pages. GDPR Article 30 requires records of processing activities, including the recipients of personal data.
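For the CI/CD practice above, one lightweight check is to fail the build when a route template exposes a parameter whose name suggests PII. This sketch assumes Express-style ":param" route syntax and uses a hypothetical findPiiRoutes helper with an illustrative list of parameter names; it catches /orders/:email but not PII that arrives under a generic name such as :id.

```typescript
// Illustrative (hypothetical) list of parameter names that suggest PII.
const PII_PARAM_NAME = /^(email|phone|mobile|ssn|dob|name|(first|last|full)_?name)$/i;

function findPiiRoutes(routes: string[]): string[] {
  return routes.filter((route) => {
    // Collect ":param" tokens, e.g. "/orders/:email/status" -> [":email"].
    const params: string[] = route.match(/:\w+/g) || [];
    return params.some((param) => PII_PARAM_NAME.test(param.slice(1)));
  });
}

// Example run: in CI, load your real route table instead of this list.
const offending = findPiiRoutes(["/user/:id", "/orders/:email/status", "/search"]);
if (offending.length > 0) {
  console.error("Routes exposing PII-like parameters:", offending);
  // A real pipeline would exit with a non-zero status here to block the deploy.
}
```

Because the check runs on route definitions rather than live traffic, it catches the problem before any URL containing PII is ever served.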
Common Mistakes
- Assuming PII only means names and emails: Under GDPR, IP addresses, cookie IDs, and device fingerprints all qualify as PII. Your exposure surface is broader than you think.
- Only scanning the homepage: PII leaks are most common on dynamic pages such as dashboards, search results, account settings, and checkout flows. Scan all pages.
- Relying on HTTPS to prevent leaks: HTTPS encrypts the connection, but it does not stop the browser from sending URLs containing PII to third parties in the Referer header, or third-party scripts from including PII in their JavaScript payloads.
- Not auditing third-party scripts: Third-party trackers often auto-capture page URLs, form data, and DOM content. You are responsible for what they collect on your pages.
- Treating hashed data as non-PII: Hashed emails can be matched against platform databases (Facebook, Google) to re-identify users. Under GDPR, pseudonymized data, including hashed identifiers, is still personal data.
Conclusion
PII data exposure is one of the most common, and most underestimated, privacy risks on the web. It typically happens silently, with developers unaware that their URL structure, page titles, or form design is leaking personal data to dozens of third-party services on every page load.
The fix is straightforward: use opaque IDs in URLs, set a strong Referrer-Policy, mask sensitive inputs in replay tools, and scan regularly for new leaks. These steps protect both your users and your organization from GDPR liability.
Frequently Asked Questions
Why did Google threaten to delete my Analytics account for PII?
Is a hashed email still considered PII?
How do I prevent PII in URLs?
Can session replay tools capture passwords?
Does my GDPR privacy policy cover accidental PII leaks?
How often should I scan for PII exposure?