
What is PII?

PII (Personally Identifiable Information) includes any data that can be used to identify a specific individual, such as a name, address, or Social Security Number. It is typically subject to strict data protection rules like GDPR, HIPAA, or CCPA.

How It Works

When a system collects PII, it must store it securely, often via encryption or tokenization. Access is usually restricted to specific roles. If the data is handled in violation of these rules, or used in a way the individual has not consented to, companies risk legal consequences. Techniques like masking and hashing can protect PII while still allowing certain operations: masking leaves part of a value visible for partial matches, and hashing supports exact-match lookups and analytics without exposing the original value.
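
As a minimal sketch of masking, the Python below keeps the last four digits of a Social Security Number and the first two characters of an email address, mirroring the output format of the SQL view later on this page. The function names mask_ssn and mask_email are illustrative assumptions, not part of any library, and the validation is deliberately simple.

```python
def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a US Social Security Number."""
    digits = [ch for ch in ssn if ch.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "XXX-XX-" + "".join(digits[-4:])


def mask_email(email: str) -> str:
    """Keep the first two characters of the local part and the full domain."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    return local[:2] + "***@" + domain


print(mask_ssn("123-45-6789"))             # XXX-XX-6789
print(mask_email("jane.doe@example.com"))  # ja***@example.com
```

Masking like this is for display only. The masked value still leaks the last four digits and the email domain, so it should not be treated as anonymized data.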

Technical Details

Storing PII securely often involves hashing (a one-way transformation that cannot be reversed) or encryption (reversible by anyone holding the decryption key). Many databases also provide column-level encryption to protect sensitive fields. You can mask PII in views for analysts, or create fine-grained access controls to ensure that only authorized roles see unmasked data. Auditing and logging access to PII is another important step for compliance.
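
One way to make a hashed field searchable is sketched below, assuming a keyed hash (HMAC-SHA256) rather than a plain hash, so that someone without the key cannot simply hash guessed email addresses and compare. The helper name email_lookup_hash and the demo key are hypothetical. The 64-character hex output happens to fit a VARCHAR(64) column.

```python
import hashlib
import hmac


def email_lookup_hash(email: str, key: bytes) -> str:
    """Return a 64-character hex HMAC-SHA256 of the normalized email."""
    # Normalize so that case and surrounding whitespace do not change the hash.
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would come from a
# secrets manager or KMS, never from source code.
DEMO_KEY = b"demo-key-not-for-production"

stored = email_lookup_hash("Jane.Doe@Example.com", DEMO_KEY)
query = email_lookup_hash(" jane.doe@example.com ", DEMO_KEY)

print(len(stored))                         # 64
print(hmac.compare_digest(stored, query))  # True
```

The stored hash supports exact-match lookups only. Partial or fuzzy searches still require the decrypted value.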

How to Write It

Basic Syntax

-- Example: Handling PII in SQL

-- 1. Create a table with encrypted PII
CREATE TABLE customers_secure (
  id SERIAL PRIMARY KEY,
  encrypted_ssn BYTEA,
  encrypted_email BYTEA,
  last_login TIMESTAMP,
  -- Store a hash for exact-match searching (64 hex characters fits a SHA-256 digest)
  email_hash VARCHAR(64)
);

-- 2. Create a view to mask sensitive info
-- decrypt_ssn_function and decrypt_email_function are not built-ins;
-- they stand in for your own user-defined decryption routines.
CREATE VIEW customers_masked AS
SELECT
  id,
  CONCAT('XXX-XX-', RIGHT(decrypt_ssn_function(encrypted_ssn), 4)) AS masked_ssn,
  CONCAT(
    LEFT(decrypt_email_function(encrypted_email), 2),
    '***@',
    SPLIT_PART(decrypt_email_function(encrypted_email), '@', 2)
  ) AS masked_email,
  last_login
FROM customers_secure;

-- 3. Role-based access
CREATE ROLE analyst;
GRANT SELECT ON customers_masked TO analyst;
REVOKE SELECT ON customers_secure FROM analyst;

-- 4. Audit table for PII access
CREATE TABLE pii_access_log (
  access_id SERIAL PRIMARY KEY,
  user_name VARCHAR(100),
  accessed_table VARCHAR(100),
  accessed_columns TEXT[],
  access_time TIMESTAMP DEFAULT NOW()
);

Learn More

Best Practices

  • Encrypt PII at rest and in transit (e.g., TLS for connections)
  • Use masking or hashing for fields that do not need to identify the user directly
  • Implement strict role-based access controls (RBAC) and audit logging
  • Regularly review database logs for unauthorized access attempts
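
The audit-logging practice above can be sketched in application code. This example assumes an in-memory SQLite database purely so that it runs anywhere; the table mirrors the pii_access_log layout from the SQL example, with the column list stored as JSON text because SQLite has no array type. The helper names are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn: sqlite3.Connection) -> None:
    """Create the audit table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pii_access_log (
            access_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_name TEXT NOT NULL,
            accessed_table TEXT NOT NULL,
            accessed_columns TEXT NOT NULL,
            access_time TEXT NOT NULL
        )
        """
    )


def log_pii_access(conn, user_name, table, columns) -> None:
    """Record who read which PII columns, with a UTC timestamp."""
    conn.execute(
        "INSERT INTO pii_access_log "
        "(user_name, accessed_table, accessed_columns, access_time) "
        "VALUES (?, ?, ?, ?)",
        (user_name, table, json.dumps(list(columns)),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def accesses_by_user(conn, user_name):
    """Return (table, columns) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT accessed_table, accessed_columns FROM pii_access_log "
        "WHERE user_name = ? ORDER BY access_id",
        (user_name,),
    ).fetchall()
    return [(table, json.loads(columns)) for table, columns in rows]


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_pii_access(conn, "analyst_1", "customers_masked", ["masked_ssn", "masked_email"])
print(accesses_by_user(conn, "analyst_1"))
```

Logging from the application is easy to bypass by querying the database directly, so in practice it usually complements database-level auditing instead of replacing it.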

Common Pitfalls

  • Storing sensitive columns in plain text
  • Storing passwords and similar secrets without salted hashing
  • Mixing personal data into analytics datasets that do not need identifying information
  • Insufficient logging of queries, making compliance auditing difficult
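
To illustrate the password pitfall, here is a minimal sketch of salted hashing using PBKDF2-HMAC-SHA256 from Python's standard library. The function names and the default iteration count are assumptions chosen for illustration; pick an algorithm and work factor that match your own security guidance.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, *, iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt for each password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    *, iterations: int = 600_000) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

Because each password gets its own salt, two users with the same password end up with different stored digests, which defeats precomputed lookup tables.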

Advanced Tips

  • Use HSM (Hardware Security Modules) or cloud KMS for key management
  • Leverage tokenization services to replace sensitive data with reversible tokens
  • Automate PII scanning tools to discover new, unprotected fields in your DB
  • Adopt a “privacy by design” approach, minimizing data collection from the start
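
The tokenization tip can be sketched as a small in-memory vault. This is an illustration under stated assumptions, not a production design: the TokenVault class is hypothetical, and a real service would persist the mapping in a hardened, access-controlled store.

```python
import secrets


class TokenVault:
    """Map sensitive values to random tokens and back again."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint a new random one."""
        existing = self._value_to_token.get(value)
        if existing is not None:
            return existing
        # The token is random, so it reveals nothing about the value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token.startswith("tok_"))                # True
print(vault.detokenize(token))                 # 123-45-6789
print(vault.tokenize("123-45-6789") == token)  # True
```

Unlike a hash, the token has no mathematical relationship to the original value, so reversing it requires access to the vault itself.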
