Data Sources Taxonomy Design

In the field of Anti-Money Laundering (AML), Counter-Terrorist Financing (CFT), and Counter-Proliferation Financing (CPF), a data source is any origin or system that provides structured or unstructured information critical to detecting, assessing, or investigating potential financial crime activity. Such information feeds into detection models, risk assessments, and regulatory reporting, helping analysts and investigators identify indicators, validate red flags, and trace illicit flows of value.

When designing this taxonomy, we required each data source to fit into exactly one category, avoiding overlap and confusion. Data sources are grouped by how they are most commonly used within AML/CFT/CPF processes, and each source is assigned to the single category in which it is most likely to be referenced.
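If the taxonomy is stored as data, the single-category rule can be checked mechanically. The following is a minimal sketch that assumes Python 3.9+. The `Category` enum and the `build_registry` helper are hypothetical and are not part of any existing system. The helper turns a category-to-sources listing into a source-to-category lookup. It raises an error if the same source is listed under two different categories, after trimming whitespace and lowercasing the name.

```python
from enum import Enum


class Category(Enum):
    """The thirteen top-level categories of the taxonomy (hypothetical encoding)."""
    TRANSACTION_PAYMENT = "Transaction & Payment Data"
    ONBOARDING_IDENTITY = "Customer Onboarding & Identity Data"
    WATCHLISTS_ADVERSE = "Watchlists & Adverse Data"
    OSINT_COMMUNICATION = "OSINT & Communication Data"
    CORPORATE_OWNERSHIP = "Corporate & Ownership Data"
    ACCESS_SECURITY = "Access & Security Data"
    LOAN_CREDIT = "Loan & Credit Data"
    MARKET_INSTRUMENT = "Market & Instrument Data"
    CUSTOMS_BORDER_TRADE = "Customs, Border & Trade Data"
    LEGAL_REGULATORY_LICENSING = "Legal, Regulatory & Licensing Data"
    BUSINESS_FINANCIAL = "Business & Financial Records"
    PRODUCT_SERVICE_USAGE = "Product & Service Usage Data"
    EMPLOYMENT_HR = "Employment & HR Data"


def build_registry(assignments: dict[Category, list[str]]) -> dict[str, Category]:
    """Invert a category -> sources listing into a source -> category lookup.

    Raises ValueError if one source (compared case-insensitively, ignoring
    surrounding whitespace) is listed under two different categories.
    """
    registry: dict[str, Category] = {}
    for category, sources in assignments.items():
        for source in sources:
            key = source.strip().lower()  # simple normalization for comparison
            existing = registry.get(key)
            if existing is not None and existing is not category:
                raise ValueError(
                    f"{source!r} is assigned to both "
                    f"{existing.value!r} and {category.value!r}"
                )
            registry[key] = category
    return registry


if __name__ == "__main__":
    registry = build_registry({
        Category.TRANSACTION_PAYMENT: ["Transaction logs", "ATM usage and geolocation data"],
        Category.WATCHLISTS_ADVERSE: ["Sanctions lists", "PEP databases"],
    })
    print(registry["sanctions lists"].value)  # Watchlists & Adverse Data
```

The keys are normalized on purpose. Without that step, variant spellings such as "Court filings" and "court filings " would slip past the check. Real source names would probably need a more careful canonical identifier than simple case-folding.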


1. Transaction & Payment Data

Purpose:

Captures any movement of money or digital value, whether fiat or cryptocurrency. It includes transactional details, balances, and usage logs. This category often forms the first line of defense for spotting unusual transaction patterns.

Examples of Included Sources:

  • Transaction logs
  • ATM usage and geolocation data
  • Records from donation platforms, online payment platforms, currency exchanges, and cryptocurrency services
  • Casino, cross-border, and prepaid card transaction data
  • Bank account data and account activity logs
  • Blockchain analytics and trading activity records

2. Customer Onboarding & Identity Data

Purpose:

Covers records used to verify and document the identities of individuals or entities, including KYC (Know Your Customer) and due diligence information. It often includes specialized document verification systems and any centralized repositories for identity records.

Examples of Included Sources:

  • KYC & customer due diligence files
  • Document verification tools
  • Internal document management platforms
  • Public or aggregated identity databases

3. Watchlists & Adverse Data

Purpose:

Centralizes data sources that help screen for high-risk or prohibited parties, such as sanctions lists, politically exposed person (PEP) lists, industry fraud alerts, and adverse media. These sources are crucial for preventing transactions with sanctioned or otherwise high-risk individuals and entities.

Examples of Included Sources:

  • Sanctions lists
  • PEP databases
  • Fraud data repositories
  • Adverse media and court filings

4. OSINT & Communication Data

Purpose:

Draws on open-source intelligence (OSINT) and communication records, such as phone logs, emails, messaging apps, and social media. Investigators use these to corroborate customer information or detect suspicious narrative patterns.

Examples of Included Sources:

  • Publicly available websites and social media posts
  • Records of electronic communications (where permissible)

5. Corporate & Ownership Data

Purpose:

Provides records clarifying the legal structures of companies, trusts, and other organizational entities, often revealing beneficial ownership or hidden relationships. These records are vital for unmasking shell companies or layered corporate structures.

Examples of Included Sources:

  • Company and beneficial ownership registries
  • Trust information and accounts
  • Licensed money service business registries
  • Real estate and other high-value asset ownership data

6. Access & Security Data

Purpose:

Encompasses physical and digital access records, ranging from safe deposit box logs to network access logs. These help track and investigate unauthorized activity or anomalies in how systems and facilities are accessed.

Examples of Included Sources:

  • Safe deposit box access logs
  • System and network access logs
  • Cybersecurity event data (e.g., failed login attempts or reported account takeovers)

7. Loan & Credit Data

Purpose:

Includes all formal lending documentation, such as loan agreements, mortgage agreements, and credit card facilities. These records clarify liabilities, repayment behavior, and potentially irregular use of credit lines.

Examples of Included Sources:

  • Loan agreements
  • Credit facilities and mortgage documents

8. Market & Instrument Data

Purpose:

Focuses on pricing, trading volume, and related metrics for financial instruments (stocks, bonds, and derivatives) as well as commodities. This data is crucial for detecting market manipulation and for confirming that trades are legitimate.

Examples of Included Sources:

  • Financial instrument and securities market data
  • Commodity market data

9. Customs, Border & Trade Data

Purpose:

Covers all import/export records, border crossings, and official trade documents. It is essential for detecting trade-based money laundering (TBML), smuggling, and hidden international flows of goods and value.

Examples of Included Sources:

  • Customs and border records
  • Asset seizure data
  • Trade documentation

10. Legal, Regulatory & Licensing Data

Purpose:

Consists of official legal and regulatory documentation, such as asset declarations, professional licenses, and jurisdictional risk references. Analysts use these to confirm individuals’ or entities’ legal standing, declared assets, and adherence to regulatory obligations.

Examples of Included Sources:

  • Asset declarations
  • Professional licensing and affiliation databases
  • Legal documentation
  • Country/jurisdictional risk references

11. Business & Financial Records

Purpose:

Broad category for internal or external records that detail a business’s operations and finances. These might include tax filings, audit reports, and routine performance metrics, often analyzed to spot inconsistencies or anomalies.

Examples of Included Sources:

  • Contracts and invoices
  • Financial, business, and tax records
  • External or internal audit reports
  • Data on business activities or operations

12. Product & Service Usage Data

Purpose:

Collects information on how clients interact with specific financial products or services, as distinct from individual transactions. Analysts can use these patterns to detect suspicious spikes in usage or misuse of products.

Examples of Included Sources:

  • Product and service usage dashboards
  • Aggregated metrics on customer engagement

13. Employment & HR Data

Purpose:

Covers both internal employee records and external recruitment data. This information can reveal internal conflicts of interest, insider threats, or money-mule recruitment schemes.

Examples of Included Sources:

  • Job recruitment data
  • Employee records

Principles of Category Assignment

  • No Overlap: Each data source belongs to one primary category, preventing confusion about where it fits.
  • Utility Focus: Categories reflect how data is most commonly used in AML/CFT/CPF processes.
  • Bottom-Up Approach: This taxonomy emerged from documenting a range of money laundering techniques and their associated behaviors. Analyzing these techniques revealed recurring data source patterns, which then shaped the groupings.
  • Adjustable Granularity: If an organization requires fewer or more detailed categories, these can be merged or subdivided.
  • Practical Usage: Categories such as “Transaction & Payment Data” are deliberately broad to consolidate all flows of money, while specialized domains (e.g., OSINT, watchlists, corporate ownership) are grouped separately for clarity.
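The Adjustable Granularity principle allows categories to be merged. A merge keeps the single-assignment rule intact as long as each fine category maps to exactly one coarser group. The minimal sketch below assumes Python 3.9+. It uses a hypothetical four-group consolidation, `COARSE_GROUPS`, chosen only for illustration and not as a recommended grouping. It also uses a hypothetical `merge_view` helper that rejects unknown, duplicated, or unassigned categories.

```python
# Hypothetical consolidation of the thirteen categories into four broader groups.
# The grouping is illustrative only; an organization would choose its own.
FINE_CATEGORIES = [
    "Transaction & Payment Data",
    "Customer Onboarding & Identity Data",
    "Watchlists & Adverse Data",
    "OSINT & Communication Data",
    "Corporate & Ownership Data",
    "Access & Security Data",
    "Loan & Credit Data",
    "Market & Instrument Data",
    "Customs, Border & Trade Data",
    "Legal, Regulatory & Licensing Data",
    "Business & Financial Records",
    "Product & Service Usage Data",
    "Employment & HR Data",
]

COARSE_GROUPS = {
    "Financial Activity": [
        "Transaction & Payment Data",
        "Loan & Credit Data",
        "Market & Instrument Data",
        "Product & Service Usage Data",
    ],
    "Parties & Ownership": [
        "Customer Onboarding & Identity Data",
        "Corporate & Ownership Data",
        "Employment & HR Data",
    ],
    "Screening & Intelligence": [
        "Watchlists & Adverse Data",
        "OSINT & Communication Data",
    ],
    "Operational & Documentary Records": [
        "Access & Security Data",
        "Customs, Border & Trade Data",
        "Legal, Regulatory & Licensing Data",
        "Business & Financial Records",
    ],
}


def merge_view(fine: list[str], groups: dict[str, list[str]]) -> dict[str, str]:
    """Map each fine category to exactly one coarse group.

    Raises ValueError for unknown, duplicated, or unassigned categories, so the
    merged view keeps the one-category-per-source property.
    """
    known = set(fine)
    mapping: dict[str, str] = {}
    for group, members in groups.items():
        for member in members:
            if member not in known:
                raise ValueError(f"unknown category {member!r}")
            if member in mapping:
                raise ValueError(
                    f"{member!r} placed in both {mapping[member]!r} and {group!r}"
                )
            mapping[member] = group
    missing = known - set(mapping)
    if missing:
        raise ValueError(f"unassigned categories: {sorted(missing)}")
    return mapping


if __name__ == "__main__":
    view = merge_view(FINE_CATEGORIES, COARSE_GROUPS)
    print(view["Watchlists & Adverse Data"])  # Screening & Intelligence
```

Every source already sits in one fine category. Chaining the two lookups therefore still places each source in exactly one coarse group. Subdividing works the same way in reverse: each new subcategory must map back to a single parent.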

Conclusion

This taxonomy provides a structured way to classify the many data sources used in AML/CFT/CPF work. Because each source sits in a single logical category, investigators, analysts, and system architects can more easily navigate and apply the information. The scheme takes a bottom-up perspective, derived from real-world investigations of financial crime techniques, and aims to be both comprehensive and flexible. It can be adjusted as threats evolve or new sources become available, while preserving the principle that every data source appears exactly once in the taxonomy.