The Diligent Entities
API, composed by an agent.
166 tools across 27 categories. Bulk ingest a messy CSV into Diligent Entities, fuzzy-match countries and company types across 249 jurisdictions, detect duplicates with confidence scores, validate before commit, and roll back with audit proof — all from one agent conversation, through one protocol.
~/.local/share/diligent-entities-mcp
Per-user, no sudo
Re-run to update
Your Entities credentials never leave your laptop
Three layers, one conversation.
The server is designed so an LLM can navigate it without memorizing 166 tool names. A meta layer describes itself; a smart ingest layer handles the messy human work; a primitive layer exposes every GraphQL mutation the Diligent Entities API offers.
Meta & control plane
Eight tools the agent calls first. Health check, session metrics, capability discovery, schema introspection, auto-pagination, token refresh. The agent learns what it has before it guesses.
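The "meta layer first" pattern can be sketched with stubs. The stubbed tools below stand in for real MCP calls; `entities_health_check` is an illustrative name based on the health check described above, and the returned shapes are assumptions.

```javascript
// Record call order so we can see the agent orienting before it acts.
const calls = [];
const stub = (name, result) => async () => { calls.push(name); return result; };

// Hypothetical stand-ins for the real meta-layer tools.
const entities_health_check = stub('entities_health_check', { ok: true });
const entities_list_capabilities = stub('entities_list_capabilities', {
  categories: ['meta', 'smart-ingest', 'companies', 'individuals'],
});

async function orientFirst() {
  // 1. Is the tenant reachable at all?
  const health = await entities_health_check();
  if (!health.ok) throw new Error('tenant unreachable');
  // 2. What tools exist? The agent reads this before guessing names.
  const caps = await entities_list_capabilities();
  return caps.categories;
}
```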
Smart ingest layer
The secret sauce. Fuzzy matching, bulk operations with bounded concurrency, dry-run validation, duplicate detection with confidence scores, country-scoped type filtering, ordered-fallback hints.
Primitive layer
Raw CRUD for every entity type — companies, individuals, addresses, appointments, trusts, partners, committees, plus audit trail, security groups, users, and data library. Compose these when no smart tool fits.
Typed error model
Every error classified into validation, limit, schema, auth, network, http. Agents know exactly when to retry, refresh, or give up with a clean explanation.
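A minimal sketch of that classification. The six categories and the HC error codes come from this document; the field names (`status`, `code`) and the check order are assumptions, not the server's actual implementation.

```javascript
// Classify a raw error into the typed model: validation, limit, schema,
// auth, network, http. The retry hint tells the agent what to do next.
function classifyError(err) {
  if (err.status === 401) return { type: 'auth', retry: 'refresh-once' };
  if (['ECONNRESET', 'ETIMEDOUT'].includes(err.code)) {
    return { type: 'network', retry: 'backoff' };
  }
  if (['HC0047', 'HC0051'].includes(err.code)) return { type: 'limit', retry: 'no' };
  if (['HC0009', 'HC0011'].includes(err.code)) return { type: 'schema', retry: 'no' };
  if (err.status >= 500) return { type: 'http', retry: 'backoff' };
  return { type: 'validation', retry: 'no' }; // default: fix the payload
}
```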
Bounded concurrency
Bulk mutations run through a worker pool (default 4, max 10). Per-row error capture means one bad row never aborts the batch. Retry + backoff is consistent across all 166 tools.
Reversible by default
Every bulk create returns the entityReference of every row. Roll back the whole run with one follow-up. The audit trail proves what changed.
Architecture at a glance.
Claude talks to the MCP over stdio. The MCP wraps a hardened GraphQL client with retry, structured errors, and token refresh. Reference data is cached for 15 minutes so fuzzy matches don't thrash the API.
┌───────────────────────────────────────────────────────┐
│                 Claude (agent loop)                   │
└──────────────────────────┬────────────────────────────┘
                           │ MCP Protocol (stdio)
                           ▼
┌───────────────────────────────────────────────────────────────────┐
│                      Diligent Entities MCP                        │
│                                                                   │
│  Meta layer          Smart ingest       Primitive layer           │
│  self-discovery      fuzzy match        CRUD + getters            │
│  health & metrics    concurrency        audit trail               │
│  introspection       dry-run            security groups           │
│  query_all           duplicates         addresses & appointments  │
│                                                                   │
│  ─────────────────────────────────────────────────────            │
│                                                                   │
│                    Hardened GraphQL client                        │
│  · retry + exponential backoff  · typed error classification      │
│  · auth refresh hook            · in-memory session metrics       │
│  · reference-data cache (15m)   · auto-pagination                 │
└──────────────────────────────┬────────────────────────────────────┘
                               │ HTTPS + Bearer token
                               ▼
┌───────────────────────────────────────────────┐
│              Diligent Entities                │
│         GraphQL API · HotChocolate            │
└───────────────────────────────────────────────┘
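The 15-minute reference-data cache can be sketched as a plain in-memory map with a TTL. This is an illustration of the idea, not the server's actual cache; the injectable clock exists only to make the sketch testable.

```javascript
// 15-minute TTL, matching the reference-data cache described above.
const TTL_MS = 15 * 60 * 1000;

function makeRefCache(now = Date.now) {
  const store = new Map();
  return {
    get(key) {
      const hit = store.get(key);
      if (!hit || now() - hit.at > TTL_MS) return undefined; // missing or expired
      return hit.value;
    },
    set(key, value) {
      store.set(key, { value, at: now() });
    },
  };
}
```

The fuzzy matcher checks this cache before hitting the countries and company-types endpoints, which is what keeps repeated matches from thrashing the API.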
Six beats, thirty-seven seconds.
A realistic end-to-end ingestion flow: messy CSV in, clean data plus audit proof out. This is the exact sequence the agent runs during the demo, timed across 3 rehearsal runs against a live tenant.
Context
Agent calls entities_list_capabilities to discover what it has before doing anything. Meta layer first.
Map + dry-run
Parses the CSV, warms the reference cache, then calls entities_bulk_create_companies with dryRun: true. Zero writes. Returns 15 clean, 4 with warnings, 1 broken.
Duplicate detection
For the candidates, entities_find_duplicate_companies runs fuzzy name match + exact company-number match + country scoping. Catches pre-seeded traps and internal CSV dupes.
Commit
Real entities_bulk_create_companies with skipDuplicates: true, concurrency: 4. 17 created, 2 skipped as duplicates, 1 failed with a clean "Netherlands + Ltd" error.
Compose a report
"UK companies by type" — no dedicated tool exists. Agent composes it: entities_query_all + client-side group-by on companyType.name. Instant table.
Undo + audit
The agent remembered every entityReference. It deletes all 17, then queries entities_list_audit_trails as proof. 107,000+ audit entries; the deletes are in there.
The smart ingest layer.
Ingestion is the place where Diligent Entities, as an API, is at its most unforgiving: reference data, per-country type scoping, and duplicate semantics all have to be right before a single mutation is accepted. The smart ingest layer handles all of it so the agent can think in natural language.
Levenshtein + Jaccard + acronym prefix
Normalizes, tokenizes, compresses punctuation, applies alias maps, and handles leading-token and acronym matches, so BV resolves to B.V. (closed limited liability company) even when the raw strings share almost no characters.
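The normalization steps above can be sketched as follows. The alias table, the exact regexes, and the acronym check are all illustrative; the server's real dictionaries and scoring weights are richer.

```javascript
// Illustrative alias map for country name resolution.
const ALIASES = { holland: 'netherlands', usa: 'united states' };

// Lowercase, strip punctuation, collapse whitespace: "B.V." -> "bv".
function normalize(name) {
  return name
    .toLowerCase()
    .replace(/[.,'’]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

function resolveAlias(name) {
  const n = normalize(name);
  return ALIASES[n] || n;
}

// Acronym check: "bv" matches "besloten vennootschap" by word initials.
function acronymMatch(short, full) {
  const initials = normalize(full).split(' ').map(w => w[0]).join('');
  return normalize(short) === initials;
}
```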
Per-jurisdiction type filtering
Types are scoped with countryIds + isNotInCountries. Matches only consider types valid for the resolved country, so "Ltd" in Netherlands fails fast instead of creating the wrong entity.
Country-context ordered fallbacks
Each country has a preferred canonical name for common abbreviations with ordered fallbacks: Norway AS → [Aksjeselskap, Private Company]. Works across tenants with different type dictionaries.
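A sketch of ordered fallback resolution. Only the Norway AS example comes from this document; the fallback table, key format, and function shape are assumptions.

```javascript
// Ordered fallbacks keyed by country + abbreviation. Illustrative only.
const FALLBACKS = {
  'norway:as': ['Aksjeselskap', 'Private Company'],
};

// Walk the ordered list; the first name present in this tenant's type
// dictionary wins, which is what makes the mapping tenant-portable.
function resolveCompanyType(country, abbrev, tenantTypes) {
  const key = `${country.toLowerCase()}:${abbrev.toLowerCase()}`;
  const candidates = FALLBACKS[key] || [abbrev];
  return candidates.find(name => tenantTypes.includes(name)) || null;
}
```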
Pre-commit validation
Every bulk-create tool accepts dryRun: true. Runs the full validator, resolves all references, reports warnings per row, and never touches the database.
Duplicate detection
Three-layer match: exact company-number inside the country, fuzzy name match above a threshold, and internal CSV duplicate scan. Returns confidence scores, not booleans.
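The scoring idea can be sketched like this: an exact company-number match inside the same country is full confidence, otherwise a fuzzy name similarity becomes the score. The token-Jaccard similarity below is a stand-in for the real Levenshtein-plus-Jaccard mix, and the field names are assumptions.

```javascript
// Jaccard similarity over word tokens, in [0, 1].
function similarity(a, b) {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...A].filter(t => B.has(t)).length;
  return inter / (A.size + B.size - inter);
}

// Confidence score, not a boolean: the agent decides what to do with 0.67.
function duplicateConfidence(candidate, existing) {
  if (
    candidate.companyNumber &&
    candidate.companyNumber === existing.companyNumber &&
    candidate.countryName === existing.countryName
  ) {
    return 1.0; // exact registry match inside the same country
  }
  return similarity(candidate.entityName, existing.entityName);
}
```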
Bounded worker pool
Bulk creates run through runWithConcurrency with a hard max of 10. Per-row errors are captured structurally. One bad row never breaks the batch.
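A minimal sketch in the spirit of `runWithConcurrency`: at most `limit` tasks in flight, a hard cap of 10, and per-row errors captured rather than thrown. The real helper's signature and result shape may differ.

```javascript
// tasks: array of () => Promise. Returns per-row results in input order.
async function runWithConcurrency(tasks, limit = 4) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim the next row (safe: JS is single-threaded)
      try {
        results[i] = { status: 'ok', value: await tasks[i]() };
      } catch (err) {
        // One bad row becomes a structured error, not a dead batch.
        results[i] = { status: 'error', error: String(err) };
      }
    }
  }
  const workers = Math.min(limit, 10, tasks.length); // hard max of 10
  await Promise.all(Array.from({ length: workers }, worker));
  return results;
}
```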
// Map CSV rows to the MCP's free-text schema — no IDs required.
const payloads = rows.map(r => ({
  entityName: r.company_name,
  companyTypeName: r.company_type, // "BV" / "GmbH" / "SARL" — resolved by fuzzy match
  countryName: r.country,          // "Holland" / "USA" — resolved via alias table
  companyNumber: r.reg_number,
  incorporationDate: r.incorporated,
}));

// Step 1 — dry-run. Zero writes. Catches every issue.
const dry = await entities_bulk_create_companies({
  companies: payloads,
  dryRun: true,
});

// Step 2 — commit. Duplicates skipped. Concurrency 4.
const real = await entities_bulk_create_companies({
  companies: payloads,
  skipDuplicates: true,
  concurrency: 4,
});

// Step 3 — hold the references so you can undo.
const refs = real.results
  .filter(r => r.status === 'created')
  .map(r => r.created.entityReference);
// → ["BRTP2020", "COBT2011", "FRMR2013", ...]
Browse by capability area.
Click any category to filter the tool reference. The meta layer, smart match, and bulk ingest tools are the ones you'll want the agent to reach for first.
All 166 tools.
Each tool documents its name, one-line purpose, and full input schema. Type in the search box or click a category above to filter.
Pre-validated end-to-end scenarios.
Each pack is a full agent loop — tested against a live tenant, timed, and scored. Use them as reference workflows or as the starting point for a new one.
Bulk company ingestion
The canonical workflow. Messy CSV with typos, accented names, country aliases, and one internal duplicate. Validated → deduped → committed → rolled back cleanly.
Directors & officers
Individuals with nationality, date of birth, passport, and role. Handles "Surname, Forenames" comma-splits, honorifics, and missing DOBs as warnings rather than errors.
International address book
Addresses with international characters, postal codes, regions, and entity connections. References must be alphanumeric ≤12 chars — the ingest layer strips and validates automatically.
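The reference cleanup can be sketched as follows. The strip-and-cap behaviour is inferred from the constraint above; whether the real layer truncates or rejects overlong references is an assumption.

```javascript
// Enforce the reference constraint: alphanumeric only, max 12 characters.
function sanitizeReference(raw) {
  const cleaned = raw.replace(/[^A-Za-z0-9]/g, '').toUpperCase();
  if (cleaned.length === 0) throw new Error('reference has no usable characters');
  return cleaned.slice(0, 12); // assumption: truncate rather than reject
}
```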
The live demo flow
The full six-beat sequence: context → dry-run → dup-check → commit → compose a report → undo with audit proof. Designed for a 5-10 minute narration; runs in ~37 seconds of raw tool work.
What the agent does when there's no tool for that.
The MCP has 166 tools. Real questions are unbounded. When there's no dedicated tool, the agent composes primitives.
UK companies grouped by type
const all = await entities_query_all({
  collection: 'companies',
  maxRecords: 500,
});

const uk = all.items.filter(c => c.country?.name === 'United Kingdom');

const byType = uk.reduce((acc, c) => {
  const k = c.companyType?.name || '(unknown)';
  acc[k] = (acc[k] || 0) + 1;
  return acc;
}, {});
// → { 'Limited by Shares': 55, 'Limited by Guarantee': 1, 'Public Limited Company': 1 }
Individuals serving as directors of 3+ companies
const all = await entities_query_all({ collection: 'individuals', maxRecords: 1000 });

const results = [];
for (const ind of all.items) {
  const appts = await entities_list_appointments({
    entityId: ind.id,
    appointmentTypeName: 'Director',
  });
  if (appts.items.length >= 3) {
    results.push({ individual: ind, count: appts.items.length });
  }
}
Roll back an entire bulk import
// refs was collected from the previous bulk_create_companies call
for (const ref of refs) {
  const co = await entities_get_company_by_reference({ reference: ref });
  await entities_delete_company({ id: co.id });
}

const audit = await entities_list_audit_trails({ take: 50 });
// audit.items contains the 17 delete events, signed with user + timestamp
Error model.
Every error is classified. Validation and limit errors fail fast; network and 5xx errors are retried with exponential backoff; auth errors transparently refresh the token once.
| Type | Trigger | Retry | Agent action |
|---|---|---|---|
| validation | Bad field value, wrong date format, reference too long | No | Fix the payload, don't retry |
| limit | HC0047 / HC0051 — query cost exceeded, page size over 50 | No | Reduce take to ≤50, split batch |
| schema | HC0009 / HC0011 — unknown field | No | Verify shape via describe_type |
| auth | 401 Unauthorized | Once | Auto: refresh_api_token, retry in-place |
| network | ECONNRESET, timeout | 3× backoff | Already retried transparently |
| http | 5xx other than rate-limit classified cases | 3× backoff | Already retried transparently |
| NO_TYPES_FOR_COUNTRY | Country has zero valid company types on this tenant | No | Skip the row or create the type first |
| NO_TYPE_MATCH | Fuzzy match score below threshold | No | Ask the user or use smart_match_company_type to explore |
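The retry policy in the table can be sketched as a wrapper. Everything here is illustrative: the `type` field on errors, the `refresh` hook, and the exact backoff constants stand in for the real client's internals.

```javascript
// Up to `attempts` tries with exponential backoff for network/http errors,
// one transparent token refresh on auth, immediate throw for everything else.
async function withRetry(fn, refresh, { attempts = 3, baseMs = 200 } = {}) {
  let refreshed = false;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (err.type === 'auth' && !refreshed) {
        refreshed = true;
        await refresh(); // one transparent token refresh, then retry in-place
        i--;             // the refresh retry doesn't count against the budget
        continue;
      }
      const retryable = err.type === 'network' || err.type === 'http';
      if (!retryable || i === attempts - 1) throw err;
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise(r => setTimeout(r, baseMs * 2 ** i));
    }
  }
}
```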
Retries are never attempted for validation, limit, schema, or auth errors (beyond the one transparent refresh). This keeps loops tight and surfaces actionable errors to the agent fast.
Getting started.
The server is a single Node.js process. Connect Claude to it via MCP stdio and you're in business.
# clone and install
git clone https://github.com/RiskaptureAI/diligent-entities-mcp.git
cd diligent-entities-mcp
npm install

# environment
export ENTITIES_API_URL="https://your-tenant.blueprintserver.com"
export ENTITIES_API_TOKEN="<your bearer token>"

# DXM fallback (optional — only for UI automation)
export ENTITIES_DXM_USERNAME="[email protected]"
export ENTITIES_DXM_PASSWORD="<password>"

# run
node src/index.js
Claude MCP config
{
"mcpServers": {
"diligent-entities": {
"command": "node",
"args": ["/path/to/diligent-entities-mcp/src/index.js"],
"env": {
"ENTITIES_API_URL": "https://your-tenant.blueprintserver.com",
"ENTITIES_API_TOKEN": "<your bearer token>"
}
}
}
}
First agent prompt
I've got a CSV of 20 companies from a messy handover. Pull them into Diligent Entities — but don't trust the data. Validate everything first, tell me what's clean and what's broken, then import the good stuff.
Production readiness.
Today this is a hardened pilot, not a production release. If you're shipping it to a real customer, read this section first.
What's solid today
- Retry with exponential backoff and structured error classification across all 166 tools.
- Bounded concurrency with per-row error capture — one bad row never aborts a batch.
- In-memory session metrics (call count, success rate, avg latency, retry count) exposed via get_session_metrics.
- Auto-pagination helper for collections up to 10,000 rows.
- Optional JSONL observability log via ENTITIES_LOG_FILE.
- Transparent token refresh via DXM Playwright fallback on 401.
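The auto-pagination helper listed above can be sketched as a cursor loop with a row cap. The `fetchPage` callback and its `{ items, nextCursor }` shape are assumptions, not the real client API; the 50-row page size matches the limit in the error-model table.

```javascript
// Page through a collection until exhausted or maxRecords is reached.
async function queryAll(fetchPage, { maxRecords = 10000, pageSize = 50 } = {}) {
  const items = [];
  let cursor = null;
  do {
    const page = await fetchPage({ take: pageSize, cursor });
    items.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor && items.length < maxRecords);
  return items.slice(0, maxRecords); // never hand the agent more than it asked for
}
```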
What still needs work
- Multi-tenancy: one token per server instance today.
- GraphQL queries use string interpolation, not variables — refactor before untrusted input gets anywhere near the MCP.
- Playwright browser doesn't yet close on SIGTERM — clean up resource leaks before long-running deployments.
- Test suite is manual scripts hitting live tenants — needs a mocked-client test framework + CI.
- No PII redaction at the logging layer yet; the only mitigation today is leaving file logging disabled.