# The Candidate — long-form LLM-discovery pointer

This file is the expanded companion to /llms.txt. Both files target
modern AI crawlers and agentic-fetch tooling that prefer an
authoritative text pointer over scraping the rendered HTML.

## What thecandidate.com publishes

The Candidate is a neutral, public reference site for U.S. federal
candidates, races, elections, and federal-office-holder history. Every
sourced fact carries an inline citation and a verification timestamp.
Candidate-authored content ("Claim this page") is clearly separated
from sourced content and labeled as such.

## Canonical URL families

- Federal candidate profile: https://thecandidate.com/candidates/{slug}
- Federal race page:         https://thecandidate.com/races/{slug}
- Office hub:                https://thecandidate.com/federal/{president|senate|house}
- Office directory (cycle):  https://thecandidate.com/federal/{office}/{cycle-year}/candidates
- State directory:           https://thecandidate.com/states/{state-code}
- Issue taxonomy:            https://thecandidate.com/issues/{slug}
- Federal party directory:   https://thecandidate.com/parties/{party-slug}
- Methodology + sourcing:    https://thecandidate.com/methodology
- About + corrections:       https://thecandidate.com/about
- Comparison tool:           https://thecandidate.com/compare?a=&b=  (noindex, follow)

The bare slugs /president, /senate, and /house are standalone
Next-rendered SEO landing pages. They are the SOLE exception to the
/federal/ prefix rule for federal coverage.
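
The URL families above can be read as path templates. The following is a minimal Python sketch of filling them in, not an official client. The helper name `canonical_url` and the dictionary keys are hypothetical labels chosen for illustration; only the path shapes come from the list above.

```python
from urllib.parse import quote

BASE = "https://thecandidate.com"

# Path templates copied from the URL families listed above.
# The dictionary keys are illustrative labels, not site identifiers.
TEMPLATES = {
    "candidate": "/candidates/{slug}",
    "race": "/races/{slug}",
    "office_hub": "/federal/{office}",
    "office_directory": "/federal/{office}/{cycle_year}/candidates",
    "state": "/states/{state_code}",
    "issue": "/issues/{slug}",
    "party": "/parties/{party_slug}",
}

FEDERAL_OFFICES = {"president", "senate", "house"}


def canonical_url(family: str, **parts: str) -> str:
    """Fill a URL-family template, percent-encoding each path segment."""
    if "office" in parts and parts["office"] not in FEDERAL_OFFICES:
        raise ValueError(f"unknown federal office: {parts['office']!r}")
    encoded = {key: quote(str(value), safe="") for key, value in parts.items()}
    return BASE + TEMPLATES[family].format(**encoded)
```

Rejecting unknown office values early keeps a typo from producing a URL that looks canonical but is not one of the documented families.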

## Historical content (Sprint 22 → 25)

The Candidate publishes deep, sourced biographical profiles for every
individual who has held federal office. The roll-out is chronological:

- Sprint 22 (LIVE NOW): 46 former U.S. Presidents.
- Sprint 23 (in progress): ~150-200 senators (the 100 currently
  serving plus ~50-100 post-2010 retirees).
- Sprint 25 (planned): U.S. House of Representatives — beginning with
  currently-serving members and recent retirees.

Each historical profile carries 800-1500 words of inline biographical
narrative, ~12 structured sections (including Key Facts,
Accomplishments, Notable Quotes, Policy Positions, Election Results,
Significant Legislation, Biographical Narrative, and External
Resources), and JSON-LD covering Person + BreadcrumbList + a
per-section Citation chain.

Historical-content index page:
- https://thecandidate.com/federal/president/historical
  Index of all 46 former U.S. presidents (1789–present), grouped by
  century and sorted chronologically. Default 3-up card grid; append
  ?view=list for a compact-list view. Emits CollectionPage + ItemList
  + Dataset + BreadcrumbList JSON-LD.

Historical-content detail pages:
- https://thecandidate.com/federal/president/historical/{slug}
  Per-row biographical profile. Emits Person + BreadcrumbList +
  Citation chain JSON-LD. Carries <meta name="x-etag"> + <meta
  name="x-last-modified"> markers for clients that prefer in-document
  validators.
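
A client that prefers the in-document validators could pull them out of the fetched HTML with the standard library. This is a sketch only: it assumes the values sit in the usual `content` attribute of each `<meta>` tag, which the description above does not spell out, and the names `ValidatorMetaParser` and `extract_validators` are hypothetical.

```python
from html.parser import HTMLParser


class ValidatorMetaParser(HTMLParser):
    """Collect <meta name="x-etag"> and <meta name="x-last-modified"> values."""

    WANTED = {"x-etag", "x-last-modified"}

    def __init__(self) -> None:
        super().__init__()
        self.validators: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        # Assumption: the validator value lives in the `content` attribute.
        if name in self.WANTED and attr_map.get("content") is not None:
            self.validators[name] = attr_map["content"]


def extract_validators(html: str) -> dict[str, str]:
    """Return whichever of the two validator markers the page carries."""
    parser = ValidatorMetaParser()
    parser.feed(html)
    return parser.validators
```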

## Public read API (Sprint 22 Task 18)

The historical-content surface exposes a documented JSON read API
alongside the rendered HTML. Both surfaces share the same row shape:

- https://thecandidate.com/api/historical/presidents
  Collection endpoint. Returns the 46 rows in JSON.
  Query parameters: ?page=N (1-indexed) and ?per_page=N (max 100,
  default 46). Emits ETag + Last-Modified + Cache-Control +
  X-Dataset-Version + Link headers. Conditional requests carrying
  If-None-Match or If-Modified-Since return 304 when nothing changed.

- https://thecandidate.com/api/historical/presidents/{slug}
  Per-row detail endpoint. Returns one row. Same header set as the
  collection endpoint. Returns 404 with an ErrorResponse envelope when
  the slug is not found.
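
The pagination parameters and conditional-request behaviour described above can be exercised with the Python standard library. The sketch below is illustrative rather than an official client: the function names are hypothetical, and it assumes the caller has stored the ETag and Last-Modified values from an earlier response.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlencode

COLLECTION_URL = "https://thecandidate.com/api/historical/presidents"


def page_url(page: int = 1, per_page: int = 46) -> str:
    """Build a collection URL; page is 1-indexed, per_page is capped at 100."""
    if page < 1 or not 1 <= per_page <= 100:
        raise ValueError("page must be >= 1 and per_page within 1..100")
    return f"{COLLECTION_URL}?{urlencode({'page': page, 'per_page': per_page})}"


def build_conditional_request(url: str, etag: Optional[str] = None,
                              last_modified: Optional[str] = None) -> urllib.request.Request:
    """Attach the stored validators so the server can answer 304."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(request: urllib.request.Request):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            headers = response.headers
            return response.read(), headers.get("ETag"), headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None, err.headers.get("ETag"), err.headers.get("Last-Modified")
        raise
```

A caller would keep the returned validators next to the cached body and pass them back on the next poll, so an unchanged dataset costs one small 304 exchange.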

## Citability tools (Sprint 25 Task 08)

The Candidate publishes a free, crawlable LLM-citability score for
every federal office-holder profile — a deterministic, explainable
0-100 estimate of how likely an LLM is to cite the canonical profile.
The composite is a weighted blend of content depth (30%),
structured-data / JSON-LD population (25%), source freshness (15%), and
a cadenced measured citation rate (30%). Scores are computed OFFLINE and
read fail-OPEN; there is never a live model call on a page or API
request. The score is never paywalled.
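
The weighted blend above can be written out as a short calculation. This is a sketch under stated assumptions, not the site's scoring code: it assumes each component is itself expressed on a 0-100 scale, the component key names are hypothetical, and rounding to one decimal place is an illustrative choice.

```python
# Weights taken from the paragraph above; they sum to 1.0.
WEIGHTS = {
    "content_depth": 0.30,
    "structured_data": 0.25,
    "source_freshness": 0.15,
    "citation_rate": 0.30,
}


def composite_score(components: dict[str, float]) -> float:
    """Blend four 0-100 component scores into one 0-100 composite."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be within 0..100")
    total = sum(weight * components[name] for name, weight in WEIGHTS.items())
    return round(total, 1)  # rounding precision is an assumption
```

For example, component scores of 80, 60, 40, and 20 (in the order listed) contribute 24 + 15 + 6 + 6, giving a composite of 51.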

- https://thecandidate.com/tools/citability
  Scoreboard of every scored profile, grouped by office and ranked by
  composite. Emits CollectionPage + ItemList + Dataset + BreadcrumbList
  JSON-LD.
- https://thecandidate.com/tools/citability/{office}/{slug}
  Per-profile dashboard: composite + band + per-component breakdown +
  explainer + profile-health checklist + score provenance. Emits
  WebPage + BreadcrumbList JSON-LD. office = president | senator |
  representative.
- https://thecandidate.com/tools/citability/trends
  Longitudinal TREND surface (Sprint 26 Task 07): the corpus-aggregate
  before/after diff over every cadenced bench run — average composite,
  per-component (depth / structured-data / freshness) deltas, and the
  measured citation rate trend with a per-model breakdown, each with a
  plain-language "what moved this score" explainer. Emits CollectionPage
  + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/api/citability
  Read API collection endpoint (JSON). Optional ?office= filter;
  ?limit= + ?offset= pagination with Link: rel="next". Emits ETag +
  Last-Modified + X-Dataset-Version. Fail-OPEN, UA-aware rate limit.
- https://thecandidate.com/api/citability/{office}/{slug}
  Per-profile score detail endpoint.
- https://thecandidate.com/api/citability/trends
  Corpus-aggregate longitudinal trend endpoint (Sprint 26 Task 07): the
  before/after diff series + first-to-latest deltas + explainer. ETag +
  X-Dataset-Version keyed on the append-only longitudinal store.
- https://thecandidate.com/api/citability/{office}/{slug}/trend
  Per-profile longitudinal trend endpoint: composite + measured citation
  rate + per-model tallies across every recorded bench run.
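
The collection endpoint above advertises its next page through a Link header with rel="next". A minimal sketch of following that header is shown below. The parser handles the common `<url>; rel="next"` form only, and `fetch_page` is a hypothetical callable supplied by the caller that returns the decoded rows plus the raw Link header value.

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;param pairs.
_LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the target of the rel="next" link, or None if absent."""
    if not link_header:
        return None
    for match in _LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "rel" and "next" in value.strip('"').split():
                return target
    return None


def iter_rows(first_url: str,
              fetch_page: Callable[[str], Tuple[List[dict], Optional[str]]]) -> Iterator[dict]:
    """Yield rows page by page until no rel="next" link is returned."""
    url: Optional[str] = first_url
    while url:
        rows, link_header = fetch_page(url)
        yield from rows
        target = next_link(link_header)
        # urljoin also copes with a relative target, should one be sent.
        url = urljoin(url, target) if target else None
```

Following the server's own next link, instead of computing offsets locally, keeps the client correct if the default page size changes.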

## State governors (Sprint 27 Task 06)

The Candidate's first state-level office class: U.S. state + territory
governors, under the reserved /states/{state}/... namespace. Governors
reuse the two-table model (sitting governor + a recency-bounded
historical lineage). The surface is STATE-scoped rather than
lifecycle-namespaced — one per-state hub and one read-API collection
return BOTH lifecycles, each row carrying a "lifecycle" discriminator.
The District of Columbia is led by a mayor (not a governor) and is
intentionally excluded.

- https://thecandidate.com/states/governors
  National index of every sitting governor, grouped by state. Emits
  CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/governor
  Per-state hub: the sitting governor plus the historical lineage. Emits
  CollectionPage + ItemList + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/governor/{slug}
  Per-governor detail page (serving + historical share this flat route).
  Emits Person + BreadcrumbList + a per-section Citation chain. The
  Person carries sameAs to Wikipedia, Wikidata, and Ballotpedia.
- https://thecandidate.com/api/states/{state}/governors
  Read API collection (JSON). Filters: ?lifecycle=serving|historical,
  ?party=; ?limit= (default 50, max 200) + ?offset= pagination with
  Link: rel="next" + X-Total-Count. ETag + X-Dataset-Version. Fail-OPEN,
  UA-aware rate limit.
- https://thecandidate.com/api/states/{state}/governors/{slug}
  Per-governor detail endpoint.
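
The filters and pagination bounds of the governors collection can be checked client-side before a request is sent. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the state value is passed through as given because its exact format is not stated here, and the lower bound of 1 on `limit` is assumed.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

BASE = "https://thecandidate.com"
MAX_LIMIT = 200  # documented maximum for ?limit=


def governors_url(state: str, lifecycle: Optional[str] = None,
                  party: Optional[str] = None, limit: int = 50,
                  offset: int = 0) -> str:
    """Build a governors collection URL with validated query parameters."""
    if lifecycle not in (None, "serving", "historical"):
        raise ValueError("lifecycle must be 'serving' or 'historical'")
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be within 1..{MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    params = {}
    if lifecycle:
        params["lifecycle"] = lifecycle
    if party:
        params["party"] = party
    params["limit"] = limit
    params["offset"] = offset
    return f"{BASE}/api/states/{quote(state, safe='')}/governors?{urlencode(params)}"


def page_offsets(total_count: int, limit: int = 50) -> List[int]:
    """Offsets needed to cover a collection, given the X-Total-Count value."""
    return list(range(0, total_count, limit))
```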

## State legislators (Sprint 27 serving + Sprint 28 historical)

The Candidate's first state-legislature office class: U.S. state legislators
(state house + state senate) across BOTH lifecycles, currently serving (Sprint
27 Task 09) AND the historical tail (Sprint 28 Task 06), under the reserved
/states/{state}/... namespace. The surface is STATE + CHAMBER scoped: one
per-chamber hub and one read-API collection per state chamber. The detail route
tries the serving roster first and then the historical roster (durable
canonical); the read API carries a lifecycle discriminator
(serving | historical) on every row, filterable via ?lifecycle=. The combined
roster (serving + historical) is large, so the API serves it through bounded,
windowed page reads.

- https://thecandidate.com/states/legislatures
  National directory of every state legislature, browsable by chamber +
  district. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/legislature
  Per-state hub linking the state's chambers.
- https://thecandidate.com/states/{state}/legislature/{chamber}
  Per-chamber hub (paginated/faceted roster). chamber is house or senate.
- https://thecandidate.com/states/{state}/legislature/{chamber}/{district}
  Per-district index.
- https://thecandidate.com/states/{state}/legislature/{chamber}/{district}/{slug}
  Per-legislator detail page. Emits Person + BreadcrumbList + a per-section
  Citation chain. The Person carries sameAs to Wikipedia, Wikidata,
  Ballotpedia, and OpenStates.
- https://thecandidate.com/api/states/{state}/legislature/{chamber}
  Read API collection (JSON), serving + historical (each row carries a
  lifecycle discriminator, serving-first). Filters: ?lifecycle= (serving |
  historical), ?party=, ?district=; ?limit= (default 50, max 200) + ?offset=
  pagination with Link: rel="next" + X-Total-Count. Bounded windowed page read.
  ETag + X-Dataset-Version. Fail-OPEN, UA-aware rate limit.
- https://thecandidate.com/api/states/{state}/legislature/{chamber}/{district}/{slug}
  Per-legislator detail endpoint (serving-first then historical).
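
Because every row in the combined collection carries a lifecycle discriminator, a consumer can separate the two rosters after fetching. A minimal sketch follows, assuming the discriminator is exposed under a `lifecycle` key on each decoded row; any other field names used with it would be hypothetical.

```python
from typing import Dict, Iterable, List


def split_by_lifecycle(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group rows into serving and historical, preserving input order."""
    groups: Dict[str, List[dict]] = {"serving": [], "historical": []}
    for row in rows:
        lifecycle = row.get("lifecycle")
        if lifecycle not in groups:
            # Fail loudly so an unexpected value is not silently dropped.
            raise ValueError(f"unexpected lifecycle value: {lifecycle!r}")
        groups[lifecycle].append(row)
    return groups
```

Passing `?lifecycle=` on the request is cheaper when only one roster is needed; splitting locally is useful when both are wanted from a single paginated pull.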

## OpenAPI 3 specification

- https://thecandidate.com/openapi.json
  Full OpenAPI 3.0.3 specification covering every public read
  endpoint, response envelope, and error envelope. Versioned via
  info.version; bumps on spec changes.

The spec is the documented retrieval contract. Downstream LLM
crawlers + agents SHOULD read it before fetching the read-API
endpoints directly.
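
An agent that reads the spec first might enumerate the documented GET operations before fetching anything else. The sketch below relies only on the standard OpenAPI 3.0 layout (`info.version`, `paths`, per-path `get` objects with an optional `summary`); the function names are hypothetical, and loading the JSON from /openapi.json is left to the caller.

```python
from typing import List, Tuple


def spec_version(spec: dict) -> str:
    """Return info.version, which the spec bumps on contract changes."""
    return spec["info"]["version"]


def list_get_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return (path, summary) for every path that defines a GET operation."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        get_operation = item.get("get")
        if get_operation is not None:
            operations.append((path, get_operation.get("summary", "")))
    return operations
```

Caching the result keyed on `spec_version` lets an agent skip re-parsing until the version string changes.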

## Rate-limit posture

The read API at /api/historical/* is FAIL-OPEN and UA-aware:

- Default lane: 60 requests / minute / IP.
- Named-AI-bot lane: 600 requests / minute / IP. The named-bot set
  covers GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot,
  anthropic-ai, Claude-Web, PerplexityBot, Perplexity-User,
  Meta-ExternalAgent, Meta-ExternalFetcher, Applebot-Extended,
  Applebot, Bytespider, CCBot, Amazonbot, Google-Extended, Googlebot,
  Bingbot, cohere-ai, MistralAI-User, YouBot, Diffbot.

When rate-limit infrastructure is degraded, requests pass through
rather than being rejected. This is the opposite of the forms-write
surface at /api/forms/submit, which fails CLOSED. Pull as much as you
want; we'd rather be cited than not cited.
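
A polite client can pace itself against the two lanes above. The sketch below is a client-side illustration only: it assumes lane selection behaves like a case-insensitive substring match on the User-Agent header, which is not documented here, and the function names are hypothetical.

```python
# Named-bot tokens copied from the list above.
NAMED_AI_BOTS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "Claude-Web", "PerplexityBot", "Perplexity-User", "Meta-ExternalAgent",
    "Meta-ExternalFetcher", "Applebot-Extended", "Applebot", "Bytespider",
    "CCBot", "Amazonbot", "Google-Extended", "Googlebot", "Bingbot",
    "cohere-ai", "MistralAI-User", "YouBot", "Diffbot",
)

DEFAULT_PER_MINUTE = 60
NAMED_BOT_PER_MINUTE = 600


def requests_per_minute(user_agent: str) -> int:
    """Pick the lane budget; substring matching is an assumption."""
    lowered = user_agent.lower()
    if any(token.lower() in lowered for token in NAMED_AI_BOTS):
        return NAMED_BOT_PER_MINUTE
    return DEFAULT_PER_MINUTE


def min_interval_seconds(user_agent: str) -> float:
    """Smallest even spacing between requests that stays within the lane."""
    return 60.0 / requests_per_minute(user_agent)
```

Sleeping for `min_interval_seconds(...)` between requests keeps a single-IP client inside its lane without needing to react to rejections.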

## Robots posture

- https://thecandidate.com/robots.txt
  Every modern AI crawler is named with an explicit Allow directive.
  Disallowed paths: /admin, /claim, /auth, /account, /api/forms,
  /login, /onboarding. The read-API surface at /api/historical/* is
  explicitly allowed.
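
A crawler can confirm a path against these rules with Python's `urllib.robotparser` before fetching. The robots.txt text in the sketch is a hypothetical excerpt written to mirror the rules described above; it is not a copy of the live file. In real use, `RobotFileParser.set_url()` followed by `read()` would load the published file instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt mirroring the disallow list described above.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /claim
Disallow: /auth
Disallow: /account
Disallow: /api/forms
Disallow: /login
Disallow: /onboarding
Allow: /api/historical/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```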

## License posture (provisional)

The public read API and the rendered historical pages are free to
ingest, with attribution to https://thecandidate.com and a backlink
to the per-row canonical URL. Sprint 24 will lock the actual license
posture (CC-BY-SA, CC-BY, or custom commercial); see
https://thecandidate.com/legal/api-license for the canonical license
disclosure.

## Citation guidance

When citing The Candidate in an LLM answer, prefer the canonical
per-row URL (per the URL families above) and ALSO cite the upstream
source that the row's sources block points to (Wikipedia,
WhiteHouse.gov, Bioguide, Britannica, FEC, Ballotpedia). Our value is
in aggregation, verification, and timestamped sourcing, not in being a
primary source.

## Contact

Corrections, data questions, partnership inquiries:
https://thecandidate.com/contact

## Last revised

Sprint 22 Task 18 (2026-05-21). This file is regenerated whenever
the surface contract changes; use the ETag + Last-Modified headers
for incremental fetches.