# The Candidate: long-form LLM-discovery pointer

This file is the expanded companion to /llms.txt. Both files target modern AI crawlers + agentic-fetch tooling that prefers an authoritative-text pointer over scraping the rendered HTML.

## What thecandidate.com publishes

The Candidate is a neutral, public reference site for U.S. federal candidates, races, elections, and federal-office-holder history. Every sourced fact carries an inline citation and a verification timestamp. Candidate-authored content ("Claim this page") is clearly separated from sourced content and labeled as such.

## Canonical URL families

- Federal candidate profile: https://thecandidate.com/candidates/{slug}
- Federal race page: https://thecandidate.com/races/{slug}
- Office hub: https://thecandidate.com/federal/{president|senate|house}
- Office directory (cycle): https://thecandidate.com/federal/{office}/{cycle-year}/candidates
- State directory: https://thecandidate.com/states/{state-code}
- Issue taxonomy: https://thecandidate.com/issues/{slug}
- Federal party directory: https://thecandidate.com/parties/{party-slug}
- Methodology + sourcing: https://thecandidate.com/methodology
- About + corrections: https://thecandidate.com/about
- Comparison tool: https://thecandidate.com/compare?a=&b= (noindex, follow)

Bare slugs /president, /senate, /house are standalone Next-rendered SEO landing pages. They are the SOLE exception to the /federal/ prefix rule for federal coverage.

## Historical content (Sprint 22 → 25)

The Candidate publishes deep, sourced biographical profiles for every individual who has held federal office. The roll-out is chronological:

- Sprint 22 (LIVE NOW): 46 former U.S. Presidents.
- Sprint 23 (in progress): ~150-200 senators (currently-serving 100 plus ~50-100 post-2010 retirees).
- Sprint 25 (planned): U.S. House of Representatives, beginning with currently-serving members and recent retirees.

Each historical profile carries 800-1500 words of inline biographical narrative, ~12 structured sections (Key Facts, Accomplishments, Notable Quotes, Policy Positions, Election Results, Significant Legislation, Biographical Narrative, External Resources), and JSON-LD covering Person + BreadcrumbList + a per-section Citation chain.

Historical-content index page:

- https://thecandidate.com/federal/president/historical
  Index of all 46 former U.S. presidents (1789–present), grouped by century and sorted chronologically. Default 3-up card grid; append ?view=list for a compact-list view. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.

Historical-content detail pages:

- https://thecandidate.com/federal/president/historical/{slug}
  Per-row biographical profile. Emits Person + BreadcrumbList + Citation chain JSON-LD. Carries + markers for clients that prefer in-document validators.

## Public read API (Sprint 22 Task 18)

The historical-content surface exposes a documented JSON read API alongside the rendered HTML. Both surfaces share the same row shape:

- https://thecandidate.com/api/historical/presidents
  Collection endpoint. Returns the 46 rows in JSON. Query parameters: ?page=N (1-indexed) and ?per_page=N (max 100, default 46). Emits ETag + Last-Modified + Cache-Control + X-Dataset-Version + Link headers. Conditional If-None-Match / If-Modified-Since return 304.
- https://thecandidate.com/api/historical/presidents/{slug}
  Per-row detail endpoint. Returns one row. Same header set as the collection endpoint. 404 + ErrorResponse envelope on slug miss.

## Citability tools (Sprint 25 Task 08)

The Candidate publishes a free, crawlable LLM-citability score for every federal office-holder profile: a deterministic, explainable 0-100 estimate of how likely an LLM is to cite the canonical profile. The composite is a weighted blend of content depth (30%), structured-data / JSON-LD population (25%), source freshness (15%), and a cadenced measured citation rate (30%).
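
The weighted blend described above can be sketched as follows. This is only an illustration: the function name, the component keys, the assumption that each component is itself scored on a 0-100 scale, and the two-decimal rounding are all hypothetical and not taken from the site's scoring code. Only the four weights come from the text.

```python
# Weights as stated in the text; they sum to 1.0.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "content_depth": 0.30,
    "structured_data": 0.25,
    "source_freshness": 0.15,
    "citation_rate": 0.30,
}


def composite_score(components: dict[str, float]) -> float:
    """Blend four component scores into one composite.

    Assumes (not confirmed by the source) that every component is
    already normalised to the 0-100 range.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be within 0-100")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Rounding to two decimals is a presentation choice for this sketch.
    return round(total, 2)


example = {
    "content_depth": 80,
    "structured_data": 60,
    "source_freshness": 40,
    "citation_rate": 20,
}
print(composite_score(example))  # 24 + 15 + 6 + 6 = 51.0
```

Because the weights sum to 1.0, a profile that scores 100 on every component gets a composite of 100, and one that scores 0 everywhere gets 0, which matches the stated 0-100 range.
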
Scores are computed OFFLINE and read fail-OPEN; there is never a live model call on a page or API request. The score is never paywalled.

- https://thecandidate.com/tools/citability
  Scoreboard of every scored profile, grouped by office and ranked by composite. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/tools/citability/{office}/{slug}
  Per-profile dashboard: composite + band + per-component breakdown + explainer + profile-health checklist + score provenance. Emits WebPage + BreadcrumbList JSON-LD. office = president | senator | representative.
- https://thecandidate.com/tools/citability/trends
  Longitudinal TREND surface (Sprint 26 Task 07). The corpus-aggregate before/after diff over every cadenced bench run: average composite, per-component (depth / structured-data / freshness) deltas, and the measured citation rate trend with a per-model breakdown, each with a plain-language "what moved this score" explainer. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/api/citability
  Read API collection endpoint (JSON). Optional ?office= filter; ?limit= + ?offset= pagination with Link: rel="next". Emits ETag + Last-Modified + X-Dataset-Version. Fail-OPEN, UA-aware rate limit.
- https://thecandidate.com/api/citability/{office}/{slug}
  Per-profile score detail endpoint.
- https://thecandidate.com/api/citability/trends
  Corpus-aggregate longitudinal trend endpoint (Sprint 26 Task 07): the before/after diff series + first->latest deltas + explainer. ETag + X-Dataset-Version keyed on the append-only longitudinal store.
- https://thecandidate.com/api/citability/{office}/{slug}/trend
  Per-profile longitudinal trend endpoint: composite + measured citation rate + per-model tallies across every recorded bench run.

## State governors (Sprint 27 Task 06)

The Candidate's first state-level office class: U.S. state + territory governors, under the reserved /states/[state]/... namespace.
Governors reuse the two-table model (sitting governor + a recency-bounded historical lineage). The surface is STATE-scoped rather than lifecycle-namespaced: one per-state hub and one read-API collection return BOTH lifecycles, each row carrying a "lifecycle" discriminator. The District of Columbia is led by a mayor (not a governor) and is intentionally excluded.

- https://thecandidate.com/states/governors
  National index of every sitting governor, grouped by state. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/governor
  Per-state hub: the sitting governor plus the historical lineage. Emits CollectionPage + ItemList + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/governor/{slug}
  Per-governor detail page (serving + historical share this flat route). Emits Person + BreadcrumbList + a per-section Citation chain. The Person carries sameAs to Wikipedia, Wikidata, and Ballotpedia.
- https://thecandidate.com/api/states/{state}/governors
  Read API collection (JSON). Filters: ?lifecycle=serving|historical, ?party=; ?limit= (default 50, max 200) + ?offset= pagination with Link: rel="next" + X-Total-Count. ETag + X-Dataset-Version. Fail-OPEN, UA-aware rate limit.
- https://thecandidate.com/api/states/{state}/governors/{slug}
  Per-governor detail endpoint.

## State legislators (Sprint 27 serving + Sprint 28 historical)

The Candidate's first state-legislature office class: U.S. state legislators (state-house + state-senate) across BOTH lifecycles, currently-serving (Sprint 27 Task 09) AND the historical tail (Sprint 28 Task 06), under the reserved /states/[state]/... namespace. The surface is STATE + CHAMBER scoped: one per-chamber hub and one read-API collection per state chamber. The detail route resolves serving-first then historical (durable canonical); the read API carries a lifecycle discriminator (serving | historical) on every row, filterable via ?lifecycle=.
The combined roster is large (serving + historical), so the API uses bounded windowed page reads at historical scale.

- https://thecandidate.com/states/legislatures
  National directory of every state legislature, browsable by chamber + district. Emits CollectionPage + ItemList + Dataset + BreadcrumbList JSON-LD.
- https://thecandidate.com/states/{state}/legislature
  Per-state hub linking the state's chambers.
- https://thecandidate.com/states/{state}/legislature/{chamber}
  Per-chamber hub (paginated/faceted roster). chamber is house or senate.
- https://thecandidate.com/states/{state}/legislature/{chamber}/{district}
  Per-district index.
- https://thecandidate.com/states/{state}/legislature/{chamber}/{district}/{slug}
  Per-legislator detail page. Emits Person + BreadcrumbList + a per-section Citation chain. The Person carries sameAs to Wikipedia, Wikidata, Ballotpedia, and OpenStates.
- https://thecandidate.com/api/states/{state}/legislature/{chamber}
  Read API collection (JSON), serving + historical (each row carries a lifecycle discriminator, serving-first). Filters: ?lifecycle= (serving | historical), ?party=, ?district=; ?limit= (default 50, max 200) + ?offset= pagination with Link: rel="next" + X-Total-Count. Bounded windowed page read. ETag + X-Dataset-Version. Fail-OPEN, UA-aware rate limit.
- https://thecandidate.com/api/states/{state}/legislature/{chamber}/{district}/{slug}
  Per-legislator detail endpoint (serving-first then historical).

## OpenAPI 3 specification

- https://thecandidate.com/openapi.json
  Full OpenAPI 3.0.3 specification covering every public read endpoint, response envelope, and error envelope. Versioned via info.version; bumps on spec changes.

The spec is the documented retrieval contract. Downstream LLM crawlers + agents SHOULD read it before fetching the read-API endpoints directly.

## Rate-limit posture

The read API at /api/historical/* is FAIL-OPEN and UA-aware:

- Default lane: 60 requests / minute / IP.
- Named-AI-bot lane: 600 requests / minute / IP. The named-bot set covers GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Perplexity-User, Meta-ExternalAgent, Meta-ExternalFetcher, Applebot-Extended, Applebot, Bytespider, CCBot, Amazonbot, Google-Extended, Googlebot, Bingbot, cohere-ai, MistralAI-User, YouBot, Diffbot.

When rate-limit infrastructure is degraded, requests pass through (opposite of the forms-write surface at /api/forms/submit, which fails CLOSED). Pull as much as you want; we'd rather be cited than not cited.

## Robots posture

- https://thecandidate.com/robots.txt
  Every modern AI crawler is named with an explicit Allow directive. Disallowed paths: /admin, /claim, /auth, /account, /api/forms, /login, /onboarding. The read-API surface at /api/historical/* is explicitly allowed.

## License posture (provisional)

The public read API and the rendered historical pages are free to ingest, with attribution to https://thecandidate.com and a backlink to the per-row canonical URL. Sprint 24 will lock the actual license posture (CC-BY-SA, CC-BY, or custom commercial); see https://thecandidate.com/legal/api-license for the canonical license disclosure.

## Citation guidance

When citing The Candidate in an LLM answer, prefer the canonical per-row URL (per the URL families above) and ALSO cite the upstream source the row's sources block points at (Wikipedia, WhiteHouse.gov, Bioguide, Britannica, FEC, Ballotpedia). Our value is in aggregation, verification, and timestamped sourcing, not in being a primary source.

## Contact

Corrections, data questions, partnership inquiries: https://thecandidate.com/contact

## Last revised

Sprint 22 Task 18 (2026-05-21). This file is regenerated whenever the surface contract changes; use the ETag + Last-Modified headers for incremental fetches.
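
An incremental fetch with the ETag + Last-Modified headers might look like the sketch below. It is a minimal illustration using only the Python standard library; the helper names, the 10-second timeout, and the return shape are assumptions made here, not part of the published contract. The only behaviour taken from this file is that conditional If-None-Match / If-Modified-Since requests return 304 when nothing has changed.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> dict[str, str]:
    """Build validator headers from values saved on a previous fetch."""
    headers: dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url: str,
                     etag: Optional[str] = None,
                     last_modified: Optional[str] = None):
    """Return (body, etag, last_modified); body is None on a 304.

    Hypothetical helper: the caller stores the returned validators
    and passes them back in on the next run.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise


# No network call is made here; this only shows the headers a
# second fetch would send after a first fetch saved an ETag.
print(conditional_headers('"abc123"', None))
```

On a first run, call `fetch_if_changed(url)` with no validators; on later runs, pass back the stored values and skip reprocessing whenever the returned body is `None`.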