Build an MCP Server for SEC EDGAR (Claude + LangChain Ready)

Build an MCP server for SEC EDGAR with pre-classified 8-K triggers, Form 4 cluster buys, 13F deltas. Claude + LangChain + LlamaIndex code included.

If you're wiring an LLM agent to financial data, the SEC EDGAR system is the highest-signal, lowest-friction public source in the US - 12M+ filings, mandated-public by federal securities law, no API key, no per-call cost. The catch is the same one humans have always had with EDGAR: raw filings are XBRL soup and 8-K item codes, and your agent will burn 50,000 tokens reading a 10-K to answer "did anyone resign?" That's why an MCP server for SEC EDGAR with the analysis layer pre-baked is the only practical pattern for production agents.

I've built EDGAR plumbing three times now - once for a hedge-fund research bot, once for a DD-memo automation, and once as a public Apify actor at seibs.co/mcp-sec-edgar-intel. This post walks through what MCP is for non-Anthropic users, why EDGAR specifically wants an MCP wrapper, how to build one from scratch, and the shortcut if you'd rather not.

What MCP is (the 60-second version)

Model Context Protocol (MCP) is an open standard from Anthropic for connecting LLMs to external tools and data sources. Think of it as USB-C for AI agents: the agent runtime (Claude Desktop, a LangChain agent, a custom Python agent) speaks MCP, and any MCP server exposes tools the agent can discover and call. The protocol handles tool discovery, schema validation, and the request/response envelope.

The pitch: instead of writing a custom Anthropic tool-use schema for each integration, you stand up an MCP server once and any MCP-aware runtime (Claude Desktop, Continue.dev, Cursor, the official MCP SDK, Apify's MCP support) calls it. LangChain and LlamaIndex both have MCP adapter packages.

The spec lives at modelcontextprotocol.io, and Anthropic's announcement post is at anthropic.com/news/model-context-protocol.

Why EDGAR specifically wants an MCP wrapper

+--------+    "did any company in our    +--------+
| AGENT  | -- watchlist file an 8-K  --> | MCP    |
| (LLM)  |    about exec changes         | SERVER |
|        |    in the last 30 days?"      |        |
+--------+                               +----+---+
    ^                                         |
    |                                         v
    |   +-------------+    +------------+    +-------------+
    +---| categorized | <- | classifier | <- | EDGAR API   |
  JSON  | 8-K event   |    | (8-K item  |    | data.sec    |
        | items       |    |  + NLP)    |    | .gov        |
        +-------------+    +------------+    +-------------+

Three problems an agent can't solve well by itself:

1. EDGAR fair-access enforcement. SEC requires a real User-Agent + 10 req/sec cap. An MCP server handles this once; otherwise every agent invocation has to re-enforce the policy.

2. Filing classification. An agent that reads a raw 8-K to determine "is this about M&A or about a dividend declaration?" burns ~10-50K tokens. A classifier in the MCP server does it deterministically and returns the category as a structured field. The agent gets a ~200-byte JSON payload instead of the full filing, which cuts the token cost of that step by roughly 99%.

3. Cross-filing analytics. "Did insiders cluster-buy this week?" requires joining N Form 4 filings and computing a rolling-window cluster. "What changed in this 13F vs last quarter?" requires QoQ position deltas. These are pre-computable; doing them per-prompt is wasteful.

The right division of labor: the MCP server does the IO, fair-access compliance, classification, and analytics. The agent does the natural-language framing, the multi-tool orchestration, and the human-facing output.
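The cross-filing analytics in point 3 are ordinary data work once the Form 4 rows are parsed. Here is a minimal sketch of rolling-window cluster detection, assuming purchases have already been extracted into simple records. The `InsiderBuy` shape, the 7-day window, and the three-insider threshold are illustrative assumptions, not the definition any particular actor uses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InsiderBuy:
    insider: str      # reporting owner name (hypothetical field)
    trade_date: date  # transaction date taken from the Form 4
    shares: int


def find_cluster_buys(buys, window_days=7, min_insiders=3):
    """Return (start, end, insiders) for each window in which at least
    `min_insiders` distinct insiders bought within `window_days` days."""
    ordered = sorted(buys, key=lambda b: b.trade_date)
    clusters = []
    left = 0
    for right, buy in enumerate(ordered):
        # Slide the left edge forward until the window fits in window_days.
        while (buy.trade_date - ordered[left].trade_date).days >= window_days:
            left += 1
        insiders = {b.insider for b in ordered[left:right + 1]}
        if len(insiders) >= min_insiders:
            clusters.append(
                (ordered[left].trade_date, buy.trade_date, sorted(insiders))
            )
    return clusters
```

Three distinct insiders buying within a few days of each other produce one window; a lone purchase weeks later does not. Overlapping windows are reported separately here, so a production version would likely merge them.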

The walkthrough: build it yourself

If you want to build the MCP server from scratch, the structure is:

# minimal MCP server skeleton (Python)
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import httpx

server = Server("sec-edgar-intel")

SEC_BASE = "https://data.sec.gov"
USER_AGENT = "Your Company contact@yourdomain.com"  # REQUIRED

@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="get_8k_triggers",
            description=(
                "Return 8-K filings for a ticker classified by trigger type "
                "(m_and_a, executive_change, going_concern, restatement, "
                "impairment, debt_issuance, customer_loss, bankruptcy, "
                "auditor_change, restructuring)."
            ),
            inputSchema={
                "type": "object",
                "properties": {
                    "ticker": {"type": "string"},
                    "days_back": {"type": "integer", "default": 30},
                },
                "required": ["ticker"],
            },
        ),
        # ... get_form4_insider_activity, get_13f_positions_change,
        #     get_recent_form_d, get_earnings_transcript, get_company_filings
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "get_8k_triggers":
        return await _get_8k_triggers(**arguments)
    raise ValueError(f"unknown tool {name}")

async def _get_8k_triggers(ticker: str, days_back: int = 30):
    # 1. resolve ticker -> CIK via SEC company_tickers.json
    # 2. fetch submissions JSON for CIK with proper User-Agent
    # 3. filter to form 8-K within days_back window
    # 4. fetch each 8-K body, classify by Item code + phrase matching
    # 5. return compact JSON envelope
    ...

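async def main():
    # stdio_server() is an async context manager that yields the read/write streams
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options(),
        )

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())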

You'll fill in: the ticker -> CIK resolver, the 8 req/sec semaphore, the Item-code -> category map (Item 1.01 = material agreement, Item 2.01 = acquisition, Item 5.02 = exec change, Item 4.02 = restatement, etc), and the phrase-level classifier for confirmation.
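Two of those pieces are small enough to sketch without touching the network. The block below assumes the `company_tickers.json` payload has already been downloaded and parsed (a dict of entries with `cik_str`, `ticker`, and `title`), and that each 8-K row in the submissions JSON carries its item codes as a comma-separated string such as `"5.02,9.01"`. The mapping covers only the categories that have an obvious Item code; `going_concern` and `customer_loss` have none, and Item 1.01 is too broad to map on its own, so those are left to the phrase-level classifier.

```python
# Item code -> trigger category. Only codes with a one-to-one meaning are mapped;
# anything else is left for the phrase-level classifier.
ITEM_TO_CATEGORY = {
    "1.03": "bankruptcy",        # Bankruptcy or Receivership
    "2.01": "m_and_a",           # Completion of Acquisition or Disposition of Assets
    "2.03": "debt_issuance",     # Creation of a Direct Financial Obligation
    "2.05": "restructuring",     # Costs Associated with Exit or Disposal Activities
    "2.06": "impairment",        # Material Impairments
    "4.01": "auditor_change",    # Changes in Registrant's Certifying Accountant
    "4.02": "restatement",       # Non-Reliance on Previously Issued Financials
    "5.02": "executive_change",  # Departure/Appointment of Directors or Officers
}


def resolve_cik(ticker, company_tickers):
    """Look up a ticker in the parsed company_tickers.json payload and
    return the CIK zero-padded to 10 digits, as the submissions URL expects."""
    wanted = ticker.upper()
    for entry in company_tickers.values():
        if entry["ticker"].upper() == wanted:
            return f"{int(entry['cik_str']):010d}"
    raise KeyError(f"ticker not found: {ticker}")


def categories_for_items(items_field):
    """Map a comma-separated item string to a sorted list of known categories."""
    codes = [code.strip() for code in items_field.split(",") if code.strip()]
    return sorted({ITEM_TO_CATEGORY[c] for c in codes if c in ITEM_TO_CATEGORY})
```

With the padded CIK in hand, the submissions document is at `https://data.sec.gov/submissions/CIK{cik}.json`. Unmapped codes such as 9.01 (exhibits) simply fall through.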

Realistic scope:

8-K classifier alone: 80-150 lines + a phrase library, ~16 hours
Form 4 cluster detection: rolling-window join, ~8 hours
13F QoQ deltas: persistent state, ~24 hours
Form D parser, earnings transcript extraction: ~16 hours each
Total: 60-100 engineering hours for a v1 you'd actually deploy
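To give a feel for the 13F line item: the delta logic itself is short, and the hours go into persisting last quarter's holdings and parsing the information table. A minimal sketch of the comparison step, assuming each quarter has already been reduced to a `{cusip: shares}` dict (that shape and the status labels are assumptions for illustration):

```python
def position_deltas(prev, curr):
    """Compare two quarters of holdings keyed by CUSIP and label each change."""
    changes = []
    for cusip in sorted(set(prev) | set(curr)):
        before = prev.get(cusip, 0)
        after = curr.get(cusip, 0)
        if before == after:
            continue  # unchanged positions are not reported
        if before == 0:
            status = "new"
        elif after == 0:
            status = "exited"
        elif after > before:
            status = "increased"
        else:
            status = "decreased"
        changes.append(
            {"cusip": cusip, "status": status, "share_delta": after - before}
        )
    return changes
```

Filtering the result to `new` and `increased` gives the aggregate a smart-money tracker usually wants.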

Code lives behind any MCP-compatible runtime - register it in Claude Desktop's config, or call it from LangChain via langchain-mcp-adapters, or from any custom Anthropic SDK agent via MCP client libraries.
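For the Claude Desktop route, registration is a JSON entry under `mcpServers` that tells the app which command to launch. The sketch below builds that entry in Python and prints it for pasting into `claude_desktop_config.json`; the server name and the `/path/to/server.py` path are placeholders, and it assumes the skeleton above is saved as a standalone script.

```python
import json


def claude_desktop_entry(name, command, args):
    """Build the mcpServers entry Claude Desktop uses to launch a stdio server."""
    return {"mcpServers": {name: {"command": command, "args": list(args)}}}


# Placeholder values: swap in your own interpreter and script location.
config = claude_desktop_entry("sec-edgar-intel", "python", ["/path/to/server.py"])
print(json.dumps(config, indent=2))
```

If the config file already lists other servers, merge the new key into the existing `mcpServers` object rather than overwriting the file.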

The walkthrough: use a prebuilt MCP server

If you'd rather skip the 60-100 hours, I shipped one as part of my Apify portfolio: seibs.co/mcp-sec-edgar-intel. It exposes six tools (get_company_filings, get_8k_triggers, get_form4_insider_activity, get_13f_positions_change, get_recent_form_d, get_earnings_transcript) wrapping the upstream classifier actor. Apify supports MCP natively - point any MCP client at the actor and the tool catalog auto-discovers.

Pricing: $0.005 per tool call (the wrapper) plus upstream PPE events ($0.005 per filing, $0.010 per classified trigger). A typical get_8k_triggers call against one ticker over 90 days costs around $0.075 total.
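The arithmetic is easy to check for your own watchlist. This sketch uses the three prices quoted above; the filing and trigger counts in the example are an assumption, chosen only because they are one mix that lands on the $0.075 figure.

```python
from decimal import Decimal

WRAPPER_CALL = Decimal("0.005")  # per tool call
PER_FILING = Decimal("0.005")    # upstream event: filing fetched
PER_TRIGGER = Decimal("0.010")   # upstream event: classified trigger


def tool_call_cost(filings, triggers):
    """Total cost in USD for one tool call that emits the given event counts."""
    return WRAPPER_CALL + PER_FILING * filings + PER_TRIGGER * triggers


# Hypothetical mix: 6 filings in the window, 4 of them classified as triggers.
example = tool_call_cost(filings=6, triggers=4)
print(f"${example}")
```

Decimal is used instead of float so the sub-cent prices add up exactly.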

Calling it from Claude (Anthropic SDK + Apify MCP)

import os
import anthropic
from apify_client import ApifyClient

apify = ApifyClient(os.environ["APIFY_TOKEN"])
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def call_edgar_tool(tool_name: str, args: dict):
    run = apify.actor("seibs.co/mcp-sec-edgar-intel").call(run_input={
        "mode": "call_tool",
        "tool": tool_name,
        "args": args,
        "user_agent": "Acme Research contact@acme.com",
    })
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())

# Define the tools for Claude to use
tools = [
    {
        "name": "get_8k_triggers",
        "description": "Return 8-K filings classified by trigger type",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "days_back": {"type": "integer", "default": 30},
            },
            "required": ["ticker"],
        },
    },
    {
        "name": "get_form4_insider_activity",
        "description": "Form 4 insider trades with cluster_buy + unusual_size flags",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "days_back": {"type": "integer", "default": 60},
            },
            "required": ["ticker"],
        },
    },
]

messages = [
    {"role": "user", "content": (
        "Check if NVDA or AMD filed any 8-Ks about exec changes or M&A in "
        "the last 60 days, and whether any insiders cluster-bought."
    )},
]

while True:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2000,
        tools=tools,
        messages=messages,
    )
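    if resp.stop_reason != "tool_use":
        # Print every text block; the first block is not guaranteed to be text
        print("".join(b.text for b in resp.content if b.type == "text"))
        break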
    messages.append({"role": "assistant", "content": resp.content})
    tool_results = []
    for block in resp.content:
        if block.type == "tool_use":
            result = call_edgar_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(result),
            })
    messages.append({"role": "user", "content": tool_results})

Calling it from LangChain

from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from apify_client import ApifyClient
import os

apify = ApifyClient(os.environ["APIFY_TOKEN"])

@tool
def get_8k_triggers(ticker: str, days_back: int = 30) -> str:
    """Return 8-K filings for a ticker classified by trigger type."""
    run = apify.actor("seibs.co/mcp-sec-edgar-intel").call(run_input={
        "mode": "call_tool",
        "tool": "get_8k_triggers",
        "args": {"ticker": ticker, "days_back": days_back},
        "user_agent": "Acme Research contact@acme.com",
    })
    items = list(apify.dataset(run["defaultDatasetId"]).iterate_items())
    return str(items)

llm = ChatAnthropic(model="claude-opus-4-7")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an equity research analyst."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [get_8k_triggers], prompt)
executor = AgentExecutor(agent=agent, tools=[get_8k_triggers], verbose=True)

result = executor.invoke({
    "input": "Did MSFT have any going-concern or restatement 8-Ks this quarter?"
})
print(result["output"])

What you can do with this MCP server

Agent use case	Tools called
Daily catalyst monitor for a watchlist	`get_8k_triggers` x N tickers, filter by event_type
Insider conviction screen	`get_form4_insider_activity` weekly, flag `cluster_buy + c_suite_only`
Smart-money 13F tracker	`get_13f_positions_change` x N manager CIKs, aggregate new + increased
Pre-IPO venture intel	`get_recent_form_d` filter by industry + min_amount
Earnings season analysis	`get_earnings_transcript` per portfolio name, pipe to summarization
DD memo automation	All six tools batched per target, fed into a structured-output prompt
Compliance / restatement alert	`get_8k_triggers` daily, alert on `restatement` or `auditor_change`
Securities litigation prep	`get_company_filings` for involved entities, full-text excerpt extraction

The combination most teams converge on: a nightly cron that calls get_8k_triggers across the watchlist, classifies, and posts high-priority events to Slack. The agent only spins up when an analyst asks a follow-up question.
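A sketch of the filtering step in that nightly job, under stated assumptions: each event is a dict with `event_type` and `confidence` keys (the `confidence` field name is a guess at the output shape), the set of alert-worthy types is a choice you would tune, and the 0.8 / 0.5 cut-offs follow the confidence bands described in the limitations section.

```python
# Event types that should page someone; adjust to your own watchlist policy.
HIGH_PRIORITY = frozenset(
    {"restatement", "auditor_change", "going_concern", "bankruptcy"}
)


def triage(events, alert_types=HIGH_PRIORITY):
    """Split classified events into (alerts, needs_review).

    alerts:       high-priority type with confidence >= 0.8
    needs_review: high-priority type with confidence in [0.5, 0.8)
    Everything else is dropped from the nightly post.
    """
    alerts, needs_review = [], []
    for event in events:
        if event["event_type"] not in alert_types:
            continue
        if event["confidence"] >= 0.8:
            alerts.append(event)
        elif event["confidence"] >= 0.5:
            needs_review.append(event)
    return alerts, needs_review
```

The `alerts` list goes to Slack as-is; `needs_review` can go to a quieter channel for a human to confirm.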

Honest limitations

The non-glamorous truth about MCP servers wrapping public data.

MCP is young (released Nov 2024). Spec is stable; tooling around it is still maturing. Claude Desktop has good MCP support; ChatGPT does not (as of mid-2026); LangChain and LlamaIndex MCP adapters exist but are less battle-tested than their native tool-calling paths.

Apify-hosted MCP has cold-start latency. First tool call to an Apify-hosted MCP wrapper takes 3-8 seconds because the actor container has to spin up. Subsequent calls in the same run are sub-second. For latency-sensitive agents, host MCP locally.

Fair-access policy is your problem if you build it. The actor enforces 8 req/sec and requires a real User-Agent input. If you build your own MCP server, you have to enforce the SEC's 10 req/sec cap and pass a real contact UA on every request. Ignoring this gets your IP blocked.

8-K classifier is heuristic, not perfect. Confidence >= 0.8 is reliable; 0.5-0.8 needs human review; below 0.5 is informational only. Expect ~92% precision on high-confidence and ~75% recall across all true events.

13F is 45-day-stale by regulation. Not a tool limitation - 13F-HR is filed within 45 days of quarter-end. Smart-money tracking is by definition lagged.

No XBRL fundamentals parsing. This tool catalog extracts events and classifications, not balance-sheet line items. For XBRL-derived revenue / EPS / segments, use the SEC's Financial Statement Data Sets or a fundamentals API.

Token cost shifts. MCP wrappers cut per-prompt token cost but each MCP tool call adds latency (1-8s) and an Apify PPE charge ($0.005 + upstream events). For agents that would otherwise make 50 EDGAR calls per prompt, MCP is a net win. For one-off questions, it's marginal.

No real-time tick data. EDGAR polling latency is ~30-60 seconds from filing acceptance to actor output. Bloomberg-grade real-time is not the use case.
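On the fair-access point above, here is a minimal sketch of the throttle and retry logic for a self-built server, using only the standard library. `RateLimiter`, `backoff_delay`, and `get_with_backoff` are hypothetical helper names; `send` stands in for whatever coroutine performs the real HTTP request (for example an httpx call that sets the User-Agent header) and is assumed to return an object with a `status_code` attribute.

```python
import asyncio
import time


class RateLimiter:
    """Space out request starts so at most `rate` begin per second."""

    def __init__(self, rate):
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0

    async def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it.
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self._interval
        if delay:
            await asyncio.sleep(delay)


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential delay in seconds: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * 2 ** attempt)


async def get_with_backoff(send, limiter, max_attempts=5, base=1.0):
    """Call `send()` under the limiter, retrying with backoff on HTTP 403."""
    for attempt in range(max_attempts):
        await limiter.wait()
        response = await send()
        if response.status_code != 403:
            return response
        await asyncio.sleep(backoff_delay(attempt, base=base))
    raise RuntimeError("still blocked after retries")
```

Create one `RateLimiter(rate=8)` inside the running event loop and share it across every tool handler, so the cap applies to the whole server rather than to each call.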

FAQ

Q: What is an MCP server? A: A server that speaks Anthropic's open Model Context Protocol to expose tools an LLM agent can discover and call. Think of it as a standard way to plug any data source or API into Claude, LangChain, LlamaIndex, Cursor, Continue.dev, or any other MCP-aware agent runtime.

Q: Do I need Claude to use an MCP server? A: No. MCP is an open standard. Claude Desktop has first-class MCP support, but LangChain (via langchain-mcp-adapters), LlamaIndex, and any custom agent built on the OpenAI SDK can call MCP servers with adapter libraries.

Q: Why is an MCP server for SEC EDGAR better than calling EDGAR directly from my agent? A: Three reasons: (1) fair-access compliance is centralized, (2) 8-K classification and Form 4 cluster detection happen deterministically in the server instead of burning 10-50K tokens per prompt asking the LLM to do it, (3) tools are pre-defined so the agent doesn't have to figure out the SEC's URL scheme on every call.

Q: How do I run an MCP server with Claude Desktop? A: Add the server to Claude Desktop's claude_desktop_config.json under mcpServers, restart Claude Desktop, and the tools appear in the chat. Apify-hosted MCP servers can be added via the Apify MCP adapter; locally hosted servers run as a subprocess over stdio.

Q: Is SEC EDGAR data really free? A: Yes. EDGAR is the SEC's public filing system, mandated by federal securities law. No login, no API key, no per-call fee. The only requirement is the SEC's fair-access policy (real User-Agent, 10 requests/second).

Q: Can my agent classify 8-K filings without an MCP server? A: Yes, but it's expensive. A raw 8-K is typically 5,000-50,000 tokens. Asking the LLM to read it and classify the event burns input tokens on every call. An MCP server with a deterministic classifier returns a 200-byte JSON for ~$0.01 per filing instead of $0.05-$0.50 in LLM tokens.

Q: What's the latency of calling an Apify-hosted MCP server? A: First call ~3-8s (container cold start), subsequent calls in the same run sub-second. For latency-sensitive workflows, self-host the MCP server locally.

Q: Can I add custom tools to an existing MCP server? A: If you have the source, yes - MCP servers are just Python (or any language) code. The Apify-hosted version is closed (you call the actor's MCP interface); to add custom tools you'd fork the upstream sec-edgar-intel actor or write your own MCP server using its output as input.

Q: How do I handle SEC fair-access rate limits in an MCP server? A: Implement an 8 req/sec semaphore (cap below 10 for safety margin), set a real User-Agent header on every request (e.g. "Acme Research contact@acme.com"), and back off with exponential delay on 403 responses.

Q: Does this MCP server work with LangGraph or LlamaIndex? A: Yes. Both have MCP adapter packages. LangGraph uses langchain-mcp-adapters; LlamaIndex has a similar adapter in llama-index-tools-mcp. The same tool catalog appears in both runtimes.

Q: What's the difference between this and the underlying sec-edgar-intel actor? A: sec-edgar-intel is the data pipeline (fetch + classify). mcp-sec-edgar-intel is a thin wrapper that advertises an MCP tool catalog, validates arguments with Pydantic, and reshapes responses for agent consumption (drops nulls, drops XML, compact JSON). Same backend, agent-friendly surface.

Try it free

Run mcp-sec-edgar-intel on Apify - free plan covers ~500 tool calls per month. The underlying sec-edgar-intel actor exposes the same pipeline as a traditional Apify run if you don't need the MCP surface.

Related MCP servers in the portfolio:

mcp-accounting-firm-leads - lead-finder MCP for fintech / accounting-SaaS sales agents.
mcp-youtube-intelligence - video metadata + transcript MCP for content-research agents.

About the author

I'm a solo MSP operator who builds B2B web-scraping actors at apify.com/seibs.co when I'm not running incident calls. The portfolio has 30+ live actors covering lead generation, intent data, SEC/USPTO/court records, and AI agent wrappers - all pay-per-event so you only pay for what's emitted. Find me at seibs.co.
