pdfmux is a PDF extraction tool and MCP server that routes each page through multiple backends, audits the output with confidence scoring, and re-extracts bad pages automatically. It can produce markdown, JSON, or chunked output for batch, CLI, Python, and agent integrations.
From the registry: PDF-to-Markdown router. Per-page backend selection + confidence scoring for RAG ingestion.
$ npx -y pdfmux-mcp
Please install the `pdfmux` MCP server into my current AI client (that's you).
Required prerequisites (do these first if not already done):
- **Install Python 3.11+** — pdfmux requires Python 3.11 or newer.
- **Install pdfmux MCP package via npx** — The README provides a one-line MCP setup using the published npm package `pdfmux-mcp`, which is the simplest portable MCP install path. Run: `npx -y pdfmux-mcp`
Optional prerequisites:
- Optional API key for BYOK LLM providers — Only needed if you configure pdfmux to use an external LLM fallback/provider such as Gemini, Claude, OpenAI, or Mistral.
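The two required prerequisites can be checked before touching any client config. The following is a minimal sketch, not part of pdfmux: the helper names `python_ok` and `npx_available` are hypothetical, and the script only inspects the interpreter that runs it, which may differ from the one pdfmux ends up using.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version taken from the prerequisite list


def python_ok(version=None):
    """Return True if the given (or running) interpreter meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def npx_available():
    """Return True if an `npx` executable can be found on PATH."""
    return shutil.which("npx") is not None


if __name__ == "__main__":
    print("Python 3.11+:", python_ok())
    print("npx on PATH:", npx_available())
```

If either check prints `False`, install the missing tool before running `npx -y pdfmux-mcp`.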
Canonical MCP server config (stdio transport):
- command: `npx`
- args: `["-y", "pdfmux-mcp"]`
- optional environment variables:
  - `GEMINI_API_KEY`: Google Gemini API key. Needed if pdfmux is configured to use Gemini or Gemma via Gemini. (example: `<your-gemini-api-key>`)
  - `ANTHROPIC_API_KEY`: Anthropic API key. Needed if pdfmux is configured to use Claude models. (example: `<your-anthropic-api-key>`)
  - `OPENAI_API_KEY`: OpenAI API key. Needed if pdfmux is configured to use GPT-4o or another OpenAI model. (example: `<your-openai-api-key>`)
Note: The README shows the MCP server can also be started from the Python CLI with `pdfmux serve` and `pdfmux serve --http 8080`, but it does not provide a canonical MCP client config shape for that path. For MCP client configuration, the README explicitly provides the stdio package invocation `npx -y pdfmux-mcp`, so that is the preferred extracted config. HTTP support is mentioned but no client URL is documented.
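As an illustration of the stdio config above, here is a minimal sketch that assembles the server entry and copies an API key only when it is actually set in the environment, so no placeholder values end up in the file. The `build_server_entry` helper is hypothetical, and the `mcpServers` wrapper key is an assumption: many MCP clients use that shape, but the file location and key name vary by client, so check your client's documentation.

```python
import json
import os

# Optional keys named in the config list; each is only needed for its provider.
OPTIONAL_KEYS = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def build_server_entry(environ=None):
    """Build the stdio entry, copying only keys that are set and non-empty."""
    environ = os.environ if environ is None else environ
    entry = {"command": "npx", "args": ["-y", "pdfmux-mcp"]}
    env = {key: environ[key] for key in OPTIONAL_KEYS if environ.get(key)}
    if env:
        entry["env"] = env
    return entry


if __name__ == "__main__":
    # The "mcpServers" wrapper is an assumption; your client may differ.
    config = {"mcpServers": {"pdfmux": build_server_entry()}}
    print(json.dumps(config, indent=2))
```

With no keys set, the entry contains only `command` and `args`, which is enough when pdfmux is not configured to use an external LLM provider.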
Add this MCP server to my current client's config in the correct format for you. If you need secrets or credentials I haven't provided, ASK me; do not invent values or leave raw placeholders. After adding it, tell me how to verify the server is connected.