By Weights & Biases
The official Weights & Biases Model Context Protocol (MCP) server lets AI agents query and analyze W&B data using natural language. It provides six tools:

- **query_wandb_tool** — query runs, metrics, and experiments with filters (e.g. "show me runs with loss < 0.1")
- **query_weave_traces_tool** — analyze LLM traces and evaluations; get latency and performance metrics
- **count_weave_traces_tool** — count traces, get storage metrics, and check failure rates
- **create_wandb_report_tool** — create W&B reports programmatically for visualizations
- **query_wandb_entity_projects** — list the projects available to an entity
- **query_wandb_support_bot** — get help from the W&B documentation

## Deployment options

- **Hosted server** at https://mcp.withwandb.com — zero installation, always up to date, automatic scaling, enterprise-grade reliability; maintained by the W&B team.
- **Local STDIO** — installed via `uvx` for development, full control, or air-gapped environments.

Authentication uses a W&B API key from https://wandb.ai/authorize.

## Supported clients

- **Cursor** — one-click installation from the registry, or add the server manually to `.cursor/mcp.json`
- **Claude Desktop** — add to `claude_desktop_config.json`
- **OpenAI Responses API** — server-side MCP with the Python client
- **Gemini CLI** — `gemini extensions install`
- **Mistral Le Chat** — MCP server settings
- **VS Code** — `settings.json`

## Use cases

- Analyze experiments: "Show me the top 5 runs by eval/accuracy in project X"
- Debug traces: "How did latency evolve over the last months?"
- Create reports: "Generate a W&B report comparing the decisions made last month"
- Get help: "How do I create a leaderboard in Weave?"

## Best practices

- Provide the W&B entity and project name explicitly.
- Avoid overly broad questions; refine them to a specific metric (e.g. "highest f1 score").
- When asking general questions, check that all relevant data was actually retrieved.

Written in Python (100%), MIT license, 18 stars, 4 forks.

## Installation

- **Hosted:** point your client at https://mcp.withwandb.com/mcp and pass your API key in a header.
- **Local:** install with `uvx` or `pip install wandb-mcp-server`; requires Python 3.10+ and `uv` or `pip`.
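For the local STDIO option, a client configuration might look like the sketch below. This is illustrative only: the exact schema varies by client, the `wandb` server name is an arbitrary label, and whether the package is invoked as `wandb-mcp-server` or `wandb_mcp_server` should be checked against the repository's README before copying.

```json
{
  "mcpServers": {
    "wandb": {
      "command": "uvx",
      "args": ["wandb-mcp-server"],
      "env": {
        "WANDB_API_KEY": "<your-api-key>"
      }
    }
  }
}
```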
For local HTTP testing with server-side clients (OpenAI, Le Chat), use ngrok to expose the local server.

## Configuration examples

- **Cursor** — install from the registry, or add the server to `mcp.json` with a `uvx` command
- **Claude** — add the server to the config with `url` and `apiKey`
- **OpenAI** — add a `tools` array entry with `server_url` and `authorization`
- **Gemini** — `gemini extensions install` from GitHub
- **VS Code** — `settings.json` with `url` and an `Authorization` header

## Example queries

- "How many traces are in my project?"
- "What eval had the highest f1 score?"
- "Show me failing traces from the last run"
- "Create a performance report for the model comparison"

Auto-clustering for analyzing trace quality is coming soon.

- **Repository:** https://github.com/wandb/wandb-mcp-server
- **Support:** GitHub issues, support@wandb.com
- **Resources:** docs.wandb.ai, weave-docs.wandb.ai
- **Transport:** STDIO for desktop clients; HTTP for web-based clients via FastMCP
- **Environment:** `WANDB_API_KEY` is required
- **Server startup:** `uvx wandb_mcp_server` (STDIO) or `uvx wandb_mcp_server --transport http --host 0.0.0.0 --port 8080` (HTTP)
- **Requirements:** the `wandb` SDK, FastMCP, Python 3.10+

For OpenAI, call `client.responses.create` with a `tools` array containing an entry of type `mcp` with `server_url` and `authorization`. The hosted server is production-grade and maintained by the W&B team, so it requires zero maintenance from users.
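The OpenAI setup described above can be sketched as follows. This is a hedged example, not the repository's own code: the tool-spec fields follow the `mcp`/`server_url`/`authorization` description in this document, while the `server_label` value and the model name in the usage comment are placeholders.

```python
import os


def wandb_mcp_tool(api_key: str) -> dict:
    """Build an MCP tool spec pointing at the hosted W&B server."""
    return {
        "type": "mcp",
        "server_label": "wandb",                       # label is illustrative
        "server_url": "https://mcp.withwandb.com/mcp",
        "authorization": api_key,                      # W&B API key from wandb.ai/authorize
    }


# Usage (requires the openai package and a valid key; not run here):
# from openai import OpenAI
# client = OpenAI()
# resp = client.responses.create(
#     model="gpt-4.1",
#     tools=[wandb_mcp_tool(os.environ["WANDB_API_KEY"])],
#     input="How many traces are in my project?",
# )
```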
## Tools

This server provides the following tools for AI assistants:
- **query_wandb_tool** — Query W&B runs, metrics, and experiments with filters. Search for specific runs by metric values, hyperparameters, or tags. Example: "Show me runs with loss < 0.1" or "Find the top 5 runs by eval/accuracy in project X".
- **query_weave_traces_tool** — Analyze LLM traces and evaluations from Weave. Get latency metrics, performance statistics, and trace details. Example: "What's the average latency?" or "How did the latency of my agent traces evolve over the last months?"
- **count_weave_traces_tool** — Count traces and get storage metrics from Weave. Check failure rates, trace counts, and storage usage. Example: "How many traces failed?" or "What's the total trace count for this project?"
- **create_wandb_report_tool** — Create W&B reports programmatically for visualizations and analysis. Generate reports comparing runs, experiments, or model performance. Example: "Create a performance report" or "Generate a W&B report comparing the decisions the agent made last month".
- **query_wandb_entity_projects** — List the projects for a W&B entity or organization. Discover available projects and their metadata. Example: "What projects exist?" or "List all projects in my organization".
- **query_wandb_support_bot** — Get help from W&B documentation through an AI-powered support bot. Ask about W&B features, best practices, and usage. Example: "How do I use sweeps?" or "How do I create a leaderboard in Weave?"
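To make the first tool concrete: a natural-language filter like "runs with loss < 0.1" corresponds to the W&B public API's MongoDB-style run filters. A minimal sketch, assuming the `my-entity/my-project` path is a placeholder and that the summary metric is named `loss` (the live call needs network access and a valid `WANDB_API_KEY`, so it is left commented out):

```python
def loss_filter(threshold: float) -> dict:
    """MongoDB-style filter selecting runs whose summary loss is below threshold."""
    return {"summary_metrics.loss": {"$lt": threshold}}


# Usage against the real API (not run here):
# import wandb
# api = wandb.Api()
# runs = api.runs("my-entity/my-project", filters=loss_filter(0.1))
# for run in runs:
#     print(run.name, run.summary.get("loss"))
```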