AI Agent Development

1. Agent Core Architecture

User task → LLM (planning) selects a tool → execute the tool and observe the result → LLM decides whether to continue or finish
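The flow above can be sketched as a plain control loop. This is only an illustration under stated assumptions: `fake_planner` is a hypothetical stand-in for the LLM, `echo` is a made-up tool, and `max_steps` is an added safety cap that is not part of the flow description itself.

```python
from typing import Callable

# Hypothetical stand-in for the LLM (assumption): it returns either
# ("call", tool_name, args) to request a tool or ("finish", answer) to stop.
def fake_planner(task: str, observations: list[str]):
    if not observations:
        return ("call", "echo", {"text": task})
    return ("finish", f"done after {len(observations)} observation(s): {observations[-1]}")

# Made-up tool registry for illustration only.
TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: f"echo:{text}",
}

def run_agent(task: str, planner=fake_planner, max_steps: int = 5) -> str:
    """Plan -> act -> observe, repeated until the planner finishes or the cap is hit."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = planner(task, observations)
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, args = decision
        # Execute the chosen tool and record what it returned.
        observations.append(TOOLS[tool_name](**args))
    return "stopped: step limit reached"

print(run_agent("list pods"))
```

In a real Agent the planner would be a chat-completion call and the observations would be appended to the message history, but the shape of the loop stays the same.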

2. Tool Calling: Tool Definition Conventions


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_k8s_pods",
            "description": "List the Pods in a K8s namespace",
            "parameters": {
                "type": "object",
                "properties": {
                    "namespace": {"type": "string"}
                },
                "required": ["namespace"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_pod_logs",
            "description": "Fetch the logs of a Pod",
            "parameters": {
                "type": "object",
                "properties": {
                    "namespace": {"type": "string"},
                    "pod_name": {"type": "string"},
                    "lines": {"type": "integer", "default": 100}
                },
                "required": ["namespace", "pod_name"]
            }
        }
    }
]
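The schema tells the model what arguments to produce, but the caller usually still has to check them. In particular, a JSON Schema `default` such as the one on `lines` is an annotation and is generally not applied for you if the model omits the field. The sketch below shows one way to parse, check, and fill in defaults. `prepare_args` and `POD_LOGS_SCHEMA` are hypothetical names, and only the `string` and `integer` types used above are handled.

```python
import json

# Copy of the get_pod_logs parameter schema, repeated here so the sketch is self-contained.
POD_LOGS_SCHEMA = {
    "type": "object",
    "properties": {
        "namespace": {"type": "string"},
        "pod_name": {"type": "string"},
        "lines": {"type": "integer", "default": 100},
    },
    "required": ["namespace", "pod_name"],
}

# Only the JSON types that appear in the schemas above (assumption: no nested objects).
_PY_TYPES = {"string": str, "integer": int}

def prepare_args(schema: dict, raw_arguments: str) -> dict:
    """Parse model-produced JSON arguments, validate them, and apply defaults."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, spec in schema.get("properties", {}).items():
        if name not in args:
            if "default" in spec:
                args[name] = spec["default"]
            continue
        expected = _PY_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if expected is not None and (
            isinstance(args[name], bool) or not isinstance(args[name], expected)
        ):
            raise ValueError(f"argument {name!r} should be of type {spec['type']}")
    return args

print(prepare_args(POD_LOGS_SCHEMA, '{"namespace": "default", "pod_name": "web-0"}'))
```

A dedicated validator library would cover the full JSON Schema vocabulary; the hand-rolled check here is just enough for the two tools defined above.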

3. ReAct Loop Implementation


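import json
import os

import openai

# Read the key from the environment rather than hard-coding it in source.
# DEEPSEEK_API_KEY is an assumed variable name; use whatever your deployment defines.
client = openai.OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")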

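SYSTEM_PROMPT = '''You are an operations Agent that can call K8s tools to complete tasks.
Workflow:
1. Analyze the user's request
2. Call the appropriate tool
3. Decide from the result whether further action is needed
4. Give a summary when finished

Available tools: get_k8s_pods, get_pod_logs'''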

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Show the status of all Pods in the default namespace"}
]

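MAX_STEPS = 10  # safety cap so a confused model cannot loop forever

for _ in range(MAX_STEPS):
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        # The assistant turn that requested the tools must be in the history
        # before the matching "tool" messages, or the next request is rejected.
        messages.append(msg)
        for call in msg.tool_calls:
            tool_name = call.function.name
            args = json.loads(call.function.arguments)
            print(f"[tool call] {tool_name}({args})")

            if tool_name == "get_k8s_pods":
                # Stubbed result; a real implementation would query the cluster, e.g.
                # result = k8s_client.list_namespaced_pod(args["namespace"])
                tool_result = "\n".join(f"pod-{i}: Running" for i in range(3))
            else:
                tool_result = "log output here"

            messages.append({"role": "tool", "tool_call_id": call.id, "content": tool_result})
    else:
        print(f"[model answer] {msg.content}")
        break
else:
    print(f"[stopped] no final answer after {MAX_STEPS} steps")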
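As more tools are added, the if/else inside the loop tends to grow. One common alternative is a registry that maps tool names to Python functions and turns failures into text the model can read and react to. The sketch below assumes that design. `execute_tool` and `TOOL_REGISTRY` are hypothetical names, and both handlers are stubs that return canned strings rather than talking to a cluster.

```python
import json

# Stub handlers (assumption): real versions would call the Kubernetes API.
def get_k8s_pods(namespace: str) -> str:
    return "\n".join(f"pod-{i}: Running" for i in range(3))

def get_pod_logs(namespace: str, pod_name: str, lines: int = 100) -> str:
    return f"(last {lines} lines of {namespace}/{pod_name})"

TOOL_REGISTRY = {
    "get_k8s_pods": get_k8s_pods,
    "get_pod_logs": get_pod_logs,
}

def execute_tool(name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a string suitable for a "tool" message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments)
        return handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the handler's signature.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})

print(execute_tool("get_pod_logs", '{"namespace": "default", "pod_name": "web-0"}'))
```

Returning the error as the tool result, instead of raising, gives the model a chance to correct its arguments on the next step of the loop.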

4. Multi-Agent Architecture


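User request
    │
    ├── Agent-Scheduler (dispatcher): understands the task and dispatches it to specialist Agents
    │
    ├── Agent-Coder (coding Agent): code generation, review
    ├── Agent-Ops (operations Agent): K8s, monitoring, alerting
    └── Agent-Security (security Agent): vulnerability scanning, compliance checks
    │
    ▼
Agent-Scheduler aggregates the results → final answer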


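def route_to_agent(task_type):
    """Map a task type to a specialist Agent name; unknown types fall back to "ops"."""
    routing = {
        "code generation": "coder",
        "deployment/ops": "ops",
        "security scan": "security",
    }
    return routing.get(task_type, "ops")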
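Routing is only half of the scheduler's job in the diagram; it also has to collect what the specialists return. A minimal sketch of that fan-out and aggregation step follows. The three agent functions, the `schedule` helper, and the English task-type labels are all hypothetical, and each agent is a stub where a real one would run its own LLM loop.

```python
from typing import Callable

# Stub specialist Agents (assumption): each would wrap its own tool-calling loop.
def coder_agent(task: str) -> str:
    return f"[coder] {task}"

def ops_agent(task: str) -> str:
    return f"[ops] {task}"

def security_agent(task: str) -> str:
    return f"[security] {task}"

AGENTS: dict[str, Callable[[str], str]] = {
    "coder": coder_agent,
    "ops": ops_agent,
    "security": security_agent,
}

# Illustrative routing table; unknown task types fall back to the ops Agent.
ROUTING = {
    "code generation": "coder",
    "deployment/ops": "ops",
    "security scan": "security",
}

def schedule(subtasks: list[tuple[str, str]]) -> str:
    """Dispatch each (task_type, task) pair to its Agent and join the results."""
    results = []
    for task_type, task in subtasks:
        agent_name = ROUTING.get(task_type, "ops")
        results.append(AGENTS[agent_name](task))
    return "\n".join(results)

print(schedule([
    ("code generation", "write a Dockerfile"),
    ("security scan", "scan the image"),
]))
```

Here the aggregation is a simple join. In the architecture above, the scheduler would more likely pass the collected results back to the LLM to produce the final answer.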

5. Next Steps

AI Ops in Practice: a complete case study of Agents in operations scenarios

GPU Training Cluster: the infrastructure needed to train Agents