
Custom Middleware

This document introduces the middleware capabilities of Xpert Agent and their plugin-based implementation, which is comparable to LangChain Middleware (see the overview and built-in middleware pages of the official LangChain documentation). With middleware, you can insert cross-cutting logic (logging, caching, rate limiting, security, prompt governance, and so on) at key points in the Agent lifecycle without modifying the core orchestration logic.

Core Concepts

  • Strategy / Provider: Each middleware implements an IAgentMiddlewareStrategy, marked with @AgentMiddlewareStrategy('<provider>') and registered to AgentMiddlewareRegistry (plugin-sdk).
  • Middleware Instance: The AgentMiddleware object returned by createMiddleware, containing state Schema, context Schema, optional tools, and lifecycle hooks.
  • Node-based Integration: Connect WorkflowNodeTypeEnum.MIDDLEWARE nodes to Agents in the workflow graph. At runtime, they are loaded and executed in connection order via getAgentMiddlewares.
  • Configuration & Metadata: TAgentMiddlewareMeta describes name, i18n labels, icon, description, and configuration Schema. The UI renders configuration panels accordingly.
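
The relationship between providers, strategies, and middleware nodes can be pictured with a toy registry. This is a minimal sketch only: `RegistrySketch`, `StrategySketch`, and `MiddlewareSketch` are hypothetical stand-ins, not the actual plugin-sdk types, and the real AgentMiddlewareRegistry and getAgentMiddlewares may behave differently.

```typescript
// Hypothetical stand-ins for the plugin-sdk types; the real interfaces carry more fields.
interface MiddlewareSketch {
  name: string;
}

interface StrategySketch {
  createMiddleware(options: unknown): MiddlewareSketch;
}

// A toy registry keyed by provider name (assumption: one strategy per provider).
class RegistrySketch {
  private strategies = new Map<string, StrategySketch>();

  register(provider: string, strategy: StrategySketch): void {
    if (this.strategies.has(provider)) {
      throw new Error(`Duplicate provider: ${provider}`);
    }
    this.strategies.set(provider, strategy);
  }

  // Turn middleware nodes, given in connection order, into middleware instances.
  resolve(nodes: { provider: string; options: unknown }[]): MiddlewareSketch[] {
    return nodes.map((node) => {
      const strategy = this.strategies.get(node.provider);
      if (!strategy) {
        throw new Error(`Unknown provider: ${node.provider}`);
      }
      return strategy.createMiddleware(node.options);
    });
  }
}
```

The point of the sketch is that the workflow graph only stores a provider name plus options per node; the instance itself is created at load time, in the order the nodes are connected.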

Lifecycle Hooks & Capabilities

| Hook | Trigger Timing | Typical Use Cases |
| --- | --- | --- |
| beforeAgent | Before Agent starts, triggered once | Initialize state, inject system prompts, fetch external context |
| beforeModel | Before each model call | Dynamically assemble messages/tools, truncate or compress context |
| afterModel | After model returns, before tool execution | Adjust tool call parameters, record logs/metrics |
| afterAgent | After Agent completes | Persist results, cleanup resources |
| wrapModelCall | Wraps model invocation | Custom retry, caching, prompt protection, model switching |
| wrapToolCall | Wraps tool invocation | Authentication, rate limiting, result post-processing, return Command for flow control |
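
To make the trigger timing of the four before/after hooks concrete, here is a minimal sketch of a single-middleware lifecycle. `LifecycleHooksSketch` and `runLifecycleSketch` are hypothetical names used for illustration, the hooks are synchronous for brevity (the real hooks are asynchronous), and the actual runtime does considerably more than this loop.

```typescript
// Hypothetical, simplified hook signatures: real hooks receive state and runtime arguments.
interface LifecycleHooksSketch {
  beforeAgent?: () => void;
  beforeModel?: () => void;
  afterModel?: () => void;
  afterAgent?: () => void;
}

// Fire the hooks in the order described in the table for a run with `modelCalls` model calls.
function runLifecycleSketch(hooks: LifecycleHooksSketch, modelCalls: number): void {
  hooks.beforeAgent?.(); // once, before the Agent starts
  for (let i = 0; i < modelCalls; i++) {
    hooks.beforeModel?.(); // before each model call
    // ... the model call itself would happen here ...
    hooks.afterModel?.(); // after the model returns, before tools run
    // ... tool execution would happen here ...
  }
  hooks.afterAgent?.(); // once, after the Agent completes
}

// Usage: record which hooks fire for a run with two model calls.
const lifecycleTrace: string[] = [];
runLifecycleSketch(
  {
    beforeAgent: () => lifecycleTrace.push('beforeAgent'),
    beforeModel: () => lifecycleTrace.push('beforeModel'),
    afterModel: () => lifecycleTrace.push('afterModel'),
    afterAgent: () => lifecycleTrace.push('afterAgent'),
  },
  2,
);
// lifecycleTrace is now:
// beforeAgent, beforeModel, afterModel, beforeModel, afterModel, afterAgent
```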

Additionally, you can declare:

  • stateSchema: Persisted middleware state, declared as a Zod object whose fields may be optional or carry defaults.
  • contextSchema: Read-only context that is available at runtime and is not persisted.
  • tools: A list of DynamicStructuredTool instances that are injected dynamically.
  • JumpToTarget: Return jumpTo in hooks to control jumps (e.g., model, tools, end).

Writing a Middleware

1) Define Strategy & Metadata

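import { Injectable } from '@nestjs/common';
import { z } from 'zod';
// AgentMiddlewareStrategy, IAgentMiddlewareStrategy, TAgentMiddlewareMeta, and related types come from plugin-sdk.

// Options shape inferred from the `options.quota` usage in createMiddleware.
interface RateLimitOptions {
  quota?: number;
}

@Injectable()
@AgentMiddlewareStrategy('rateLimitMiddleware')
export class RateLimitMiddleware implements IAgentMiddlewareStrategy<RateLimitOptions> {
  readonly meta: TAgentMiddlewareMeta = {
    name: 'rateLimitMiddleware',
    label: { en_US: 'Rate Limit Middleware', zh_Hans: '限流中间件' },
    description: { en_US: 'Protect LLM calls with quotas', zh_Hans: '为模型调用增加配额保护' },
    configSchema: { /* JSON Schema for frontend form rendering */ }
  };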

2) Implement createMiddleware

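  createMiddleware(options: RateLimitOptions, ctx: IAgentMiddlewareContext): AgentMiddleware {
    const quota = options.quota ?? 100; // default quota when none is configured
    return {
      name: 'rateLimitMiddleware',
      // `used` is persisted middleware state and defaults to 0
      stateSchema: z.object({ used: z.number().default(0) }),
      beforeModel: async (state, runtime) => {
        // Quota exhausted: skip the model call and jump to the end
        if ((state.used ?? 0) >= quota) {
          return { jumpTo: 'end', messages: state.messages };
        }
        // Otherwise count this model call
        return { used: (state.used ?? 0) + 1 };
      },
      // Pass-through: tool calls are not limited in this example
      wrapToolCall: async (request, handler) => handler(request),
    };
  }
}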
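
To see how the quota check in beforeModel plays out over several model calls, the logic can be exercised in isolation. This is a sketch under stated assumptions: `RateLimitStateSketch`, `makeBeforeModelSketch`, and `countAllowedCalls` are hypothetical helpers, the hook is synchronous for brevity, and merging the returned update into the state with an object spread is an assumption about how the runtime applies hook results.

```typescript
// Hypothetical, simplified state: the real state also carries typed messages and more.
interface RateLimitStateSketch {
  used: number;
  messages: string[];
}

type BeforeModelResultSketch = Partial<RateLimitStateSketch> & { jumpTo?: 'end' };

// Same decision logic as the beforeModel hook above, without the SDK types.
function makeBeforeModelSketch(quota: number) {
  return (state: RateLimitStateSketch): BeforeModelResultSketch => {
    if (state.used >= quota) {
      return { jumpTo: 'end', messages: state.messages };
    }
    return { used: state.used + 1 };
  };
}

// Drive the hook for up to `attempts` model calls and count how many are allowed.
function countAllowedCalls(quota: number, attempts: number): number {
  const beforeModel = makeBeforeModelSketch(quota);
  let state: RateLimitStateSketch = { used: 0, messages: [] };
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    const { jumpTo, ...update } = beforeModel(state);
    if (jumpTo === 'end') {
      break; // quota exhausted, the Agent would end here
    }
    state = { ...state, ...update }; // assumed merge of the hook's state update
    allowed++;
  }
  return allowed;
}
```

With a quota of 2 and five attempted calls, `countAllowedCalls(2, 5)` returns 2: the third attempt hits the `jumpTo: 'end'` branch.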

3) Register as Plugin Module

@XpertServerPlugin({
  imports: [CqrsModule],
  providers: [RateLimitMiddleware],
})
export class MyAgentMiddlewareModule {}

4) Runtime Integration

  • During development or deployment, add the plugin to the plugin environment variables so that it is loaded.
  • In the workflow editor, add middleware nodes to the Agent and arrange them in the desired order; the arrangement determines the execution order.

Built-in Examples

  • SummarizationMiddleware: Detects conversation length in beforeModel, triggers compression, and replaces historical messages. It supports triggering by token count, message count, or context ratio, and records each compression in the execution trace via WrapWorkflowNodeExecutionCommand.
  • todoListMiddleware: Injects the write_todos tool and appends system prompts in wrapModelCall to guide the LLM in planning complex tasks. The tool returns a Command to update the Agent state.
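
The trigger decision described for SummarizationMiddleware can be sketched as a pure function. This is not the built-in implementation: `TriggerConfigSketch`, its field names, and `shouldSummarize` are hypothetical, and the fallback to an absolute token limit when the model's maximum input size is unknown is written here as one plausible reading of that behavior.

```typescript
// Hypothetical configuration shape; the real middleware's options may differ.
interface TriggerConfigSketch {
  maxMessages?: number; // trigger when the message count reaches this value
  maxTokens?: number; // trigger when the token count reaches this value
  contextRatio?: number; // trigger at this fraction of the model's input window, e.g. 0.8
  fallbackMaxTokens?: number; // absolute limit used when the input window is unknown
}

function shouldSummarize(
  messageCount: number,
  tokenCount: number,
  config: TriggerConfigSketch,
  maxInputTokens?: number, // taken from the model profile when available
): boolean {
  if (config.maxMessages !== undefined && messageCount >= config.maxMessages) {
    return true;
  }
  if (config.maxTokens !== undefined && tokenCount >= config.maxTokens) {
    return true;
  }
  if (config.contextRatio !== undefined) {
    if (maxInputTokens !== undefined) {
      return tokenCount >= config.contextRatio * maxInputTokens;
    }
    // Input window unknown: fall back to the absolute limit, if one is configured.
    if (config.fallbackMaxTokens !== undefined) {
      return tokenCount >= config.fallbackMaxTokens;
    }
  }
  return false;
}
```

Keeping the decision in a pure function like this makes it easy to unit test separately from the beforeModel hook that would call it.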

Best Practices

  • Implement only the hooks you need, keep them idempotent, and avoid heavy blocking operations inside hooks.
  • Use stateSchema to strictly declare persistent data, which prevents state interference between different middleware and Agents.
  • In wrapModelCall and wrapToolCall, always call the passed handler so that the default chain keeps working, and add custom logic around that call.
  • For ratio-triggered logic, rely on the model's profile.maxInputTokens and fall back to absolute token limits when it is unavailable (see the Summarization example).
  • Follow the LangChain Middleware approach: decompose cross-cutting concerns such as logging, auditing, caching, rate limiting, and gradual model switching into independent middleware, and compose them by connection order.
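
The advice to always delegate to the passed handler and to compose independent middleware can be illustrated with two small wrappers. This is a sketch with hypothetical names (`HandlerSketch`, `WrapperSketch`, `composeWrappers`): the wrappers are synchronous and string-based for brevity, whereas real wrapModelCall hooks are asynchronous, and treating the first connected wrapper as the outermost one is an assumption.

```typescript
type HandlerSketch = (request: string) => string;
type WrapperSketch = (request: string, handler: HandlerSketch) => string;

// Compose wrappers so that the first one in the list is the outermost (assumed ordering).
function composeWrappers(wrappers: WrapperSketch[], base: HandlerSketch): HandlerSketch {
  return wrappers.reduceRight<HandlerSketch>(
    (next, wrapper) => (request) => wrapper(request, next),
    base,
  );
}

// Logging wrapper: records the request, then delegates to the handler.
const loggedRequests: string[] = [];
const loggingWrapper: WrapperSketch = (request, handler) => {
  loggedRequests.push(request);
  return handler(request);
};

// Caching wrapper: delegates to the handler only on a cache miss.
const responseCache = new Map<string, string>();
const cachingWrapper: WrapperSketch = (request, handler) => {
  const hit = responseCache.get(request);
  if (hit !== undefined) {
    return hit;
  }
  const result = handler(request);
  responseCache.set(request, result);
  return result;
};

// A fake model that counts how often it is actually invoked.
let modelInvocations = 0;
const fakeModel: HandlerSketch = (request) => {
  modelInvocations++;
  return `echo:${request}`;
};

const wrappedModel = composeWrappers([loggingWrapper, cachingWrapper], fakeModel);
```

Calling `wrappedModel('hi')` twice logs both requests but reaches the fake model only once, because the caching wrapper answers the second call. Each concern stays in its own wrapper, and the order of the list decides which one sees the request first.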

With this approach, you can carry LangChain's Middleware model over to Xpert's plugin system while reusing the existing orchestration, workflow, and UI configuration capabilities.