Custom Middleware

This document introduces the Xpert Agent middleware capability and its plugin-based implementation, which is comparable to LangChain Middleware (see the official LangChain documentation: overview and built-in middleware). Middleware lets you insert cross-cutting logic (logging, caching, rate limiting, security, prompt governance, and so on) at key points in the Agent lifecycle without modifying the core orchestration logic.

Core Concepts

Strategy / Provider: Each middleware implements an IAgentMiddlewareStrategy, marked with @AgentMiddlewareStrategy('<provider>') and registered to AgentMiddlewareRegistry (plugin-sdk).
Middleware Instance: The AgentMiddleware object returned by createMiddleware, containing state Schema, context Schema, optional tools, and lifecycle hooks.
Node-based Integration: Connect WorkflowNodeTypeEnum.MIDDLEWARE nodes to Agents in the workflow graph. At runtime, they are loaded and executed in connection order via getAgentMiddlewares.
Configuration & Metadata: TAgentMiddlewareMeta describes name, i18n labels, icon, description, and configuration Schema. The UI renders configuration panels accordingly.
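
The relationship between provider keys, strategies, and connected nodes can be sketched as follows. This is a minimal illustration, not the plugin-sdk implementation: SketchRegistry, SketchStrategy, SketchNode, and resolveMiddlewares are hypothetical names, and the real AgentMiddlewareRegistry and getAgentMiddlewares may behave differently.

```typescript
// Hypothetical, simplified shapes; the real interfaces live in plugin-sdk.
interface SketchMiddleware {
  name: string;
}

interface SketchStrategy {
  provider: string; // the key passed to the strategy decorator
  createMiddleware(options: unknown): SketchMiddleware;
}

// A middleware node in the workflow graph: provider key plus configured options.
interface SketchNode {
  provider: string;
  options: unknown;
}

// Maps provider keys to strategies (assumed behaviour of a registry).
class SketchRegistry {
  private readonly strategies = new Map<string, SketchStrategy>();

  register(strategy: SketchStrategy): void {
    this.strategies.set(strategy.provider, strategy);
  }

  get(provider: string): SketchStrategy {
    const strategy = this.strategies.get(provider);
    if (!strategy) {
      throw new Error(`Unknown middleware provider: ${provider}`);
    }
    return strategy;
  }
}

// Creates one middleware instance per connected node, preserving connection order.
function resolveMiddlewares(registry: SketchRegistry, nodes: SketchNode[]): SketchMiddleware[] {
  return nodes.map((node) => registry.get(node.provider).createMiddleware(node.options));
}
```

The order of the returned array follows the order of the nodes, not the order in which strategies were registered, which mirrors the "connection order" rule described above.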

Lifecycle Hooks & Capabilities

| Hook | Trigger Timing | Typical Use Cases |
| --- | --- | --- |
| `beforeAgent` | Before the Agent starts; triggered once | Initialize state, inject system prompts, fetch external context |
| `beforeModel` | Before each model call | Dynamically assemble messages/tools, truncate or compress context |
| `afterModel` | After the model returns, before tool execution | Adjust tool call parameters, record logs/metrics |
| `afterAgent` | After the Agent completes | Persist results, clean up resources |
| `wrapModelCall` | Wraps the model invocation | Custom retry, caching, prompt protection, model switching |
| `wrapToolCall` | Wraps the tool invocation | Authentication, rate limiting, result post-processing, returning `Command` for flow control |
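
To make the trigger timing concrete, the sketch below runs a fake agent loop and records which hook fires when. It is a simplified, synchronous illustration under assumed semantics: SketchHooks, runAgentSketch, and tracingHooks are hypothetical names, real hooks are asynchronous and receive state and runtime arguments, and the actual Xpert executor is not shown here.

```typescript
// Hypothetical, simplified hook shapes for illustration only.
interface SketchHooks {
  beforeAgent?: () => void;
  beforeModel?: () => void;
  afterModel?: () => void;
  afterAgent?: () => void;
  wrapModelCall?: (callModel: () => string[]) => string[];
  wrapToolCall?: (tool: string, callTool: (tool: string) => string) => string;
}

// Runs a fake agent. Each entry of modelTurns lists the tools the fake model
// requests on that turn; an empty list stands for a final answer.
function runAgentSketch(mw: SketchHooks, modelTurns: string[][]): void {
  mw.beforeAgent?.(); // once, before anything else
  for (const requestedTools of modelTurns) {
    mw.beforeModel?.(); // before each model call
    const callModel = (): string[] => requestedTools;
    const toolCalls = mw.wrapModelCall ? mw.wrapModelCall(callModel) : callModel();
    mw.afterModel?.(); // after the model returns, before tools run
    for (const tool of toolCalls) {
      const callTool = (name: string): string => `result of ${name}`;
      if (mw.wrapToolCall) {
        mw.wrapToolCall(tool, callTool);
      } else {
        callTool(tool);
      }
    }
  }
  mw.afterAgent?.(); // once, after the loop finishes
}

// Builds a middleware whose hooks only record their own names.
function tracingHooks(trace: string[]): SketchHooks {
  return {
    beforeAgent: () => { trace.push('beforeAgent'); },
    beforeModel: () => { trace.push('beforeModel'); },
    afterModel: () => { trace.push('afterModel'); },
    afterAgent: () => { trace.push('afterAgent'); },
    wrapModelCall: (callModel) => { trace.push('wrapModelCall'); return callModel(); },
    wrapToolCall: (tool, callTool) => { trace.push(`wrapToolCall:${tool}`); return callTool(tool); },
  };
}
```

Calling runAgentSketch(tracingHooks(trace), [['search'], []]) models one tool-using turn followed by a final answer: beforeAgent and afterAgent each appear once in the trace, while the model hooks appear once per turn.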

Additionally, you can declare:

stateSchema: Persistable middleware state (Zod object/optional/with defaults).
contextSchema: Runtime-only readable context, not persisted.
tools: Dynamically injected DynamicStructuredTool list.
JumpToTarget: Return jumpTo in hooks to control jumps (e.g., model, tools, end).

Writing a Middleware

1) Define Strategy & Metadata

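import { Injectable } from '@nestjs/common';
import { z } from 'zod';
// The middleware types and the AgentMiddlewareStrategy decorator come from plugin-sdk; those imports are omitted here.

interface RateLimitOptions {
  quota?: number; // maximum number of model calls (shape inferred from its use in step 2)
}

@Injectable()
@AgentMiddlewareStrategy('rateLimitMiddleware')
export class RateLimitMiddleware implements IAgentMiddlewareStrategy<RateLimitOptions> {
  // Metadata the UI uses to render the node and its configuration panel
  readonly meta: TAgentMiddlewareMeta = {
    name: 'rateLimitMiddleware',
    label: { en_US: 'Rate Limit Middleware', zh_Hans: '限流中间件' },
    description: { en_US: 'Protect LLM calls with quotas', zh_Hans: '为模型调用增加配额保护' },
    configSchema: { /* JSON Schema for frontend form rendering */ }
  };
  // The class body continues with createMiddleware in step 2.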

2) Implement createMiddleware

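  // Continues the RateLimitMiddleware class from step 1; ctx is not used in this example.
  createMiddleware(options: RateLimitOptions, ctx: IAgentMiddlewareContext): AgentMiddleware {
    const quota = options.quota ?? 100; // default quota when none is configured
    return {
      name: 'rateLimitMiddleware',
      // Persisted counter of model calls made so far
      stateSchema: z.object({ used: z.number().default(0) }),
      beforeModel: async (state, runtime) => {
        // Quota exhausted: skip the model call and jump to the end
        if ((state.used ?? 0) >= quota) {
          return { jumpTo: 'end', messages: state.messages };
        }
        // Otherwise count this call and let it proceed
        return { used: (state.used ?? 0) + 1 };
      },
      // Pass-through: tool calls are not limited in this example
      wrapToolCall: async (request, handler) => handler(request),
    };
  }
}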
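
The effect of the beforeModel hook above can be checked in isolation. The sketch below copies its quota logic into a plain synchronous function and drives it for several turns. SketchState, makeBeforeModel, and simulate are hypothetical helpers, and the way the returned update is merged into state is an assumption about the runtime.

```typescript
// Hypothetical stand-ins for the middleware state and hook result.
interface SketchState {
  used?: number;
  messages: string[];
}
type SketchResult = { jumpTo: 'end'; messages: string[] } | { used: number };

// Same quota logic as the beforeModel hook, without the async/runtime parts.
function makeBeforeModel(quota = 100): (state: SketchState) => SketchResult {
  return (state) => {
    if ((state.used ?? 0) >= quota) {
      return { jumpTo: 'end', messages: state.messages };
    }
    return { used: (state.used ?? 0) + 1 };
  };
}

// Drives the hook for up to `turns` model turns and reports how many model
// calls were allowed before the hook asked to jump to the end.
function simulate(quota: number, turns: number): { modelCalls: number; ended: boolean } {
  const beforeModel = makeBeforeModel(quota);
  let state: SketchState = { messages: [] };
  let modelCalls = 0;
  for (let i = 0; i < turns; i++) {
    const result = beforeModel(state);
    if ('jumpTo' in result) {
      return { modelCalls, ended: true };
    }
    state = { ...state, used: result.used }; // assumed merge of the state update
    modelCalls++;
  }
  return { modelCalls, ended: false };
}
```

With a quota of 2 and five requested turns, simulate allows two model calls and then reports that the run was ended by the hook; with the default-sized quota the run finishes normally.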

3) Register as Plugin Module

@XpertServerPlugin({
  imports: [CqrsModule],
  providers: [RateLimitMiddleware],
})
export class MyAgentMiddlewareModule {}

4) Runtime Integration

During development or deployment, enable the plugin by adding it to the plugin environment variables.
In the workflow editor, connect middleware nodes to the Agent. They execute in connection order, so rearrange the connections to change the execution order.
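
Connection order matters most for the wrap-style hooks, because they nest. The sketch below assumes Xpert follows the LangChain convention in which the first middleware in the list becomes the outermost wrapper; this is an assumption, not something the text above states. composeWrappers and tagged are hypothetical helpers, simplified to synchronous string handlers.

```typescript
type ModelHandler = (prompt: string) => string;
type WrapModelCall = (prompt: string, handler: ModelHandler) => string;

// Nests the wrappers so that wrappers[0] is outermost and the model is innermost.
function composeWrappers(wrappers: WrapModelCall[], model: ModelHandler): ModelHandler {
  return wrappers.reduceRight<ModelHandler>(
    (next, wrap) => (prompt) => wrap(prompt, next),
    model,
  );
}

// A wrapper that records when it is entered and exited, then delegates.
function tagged(name: string, log: string[]): WrapModelCall {
  return (prompt, handler) => {
    log.push(`${name}:enter`);
    const output = handler(prompt);
    log.push(`${name}:exit`);
    return output;
  };
}
```

Composing [tagged('logging', log), tagged('cache', log)] under this assumption enters logging first and exits it last, so a logging middleware connected first would observe everything the cache middleware does.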

Built-in Examples

SummarizationMiddleware: Detects conversation length in beforeModel and, when a threshold is reached, compresses the conversation and replaces the historical messages. It can be triggered by token count, message count, or context ratio, and it records each compression in the execution trace via WrapWorkflowNodeExecutionCommand.
todoListMiddleware: Injects the write_todos tool and appends system prompts in wrapModelCall to guide the LLM in planning complex tasks. The tool returns a Command to update the Agent state.
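
The trigger conditions of a summarization-style middleware can be sketched as a small decision function. This is not the SummarizationMiddleware source: SummarizeTrigger, Usage, shouldSummarize, and the 4000-token fallback are hypothetical, chosen only to illustrate token, message-count, and ratio triggers with a fallback when the model's maximum input size is unknown.

```typescript
// Hypothetical trigger configuration; any combination of fields may be set.
interface SummarizeTrigger {
  tokens?: number;   // absolute token threshold
  messages?: number; // message-count threshold
  fraction?: number; // fraction (0..1) of the model's maximum input tokens
}

interface Usage {
  tokens: number;
  messages: number;
}

// Returns true when any configured trigger is reached. When a ratio trigger is
// set but maxInputTokens is unknown, an absolute fallback limit is used instead.
function shouldSummarize(
  trigger: SummarizeTrigger,
  usage: Usage,
  maxInputTokens?: number,
  fallbackTokens = 4000, // illustrative default, not a documented value
): boolean {
  if (trigger.messages !== undefined && usage.messages >= trigger.messages) {
    return true;
  }
  if (trigger.tokens !== undefined && usage.tokens >= trigger.tokens) {
    return true;
  }
  if (trigger.fraction !== undefined) {
    const limit = maxInputTokens !== undefined
      ? maxInputTokens * trigger.fraction
      : fallbackTokens;
    return usage.tokens >= limit;
  }
  return false;
}
```

For example, with fraction set to 0.8 and a model limit of 1000 tokens, the check fires once usage reaches 800 tokens; if the limit is unknown, the same configuration fires at the fallback threshold instead.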

Best Practices

Implement only the hooks you need, keep them idempotent, and avoid heavy blocking operations inside hooks.
Use stateSchema to declare persistent data strictly, so that state does not interfere between different middleware or Agents.
In wrapModelCall/wrapToolCall, call the passed handler so that the default chain keeps working, then add custom logic around that call.
For ratio-triggered logic, rely on the model's profile.maxInputTokens; fall back to absolute token limits when it is unavailable (see the Summarization example).
Follow the LangChain Middleware approach: decompose cross-cutting concerns such as logging, auditing, caching, rate limiting, and gradual model switching into independent middleware, and compose them by connection order.
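
As an illustration of the handler-first advice, the sketch below shows a retry wrapper that always delegates to the passed handler and only adds behaviour around that call. It is a simplified synchronous version with string requests; createRetryWrap and ToolHandler are hypothetical names, and the real wrapToolCall is asynchronous with richer request types.

```typescript
type ToolHandler = (request: string) => string;

// Returns a wrap-style function that retries the handler up to maxAttempts times.
function createRetryWrap(maxAttempts: number): (request: string, handler: ToolHandler) => string {
  return (request, handler) => {
    let lastError: unknown = new Error('maxAttempts must be at least 1');
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // Delegate to the default chain; custom logic only surrounds this call.
        return handler(request);
      } catch (error) {
        lastError = error; // remember the failure and try again
      }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
  };
}
```

Because the wrapper never replaces the handler, removing it from the chain leaves the default behaviour unchanged, which keeps each middleware independently composable.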

Through this approach, you can seamlessly migrate LangChain's Middleware model into Xpert's plugin system, reusing existing orchestration, workflow, and UI configuration capabilities.
