# Reflexion: Self-Reflecting Agents in Depth

## Overview

Reflexion is an architecture pattern that gives an agent the ability to "learn from its mistakes." The core idea: after executing a task, the agent evaluates and reflects on its own output, stores the lessons learned in memory, and avoids repeating the same errors in later attempts.

> 💡 **Tip — core value:** a traditional agent starts "from scratch" every time, while a Reflexion agent accumulates experience and improves continuously.
### Comparison with a traditional agent

| Dimension | Traditional agent | Reflexion agent |
| --- | --- | --- |
| Failure handling | Simple retry | Reflect, improve, retry |
| Experience accumulation | None | Yes (memory module) |
| Learning ability | None | Learns from mistakes |
| Success rate | Random | Improves with each attempt |
## 1. Core Architecture

### 1.1 The main components

```mermaid
graph TB
    subgraph Reflexion Agent
        A[Actor<br/>executor] --> B[Evaluator<br/>assessor]
        B --> C{Success?}
        C -->|yes| D[Return result]
        C -->|no| E[Self-Reflection]
        E --> F[Memory<br/>experience store]
        F --> A
    end
```
| Component | Responsibility | Input | Output |
| --- | --- | --- | --- |
| Actor | Executes the task | Task + past experience | Execution result |
| Evaluator | Judges result quality | Task + result | Success/failure + feedback |
| Self-Reflection | Analyzes failure causes | Execution history + evaluation feedback | Lessons learned |
| Memory | Stores experience | Reflection results | List of past experiences |
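The table above can be read as a set of contracts. The following is an illustrative sketch of those contracts in plain Java, not part of the Reflexion paper or any framework; the type names (`ActorResult`, `EvaluationResult`, `ReflectionResult`) are simplified versions of the classes used in the implementation later in this article.

```java
import java.util.List;

// Hypothetical component contracts mirroring the table above.
interface Actor {
    // task + past experience -> execution result
    ActorResult execute(String task, List<String> experiences);
}

interface Evaluator {
    // task + result -> success/failure + feedback
    EvaluationResult evaluate(String task, String result);
}

interface SelfReflection {
    // execution history + evaluation feedback -> lessons learned
    ReflectionResult reflect(String task, String trace, String result, String feedback);
}

interface Memory {
    void add(String experience);
    List<String> recall(String task, int topK);
}

// Simplified value types for the sketch.
record ActorResult(String thinking, String execution) {}
record EvaluationResult(boolean success, int score, String feedback) {}
record ReflectionResult(String rootCause, List<String> lessons) {}
```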
### 1.2 Execution flow

```mermaid
sequenceDiagram
    participant U as User
    participant A as Actor
    participant E as Evaluator
    participant R as Self-Reflection
    participant M as Memory
    U->>A: task
    M->>A: past experience
    A->>E: execution result
    E->>E: evaluate
    alt success
        E->>U: return result
    else failure
        E->>R: evaluation feedback
        R->>R: analyze and reflect
        R->>M: store experience
        M->>A: updated experience
        Note over A: retry
    end
```
## 2. Components in Detail

### 2.1 Actor (executor)

The Actor carries out the task, consulting past experience to avoid repeating earlier mistakes.

**Prompt template**

```
You are an intelligent assistant responsible for completing the user's task.

## Task
{task}

## Past lessons learned (consult these carefully and avoid repeating mistakes)
{experiences}

## Requirements
1. Read the past experience carefully and do not make the same mistakes.
2. If past experience says an approach does not work, try a different one.
3. Output your reasoning process and the final result.

## Output format
### Thinking
[Analyze the task, consult experience, plan an approach]

### Execution
[Concrete steps and results]
```

**Java implementation**

```java
@Component
public class Actor {

    private final ChatClient chatClient;

    public ActorResult execute(String task, List<String> experiences) {
        String experienceText = experiences.isEmpty()
                ? "No past experience yet"
                : experiences.stream()
                        .map(e -> "- " + e)
                        .collect(Collectors.joining("\n"));

        String response = chatClient.prompt()
                .system(ACTOR_PROMPT)
                .user("""
                        ## Task
                        %s

                        ## Past lessons learned
                        %s
                        """.formatted(task, experienceText))
                .call()
                .content();

        return parseActorResult(response);
    }
}

@Data
public class ActorResult {
    private String thinking;
    private String execution;
    private List<String> actions;
}
```
### 2.2 Evaluator (assessor)

The Evaluator judges whether the execution result meets the bar. It can evaluate with an LLM, with rules, or with test cases.

**Evaluation methods compared**

| Method | Suited to | Pros | Cons |
| --- | --- | --- | --- |
| LLM evaluation | Open-ended tasks | Flexible | May be inaccurate |
| Rule-based | Tasks with clear criteria | Accurate | Inflexible |
| Test cases | Code generation | Objective and accurate | Must be prepared in advance |
| Human review | High-risk decisions | Most accurate | Expensive |
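The prompt and code below cover the LLM and test-case rows of this table. Rule-based evaluation (row two) can be as simple as checking hard requirements directly. A minimal sketch, in which the specific rules (minimum length, required keywords) are illustrative assumptions rather than anything from the original article:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal rule-based evaluator: returns the list of rule violations.
// An empty list means the result passes every rule.
class RuleBasedEvaluator {

    static List<String> evaluate(String result, int minLength, List<String> requiredKeywords) {
        List<String> issues = new ArrayList<>();
        if (result.length() < minLength) {
            issues.add("Result shorter than required minimum of " + minLength + " characters");
        }
        for (String kw : requiredKeywords) {
            if (!result.contains(kw)) {
                issues.add("Missing required keyword: " + kw);
            }
        }
        return issues;
    }
}
```

Rules like these are cheap and deterministic, which is exactly why the table rates them "accurate but inflexible": they only catch the failures you anticipated.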
**LLM evaluation prompt**

```
You are a strict task-evaluation expert.

## Original task
{task}

## Execution result
{result}

## Evaluation dimensions
1. **Correctness**: does the result correctly satisfy the task requirements?
2. **Completeness**: does it cover all the requirements?
3. **Quality**: how good is the result?

## Output format (JSON)
{
  "success": true/false,
  "score": 0-100,
  "dimensions": {
    "correctness": {"score": 0-100, "feedback": "..."},
    "completeness": {"score": 0-100, "feedback": "..."},
    "quality": {"score": 0-100, "feedback": "..."}
  },
  "overall_feedback": "overall assessment",
  "specific_issues": ["issue 1", "issue 2"]
}
```
**Java implementation**

```java
@Component
public class Evaluator {

    private final ChatClient chatClient;

    // LLM-based evaluation
    public EvaluationResult evaluateWithLLM(String task, String result) {
        String response = chatClient.prompt()
                .system(EVALUATOR_PROMPT)
                .user("Task: %s\n\nResult: %s".formatted(task, result))
                .call()
                .content();
        return parseJson(response, EvaluationResult.class);
    }

    // Test-case-based evaluation for generated code
    public EvaluationResult evaluateCode(String code, List<TestCase> testCases) {
        int passed = 0;
        List<String> failures = new ArrayList<>();

        for (TestCase tc : testCases) {
            try {
                Object result = executeCode(code, tc.getInput());
                if (tc.getExpected().equals(result)) {
                    passed++;
                } else {
                    failures.add("Input %s: expected %s but got %s"
                            .formatted(tc.getInput(), tc.getExpected(), result));
                }
            } catch (Exception e) {
                failures.add("Input %s threw an exception: %s"
                        .formatted(tc.getInput(), e.getMessage()));
            }
        }

        int score = passed * 100 / testCases.size();
        return EvaluationResult.builder()
                .success(score >= 100)
                .score(score)
                .specificIssues(failures)
                .build();
    }
}

@Data
@Builder
public class EvaluationResult {
    private boolean success;
    private int score;
    private Map<String, DimensionScore> dimensions;
    private String overallFeedback;
    private List<String> specificIssues;
}
```
### 2.3 Self-Reflection

Self-Reflection is the heart of Reflexion: it analyzes in depth why an attempt failed and distills the lessons.

**Reflection prompt**

```
You are an analysis expert who excels at reflection and summarization.

## Task goal
{task}

## Execution trace
{execution_trace}

## Execution result
{result}

## Evaluation feedback
{evaluation_feedback}

## Reflect in depth

### 1. Problem diagnosis
- What exactly went wrong in this attempt?
- Was it a misunderstanding, a wrong method, or a faulty execution?

### 2. Root-cause analysis
- What is the root cause of the problem?
- Were any assumptions wrong?

### 3. Lessons learned
- What did this failure teach you?
- How should a similar problem be handled next time?

### 4. Improvement plan
- What concrete improvements should be made?
- How should the next attempt proceed?

## Output format (JSON)
{
  "problem_diagnosis": "problem diagnosis",
  "root_cause": "root cause",
  "lessons_learned": [
    "lesson 1: ...",
    "lesson 2: ..."
  ],
  "improvement_plan": "improvement plan",
  "confidence": 0-100
}
```

**Java implementation**

```java
@Component
public class SelfReflection {

    private final ChatClient chatClient;

    public ReflectionResult reflect(ReflectionInput input) {
        String response = chatClient.prompt()
                .system(REFLECTION_PROMPT)
                .user("""
                        ## Task goal
                        %s

                        ## Execution trace
                        %s

                        ## Execution result
                        %s

                        ## Evaluation feedback
                        %s
                        """.formatted(
                        input.getTask(),
                        input.getExecutionTrace(),
                        input.getResult(),
                        input.getEvaluationFeedback()))
                .call()
                .content();
        return parseJson(response, ReflectionResult.class);
    }
}

@Data
@Builder  // the agent builds this via ReflectionInput.builder()
public class ReflectionInput {
    private String task;
    private String executionTrace;
    private String result;
    private String evaluationFeedback;
}

@Data
public class ReflectionResult {
    private String problemDiagnosis;
    private String rootCause;
    private List<String> lessonsLearned;
    private String improvementPlan;
    private int confidence;

    public String toExperienceText() {
        return """
                Problem: %s
                Cause: %s
                Lessons: %s
                Improvement: %s
                """.formatted(
                problemDiagnosis,
                rootCause,
                String.join("; ", lessonsLearned),
                improvementPlan);
    }
}
```
### 2.4 Memory

Memory stores and retrieves experience, with short-term memory (current task) and long-term memory (across tasks).

**Memory architecture**

```mermaid
graph TB
    subgraph Memory
        A[Short-term memory] --> C[Experience retrieval]
        B[Long-term memory] --> C
    end
    D[Current-task experience] --> A
    E[Historical-task experience] --> B
    C --> F[Relevant experience]
```

**Java implementation**

```java
@Service
public class ReflexionMemory {

    // Short-term memory: experience from the current task only
    private final List<String> shortTermMemory = new ArrayList<>();

    // Long-term memory: backed by a vector store for semantic retrieval
    private final VectorStore vectorStore;

    public void addShortTerm(String experience) {
        shortTermMemory.add(experience);
    }

    public void clearShortTerm() {
        shortTermMemory.clear();
    }

    public void addLongTerm(String experience, String taskType) {
        Document doc = new Document(
                experience,
                Map.of(
                        "task_type", taskType,
                        "timestamp", System.currentTimeMillis(),
                        "source", "reflexion"));
        vectorStore.add(List.of(doc));
    }

    // Combine short-term memory with the top-K semantically similar
    // long-term experiences
    public List<String> recall(String task, int topK) {
        List<String> experiences = new ArrayList<>(shortTermMemory);

        List<Document> docs = vectorStore.similaritySearch(
                SearchRequest.query(task).withTopK(topK));
        for (Document doc : docs) {
            experiences.add(doc.getContent());
        }
        return experiences;
    }

    public List<String> getShortTermMemory() {
        return new ArrayList<>(shortTermMemory);
    }
}
```
## 3. Full Implementation

### 3.1 The ReflexionAgent class

```java
@Service
@Slf4j
public class ReflexionAgent {

    private final Actor actor;
    private final Evaluator evaluator;
    private final SelfReflection selfReflection;
    private final ReflexionMemory memory;

    private static final int MAX_RETRIES = 3;
    private static final int SUCCESS_THRESHOLD = 80;

    public ReflexionResponse execute(String task) {
        log.info("Starting task: {}", task);
        List<AttemptRecord> attempts = new ArrayList<>();

        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            log.info("=== Attempt {} ===", attempt);

            // 1. Recall relevant experience
            List<String> experiences = memory.recall(task, 5);
            log.info("Recalled {} relevant experiences", experiences.size());

            // 2. Execute
            ActorResult actorResult = actor.execute(task, experiences);
            log.info("Execution finished");

            // 3. Evaluate
            EvaluationResult evalResult = evaluator.evaluateWithLLM(
                    task, actorResult.getExecution());
            log.info("Evaluation score: {}", evalResult.getScore());

            attempts.add(new AttemptRecord(attempt, actorResult, evalResult));

            // 4. On success, optionally save the winning approach and return
            if (evalResult.isSuccess() || evalResult.getScore() >= SUCCESS_THRESHOLD) {
                log.info("✅ Task completed successfully");
                if (evalResult.getScore() == 100) {
                    saveSuccessExperience(task, actorResult);
                }
                memory.clearShortTerm();
                return ReflexionResponse.success(actorResult.getExecution(), attempts);
            }

            // 5. On failure, reflect and store the lessons
            log.info("Reflecting...");
            ReflectionResult reflection = selfReflection.reflect(
                    ReflectionInput.builder()
                            .task(task)
                            .executionTrace(actorResult.getThinking())
                            .result(actorResult.getExecution())
                            .evaluationFeedback(evalResult.getOverallFeedback())
                            .build());
            log.info("Reflection lessons: {}", reflection.getLessonsLearned());

            String experience = reflection.toExperienceText();
            memory.addShortTerm(experience);
            // Only promote high-confidence lessons to long-term memory
            if (reflection.getConfidence() > 70) {
                memory.addLongTerm(experience, classifyTask(task));
            }
        }

        log.warn("❌ Maximum retries reached; task did not succeed");
        memory.clearShortTerm();
        return ReflexionResponse.failure(
                attempts.get(attempts.size() - 1).getActorResult().getExecution(),
                attempts);
    }

    // Naive keyword-based task classification
    private String classifyTask(String task) {
        if (task.contains("code") || task.contains("implement") || task.contains("function")) {
            return "coding";
        } else if (task.contains("analyze") || task.contains("explain")) {
            return "analysis";
        } else {
            return "general";
        }
    }

    private void saveSuccessExperience(String task, ActorResult result) {
        String experience = "Success: an effective approach for task \"%s\" was: %s"
                .formatted(task, result.getThinking());
        memory.addLongTerm(experience, classifyTask(task));
    }
}

@Data
@AllArgsConstructor
public class AttemptRecord {
    private int attemptNumber;
    private ActorResult actorResult;
    private EvaluationResult evaluationResult;
}

@Data
public class ReflexionResponse {
    private boolean success;
    private String result;
    private List<AttemptRecord> attempts;
    private int totalAttempts;

    public static ReflexionResponse success(String result, List<AttemptRecord> attempts) {
        ReflexionResponse response = new ReflexionResponse();
        response.success = true;
        response.result = result;
        response.attempts = attempts;
        response.totalAttempts = attempts.size();
        return response;
    }

    public static ReflexionResponse failure(String result, List<AttemptRecord> attempts) {
        ReflexionResponse response = new ReflexionResponse();
        response.success = false;
        response.result = result;
        response.attempts = attempts;
        response.totalAttempts = attempts.size();
        return response;
    }
}
```
### 3.2 Controller layer

```java
@RestController
@RequestMapping("/api/reflexion")
public class ReflexionController {

    private final ReflexionAgent reflexionAgent;
    private final ReflexionMemory memory;  // injected alongside the agent

    @PostMapping("/execute")
    public ReflexionResponse execute(@RequestBody TaskRequest request) {
        return reflexionAgent.execute(request.getTask());
    }

    @GetMapping("/memory/stats")
    public Map<String, Object> getMemoryStats() {
        return Map.of(
                "shortTermCount", memory.getShortTermMemory().size(),
                // getLongTermCount() is assumed to exist on ReflexionMemory
                "longTermCount", memory.getLongTermCount());
    }
}

@Data
public class TaskRequest {
    private String task;
}
```
## 4. Worked Examples

### 4.1 Code generation

**Task**: "Implement an LRU cache in Java supporting get and put in O(1) time."

**Attempt 1**

```java
// Actor output:
public class LRUCache {
    private Map<Integer, Integer> cache;
    private int capacity;

    public LRUCache(int capacity) {
        this.cache = new HashMap<>();
        this.capacity = capacity;
    }

    public int get(int key) {
        return cache.getOrDefault(key, -1);
    }

    public void put(int key, int value) {
        if (cache.size() >= capacity) {
            cache.remove(cache.keySet().iterator().next());
        }
        cache.put(key, value);
    }
}
```

Evaluator feedback:

```json
{
  "success": false,
  "score": 40,
  "specific_issues": [
    "A plain HashMap cannot maintain LRU order",
    "The evicted entry is not necessarily the least recently used",
    "get does not update the access order"
  ]
}
```

Self-Reflection:

```json
{
  "problem_diagnosis": "Used a plain HashMap, which cannot track access order",
  "root_cause": "Did not account for LRU's need to record access order",
  "lessons_learned": [
    "An LRU cache needs a data structure that records access order",
    "Use LinkedHashMap, or a hand-rolled doubly linked list + HashMap"
  ],
  "improvement_plan": "Use LinkedHashMap and override removeEldestEntry"
}
```

**Attempt 2**

```java
// Actor output (with experience applied):
public class LRUCache extends LinkedHashMap<Integer, Integer> {
    private int capacity;

    public LRUCache(int capacity) {
        super(capacity, 0.75f, true);  // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    public int get(int key) {
        return super.getOrDefault(key, -1);
    }

    public void put(int key, int value) {
        super.put(key, value);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > capacity;
    }
}
```

Evaluator feedback:

```json
{
  "success": true,
  "score": 95,
  "overall_feedback": "Correct LRU cache; the LinkedHashMap approach is concise and efficient"
}
```

✅ Completed successfully
### 4.2 Data analysis

**Task**: "Analyze the performance problems in this SQL and suggest optimizations."

```sql
SELECT * FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
WHERE o.created_at > '2024-01-01'
ORDER BY o.created_at DESC;
```

**Reflection and improvement across attempts**

```
Attempt 1: focused only on indexes
  Lesson: SQL optimization must consider indexes, selected columns,
  and join strategy together

Attempt 2: added column-selection advice
  Lesson: SELECT * is a common performance problem and should be
  called out explicitly

Attempt 3: full analysis
  - Indexes: created_at, user_id, product_id
  - Query: avoid SELECT *
  - Joins: reconsider whether LEFT JOIN is needed
  - Pagination: large result sets need paging

✅ Success
```
## 5. Advanced Techniques

### 5.1 Hierarchical reflection

```java
public class HierarchicalReflection {

    // Shallow reflection: a quick fix based on a single evaluation
    public QuickFix shallowReflect(EvaluationResult eval) { /* ... */ }

    // Deep reflection: root-cause analysis across several attempts
    public DeepAnalysis deepReflect(List<AttemptRecord> attempts) { /* ... */ }

    // Meta-reflection: reflect on the quality of past reflections
    public MetaReflection metaReflect(List<ReflectionResult> reflections) { /* ... */ }
}
```

### 5.2 Experience distillation

Periodically consolidate fragmented experiences into structured knowledge:

```java
@Scheduled(cron = "0 0 2 * * ?")  // nightly at 02:00
public void distillExperiences() {
    List<Document> recentExperiences = memory.getRecentExperiences(100);

    // Cluster similar experiences, then summarize each cluster with an LLM
    Map<String, List<Document>> clusters = clusterExperiences(recentExperiences);
    for (Map.Entry<String, List<Document>> entry : clusters.entrySet()) {
        String summary = llm.summarize(entry.getValue());
        memory.addDistilledKnowledge(entry.getKey(), summary);
    }
}
```

### 5.3 Experience transfer

Transfer general-purpose experience across task types:

```java
public List<String> transferExperiences(String newTask) {
    String taskType = classifyTask(newTask);

    // Experience from the same task type, plus universally applicable lessons
    List<String> sameTypeExp = memory.getByTaskType(taskType);
    List<String> universalExp = memory.getUniversalExperiences();

    return filterTransferableExperiences(newTask, sameTypeExp, universalExp);
}
```
## 6. Best Practices

### 6.1 Evaluator design

| Task type | Recommended evaluation |
| --- | --- |
| Code generation | Test cases + static analysis |
| Text writing | LLM evaluation + rule checks |
| Math | Result verification |
| Information retrieval | Human spot checks + LLM evaluation |
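The mapping in this table can be encoded directly, so the agent picks an evaluation strategy from the task type. A minimal sketch; the task-type keys and strategy strings are illustrative, not a real API:

```java
import java.util.Map;

// Maps task types to the evaluation strategies recommended in the table above.
class EvaluatorSelector {

    static String recommend(String taskType) {
        return Map.of(
                "code-generation", "test cases + static analysis",
                "text-writing",    "LLM evaluation + rule checks",
                "math",            "result verification",
                "retrieval",       "human spot checks + LLM evaluation"
        ).getOrDefault(taskType, "LLM evaluation");  // sensible general default
    }
}
```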
### 6.2 Controlling reflection depth

```java
public ReflectionResult adaptiveReflect(int attemptCount, ...) {
    if (attemptCount == 1) {
        return quickReflect(...);   // first failure: cheap, shallow reflection
    } else if (attemptCount == 2) {
        return normalReflect(...);
    } else {
        return deepReflect(...);    // repeated failures warrant deep analysis
    }
}
```

### 6.3 Avoiding over-reflection

```java
// If the two most recent reflections are nearly identical, the agent is
// likely stuck and further reflection will not help
public boolean isStuck(List<ReflectionResult> reflections) {
    if (reflections.size() < 2) return false;

    String last = reflections.get(reflections.size() - 1).toExperienceText();
    String prev = reflections.get(reflections.size() - 2).toExperienceText();

    return calculateSimilarity(last, prev) > 0.8;
}
```
## 7. Iteration Control and Loop Avoidance

### 7.1 Controlling the number of iterations

**Basic control: a maximum retry count**

```java
@Service
@Slf4j
public class ReflexionAgent {

    private static final int MAX_RETRIES = 3;
    private static final int SUCCESS_THRESHOLD = 80;

    public ReflexionResponse execute(String task) {
        List<AttemptRecord> attempts = new ArrayList<>();

        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            log.info("=== Attempt {} ===", attempt);

            // ... execute and evaluate (abridged; see section 3.1) ...

            if (evalResult.isSuccess() || evalResult.getScore() >= SUCCESS_THRESHOLD) {
                log.info("✅ Task completed successfully");
                return ReflexionResponse.success(result, attempts);
            }
        }

        log.warn("❌ Maximum retries reached; task did not succeed");
        return ReflexionResponse.failure(lastResult, attempts);
    }
}
```

**Key points**:

- A `for` loop with an explicit bound, not an open-ended `while`, caps the number of iterations.
- Return immediately on success rather than wasting resources.
- The loop stops automatically at the maximum, so it can never run forever.
**Parameters**

| Parameter | Meaning | Suggested value | Effect |
| --- | --- | --- | --- |
| MAX_RETRIES | Maximum retries | 3-5 | More retries raise the success rate, but also the cost |
| SUCCESS_THRESHOLD | Score required for success | 80-90 | A higher threshold is stricter |
| MAX_REFLECTION_DEPTH | Maximum reflection depth | 2-3 | Controls how detailed reflection gets |
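The article hard-codes these values as constants; grouping them into a small validated configuration object makes the suggested ranges explicit. A sketch under that assumption:

```java
// Immutable holder for the tuning parameters in the table above,
// with sanity checks reflecting the suggested ranges.
record ReflexionConfig(int maxRetries, int successThreshold, int maxReflectionDepth) {

    ReflexionConfig {
        if (maxRetries < 1)
            throw new IllegalArgumentException("maxRetries must be >= 1");
        if (successThreshold < 0 || successThreshold > 100)
            throw new IllegalArgumentException("successThreshold must be in [0, 100]");
        if (maxReflectionDepth < 1)
            throw new IllegalArgumentException("maxReflectionDepth must be >= 1");
    }

    // Defaults matching the constants used throughout this article.
    static ReflexionConfig defaults() {
        return new ReflexionConfig(3, 80, 2);
    }
}
```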
### 7.2 Multi-level loop control

**Approach 1: dynamic control based on improvement rate**

```java
@Service
@Slf4j
public class AdaptiveReflexionAgent {

    private static final int MAX_RETRIES = 5;
    private static final int SUCCESS_THRESHOLD = 80;
    private static final double MIN_IMPROVEMENT_RATE = 0.05;

    public ReflexionResponse execute(String task) {
        List<AttemptRecord> attempts = new ArrayList<>();
        int lastScore = 0;

        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            // ... execute and evaluate (abridged) ...
            int currentScore = evalResult.getScore();

            if (currentScore >= SUCCESS_THRESHOLD) {
                log.info("✅ Task succeeded");
                return ReflexionResponse.success(result, attempts);
            }

            // After the second attempt, require a minimum score improvement
            // (Math.max guards against dividing by a zero lastScore)
            double improvementRate =
                    (double) (currentScore - lastScore) / Math.max(lastScore, 1);
            if (attempt > 2 && improvementRate < MIN_IMPROVEMENT_RATE) {
                log.warn("⚠️ Improvement below {}%, stopping", MIN_IMPROVEMENT_RATE * 100);
                return ReflexionResponse.failure(result, attempts);
            }
            lastScore = currentScore;
        }

        log.warn("❌ Maximum retries reached");
        return ReflexionResponse.failure(lastResult, attempts);
    }
}
```

**Key points**:

- Limits not just the attempt count, but also checks how much each attempt improves.
- Stops early when successive attempts stop improving.
- Avoids burning resources in an unproductive loop.
**Approach 2: control based on reflection similarity**

```java
@Service
@Slf4j
public class SmartReflexionAgent {

    private static final int MAX_RETRIES = 5;
    private static final double SIMILARITY_THRESHOLD = 0.8;

    public ReflexionResponse execute(String task) {
        List<AttemptRecord> attempts = new ArrayList<>();
        List<ReflectionResult> reflections = new ArrayList<>();

        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            // ... execute and evaluate (abridged) ...

            if (evalResult.isSuccess()) {
                return ReflexionResponse.success(result, attempts);
            }

            ReflectionResult reflection = selfReflection.reflect(input);
            reflections.add(reflection);

            // If the new reflection just repeats the previous one, we are stuck
            if (isStuckInLoop(reflections)) {
                log.warn("⚠️ Unproductive loop detected, stopping");
                return ReflexionResponse.failure(result, attempts);
            }
        }
        return ReflexionResponse.failure(lastResult, attempts);
    }

    private boolean isStuckInLoop(List<ReflectionResult> reflections) {
        if (reflections.size() < 2) return false;

        ReflectionResult last = reflections.get(reflections.size() - 1);
        ReflectionResult prev = reflections.get(reflections.size() - 2);

        double similarity = calculateSimilarity(
                last.toExperienceText(),
                prev.toExperienceText());
        log.info("Reflection similarity: {}", similarity);
        return similarity > SIMILARITY_THRESHOLD;
    }

    // Deliberately crude character-overlap similarity; good enough to
    // flag near-identical reflections
    private double calculateSimilarity(String text1, String text2) {
        int commonChars = 0;
        int maxLen = Math.max(text1.length(), text2.length());
        if (maxLen == 0) return 1.0;
        for (char c : text1.toCharArray()) {
            if (text2.contains(String.valueOf(c))) {
                commonChars++;
            }
        }
        return (double) commonChars / maxLen;
    }
}
```

**Key points**:

- Monitors how similar successive reflections are.
- Near-duplicate reflections signal a dead loop.
- Stopping early avoids wasted work.
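The character-overlap similarity above ignores character order and position, so unrelated texts over the same alphabet can score high. A Jaccard similarity over character bigrams is a slightly more robust drop-in; this is a sketch, not the article's original implementation:

```java
import java.util.HashSet;
import java.util.Set;

class TextSimilarity {

    // Jaccard similarity over character bigrams: |A ∩ B| / |A ∪ B|.
    static double jaccardBigrams(String a, String b) {
        Set<String> sa = bigrams(a);
        Set<String> sb = bigrams(b);
        if (sa.isEmpty() && sb.isEmpty()) return 1.0;  // both too short to compare

        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        return (double) intersection.size() / union.size();
    }

    private static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }
}
```

Because bigrams capture local ordering, two reflections that merely reuse the same vocabulary in different advice score lower than two that repeat the same sentences, which is closer to what loop detection actually needs.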
### 7.3 Recognizing loop risks

**Common infinite-loop scenarios**

```java
// Risk 1: an unattainable threshold — a perfect score is rarely achievable
private static final int SUCCESS_THRESHOLD = 100;  // risky
// Mitigation: use an attainable bar
private static final int SUCCESS_THRESHOLD = 80;

// Risk 2: reflection that produces no actionable change
// Mitigation: stop when the reflection yields no improvement plan
ReflectionResult reflection = selfReflection.reflect(input);
if (reflection.getImprovementPlan() == null
        || reflection.getImprovementPlan().isEmpty()) {
    log.warn("Reflection produced no improvement plan, stopping");
    return ReflexionResponse.failure(result, attempts);
}

// Risk 3: the Actor ignores stored experience and repeats the same mistake
// Mitigation: surface the last failure prominently in the prompt
String prompt = """
        ## Past lessons learned (must be consulted!)
        %s

        ## Important
        The previous attempt failed because: %s
        Do not repeat this mistake!
        """.formatted(experiences, lastFailureReason);
```

**Key points**:

- Identify the root cause of a loop.
- Prevent loops at design time.
- Don't just crank up the retry count.
### 7.4 A complete loop-control implementation

```java
@Service
@Slf4j
public class RobustReflexionAgent {

    private static final int MAX_RETRIES = 5;
    private static final int SUCCESS_THRESHOLD = 80;
    private static final double MIN_IMPROVEMENT_RATE = 0.05;
    private static final double SIMILARITY_THRESHOLD = 0.8;
    private static final int MAX_CONSECUTIVE_FAILURES = 2;

    public ReflexionResponse execute(String task) {
        List<AttemptRecord> attempts = new ArrayList<>();
        List<ReflectionResult> reflections = new ArrayList<>();
        int lastScore = 0;
        int consecutiveFailures = 0;

        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            log.info("=== Attempt {} ===", attempt);

            ActorResult actorResult = actor.execute(task, memory.recall(task, 5));
            EvaluationResult evalResult =
                    evaluator.evaluateWithLLM(task, actorResult.getExecution());
            int currentScore = evalResult.getScore();
            attempts.add(new AttemptRecord(attempt, actorResult, evalResult));

            // Gate 1: success threshold
            if (currentScore >= SUCCESS_THRESHOLD) {
                log.info("✅ Task completed successfully");
                return ReflexionResponse.success(actorResult.getExecution(), attempts);
            }

            // Gate 2: improvement-rate check, tolerating the occasional flat attempt
            if (attempt > 1) {
                double improvementRate =
                        (double) (currentScore - lastScore) / Math.max(lastScore, 1);
                if (improvementRate < MIN_IMPROVEMENT_RATE) {
                    consecutiveFailures++;
                    log.warn("Insufficient improvement: {}%", improvementRate * 100);
                    if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
                        log.warn("⚠️ {} consecutive attempts without improvement, stopping",
                                MAX_CONSECUTIVE_FAILURES);
                        return ReflexionResponse.failure(actorResult.getExecution(), attempts);
                    }
                } else {
                    consecutiveFailures = 0;
                }
            }

            ReflectionResult reflection = selfReflection.reflect(
                    ReflectionInput.builder()
                            .task(task)
                            .executionTrace(actorResult.getThinking())
                            .result(actorResult.getExecution())
                            .evaluationFeedback(evalResult.getOverallFeedback())
                            .build());
            reflections.add(reflection);

            // Gate 3: loop detection via reflection similarity
            if (isStuckInLoop(reflections)) {
                log.warn("⚠️ Unproductive loop detected, stopping");
                return ReflexionResponse.failure(actorResult.getExecution(), attempts);
            }

            // Gate 4: the reflection must yield an actionable plan
            if (reflection.getImprovementPlan() == null
                    || reflection.getImprovementPlan().isEmpty()) {
                log.warn("⚠️ Reflection produced no improvement plan, stopping");
                return ReflexionResponse.failure(actorResult.getExecution(), attempts);
            }

            memory.addShortTerm(reflection.toExperienceText());
            if (reflection.getConfidence() > 70) {
                memory.addLongTerm(reflection.toExperienceText(), classifyTask(task));
            }
            lastScore = currentScore;
        }

        log.warn("❌ Maximum retries reached");
        return ReflexionResponse.failure(
                attempts.get(attempts.size() - 1).getActorResult().getExecution(),
                attempts);
    }

    private boolean isStuckInLoop(List<ReflectionResult> reflections) {
        if (reflections.size() < 2) return false;
        String last = reflections.get(reflections.size() - 1).toExperienceText();
        String prev = reflections.get(reflections.size() - 2).toExperienceText();
        return calculateSimilarity(last, prev) > SIMILARITY_THRESHOLD;
    }

    // A stub returning 0.0 here would silently disable loop detection;
    // reuse the character-overlap similarity from SmartReflexionAgent (7.2)
    private double calculateSimilarity(String text1, String text2) {
        int commonChars = 0;
        int maxLen = Math.max(text1.length(), text2.length());
        if (maxLen == 0) return 1.0;
        for (char c : text1.toCharArray()) {
            if (text2.contains(String.valueOf(c))) commonChars++;
        }
        return (double) commonChars / maxLen;
    }
}
```

**Key points**:

- **Layered defenses**: retry cap + improvement-rate check + loop detection + reflection-quality check.
- **Consecutive-failure counting**: tolerates the occasional flat attempt, but stops after several in a row.
- **Combined judgment**: no single metric decides whether to continue; several conditions do.
### 7.5 Monitoring and logging

```java
@Service
@Slf4j
public class MonitoredReflexionAgent {

    public ReflexionResponse execute(String task) {
        long startTime = System.currentTimeMillis();
        List<AttemptRecord> attempts = new ArrayList<>();

        try {
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                long attemptStart = System.currentTimeMillis();

                // ... execute, evaluate, reflect (abridged) ...

                long attemptDuration = System.currentTimeMillis() - attemptStart;
                log.info("Attempt {} took {}ms, score: {}",
                        attempt, attemptDuration, evalResult.getScore());
            }
        } finally {
            long totalDuration = System.currentTimeMillis() - startTime;
            log.info("Task total time: {}ms, attempts: {}",
                    totalDuration, attempts.size());
            logStatistics(attempts);
        }
        return result;
    }

    private void logStatistics(List<AttemptRecord> attempts) {
        if (attempts.isEmpty()) return;

        int maxScore = attempts.stream()
                .mapToInt(a -> a.getEvaluationResult().getScore())
                .max()
                .orElse(0);
        double avgScore = attempts.stream()
                .mapToInt(a -> a.getEvaluationResult().getScore())
                .average()
                .orElse(0);

        // SLF4J placeholders don't support format specifiers such as {:.2f},
        // so format the average explicitly
        log.info("Statistics - best score: {}, average score: {}, attempts: {}",
                maxScore, String.format("%.2f", avgScore), attempts.size());
    }
}
```

**Key points**:

- Record the duration and score of every attempt.
- Makes performance and improvement trends easy to analyze.
- Helps tune the control parameters.
## Related Notes

- Agent Architecture Patterns in Depth
- Prompt Engineering in Depth
- LLM Learning Roadmap