Sony Internship: LLM/AI


ReAct Framework - Technologies and Components

1. RAG (Retrieval-Augmented Generation)

📌 Purpose:

  • Retrieve relevant skills from the LangChain Chatchat vector skill library.
  • Combine the retrieved skills with the LLM to generate the final task planning and execution strategies.
  • Improve the accuracy of information during robot interactions.

📌 Details:

  • The framework uses vector databases to store skills and assist the LLM in generating optimal strategies by querying the most relevant skills.
  • In complex task handling, the LLM refines its strategies using historical interaction data, in line with RAG’s core idea of combining retrieval with generation (see the sketch below).
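
A minimal sketch of this retrieval step, using FAISS directly. The embed() stub and the SKILLS entries are illustrative stand-ins for a real embedding model and the actual skill library:

```python
import numpy as np
import faiss

# Illustrative skill descriptions; the real library holds many more entries.
SKILLS = [
    "top-down grasp for small rigid objects",
    "side grasp for cylindrical objects such as cups",
    "two-stage pick-and-place with an intermediate pose",
]

def embed(texts):
    """Stand-in for a real sentence-embedding model; deterministic
    random vectors keep the sketch runnable without any model."""
    rng = np.random.default_rng(0)
    return rng.random((len(texts), 384), dtype=np.float32)

# Build the vector index over the skill library.
index = faiss.IndexFlatL2(384)
index.add(embed(SKILLS))

# Retrieve the skills most relevant to the current task...
query = "pick up the red cup on the table"
_, ids = index.search(embed([query]), 2)
retrieved = [SKILLS[i] for i in ids[0]]

# ...and splice them into the LLM prompt (the "augmented generation" step).
prompt = (
    "You control a robot arm. Relevant skills:\n- "
    + "\n- ".join(retrieved)
    + f"\nTask: {query}\nProduce a step-by-step plan."
)
print(prompt)
```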

2. ReAct (Reasoning + Acting)

📌 Purpose:

  • Allow the LLM not only to answer questions but also to perform reasoning, planning, and execution.
  • The robot can think about its current state, query external information, and then act accordingly.

📌 Details:

  • “ReAct” refers both to the framework’s name and an interaction mode for the LLM.
  • The LLM first performs Reasoning, then Acts.
  • Example:
    • The robot reasons about the target position (using 3D vision data).
    • It queries the LangChain vector skill library to find an appropriate grasping method.
    • It executes the grasp.

📌 ReAct Paper (Princeton University & Google Research):

  • ReAct combines Chain-of-Thought (CoT) reasoning with action-based execution, making it well suited to task planning and autonomous decision-making; a minimal loop sketch follows below.
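
A minimal sketch of one Reasoning + Acting loop. The LLM is scripted, and the tools (locate_object, grasp) are hypothetical stand-ins for the robot’s real skill interface:

```python
# Scripted stand-in for the real LLM: emits one Thought + Action per turn.
SCRIPT = iter([
    "Thought: I need the cup's position first.\nAction: locate_object[red cup]",
    "Thought: Position known; now pick it up.\nAction: grasp[red cup]",
])

def llm(history: str) -> str:
    return next(SCRIPT)

# Hypothetical tool interface standing in for the robot's skill library.
TOOLS = {
    "locate_object": lambda arg: f"{arg} at (0.42, 0.10, 0.05)",
    "grasp": lambda arg: f"{arg} grasped successfully",
}

def react_step(history: str) -> str:
    """One Reasoning + Acting iteration: think, act, observe."""
    reply = llm(history)
    tool, arg = reply.split("Action:")[-1].strip().split("[", 1)
    observation = TOOLS[tool.strip()](arg.rstrip("]"))
    return history + reply + f"\nObservation: {observation}\n"

history = "Task: pick up the red cup on the table.\n"
for _ in range(2):
    history = react_step(history)
print(history)
```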

3. 3D Vision + Grounded Segmentation

📌 Purpose:

  • The robot uses a depth camera and 3D reconstruction to help the LLM understand environmental information.
  • It combines Grounding DINO with SAM (Segment Anything Model) for text-grounded object segmentation.

📌 Details:

  • The vision pipeline provides the target object’s position, which the LLM then uses to plan a path.
  • Example:
    • The robot may ask, “Where is the red object on the table?”
    • The vision module segments the object and provides the data to the LLM.
    • The LLM combines the 3D coordinates with task planning and issues operation instructions (the back-projection geometry is sketched below).
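
The "segment, then localize" step can be sketched with the pinhole camera model: given a segmentation mask and an aligned depth image, the object’s 3D centroid follows from the camera intrinsics. The intrinsics below are made-up values; real ones come from calibration:

```python
import numpy as np

# Assumed pinhole intrinsics (fx, fy, cx, cy); real values come from calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

def mask_to_3d_centroid(mask: np.ndarray, depth_m: np.ndarray) -> np.ndarray:
    """Back-project masked depth pixels into camera-frame 3D points
    and return their centroid (the object's approximate position)."""
    v, u = np.nonzero(mask)          # pixel rows/cols covered by the object
    z = depth_m[v, u]                # metric depth at those pixels
    valid = z > 0                    # drop missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx            # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1).mean(axis=0)

# Toy example: a 10x10 object patch at ~0.8 m depth.
mask = np.zeros((480, 640), dtype=bool)
mask[200:210, 300:310] = True
depth = np.full((480, 640), 0.8)
print(mask_to_3d_centroid(mask, depth))  # -> [x, y, z] in metres
```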

4. LangChain + Vector Database

📌 Purpose:

  • LangChain serves as the Prompt Management and Memory Storage component for the LLM.
  • A vector database (FAISS / ChromaDB / Milvus) stores robot skills and user interaction history.

📌 Details:

  • LangChain enables the LLM to remember tasks previously performed by the robot, avoiding redundant calculations.
  • The robot automatically manages the skill library (a vector-store sketch follows this list), supporting:
    • Task Screening
    • Skill Summarization
    • Self-verification
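
A sketch of the skill library as a LangChain vector store. FakeEmbeddings keeps the snippet self-contained (so retrieval results are arbitrary); a real deployment would plug in an actual embedding model, and import paths may differ across LangChain versions:

```python
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import FAISS

# Skill entries with metadata the robot can use for screening/verification.
skills = [
    "side grasp for cylindrical objects",
    "top-down grasp for flat objects",
    "place an object gently on a surface",
]
store = FAISS.from_texts(
    skills,
    FakeEmbeddings(size=128),                      # random vectors: demo only
    metadatas=[{"verified": True}] * len(skills),  # self-verification flag
)

# Retrieve remembered skills instead of recomputing a plan from scratch.
for doc in store.similarity_search("how to pick up a mug", k=2):
    print(doc.page_content, doc.metadata)
```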

5. Task Decomposition

📌 Purpose:

  • The robot can break down complex tasks into smaller subtasks and complete them step by step.
  • Example:
      1. Retrieve object coordinates → 2. Calculate the grasp angle → 3. Perform the grasp → 4. Place the object.

📌 Details:

  • The LLM, working with LangChain, manages the task queue to ensure subtasks execute in a logical order.
  • When faced with a new task, the robot can retrieve similar tasks from the skill library and generate an appropriate strategy (see the sketch below).
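
A sketch of the decomposition step, with the LLM call stubbed to return a fixed JSON plan; in the real framework the plan would come from the model together with retrieved skills:

```python
import json

def llm_plan(task: str) -> str:
    """Stub: a real call would ask the LLM to emit subtasks as JSON."""
    return json.dumps([
        {"step": "retrieve_object_coordinates", "target": "red cup"},
        {"step": "calculate_grasp_angle"},
        {"step": "perform_grasp"},
        {"step": "place_object", "location": "tray"},
    ])

def execute(subtask: dict) -> bool:
    """Placeholder executor: dispatch to the robot's skill library here."""
    print("executing:", subtask)
    return True

plan = json.loads(llm_plan("move the red cup to the tray"))
for subtask in plan:          # run subtasks in order; stop on failure
    if not execute(subtask):
        break
```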

6. Self-Learning & Feedback Loop

📌 Purpose:

  • After task execution, the robot automatically summarizes its experience to improve future decision-making.
  • With LangChain Memory, the robot can remember past mistakes and optimize strategies.

📌 Details:

  • Task feedback (success or failure) → update the skill library → optimize LLM prompts.
  • Over long-term interaction, the robot performs increasingly well in recurring scenarios thanks to its accumulated memory (a sketch of this loop follows below).
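
A sketch of this loop as a per-skill success-rate record; the 0.5 threshold and the prompt-hint wording are illustrative assumptions, not part of the framework:

```python
from collections import defaultdict

# Per-skill outcome counters: [successes, attempts].
stats = defaultdict(lambda: [0, 0])

def record_feedback(skill: str, success: bool) -> None:
    """Update the skill library with the outcome of one execution."""
    stats[skill][1] += 1
    stats[skill][0] += int(success)

def prompt_hint(skill: str) -> str:
    """Turn accumulated feedback into a hint injected into future prompts."""
    wins, tries = stats[skill]
    if tries and wins / tries < 0.5:  # illustrative threshold
        return f"Note: '{skill}' has often failed; prefer an alternative."
    return ""

record_feedback("side grasp", success=False)
record_feedback("side grasp", success=False)
record_feedback("side grasp", success=True)
print(prompt_hint("side grasp"))
```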

Summary: Technologies in the ReAct Framework

| Technology | Role in ReAct Framework |
| --- | --- |
| RAG (Retrieval-Augmented Generation) | Retrieves relevant skills from the vector skill library to improve task-execution accuracy. |
| ReAct (Reasoning + Acting) | Enables the robot to reason and execute tasks, facilitating autonomous decision-making. |
| 3D Vision + Grounded Segmentation | Helps the LLM understand the robot’s environment and plan tasks accordingly. |
| LangChain + Vector Database | Stores task information and historical interactions, and manages skills. |
| Task Decomposition | Breaks complex tasks into smaller steps to improve execution efficiency. |
| Self-Learning (Feedback Loop) | Optimizes robot skills based on task performance and feedback, improving future success rates. |