Sony Internship-LLM/AI
- I helped build, and was responsible for implementing, the feeding task instructions for a robot driven by Large Language Models (LLMs).
- I handled the prompt engineering, designing a prompting scheme based on the ReAct framework. This lets the LLM call external tools for additional information and interleave reasoning traces with task-specific actions. Decomposing language instructions into smaller steps was intended to make the responses more reliable and practical.
- To achieve this, I used Grounded-Segment-Anything to localize specific objects in two-dimensional images and reconstructed their three-dimensional coordinates. I then encapsulated these functions in a skill library.
- The system we designed learns new skills from the LLM’s past derivations and feedback and automatically adds them to its skill vector library. During a task, it retrieves the required skills from the vector database by relevance. A self-verification and self-correction module detects grammar and logic errors and feeds them back to the LLM so that it can revise its solution. The system also uses environmental feedback to determine whether a task was completed successfully.