Projects

Sony Internship-LLM/AI

I took part in, and was responsible for implementing, instruction handling for a robot feeding task based on Large Language Models (LLMs).
My responsibilities included prompt engineering, in particular designing a ReAct-style framework. This framework lets the LLM interact with external tools to obtain additional information, and generate reasoning traces and task-specific actions in an interleaved manner. By decomposing language instructions into smaller steps, I aimed to make the responses more reliable and practical.
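The interleaving of reasoning and tool calls described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's code: the `locate` tool, the `Action: name[arg]` text format, and the scripted stand-in for the LLM are all hypothetical.

```python
import re
from typing import Callable, Dict


def locate_object(name: str) -> str:
    """Hypothetical perception tool; returns a made-up position string."""
    return f"{name} at (0.40, 0.10, 0.05)"


# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {"locate": locate_object}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(llm: Callable[[str], str], task: str, max_steps: int = 5) -> str:
    """Alternate between LLM reasoning and tool observations until an answer appears."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        tool_name, arg = action.group(1), action.group(2)
        tool = TOOLS.get(tool_name)
        observation = tool(arg) if tool else f"unknown tool {tool_name}"
        # The observation is appended so the next LLM call can reason over it.
        transcript += f"Observation: {observation}\n"
    return "no answer within step limit"


def scripted_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration only."""
    if "Observation:" not in prompt:
        return "Thought: I need the spoon position.\nAction: locate[spoon]"
    return "Thought: I know where it is.\nFinal Answer: move gripper to spoon at (0.40, 0.10, 0.05)"
```

Calling `react_loop(scripted_llm, "pick up the spoon")` runs one tool call and then returns the final answer. In a real system the scripted function would be replaced by an actual model call.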
To achieve this, I used Grounded-Segment-Anything to localize specific objects in two-dimensional images and reconstructed their three-dimensional coordinates. I then encapsulated these functions into a skill library.
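How the 3D coordinates were recovered is not detailed here. One common approach, assumed for this sketch, is to back-project a detected pixel through a pinhole camera model using a depth reading and known camera intrinsics. The function name and the parameters `fx`, `fy`, `cx`, `cy` are illustrative.

```python
from typing import Tuple


def pixel_to_camera(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float
                    ) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with a depth value into camera coordinates.

    Assumes a pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. Depth is measured along the optical axis.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

For example, the centre of a segmentation mask could be passed as `(u, v)` together with the depth sampled at that pixel. A pixel at the principal point maps to a point on the optical axis.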
Our system can learn new skills from the LLM's historical derivations and feedback, automatically adding them to its skill vector library. During task execution, it selects the required skills from the vector database based on relevance. It also incorporates a self-verification and self-correction module that detects grammar and logic errors and feeds them back to the LLM so that it can revise its solution. Additionally, it uses environmental feedback to determine whether a task has been completed successfully.
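Relevance-based skill selection can be illustrated with a toy in-memory vector store that ranks skills by cosine similarity. This is a sketch only: the `SkillLibrary` class and the hand-written three-dimensional vectors are hypothetical, and a real system would obtain embeddings from an embedding model and store them in a vector database.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SkillLibrary:
    """Toy skill store: skill name -> embedding vector."""

    def __init__(self) -> None:
        self._skills: Dict[str, List[float]] = {}

    def add(self, name: str, embedding: Sequence[float]) -> None:
        """Register a new skill, or overwrite an existing one with the same name."""
        self._skills[name] = list(embedding)

    def top_k(self, query: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
        """Return the k skills most similar to the query embedding."""
        scored = [(name, cosine(query, emb)) for name, emb in self._skills.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]
```

New skills learned during operation would be registered with `add`, and each incoming task would be embedded and passed to `top_k` to pick the candidates handed to the LLM.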

A cloud storage system with a client-server architecture using the Qt framework and C++.

We developed a cloud storage system using the Qt framework and C++. The system uses a client-server architecture, with SQLite3 for user data storage, and provides the basic functions common to cloud storage services. Users can register and log in securely, manage contacts, and engage in private and group chats. They can also upload, download, rename, delete, and share files, and create, delete, rename, and navigate folders.
Files uploaded to the server are stored encrypted, so they cannot be read directly on the server. The implementation uses the AES encryption and decryption functions provided by OpenSSL to protect the files.
For thread management, we chose to create a dedicated thread per task instead of using a thread pool, because this offers greater flexibility. Each thread can be created and controlled independently, which suits tasks with varying execution times or resource needs. For simple, occasional tasks, creating threads on demand avoids the complexity and resource overhead of managing a thread pool. Threads are also destroyed as soon as their task finishes, which avoids the idle threads that can accumulate in a pool.
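The thread-per-task pattern looks roughly like the following. The sketch is in Python for brevity rather than the project's C++/Qt code, and `handle_request` and `serve` are hypothetical names.

```python
import threading
import time
from typing import List


def handle_request(request_id: int, duration: float,
                   results: List[int], lock: threading.Lock) -> None:
    """Simulate a task of arbitrary length, then record that it completed."""
    time.sleep(duration)
    with lock:
        results.append(request_id)


def serve(durations: List[float]) -> List[int]:
    """Start one thread per task; each thread ends as soon as its task is done."""
    results: List[int] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=handle_request, args=(i, d, results, lock))
        for i, d in enumerate(durations)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # no worker stays alive waiting for further work
    return sorted(results)
```

The trade-off is that thread creation cost is paid on every task, which is acceptable when tasks are occasional but becomes significant under a high request rate.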
Additionally, to manage system resources, we will implement a mechanism that cleans up inactive connections using a list, rather than the max-heap mentioned in the proposal. We also use the MD5 algorithm for the file integrity check, rather than the SHA-256 mentioned in the proposal.
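Both mechanisms are simple to sketch. The example below is in Python rather than the project's C++, and the `Connection` class, the timeout value, and the function names are assumptions for illustration. MD5 is adequate for detecting accidental corruption in transit, although it is not collision-resistant against deliberate tampering.

```python
import hashlib
import time
from typing import List, Optional


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


def is_intact(data: bytes, expected_md5: str) -> bool:
    """Compare the digest of received data with the digest sent alongside it."""
    return md5_hex(data) == expected_md5


class Connection:
    """Hypothetical record of a client connection and its last activity time."""

    def __init__(self, client_id: str, last_active: float) -> None:
        self.client_id = client_id
        self.last_active = last_active


def drop_inactive(connections: List[Connection], timeout: float,
                  now: Optional[float] = None) -> List[Connection]:
    """Scan the list linearly and keep only connections active within the timeout."""
    current = time.monotonic() if now is None else now
    return [c for c in connections if current - c.last_active <= timeout]
```

A linear scan costs O(n) per sweep, whereas a heap keyed on last activity would find the stalest connection faster. For a small number of clients the list is simpler and the difference is negligible.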

R&D Project: Led a data migration and analysis project, utilizing JavaScript to implement AI-driven sentiment analysis and risk warning systems. Spearheaded the completion of the quality inspection automation process, with key metrics regularly pushed to stakeholders, enhancing the stability and efficiency of business systems.

A project for extracting handwriting style from images

With our OCR-based pipeline, we successfully collected tens of thousands of word-level annotated images, enriching the existing handwriting dataset.
With the LSTM-based style encoder, the modified HiGAN+ model successfully generated realistic handwriting images with desired calligraphic styles.

I was responsible for developing a web-based, interactive, and interpretable visual interface for AI audio generation. It provides a user-friendly way to review and inspect speech waveforms and spectrograms, and shows the clip probability at each point in the audio.
In this project, I gained extensive experience in web interface development and became proficient in the Flask framework. By integrating front-end and back-end technologies, I successfully created a powerful and user-friendly interface.
I also studied audio processing and applied that knowledge to the interface, ensuring that the audio data is displayed accurately and visualized clearly.
(Click the title to get more info.)

I participated in the GPT training project for the Shenzhen Talent Bureau, where my responsibilities included preprocessing and cleaning the training data to ensure its quality and reliability.
Using Python web scraping techniques, I collected relevant textual information and built a complete and accurate training dataset, laying a solid foundation for model training and evaluation.
I also verified and evaluated the test results generated by the Shenzhen Talent GPT. Through manual review and correction, I identified and annotated errors and inaccuracies in the generated text, providing improvement suggestions to optimize the model results.
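The preprocessing and cleaning step described above typically involves stripping markup, normalizing whitespace, and removing duplicates and fragments. The following is a minimal sketch of that idea, not the project's pipeline: the function names and the `min_chars` threshold are hypothetical.

```python
import html
import re
from typing import Iterable, List

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return SPACE_RE.sub(" ", text).strip()


def build_dataset(raw_docs: Iterable[str], min_chars: int = 10) -> List[str]:
    """Clean each document, then drop short fragments and exact duplicates."""
    seen = set()
    dataset: List[str] = []
    for raw in raw_docs:
        text = clean_text(raw)
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        dataset.append(text)
    return dataset
```

Tags are removed before entities are decoded so that escaped angle brackets in the original text are not mistaken for markup.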

Research Question:

Verify the impact of corporate ESG scores on stock return volatility in China’s A-share market.
Introduce ESG scores to extend the Markowitz Mean-Variance Model and study the impact of ESG on mean-variance investors’ preferences.
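One common way to bring ESG into the mean-variance framework, shown here as an assumption rather than as the exact model used in this project, is to add a linear ESG preference term to the investor's objective. Here $w$ is the vector of portfolio weights, $\mu$ the expected returns, $\Sigma$ the return covariance matrix, $s$ the vector of ESG scores, $\gamma > 0$ the risk-aversion coefficient, and $\lambda \ge 0$ the strength of the ESG preference; all of these symbols are introduced for illustration.

```latex
\max_{w} \; w^{\top}\mu \;-\; \frac{\gamma}{2}\, w^{\top}\Sigma\, w \;+\; \lambda\, w^{\top} s
\quad \text{subject to} \quad \mathbf{1}^{\top} w = 1
```

Setting $\lambda = 0$ recovers the standard Markowitz problem, so the effect of ESG on the investor's preferred portfolio can be studied by varying $\lambda$.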

I actively participated in the algorithm development for the COSCO Shipping intelligent scheduling project and closely tracked the business requirements. Using a path generation and optimization approach, I designed an interactive algorithm that provided efficient and accurate scheduling results for the pulp ship transportation business. I also wrote detailed documentation explaining the algorithm and its implementation. (Click the title to get more info.)
This experience enhanced my ability to apply theoretical models to complex practical problems and improved my project communication and collaboration skills.