Covering approximately 71% of the Earth’s surface, the ocean plays a crucial role in global climate regulation, weather patterns, biodiversity, and human economic development. Ocean science studies the natural characteristics of the ocean, how it changes, and the theories, methods, and applications related to developing and using ocean resources. To support this research, we propose OceanGPT, a large language model designed specifically for the ocean domain. It can handle various ocean science tasks, including Q&A and content generation. We also explore the model's potential to simulate underwater robot operations, as a step toward model-driven underwater embodied intelligence.
Evolutionary Data Synthesis Agent: This agent employs two collaborative strategies. First, it supplements and expands the background knowledge of seed samples; second, it refines the analysis to deepen the knowledge contained in the seed data.
Fine-tuned Literature Reading Agent: A large language model is first fine-tuned into a model specialized for literature extraction, which lets the agent extract high-quality sentences from the large body of ocean literature.
Quality Assurance Audit Agent: This agent is built through prompting with predefined syntactic and semantic rules specific to ocean science. It filters the generated data to ensure its quality.
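The way the three agents fit together can be sketched as a small pipeline. This is only an illustrative sketch, not the project's actual implementation: every name here (`Sample`, `evolve_seeds`, `read_literature`, `audit`, `build_dataset`, the stub functions, and the example rules and keyword list) is hypothetical, and the language-model calls are replaced by plain Python callables so the flow can be run without any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    text: str
    source: str  # "evolved" or "literature"


def evolve_seeds(seeds: List[str], llm: Callable[[str], str]) -> List[Sample]:
    """Evolutionary Data Synthesis Agent (hypothetical sketch).

    Applies the two strategies to each seed: expanding its background
    knowledge and refining its analysis.
    """
    out = []
    for seed in seeds:
        out.append(Sample(llm(f"Expand the background knowledge of: {seed}"), "evolved"))
        out.append(Sample(llm(f"Refine and deepen the analysis in: {seed}"), "evolved"))
    return out


def read_literature(docs: List[str], extractor: Callable[[str], List[str]]) -> List[Sample]:
    """Literature Reading Agent (hypothetical sketch).

    `extractor` stands in for the fine-tuned extraction model.
    """
    return [Sample(sentence, "literature") for doc in docs for sentence in extractor(doc)]


def audit(samples: List[Sample], rules: List[Callable[[str], bool]]) -> List[Sample]:
    """Quality Assurance Audit Agent (hypothetical sketch).

    Keeps only the samples that satisfy every rule.
    """
    return [s for s in samples if all(rule(s.text) for rule in rules)]


def build_dataset(seeds, docs, llm, extractor, rules) -> List[Sample]:
    """Run both generating agents, then filter their output with the audit agent."""
    candidates = evolve_seeds(seeds, llm) + read_literature(docs, extractor)
    return audit(candidates, rules)


# Assumed example rules; the real syntactic and semantic rules are not given here.
OCEAN_TERMS = {"ocean", "sea", "marine", "salinity"}


def has_min_length(text: str) -> bool:
    return len(text.split()) >= 5


def mentions_ocean_term(text: str) -> bool:
    return any(term in text.lower() for term in OCEAN_TERMS)


# Stubs standing in for real model calls, so the sketch runs offline.
def stub_llm(prompt: str) -> str:
    return prompt


def stub_extractor(doc: str) -> List[str]:
    return [s.strip() for s in doc.split(". ") if s.strip()]
```

In practice, `stub_llm` and `stub_extractor` would be swapped for calls to actual models, and the rule list would encode the ocean-science checks used by the audit agent.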
We trained OceanGPT on top of open-source models (such as Qwen, LLaMA, and MiniCPM) using instructions generated by the DoInstruct framework.