Resources
OceanGPT follows an open-source, open-access approach: it releases its instruction datasets and models publicly to support research on large models for the ocean domain.
Models
OceanGPT-o-7B
OceanGPT-o-7B-v0.1 was trained on bilingual marine-domain corpora, starting from Qwen2.5-VL-7B-Instruct.
OceanGPT-coder-7B
OceanGPT-coder-7B-v0.1 was trained on a self-constructed bilingual marine-domain code corpus, starting from Qwen2.5-Coder-7B-Instruct.
OceanGPT-basic-14B-v0.1
OceanGPT-basic-14B-v0.1 was trained on marine-domain corpora, starting from Qwen1.5-14B. Note: This is an early version, and the latest models outperform it.
OceanGPT-basic-7B-v0.2
OceanGPT-basic-7B-v0.2 was trained on marine-domain corpora, starting from Qwen2. Note: This is an early version, and the latest models outperform it.
OceanGPT-basic-2B-v0.1
OceanGPT-basic-2B-v0.1 was trained on marine-domain corpora, starting from MiniCPM-2B. Note: This is an early version, and the latest models outperform it.
Instruction Data
OceanInstruct-v0.2
Approximately 50K bilingual text instruction samples in marine science, constructed from publicly available corpora.
OceanInstruct-v0.1
Approximately 10K bilingual text instruction samples in the ocean domain, constructed from publicly available corpora. Note: This is only part of the instruction data used to train the early models.
Limitations
1. The model may hallucinate, so please verify its outputs carefully.
2. Due to limited computing resources, OceanGPT-o currently supports natural language interpretation and generation only for certain types of sonar and ocean science images, and OceanGPT-coder currently supports only MOOS code generation.
3. We have not yet tuned the model's self-identification, so when asked about its identity it may describe itself as a Qwen, MiniCPM, LLaMA, or GPT series model.
4. The model's output is sensitive to prompt wording, and repeated generations from the same prompt may differ.
5. Some of the instruction data was synthesized by a large model and may contain errors.
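
The run-to-run variation mentioned in limitation 4 typically comes from sampling-based decoding. The toy sketch below is not OceanGPT code: the candidate tokens, their scores, and the helper names (`softmax`, `greedy_pick`, `sample_pick`) are all invented for illustration. It only shows, under those assumptions, why picking the highest-scoring token always gives the same result while sampling from the score distribution does not.

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(tokens, logits):
    """Always return the highest-scoring token (deterministic)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]


def sample_pick(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and scores, invented for this sketch.
TOKENS = ["tide", "current", "wave", "swell"]
LOGITS = [2.0, 1.8, 1.5, 0.5]

# Greedy decoding: every run selects the same token.
greedy_runs = {greedy_pick(TOKENS, LOGITS) for _ in range(20)}

# Sampling: repeated runs select different tokens.
rng = random.Random(0)  # a fixed seed makes the sequence of draws repeatable
sampled_runs = {sample_pick(TOKENS, LOGITS, 1.0, rng) for _ in range(200)}

print("greedy:", sorted(greedy_runs))
print("sampled:", sorted(sampled_runs))
```

In practice, users who need repeatable answers usually either disable sampling or fix the random seed in their inference framework. Whether and how that applies to a given OceanGPT deployment depends on the serving setup, which this document does not describe.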