Grammar and plants? New publication on plant communities patterns
A major advance in ecological modelling and conservation was published in Nature Plants, supported by OBSGESSION and two other Horizon Europe projects – MAMBO and GUARDEN.
This new study, “Learning the syntax of plant assemblages,” introduces an innovative way to study plant communities using methods borrowed from natural-language processing.
At its core, the research treats plant communities like sentences - where species are “words,” and their order (by abundance) and occurrence encode hidden relationships. The authors developed a deep-learning model called Pl@ntBERT, which learns the “syntax” of these plant-species sequences from millions of records across Europe and neighbouring regions.
Pl@ntBERT can predict species that likely belong to a community even if they were not recorded, outperforming standard co-occurrence methods by nearly 16.5%, and rivaling more conventional neural networks by relatively 6.5%.
The model assigns habitat types to plant assemblages more accurately than expert-based systems and even outperforms typical tabular deep-learning models. With a “vocabulary” of over 10,000 plant species and training on 1,4 million vegetation plots containing 29 million species occurrences, the model offers a comprehensive, data-driven foundation for large-scale biodiversity mapping.
This approach could reshape how ecologists and conservationists map biodiversity, track ecosystem changes, restore degraded habitats, or monitor invasive species.
The study is part of a growing trend of applying artificial intelligence - particularly large language models (LLMs) - beyond human language, into ecological data. By treating species as “tokens” in ecological “sentences,” researchers gain a more nuanced, relational understanding of ecosystems. The authors argue this could redefine how we model, monitor, and manage biodiversity in the face of rapid environmental change.
Read the full artcile here.