Random forest assessment of correlation between environmental factors and genetic differentiation of populations: Case of marine mussels Mytilus

New publication by Tomasz Kijewski, Malgorzata Zbawicka, Jakob Strand, Hans Kautsky, Jonne Kotta, Merli Rätsep, Roman Wenne.

2019.01.28 | Peter Schmidt Mikkelsen


The novel machine learning technique Random Forest (RF) was used to test whether the genetic differentiation of populations of marine species may be related to any of the key environmental variables known to shape species distributions. The study was performed in the North Sea and the Baltic Sea, which are characterized by strong gradients of environmental factors and almost continuous distributions of Mytilus mussel populations. Species identity was assessed using four nuclear DNA markers and previously published single nucleotide polymorphism (SNP) data. A general pattern of clinal variation was observed, with the Mytilus trossulus share increasing towards the eastern Baltic Sea. The average allele share rose to 61% in Höga Kusten, Gulf of Bothnia. All Baltic Sea samples revealed a strong introgression of Mytilus edulis and a limited introgression of M. trossulus through the Danish Straits.

The studied environmental variables explained 67% and 68% of the variability in the allele frequencies of M. edulis and M. trossulus, respectively. Salinity accounted for over 50% of the variability in the gene frequencies of the studied Mytilus spp. populations. Changes along this environmental gradient were not gradual; instead, a significant shift in gene dominance was found at a salinity of 12 PSU. Water temperature and the trophic status of the sea area had only a moderate association with the gene frequencies. The results showed that the novel machine learning technique can be used successfully to find correlations between the genetic differentiation of populations and environmental variables, and to define the functional form of these linkages.
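To illustrate how a tree-based ensemble can rank environmental variables and locate a threshold rather than a gradual trend, here is a minimal sketch in Python. It is not the authors' analysis: it uses a simplified stand-in for Random Forest (a bagged ensemble of one-split regression trees, each trained on a bootstrap sample and one randomly chosen variable), and the data are synthetic. The names `best_split` and `stump_forest`, the sample size, and the simulated step at 12 PSU are all assumptions made for the example.

```python
import random


def best_split(xs, ys):
    """Return (threshold, gain) for the split of ys along xs that most
    reduces the sum of squared errors, as a regression tree node does."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    sse_parent = total_sq - total * total / n
    best_thr, best_gain = None, 0.0
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        x, y = pairs[i]
        left_sum += y
        left_sq += y * y
        if pairs[i + 1][0] == x:
            continue  # cannot split between identical values
        n_left, n_right = i + 1, n - i - 1
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        gain = sse_parent - sse
        if gain > best_gain:
            best_thr, best_gain = (x + pairs[i + 1][0]) / 2, gain
    return best_thr, best_gain


def stump_forest(features, y, n_trees=200, seed=1):
    """Bagged one-split trees: each tree sees a bootstrap sample and one
    randomly chosen variable. Returns relative importances (share of the
    total error reduction) and the thresholds chosen per variable."""
    rng = random.Random(seed)
    names = list(features)
    n = len(y)
    gains = {name: 0.0 for name in names}
    thresholds = {name: [] for name in names}
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        name = rng.choice(names)                    # random variable for this tree
        thr, gain = best_split([features[name][i] for i in idx], [y[i] for i in idx])
        if thr is not None:
            gains[name] += gain
            thresholds[name].append(thr)
    total_gain = sum(gains.values())
    importance = {name: g / total_gain for name, g in gains.items()}
    return importance, thresholds


# Synthetic data (NOT from the study): allele share drops abruptly at 12 PSU
# and is unrelated to temperature.
data_rng = random.Random(0)
salinity = [data_rng.uniform(3, 30) for _ in range(300)]
temperature = [data_rng.uniform(4, 16) for _ in range(300)]
share = [(0.8 if s < 12 else 0.1) + data_rng.gauss(0, 0.05) for s in salinity]

importance, thresholds = stump_forest(
    {"salinity": salinity, "temperature": temperature}, share
)
salinity_threshold = sum(thresholds["salinity"]) / len(thresholds["salinity"])
print(importance)
print(salinity_threshold)
```

On data simulated this way, nearly all of the error reduction is attributed to salinity, and the averaged split point falls close to the simulated step. A full Random Forest grows deeper trees and usually reports the functional form through partial dependence plots, but the principle of ranking variables by how much they reduce prediction error is the same.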


Arctic Research Centre