Welcome to the MBSCLoc home page.
Subcellular localization of mRNA is a common and crucial mechanism that provides precise and effective control over protein translation, and it plays a significant role in many cellular events. Studying where mRNA localizes has greatly advanced research on mRNA function. However, current methods for predicting subcellular localization are limited: they suffer from imbalanced data, low model performance, and poor generalization ability. Few solutions are available for multi-label subcellular localization in particular.
In this study, the MBSCLoc model is proposed as an efficient computational tool for mRNA multi-label subcellular localization. MBSCLoc can predict the location of an mRNA across multiple cellular compartments simultaneously, addressing three limitations of existing models: single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc first uses the UTR-LM pre-trained model to extract comprehensive sequence information. It then combines multi-class contrastive representation learning with the Clustering Balanced Subspace Partitioning (CBSP) algorithm to construct balanced subspaces; optimizing the sample distribution in this way addresses the problem of extremely imbalanced data. Finally, multiple XGBoost classifiers are trained, one on each subset, and their predictions are integrated by voting to improve the model’s generalization ability and accuracy.
The results of five-fold cross-validation and independent testing demonstrate that MBSCLoc, through the synergistic effect of its modules, is significantly superior to other methods. In addition, MBSCLoc has superior interpretability at the pixel level, providing strong support for mRNA multi-label subcellular localization research. More importantly, the significant role of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis techniques and the Tree-SHAP algorithm: all mRNA sequences show significant importance in the 5' UTR and 3' UTR regions, particularly in the 3' UTR, where about 80% of specific sites reach their peak.
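The idea of training one classifier per balanced subset and combining them by voting can be illustrated with a minimal sketch. This is not the MBSCLoc implementation. It makes several assumptions for illustration: a random split of the majority class stands in for CBSP, a nearest-centroid rule stands in for XGBoost, two-dimensional toy vectors stand in for UTR-LM features, and only one binary label (one compartment) is handled at a time. All function names are hypothetical.

```python
import random


def balanced_subsets(samples, labels, seed=0):
    """Pair every minority sample with one chunk of the majority class.

    Assumption: a random split replaces the clustering-based CBSP step.
    """
    rng = random.Random(seed)
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    if len(pos) <= len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    majority = majority[:]          # copy so the caller's list is not shuffled
    rng.shuffle(majority)
    n_subsets = max(1, len(majority) // len(minority))
    subsets = []
    for i in range(n_subsets):
        chunk = majority[i::n_subsets]   # every n-th majority sample
        xs = minority + chunk
        ys = [minority_label] * len(minority) + [1 - minority_label] * len(chunk)
        subsets.append((xs, ys))
    return subsets


def fit_centroids(xs, ys):
    """Toy stand-in for a classifier: store the mean vector of each class."""
    dims = len(xs[0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return centroids


def predict_one(centroids, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if sq_dist(centroids[1]) < sq_dist(centroids[0]) else 0


def ensemble_predict(models, x):
    """Majority vote over all subset models; ties resolve to 0."""
    votes = [predict_one(m, x) for m in models]
    return 1 if sum(votes) * 2 > len(votes) else 0


# Toy data: 3 positives near (1, 1) and 9 negatives near (0, 0).
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
negatives = [[0.1 * (i % 3), 0.1 * (i // 3)] for i in range(9)]
samples = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

subsets = balanced_subsets(samples, labels)
models = [fit_centroids(xs, ys) for xs, ys in subsets]
print(len(subsets), ensemble_predict(models, [0.9, 1.0]), ensemble_predict(models, [0.1, 0.0]))
```

In a multi-label setting, one such ensemble could be built per compartment, which is again an assumption of this sketch rather than a description of how MBSCLoc combines its labels.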