ORIGINAL
Introduction: Classifying ischemic stroke subtypes supports prognosis and treatment decisions but can be challenging in acute care.
Objective: To develop and evaluate machine learning (ML) models for automated classification of ischemic stroke subtypes according to the Oxfordshire Community Stroke Project (OCSP) scheme using clinical data.
Methods: Using 13,056 cases from the International Stroke Trial (IST), we trained Random Forest (RF), XGBoost, Logistic Regression, Support Vector Machine (SVM), and k-Nearest Neighbors (KNN) models. Performance was assessed by accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and AUC-ROC using stratified 10-fold cross-validation.
Results: Clinical variables were strongly associated with stroke subtypes (p < 0.001). RF and XGBoost achieved perfect performance (all metrics = 1.000 ± 0.000). Logistic Regression and SVM performed near-perfectly (accuracy ≈ 0.998, AUC-ROC = 1.000). KNN showed lower sensitivity (macro-average sensitivity = 0.898), especially for POCS.
Conclusion: ML models, particularly RF and XGBoost, enable highly accurate classification of ischemic stroke subtypes using routine clinical data.
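As an illustration of how the per-class metrics named above (sensitivity, specificity, PPV, NPV) and their macro average can be derived for a four-class OCSP problem, the following minimal sketch computes them one-vs-rest from a confusion matrix. It is not the authors' code: the function names are hypothetical, and the confusion matrix is invented for demonstration and does not reproduce any result from the study.

```python
from typing import Dict, List

# OCSP subtype labels; rows of the matrix are true classes, columns are predictions.
LABELS = ["TACS", "PACS", "LACS", "POCS"]

# Hypothetical confusion matrix (illustrative values only, NOT study data).
CONFUSION = [
    [48, 2, 0, 0],   # true TACS
    [1, 77, 1, 1],   # true PACS
    [0, 2, 48, 0],   # true LACS
    [0, 2, 0, 18],   # true POCS
]


def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def one_vs_rest_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute sensitivity, specificity, PPV and NPV for each class (one-vs-rest)."""
    total = sum(sum(row) for row in cm)
    results = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                   # true class i, predicted as another class
        fp = sum(row[i] for row in cm) - tp    # another class, predicted as class i
        tn = total - tp - fn - fp
        results[label] = {
            "sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp),
            "npv": _ratio(tn, tn + fn),
        }
    return results


def macro_average(per_class: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across classes."""
    names = next(iter(per_class.values())).keys()
    return {m: sum(c[m] for c in per_class.values()) / len(per_class) for m in names}


per_class = one_vs_rest_metrics(CONFUSION, LABELS)
macro = macro_average(per_class)
accuracy = sum(CONFUSION[i][i] for i in range(len(LABELS))) / sum(map(sum, CONFUSION))

for label in LABELS:
    print(label, {k: round(v, 3) for k, v in per_class[label].items()})
print("macro", {k: round(v, 3) for k, v in macro.items()})
print("accuracy", round(accuracy, 3))
```

Because the macro average weights every subtype equally, a drop in sensitivity for a less frequent class such as POCS lowers the macro-average sensitivity more than it lowers overall accuracy.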
¹ Faculty of Medicine, Universidade Federal do Triângulo Mineiro, Uberaba, MG, Brazil.
² Center for Mathematics, Computing and Cognition – CMCC, Universidade Federal do ABC, Santo André, SP, Brazil.
³ Discipline of Neurosurgery, Hospital das Clinicas, Universidade Federal do Triângulo Mineiro, Uberaba, MG, Brazil.
⁴ Neurosurgery Division, Universidade Federal de Sergipe – UFS, Aracaju, SE, Brazil.
⁵ Neurosurgery Division, Universidade Federal do Triângulo Mineiro, Uberaba, MG, Brazil.
Received May 13, 2025
Accepted June 4, 2025