Email Attacks: An Ensemble Algorithm Utilizing Machine Learning for Phishing Detection Towards Potential Attack Prevention
Abstract
Purpose – This study evaluates whether an ensemble of two machine learning algorithms detects, and can help prevent, email intrusion targeting corporate firms, government institutions, and individuals more effectively than the approach taken in other studies, which rely on a single best-performing machine learning algorithm for email detection and filtering.
Method – Random Forest and Support Vector Machine (SVM) were selected as the best-performing algorithms for the ensemble and were trained on a Kaggle dataset. SVM was embedded in a purpose-built web page for email spam detection, while Random Forest was implemented in a browser extension for detecting and predicting phishing links in emails.
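The two-component design described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the names `screen_email`, `EmailVerdict`, `stub_text_model`, and `stub_link_model` are hypothetical, the stub functions stand in for the trained SVM and Random Forest models, and the rule that an email is a threat when either detector fires is an assumption, since the abstract does not state how the two outputs are combined.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Simple URL matcher, used here only to pull candidate links out of an email body.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class EmailVerdict:
    is_spam: bool              # output of the text classifier (SVM in the study)
    phishing_links: List[str]  # links flagged by the link classifier (Random Forest in the study)

    @property
    def is_threat(self) -> bool:
        # Assumed combination rule: flag the email if either detector fires.
        return self.is_spam or bool(self.phishing_links)


def screen_email(body: str,
                 text_model: Callable[[str], bool],
                 link_model: Callable[[str], bool]) -> EmailVerdict:
    """Run the text classifier on the body and the link classifier on each URL."""
    links = URL_PATTERN.findall(body)
    flagged = [url for url in links if link_model(url)]
    return EmailVerdict(is_spam=text_model(body), phishing_links=flagged)


# Hypothetical stand-ins for the trained models, so the sketch runs on its own.
def stub_text_model(body: str) -> bool:
    return "verify your account" in body.lower()


def stub_link_model(url: str) -> bool:
    # Treats links that point at a raw IP address as suspicious.
    return re.match(r"https?://\d{1,3}(\.\d{1,3}){3}", url) is not None


email = "Please verify your account at http://192.168.0.1/login today."
verdict = screen_email(email, stub_text_model, stub_link_model)
print(verdict.is_spam, verdict.phishing_links, verdict.is_threat)
```

Passing the classifiers in as callables keeps the combination logic independent of how each model is trained or deployed, so a fitted model's prediction function could replace either stub without changing `screen_email`.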
Results – Both algorithms achieved high accuracy on the test data: 0.97 for SVM and 0.87 for Random Forest. Used together as an ensemble, Random Forest and SVM match, if not outperform, single-algorithm approaches in terms of accuracy, precision, recall, F1 score, true positive rate, and false positive rate.
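For reference, the evaluation metrics named above can all be derived from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `evaluate` and the example counts are illustrative assumptions and are not the study's data or results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # malicious emails correctly flagged
    fp: int  # legitimate emails wrongly flagged
    tn: int  # legitimate emails correctly passed
    fn: int  # malicious emails missed


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # recall and true positive rate are the same quantity
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "true_positive_rate": recall,
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


# Made-up counts for illustration only.
metrics = evaluate(ConfusionCounts(tp=90, fp=5, tn=95, fn=10))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the false positive rate alongside accuracy matters for email filtering because a filter can score well on accuracy while still blocking a noticeable share of legitimate messages.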
Conclusion – The findings suggest that ensembled machine learning algorithms can be effective in detecting spam and malicious links in emails. The high accuracy achieved by both models indicates that, used together, they can serve as reliable tools for email threat detection and security.
Recommendations – The model, or a similar system, should be embedded directly into email providers so that spam is detected automatically, without users having to copy and paste email content into a web page. Future work should also cover disabling malicious links and detecting malicious email attachments (payloads) to extend the capabilities of this study.
Theoretical Implications – Ensembles of carefully selected machine learning algorithms can improve detection accuracy by reducing false positives and false negatives in email classification. This, in turn, supports more trustworthy and worry-free email use for everyone.
This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.