Email Attacks: An Ensemble Algorithm Utilizing Machine Learning for Phishing Detection Towards Potential Attack Prevention

Erwin E. Guerra

College of Computer Science, University of Makati, Philippines. ORCID: http://orcid.org/0000-0003-3286-6661

Abstract

Purpose – This study validates the effectiveness of an ensemble of two machine learning algorithms in detecting, and potentially preventing, email intrusion targeting corporate firms, government institutions, and individuals, in contrast to other studies that rely on a single best-performing machine learning algorithm for email detection and filtering.

Method – Random Forest and Support Vector Machine (SVM) were selected as the best algorithms for the ensemble and were trained on a dataset from Kaggle. SVM was embedded in a purpose-built web page for email spam detection, while Random Forest was implemented in a browser extension for detecting and predicting phishing links in emails.
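The two-part arrangement described in the Method can be pictured with a minimal Python sketch. It is not the study's implementation: the trained SVM and Random Forest models are replaced by hypothetical placeholder functions (`toy_text_model`, `toy_link_model`), and the helper names `extract_links` and `screen_email`, the URL pattern, and the rule of flagging an email when either check fires are all assumptions made for illustration. The abstract does not state how the two outputs are combined.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-ins for the trained models (assumptions, not the study's code):
# a text model returns True if an email body looks like spam,
# a link model returns True if a URL looks like a phishing link.
TextModel = Callable[[str], bool]
LinkModel = Callable[[str], bool]

# Simple illustrative pattern for http/https links in plain text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(body: str) -> List[str]:
    """Return every http/https URL found in the email body."""
    return URL_PATTERN.findall(body)


def screen_email(body: str, text_model: TextModel, link_model: LinkModel) -> Dict[str, object]:
    """Run both checks and report each result plus a combined verdict.

    The combined verdict uses an assumed OR rule: the email is suspicious
    if the text check or any link check fires.
    """
    is_spam = text_model(body)
    flagged = [url for url in extract_links(body) if link_model(url)]
    return {
        "spam": is_spam,
        "phishing_links": flagged,
        "suspicious": is_spam or bool(flagged),
    }


# Toy placeholder rules standing in for the SVM and Random Forest models.
def toy_text_model(body: str) -> bool:
    return "verify your account" in body.lower()


def toy_link_model(url: str) -> bool:
    return "@" in url or url.count(".") > 3


report = screen_email(
    "Please verify your account at http://login.example.com.secure.update.test/reset now",
    toy_text_model,
    toy_link_model,
)
print(report)
```

In a real deployment the two placeholder functions would wrap the fitted classifiers' prediction calls. Keeping them as plain callables leaves the screening logic independent of how each model was trained.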

Results – Both algorithms achieved high accuracy in testing: SVM reached 0.97 (97%) and Random Forest reached 0.87 (87%). Used together as an ensemble, Random Forest and SVM improve on, if not outclass, single-algorithm approaches in terms of accuracy, precision, recall, F1 score, true positive rate, and false positive rate.
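For readers less familiar with the metrics named in the Results, the short sketch below computes them from confusion-matrix counts using their standard definitions. The counts are invented for illustration and are not the study's data. The names `Confusion` and `classification_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier (phishing = positive)."""
    tp: int  # phishing emails correctly flagged
    fp: int  # legitimate emails wrongly flagged
    tn: int  # legitimate emails correctly passed
    fn: int  # phishing emails missed


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c: Confusion) -> Dict[str, float]:
    """Standard definitions of the six metrics listed in the Results."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # recall is the true positive rate
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "true_positive_rate": recall,
        "false_positive_rate": safe_div(c.fp, c.fp + c.tn),
    }


# Made-up counts for illustration only (not results from the study).
example = Confusion(tp=45, fp=5, tn=45, fn=5)
scores = classification_metrics(example)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Accuracy alone can look high on an imbalanced dataset, so the false positive rate and recall are reported alongside it.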

Conclusion – The findings suggest that ensemble machine learning algorithms can be effective in detecting spam and malicious links in emails. The high accuracy achieved by both models indicates that they can serve together as reliable tools for email threat detection and security.

Recommendations – It is highly recommended that the model, or a similar system, be embedded directly into email providers so that spam is detected automatically, without users having to copy and paste email content into a web page. Future work should also cover disabling malicious links and detecting malicious email attachments (payloads) to extend the capabilities of this study.

Theoretical Implications – Ensemble machine learning algorithms, if carefully selected, are expected to improve detection accuracy and reduce false positives and false negatives in email filtering. This can lead to greater trust in email and worry-free usage for everyone.

Author Biography

Erwin E. Guerra, College of Computer Science, University of Makati, Philippines

Erwin E. Guerra holds a Bachelor of Science in Computer Engineering from the Technological Institute of the Philippines (Manila Campus) and a Master in Information Technology from the Polytechnic University of the Philippines (Sta. Mesa Campus). He is currently completing his Doctor of Information Technology degree (dissertation phase) at the Technological Institute of the Philippines (Quezon City Campus). His fields of interest include Robotics, Cloud Computing, Internet Security, Cybersecurity, Data Mining, Machine Learning, and Deep Learning.
