Phishing Website Detection through Machine Learning Algorithms: A Comparative Analysis
Sr No:
Page No:
11-29
Language:
English
Authors:
Ochuko Piserchia*
Received:
2025-10-06
Accepted:
2025-11-29
Published Date:
2025-12-08
Abstract:
Phishing is the attempt to acquire sensitive information, often for malicious reasons,
by masquerading as a trustworthy entity in an electronic communication. Once victims access a
phishing website, the attacker attempts to convince them to submit private information such
as usernames, passwords, and credit card details, resulting in information theft.
Despite growing awareness of phishing and its prevention through traditional methods such
as DNS filtering, blacklisting, and user awareness training, phishing remains a growing
concern, costing millions of dollars each year. The only effective defense against these
threats is accurate detection of phishing attempts. Machine learning techniques, which are a
subset of artificial intelligence (AI), have shown significant success in detecting phishing
websites in comparison to traditional methods, although effectiveness can vary depending on
the approach deployed.
This research addressed this problem by analyzing a phishing website dataset with six
supervised algorithms. Feature selection was then investigated on the most promising of the
six algorithms, primarily using the filter method, and the outcome was compared with that of
the wrapper method. In addition to accuracy and the ROC (Receiver Operating Characteristic)
curve as performance metrics, we also considered MCC (Matthews Correlation Coefficient). The
experiment showed that Random Forest is the best-performing algorithm, with an MCC score of
0.989 (97% accuracy). We also found that 5 of the 30 features are enough for classification
with little or no reduction in performance.
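For readers unfamiliar with MCC, the following minimal sketch shows how it can be computed from the four cells of a binary confusion matrix, next to plain accuracy. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the study's dataset or results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Return the Matthews Correlation Coefficient for binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal total is zero,
    # since the coefficient is otherwise undefined.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Return the fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (phishing = positive class), for illustration only.
tp, tn, fp, fn = 90, 85, 15, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which is why it is often reported alongside accuracy when class sizes may differ.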
Keywords:
Phishing Detection, Machine Learning, Comparative Analysis, Random Forest, Feature Selection, Cybersecurity.