Phishing Website Detection Using Machine Learning: Model Development and Django Integration

Seun Mayowa Sunday

Abstract


The increasing number of phishing attacks is a major concern for security researchers today. Traditional solutions for detecting phishing websites rely on signature-based methods, which cannot detect newly created phishing websites. Researchers are therefore developing machine-learning-based systems that, when trained on large and diverse datasets, can detect and classify phishing websites with high accuracy.

After several steps to prepare the dataset for model development, the prepared dataset is used to train logistic regression (LR), k-nearest neighbor (KNN), and artificial neural network (ANN) models. The research concludes by integrating the best-performing model, as measured by the documented evaluation metrics, into a Django application. Integration of machine-learning models into web applications is largely missing from existing research: many studies stop at reporting model performance without making the model available to end users. In addition to comparing the proposed model with previous work, this research contributes a detailed description of the steps required to integrate the proposed model for end-user consumption.
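
The model-selection step can be sketched in plain Python: every candidate model (LR, KNN and ANN in this study) is scored on the same held-out test labels, and the one with the highest value of a chosen metric is kept for deployment. This is a minimal sketch, not the study's code. It assumes binary labels with 1 meaning phishing and uses F1 as the selection metric (an assumption, since the abstract does not say which metric decided); the helper names and the toy predictions are hypothetical.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating label 1 as 'phishing' (the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def select_best_model(
    y_true: List[int], predictions: Dict[str, List[int]], metric: str = "f1"
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same labels; return the top one by `metric`."""
    scores = {name: classification_metrics(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=lambda name: scores[name][metric])  # first wins on ties
    return best, scores[best]


if __name__ == "__main__":
    # Invented labels and predictions, only to exercise the functions.
    y_true = [1, 1, 0, 0, 1, 0]
    candidates = {
        "model_a": [1, 0, 0, 0, 1, 0],
        "model_b": [1, 1, 0, 1, 1, 0],
        "model_c": [1, 1, 1, 1, 1, 1],
    }
    best, scores = select_best_model(y_true, candidates)
    print(best)  # model_b
```

Reporting precision and recall alongside accuracy, as `classification_metrics` does, helps when the classes are imbalanced. In that setting a model can score high accuracy while still missing many phishing pages.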
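
Part of the Django integration does not depend on Django at all. The selected model is persisted after training and then loaded once when the web process starts, so each request only pays for a prediction. The sketch below assumes that pattern and uses Python's standard `pickle` module for persistence. A tiny pure-Python k-nearest-neighbor classifier stands in for the trained model. The class, the helper names, the file name `phishing_model.pkl`, and the two-feature vectors are all hypothetical.

```python
import math
import os
import pickle
import tempfile
from collections import Counter
from typing import List, Sequence


class KNNClassifier:
    """Minimal k-nearest-neighbor classifier: Euclidean distance, majority vote."""

    def __init__(self, k: int = 3):
        self.k = k
        self.X: List[List[float]] = []
        self.y: List[int] = []

    def fit(self, X, y):
        # KNN is a lazy learner: "training" just stores the labelled examples.
        self.X, self.y = [list(row) for row in X], list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        # Rank stored examples by distance to x and vote among the k closest.
        ranked = sorted(zip(self.X, self.y), key=lambda pair: math.dist(pair[0], x))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def save_model(model, path: str) -> None:
    """Serialize the selected model once, right after training."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path: str):
    """Load a model file; only unpickle files your own pipeline produced."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Hypothetical 2-feature vectors; label 1 = phishing, 0 = legitimate.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [0, 0, 0, 1, 1, 1]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "phishing_model.pkl")
        save_model(KNNClassifier(k=3).fit(X, y), path)
        model = load_model(path)  # in a Django app: load once at module import
        print(model.predict_one([5.5, 5.2]))  # 1
```

In a Django project, `load_model` would typically be called at module level in the app's `views.py`, so the model loads once per worker process. A view would then extract the same features from the submitted URL, call `predict_one`, and return the label, for example through `django.http.JsonResponse`. Unpickling can execute arbitrary code, so only model files produced by your own training pipeline should be loaded this way.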



Copyright (c) 2023 Journal of Electrical Engineering, Electronics, Control and Computer Science

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.