This is an open access article distributed under the CC BY 4.0 license.
Volume 19, Article 895, pages 1126-1142
This paper discusses our experience in applying machine learning algorithms to the human aspects of information security awareness. We implemented the classification and clustering approaches in the following steps: creating a questionnaire, creating the dataset, importing the data, handling incomplete and imbalanced data, compiling the datasets, scaling features, building models, and evaluating the models. The datasets were generated from responses to a questionnaire based on the Human Aspects of Information Security Questionnaire (HAIS-Q), distributed to the stakeholders of an Indonesian institution. The classification models were evaluated with several methods, including k-fold cross-validation, the confusion matrix, the Receiver Operating Characteristic (ROC), and score calculations for each model. The Support Vector classification model achieved an accuracy of 99.7% and an error rate of 0.3%. The clustering models were used to determine the number of clusters that optimally partitions the dataset. The DBSCAN clustering model consistently yielded an Adjusted Rand Index close to 0.
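To make two of the reported quantities concrete, the following is a minimal sketch of how accuracy and error rate follow from a confusion matrix, and how the Adjusted Rand Index compares a clustering with reference labels. It is not the authors' implementation. The function names `accuracy_and_error` and `adjusted_rand_index` and all input values are hypothetical illustrations, not data from the study, and the index is computed in its standard pair-counting form.

```python
from collections import Counter
from math import comb


def accuracy_and_error(confusion):
    """Return (accuracy, error_rate) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = correct / total
    return accuracy, 1.0 - accuracy


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings (pair-counting form)."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:  # fewer than two samples: nothing to compare
        return 1.0

    # Contingency counts and their row/column marginals.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix (not the study's data).
    print(accuracy_and_error([[50, 0], [1, 49]]))
    # Hypothetical labels: every sample marked as noise (-1) by a clustering.
    print(adjusted_rand_index([0, 0, 1, 1], [-1, -1, -1, -1]))
```

Accuracy and error rate always sum to 1, which is consistent with the reported 99.7% and 0.3%. In this sketch, a clustering that assigns every sample to a single group scores exactly 0 against any non-trivial reference labeling, which is one way an Adjusted Rand Index near 0 can arise.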
We thank the Ministry of Education and Culture of the Republic of Indonesia for financially supporting this research under PTUPT Research Grant number NKB-356/UN2.RST/HKP.05.00/2020.