Analysis of Significant Influence towards Students’ Depression using Neural Network and Classification Tree Techniques. Revista Publicando, 6 (19). 2019, 62-78. ISSN 1390-9304.
Analysis of Significant Influence towards Students’ Depression using Neural Network and Classification Tree Techniques
Received: 27/10/2018
Accepted: 27/12/2018
Norhatta Mohd¹*, Yasmin Yahya²
¹ Faculty of MIIT, Technical Foundation Section,
Universiti Kuala Lumpur (UniKL), Malaysia.
² Faculty of MIIT, Technical Foundation Section,
Universiti Kuala Lumpur (UniKL), Malaysia.
Abstract: Students’ depression is an important issue for most higher learning institutions. Although this issue has been investigated by many researchers using statistical analysis and data mining techniques, this paper focuses on the performance of Classification Tree and Artificial Neural Network techniques in predicting depression among Engineering Technology students at Universiti Kuala Lumpur (UniKL) Malaysian Institute of Information Technology (MIIT). Various factors that may influence students’ depression were identified: stress factors, social factors (interpersonal and intrapersonal), environmental factors and demographic factors were used as attributes to predict students’ depression. From the findings of the analysis, social intrapersonal stress was found to contribute significantly to students’ depression. The performances of both techniques were compared, based on accuracy, using cross-validation analysis. The Artificial Neural Network had the lowest error rate and the highest accuracy; therefore, it is the best technique for classification on this data set.
Keywords: Depression, Stress Factor, Neural Network, Classification Tree, Model Performance
INTRODUCTION
Feeling stress is common among teenagers. Stress is the body's response to physical, mental or emotional changes, situations and forces, such as those encountered when entering a new phase of life. College life has become far more competitive, which could lead to students’ depression. Stress can result from external factors such as environmental, social, academic and financial pressures, or from internal factors such as expectations, attitudes, feelings and anxiety. Many studies have revealed that students’ performance in school, college and university is affected by symptoms of depression (Stark, K.D., Brookman, C.S., 1994), anxiety (Anson, O., Bernstein, J., Hobfoll, S.E., 1984) and stress (Dusselier, L., Dunn, B., Wang, Y., Shelley II, M.C., Whalen, D.F., 2005), which may impair their academic achievement (Stewart-Brown, S., Evans, J., Patterson, J., Petersen, S., Doll, H., Balding, J., Regis, D., 2000), lead to deterioration in relationships (Ali, B.S., Rahbar, M.H., Naeem, S., Tareen, A.L., Gui, A., Samad, L., 2002) and marital problems, and affect future employment (Eisenberg, D., Golberstein, E., Gollust, S., Hefner, J., 2007). Stress in our lives is unavoidable, but there are steps students can take to lessen its effects on their lives and health. Universities must take clear steps to understand and address the serious psychological stress that students face during their studies (Bataineh, 2013), stress that can also affect their relationships with friends, family, classmates or teachers.
Because of rapid physical changes and mental development, students may sometimes find that their mental development is out of step with their physical changes or with their social environment, and thus suffer from problems arising from inadequate adaptation. These problems may further cause psychological troubles and even induce deviant behaviors. Therefore, this study aims to identify significant factors that constitute sources of stress leading to students’ depression and to compare the performance of different data mining techniques for predicting depression among students. This paper draws on a study of the relationship between depression, stress and their correlates, such as social interpersonal factors, social intrapersonal factors, gender, academic status, environment, financial support and friends, at Universiti Kuala Lumpur Malaysian Institute of Information Technology (UniKL MIIT), which indicated that the environment factor and social factors (interpersonal and intrapersonal) are the significant factors influencing students’ depression (Norhatta, M., Yasmin, Y., Naziren, N., Siti Nabilah, A.S., 2016).
Data mining has emerged as a key feature of many information system applications. It has been used as a means of predicting future directions, extracting hidden limitations and revealing the specifications of a process (Yonghee L., Sangmun S., 2010). It has also become an area of growing significance because it helps analyze data from different perspectives and summarize it into useful information (Varun K., Anupama C., 2011). Each data mining technique serves a different purpose depending on the modeling objective (Sellappan P., Rafiah A., 2008). The two most common modeling objectives are classification and prediction. Classification models predict categorical labels (discrete, unordered), while prediction models predict continuous-valued functions (Han, J., Kamber, M., 2006). Classification Trees and Neural Networks use classification algorithms, while Regression, Association Rules and Clustering use prediction algorithms (Charly, K., 1998).
The Decision Tree is a popular method for prediction. Many researchers use this technique because of its simplicity and comprehensibility in uncovering the structure of small or large data sets and predicting values. Neural networks, on the other hand, can model complex nonlinear relationships between the dependent and independent variables. The advantage of a neural network is that it can detect all possible interactions between predictor variables. Therefore, the neural network technique was selected as one of the best prediction methods. This study also demonstrates that the proposed data mining approach can effectively identify the significant stress, social (interpersonal and intrapersonal) and environmental factors attributed to predicting students’ depression, and the process provides detailed statistical inferences.
Material and Method
This study used a cross-sectional design among Universiti Kuala Lumpur MIIT students. A total of 216 questionnaires were distributed through random sampling to both diploma and degree students during the September–December 2015 semester. This method of collecting data is suitable as it is quick and secure (Badriyah T, Briggs J S and Prytherch D R., 2012). The programs that participated in this study are Diploma in Computer and Networking (DCNET), Diploma in Information Technology (DIT), Diploma in Multimedia (DIM), Bachelor in Computer and Entrepreneur Management (BCEM), Bachelor in Networking System (BNS) and Bachelor in Software Engineering (BSE). Respondents were given the questionnaire to answer within 30 minutes during class, and the response rate was 100%.
Data analysis was carried out to obtain descriptive and inferential statistics to achieve the objectives of this study. The Statistical Package for the Social Sciences (SPSS) was used for all statistical analyses. Missing values and illogical data values were handled during data cleansing. The qualitative variables, such as student’s program, gender, ethnic group, religion, living status, parent status, financial support and financial status, are presented as numbers and percentages. Continuous variables are expressed as mean and standard deviation. In this study, the dependent variable, depression, is categorized into two levels, 0 and 1 (binary), based on the student’s mean depression score: not depressed (1–2) and depressed (greater than 2).
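The binary coding of the dependent variable can be sketched as follows. This is a minimal illustration, assuming each student's depression items are scored on a numeric scale and averaged per student; the function name `label_depression` and the example scores are hypothetical and do not reproduce the study's SPSS workflow.

```python
from statistics import mean


def label_depression(item_scores):
    """Code depression as 1 (depressed) if the mean item score exceeds 2, else 0.

    Assumption: item_scores holds one student's depression-item responses.
    A mean in the 1-2 band is coded 0 (not depressed), as described in the text.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    return 1 if mean(item_scores) > 2 else 0


# Hypothetical responses for two students.
print(label_depression([1, 2, 2, 1]))  # mean 1.5, prints 0
print(label_depression([3, 2, 3, 2]))  # mean 2.5, prints 1
```

Because only means strictly greater than 2 are coded 1, a student whose mean is exactly 2 falls in the "not depressed" band.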
The independent variables are demographic variables, including gender, living status, financial support, financial status and relationship with friends, and the students’ stress factors, which consist of the interpersonal, intrapersonal, environment and academic factors. Before building the models, the data set was randomly split into two subsets: 60% of the data for the training set (n = 130) and 40% for the validation set (n = 86). The Artificial Neural Network (ANN) was constructed as a Multilayer Perceptron Neural Network (MLPNN) trained with the Backpropagation (BP) algorithm, with at least three layers of neurons: one input layer, one hidden layer and one output layer, as shown in Diagram 1. The output value is the students’ depression outcome. The number of hidden layers is determined by estimating the generalization error of each network. Receiver operating characteristic (ROC) curves were used to measure the discriminant ability of the models. The area under the ROC curve, summarized by the c-index, ranges from 0.5 (no predictive ability) to 1 (perfect discrimination). Reasonable discrimination is indicated by c-index values of 0.7–0.8 (Badriyah T, Briggs J S and Prytherch D R., 2012).
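The MLP-with-backpropagation setup and the 60/40 split described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: one hidden layer of four sigmoid units, a sigmoid output giving the probability of depression, binary cross-entropy loss, full-batch gradient descent, and synthetic data standing in for the two predictors (SI and E). The function names, hidden size, learning rate and epoch count are all hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron by backpropagation (hypothetical settings).

    X: (n, d) predictor matrix; y: (n,) labels coded 0/1.
    Returns the fitted weights and the cross-entropy loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    losses = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer -> output probability.
        H = sigmoid(X @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)
        losses.append(float(-np.mean(t * np.log(p_safe) + (1 - t) * np.log(1 - p_safe))))
        # Backward pass: gradients of the mean cross-entropy loss.
        d_out = (p - t) / n                   # error at the output pre-activation
        d_hid = (d_out @ W2.T) * H * (1 - H)  # error propagated back to the hidden layer
        # Gradient-descent updates (d_hid was computed with the old W2).
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic stand-in data: 216 students, two standardized predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(216, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=216) > 0).astype(int)

# Random 60/40 split: 130 training cases, 86 validation cases.
idx = rng.permutation(216)
train, valid = idx[:130], idx[130:]
params, losses = train_mlp(X[train], y[train])
valid_acc = np.mean((predict_proba(params, X[valid]) >= 0.5) == y[valid])
print(f"validation accuracy: {valid_acc:.3f}")
```

In the spirit of selecting the architecture by generalization error, one could repeat the fit for several `hidden` values and keep the one with the lowest validation error; that selection loop is not shown.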
For the Classification Tree (CT) model, the Chi-squared Automatic Interaction Detector (CHAID) method was chosen because, at each step, CHAID selects the independent (predictor) variable that has the strongest interaction with the dependent variable. Categories of each predictor are merged if they are not significantly different with respect to the dependent variable. To prevent the tree from growing into an extremely large and complex structure, the minimum number of observations required for a split search was set at 50. In addition, missing values are not treated as valid values in our decision tree models.
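The predictor-selection step of CHAID and the minimum split size of 50 can be sketched in pure Python. This is a simplified illustration, not a full CHAID implementation: it omits the merging of non-significant categories and the Bonferroni adjustment that CHAID normally applies, and the function names, the significance level of 0.05 and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail p-value of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: closed-form Poisson sum.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi2_test(categories, labels):
    """Pearson chi-square statistic and degrees of freedom for a predictor-by-outcome table."""
    n = len(labels)
    cells = Counter(zip(categories, labels))
    row_tot, col_tot = Counter(categories), Counter(labels)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def choose_split(rows, predictors, target, min_split=50, alpha=0.05):
    """Return (predictor, p-value) for the most significant predictor, or None."""
    if len(rows) < min_split:
        return None  # node too small to search for a split
    labels = [row[target] for row in rows]
    best = None
    for name in predictors:
        stat, df = chi2_test([row[name] for row in rows], labels)
        p = chi2_sf(stat, df) if df > 0 else 1.0
        if p < alpha and (best is None or p < best[1]):
            best = (name, p)
    return best


# Toy data: a two-level SI score strongly tied to depression, E unrelated.
rows = [
    {"SI": "high" if i < 50 else "low",
     "E": "a" if i % 2 == 0 else "b",
     "dep": 1 if i < 45 or 50 <= i < 55 else 0}
    for i in range(100)
]
print(choose_split(rows, ["SI", "E"], "dep"))
```

Selecting by p-value rather than by the raw chi-square statistic keeps predictors with different numbers of categories (and hence different degrees of freedom) comparable.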
Result and Discussion
Table 1 displays the demographic characteristics of the students as well as the significant factors determined by univariate regression analysis. Approximately 40% of the students are male and 60% female. About 80% of the students in this study live in a city area. Only two predictive factors were found to be significant for inclusion in the model: social intrapersonal stress (SI), with a p-value of 0.015, and environment stress (E), with a p-value of 0.017. These two significant predictors were included in the classification tree and artificial neural network analyses. Diagram 2 illustrates the classification tree model of the training data for students’ depression, in which the most suitable tree is split into two nodes. Social intrapersonal stress was found to be the only significant factor in the tree. Terminal node 2 indicates that when the social intrapersonal stress score is less than 0.29, the predicted probability of students’ depression is 85.1%.
Diagram 1: Multilayer Perceptron Neural Network (MLPNN)
Diagram 2: Classification tree for students’ depression (training data)
Table 1: Characteristics of students in the training dataset
Many undergraduate students feel stress because of changes such as leaving home, becoming independent decision makers and competing against new standards (Altmaier, E. M., 1983). Some students see these transitions as a positive and exciting experience, but others feel threatened by the change (Nelson, N. G., Dell’Oliver, C., Koch, C., & Buckler, R., 2001). Stress can affect several aspects of students’ performance, such as their grades, health and personal adjustment. How students perceive their immediate environment, their own lives and the tasks confronting them serves to define, in a unique manner, people and events as potentially dangerous or relatively innocuous (Roberts, G. H., & White, W. G., 1989). The transition of moving to college and leaving home can be an added stressor for an undergraduate student. At UniKL MIIT, students come from different states and cities, with different upbringings and attitudes towards certain matters. They feel homesick when apart from their families, which leads to stress.
Students experiencing homesickness need a support system through which they can express their feelings. Students attending school in a new place experience a loss of control because they must adapt to a different climate, a new language, and new behaviors and social customs. These changes and transitions are stressful because of the new environment (Denise P., 2001). Undergraduate students who are passive and mildly depressed before leaving home are the most likely to show raised levels of homesickness after moving to university (Fisher, S., 1994). Many people recognize stress as negative tension caused by someone or something, but fail to realize that stress can also generate a positive reaction to a stimulus. A positive response to stress can drive individuals to achieve and to test their potential to the fullest. Stress can be a positive aspect of learning: students who experience stress as a challenge can exhibit an increased capacity to learn (N. Kumarswamy and P.O. Ebigbo, 1989). Many students, however, experience distress rather than challenge, which can leave them feeling threatened and helpless. Our findings are consistent with a study of psychological problems among 100 medical students, which found that 31% had anxiety and depression and 26% had psychological distress (Cohen, S., Janicki-Deverts, D., Miller, G. E., 2007). Psychological stress has been found to contribute to poorer health practices, increased disease risk, accelerated disease progression, greater symptom reporting, more frequent health service utilization and increased mortality (V.O. Oladokun, A.T. Adebanjo, O.E. Charles-Owaba, 2008).
Diagram 3. ROC curve for the ANN model
Table 2: Results of cross-validation for the classification tree and neural network (%)
Model | ACC | SEN | SPE | PPV | NPV
CT | 75.8 | 93.0 | 34.5 | 65.5 | 7.0
ANN | 79.5 | 94.7 | 48.3 | 51.7 | 5.3
*CT = Classification Trees; ANN = Artificial Neural Network; ACC = accuracy; SEN = sensitivity; SPE = specificity; PPV = positive predict value; NPV = negative predict value.
The Artificial Neural Network model was tested on the validation data set of 86 students using the ROC curve, and the area under the curve was 0.718, as shown in Diagram 3. Cross-validation analysis was carried out to compare the results of the classification tree and the artificial neural network. Five indicators were used to assess prediction accuracy. First, accuracy (ACC) is the ratio of correctly classified samples to the total number of samples. Second, sensitivity (SEN) is the ratio of actual positive cases that are predicted positive to all actual positive cases. Third, specificity (SPE) is the ratio of actual negative cases that are predicted negative to all actual negative cases. Fourth, the positive predict value (PPV) is the ratio of actual negative cases that are predicted positive to all actual negative cases. Fifth, the negative predict value (NPV) is the ratio of actual positive cases that are predicted negative to all actual positive cases. As defined here, PPV and NPV are the complements of specificity and sensitivity (the false positive and false negative rates), so that PPV = 100 − SPE and NPV = 100 − SEN in Table 2; they differ from the conventional positive and negative predictive values. Table 2 shows that the total accuracy of the neural network is slightly higher than that of the classification tree. The Artificial Neural Network achieved an accuracy of 79.5% and a sensitivity of 94.7%, while the Classification Tree achieved an accuracy of 75.8% and a sensitivity of 93.0%. The PPV of the Classification Tree, as defined above, is 65.5%, compared with 51.7% for the neural network. Studies by other researchers predicting students’ performance with ANN and decision trees have shown the potential of artificial neural networks for enhancing the effectiveness of a university admission system (Amirah M. S, Wahidah H and Nur’aini A.R., 2015) and for helping the educational system monitor students’ performance systematically (Sellappan P. and Rafiah A, 2008). Experimental results have also shown that a neural-network-based system predicts heart disease with nearly 100% accuracy (M. Mayilvaganan, D. Kalpanadevi, 2014).
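The five indicators and the c-index can be computed from a model's predictions as sketched below. This is a minimal illustration with hypothetical function names and toy counts, not the study's data. It computes the fourth and fifth indicators as they are defined in the text (which reproduces the PPV = 100 − SPE and NPV = 100 − SEN pattern seen in Table 2) and, for comparison, the conventional predictive values TP/(TP + FP) and TN/(TN + FN). The c-index is computed as the proportion of positive/negative pairs in which the positive case receives the higher score, which equals the area under the ROC curve.

```python
def indicators(tp, fn, tn, fp):
    """Return the evaluation indicators (in %) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "ACC": 100 * (tp + tn) / total,
        "SEN": 100 * tp / (tp + fn),
        "SPE": 100 * tn / (tn + fp),
        # Fourth and fifth indicators as defined in the text.
        "PPV_text": 100 * fp / (tn + fp),
        "NPV_text": 100 * fn / (tp + fn),
        # Conventional positive and negative predictive values, for comparison.
        "PPV_conventional": 100 * tp / (tp + fp),
        "NPV_conventional": 100 * tn / (tn + fn),
    }


def c_index(scores, labels):
    """Area under the ROC curve via pairwise concordance; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))


# Toy confusion-matrix counts and scores (hypothetical, not the study's data).
print(indicators(tp=8, fn=2, tn=6, fp=4))
print(c_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # prints 0.75
```

The pairwise c-index loops over every positive/negative pair, which is fine for a validation set of 86 cases; for large data sets a rank-based formula is faster.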
Conclusion
Predicting students’ depression can help lecturers and students improve the learning and teaching process. The stress experienced by students at UniKL derives from different sources, but mostly from environment and social factors. It is important to investigate the stress that students encounter, since university life can be quite stressful for anyone, and to look at the different factors of stress in order to help students cope effectively. This paper has reviewed previous studies on predicting students’ stress with various analytical methods. The results showed that environment and social factors are significantly related to students’ depression.
A prediction model has been developed using Classification Tree and Artificial Neural Network analysis, two methods widely used by researchers for predicting students’ depression. The findings indicate that the artificial neural network model predicts more accurately, with 79.5% accuracy, than the Classification Tree, with 75.8% accuracy. Overall, the Artificial Neural Network has the lowest error rate and the highest accuracy; therefore, it is the best technique for classification on this data set. It can be concluded that the Artificial Neural Network is a reliable prediction tool: it can handle both numerical and categorical data (K. Bunkar, U. K. Singh, B. Pandya, R. Bunkar, 2012), performs well on large datasets (P. M. Arsad, N. Buniyamin, J.-l. A. Manan, 2013) and makes the relationships between variables easy to understand and interpret (T. Mishra, D. Kumar, S. Gupta, 2014). For further study, a qualitative study should be conducted in combination with a quantitative study to explore respondents’ perceptions of the causes and level of stress in association with depression. A larger sample should also be taken from the institution, and the study should be extended to all UniKL campuses. The parameters of both the Artificial Neural Network and the Classification Tree can be optimized, and other data mining methods such as Naive Bayes and K-Nearest Neighbours (KNN) could be tried to obtain more accurate results. Psychological or clinical experts could also be involved in preventing and helping with stress that could lead to depression, which is better managed with a combination of methods including medication.
Acknowledgments: The authors would like to thank the Universiti Kuala Lumpur, the lecturers, and respondents involved in this study for the support of this research.
References
Ali, B.S., Rahbar, M.H., Naeem, S., Tareen, A.L., Gui, A., Samad, L. (2002). Prevalence of and factors associated with anxiety and depression among women in a lower middle class semi-urban community of Karachi, Pakistan. Journal of the Pakistan Medical Association 52, 513–517.
Altmaier, E. M. (1983). Helping students manage stress. San Francisco: Jossey-Bass Inc.
Amirah M. S, Wahidah H and Nur’aini A.R. (2015). A Review on Predicting Student’s Performance using Data Mining Techniques. The Third Information Systems International Conference. Procedia Computer Science 72, 414 – 422.
Anson, O., Bernstein, J., Hobfoll, S.E. (1984). Anxiety and performance in two ego threatening situation. Journal of Personality Assessment, 48, 168–172.
Badriyah T, Briggs J S and Prytherch D R. (2012). Decision Trees for Predicting Risk of Mortality using Routinely Collected Data. World Academy of Science, Engineering and Technology, 62.
Charly, K. (1998). Data Mining for the Enterprise, 31st Annual Hawaii Int. Conf. on System Sciences, IEEE Computer, 7, 295-304.
Cohen, S., Janicki-Deverts, D., & Miller, G. E. (2007). Psychological stress and disease. Journal of the American Medical Association, 298, 1685–1687.
Denise Pfeiffer. (2001). Academic and environmental stress among undergraduate and graduate college students: a literature review. The Graduate School, University of Wisconsin-Stout, Menomonie, WI 54751.
Dusselier, L., Dunn, B., Wang, Y., Shelley II, M.C., Whalen, D.F. (2005). Personal health, academic, and environmental predictors of stress for residence hall students. Journal of American College Health 54, 15–24.
E. Osmanbegović, M. Suljić. (n.d.). Data mining approach for predicting student performance. Economic Review 10 (1).
Eisenberg, D., Golberstein, E., Gollust, S., Hefner, J. (2007). Prevalence and correlates of depression, anxiety and suicidality among university students. American Journal of Orthopsychiatry 77, 534–542.
Fisher, S. (1994). Stress in academic life. New York: Open University Press.
G. Gray, C. McGuinness, P. Owende. (2014). An application of classification models to predict learner progression in tertiary education, in: Advance Computing Conference (IACC), 2014 IEEE International, IEEE, pp. 549–554.
Han, J., Kamber, M. (2006). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
K. Bunkar, U. K. Singh, B. Pandya, R. Bunkar (2012). Data mining: Prediction for performance improvement of graduate students using classification, in: Wireless and Optical Communications Networks (WOCN), 2012 Ninth International Conference on, IEEE, pp. 1–5.
M. M. Quadri, N. Kalyankar. (n.d.). Drop out feature of student data for academic performance using decision tree techniques. Global Journal of Computer Science and Technology 10 (2).
M. Mayilvaganan, D. Kalpanadevi. (2014). Comparison of classification techniques for predicting the performance of students’ academic environment, in: Communication and Network Technologies (ICCNT), 2014 International Conference on, IEEE, pp. 113–118.
Marwan Zaid Bataineh. (2013). Academic Stress Among Undergraduate Students: The Case Of Education Faculty At King Saud University. International Interdisciplinary Journal, Vol 2, Issue 1, Jan 2013.
N. Kumarswamy and P.O. Ebigbo. (1989). Stress among second year medical students – A comparative study, Indian J Clin Psychol., 16, 21-23.
Nelson, N. G., Dell’Oliver, C., Koch, C., & Buckler, R. (2001). Stress, coping, and success among graduate students in clinical psychology. Psychological Reports, 88, 759-767.
Norhatta, M., Yasmin, Y., Naziren, N., Siti Nabilah, A.S. (2016). Assessing Stress towards Depression among Universiti Kuala Lumpur Malaysian Institute of Information Technology (UniKL MIIT) Students. Advanced Science Letters. August 2016. Vol 22, No 8.
P. M. Arsad, N. Buniyamin, J.-l. A. Manan (2013). A neural network students’ performance prediction model (nnsppm), in: Smart Instrumentation, Measurement and Applications (ICSIMA), 2013 IEEE International Conference on, IEEE, pp. 1–5.
Roberts, G. H., & White, W. G. (1989). Health and stress in developmental college students. Journal of College Student Development, 30, 515-521.
S. Natek, M. Zwilling. (2014). Student data mining solution–knowledge management system related to higher education institutions. Expert Systems with Applications 41 (14), 6400–6407.
Sellappan P. and Rafiah A. (2008). Intelligent Heart Disease Prediction System Using Data Mining Techniques. IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8.
Stark, K.D., Brookman, C.S. (1994). Theory and family-school intervention. In: Fine, J.M., Carlson, C. (Eds.), The Handbook of Family-school Intervention: A System Perspective. Massachusetts, Allyn and Bacon.
Stewart-Brown, S., Evans, J., Patterson, J., Petersen, S., Doll, H., Balding, J., Regis, D. (2000). The health of students in institutes of higher education: an important public health problem? Journal of Public Health Medicine 22, 492–499.
T. Mishra, D. Kumar, S. Gupta (2014). Mining students’ data for prediction performance, in: Proceedings of the 2014 Fourth International Conference on Advanced Computing & Communication Technologies, ACCT ’14, IEEE Computer Society, Washington, DC, USA, pp. 255–262. doi:10.1109/ACCT.2014.105. URL http://dx.doi.org/10.1109/ACCT.2014.105
Tuckman, B.W. (1978). Conducting educational research. New York: Harcourt Brace Jovanovich Inc.
V.O. Oladokun, A.T. Adebanjo and O.E. Charles-Owaba. (2008). Predicting Students’ Academic Performance using Artificial Neural Network: A Case Study of an Engineering Course. The Pacific Journal of Science and Technology, Volume 9. Number 1. May-June.
Varun K., Anupama C. (March 2011). An Empirical Study of the Applications of Data Mining Techniques in Higher Education. International Journal of Advanced Computer Science and Applications, vol. 2, No.3.
Yonghee L., Sangmun S. (2010). Job stress evaluation using response surface data mining. International Journal of Industrial Ergonomics, 40, 379–385.