J Korean Acad Psychiatr Ment Health Nurs > Volume 34(Special Issue); 2025 > Article
Jang and Heo: Comparison of Machine Learning Models for Predicting Suicide Attempts among Korean Adolescents with Suicidal Ideation: Secondary Data Analysis Based on 20th Korea Youth Risk Behavior Survey

Abstract

Purpose

This study aimed to identify the factors associated with the progression from suicidal ideation to suicide attempts and to compare the predictive performance of various machine learning models.

Methods

We conducted a secondary analysis using original data from the 20th Korea Youth Risk Behavior Survey (KYRBS), focusing on 6,316 adolescents who reported suicidal ideation. We evaluated predictive performance using logistic regression, random forest, and k-nearest neighbors (KNN) models.

Results

Suicide attempts were significantly associated with sociodemographic factors, such as academic achievement, economic status, type of residence, and perceived health status, as well as psychological and behavioral factors, including suicidal planning, feelings of sadness and despair, anxiety, alcohol use, smoking, drug use, and exposure to violence. Logistic regression exhibited the highest predictive performance (AUC=0.77, accuracy=0.84, F1=0.80). The random forest model identified suicidal planning, loneliness, generalized anxiety, drug use, and exposure to violence as key predictors based on the Gini index, while KNN demonstrated the lowest predictive stability.

Conclusion

Logistic regression is effective for predicting suicide attempts among adolescents, and machine learning approaches should be considered for early risk screening in community mental health nursing.

INTRODUCTION

Adolescent suicide represents a serious global public-health concern. According to the World Health Organization [1], approximately 727,000 individuals die by suicide each year, and suicide ranks as the third leading cause of death among those aged 15 to 29. In Korea, a 2024 survey by the Korea Disease Control and Prevention Agency (KDCA) reported a high prevalence of depressive symptoms and suicidal ideation among adolescents [2]. Mitchell et al. [3] reported that adolescents are highly susceptible to peer influence and that emotional and behavioral phenomena, including depression, anxiety, self-harm, and suicide, spread through peer networks. This susceptibility underscores the need for early intervention [3].
Previous studies have reported that various psychosocial variables (depression, stress, sleep, academic burden, family environment, and peer relationships) are associated with adolescent suicide [4-6]. However, because most studies have relied on cross-sectional designs in general adolescent samples or have examined only single factors, they could not precisely identify the high-risk individuals who attempt suicide. Specifically, variable-centric approaches do not capture complex interactions among factors and thus are too limited to predict suicide attempts.
Although suicidal ideation is the strongest predictor of suicide attempts, few studies have explored ideation-to-attempt transition factors among adolescents who report such ideation. Klonsky and May [7] emphasized that suicidal ideation and suicidal action are distinct processes and proposed the Ideation-to-Action Framework. This model posits that although many adolescents contemplate suicide, only a limited proportion of them proceed to actual attempts, and that additional factors, such as suicidal planning, impulsivity, distress tolerance, and environmental risk factors, intervene in the transition from ideation to action [8]. Therefore, studies focusing on adolescents with suicidal ideation should identify factors that facilitate the transition from ideation to attempts and provide evidence to prevent this progression. Adolescents with suicidal ideation represent a distinct high-risk group, often exhibiting greater psychological distress, interpersonal difficulties, and risk behaviors than the general adolescent population, including heightened susceptibility to peer influence and emotional contagion within social networks [3]. Furthermore, emerging suicide theories emphasize that suicidal ideation and suicide attempts are not linear outcomes of the same process but distinct stages [8]. Therefore, examining sociodemographic, psychological, and behavioral determinants within this subgroup is essential for developing targeted suicide-prevention strategies, particularly in mental-health and school-based nursing practice.
Accordingly, this study seeks to employ machine-learning methods to comprehensively capture interactions among variables rather than relying solely on traditional statistical approaches. Machine learning encompasses various algorithms, and each method exhibits different strengths and limitations depending on data characteristics. Consequently, rather than limiting analysis to a single method, diverse models should be applied and their performance should be compared [9]. Machine learning can be used for the precise identification of key factors within complex data structures and to offer practical utility through the early prediction of the risk of suicide attempts using simple survey responses alone [10].
This study extends previous research by applying a machine-learning approach within a nursing science perspective, clearly distinguishing itself from prior studies that primarily relied on traditional statistical analyses. In addition, it incorporates an integrated framework reflecting sociodemographic, psychological, and behavioral determinants relevant to adolescent mental-health nursing, moving beyond variable-centric analyses. We applied multiple machine-learning models and compared their predictive performance to identify the optimal model for predicting adolescent suicide attempts. Accordingly, this study used the nationally representative, regularly administered, large-scale dataset from the 20th Korea Youth Risk Behavior Survey (KYRBS) 2024 [2] to compare machine-learning-based predictive models among adolescents who reported suicidal ideation. By identifying key predictors and model performance, our findings may support early clinical screening and risk stratification in adolescent mental-health nursing practice and inform the development of tailored intervention strategies for suicide prevention.

METHODS

1. Study Design

We performed a secondary-data analysis to predict suicide attempts among adolescents by identifying associated factors and comparing machine-learning models, using raw data from the 20th KYRBS, which is representative of adolescents.

2. Participants and Data Collection

In this study, an initial sample of 6,951 adolescents who reported suicidal ideation was selected from the raw data of the 20th KYRBS, which included a total of 54,653 participants. All variables used in the analysis, including suicidal ideation, planning, and attempts, were measured through self-reported responses to standardized survey items in the KYRBS. Because complete input data are required for machine-learning analysis, participants with missing values in any of the selected variables were excluded, resulting in a final analytic sample of 6,316 individuals.
The KYRBS is an anonymous, web-based, self-administered survey conducted among students from the 1st grade of middle school to the 3rd grade of high school (grades 7 to 12) to assess health-related behaviors such as smoking, drinking, sleep habits, and physical activity among Korean adolescents. The target population of the KYRBS consisted of students enrolled in middle and high schools nationwide. A stratified cluster sampling method was employed, with schools as the primary sampling units and classes as the secondary units. All students in the selected classes were surveyed, and long-term absentees, students with disabilities who could not participate, and those with reading difficulties were excluded. The researcher requested access to the raw data from the KDCA and received de-identified data, which were used for analysis. Because the data were collected using a stratified cluster sampling method, complex sample analysis is generally recommended for studies aiming to estimate population parameters. However, since this study aimed not to estimate population-level statistics but to develop a machine-learning-based model for predicting suicide attempts, it did not apply complex sample weights. Accordingly, this study conducted a pooled analysis within the same survey year using data from the KYRBS [2].

3. Dataset

1) Dependent variable

The dependent variable in this study was the presence or absence of suicide attempts, defined in the KYRBS as a response indicating a suicide attempt within the past 12 months.

2) Independent variables

The selection of independent variables was informed by well-established psychosocial and behavioral determinants of adolescent suicide, as documented in prior literature [4-6]. In addition, variables reflecting Ideation-to-Action processes, including suicidal planning, emotional distress, impulsivity, and interpersonal stressors, were included to identify factors that may facilitate the transition from ideation to attempts [7,8]. Therefore, all independent variables were drawn from KYRBS items that align with these domains.

(1) Sociodemographic Variables

The sociodemographic variables included sex, type of school, type of residence, academic achievement, economic status, perceived health status, perceived body image, and age. The type of school was categorized as middle or high school. Academic achievement and economic status were categorized into high, medium, and low levels. The type of residence was categorized into three groups: living with family; other (e.g., living with relatives, boarding, living alone, or in a dormitory); and living in a childcare facility. Perceived health status was categorized as healthy, average, or poor, and perceived body image was categorized as thin, average, or overweight. Age was treated as a continuous variable.

(2) Psychological and Behavioral Risk Factor Variables

Psychological and behavioral risk-factor variables were categorized into two subdomains: ① psychological factors and ② behavioral factors, based on previous classifications of adolescent mental-health risk determinants.

① Psychological risk factors

The psychological risk factors were perceived stress, experiences of sadness and despair, generalized anxiety (GAD-7 score), suicidal planning, experiences of loneliness, and exposure to violence. These variables, except generalized anxiety, were treated as binary variables (yes/no). Generalized anxiety was measured using the Generalized Anxiety Disorder-7 (GAD-7) scale and treated as a continuous variable [2,11]. The GAD-7, developed by Spitzer et al. [11], assesses how often individuals have been bothered by anxiety-related problems over the past two weeks. It comprises seven items rated on a 4-point Likert scale from 0 ("not at all") to 3 ("nearly every day"), yielding a total score ranging from 0 to 21. The total score is categorized using cut-off points of 5, 10, and 15, with scores of 5 or higher indicating mild anxiety, 10 or higher indicating moderate anxiety, and 15 or higher indicating severe anxiety [11]. A Korean-translated version is available free of charge on the Patient Health Questionnaire website (www.phqscreeners.com). The internal consistency of the instrument during its development was high, with a Cronbach's α of .92 [11]. In a validation study of the Korean version conducted among patients with epilepsy, Cronbach's α was .92 [12]. In the present study, Cronbach's α was .90.
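The GAD-7 scoring rule described above can be illustrated with a short Python sketch. It sums seven item responses coded 0 to 3 and maps the total to the bands given by the cut-offs of 5, 10, and 15. The function names and the example respondent are hypothetical, and the label "minimal" for totals below 5 is an assumption, since the text names only the three upper bands. In this study the total score entered the models as a continuous variable, so the banding is shown for illustration only.

```python
def gad7_total(item_scores):
    """Sum seven GAD-7 item responses, each coded 0 (not at all) to 3 (nearly every day)."""
    if len(item_scores) != 7:
        raise ValueError("GAD-7 has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be coded 0, 1, 2, or 3")
    return sum(item_scores)


def gad7_severity(total):
    """Map a total score (0 to 21) to the bands defined by cut-offs 5, 10, and 15."""
    if not 0 <= total <= 21:
        raise ValueError("total must lie between 0 and 21")
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"  # assumed label for scores below 5


# Hypothetical respondent, not taken from the KYRBS data.
responses = [2, 1, 3, 0, 2, 1, 2]
total = gad7_total(responses)
print(total, gad7_severity(total))
```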

② Behavioral risk factors

The behavioral risk factors were breakfast consumption, high-caffeine beverage consumption, sugar-sweetened beverage consumption, and fast food consumption; physical activity; sedentary time; smartphone-use time; alcohol use; smoking experience; peer smoking status; experience of sexual intercourse; and habitual drug use (therapeutic use excluded). Most behavioral variables were coded as binary variables (yes/no), except sedentary time and smartphone-use time, which were calculated separately for weekdays and weekends and treated as continuous variables. Breakfast consumption, high-caffeine beverage consumption, sugar-sweetened beverage consumption, fast food consumption, physical activity, sedentary time, and smartphone-use time were assessed based on experiences during the past 7 days.

4. Data Analysis

1) Data-analysis methods

Data were analyzed using IBM SPSS version 29.0 and Orange version 3.35. Participant characteristics were summarized using frequencies, percentages, means, and standard deviations. χ² tests and t-tests were used to compare suicide-attempt status across sociodemographic, psychological, and behavioral risk factors.
The development and validation of machine-learning-based predictive models were performed using Orange version 3.35. All available data were used in the analysis, and 5-fold cross-validation (k-fold cross-validation, k=5) was employed to prevent overfitting and enhance generalizability.
Predictive performance was evaluated using classification accuracy (CA), precision, recall, the F1 score, the area under the receiver operating characteristic curve (AUC), and the Matthews correlation coefficient (MCC); the Gini index was used to assess variable importance. Accuracy refers to the proportion of correctly classified samples, with higher values indicating better overall performance. Precision represents the proportion of predicted positives that are actual positives, with higher precision indicating fewer false positives. Recall represents the proportion of actual positives correctly identified as positive; higher recall indicates fewer false negatives. Precision and recall may exhibit a trade-off depending on the decision threshold, particularly in imbalanced datasets [13,14].
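As an illustration of 5-fold cross-validation, the following Python sketch partitions sample indices into five folds so that each case is held out for testing exactly once. It is a plain, unstratified split with an arbitrary seed. The function name is hypothetical, and the text does not state whether the Orange workflow stratified the folds or which seed it used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


n = 6316  # final analytic sample size reported in this study
splits = list(k_fold_indices(n, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

In each round the model would be fitted on the training indices and scored on the held-out fold, and the five scores would then be averaged.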
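The definitions of accuracy, precision, and recall, and their dependence on the decision threshold, can be sketched in Python as follows. The labels and predicted scores are hypothetical toy values chosen to be imbalanced. They are not study data, and the helper names are illustrative.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy data: 3 positives among 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.1]

for threshold in (0.5, 0.25):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    print(threshold, accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```

In this toy example, lowering the threshold from 0.5 to 0.25 raises recall while lowering precision, and accuracy stays the same, which is the trade-off noted above.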
The F1 score, the harmonic mean of precision and recall, increases when both indicators are high and reflects balanced predictive performance. AUC measures the probability that the model ranks a randomly selected positive case above a negative case; values closer to 1.0 indicate excellent discrimination, whereas values near 0.5 reflect chance-level performance. MCC ranges from -1 to 1, with values near 1 indicating perfect prediction, values near 0 indicating chance-level classification, and negative values indicating inverse prediction accuracy. Variable importance was assessed using the Gini index, which represents each variable's contribution to reducing node impurity in the random forest algorithm; higher Gini values indicate greater contribution to the model [13,14].
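The F1 score, MCC, and AUC described above can be computed from first principles as in the following Python sketch. The toy labels and scores are hypothetical. Returning 0 when a denominator is empty is a convention assumed here, not something specified in the text.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; set to 0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives among 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.1]

print(f1_score(tp=2, fp=1, fn=1), mcc(tp=2, fp=1, fn=1, tn=6), auc(y_true, scores))

# A classifier that labels every case negative reaches 70% accuracy here,
# yet its MCC is 0, which is why MCC is informative under class imbalance.
print(mcc(tp=0, fp=0, fn=3, tn=7))
```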

2) Machine-learning model

(1) Logistic regression

Logistic regression is a fundamental statistical method for predictive modeling that transforms a linear combination of input variables to predict the probability of the occurrence of an event. It is used to predict situations with two possible outcomes (binary outcomes)—such as "yes" or "no," "true" or "false," or "1" or "0"—and is employed for diagnostics and risk-assessment purposes [15]. Logistic regression is a computationally efficient simple model and is used to produce easy-to-interpret classification decisions; however, outliers can affect its predictions [13].
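The transformation of a linear combination into a probability can be sketched in Python as follows. The coefficients and intercept are hypothetical values chosen for illustration and are not estimates from this study. The sketch also shows why the exponentiated coefficient of a predictor is read as an odds ratio.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Logistic model: apply the sigmoid to a linear combination of the inputs."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical coefficients for two binary predictors (not study estimates).
coefficients = [1.5, 0.6]
intercept = -2.0

p_neither = predict_probability([0, 0], coefficients, intercept)
p_first = predict_probability([1, 0], coefficients, intercept)

# Switching the first predictor from 0 to 1 multiplies the odds by exp(1.5).
print(p_neither, p_first)
print(odds(p_first) / odds(p_neither), math.exp(coefficients[0]))
```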

(2) Random forest

Random forest is a representative ensemble learning model that is highly effective for handling structured data. It builds a forest of decision trees, each generated with randomization, and produces the final prediction by aggregating the predictions of the individual trees. Random forest tends to achieve strong generalization performance and can reduce the likelihood of overfitting. However, a drawback is that prediction time increases as the number of decision trees grows [14].
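The Gini-based variable importance used with random forest rests on the reduction in node impurity produced by a split. A minimal Python sketch of that quantity for a single split follows. The node contents are hypothetical. How these decreases are averaged over trees and normalized in Orange is not stated in the text, so that step is omitted.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p * p for p in proportions)


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when a node is split into two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node of 10 cases (1 = attempt, 0 = no attempt),
# split on a binary predictor.
parent = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
left = [1, 1, 1, 0]          # predictor = yes
right = [1, 0, 0, 0, 0, 0]   # predictor = no

print(gini_impurity(parent), impurity_decrease(parent, left, right))
```

A variable whose splits produce larger decreases across the trees receives a higher importance value.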

(3) K-nearest neighbors (KNN)

KNN is a supervised learning method; given new data, it selects the k training samples closest to the target sample and predicts from those neighbors, using a majority vote for classification or the mean for regression [14].
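A minimal Python sketch of KNN for a binary outcome follows. It assumes Euclidean distance and a majority vote among the neighbors, which for 0/1 labels is equivalent to thresholding their mean at 0.5. The distance metric, the value of k, and the toy data are assumptions for illustration and are not settings reported in this study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: (binary predictor, continuous score) pairs.
train_points = [(0, 2.0), (0, 3.0), (1, 10.0), (1, 12.0), (0, 11.0)]
train_labels = [0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, query=(1, 11.5), k=3))
print(knn_predict(train_points, train_labels, query=(0, 2.5), k=3))
```

Because the prediction depends entirely on distances, a feature with a wide numeric range can dominate binary features unless the inputs are rescaled.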

5. Ethical Considerations

The 2024 KYRBS is a nationally approved statistical survey (Approval No. 117058) conducted by the KDCA and the Ministry of Education. The researcher obtained approval from the KDCA to use the dataset and access the raw data. An ethical review was requested from the Institutional Review Board of Cheongju University, and the study was granted an exemption from review (1041107-202502-HR-053-01).

RESULTS

1. Suicide Attempts by Sociodemographic Variables

Table 1 shows the results of the analysis of sociodemographic variables. The sample comprised 2,313 males (36.6%) and 4,003 females (63.4%). The type of school was middle school for 3,680 participants (58.3%) and high school for 2,636 (41.7%). Academic achievement was high in 734 (11.6%), medium in 4,706 (74.5%), and low in 876 (13.9%). Economic status was high in 613 (9.7%), medium in 5,445 (86.2%), and low in 258 (4.1%). Regarding the type of residence, living with family was most common in 6,003 (95.0%), followed by other arrangements (e.g., living with relatives, boarding, living alone, or in a dormitory) in 283 (4.5%) and living in a childcare facility in 30 (0.5%). Regarding perceived health status, 2,895 (45.8%) participants rated themselves as healthy, 2,045 (32.4%) as average, and 1,376 (21.8%) as poor. The mean age was 14.77±1.70 years.
When suicide attempts were analyzed by sociodemographic variables, statistically significant differences were observed for academic achievement (χ²=58.09, p<.001), economic status (χ²=39.03, p<.001), type of residence (χ²=13.60, p=.001), and perceived health status (χ²=44.11, p<.001), whereas no significant differences were found for sex (χ²=0.14, p=.076), type of school (χ²=1.10, p=.295), perceived body image (χ²=1.24, p=.539), or age (t=1.67, p=.950).

2. Suicide Attempts by Psychological and Behavioral Risk Factor

Findings of the analysis of variables related to psychological and behavioral risk factors are presented in Table 2.
For the psychological risk factors, statistically significant differences by suicide-attempt status were observed for perceived stress (χ²=12.73, p<.001), experiences of sadness and despair (χ²=64.50, p<.001), suicidal planning (χ²=810.53, p<.001), experiences of loneliness (χ²=19.65, p<.001), exposure to violence (χ²=96.37, p<.001), and anxiety (t=-11.18, p<.001).
For the behavioral risk factors, statistically significant differences were observed for high-caffeine beverage consumption (χ²=13.69, p<.001), physical activity (χ²=5.68, p=.017), sedentary time (weekday, academic) (t=7.12, p<.001), sedentary time (weekday, non-academic) (t=-3.94, p<.001), sedentary time (weekend, academic) (t=6.10, p<.001), sedentary time (weekend, non-academic) (t=-4.10, p<.001), smartphone usage time (weekday) (t=-6.70, p<.001), smartphone usage time (weekend) (t=-5.66, p<.001), alcohol use (χ²=67.87, p<.001), smoking experience (χ²=71.20, p<.001), peer smoking status (χ²=38.56, p<.001), sexual intercourse experience (χ²=60.62, p<.001), and habitual drug use (χ²=105.92, p<.001).
By contrast, no statistically significant differences were found for the consumption of breakfast (χ²=0.86, p=.335), sugar-sweetened beverages (χ²=1.00, p=.316), or fast food (χ²=3.02, p=.082).

3. Comparison of Machine Learning Models

To compare the performance of models, we evaluated them using AUC, CA, the F1 score, precision, recall, and MCC. Results of the performance comparison are presented in Table 3.
The logistic regression model achieved an AUC of 0.77, a CA of 0.84, an F1 of 0.80, a precision of 0.80, a recall of 0.84, and an MCC of 0.24. The random forest model yielded an AUC of 0.71, a CA of 0.83, an F1 of 0.80, a precision of 0.79, a recall of 0.83, and an MCC of 0.22. The k-nearest neighbors (KNN) model showed an AUC of 0.66, a CA of 0.82, an F1 of 0.79, a precision of 0.78, a recall of 0.82, and an MCC of 0.19. Logistic regression thus showed the highest AUC, classification accuracy, precision, and recall among the three models. Although its F1 score was equal to that of the random forest model, logistic regression achieved a higher Matthews correlation coefficient, reflecting more balanced and reliable performance.
Additionally, Figure 1 shows the results of the variable-importance analysis based on the Gini index. The top variables (Gini index), in descending order, were suicidal planning (0.036), experiences of loneliness (0.006), anxiety (0.006), habitual drug use experience (0.005), exposure to violence (0.005), perceived stress (0.004), sexual intercourse experience (0.003), smoking experience (0.003), experiences of sadness and despair (0.003), and alcohol use (0.003).

4. Logistic Regression Analysis

Table 4 shows the findings of the Gini-index-based variable-importance analysis and of a binary logistic regression model with the top nine variables, which were used to identify factors associated with suicide attempts among adolescents.
In the logistic regression analysis, suicidal planning (Odds Ratio [OR]=5.74, 95% Confidence Interval [CI]=4.95~6.66, p<.001) was associated with approximately 5.7-fold higher odds of a suicide attempt, and habitual drug use experience with approximately 1.8-fold higher odds (OR=1.82, 95% CI=1.35~2.44, p<.001). Further, exposure to violence (OR=1.84, 95% CI=1.43~2.38, p<.001), peer smoking status (OR=1.29, 95% CI=1.04~1.60, p=.020), experiences of sadness and despair (OR=1.33, 95% CI=1.11~1.60, p=.002), perceived stress (OR=1.42, 95% CI=1.21~1.65, p<.001), and anxiety (OR=1.03, 95% CI=1.01~1.04, p<.001) were also statistically significant predictors of suicide attempts.
By contrast, sexual intercourse experience (OR=1.17, 95% CI=0.93~1.45, p=.188) and experiences of loneliness (OR=1.23, 95% CI=0.94~1.62, p=.130) were not statistically significant, as the 95% CIs for these variables included 1.
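The reported odds ratios can be related back to the logistic model with a short Python sketch. It recovers the coefficient as the natural logarithm of the OR and approximates the standard error from the width of the 95% CI. This assumes Wald-type intervals that are symmetric on the log-odds scale, which the text does not state. It also shows how a per-point OR for a continuous predictor scales over several points. The function names are hypothetical, and the input values are those reported above.

```python
import math


def log_odds_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Return (coefficient, approximate standard error) on the log-odds scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def ci_excludes_one(ci_low, ci_high):
    """An OR differs from 1 at the 5% level when its 95% CI does not contain 1."""
    return ci_low > 1.0 or ci_high < 1.0


def or_for_change(odds_ratio_per_unit, units):
    """Odds ratio implied by a change of `units` in a continuous predictor."""
    return odds_ratio_per_unit ** units


beta_plan, se_plan = log_odds_from_or(5.74, 4.95, 6.66)  # suicidal planning
print(beta_plan, se_plan)

print(ci_excludes_one(4.95, 6.66))  # suicidal planning
print(ci_excludes_one(0.94, 1.62))  # experiences of loneliness

# Anxiety: OR=1.03 per GAD-7 point, so a 10-point difference implies 1.03**10.
print(or_for_change(1.03, 10))
```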

DISCUSSION

This study aimed to identify factors that contribute to suicide attempts among adolescents with suicidal ideation and explore the optimal predictive approach by comparing the performance of various machine-learning models. The analysis revealed significant differences by suicide-attempt status in academic achievement, economic status, type of residence, and perceived health status. Unlike prior research, which has typically examined general adolescent populations and identified depression, anxiety, stress, and the lack of social support as primary correlates of suicidal ideation [16,17], the present study restricted the sample to adolescents who reported suicidal ideation and identified distinct factors associated with its transition to actual suicide attempts.
To predict suicide attempts among adolescents with suicidal ideation, we compared logistic regression, random forest, and k-nearest neighbors (KNN), and evaluated performance using AUC, CA, F1, precision, recall, and MCC. The logistic regression model correctly classified approximately 83.7% of cases, the highest predictive performance among the models. Its AUC of 0.77 and F1 score of 0.80 indicated robust discriminative performance and a favorable balance between precision and recall. Notably, high scores for precision (80.3%) and recall (83.7%) enabled the effective capture of the at-risk group while minimizing false positives; this aligns with prior findings that higher precision and recall can reduce false positives [18]. Additionally, the highest MCC value of 0.24 suggests relatively stable predictive performance for the logistic regression model even in the presence of class imbalance [19].
These results indicate that logistic regression, a traditional statistical method, is useful because of its straightforward interpretability and predictive performance. The random forest model's accuracy of approximately 82.5% indicates comparatively strong predictive performance; however, with an AUC of 0.71, its discriminative performance was somewhat lower than that of logistic regression. Nevertheless, its recall of 82.5% indicates a strength in minimizing false negatives and avoiding missed at-risk cases [20]. Notably, variable-importance analysis using the Gini index identified suicidal planning, loneliness, generalized anxiety, drug use, and exposure to violence as key risk factors. Accordingly, random forest may serve as a useful analytical tool for exploring risk factors and identifying key variables [21].
Although the KNN model achieved an accuracy of approximately 82.3%, apparently comparable to that of the other models, its AUC of 0.66 and MCC of 0.19 indicate the weakest performance in terms of predictive stability and class balance. These results likely stem from algorithmic characteristics of KNN, namely its sensitivity to sample size and class imbalance [22]. In this study, the data were survey-based and restricted to adolescents with suicidal ideation; the suicide-attempt group was relatively small, and most variables were categorical or binary. This data structure likely disadvantaged the KNN algorithm, which computes similarity based on distance, and led to reduced performance [22]. Therefore, although KNN showed limited performance in the context of this study, it may still serve as a useful auxiliary classifier when sample sizes are adequate and class distributions are balanced, or when the goal is to detect local patterns.
Taken together, all three models achieved accuracies above 80%; however, logistic regression was the best-performing model in terms of both accuracy and predictive stability. By contrast, random forest was useful for exploring key risk factors for suicide attempts through variable-importance analysis, and despite its limited overall performance, KNN was informative for identifying local patterns within groups. These findings suggest that, in predicting adolescent suicide attempts, model choice may vary depending on the objective: optimizing predictive performance versus exploring risk factors. It is also important to note that the comparison of machine-learning models in this study was conducted for descriptive and exploratory purposes within a predictive analytics framework, rather than for formal statistical significance testing of performance differences. Therefore, differences in AUC, F1, and other performance metrics should not be interpreted as statistically significant, and the results should be understood as comparative indicators to support model selection.
By integrating the machine-learning variable-importance analysis with the logistic regression results, this study identified suicidal planning as the strongest predictor of suicide attempts. This finding is consistent with those of Klonsky and May [7] and reaffirms that suicidal planning is key to separating mere ideation from suicidal action. A suicidal plan implies that concrete feasibility has already been considered; at this point, suicide risk escalates sharply, indicating that suicidal planning should be prioritized as a target for assessment in prevention efforts.
Further, drug use and exposure to violence emerged as important predictors. This aligns with the finding of Ballabrera et al. [23], who contend that substance use and exposure to violence during adolescence exacerbate impulsivity, diminish self-control, and heighten emotional instability, thereby increasing the risk of suicidal behavior. Alcohol and smoking were also identified as risk factors; alcohol interacts with impulsivity to directly precipitate suicidal urges and behaviors [24], whereas smoking, when coupled with risk behaviors within peer groups, has been reported to indirectly increase suicide risk [25].
Among emotional factors, experiences of sadness and despair and generalized anxiety were significantly associated with suicide attempts. This aligns with prior findings that depression and anxiety constitute core psychopathological underpinnings of adolescent suicide attempts [26]. Notably, hopelessness—when combined with negative expectations about the future—substantially increases the risk of suicidal ideation or attempts [27]. In contrast, generalized anxiety, although its odds ratio is relatively modest, may heighten suicide risk through a cascade of chronic anxiety leading to sleep deprivation and heightened emotional reactivity, ultimately precipitating interpersonal difficulties [28].
Meanwhile, some variables exhibited high importance in the machine-learning analyses but were not statistically significant in the logistic regression analysis. Sexual intercourse experience was not significant in the logistic regression analysis but ranked higher in the Gini-index-based variable-importance analysis. This may be because early sexual debut often co-occurs with other risk behaviors, such as alcohol use, smoking, and drug use [29]. Therefore, in machine learning, in which complex interactions are captured, the influence of sexual debut tends to be weighted more heavily.
Loneliness was likewise not significant in the regression analysis but emerged as a major factor in the random forest analysis. Because adolescence is a period of heightened sensitivity to peer relationships, social isolation and loss of belonging can be directly linked to suicidal urges [30]. Moreover, correlations with hopelessness and anxiety may have attenuated its independent effect in the regression analysis. By contrast, machine learning is less sensitive to inter-variable correlations, so the extent to which loneliness contributed to classification accuracy appears to have been directly captured. Therefore, it would be identified as an important variable.
These findings exemplify distinctions between traditional regression models and machine-learning approaches, suggesting that considering multiple complex variables simultaneously, rather than individually, leads to clearer identification of risk factors. Besides confirming substance use, violence, emotional distress, and anxiety as core factors related to suicide planning, this study also identified sexual experience and loneliness, factors that were less evident in traditional statistical analyses but emerged as important through machine learning. This holds academic significance in elucidating the multidimensional and interactive nature of adolescent suicide attempts.
Furthermore, this study demonstrated that only basic survey data can be used to predict suicide-attempt risk at an early stage, highlighting the potential applicability of screening tools in school and counseling contexts. Specifically, an early-detection system in which suicide planning is employed as a core indicator should be developed. Further, preventive education should address risky behaviors such as substance use, alcohol consumption, smoking, and exposure to violence. Accessibility to counseling and treatment for mental health problems including depression and anxiety should be enhanced. Socioeconomic factors such as academic performance and financial status should be incorporated into multidimensional support.
Beyond the comparative evaluation of model performance, the findings of this study provide important implications for psychiatric nursing practice and community-based mental health services. In particular, school settings, community mental-health centers, and public health offices could utilize machine-learning-based screening approaches to identify adolescents at high risk for progressing from suicidal ideation to suicide attempts. The model developed in this study relies solely on brief self-report survey items and therefore may serve as an accessible decision-support tool for early detection and triage in educational and counseling contexts. Moreover, the results suggest a need to strengthen suicide-prevention education in psychiatric and community mental-health nursing curricula by incorporating training on risk assessment, digital mental-health screening, and the interpretation of machine-learning output. Future research should expand the development and validation of nursing interventions tailored to key determinants identified in this study—such as suicidal planning, substance use, exposure to violence, and emotional distress—and evaluate the effectiveness of integrating predictive analytics into adolescent suicideprevention programs. Based on the key predictors identified in this study—including suicidal planning, substance use, exposure to violence, and emotional distress—psychiatric and community mental-health nursing interventions may incorporate structured suicide safety planning, school -based substance-use prevention and peer-risk education, trauma-informed care approaches, and cognitive-behavioral strategies to reduce anxiety and hopelessness. Integrating these interventions into school and community nursing protocols may strengthen early detection and targeted suicide-prevention efforts for adolescents with suicidal ideation.
Despite its strengths, this study has several limitations. First, the cross-sectional design makes it difficult to establish causal relationships, and the findings cannot be generalized to all adolescents because the analysis was restricted to those who reported suicidal ideation. In addition, because participants with missing values were excluded using a complete-case approach, the final analytic sample may differ systematically from the original population, which could introduce selection bias and potentially lead to underestimation of suicide-attempt risk. Furthermore, the results should be interpreted in light of potential bias from the self-reported nature of the data and the absence of complex sampling weights, which may limit population-level generalizability and introduce sampling error. Moreover, some key variables were measured using single-item self-report questions, which may limit measurement precision. Therefore, future research should employ longitudinal designs to elucidate the transition from suicidal ideation to planning and attempts, and should develop and validate intervention programs based on key variables. Future studies should also examine the mediating and moderating effects between suicide planning and other risk factors, and apply advanced machine-learning techniques to identify major determinants, enhance predictive performance, and ultimately use such models for adolescent mental-health promotion.

CONCLUSION

Using data from the 20th KYRBS, this study identified factors associated with the transition from suicidal ideation to suicide attempts among adolescents and evaluated machine-learning models to determine the best predictive approach. Suicide planning was identified as the strongest risk factor, followed by substance use, exposure to violence, smoking, feelings of sadness and hopelessness, alcohol use, and generalized anxiety. Additionally, random forest analysis revealed loneliness and sexual experience as meaningful predictors that may not be fully captured through traditional statistical methods.
These findings highlight the potential applicability of school- and community-based early screening systems that use brief self-report items to identify adolescents at elevated risk of attempting suicide. Suicide planning, as the most salient factor, should be prioritized as a core indicator during counseling and crisis-intervention assessments. Moreover, prevention programs should include targeted education addressing substance use, smoking, and violence exposure; emotional regulation strategies for sadness, hopelessness, and anxiety; and social-connectedness interventions to reduce loneliness. School nurses, mental-health practitioners, and community agencies may incorporate machine-learning-assisted decision-support tools to improve risk detection and triage.
Future research should continue to refine predictive models and develop tailored interventions aligned with identified risk factors, including programs that combine psychoeducation, peer-support systems, digital mental-health services, and protective-factor enhancement to strengthen resilience among adolescents with suicidal ideation.

CONFLICTS OF INTEREST

The authors declared no conflicts of interest.

Notes

AUTHOR CONTRIBUTIONS
Conceptualization or/and Methodology: Jang, Y-M & Heo, M-L
Data curation or/and Analysis: Jang, Y-M & Heo, M-L
Funding acquisition: None
Investigation: Jang, Y-M & Heo, M-L
Project administration or/and Supervision: Jang, Y-M & Heo, M-L
Resources or/and Software: Jang, Y-M & Heo, M-L
Validation: Jang, Y-M & Heo, M-L
Visualization: Jang, Y-M
Writing: original draft or/and review & editing: Jang, Y-M & Heo, M-L

Fig. 1.
Variable importance based on Gini index.
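
To make the Gini-based ranking in Fig. 1 more concrete, the following minimal sketch computes the decrease in Gini impurity for a single split, using cell counts reported in Table 2. It is an illustration only: a random forest averages such decreases over many trees and bootstrap samples, so the values here are not the importances plotted in the figure, and the function names are hypothetical.

```python
def gini(pos, neg):
    """Gini impurity of a node with `pos` attempters and `neg` non-attempters."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 1.0 - (p * p + (1.0 - p) * (1.0 - p))


def impurity_decrease(left, right):
    """Decrease in Gini impurity when a parent node is split into two children.

    `left` and `right` are (attempters, non-attempters) tuples.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * gini(*left) + (n_right / n) * gini(*right)
    return parent - children


# Table 2, suicidal planning: yes -> 735 attempts / 1,267 none; no -> 337 / 3,977.
planning_gain = impurity_decrease((735, 1267), (337, 3977))
# Table 2, breakfast consumption: yes -> 775 / 3,863; no -> 297 / 1,381.
breakfast_gain = impurity_decrease((775, 3863), (297, 1381))

print(f"planning: {planning_gain:.4f}  breakfast: {breakfast_gain:.6f}")
```

A split on suicidal planning separates attempters from non-attempters far more cleanly than a split on breakfast consumption, which is the intuition behind a variable ranking high on a Gini-based importance plot.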
Table 1.
Differences in Suicide Attempt According to Participants' Sociodemographic Characteristics (N=6,316)

| Variables | Categories | n (%) or M±SD | Suicide attempt: No, n (%) | Suicide attempt: Yes, n (%) | Suicide attempt rate (%) | χ² or t | p |
|---|---|---|---|---|---|---|---|
| Total | | | 5,244 (83.0) | 1,072 (17.0) | | | |
| Sex | Male | 2,313 (36.6) | 1,915 (30.3) | 398 (6.3) | 17.2 | 0.14 | .076 |
| | Female | 4,003 (63.4) | 3,329 (52.7) | 674 (10.7) | 16.8 | | |
| Type of school | Middle school | 3,680 (58.3) | 3,040 (48.1) | 640 (10.2) | 17.4 | 1.10 | .295 |
| | High school | 2,636 (41.7) | 2,204 (34.9) | 432 (6.8) | 16.4 | | |
| Academic achievement | High | 734 (11.6) | 596 (9.4) | 138 (2.2) | 18.8 | 58.09 | <.001 |
| | Medium | 4,706 (74.5) | 3,995 (63.3) | 711 (11.3) | 15.1 | | |
| | Low | 876 (13.9) | 653 (10.3) | 223 (3.5) | 25.5 | | |
| Economic status | High | 613 (9.7) | 496 (7.9) | 117 (1.8) | 19.1 | 39.03 | <.001 |
| | Medium | 5,445 (86.2) | 4,569 (72.3) | 876 (13.9) | 16.1 | | |
| | Low | 258 (4.1) | 179 (2.8) | 79 (1.3) | 30.6 | | |
| Type of residence | With family | 6,003 (95.0) | 5,000 (79.1) | 1,003 (15.9) | 16.7 | 13.60 | .001 |
| | Other | 283 (4.5) | 226 (3.6) | 57 (0.9) | 20.1 | | |
| | Childcare facility | 30 (0.5) | 18 (0.3) | 12 (0.2) | 40.0 | | |
| Perceived health status | Healthy | 2,895 (45.8) | 2,483 (39.3) | 412 (6.5) | 14.2 | 44.11 | <.001 |
| | Average | 2,045 (32.4) | 1,693 (26.8) | 352 (5.6) | 17.2 | | |
| | Poor | 1,376 (21.8) | 1,068 (16.9) | 308 (4.9) | 22.4 | | |
| Perceived body image | Thin | 1,575 (25.0) | 1,306 (20.7) | 269 (4.3) | 17.1 | 1.24 | .539 |
| | Average | 1,776 (28.1) | 1,489 (23.6) | 287 (4.5) | 16.2 | | |
| | Overweight | 2,965 (46.9) | 2,449 (38.7) | 516 (8.2) | 17.4 | | |
| Age | | 14.77±1.70 | 14.78±1.70 | 14.69±1.68 | | 1.67 | .950 |

M=mean; SD=standard deviation. Other type of residence=living with relatives, boarding, alone, or in a dormitory.

Table 2.
Differences in Suicide Attempt According to Psychological and Behavioral Risk Factors (N=6,316)

| Domain | Variables | Categories | n (%) or M±SD | Suicide attempt: Yes, n (%) or M±SD | Suicide attempt: No, n (%) or M±SD | Suicide attempt rate (%) | χ² or t | p |
|---|---|---|---|---|---|---|---|---|
| Total | | | | 1,072 (17.0) | 5,244 (83.0) | | | |
| Psychological risk factors | Perceived stress | Yes | 6,160 (97.5) | 1,029 (16.3) | 5,131 (81.2) | 27.6 | 12.73 | <.001 |
| | | No | 156 (2.5) | 43 (0.7) | 113 (1.8) | 16.7 | | |
| | Experiences of sadness and despair | Yes | 4,588 (72.2) | 881 (13.9) | 3,677 (58.2) | 19.3 | 64.50 | <.001 |
| | | No | 1,758 (27.8) | 191 (3.1) | 1,567 (24.8) | 10.9 | | |
| | Suicidal planning | Yes | 2,002 (31.7) | 735 (11.6) | 1,267 (20.1) | 36.7 | 810.53 | <.001 |
| | | No | 4,314 (68.3) | 337 (5.3) | 3,977 (63.0) | 7.8 | | |
| | Experiences of loneliness | Yes | 5,625 (89.1) | 996 (15.8) | 4,629 (73.3) | 17.7 | 19.65 | <.001 |
| | | No | 691 (10.9) | 76 (1.2) | 615 (9.7) | 11.0 | | |
| | Exposure to violence | Yes | 360 (5.7) | 129 (2.0) | 231 (3.7) | 35.8 | 96.37 | <.001 |
| | | No | 5,956 (94.3) | 943 (14.9) | 5,013 (79.4) | 15.8 | | |
| | Anxiety | | 9.55±5.67 | 11.38±5.97 | 9.17±5.53 | | -11.18 | <.001 |
| Behavioral risk factors | Breakfast consumption | Yes | 4,638 (73.4) | 775 (12.3) | 3,863 (61.1) | 16.7 | 0.86 | .335 |
| | | No | 1,678 (26.6) | 297 (4.7) | 1,381 (21.9) | 17.7 | | |
| | High-caffeine beverage consumption | Yes | 3,608 (57.1) | 667 (10.6) | 2,941 (46.5) | 18.5 | 13.69 | <.001 |
| | | No | 2,708 (42.9) | 405 (6.4) | 2,303 (36.5) | 15.0 | | |
| | Sugar-sweetened beverage consumption | Yes | 5,982 (94.7) | 1,022 (16.2) | 4,960 (78.5) | 17.1 | 1.00 | .316 |
| | | No | 334 (5.3) | 50 (0.8) | 284 (4.5) | 15.0 | | |
| | Fast food consumption | Yes | 5,331 (84.4) | 886 (14.0) | 4,445 (70.4) | 16.6 | 3.02 | .082 |
| | | No | 985 (15.6) | 186 (2.9) | 799 (12.7) | 18.9 | | |
| | Physical activity | Yes | 4,282 (67.8) | 760 (12.0) | 3,522 (55.8) | 17.7 | 5.68 | .017 |
| | | No | 2,034 (32.2) | 312 (4.9) | 1,722 (27.3) | 15.3 | | |
| | Sedentary time (weekday, academic) | | 466.89±244.55 | 418.65±246.54 | 476.75±242.99 | | 7.12 | <.001 |
| | Sedentary time (weekday, non-academic) | | 210.52±161.35 | 230.05±182.17 | 206.53±156.48 | | -3.94 | <.001 |
| | Sedentary time (weekend, academic) | | 249.83±223.85 | 213.45±211.97 | 257.27±225.50 | | 6.10 | <.001 |
| | Sedentary time (weekend, non-academic) | | 329.06±232.81 | 357.09±255.61 | 323.33±227.46 | | -4.10 | <.001 |
| | Smartphone usage time (weekday) | | 303.72±192.01 | 343.65±219.77 | 295.56±184.78 | | -6.70 | <.001 |
| | Smartphone usage time (weekend) | | 443.31±250.34 | 486.21±278.22 | 434.54±243.35 | | -5.66 | <.001 |
| | Alcohol use | Yes | 2,576 (40.8) | 558 (8.8) | 2,018 (32.0) | 21.7 | 67.87 | <.001 |
| | | No | 3,740 (59.2) | 514 (8.1) | 3,226 (51.1) | 13.7 | | |
| | Smoking experience | Yes | 821 (13.0) | 224 (3.5) | 597 (9.5) | 27.3 | 71.20 | <.001 |
| | | No | 5,495 (87.0) | 848 (13.4) | 4,647 (73.6) | 15.4 | | |
| | Peer smoking status | Yes | 2,689 (42.6) | 548 (8.7) | 2,141 (33.9) | 20.4 | 38.56 | <.001 |
| | | No | 3,627 (57.4) | 524 (8.3) | 3,103 (49.1) | 14.4 | | |
| | Sexual intercourse experience | Yes | 623 (9.9) | 175 (2.8) | 448 (7.1) | 28.1 | 60.62 | <.001 |
| | | No | 5,693 (90.1) | 897 (14.2) | 4,796 (75.9) | 15.8 | | |
| | Habitual drug use experience | Yes | 256 (4.1) | 104 (1.7) | 152 (2.4) | 40.6 | 105.92 | <.001 |
| | | No | 6,060 (95.9) | 968 (15.3) | 5,092 (80.6) | 16.0 | | |

M=mean; SD=standard deviation.
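
The group comparisons in Table 2 can be recomputed from the cell counts. The sketch below assumes that the reported χ² values are uncorrected Pearson statistics (the source does not say whether a continuity correction was applied), and the helper names are hypothetical. It uses the suicidal-planning row as the worked case.

```python
def attempt_rate(attempt, no_attempt):
    """Percentage of a subgroup that reported a suicide attempt."""
    return 100.0 * attempt / (attempt + no_attempt)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denom


# Suicidal planning (Table 2): planning yes -> 735 attempts, 1,267 none;
# planning no -> 337 attempts, 3,977 none.
rate_plan = attempt_rate(735, 1267)
rate_no_plan = attempt_rate(337, 3977)
chi2_planning = pearson_chi2_2x2(735, 1267, 337, 3977)

# Compare with the 36.7%, 7.8%, and 810.53 reported in Table 2.
print(f"{rate_plan:.1f}% vs {rate_no_plan:.1f}%, chi2 = {chi2_planning:.2f}")
```

Under the stated assumption, the recomputed rates and statistic for this row agree closely with the published values, which suggests the table's rate column is the within-category proportion of attempters.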

Table 3.
Comparison of Machine Learning Models (N=6,316)

| Model | AUC | CA | F1 | Precision | Recall | MCC |
|---|---|---|---|---|---|---|
| Logistic regression | 0.77 | 0.84 | 0.80 | 0.80 | 0.84 | 0.24 |
| Random forest | 0.71 | 0.83 | 0.80 | 0.79 | 0.83 | 0.22 |
| KNN | 0.66 | 0.82 | 0.79 | 0.78 | 0.82 | 0.19 |

AUC=area under the curve; CA=classification accuracy; F1=F1 score (harmonic mean of precision and recall); MCC=Matthews correlation coefficient; KNN=k-nearest neighbors.
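
The combination of high accuracy and low MCC in Table 3 is easier to read with the metric definitions in hand. The sketch below computes them from a confusion matrix. The class totals (1,072 attempters, 5,244 non-attempters) come from Table 1, but the split into true and false positives is hypothetical and chosen only for illustration; it is not the study's confusion matrix. In every row of Table 3 the reported recall equals the classification accuracy, which is what class-weighted averaging of recall produces; the source does not state the averaging method, so treating the values as weighted averages is an assumption.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, positive-class precision/recall/F1, class-weighted recall, and MCC."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Recall of the negative class (specificity).
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Each class's recall weighted by that class's share of the sample.
    weighted_recall = ((tp + fn) * recall + (tn + fp) * specificity) / n
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "weighted_recall": weighted_recall,
        "mcc": mcc,
    }


# Hypothetical predictions: 150 of 1,072 attempters flagged, 90 false alarms.
m = binary_metrics(tp=150, fp=90, fn=922, tn=5154)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With an 83/17 class split, a classifier can reach accuracy above 0.8 while detecting only a small share of attempters, which is why MCC and positive-class recall are worth reporting alongside accuracy when a model is considered for screening.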

Table 4.
Logistic Regression Analysis of Psychological and Behavioral Risk Factors Associated With Suicide Attempt (N=6,316)

| Variables | B | S.E. | Wald | df | p | OR | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|
| Suicidal planning | 1.75 | 0.08 | 533.35 | 1 | <.001 | 5.74 | 4.95 | 6.66 |
| Habitual drug use | 0.60 | 0.15 | 15.76 | 1 | <.001 | 1.82 | 1.35 | 2.44 |
| Sexual intercourse experience | 0.15 | 0.12 | 1.73 | 1 | .188 | 1.17 | 0.93 | 1.45 |
| Exposure to violence | 0.61 | 0.13 | 22.04 | 1 | <.001 | 1.84 | 1.43 | 2.38 |
| Peer smoking status | 0.25 | 0.11 | 5.40 | 1 | .020 | 1.29 | 1.04 | 1.59 |
| Experiences of sadness and despair | 0.29 | 0.09 | 9.19 | 1 | .002 | 1.33 | 1.11 | 1.60 |
| Perceived stress | 0.35 | 0.08 | 19.61 | 1 | <.001 | 1.42 | 1.21 | 1.65 |
| Experiences of loneliness | 0.21 | 0.14 | 2.30 | 1 | .130 | 1.23 | 0.94 | 1.62 |
| Anxiety | 0.03 | 0.01 | 14.64 | 1 | <.001 | 1.03 | 1.01 | 1.04 |
| Constant | -7.56 | 0.38 | 394.42 | 1 | <.001 | 0.00 | | |

OR=odds ratio; CI=confidence intervals.
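
The odds ratios in Table 4 follow from the coefficients by exponentiation: OR = exp(B), with a Wald-type 95% CI of exp(B ± 1.96 × S.E.). The sketch below applies this to the suicidal-planning row. The function name is hypothetical, and because B and S.E. are printed to two decimals, the recomputed values differ slightly from the published ones (OR 5.74, 95% CI 4.95 to 6.66).

```python
from math import exp


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (OR, lower, upper) for a logistic coefficient and its standard error.

    z=1.96 gives an approximate 95% Wald confidence interval.
    """
    return exp(b), exp(b - z * se), exp(b + z * se)


# Suicidal planning row of Table 4, using the rounded B and S.E. as printed.
or_plan, lo_plan, hi_plan = odds_ratio_with_ci(1.75, 0.08)
print(f"OR = {or_plan:.2f}, 95% CI {lo_plan:.2f} to {hi_plan:.2f}")
```

An interval that excludes 1, as it does for suicidal planning, corresponds to p < .05 for that coefficient; for loneliness and sexual intercourse experience the published intervals include 1, matching their non-significant p values.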

REFERENCES

1. World Health Organization. Suicide worldwide in 2021: global health estimates. World Health Organization [Internet]. 2021 [cited 2025 Sep 10]. Available from: https://www.who.int/publications/i/item/9789240110069

2. Korea Disease Control and Prevention Agency (KDCA). The 20th Korea youth risk behavior survey. Korea Disease Control and Prevention Agency [Internet]. 2024 [cited 2025 Sep 10]. Available from: https://www.kdca.go.kr/yhs/

3. Mitchell RHB, Kozloff N, Sanches M, Goldstein BI, Amini J, Bridge JA, et al. Sex differences in suicide trends among adolescents aged 10 to 14 years in Canada. The Canadian Journal of Psychiatry. 2023;68(7):547-549. https://doi.org/10.1177/07067437231173370
4. Cantor N, Kingsbury M, Warner E, Landry H, Clayborne Z, Islam R, et al. Young adult outcomes associated with adolescent suicidality: a meta-analysis. Pediatrics. 2023;151(3):e2022058113 https://doi.org/10.1542/peds.2022-058113
5. De Filippi M, Rignanese M, Salmè E, Madeddu F, Calati R. The relationship between physical pain and suicidal thoughts and behaviors in adolescents: a meta-analysis. European Psychiatry. 2021;64(S1):S580 https://doi.org/10.1192/j.eurpsy.2021.1548
6. Peprah P, Asare BYA, Okwei R, Agyemang-Duah W, Osafo J, Kretchy IA, et al. A moderated mediation analysis of the association between smoking and suicide attempts among adolescents in 28 countries. Scientific Reports. 2023;13(5755):13 https://doi.org/10.1038/s41598-023-32610-8
7. Klonsky ED, May AM. The three-step theory (3ST): a new theory of suicide rooted in the "ideation-to-action" framework. International Journal of Cognitive Therapy. 2015;8(2):114-129. https://doi.org/10.1521/ijct.2015.8.2.114
8. Kirshenbaum JS, Pagliaccio D, Bitran A, Xu E, Auerbach RP. Why do adolescents attempt suicide? insights from leading ideation-to-action suicide theories: a systematic review. Translational Psychiatry. 2024;14: 266 https://doi.org/10.1038/s41398-024-02914-y
9. Raju GSB, Manasa C, Bhavani ND, Amulya J, Shirisha D. Comparative analysis of different machine learning algorithms on different datasets. 2023 7th International Conference on Intelligent Computing and Control Systems; 2023 May 17-19; Karpagam College of Engineering, Coimbatore, India. Coimbatore: IEEE; 2023. p. 104-109. https://doi.org/10.1109/ICICCS56967.2023.10142906

10. Somé NH, Noormohammadpour P, Lange S. The use of machine learning on administrative and survey data to predict suicidal thoughts and behaviors: a systematic review. Frontiers in Psychiatry. 2024;15: 1291362 https://doi.org/10.3389/fpsyt.2024.1291362
11. Spitzer RL, Kroenke K, Williams JBW, Lowe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine. 2006;166(10):1092-1097. https://doi.org/10.1001/archinte.166.10.1092
12. Seo JG, Cho YW, Lee SJ, Lee JJ, Kim JE, Moon HJ, et al. Validation of the generalized anxiety disorder-7 in people with epilepsy: a MEPSY study. Epilepsy & Behavior. 2014;35: 59-63. https://doi.org/10.1016/j.yebeh.2014.04.005
13. Kim YR. Predictive data analysis of teacher's satisfaction with teaching using machine learning model [master's thesis]. [Incheon]: Gyeongin National University of Education; 2024. 83 p

14. Park HS. Machine learning and deep learning self-study. Seoul: Hanbit Media; 2025. p. 120-282

15. Kumar S, Gota V. Logistic regression in cancer research: a narrative review of the concept, analysis, and interpretation. Cancer Research, Statistics, and Treatment. 2023;6(4):573-578. https://doi.org/10.4103/crst.crst_293_23
16. Dwi Windarwati H, Lestari R, Agung Wicaksono S, Wahyu Kusumawati M, Asih Laras Ati N, Khaqul Ilmy S. Relationship between stress, anxiety, and depression with suicidal ideation in adolescents. 2022;17(1):36-41. https://doi.org/10.20473/jn.v17i1.31216
17. Wu Y, Guo Z, Zhang D, Wang Y, Wang S. Sleep quality and suicidal ideation in adolescent depression: a chain mediation effect of perceived social support and resilience. Clinical Psychology & Psychotherapy. 2024;31(2):e2990 https://doi.org/10.1002/cpp.2990
18. Allam H, Davison C, Kalota F, Lazaros E, Hua D. AI-driven mental health surveillance: identifying suicidal ideation through machine learning techniques. Big Data and Cognitive Computing. 2025;9(1):16 https://doi.org/10.3390/bdcc9010016
19. Chicco D, Warrens MJ, Jurman G. The Matthews correlation coefficient (MCC) is more informative than Cohen's kappa and Brier score in binary classification assessment. IEEE Access. 2021;9: 78368-78381. https://doi.org/10.1109/ACCESS.2021.3084050
20. Singh S. Emphasis on the minimization of false negatives or false positives in binary classification. arXiv. 2022;Forthcoming. https://doi.org/10.48550/arXiv.2204.02526
21. Gwetu MV, Tapamo JR, Viriri S. Exploring the impact of purity gap gain on the efficiency and effectiveness of random forest feature selection. International Conference on Computational Collective Intelligence (ICCCI 2019); 2019 September 4-6; Hendaye, France. Cham: Springer; 2019. p. 340-352. https://doi.org/10.1007/978-3-030-28377-3_28

22. Appana DK, Islam MR, Kim J. Reliable fault diagnosis of bearings using distance and density similarity on an enhanced k-NN. In: Australasian conference on artificial life and computational intelligence; 2016 February 2-5; Canberra, Australia. Cham: Springer; 2016. p. 193-203. https://doi.org/10.1007/978-3-319-51691-2_17

23. Ballabrera Q, Gómez-Romero MJ, Chamarro A, Limonero JT. The relationship between suicidal behavior and perceived stress: the role of cognitive emotional regulation and problematic alcohol use in Spanish adolescents. Journal of Health Psychology. 2023;29(9):950-962. https://doi.org/10.1177/13591053231207295
24. Stephenson M, Lannoy S, Edwards AC. Shared genetic liability for alcohol consumption, alcohol problems, and suicide attempt: evaluating the role of impulsivity. Translational Psychiatry. 2023;13: 87 https://doi.org/10.1038/s41398-023-02389-3
25. Chen M, Wang X, Tan DS, Wang H, Guo J, Li J, et al. Tobacco and alcohol use; suicide ideation, plan, and attempt among adolescents; and the role of legal purchase age restrictions: a pooled population-based analysis from 58 countries. BMC Medicine. 2025;23: 163 https://doi.org/10.1186/s12916-025-03983-6
26. Demirdöğen EY, Akıncı MA, Bozkurt A, Dağcı H. Suicidal attempt in adolescents with major depressive disorder. Namık Kemal Medical Journal. 2023;11(3):294-300. https://doi.org/10.4274/nkmj.galenos.2023.42204
27. Li Y, Kwok SY. A longitudinal network analysis of the interactions of risk and protective factors for suicidal potential in early adolescents. Journal of Youth and Adolescence. 2023;52(2):306-318. https://doi.org/10.1007/s10964-022-01698-y
28. Hamilton JL, Tsypes A, Zelazny J, Sewall CJR, Rode N, Merranko J, et al. Sleep influences daily suicidal ideation through affective reactivity to interpersonal events among high-risk adolescents and young adults. Journal of Child Psychology and Psychiatry. 2023;64(1):27-38. https://doi.org/10.1111/jcpp.13651
29. Bana BD, Kim JJ, Tamanal JM, Kim SH. Sexual experience, suicidal behaviors and depression association, and its tendency to lead to smoking and alcohol consumption among Korean adolescents. Asian Journal of Humanities and Social Studies. 2021;9(4):160-169. https://doi.org/10.24203/ajhss.v9i4.6744
30. Hutchinson EA, Sequeira SL, Silk JS, Jones NP, Oppenheimer C, Scott L, et al. Peer connectedness and pre-existing social reward processing predicts US adolescent girls' suicidal ideation during COVID-19. Journal of Research on Adolescence. 2021;31(3):703-716. https://doi.org/10.1111/jora.12652


KPMHN
Editorial Office
20 Gunji-ro, Deokjin-gu, Jeonju-si, Jeollabuk-do, 54896 College of Nursing, Jeonbuk National University, Republic of Korea
E-mail: kpmhn0@gmail.com (Editorial office), daek1009@jbnu.ac.kr (Managing Editor)                

Copyright © 2026 by The Korean Academy of Psychiatric and Mental Health Nursing.
