INTRODUCTION
Adolescent suicide represents a serious global public-health concern. According to the World Health Organization [1], approximately 727,000 individuals die by suicide each year, and suicide ranks as the third leading cause of death among those aged 15 to 29. In Korea, a 2024 survey by the Korea Disease Control and Prevention Agency (KDCA) reported a high prevalence of depressive symptoms and suicidal ideation among adolescents [2]. Mitchell et al. [3] reported that adolescents are highly susceptible to peer influence and that emotional and behavioral phenomena, including depression, anxiety, self-harm, and suicide, spread through peer networks. This susceptibility underscores the need for early intervention [3].
Previous studies of factors associated with adolescent suicide have reported associations with various psychosocial variables, including depression, stress, sleep, academic burden, family environment, and peer relationships [4-6]. However, because most studies have relied on cross-sectional designs in general adolescent samples or have examined only single factors, they could not precisely identify the high-risk individuals who attempt suicide. Specifically, variable-centric approaches do not capture complex interactions among factors and are thus too limited to predict suicide attempts.
Although suicidal ideation is the strongest predictor of suicide attempts, few studies have explored ideation-to-attempt transition factors among adolescents who report such ideation. Klonsky and May [7] emphasized that suicidal ideation and suicidal action are distinct processes and proposed the Ideation-to-Action Framework. This model posits that although many adolescents contemplate suicide, only a limited proportion of them proceed to actual attempts, and that additional factors (such as suicidal planning, impulsivity, distress tolerance, and environmental-risk factors) intervene in the transition from ideation to action [8]. Therefore, studies focusing on adolescents with suicidal ideation should identify factors that facilitate the transition from ideation to attempts and provide evidence to prevent this progression. Adolescents with suicidal ideation represent a distinct high-risk group, often exhibiting greater psychological distress, interpersonal difficulties, and risk behaviors than the general adolescent population, including heightened susceptibility to peer influence and emotional contagion within social networks [3]. Furthermore, emerging suicide theories emphasize that suicidal ideation and suicide attempts are not linear outcomes of the same process, but rather distinct stages in which additional factors (such as suicidal planning, impulsivity, and reduced distress tolerance) facilitate the transition from ideation to action [8]. Therefore, examining sociodemographic, psychological, and behavioral determinants within this subgroup is essential for developing targeted suicide-prevention strategies, particularly in mental-health and school-based nursing practice.
Accordingly, this study employs machine-learning methods to comprehensively capture interactions among variables rather than relying solely on traditional statistical approaches. Machine learning encompasses various algorithms, and each method exhibits different strengths and limitations depending on data characteristics. Consequently, rather than limiting analysis to a single method, diverse models should be applied and their performance compared [9]. Machine learning can be used to identify key factors within complex data structures and offers practical utility through early prediction of suicide-attempt risk using simple survey responses alone [10].
This study extends previous research by applying a machine-learning approach within a nursing science perspective, clearly distinguishing itself from prior studies that primarily relied on traditional statistical analyses. In addition, it incorporates an integrated framework reflecting sociodemographic, psychological, and behavioral determinants relevant to adolescent mental-health nursing, moving beyond variable-centric analyses. We applied multiple machine-learning models and compared their predictive performance to identify the optimal model for predicting adolescent suicide attempts. Accordingly, this study used the nationally representative, regularly administered, large-scale dataset from the 20th Korea Youth Risk Behavior Survey (KYRBS) 2024 [2] to compare and analyze machine-learning-based predictive models among adolescents who reported suicidal ideation. By identifying key predictors and model performance, our findings may support early clinical screening and risk stratification in adolescent mental-health nursing practice and inform the development of tailored intervention strategies for suicide prevention.
RESULTS
1. Suicide Attempts by Sociodemographic Variables
Table 1 shows the results of the analysis of sociodemographic variables. The sample comprised 2,313 males (36.6%) and 4,003 females (63.4%). The type of school was middle school for 3,680 participants (58.3%) and high school for 2,636 (41.7%). Academic achievement was high in 734 (11.6%), medium in 4,706 (74.5%), and low in 876 (13.9%). Economic status was high in 613 (9.7%), medium in 5,445 (86.2%), and low in 258 (4.1%). Regarding the type of residence, living with family was most common (6,003; 95.0%), followed by other arrangements (e.g., living with relatives, boarding, living alone, or in a dormitory) in 283 (4.5%) and living in a childcare facility in 30 (0.5%). Regarding perceived health status, 2,895 participants (45.8%) rated themselves as healthy, 2,045 (32.4%) as average, and 1,376 (21.8%) as poor. The mean age was 14.77±1.70 years.
When suicide attempts were analyzed by sociodemographic variables, statistically significant differences were observed for academic achievement (χ²=58.09, p<.001), economic status (χ²=39.03, p<.001), type of residence (χ²=13.60, p=.001), and perceived health status (χ²=44.11, p<.001), whereas no significant differences were found for sex (χ²=0.14, p=.076), type of school (χ²=1.10, p=.295), perceived body image (χ²=1.24, p=.539), or age (t=1.67, p=.950).
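The χ² statistics above come from contingency-table tests of each categorical variable against suicide-attempt status. As a minimal sketch of that computation, Pearson's statistic (without continuity correction) can be obtained as shown below. The counts are hypothetical and are not taken from Table 1, and the function names are illustrative only.

```python
from math import erfc, sqrt

def pearson_chi2(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_dof1(chi2):
    """Upper-tail p-value, valid only for 1 degree of freedom (2 x 2 tables)."""
    return erfc(sqrt(chi2 / 2.0))

# Hypothetical 2 x 2 table: rows = exposed / not exposed, columns = attempt / no attempt
table = [[30, 70], [20, 180]]
chi2, dof = pearson_chi2(table)
p = p_value_dof1(chi2)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.5f}")
```

For tables with more than one degree of freedom (for example, the three-level academic achievement variable), the p-value would need the general χ² distribution, which a statistics package provides.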
2. Suicide Attempts by Psychological and Behavioral Risk Factors
Findings of the analysis of variables related to psychological and behavioral risk factors are presented in Table 2.
When suicide attempts were analyzed by psychological risk factor variables, statistically significant differences were observed for perceived stress (χ²=12.73, p<.001), experiences of sadness and despair (χ²=64.50, p<.001), suicidal planning (χ²=810.53, p<.001), experiences of loneliness (χ²=19.65, p<.001), exposure to violence (χ²=96.37, p<.001), and anxiety (t=-11.18, p<.001).
When suicide attempts were analyzed by behavioral risk factor variables, statistically significant differences were observed for high-caffeine beverage consumption (χ²=13.69, p<.001), physical activity (χ²=5.68, p=.017), sedentary time (weekday, academic) (t=7.12, p<.001), sedentary time (weekday, non-academic) (t=-3.94, p<.001), sedentary time (weekend, academic) (t=6.10, p<.001), sedentary time (weekend, non-academic) (t=-4.10, p<.001), smartphone usage time (weekday) (t=-6.70, p<.001), smartphone usage time (weekend) (t=-5.66, p<.001), alcohol use (χ²=67.87, p<.001), smoking experience (χ²=71.20, p<.001), peer smoking status (χ²=38.56, p<.001), sexual intercourse experience (χ²=60.62, p<.001), and habitual drug use (χ²=105.92, p<.001).
By contrast, no statistically significant differences were found for consumption of breakfast (χ²=0.86, p=.335), sugar-sweetened beverages (χ²=1.00, p=.316), or fast food (χ²=3.02, p=.082).
3. Comparison of Machine Learning Models
To compare the performance of the models, we evaluated them using the area under the ROC curve (AUC), classification accuracy (CA), the F1 score, precision, recall, and the Matthews correlation coefficient (MCC). Results of the performance comparison are presented in Table 3.
The logistic regression model achieved an AUC of 0.77, a CA of 0.84, an F1 of 0.80, a precision of 0.80, a recall of 0.84, and an MCC of 0.24, the highest or tied-highest value on every metric. The random forest model yielded an AUC of 0.71, a CA of 0.83, an F1 of 0.80, a precision of 0.79, a recall of 0.83, and an MCC of 0.22. The k-nearest neighbors (KNN) model showed an AUC of 0.66, a CA of 0.82, an F1 of 0.79, a precision of 0.78, a recall of 0.82, and an MCC of 0.19. Overall, logistic regression demonstrated the strongest predictive performance, with the highest AUC, classification accuracy, recall, and precision among the three models. Although its F1 score was equal to that of the random forest model, logistic regression achieved a higher MCC, reflecting more balanced and reliable performance.
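For readers less familiar with these metrics, the sketch below derives CA, precision, recall, F1, and MCC from a binary confusion matrix. It is an illustration only: the counts are hypothetical, the function name is an assumption, and the values are computed for the positive class alone, whereas this section does not state whether the reported precision and recall are class-specific or averaged across classes. AUC is omitted because it requires predicted scores rather than counts.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative when classes are imbalanced
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"CA": accuracy, "precision": precision,
            "recall": recall, "F1": f1, "MCC": mcc}

# Hypothetical counts (not the study's confusion matrix)
metrics = binary_metrics(tp=40, fp=10, fn=60, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```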
Additionally, Figure 1 shows the results of the variable importance analysis based on the Gini index. The top variables (Gini index), in descending order, were suicidal planning (0.036), experiences of loneliness (0.006), anxiety (0.006), habitual-drug use experience (0.005), exposure to violence (0.005), perceived stress (0.004), sexual intercourse experience (0.003), smoking experience (0.003), experiences of sadness and despair (0.003), and alcohol use (0.003).
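To illustrate what a Gini-based importance score measures, the following sketch computes the decrease in Gini impurity obtained by splitting a sample on one binary variable. This is a general illustration under stated assumptions: the counts are hypothetical, the function names are illustrative, and the section does not specify how the scores in Figure 1 were computed or normalized, so the numbers here are not comparable to those values.

```python
def gini_impurity(pos, neg):
    """Gini impurity of a node with `pos` positive and `neg` negative cases."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes

def gini_decrease(left, right):
    """Impurity decrease when a parent node is split into `left` and `right`.

    Each argument is a (pos, neg) tuple of counts in that child node.
    """
    parent = (left[0] + right[0], left[1] + right[1])
    n = sum(parent)
    weighted_children = (sum(left) / n) * gini_impurity(*left) \
        + (sum(right) / n) * gini_impurity(*right)
    return gini_impurity(*parent) - weighted_children

# Hypothetical split on a binary variable such as "had a plan" (yes / no)
informative = gini_decrease(left=(60, 40), right=(40, 360))
# Hypothetical split in which both children keep the parent's 20% attempt rate
uninformative = gini_decrease(left=(20, 80), right=(80, 320))
print(f"informative split: {informative:.3f}, uninformative split: {uninformative:.3f}")
```

A variable that separates attempters from non-attempters yields a larger decrease, while a variable unrelated to the outcome yields a decrease near zero.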
4. Logistic Regression Analysis
Table 4 shows the results of a binary logistic regression model fitted with the top nine variables from the Gini index-based variable-importance analysis to identify factors associated with suicide attempts among adolescents.
In the logistic regression analysis, suicidal planning (odds ratio [OR]=5.74, 95% confidence interval [CI]=4.95~6.66, p<.001) was associated with approximately 5.7-fold higher odds of a suicide attempt, and habitual-drug use experience with approximately 1.8-fold higher odds (OR=1.82, 95% CI=1.35~2.44, p<.001). Further, exposure to violence (OR=1.84, 95% CI=1.43~2.38, p<.001), peer smoking status (OR=1.29, 95% CI=1.04~1.60, p=.020), experiences of sadness and despair (OR=1.33, 95% CI=1.11~1.60, p=.002), perceived stress (OR=1.42, 95% CI=1.21~1.65, p<.001), and anxiety (OR=1.03, 95% CI=1.01~1.04, p<.001) were also statistically significant predictors of suicide attempts.
By contrast, sexual intercourse experience (OR=1.17, 95% CI=0.93~1.45, p=.188) and experiences of loneliness (OR=1.23, 95% CI=0.94~1.62, p=.130) were not statistically significant, as the 95% CIs for these variables included 1.
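The relationship between a logistic regression coefficient, its odds ratio, and the confidence interval can be sketched as follows. This assumes Wald-type intervals computed on the log-odds scale, which is common practice but is not stated in this section. The coefficient and standard error are back-calculated from the reported OR and CI for suicidal planning purely for illustration, and the function names are hypothetical.

```python
from math import exp, log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def beta_se_from_or(odds_ratio, lower, upper, z=Z_95):
    """Back out the log-odds coefficient and its standard error from an OR and CI."""
    beta = log(odds_ratio)
    se = (log(upper) - log(lower)) / (2 * z)
    return beta, se

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a coefficient, assuming a Wald interval."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

def odds_multiplier(or_per_unit, units):
    """Multiplicative change in odds for a `units`-point rise in a continuous predictor."""
    return or_per_unit ** units

# Reported values for suicidal planning: OR=5.74, 95% CI=4.95~6.66
beta, se = beta_se_from_or(5.74, 4.95, 6.66)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}, OR={or_:.2f}, CI={lo:.2f}~{hi:.2f}")

# Anxiety is reported per unit (OR=1.03); a 10-point difference compounds
ten_point = odds_multiplier(1.03, 10)
print(f"odds multiplier for a 10-point anxiety difference: {ten_point:.2f}")
```

The last step shows why a per-unit OR close to 1 for a continuous score is not directly comparable in magnitude to the ORs of binary predictors.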
DISCUSSION
This study aimed to identify factors that contribute to suicide attempts among adolescents with suicidal ideation and explore the optimal predictive approach by comparing the performance of various machine-learning models. The analysis revealed significant differences by suicide-attempt status in academic achievement, economic status, the type of residence, and the perceived health status. Unlike prior research, which has typically examined general adolescent populations and identified depression, anxiety, stress, and the lack of social support as primary correlates of suicidal ideation [16, 17], the present study restricted the sample to adolescents who reported suicidal ideation and identified distinct factors associated with its transition to actual suicide attempts.
To predict suicide attempts among adolescents with suicidal ideation, we compared logistic regression, random forest, and k-nearest neighbors (KNN), and evaluated performance using AUC, classification accuracy (CA), F1, precision, recall, and MCC. The logistic regression model correctly classified approximately 83.7% of cases, demonstrating the highest predictive performance among the models. The AUC and F1 scores (0.77 and 0.80, respectively) demonstrated robust discriminative performance and a favorable balance between precision and recall. Notably, high scores for precision (80.3%) and recall (83.7%) enabled the effective capture of the at-risk group while minimizing false positives; this aligns with prior findings that higher precision and recall can reduce false positives [18]. Additionally, the highest MCC value of 0.24 suggests relatively stable predictive performance for the logistic regression model even in the presence of class imbalance [19].
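Why MCC is informative under class imbalance can be seen in a small sketch: two classifiers with identical accuracy can differ sharply in MCC. The labels and predictions below are hypothetical toy data, not the study's, and the helper name is illustrative.

```python
from math import sqrt

def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC) for binary labels coded 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when a row or column of the matrix is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical imbalanced sample: 15 positives, 85 negatives
y_true = [1] * 15 + [0] * 85

# Classifier A always predicts the majority class
pred_a = [0] * 100
# Classifier B finds 5 of the 15 positives at the cost of 5 false positives
pred_b = [1] * 5 + [0] * 10 + [1] * 5 + [0] * 80

acc_a, mcc_a = accuracy_and_mcc(y_true, pred_a)
acc_b, mcc_b = accuracy_and_mcc(y_true, pred_b)
print(f"A: accuracy={acc_a:.2f}, MCC={mcc_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, MCC={mcc_b:.2f}")
```

Both classifiers are 85% accurate, but only the second has any ability to identify the minority class, which is what MCC reflects.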
These results indicate that logistic regression, a traditional statistical method, is useful because of its straightforward interpretability and predictive performance. The accuracy of approximately 82.5% for the random forest model indicates comparatively strong predictive performance; however, with an AUC of 0.71, its discriminative performance was somewhat lower than that of logistic regression. Nevertheless, its recall of 82.5% indicates a strength in minimizing false negatives and avoiding missed at-risk cases [20]. Notably, variable-importance analysis using the Gini index identified suicidal planning, loneliness, generalized anxiety, drug use, and exposure to violence as key risk factors. Accordingly, random forest may serve as a useful analytical tool for exploring risk factors and identifying key variables [21].
Although the KNN model achieved an accuracy of approximately 82.3%, apparently comparable to the other models, its AUC and MCC of 0.66 and 0.19, respectively, indicate the weakest performance in terms of predictive stability and class balance. These results likely stem from algorithmic characteristics of KNN, namely its sensitivity to sample size and class imbalance [22]. This study used survey-based data restricted to adolescents with suicidal ideation; the suicide-attempt group was relatively small, and most variables were categorical or binary. This data structure likely disadvantaged the KNN algorithm, which computes similarity based on distance, and led to reduced performance [22]. Therefore, although KNN showed limited performance in the context of this study, it may still serve as a useful auxiliary classifier when sample sizes are adequate and class distributions are balanced, or when the goal is to detect local patterns.
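The two mechanisms described here, coarse distances on binary variables and majority-class dominance in the vote, can be illustrated with a toy sketch. The data, the choice of Hamming distance, and the function names are assumptions made for illustration; the section does not state which distance metric or value of k the study used.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (features, label) pairs. Ties in distance are broken
    by training order because Python's sort is stable.
    """
    ranked = sorted(train, key=lambda item: hamming(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: 2 positives (label 1) and 6 negatives (label 0)
train = [
    ((1, 1, 1), 1), ((1, 1, 0), 1),
    ((1, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 0), 0),
    ((0, 1, 0), 0), ((0, 0, 1), 0), ((0, 0, 0), 0),
]
query = (1, 1, 1)  # identical to a positive training case

# With three binary features, only four distinct distances (0 to 3) are possible
distinct_distances = sorted({hamming(x, query) for x, _ in train})

pred_k1 = knn_predict(train, query, k=1)
pred_k5 = knn_predict(train, query, k=5)
print(f"distinct distances: {distinct_distances}")
print(f"k=1 -> {pred_k1}, k=5 -> {pred_k5}")
```

With k=1 the query is classified as positive, but with k=5 the more numerous negatives outvote the two positives even though the query matches a positive case exactly.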
Taken together, all three models achieved accuracies above 80%; however, logistic regression was the best-performing model in terms of both accuracy and predictive stability. By contrast, random forest showed its strength in exploring key risk factors for suicide attempts through variable importance analysis, and despite its limited overall performance, KNN was informative for identifying local patterns within groups. These findings suggest that, in predicting adolescent suicide attempts, model choice may vary depending on the objective: optimizing predictive performance versus exploring risk factors. It is also important to note that the comparison of machine-learning models in this study was conducted for descriptive and exploratory purposes within a predictive analytics framework, rather than for formal statistical significance testing of performance differences. Therefore, differences in AUC, F1, and other performance metrics should not be interpreted as statistically significant, and the results should be understood as comparative indicators to support model selection.
By integrating the machine-learning variable-importance analysis with the logistic regression results, this study identified suicidal planning as the strongest predictor of suicide attempts. This finding is consistent with those of Klonsky and May [7] and reaffirms that suicidal planning is key to separating mere ideation from suicidal action. A suicidal plan implies that concrete feasibility has already been considered; at this point, suicide risk escalates sharply, indicating that suicidal planning should be prioritized as a target for assessment in prevention efforts.
Further, drug use and exposure to violence emerged as important predictors. This aligns with the findings of Ballabrera et al. [23], who contend that substance use and exposure to violence during adolescence exacerbate impulsivity, diminish self-control, and heighten emotional instability, thereby increasing suicidal-behavior risk. Alcohol and smoking were also identified as risk factors; alcohol interacts with impulsivity to directly precipitate suicidal urges and behaviors [24], whereas smoking, when coupled with risk behaviors within peer groups, has been reported to indirectly increase suicide risk [25].
Among emotional factors, experiences of sadness and despair and generalized anxiety were significantly associated with suicide attempts. This aligns with prior findings that depression and anxiety constitute core psychopathological underpinnings of adolescent suicide attempts [26]. Notably, hopelessness, when combined with negative expectations about the future, substantially increases the risk of suicidal ideation or attempts [27]. In contrast, generalized anxiety, although its odds ratio is relatively modest, may heighten suicide risk through a cascade of chronic anxiety leading to sleep deprivation and heightened emotional reactivity, ultimately precipitating interpersonal difficulties [28].
Meanwhile, some variables exhibited high importance in the machine-learning analyses but were not statistically significant in the logistic regression analysis. Sexual-intercourse experience was not significant in the logistic regression analysis but ranked among the top variables in the Gini index-based variable-importance analysis. This may be because early sexual debut often co-occurs with other risk behaviors, such as alcohol use, smoking, and drug use [29]. Therefore, in machine learning, in which complex interactions are captured, the influence of sexual debut tends to be weighted more heavily.
Loneliness was likewise not significant in the regression analysis but emerged as a major factor in the random forest analysis. Because adolescence is a period of heightened sensitivity to peer relationships, social isolation and loss of belonging can be directly linked to suicidal urges [30]. Moreover, correlations with hopelessness and anxiety may have attenuated its independent effect in the regression analysis. By contrast, machine learning is less sensitive to inter-variable correlations, so the contribution of loneliness to classification accuracy appears to have been captured directly, which would explain why it was identified as an important variable.
These findings exemplify distinctions between traditional regression models and machine learning approaches, suggesting that simultaneous, rather than individual, consideration of multiple complex variables leads to clearer identification of risk factors. Besides confirming substance use, violence, emotional distress, and anxiety as core factors related to suicide planning, this study also identified sexual experience and loneliness, factors that were less evident in traditional statistical analyses but emerged as important through machine learning. This holds academic significance in elucidating the multidimensional and interactive nature of adolescent suicide attempts.
Furthermore, this study demonstrated that basic survey data alone can be used to predict suicide-attempt risk at an early stage, highlighting the potential applicability of screening tools in school and counseling contexts. Specifically, an early-detection system in which suicide planning is employed as a core indicator should be developed. Further, preventive education should address risky behaviors such as substance use, alcohol consumption, smoking, and exposure to violence. Accessibility to counseling and treatment for mental health problems, including depression and anxiety, should be enhanced. Socioeconomic factors such as academic performance and financial status should be incorporated into multidimensional support.
Beyond the comparative evaluation of model performance, the findings of this study provide important implications for psychiatric nursing practice and community-based mental health services. In particular, school settings, community mental-health centers, and public health offices could utilize machine-learning-based screening approaches to identify adolescents at high risk for progressing from suicidal ideation to suicide attempts. The model developed in this study relies solely on brief self-report survey items and therefore may serve as an accessible decision-support tool for early detection and triage in educational and counseling contexts. Moreover, the results suggest a need to strengthen suicide-prevention education in psychiatric and community mental-health nursing curricula by incorporating training on risk assessment, digital mental-health screening, and the interpretation of machine-learning output. Future research should expand the development and validation of nursing interventions tailored to key determinants identified in this study (such as suicidal planning, substance use, exposure to violence, and emotional distress) and evaluate the effectiveness of integrating predictive analytics into adolescent suicide-prevention programs. Based on these key predictors, psychiatric and community mental-health nursing interventions may incorporate structured suicide safety planning, school-based substance-use prevention and peer-risk education, trauma-informed care approaches, and cognitive-behavioral strategies to reduce anxiety and hopelessness. Integrating these interventions into school and community nursing protocols may strengthen early detection and targeted suicide-prevention efforts for adolescents with suicidal ideation.
Despite its strengths, this study has several limitations. First, the cross-sectional design makes it difficult to establish causal relationships, and the findings cannot be generalized to all adolescents because the analysis was restricted to those who reported suicidal ideation. In addition, because participants with missing values were excluded using a complete-case approach, the final analytic sample may differ systematically from the original population, which could introduce selection bias and potentially lead to underestimation of suicide-attempt risk. Furthermore, potential bias arising from the self-reported nature of the data and the absence of complex sampling weights, which may limit population-level generalizability and introduce sampling error, should be considered when interpreting the results. Moreover, some key variables were measured using single-item self-report questions, which may limit measurement precision. Therefore, future research should employ longitudinal designs to elucidate the transition from suicidal ideation to planning and attempts and develop and validate intervention programs based on key variables. Further, future studies should examine the mediating and moderating effects between suicide planning and other risk factors, while applying advanced machine-learning techniques to identify major determinants, enhance predictive performance, and ultimately utilize such models for adolescent mental-health promotion.
CONCLUSION
Using data from the 20th KYRBS, this study identified factors associated with the transition from suicidal ideation to suicide attempts among adolescents and evaluated machine-learning models to determine the best predictive approach. Suicide planning was identified as the strongest risk factor, followed by substance use, exposure to violence, smoking, feelings of sadness and hopelessness, alcohol use, and generalized anxiety. Additionally, random forest analysis revealed loneliness and sexual experience as meaningful predictors that may not be fully captured through traditional statistical methods.
Based on these findings, the results highlight the potential applicability of school- and community-based early screening systems that utilize brief self-report items to identify adolescents at elevated risk of attempting suicide. Suicide planning, as the most salient factor, should be prioritized as a core indicator during counseling and crisis-intervention assessments. Moreover, prevention programs should include targeted education addressing substance use, smoking, and violence exposure; emotional regulation strategies for sadness, hopelessness, and anxiety; and social-connectedness interventions to reduce loneliness. School nurses, mental-health practitioners, and community agencies may incorporate machine-learning-assisted decision-support tools to improve risk detection and triage.
Future research should continue to refine predictive models and develop tailored interventions aligned with identified risk factors, including programs that combine psychoeducation, peer-support systems, digital mental-health services, and protective-factor enhancement to strengthen resilience among adolescents with suicidal ideation.