Abstract
This study uses a qualitative approach to examine the structural and content consistency of large language models (LLMs) in ethical decision-making. Responses to core ethical themes such as “justice”, “non-maleficence”, “autonomy”, “impartiality”, and “goodness” were evaluated with Braun and Clarke's (2006) thematic analysis method. A three-stage coding process examined empathy patterns, contextual transitions, and relationships between themes. The findings, supported by Python-based frequency and variation analyses, show that the models exhibited high empathy and solution orientation in the themes of “inclusiveness” and “communication” but low structural consistency in the themes of “religion” and “disability”. Responses to the same ethical theme were also found to shift semantically across contexts. The study argues that ethical sensitivity in LLMs should be evaluated on the basis of such patterns rather than individual responses.
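The frequency and variation analyses mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration only: the coded data, the theme labels, and the use of the coefficient of variation as a consistency proxy are assumptions for the example, not the paper's actual pipeline.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical coded data: each model response is tagged with the
# ethical themes it invokes (labels are illustrative, not the study's corpus).
coded_responses = [
    ["justice", "inclusiveness"],
    ["communication", "inclusiveness"],
    ["religion"],
    ["justice", "autonomy"],
    ["disability", "communication"],
]

# Frequency analysis: how often each theme appears across all responses.
freq = Counter(theme for response in coded_responses for theme in response)

# Variation analysis: coefficient of variation of per-response theme counts,
# a simple proxy for how consistently structured the responses are.
counts = [len(response) for response in coded_responses]
cv = stdev(counts) / mean(counts)

print(freq.most_common())
print(round(cv, 2))
```

In a fuller analysis, the same counts would be computed per theme and per context to surface the kind of cross-context semantic shifts the study reports.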
References
- Anderson, J., & Rainie, L. (2020). Concerns about AI and human agency. Pew Research Center. https://doi.org/10.1037/tps0000264
- Andrus, M., Spitzer, E., Brown, J., & Zick, Y. (2021). What we can't measure or understand: Challenges to operationalizing fairness in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 249–260). https://doi.org/10.1145/3442188.3445888
- Angammana, J. S. K., & Jayawardena, M. (2022). Influence of artificial intelligence on warehouse performance: The case study of the Colombo area, Sri Lanka. Journal of Sustainable Development of Transport and Logistics, 7(2), 80–110. https://doi.org/10.14254/jsdtl.2022.7-2.6
- Askell, A., Bai, Y., Chen, Y., et al. (2021). A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861. https://doi.org/10.48550/arXiv.2112.00861
- Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Hume, T., ... & Olsson, C. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
- Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). https://doi.org/10.1145/3442188.3445922
- Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. Proceedings of the 2018 Conference on Fairness, Accountability and Transparency, 149–159.
- Binns, R., Veale, M., Van Kleek, M., & Shadbolt, N. (2018). 'It's reducing a human being to a percentage': Perceptions of justice in algorithmic decisions. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 233–245). https://doi.org/10.1145/3173574.3173951
- Birhane, A., van Dijk, J., & Zliobaite, I. (2022). The forgotten margins: A framework for ethical evaluation of AI systems. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 188–201). https://doi.org/10.1145/3531146.3533157
- Bommasani, R., Hudson, D. A., Adeli, E., et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. https://doi.org/10.48550/arXiv.2108.07258
- Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
- Braun, V., & Clarke, V. (2019). Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise and Health, 11(4), 589–597. https://doi.org/10.1080/2159676X.2019.1628806
- Chiang, P. E., Chi, E. H., & Lin, Z. (2023). Assessing ethical robustness of language models across cultures. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 389–401). https://doi.org/10.18653/v1/2023.findings-acl.389
- Floridi, L., & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1). https://doi.org/10.1162/99608f92.8cd550d1
- Floridi, L., Cowls, J., Beltrametti, M., et al. (2018). AI4People—An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines, 28(4), 689–707. https://doi.org/10.1007/s11023-018-9482-5
- Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437. https://doi.org/10.1007/s11023-020-09539-2
- Ganguli, D., Askell, A., Bai, Y., et al. (2023). Predictability and surprise in large language models. In Proceedings of NeurIPS 2023. https://doi.org/10.48550/arXiv.2305.01640
- Gehrmann, S., Welleck, S., Braverman, J., et al. (2023). Repairing model outputs with structured training. In Proceedings of ACL 2023. https://doi.org/10.18653/v1/2023.acl-main.458
- Glaese, A., McAleese, N., Aslanides, J., et al. (2022). Improving alignment of dialogue agents via targeted human feedback. Advances in Neural Information Processing Systems, 35. https://doi.org/10.48550/arXiv.2209.14375
- Hooker, S., & Kim, B. (2019). What are fairness properties in machine learning? arXiv preprint arXiv:2012.15738. https://doi.org/10.48550/arXiv.2012.15738
- Ji, Z., Lu, Z., & Sun, Z. (2023). Language models as ethical reasoners? In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 1225–1237). https://doi.org/10.18653/v1/2023.findings-acl.1225
- Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1, 389–399. https://doi.org/10.1038/s42256-019-0088-2
- Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
- Lyeonov, S., Draskovic, V., Kubaščikova, Z., & Fenyves, V. (2024). Artificial intelligence and machine learning in combating illegal financial operations: Bibliometric analysis. Human Technology, 20(2), 325–360. https://doi.org/10.14254/1795-6889.2024.20-2.5
- Mokander, J., & Floridi, L. (2021). Ethics-based auditing to develop trustworthy AI. Minds and Machines, 31(4), 595–610. https://doi.org/10.1007/s11023-021-09557-8
- Mubarak, E. M., Zuhair Ridha, A., & Abdullah, Z. T. (2024). Multi-criteria assessment methodology for remanufacturing conventional grinding machines into CNC machine tools. Economics, Management and Sustainability, 9(2), 29–43. https://doi.org/10.14254/jems.2024.9-2.3
- Mumcu, A. Y. (2024). Exploring the intersection of utilitarianism and sustainability in business: A conceptual analysis. Economics, Management and Sustainability, 9(1), 119–131. https://doi.org/10.14254/jems.2024.9-1.9
- Nowell, L. S., Norris, J. M., White, D. E., & Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria. International Journal of Qualitative Methods, 16(1), 1–13. https://doi.org/10.1177/1609406917733847
- Raji, I. D., Binns, R., Veale, M., et al. (2020). Closing the AI accountability gap: Defining responsibility for harmful behavior. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 33–44). https://doi.org/10.1145/3351095.3372873
- Razavi, N., & Sierpinski, G. (2024). An attempt to determine the impact of the implementation of autonomous vehicles on a larger scale on the planning of city transport systems. Journal of Sustainable Development of Transport and Logistics, 9(1), 96–120. https://doi.org/10.14254/jsdtl.2024.9-1.8
- Saldaña, J. (2021). The coding manual for qualitative researchers (4th ed.). Sage Publications.
- Santurkar, S., Tsipras, D., & Madry, A. (2023). Whose values are encoded? A cross-cultural comparison of ethical judgments in LMs. arXiv preprint arXiv:2303.13508. https://doi.org/10.48550/arXiv.2303.13508
- Schramowski, P., Dehghani, M., & Kersting, K. (2022). Large language models encode human-like moral norms. Nature Machine Intelligence, 4(9), 840–847. https://doi.org/10.1038/s42256-022-00458-8
- Seniutis, M., Gružauskas, V., Lileikiene, A., & Navickas, V. (2024). Conceptual framework for ethical artificial intelligence development in social services sector. Human Technology, 20(1), 6–24. https://doi.org/10.14254/1795-6889.2024.20-1.1
- Seniutis, M., Gružauskas, V., Sas, A., Navickas, V., & Švažas, M. (2025). Designing a framework for ethnography-driven prompt engineering in social work. Human Technology, 21(1), 106–127. https://doi.org/10.14254/1795-6889.2025.21-1.5
- Srivastava, M., Pujara, J., & Getoor, L. (2022). Contextual fairness in machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 7578–7586. https://doi.org/10.1609/aaai.v36i7.20674
- Tamkin, A., Brundage, M., Ganguli, D., & Clark, J. (2021). Understanding the capabilities, limitations, and societal impact of large language models. arXiv preprint arXiv:2102.02503. https://doi.org/10.48550/arXiv.2102.02503
- Topol, E. (2019). Deep medicine: How artificial intelligence can make healthcare human again. Basic Books.
- Vaismoradi, M., Jones, J., Turunen, H., & Snelgrove, S. (2016). Theme development in qualitative content analysis and thematic analysis. Journal of Nursing Education and Practice, 6(5), 100–110. https://doi.org/10.5430/jnep.v6n5p100
- Vovk, I., & Vovk, Y. (2024). Sustainable personnel management in the hospitality industry: Enhancing organizational performance through employee engagement and commitment. Economics, Management and Sustainability, 9(2), 44–58. https://doi.org/10.14254/jems.2024.9-2.4
- Vovk, Y., Vovk, I., Plekan, U., Tson, O., & Oleksyuk, V. (2025). Sustainable and smart logistics centers: Challenges and opportunities for Ukraine’s transport system. Journal of Sustainable Development of Transport and Logistics, 10(1), 116–124. https://doi.org/10.14254/jsdtl.2025.10-1.8
- Weidinger, L., Mellor, J., & Gabriel, I. (2022). Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359. https://doi.org/10.48550/arXiv.2112.04359
- Zhang, H., Liu, H., & Wang, Y. (2023). Mapping moral variation in AI: A theme-based analysis. AI & Ethics, 4(1), 51–68. https://doi.org/10.1007/s43681-023-00256-z
- Zhou, H., Zhang, H., & Li, Y. (2023). Consistency and contradiction: Ethics in AI decision-making. AI & Ethics, 3(2), 79–94. https://doi.org/10.1007/s43681-023-00201-0