Information Security and Communications Privacy (信息安全与通信保密)

2026, Issue 05 (No. 390), pp. 49-64

Multidimensional Governance of Model Cognitive Bias

余可欣 (Yu Kexin), 苏宇 (Su Yu)

1. People's Public Security University of China

Released online: 2026-05-20

Published: 2026-05-20

Abstract:

Model cognitive bias is a major risk in the deep deployment of generative artificial intelligence (AI). It arises along multiple dimensions, including data, model architecture, training mechanisms, and human-machine interaction. Existing governance approaches focus mainly on explicit bias in discriminative models and fall short in three respects: adapting to generative models, detecting implicit bias, and coordinating multiple stakeholders. To address these gaps, this paper systematically analyzes the formation mechanisms and risk transmission pathways of model cognitive bias and constructs a multidimensional collaborative governance framework at three levels: technical, legal, and policy. The framework improves governance effectiveness, guards against the deeper threats that model bias poses to individual rights and interests, social equity, and national security, and steers AI toward fair, safe, and sustainable development.

Keywords: model cognitive bias; generative artificial intelligence; implicit bias; multidimensional governance

For the full text, visit cnki.net.

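Basic information:

Chinese Library Classification (CLC) numbers: D922.17; D923; TP18

Citation:

[1] Yu Kexin, Su Yu. Multidimensional governance of model cognitive bias [J]. Information Security and Communications Privacy (信息安全与通信保密), 2026, No. 390(05): 49-64.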
