Bilingual Corpus

定义（中文） / Definition (ZH)

“双语语料库”：由两种语言的文本构成的语料集合，常用于翻译研究、机器翻译、词典编纂与跨语言信息检索等。若两种语言的文本互为译文，并按句子或段落进行了对齐（alignment），即为最典型的“平行语料库”（parallel corpus）；若只是主题或体裁相近、并不互为译文，则称“可比语料库”（comparable corpus）。
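Sentence-level alignment is often computed from sentence lengths alone. The Python below is a minimal sketch of length-based sentence alignment in the spirit of Gale & Church (1993). It is a dynamic program that pairs sentences whose character lengths are roughly proportional and allows 1-1, 1-0, 0-1, 2-1 and 1-2 groupings. The cost function is a simplified stand-in for theirs. The values in `BEAD_PENALTY` and the length ratio `ratio` are hypothetical illustrative choices, not calibrated parameters. The usage lines align this entry's own English example sentences with their Chinese translations. There, `ratio=0.33` is roughly the Chinese-to-English character ratio of those two sentence pairs.

```python
import math
from typing import List, Optional, Tuple

# Allowed groupings ("beads"): (source sentences, target sentences) consumed per step.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
# Hypothetical penalties: 1-1 has no extra penalty, merges cost a little, unmatched sentences more.
BEAD_PENALTY = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Simplified length-mismatch score; ratio = expected target chars per source char."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # unmatched sentences are charged only through BEAD_PENALTY
    expected = src_len * ratio
    return abs(tgt_len - expected) / math.sqrt(expected)


def align(src: List[str], tgt: List[str], ratio: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return aligned groups as (source indices, target indices), in document order."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees cost[i][j] is final before we extend paths from it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + BEAD_PENALTY[(di, dj)] + length_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the cheapest path.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups


en = [
    "A bilingual corpus helps translators find natural equivalents.",
    "By training on a large bilingual corpus, the model learns word alignments and produces more fluent translations in context.",
]
zh = [
    "双语语料库能帮助译者找到更自然的对应表达。",
    "通过在大型双语语料库上训练，模型能学习词语对齐关系，并在语境中生成更流畅的译文。",
]
print(align(en, zh, ratio=0.33))  # → [([0], [0]), ([1], [1])]
```

Real aligners usually add lexical cues such as cognates or dictionary matches on top of length, because length alone can drift on long documents.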

发音（IPA） / Pronunciation (IPA)

/baɪˈlɪŋɡwəl ˈkɔːrpəs/

例句 / Examples

A bilingual corpus helps translators find natural equivalents.
双语语料库能帮助译者找到更自然的对应表达。

By training on a large bilingual corpus, the model learns word alignments and produces more fluent translations in context.
通过在大型双语语料库上训练，模型能学习词语对齐关系，并在语境中生成更流畅的译文。
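The second example says a model "learns word alignments" from a bilingual corpus. One classic way to do this is IBM Model 1, whose lexical translation probabilities t(e | f) are estimated with expectation-maximization (EM). The following is a minimal Python sketch of that procedure, not the method of any particular system. It deliberately omits the NULL source word and any smoothing. The three-pair German-English `corpus` is a hypothetical toy example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ibm_model1(corpus: List[Tuple[List[str], List[str]]], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM estimate of lexical translation probabilities t(e | f), keyed by (e, f).

    Simplified IBM Model 1: no NULL source word, no smoothing.
    """
    e_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform t(e | f) for every pair.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(e, f)
        total: Dict[str, float] = defaultdict(float)  # expected c(f)
        for fs, es in corpus:
            for e in es:
                # E-step: share e's alignment probability among the source words of this pair.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise expected counts so that sum over e of t(e | f) is 1.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return dict(t)


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
for f in ("das", "Haus", "Buch", "ein"):
    # Most probable English word for each German word after training.
    p, e = max((p, e) for (e, g), p in t.items() if g == f)
    print(f"{f} -> {e} ({p:.3f})")
```

On this toy corpus, EM pulls `das` toward `the` and `Buch` toward `book`, because those words co-occur in two pairs each. By elimination, `Haus` then drifts toward `house` and `ein` toward `a`. The word alignment of a sentence pair can then be read off by linking each English word to the source word with the highest t(e | f).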

词源简述（中文） / Etymology (ZH)

bilingual 经拉丁语 bilinguis（“说两种语言的”）而来，由前缀 bi-（“二、双”）与 lingua（“舌头；语言”）构成，整体意为“使用两种语言的”。corpus 源自拉丁语 corpus（“身体”），引申为“（著作的）汇编、全集”，在语言学中进一步指“文本集合/语料”。合起来即“两种语言的文本集合”。

文献与著作中的用例 / Literary & Notable Works

Statistical Machine Translation（Philipp Koehn）——讨论机器翻译训练数据时常涉及双语/平行语料库。
Foundations of Statistical Natural Language Processing（Christopher D. Manning & Hinrich Schütze）——第 13 章“Statistical Alignment and Machine Translation”讨论平行文本（双语语料）的句子对齐与词语对齐。
Corpus Linguistics: Investigating Language Structure and Use（Douglas Biber, Susan Conrad, Randi Reppen）——语料库类型与应用中常出现“双语语料库/平行语料库”的讨论。
The Oxford Handbook of Computational Linguistics（编：Ruslan Mitkov）——跨语言资源与NLP任务中常提到该术语。