Word segmentation is a key step in Chinese text processing.
分词是中文文本处理中关键的一步。
Accurate word segmentation can improve downstream tasks such as search, machine translation, and named entity recognition.
高质量的分词能提升检索、机器翻译和命名实体识别等下游任务的效果。
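Speech and Language Processing (Dan Jurafsky & James H. Martin): the term appears frequently in the chapters and related discussion on word segmentation and tokenization, including for languages such as Chinese.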
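Foundations of Statistical Natural Language Processing (Christopher D. Manning & Hinrich Schütze): covers text segmentation under statistical methods and the related modeling ideas, which are conceptually close to word segmentation.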
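Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014): the paper focuses on translation, but in practical NMT pipelines it is often discussed together with word or subword segmentation, and expressions such as "segmentation" and "word segmentation" commonly appear in that context.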