1 KL Divergence
The KL divergence (Kullback–Leibler divergence) is defined as follows: $$D_{KL} = \sum_{i=1}^{n} P\left(x_i\right) \times \log\left(\frac{P(x_i)}{Q(x_i)}\right)$$ …
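As a minimal sketch of the sum of P(x_i) × log(P(x_i) / Q(x_i)) for two discrete distributions, the Python below may help make the definition concrete. The function name `kl_divergence`, the use of the natural logarithm, and the handling of zero probabilities (terms with P(x_i) = 0 contribute 0; a zero in Q where P is positive gives infinity) are assumptions for illustration, since the excerpt does not specify them.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence: sum over i of P(x_i) * log(P(x_i) / Q(x_i)).

    Assumptions (not stated in the excerpt): natural logarithm, and the
    usual conventions 0 * log(0 / q) = 0 and p * log(p / 0) = +infinity.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")

    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # A zero-probability outcome under P contributes nothing.
            continue
        if q_i == 0.0:
            # P puts mass where Q has none: the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    # The two directions generally differ: KL divergence is not symmetric.
    print(kl_divergence(p, q))
    print(kl_divergence(q, p))
```

Swapping the arguments generally changes the result, which is why the KL divergence is called a divergence rather than a distance.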
Preface
Original paper: A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Abstract
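Pretraining is the preliminary and fundamental step in developing high-performing language models (LMs). Despite this, the design of pretraining data is severely…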