1. Generate the BPE model and dictionary
subword-nmt learn-joint-bpe-and-vocab --input corpus.path -s 30000 --output en.bpe --write-vocabulary dict.en.txt
2. Segment the corpus according to the BPE model
subword-nmt apply-bpe -c en.bpe < corpus.path > corpus.bpe
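Before moving on, it can help to confirm that the segmentation is reversible. The sketch below assumes the default subword-nmt separator `@@` was used; `undo_bpe` is a hypothetical helper name, not part of subword-nmt.

```shell
# Hypothetical helper: undo BPE segmentation on stdin.
# Assumes the default "@@" separator (no custom --separator was passed).
undo_bpe() {
    # Remove "@@ " between subword units, and a trailing "@@" at end of line
    sed -E 's/(@@ )|(@@ ?$)//g'
}
```

For example, `undo_bpe < corpus.bpe | head` should print lines that match the start of the original corpus.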
3. Use Fairseq to train the model based on the dictionary and corpus
3.1 Split the corpus into training, validation, and test sets:
sed -n 1,1000000p corpus.bpe > train.en
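The command above only writes the training set. A minimal sketch of the full split might look like the following; the function name `split_corpus`, the output names `valid.en` and `test.en`, and the split sizes are all assumptions made for illustration, so adjust them to your corpus.

```shell
# Hypothetical helper: split a BPE-segmented corpus by consecutive line ranges.
# Usage: split_corpus FILE TRAIN_LINES VALID_LINES TEST_LINES (all counts > 0)
split_corpus() {
    corpus=$1
    n_train=$2
    n_valid=$3
    n_test=$4
    valid_end=$((n_train + n_valid))
    test_end=$((valid_end + n_test))
    # Lines 1..n_train go to the training set
    sed -n "1,${n_train}p" "$corpus" > train.en
    # The next n_valid lines go to the validation set (assumed name: valid.en)
    sed -n "$((n_train + 1)),${valid_end}p" "$corpus" > valid.en
    # The n_test lines after that go to the test set (assumed name: test.en)
    sed -n "$((valid_end + 1)),${test_end}p" "$corpus" > test.en
}
```

For example, `split_corpus corpus.bpe 1000000 5000 5000` keeps the same training range as above and takes the following 10000 lines for validation and test. For a parallel corpus, the target-language file needs the same line ranges so that sentence pairs stay aligned.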
3.2 Run the preprocessing script
python $FILE/preprocess.py --source-lang en --target-lang zh --trainpref $DATA/train --validpref $DATA/valid --destdir $DATA/preprocess --srcdict dict.en.txt --tgtdict dict.zh.txt