PTT Corpus

Note: File size limit: 400 KB

What is Jseg?


  • An enhanced version of the Jieba segmenter
  • Uses a directed acyclic graph (DAG) and a Hidden Markov Model (HMM) to segment Chinese text into words
  • Trained on the Sinica Corpora
  • Equipped with emoticon detection
  • F1-score: 0.91
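
To illustrate the DAG part of this approach, here is a minimal sketch in Python. It is not Jseg's code: the dictionary `TOY_FREQ`, its frequencies, and the function names `build_dag`, `best_path`, and `segment` are all hypothetical and exist only for this example. The sketch builds a graph of every dictionary word found in the sentence, then uses dynamic programming to pick the path with the highest total log-probability. The HMM step, which Jieba-style segmenters typically apply to runs of characters not covered by the dictionary, is left out; unknown characters simply become single-character words here.

```python
import math

# Hypothetical toy dictionary: word -> frequency (NOT Jseg's real data).
TOY_FREQ = {
    "我": 5, "喜": 1, "歡": 1, "喜歡": 4,
    "語": 1, "言": 1, "語言": 3, "學": 2, "語言學": 3,
}
TOTAL = sum(TOY_FREQ.values())


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        # Characters not in the dictionary fall back to a one-character edge.
        dag[i] = ends or [i]
    return dag


def best_path(sentence, dag, freq, total):
    """Right-to-left dynamic programming over the DAG.

    route[i] holds (best log-probability of sentence[i:], end index of the
    first word on that best path).
    """
    n = len(sentence)
    route = {n: (0.0, 0)}
    log_total = math.log(total)
    for i in range(n - 1, -1, -1):
        route[i] = max(
            # Unknown words get a pseudo-frequency of 1 (an assumption of this sketch).
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence, freq=TOY_FREQ, total=TOTAL):
    """Return the most probable word sequence under the toy dictionary."""
    dag = build_dag(sentence, freq)
    route = best_path(sentence, dag, freq, total)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


if __name__ == "__main__":
    print(segment("我喜歡語言學"))
```

With this toy dictionary, the longer entries 喜歡 and 語言學 win over their single-character pieces because one higher-probability word scores better than several low-probability ones multiplied together. A real segmenter would replace the toy frequencies with counts estimated from a corpus.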