LOPEN project

CWN Sense Tagger

In this project, we tackle Chinese word sense disambiguation with a state-of-the-art BERT model. The model yields substantial performance gains and reaches roughly 82% accuracy.

[link] [demo]

Deep Lexicon (DeepLEX)

A large Chinese-centered open lexicon as an alternative resource to atomic lexicon theory.

[link]

Chinese Wordnet (CWN)

CWN aims to construct a deep semantic and conceptual network. Its fine-grained semantic analysis and open relational design are conducive to understanding the structure of language and mind.

[link] [CWN v1] [CWN v2]

HanziAnalysisKit

The Hanzi Glyph Corpus Toolkit (HGCT) and lexicoR facilitate the querying and analysis of Chinese character glyphs within corpora and provide access to various Chinese lexical resources.

[link]

Chinese Word Map (CWM)

CWM is a TCSL-based (Teaching Chinese as a Second Language) word sketch engine for lexical knowledge.

[link]

Corpora Open and Search (COPENS)

An open corpus system and query tool with automatic pre-processing and free annotation.

[link]

PTT Corpus

PTT, a bulletin board system (BBS) distinctive to Taiwan, records interesting social and cultural language phenomena. It also provides important empirical information on language contact and evolution.

[link]

Chinese variation

A parallel corpus of Taiwan Mandarin and Mainland Mandarin.

[link]

Toxic Talk

A toxic talk generator trained on Internet comments.

[link]

Collabin

A blog of learning notes written by the lab members.

[link]

Python for Humanities (2018)

[link] [GitHub]

Corpus Linguistics (2018)

[link]

Hands-on Corpus Linguistics Workshop (2018)

[link] [GitHub]
