NLP

Ancient Chinese poetry dataset

It contains 232,670 five-character or seven-character quatrains.
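A five- or seven-character quatrain (jueju) has four lines, and every line has the same length: five characters or seven characters. The sketch below checks that shape. It assumes each poem is available as a plain string with Chinese punctuation between lines. The dataset's actual file format is not described here, so `split_lines` and `is_quatrain` are hypothetical helpers, not part of the dataset.

```python
import re

# Punctuation assumed to separate lines of a poem (hypothetical; adjust to the real data).
_LINE_BREAKS = r"[，。！？；、,.!?;\n]"

def split_lines(text: str) -> list[str]:
    """Split a poem into lines on punctuation and drop empty fragments."""
    return [part.strip() for part in re.split(_LINE_BREAKS, text) if part.strip()]

def is_quatrain(lines: list[str]) -> bool:
    """True if there are exactly four lines, all five or all seven characters long."""
    if len(lines) != 4:
        return False
    lengths = {len(line) for line in lines}
    return lengths == {5} or lengths == {7}

poem = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
print(is_quatrain(split_lines(poem)))
```

A check like this could be used to filter out malformed records before training, for example poems whose lines were merged or truncated during collection.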

Chinese dialogue dataset

240,000 (24W) deeply cleaned dialogue samples

Chinese Internet product user review dataset

It contains 586,538 review sentences about Internet products, with sentiment ranging from negative to positive.

Chinese example-sentence corpus

It contains 2,445,164 example sentences for common words, idioms and other vocabulary. Each word has several example sentences.

Couplet dataset

It contains 774,491 Chinese couplets.

Aligned classical-modern Chinese corpus

It contains aligned classical/modern Chinese clause pairs, split into a training set of 984,611, a validation set of 48,980 and a test set of 50,000.
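The split sizes above imply the overall corpus size and the share of each split. The short calculation below uses only the three counts listed in this entry and is included to make those proportions explicit.

```python
# Split sizes of the aligned classical/modern Chinese corpus, as listed above.
splits = {"train": 984611, "validation": 48980, "test": 50000}

total = sum(splits.values())
shares = {name: size / total for name, size in splits.items()}

print(total)  # 1083591
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # train: 90.9%, validation: 4.5%, test: 4.6%
```

That is roughly a 91/4.5/4.5 split, so the held-out sets together account for a little under a tenth of the corpus.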

Medical question-generation pretraining data

1,060,000 (106W) deeply cleaned question-generation samples