Does state censorship affect the "judgment" of artificial intelligence? What a study of the Chinese case reveals
State censorship can shape artificial intelligence (AI) algorithms, and with them the outcome of an AI's "judgment." A recent study of Chinese online encyclopedias demonstrates the effect, and its results highlight a range of broader issues for AI.
As business, education, and government activities have globalized, artificial intelligence (AI) rarely runs up against national borders. Yet even amid today's gold rush for new ideas, algorithms, and talent, cultural differences between countries can leave a visible mark on AI programs.
Against this backdrop, a new study reveals how government censorship affects AI algorithms and the applications built on them.
Margaret Roberts, a professor of political science at the University of California, San Diego (UCSD), and Eddie Yang, a PhD student at the university, chose two sources of training data: the Chinese-language edition of Wikipedia and Baidu Baike, the online encyclopedia operated by China's search giant Baidu. They trained an AI language program on each and compared the two.
The Chinese-language Wikipedia is not accessible from within China, while Baidu Baike is censored by the Communist Party government. Baidu did not respond to a request for comment.
The purpose of the study was to find out whether an AI would learn censorship present in its training data. If so, the censorship would be reflected in the language program itself, potentially affecting the output of chatbots, voice assistants, translation programs, autocomplete features, and more.
Differences arising from the two source datasets
The language programs in the study learn how particular words are used across large bodies of text. They treat words as points in a shared geometric space and judge words that lie close together to be similar in meaning. A translation program, for example, can infer the meaning of an unfamiliar word from its position relative to other words in both the source and target languages.
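To make the idea concrete, here is a minimal sketch of this kind of word-embedding model, assuming the widely used gensim library; the toy corpus and parameters are illustrative and not the researchers' actual setup.

```python
# Minimal word-embedding sketch (illustrative; not the study's exact model).
# Requires: pip install gensim
from gensim.models import Word2Vec

# Tiny toy corpus; a real model would train on an entire encyclopedia.
corpus = [
    ["democracy", "brings", "stability", "and", "freedom"],
    ["the", "election", "was", "free", "and", "fair"],
    ["surveillance", "enables", "social", "control"],
]

# Train embeddings: each word becomes a point (vector) in a shared space.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=42)

# Words used in similar contexts end up close together; cosine similarity
# between their vectors is the "distance" the article describes.
print(model.wv.similarity("democracy", "stability"))
print(model.wv.most_similar("democracy", topn=3))
```

The key point is that the geometry of this space is entirely a product of the training text, so whatever associations the corpus contains, the model inherits.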
The UCSD researchers found significant differences between the two algorithms trained on the different datasets, and they believe those differences are the result of censorship.
For example, the algorithm trained on the Chinese-language Wikipedia tended to associate "democracy" with positive words such as "stability." In the algorithm trained on Baidu Baike, by contrast, "democracy" sat close to words such as "chaos."
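A comparison of this kind could be run by loading the two trained models and querying the same word pairs in each; the sketch below assumes gensim's KeyedVectors, with hypothetical file names and English tokens standing in for the actual Chinese-language models.

```python
# Sketch of the comparison described above (file names are hypothetical
# placeholders; the real models would contain Chinese vocabulary).
# Requires: pip install gensim
from gensim.models import KeyedVectors

# Load one embedding model per training corpus (hypothetical paths).
wiki = KeyedVectors.load_word2vec_format("zh_wikipedia_vectors.txt")
baike = KeyedVectors.load_word2vec_format("baidu_baike_vectors.txt")

# Compare how close "democracy" sits to a positive and a negative word
# in each space; higher cosine similarity means a stronger association.
for name, vectors in [("Wikipedia", wiki), ("Baidu Baike", baike)]:
    print(name,
          "democracy~stability:", vectors.similarity("democracy", "stability"),
          "democracy~chaos:", vectors.similarity("democracy", "chaos"))
```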
Roberts and Yang then used the two algorithms to build language programs that infer the tone of a news headline, positive or negative. The AI built on the Chinese-language Wikipedia gave more positive scores to headlines containing words such as "election," "freedom," and "democracy," while the AI built on Baidu Baike tended to score headlines higher when they contained words such as "surveillance," "social control," and "Chinese Communist Party."
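The article does not spell out the researchers' exact pipeline, but one common recipe for this kind of headline scorer is to average a headline's word vectors and fit a simple classifier on top; the sketch below assumes that recipe, with hypothetical paths and a toy labeled set.

```python
# One common way to build a headline-tone scorer on word embeddings:
# average the headline's word vectors, then fit a simple classifier.
# (Illustrative sketch; not the researchers' exact pipeline.)
# Requires: pip install gensim scikit-learn numpy
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

# Hypothetical path; swapping in the Baidu Baike vectors changes the scores.
vectors = KeyedVectors.load_word2vec_format("zh_wikipedia_vectors.txt")

def embed(headline: str) -> np.ndarray:
    """Average the vectors of all in-vocabulary words in the headline."""
    words = [w for w in headline.lower().split() if w in vectors]
    if not words:
        return np.zeros(vectors.vector_size)
    return np.mean([vectors[w] for w in words], axis=0)

# Tiny labeled set for illustration (1 = positive tone, 0 = negative tone).
headlines = ["free and fair election held", "unrest and chaos in the streets"]
labels = [1, 0]

clf = LogisticRegression().fit([embed(h) for h in headlines], labels)

# Because the features come from the embeddings, any censorship absorbed
# by the embeddings propagates directly into the headline scores.
print(clf.predict_proba([embed("new democracy movement gains freedom")]))
```

This setup illustrates why the finding matters: the classifier itself is identical in both cases, and the divergent scores come entirely from the underlying embeddings.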