Article details

Klaas Willems: »Google Books Ngram Viewer« und historische Computerlexikologie: Der Sprachgebrauch der NS-Zeit (MU)

Product type: Article (journal)

Author: Klaas Willems

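Title: »Google Books Ngram Viewer« und historische Computerlexikologie: Der Sprachgebrauch der NS-Zeit (»Google Books Ngram Viewer« and historical computational lexicology: language use in the Nazi era)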

Published in: Muttersprache, vol. 122, no. 2

Pages: 81–101 (21 pages)

Publication date: 15 September 2012

Abstract: see below

Price: €4.90 incl. VAT (download)

Abstract

»Ngram Viewer« (http://books.google.com/ngrams) is a new language-technology tool that is freely available on the internet and is based on a digital corpus of more than 5 million optically scanned books in various languages. The tool promises to become an important instrument of a new research field, »culturomics«, a branch of computational lexicology that uses printed sources to study the history of human culture in the broadest sense. This article examines the possibilities and the usefulness of »Ngram Viewer« for the historical analysis of the German lexicon. The corpus from which statistical data on the German language can be obtained currently comprises 37 billion words (by comparison, the English corpus contains 361 billion words). Using more than 100 expressions taken mainly from Victor Klemperer's book LTI. Notizbuch eines Philologen (1947, ³1957), but also from other publications on language use in the Nazi era (including Cornelia Schmitz-Berning, Vokabular des Nationalsozialismus, 2000), the article tests specific claims about the usage history of individual expressions before, during and after the Nazi era (1918–1945), paying particular attention to the reliability of »Ngram Viewer« itself.

»Ngram Viewer« (http://books.google.com/ngrams) is a new search tool, freely available online, for searching more than 5 million books in various languages digitized by Google (roughly a third of all books digitized so far and approximately 4% of all books ever published). »Ngram Viewer« promises to be a powerful tool in an emerging discipline called »culturomics«, a form of computational lexicology that focuses on the history of human culture as it manifests itself in published material. This article investigates the possibilities and problems of »Ngram Viewer« for analysing the history of the German lexicon. The available German corpus currently contains 37 billion words (compared to 361 billion words in the English corpus). The study draws on a sample of more than 100 expressions (»n-grams«) that are commonly regarded as typical of the language of the Third Reich. The sample is taken from publications such as Victor Klemperer's book LTI. Notizbuch eines Philologen (1947, ³1957) and Cornelia Schmitz-Berning's Vokabular des Nationalsozialismus (2000). The usage frequency of the expressions is examined with »Ngram Viewer«, focusing on the historical data before, during and after the Third Reich period (1918–1945) and paying special attention to scanning errors that arise when optical character recognition (OCR) is applied to older German texts.
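The abstracts describe comparing how often an expression was used before, during and after a given period. As a rough illustration only, and not the author's method, the following minimal sketch shows one way such a comparison could be computed from yearly counts. An expression's count is divided by the total number of words for that year (the kind of relative frequency a tool like Ngram Viewer plots), optionally smoothed with a centered moving average similar in spirit to the viewer's smoothing setting, and then averaged over a span of years. The function names, the `{year: count}` input format and all numbers are hypothetical.

```python
"""Hypothetical sketch: relative yearly frequency of one expression.

Assumption: counts are already aggregated into {year: count} dicts,
e.g. from downloaded n-gram data. All numbers used here are invented.
"""


def relative_frequency(match_counts, total_counts):
    # Divide the expression's count by all words for that year, so years
    # with more scanned books do not dominate the curve. Years with a
    # total but no match get 0.0; years with a zero total are skipped.
    return {
        year: match_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


def smooth(series, k=3):
    # Centered moving average: each year is averaged with up to k years
    # on either side; years missing from the series are simply skipped.
    out = {}
    for year in series:
        window = [series[y] for y in range(year - k, year + k + 1) if y in series]
        out[year] = sum(window) / len(window)
    return out


def period_mean(series, start, end):
    # Mean relative frequency over the inclusive year range [start, end];
    # returns 0.0 if no year of the series falls in that range.
    values = [v for y, v in series.items() if start <= y <= end]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Invented toy data: occurrences of one expression and total words per year.
    matches = {1930: 2, 1935: 30, 1940: 45, 1950: 5}
    totals = {1930: 1_000_000, 1935: 1_000_000, 1940: 900_000, 1950: 1_200_000}
    rel = relative_frequency(matches, totals)
    print(period_mean(rel, 1918, 1932), period_mean(rel, 1933, 1945))
```

Normalizing by the yearly total matters because the number of scanned books varies strongly from year to year; raw counts alone would conflate changes in usage with changes in corpus size. The same caveat the article raises still applies: OCR errors in older German texts affect both the numerator and the denominator.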
