Robert Jäschke,¹
Michel Schwab¹ &
Frank Fischer²
¹ Humboldt-Universität zu Berlin
² Higher School of Economics, Moscow
11 November 2020
This work is licensed under a Creative Commons Attribution 4.0 International License.
Image source: Wikimedia Commons
= Vittorio Hösle (Source:, 2013)
= Alice Schwarzer (Source:, 2014)
= Markus Lanz (Source:, 2014)
= Jim Koch (Source:, 2014)
»The ›Vossian antonomasia‹ is […] the substitution of a proper name for an appellative: the bearer of the proper name is a person or thing that, in history or mythology, was an outstanding realization of the property denoted by the appellative. The outstanding person or thing is the type […], which recurs in the new realization being designated. Usually the type is brought from its typological distance into the present that is to be designated by a non-typological, actualizing signal (pronoun, adjective, genitive), and in every case by the (linguistic or situational) context […].«
Heinrich Lausberg: Handbuch der literarischen Rhetorik. Eine Grundlegung der Literaturwissenschaft. Vol. 2. Munich: Hueber 1960. §581, p. 301. [this page in the 3rd edition (1990) on Google Books; emphasis ours]
»If we refer to Leonard Cohen as the Lord Byron of rock music, we treat a popular singer as a famous romantic poet elevating him and popular songs to a higher level of culture.«
Goal: automatic extraction from large text corpora
Corpus: »New York Times« 1987–2007 (Sandhaus 2008)
re.compile("(\\bthe\\s+([\\w.,'-]+\\s+){1,5}?of\\b)", re.UNICODE)
→ For 1987, the number of candidates dropped from 641,432 to 5,236 and finally to 131.
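A minimal sketch of the candidate extraction step, using the regular expression from the slide unchanged (written as a raw string). The helper name `extract_candidates` and the sample sentence, adapted from the Word quotation later in the deck, are only illustrative assumptions.

```python
import re

# Pattern from the slide: "the", then one to five word-like tokens, then "of".
candidate_re = re.compile(r"(\bthe\s+([\w.,'-]+\s+){1,5}?of\b)", re.UNICODE)


def extract_candidates(text):
    """Return every 'the ... of' span found in the text (hypothetical helper)."""
    return [match.group(1) for match in candidate_re.finditer(text)]


sample = "It was dubbed the Marquis de Sade of word processors."
candidates = extract_candidates(sample)
print(candidates)  # ['the Marquis de Sade of']
```

The pattern is intentionally permissive: it also matches phrases such as "the end of", which is why later filtering steps are needed.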
Entity | Sitelinks |
House (building) | 184 |
Dr. House (TV series) | 79 |
Homer Doliver House (botanist) | 9 |
→ We keep a candidate only if its best-known entity (the one with the most sitelinks) belongs to the class "human".
Entity | Sitelinks |
Prince (singer) | 101 |
Prince (son of a king or queen) | 71 |
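The filtering rule can be sketched as follows. The sitelink counts are the ones shown in the two tables; the `cls` values for the non-human entities and the function name `keep_candidate` are assumptions made for illustration, and a real pipeline would look these values up in Wikidata rather than in a hard-coded dictionary.

```python
# Sitelink counts copied from the tables above. The "cls" field is a
# hypothetical stand-in for the Wikidata class of each entity.
ENTITIES = {
    "House": [
        {"label": "House (building)", "sitelinks": 184, "cls": "building"},
        {"label": "Dr. House (TV series)", "sitelinks": 79, "cls": "television series"},
        {"label": "Homer Doliver House (botanist)", "sitelinks": 9, "cls": "human"},
    ],
    "Prince": [
        {"label": "Prince (singer)", "sitelinks": 101, "cls": "human"},
        {"label": "Prince (son of a king or queen)", "sitelinks": 71, "cls": "title"},
    ],
}


def keep_candidate(name, entities=ENTITIES):
    """Keep a name only if its most-linked entity is of class 'human'."""
    matches = entities.get(name, [])
    if not matches:
        return False
    best_known = max(matches, key=lambda entity: entity["sitelinks"])
    return best_known["cls"] == "human"


print(keep_candidate("House"))   # False: the building outranks the botanist
print(keep_candidate("Prince"))  # True: the singer is the best-known entity
```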
Example: "the ("O") Marquis ("Person") de ("Person") Sade ("Person") of ("O")"
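One possible reading of the NER-based check, sketched under the assumption that a candidate is accepted when every token between the determiner and the preposition carries the tag "Person". The function name `is_person_phrase` is hypothetical, and the tags are taken from the example rather than produced by an actual tagger.

```python
def is_person_phrase(tagged):
    """Check a 'the ... of' candidate given as (token, tag) pairs.

    Assumption: the first and last pairs are the determiner and the
    preposition; everything in between must be tagged "Person".
    """
    if len(tagged) < 3:
        return False
    inner = tagged[1:-1]
    return all(tag == "Person" for _, tag in inner)


marquis = [("the", "O"), ("Marquis", "Person"), ("de", "Person"),
           ("Sade", "Person"), ("of", "O")]
end = [("the", "O"), ("end", "O"), ("of", "O")]

print(is_person_phrase(marquis))  # True
print(is_person_phrase(end))      # False
```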
Determiner | of | for | among |
the | 2,779 | 24 | 3 |
a | 118 | 59 | 13 |
an | 14 | 13 | 0 |
Pattern | Regex | Wikidata | Blocklist | Vossantos |
the-of | 12,748,735 | 90,712 | 3,591 | 2,779 |
a-of | 5,900,839 | 11,860 | 705 | 118 |
an-of | 956,247 | 4,539 | 88 | 14 |
the-for | 2,960,459 | 8,070 | 817 | 24 |
a-for | 1,869,946 | 4,812 | 536 | 59 |
an-for | 304,529 | 1,424 | 296 | 13 |
the-among | 122,345 | 139 | 13 | 3 |
a-among | 67,019 | 82 | 25 | 13 |
an-among | 11,158 | 12 | 1 | 0 |
Total | 24,941,277 | 121,650 | 6,072 | 3,023 |
Number of Vossanto candidates after each step of the rule-based approach
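The totals in the last row can be recomputed from the per-pattern counts, which also shows how strongly each step narrows the candidate set. This is only a small arithmetic sketch over the numbers in the table; the variable names are illustrative.

```python
# Candidate counts per pattern, copied from the table above:
# (regex matches, after Wikidata check, after blocklist, Vossantos).
COUNTS = {
    "the-of": (12_748_735, 90_712, 3_591, 2_779),
    "a-of": (5_900_839, 11_860, 705, 118),
    "an-of": (956_247, 4_539, 88, 14),
    "the-for": (2_960_459, 8_070, 817, 24),
    "a-for": (1_869_946, 4_812, 536, 59),
    "an-for": (304_529, 1_424, 296, 13),
    "the-among": (122_345, 139, 13, 3),
    "a-among": (67_019, 82, 25, 13),
    "an-among": (11_158, 12, 1, 0),
}
STEPS = ("Regex", "Wikidata", "Blocklist", "Vossantos")

# Column sums, in the same order as STEPS.
totals = [sum(row[i] for row in COUNTS.values()) for i in range(len(STEPS))]

for step, total in zip(STEPS, totals):
    share = 100 * total / totals[0]
    print(f"{step:<10} {total:>12,} {share:9.4f}% of regex matches")

# Share of blocklist survivors that end up in the Vossantos column.
final_share = 100 * totals[3] / totals[2]
print(f"{final_share:.1f}%")
```

The last ratio (3,023 of 6,072) comes out at 49.8%, the same figure as the precision listed for the rule-based approach in the evaluation table. That suggests the Vossantos column holds the confirmed true positives, although this reading is an inference from the numbers.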
Approach | Precision | Recall | F1 score |
Rule-based | 49.8% | - | - |
Wikidata | 67.3% | 93.0% | 78.1% |
NER | 71.8% | 81.3% | 76.2% |
BLSTM | 86.9% | 85.3% | 86.1% |
The quality of the automated approaches is measured against the results of the rule-based approach!
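As a quick consistency check, the F1 score is the harmonic mean of precision and recall, and the values in the table can be recomputed from the two other columns. The sketch below uses the rounded percentages from the table, so small deviations from the reported F1 values are expected.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table above.
REPORTED = {
    "Wikidata": (67.3, 93.0, 78.1),
    "NER": (71.8, 81.3, 76.2),
    "BLSTM": (86.9, 85.3, 86.1),
}

deviations = {}
for approach, (precision, recall, reported_f1) in REPORTED.items():
    computed = f1(precision, recall)
    deviations[approach] = abs(computed - reported_f1)
    print(f"{approach:<9} computed {computed:.2f}%  reported {reported_f1:.1f}%")
```

All three recomputed values agree with the reported ones to within 0.1 percentage points, which is consistent with rounding of the inputs.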
Count | Source |
72 | Michael Jordan |
62 | Rodney Dangerfield |
40 | Johnny Appleseed |
36 | Elvis Presley |
36 | Babe Ruth |
25 | Michelangelo |
25 | Donald Trump |
23 | Pablo Picasso |
23 | Bill Gates |
23 | Madonna |
21 | Jackie Robinson |
20 | P. T. Barnum |
20 | Tiger Woods |
19 | Martha Stewart |
17 | William Shakespeare |
17 | Wolfgang Amadeus Mozart |
17 | Cinderella |
16 | Henry Ford |
16 | John Wayne |
15 | Napoleon |
…, 12th men, actresses, Afghanistan, Australia, baseball, BMX racing, boxing, Brazilian basketball for the past 20 years, bull riding, college coaches, computer games, cricket, cyberspace, dance, diving, dressage horses, fast food, figure skating, foosball, football, game shows, geopolitics, golf, Harlem, her time, his day, his sport, his team, his time, hockey, horse racing, hunting and fishing, Indiana, integrating insurance and health care, julienne, jumpers, language, Laser sailing, late-night TV, management in Digital, Mexico, motocross racing in the 1980's, orange juice, real-life bulls, recording, Sauternes, snowboarding, soccer, television puppets, tennis, the Buffalo team, the dirt set, the Eagles, the game, the Hudson, the National Football League, the South Korean penal system, the sport, the White Sox, this sport, women's ball, women's basketball
Barack Obama: »There is a reason you call someone the Michael Jordan of [something]. They know what you’re talking about because Michael Jordan is the Michael Jordan of greatness. He is the definition of somebody so good at what they do that everybody recognizes it. That’s pretty rare.«
Count | Modifier |
56 | his day |
34 | his time |
29 | Japan |
17 | China |
16 | tennis |
16 | his generation |
16 | baseball |
14 | her time |
13 | our time |
13 | her day |
12 | the Zulus |
11 | the 90's |
11 | the 1990's |
11 | politics |
11 | hockey |
10 | the art world |
10 | Brazil |
10 | basketball |
10 | ballet |
9 | jazz |
Vossantos | Vossantos (%) | Desk | Articles | Articles (%) |
381 | 12.6% | Sports Desk | 174,823 | 9.4% |
222 | 7.4% | Metropolitan Desk | 237,896 | 12.8% |
220 | 7.3% | Book Review Desk | 32,737 | 1.8% |
180 | 6.0% | National Desk | 143,489 | 7.7% |
171 | 5.7% | The Arts/Cultural Desk | 38,136 | 2.1% |
169 | 5.6% | Arts and Leisure Desk | 27,765 | 1.5% |
135 | 4.5% | Magazine Desk | 25,433 | 1.4% |
125 | 4.1% | Editorial Desk | 131,762 | 7.1% |
117 | 3.9% | Cultural Desk | 40,342 | 2.2% |
99 | 3.3% | Movies, Performing Arts/Weekend Desk | 13,929 | 0.8% |
96 | 3.2% | Business/Financial Desk | 112,951 | 6.1% |
90 | 3.0% | Foreign Desk | 129,732 | 7.0% |
78 | 2.6% | Weekend Desk | 18,814 | 1.0% |
74 | 2.5% | Leisure/Weekend Desk | 10,766 | 0.6% |
72 | 2.4% | Long Island Weekly Desk | 20,453 | 1.1% |
69 | 2.3% | Style Desk | 21,569 | 1.2% |
57 | 1.9% | Financial Desk | 206,958 | 11.2% |
44 | 1.5% | Arts & Leisure Desk | 6,742 | 0.4% |
42 | 1.4% | The City Weekly Desk | 22,863 | 1.2% |
41 | 1.4% | Connecticut Weekly Desk | 17,034 | 0.9% |
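Because the desks differ widely in size, raw counts can be misleading. One way to compare them is the number of Vossantos per 1,000 articles. This normalization is an illustrative choice, not a measure taken from the slides, and only a few rows of the table are included.

```python
# (Vossantos, articles) for a few desks, copied from the table above.
DESKS = {
    "Sports Desk": (381, 174_823),
    "Metropolitan Desk": (222, 237_896),
    "Book Review Desk": (220, 32_737),
    "Movies, Performing Arts/Weekend Desk": (99, 13_929),
    "Financial Desk": (57, 206_958),
}

# Vossantos per 1,000 articles for each desk.
rates = {
    desk: 1000 * vossantos / articles
    for desk, (vossantos, articles) in DESKS.items()
}

for desk, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{desk:<38} {rate:5.2f} per 1,000 articles")
```

On this measure the culture-oriented desks rank far above the Sports Desk, even though the Sports Desk has the highest absolute count.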
"When we introduced Word in October 1983, in its first incarnation it was dubbed the Marquis de Sade of word processors, which was not altogether unfair." (1993)
Image source: Wikimedia Commons, Nathan Toasty
Vossanto or not?
"Orange-tanned and silver-haired, Nicholas resembled a backstreet Blake Carrington from Dynasty."
"You know, the Ubers and AirBnBs and Facebooks of the world."
"Sensing that she’s a bit adrift, Nick becomes the Captain America of chivalry, and upon learning her purse was stolen, which included her wallet and credit cards, he’s determined to help her get home."
"Jamie is a Lidl version of Olly Murs, who is in-turn a knockoff of Robbie Williams."
Source: XKCD, Randall Munroe / CC BY-NC 2.5
\((PERSON|ORGANIZATION|GPE) *\) (is|has) (often|sometimes)? (been)? (called)?
vossanto_re_str = """
( # target
\((PERSON|ORGANIZATION|GPE)\ (?P<x10>[^)]*?)\)
(\ \((PERSON|ORGANIZATION)\ (?P<x11>[^)]*?)\))?
(\ (?P<x12>[^/()]*?)/NNP?)?
( # is
(is|has|are)/VBZ (\ (often|sometimes)/RB)? (\ been/VBN)? (\ called/VBN)?|
\ the/DT # the
( # source
\((PERSON|ORGANIZATION|GPE)\ (?P<y11>[^)]*?)\)
(\ (?P<y12>[^/()]*?)/NNP?)?
\ (of|among|from)/IN # of
\ ( # modifier
(\ (?P<z12>[^/()]*?)/(IN|JJ))?
(\ (?P<z13>[^/()]*?)/NNS?)?
((?P<z40>[^/()]*?)/(CD|DT|JJ)\ )?
\((ORGANIZATION|PERSON|GPE)\ (?P<z41>[^)]*?)\)
(\ (?P<z43>[^/()]*?)/NN[SP]?)?
\ [\.,-]/[\.,:]
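A strongly reduced sketch of the same idea: a verbose regular expression over POS-tagged and NER-chunked text that captures target, source, and modifier. It is not the expression from the slide; the group names, the single-noun modifier, and the sample string (adapted from the Eddie Shore example with full entity labels) are assumptions for illustration.

```python
import re

# Simplified pattern: target entity, comma or copula, "the", source entity,
# preposition, and a one-noun modifier. Escaped spaces are literal because
# re.VERBOSE ignores unescaped whitespace.
simple_vossanto_re = re.compile(r"""
    \((?:PERSON|ORGANIZATION|GPE)\ (?P<target>[^)]*)\)   # target
    \ (?:,/,|(?:is|has)/VBZ)                              # comma or copula
    \ the/DT                                              # the
    \ \((?:PERSON|ORGANIZATION|GPE)\ (?P<source>[^)]*)\)  # source
    \ (?:of|among|from)/IN                                # of
    \ (?P<modifier>[^/()\s]+)/NNS?                        # modifier
""", re.VERBOSE)

tagged = "(PERSON Eddie Shore) ,/, the/DT (ORGANIZATION Babe Ruth) of/IN hockey/NN"
match = simple_vossanto_re.search(tagged)
parts = match.groupdict() if match else {}
print(parts)
```

Note that the source "Babe Ruth" is labeled as an organization in this example, which is why the pattern accepts several entity labels in the source position instead of only PERSON.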
n | Target, | the | Source | of | Modifier | Example |
13 | PER, | DT | ORG | IN | NNS | (PER Thomas Eisner) ,/, the/DT (ORG St. Francis) of/IN bugs/NNS |
13 | PER, | DT | ORG | IN | GPE | (PER Ethel Merman) ,/, the/DT (ORG Birgit Nilsson) of/IN (GPE Broadway) |
10 | PER, | DT | ORG | IN | NN | (PER Eddie Shore) ,/, the/DT (ORG Babe Ruth) of/IN hockey/NN |
9 | PER, | DT | ORG | IN | DT NN | (PER Hal Schell) ,/, the/DT (ORG Boswell) of/IN the/DT delta/NN |
7 | PER | VBZ DT | ORG | IN | NN | (PER Mr. Solerwitz) is/VBZ the/DT (ORG Babe Ruth) of/IN ripoffs/NNS |
7 | PER | DT | PER | IN | GPE | (PER Oscar) ,/, the/DT (PER Larry Bird) of/IN (GPE Brazil) ,/, |
7 | PER, | DT | ORG | IN | NN NNS | (PER Edward Rogoff) ,/, the/DT (ORG Don Quixote) of/IN cab/NN riders/NNS |
6 | PER, | DT | ORG | IN | JJ NN | (PER Carlos Gardel) ,/, the/DT (ORG Elvis) of/IN tango/JJ culture/NN |
5 | PER, | DT | ORG | IN | NN NN | (PER Johnny Miller) ,/, the/DT (ORG Simon Cowell) of/IN golf/NN criticism/NN |
5 | GPE, | DT | ORG | IN | NNS | (GPE McEnroe) ,/, the/DT (ORG Picasso) of/IN players/NNS |
Length | Frequency | Pattern | Example |
1 | 183 | ORG | the/DT (ORG Cook Island Michael Jordan) |
1 | 3 | FACILITY | the/DT (FACILITY Faroese Michael Jordan) |
1 | 5 | PER | the/DT (PER Chadian Michael Jordan) |
2 | 1 | GPE ORG | the/DT (GPE French) (ORG Guianese Michael Jordan) |
2 | 1 | ORG PER | the/DT (ORG Americans) (PER Michael Jordan) |
2 | 2 | NNP PER | the/DT I-Kiribati/NNP (PER Michael Jordan) |
2 | 3 | JJ PER | the/DT Saint-Martinoise/JJ (PER Michael Jordan) |
2 | 89 | GPE PER | the/DT (GPE Grenadian) (PER Michael Jordan) |
2 | 9 | LOC PER | the/DT (LOC South Sudanese) (PER Michael Jordan) |
3 | 1 | GPE NNP PER | the/DT (GPE U.S.) Virgin/NNP (PER Island Michael Jordan) |
3 | 1 | ORG CC PER | the/DT (ORG Wallis) and/CC (PER Futuna Michael Jordan) |
3 | 2 | GPE CC PER | the/DT (GPE Turks) and/CC (PER Caicos Island Michael Jordan) |
3 | 2 | GPE LOC PER | the/DT (GPE French) (LOC Polynesian) (PER Michael Jordan) |
Length | Frequency | Pattern | Example |
1 | 1 | PERSON | the/DT (PERSON Albanian John Doe) |
2 | 1 | NNPS PERSON | the/DT Americans/NNPS (PERSON John Doe) |
2 | 3 | JJ PERSON | the/DT Saint-Martinoise/JJ (PERSON John Doe) |
2 | 4 | FACILITY PERSON | the/DT (FACILITY Faroese) (PERSON John Doe) |
2 | 4 | NNP PERSON | the/DT Somalilander/NNP (PERSON John Doe) |
2 | 11 | LOCATION PERSON | the/DT (LOCATION South Sudanese) (PERSON John Doe) |
2 | 90 | GPE PERSON | the/DT (GPE Grenadian) (PERSON John Doe) |
2 | 178 | ORGANIZATION PERSON | the/DT (ORGANIZATION Cook Island) (PERSON John Doe) |
3 | 1 | GPE CC PERSON | the/DT (GPE Wallisian) or/CC (PERSON Futunan John Doe) |
3 | 1 | GPE PERSON PERSON | the/DT (GPE British) (PERSON Virgin Island) (PERSON John Doe) |
3 | 1 | ORGANIZATION CC PERSON | the/DT (ORGANIZATION Wallis) and/CC (PERSON Futuna John Doe) |
3 | 2 | GPE LOCATION PERSON | the/DT (GPE French) (LOCATION Polynesian) (PERSON John Doe) |
3 | 2 | GPE NNP PERSON | the/DT (GPE French) Guianese/NNP (PERSON John Doe) |
4 | 1 | GPE CC PERSON PERSON | the/DT (GPE Turks) and/CC (PERSON Caicos Island) (PERSON John Doe) |
4 | 1 | GPE NNP NNP PERSON | the/DT (GPE U.S.) Virgin/NNP Island/NNP (PERSON John Doe) |
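The pattern and length columns of such a table can be derived mechanically from the chunked strings in the example column. The following sketch assumes the bracket notation shown above, with entities as "(LABEL words)" and other tokens as "token/TAG"; the function name `signature` and the choice to drop the leading determiner are illustrative assumptions.

```python
import re
from collections import Counter

# Either an entity chunk "(LABEL words ...)" or a plain "token/TAG" pair.
piece_re = re.compile(
    r"\((?P<label>[A-Z]+) [^)]*\)|(?P<token>\S+)/(?P<tag>[A-Z$]+)"
)


def signature(chunked):
    """Return the sequence of entity labels and POS tags after 'the/DT'."""
    labels = [m.group("label") or m.group("tag") for m in piece_re.finditer(chunked)]
    if labels and labels[0] == "DT":
        labels = labels[1:]  # the leading determiner is not part of the pattern
    return " ".join(labels)


examples = [
    "the/DT (GPE Grenadian) (PERSON John Doe)",
    "the/DT Saint-Martinoise/JJ (PERSON John Doe)",
    "the/DT (GPE U.S.) Virgin/NNP Island/NNP (PERSON John Doe)",
    "the/DT (GPE French) (PERSON John Doe)",
]

frequencies = Counter(signature(example) for example in examples)
for pattern, count in frequencies.most_common():
    print(len(pattern.split()), count, pattern)
```

Here the fourth string is a made-up extra input so that one pattern occurs twice; the counts therefore do not reproduce the frequencies in the table.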