Order from Amazon: The
Unicode Standard 5.0, Addison-Wesley (2007), hardcover,
5.4 pounds, 1417 pages: an indispensable reference and the last
printed version of the standard. Soon it will be impossible to find. "Hard
copy versions of the Unicode Standard have been among the most crucial and
most heavily used reference books in my personal library for years."
—Don Knuth
UTF-8 is an ASCII-preserving encoding method for
Unicode (ISO 10646), the
Universal Character Set (UCS). The UCS encodes most of the world's writing
systems in a single character set, allowing you to mix languages and scripts
within a document without needing any tricks for switching character sets.
This web page is encoded directly in UTF-8.
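As a minimal sketch of what "ASCII-preserving" means in practice (the sample characters and the helper name `utf8_len` are illustrative choices, not part of any tool mentioned on this page): ASCII text encodes to exactly the same bytes in UTF-8, while every other character becomes a multi-byte sequence whose bytes all have the high bit set.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
ascii_text = "I can eat glass"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters become multi-byte sequences; every byte in such a
# sequence is >= 0x80, so it can never be mistaken for an ASCII character.
for ch in ["ȝ", "Я", "私"]:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} {ch} -> {hex_bytes} ({utf8_len(ch)} bytes)")
    assert all(b >= 0x80 for b in encoded)
```

This is why mixed-language text like the samples below can pass through software that only understands ASCII without the ASCII portions being altered.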
As shown HERE,
Columbia University's Kermit 95 terminal emulation
software can display UTF-8 plain text in Windows 95, 98, ME, NT, XP, Vista,
or Windows 7/8/10 when using a monospace Unicode font like Andale Mono WT J or Everson Mono Terminal, or the less fully
populated Courier New, Lucida Console, or Andale Mono. C-Kermit can handle it too,
if you have a Unicode
display. As many languages as are representable in your font can be seen
on the screen at the same time.
This page started out as a kind of stress test for UTF-8 support in Web
browsers, which was spotty when the page was first created in 1996 but
has since become standard in all modern browsers.
In fact, only now (2021, a quarter century later) is every single
"I can eat glass" entry displayed correctly in
my browser (Firefox on Windows 10), including the Braille and the Gothic.
Contrast with this screen shot from 2002, testing
the UTF-8 support of our Kermit 95 terminal
emulator with Microsoft's 2002 version of its Courier New font.
CLICK HERE for an
up-to-date survey of Unicode fonts.
From Laȝamon's Brut
(The Chronicles of England, Middle English, West Midlands, ca. 1190):
An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.
(The third letter in the author's name is Yogh, missing from many fonts;
CLICK HERE for another Middle English sample
with some explanation of letters and encoding).
From the Tagelied of Wolfram von Eschenbach (Middle High German):
Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.
The first stanza of
Pushkin's Bronze Horseman (Russian):
На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел.
Šota Rustaveli's Veṗxis Ṭq̇aosani,
The Knight in the Tiger's Skin (Georgian):
ვეპხის ტყაოსანი
შოთა რუსთაველი
ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა,
ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა;
მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა,
დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.
Tamil poetry of Subramaniya Bharathiyar:
சுப்ரமணிய பாரதியார் (1882-1921):
யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
பாமரராய் விலங்குகளாய், உலகனைத்தும் இகழ்ச்சிசொலப் பான்மை கெட்டு,
நாமமது தமிழரெனக் கொண்டு இங்கு வாழ்ந்திடுதல் நன்றோ? சொல்லீர்!
தேமதுரத் தமிழோசை உலகமெலாம் பரவும்வகை செய்தல் வேண்டும்.
Kannada poetry by Kuvempu — ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು
ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು ಇಂದೆನ್ನ ಹೃದಯದಲಿ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗೀ...
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗಿ
ಭವ ಭವದಿ ಭತಿಸಿಹೇ ಭವತಿ ದೂರ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ || ಬಾ ಇಲ್ಲಿ ||
I Can Eat Glass
And from the sublime to the ridiculous, here is a
certain phrase¹ in an assortment of languages:
Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني.
Japanese: 私はガラスを食べられます。それは私を傷つけません。
Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
Notes:
The "I can eat glass" phrase and initial translations (about 30 of them)
were borrowed from Ethan Mollick's I Can Eat Glass page
(which disappeared on or about June 2004) and converted to UTF-8. Since
Ethan's original page is gone, I should mention that his purpose was to offer
travelers a phrase they could use in any country that would command a
certain kind of respect, or at least get attention. See Credits for the many additional contributions since
then. When submitting new entries, note that the word "hurt" (if you have a choice)
is used in the sense of "cause harm", "do damage", or "bother", rather than
"inflict pain" or "make sad". In this vein Otto Stolz comments (as do
others further down; personally I think it's better for the purpose of this
page to have extra entries and/or to show a greater repertoire of characters
than it is to enforce a strict interpretation of the word "hurt"!):
This is the meaning I have translated to the Swabian dialect.
However, I just have noticed that most of the German variants
translate the "inflict pain" meaning. The German example should
read:
I guess, also these examples translate the wrong sense of "hurt",
though I do not know these languages well enough to assert them
definitely:
Nederlands / Dutch: Ik kan glas eten; het doet mij geen
pijn. (This one has been changed)
Kirchröadsj/Bôchesserplat: Iech ken glaas èèse, mer 't deet miech jing pieng.
In the Romanic languages, the variations on "fa male" (it) are probably
wrong, whilst the variations on "hace daño" (es) and "damaĝas" (Esperanto) are probably correct; "nocet" (la) is definitely right.
The northern Germanic variants of "skada" are probably right, as are
the Slavic variants of "škodi/шкоди" (se); however the Slavic variants
of "boli" (hv) are probably wrong, as "bolena" means "pain/ache", IIRC.
That was from July 2004. In December 2007, Otto writes again:
Hello Frank,
in days of yore, I had written:
> "Ich kann Glas essen ohne mir zu schaden."
> The comma fell victim to the 1996 orthographic reform,
The latest revision (2006) of the official German orthography
has revived the comma around infinitive clauses commencing with
ohne, or 5 other conjunctions, or depending from a noun or
from an announcing demonstrative
(http://www.ids-mannheim.de/reform/regeln2006.pdf, §75).
So, it's again: Ich kann Glas essen, ohne mir zu schaden.
Best wishes,
Otto Stolz
The numbering of the samples is arbitrary, done only to keep track of how
many there are, and can change any time a new entry is added. The
arrangement is also arbitrary but with some attempt to group related
examples together. Note: All languages not listed are wanted, not just the
ones that say (NEEDED).
Correct right-to-left display of these languages
depends on the capabilities of your browser. The period should
appear on the left. In the monospace Yiddish example, the Yiddish digraphs
should occupy one character cell.
Yoruba: The third word is Latin letter small 'j' followed by
small 'e' with U+0329, Combining Vertical Line Below. This displays
correctly only if your Unicode font includes the U+0329 glyph and your
browser supports combining diacritical marks. The Lingala and Indic examples
also include combining sequences.
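The combining sequence described in the Yoruba note can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper `describe` is a hypothetical name, and the code shows how the sequence is stored, not whether a given font or browser will render it correctly.

```python
import unicodedata

# Base letter 'e' followed by U+0329; the pair is drawn as one glyph only
# if the font and the rendering engine both support combining marks.
sequence = "e\u0329"

def describe(text):
    """Return (code point, Unicode name, combining class) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
            for c in text]

for row in describe(sequence):
    print(row)

# Two code points in memory, three bytes in UTF-8 (1 for 'e', 2 for U+0329).
print(len(sequence), "code points,", len(sequence.encode("utf-8")), "UTF-8 bytes")
```

A nonzero combining class marks U+0329 as a character that attaches to the preceding base letter rather than occupying a cell of its own.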
Includes Unicode 3.1 (or later) characters beyond Plane 0.
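To illustrate what "beyond Plane 0" implies for encoding, here is a small sketch using one Gothic letter as the sample (the choice of U+10330 is mine, for illustration). Characters outside Plane 0 need four bytes in UTF-8 and a surrogate pair in UTF-16, which is one reason older software often failed on them.

```python
# U+10330 is GOTHIC LETTER AHSA, in Plane 1 (the Supplementary Multilingual Plane).
ch = "\U00010330"

plane = ord(ch) >> 16            # each plane is 0x10000 code points wide
utf8 = ch.encode("utf-8")        # four bytes for anything beyond Plane 0
utf16 = ch.encode("utf-16-be")   # a surrogate pair: two 16-bit units

print(f"U+{ord(ch):05X} is in plane {plane}")
print("UTF-8 :", utf8.hex())     # f0908cb0
print("UTF-16:", utf16.hex())    # d800df30
```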
The Classic Mongolian example should be vertical, top-to-bottom and
left-to-right. But such display is almost impossible. Also no font yet
exists which provides the proper ligatures and positional variants for the
characters of this script, which works somewhat like Arabic.
Taiwanese is also known as Holo or Hoklo, and is related to Southern
Min dialects such as Amoy.
Contributed by Henry H. Tan-Tenn, who comments, "The above is
the romanized version, in a script current among Taiwanese Christians since
the mid-19th century. It was invented by British missionaries and saw use in
hundreds of published works, mostly of a religious nature. Most Taiwanese did
not know Chinese characters then, or at least not well enough to read. More
to the point, though, a written standard using Chinese characters has never
developed, so a significant minority of words are represented with different
candidate characters, depending on one's personal preference or etymological
theory. In this sentence, for example, "-tàng", "chia̍h",
"mā" and "bē" are problematic using Chinese characters.
"Góa" (I/me) and "po-lê" (glass) are as written in other Sinitic
languages (e.g. Mandarin, Hakka)."
Wagner Amaral of Pinese & Amaral Associados notes that
the Brazilian Portuguese sentence for
"I can eat glass" should be identical to the Portuguese one, as the word
"machuca" means "inflict pain", or rather "injuries". The words "faz
mal" would more correctly translate as "cause harm".
Burmese: In English the first person pronoun "I" stands for both
genders, male and female. In Burmese (except in the central part of Burma)
it is kyundaw for
male and kyanma for female.
The examples here use a fully compliant Unicode Burmese font (sadly, only one
exists, the Padauk Graphite font), rendered with the Graphite engine.
Unicode 4.0 and earlier lacked some medial and vowel characters;
the second example uses them.
From Louise Hope, 22 November 2010:
I decided to have a go at an Inuktitut rendering, mainly in hopes of shaming someone who actually knows the language into coming up with something better.
Meanwhile, try this:
Loosely: I am able not to hurt myself whenever I eat glass.
aliguq >> glass (uninflected because it is the patient of a transitive verb in an ergative language)
nirijaraangakku >> "I eat him/her/it" in Frequentative mood (all one verb with inflectional ending, no affixes whatsoever)
suranngittunnaqtunga >> suraq (do permanent harm) + nngit (verb-negator) + tunnaq (ability) + tunga (intransitive ending, making the verb passive or reflexive)
See above about someone who knows the language, et cetera.
Script trivia: the syllable ᙱ is a single unicode character
representing the two elements ᓐ (syllable-final n) and ᖏ
(syllable ngi). I think they just did it that way because it looks tidier
than the expected ᓐᖏ. If your operating system didn't come
with Euphemia (all-purpose UCAS font), you can download Pigiarniq. It comes with a jolly little inuksuk ᐀ that the Unicode Consortium is trying to make into a squatter.
The Quick Brown Fox... Pangrams
The "I can eat glass" sentences do not necessarily show off the orthography of
each language to best advantage. In many alphabetic written languages it is
possible to include all (or most) letters (or "special" characters) in
a single (often nonsense) pangram. These were traditionally used in
typewriter instruction; now they are useful for stress-testing computer fonts
and keyboard input methods. Here are a few examples (SEND MORE):
English: The quick brown fox jumps over the lazy dog.
Jamaican: Chruu, a kwik di kwik brong fox a jomp huova di liezi daag de, yu no siit?
Irish: "An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena ṗóg éada ó
ṡlí do leasa ṫú?"
"D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus Áḋaiṁ."
Dutch: Pa's wijze lynx bezag vroom het fikse aquaduct.
German: Falsches Üben von Xylophonmusik quält jeden
größeren Zwerg. (1)
German: Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der affig-flatterhafte kauzig-höfliche Bäcker über ſeinem verſifften kniffligen C-Xylophon. (2)
Norwegian: Blåbærsyltetøy ("blueberry jam", includes every
extra letter used in Norwegian).
Swedish: Flygande bäckasiner söka strax hwila på mjuka tuvor.
Icelandic: Sævör grét áðan því úlpan var ónýt.
Finnish: (5) Törkylempijävongahdus (This is a perfect pangram, every letter appears only once. Translating it is an art on its own, but I'll say "rude lover's yelp". :-D)
Finnish: (5) Albert osti fagotin ja töräytti puhkuvan melodian. (Albert bought a bassoon and hooted an impressive melody.)
Finnish: (5) On sangen hauskaa, että polkupyörä on maanteiden jokapäiväinen ilmiö. (It's pleasantly amusing, that the bicycle is an everyday sight on the roads.)
Polish: Pchnąć w tę łódź jeża lub osiem skrzyń fig.
Czech: Příliš žluťoučký kůň úpěl ďábelské ódy.
Slovak: Starý kôň na hŕbe kníh žuje tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
Slovenian: Šerif bo za domačo vajo spet kuhal žgance.
Greek (monotonic):
ξεσκεπάζω την ψυχοφθόρα βδελυγμία
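Whether a sentence really covers an alphabet is easy to check mechanically. The following is a rough sketch; the function names are hypothetical, and the alphabet passed in is whatever set of letters you care about (the 26 ASCII letters by default, or just a language's "special" letters).

```python
import string
import unicodedata

def missing_letters(sentence, alphabet=string.ascii_lowercase):
    """Return the letters of `alphabet` that do not occur in `sentence`."""
    present = set(unicodedata.normalize("NFC", sentence).lower())
    return sorted(set(alphabet) - present)

def has_no_repeated_letters(text):
    """True if no letter occurs more than once (case-insensitive)."""
    letters = [c for c in unicodedata.normalize("NFC", text).lower() if c.isalpha()]
    return len(letters) == len(set(letters))

print(missing_letters("The quick brown fox jumps over the lazy dog."))
print(missing_letters("Blåbærsyltetøy", alphabet="æøå"))
print(has_no_repeated_letters("Törkylempijävongahdus"))
```

Normalizing to NFC first matters: a decomposed "ö" (o plus a combining diaeresis) would otherwise be counted as a plain "o".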
Other phrases commonly used in Germany include: "Ein wackerer Bayer
vertilgt ja bequem zwo Pfund Kalbshaxe" and, more recently, "Franz jagt im
komplett verwahrlosten Taxi quer durch Bayern", but both lack umlauts and
eszett. Previously, going for the shortest sentence that has all the
umlauts and special characters, I had
"Grüße aus Bärenhöfe
(und Óechtringen)!"
Acute accents are not used in native German words, so I was surprised to
discover "Óechtringen" in the Deutsche Bundespost
Postleitzahlenbuch:
It's a small village in eastern Lower Saxony.
The "oe" in this case
turns out to be the Lower Saxon "lengthening e" (Dehnungs-e), which makes the
previous vowel long (used in a number of Lower Saxon place names such as Soest
and Itzehoe), not the "e" that indicates umlaut of the preceding vowel.
Many thanks to the Óechtringen-Namenschreibungsuntersuchungskomitee
(Alex Bochannek, Manfred Erren, Asmus Freytag, Christoph Päper, plus
Werner Lemberg who serves as
Óechtringen-Namenschreibungsuntersuchungskomiteerechtschreibungsprüfer)
for their relentless pursuit of the facts in this case. Conclusion: the
accent almost certainly does not belong on this (or any other native German)
word, but neither can it be dismissed as dirt on the page. To add to the
mystery, it has been reported that other copies of the same edition of the
PLZB do not show the accent! UPDATE (March 2006): David Krings was
intrigued enough by this report to contact the mayor of Ebstorf, of which
Oechtringen is a borough, who responded:
Sehr geehrter Mr. Krings,
wenn Oechtringen irgendwo mit einem Akzent auf dem O geschrieben wurde,
dann kann das nur ein Fehldruck sein. Die offizielle Schreibweise lautet
jedenfalls „Oechtringen“.
Mit freundlichen Grüssen
Der Samtgemeindebürgermeister
i.A. Lothar Jessel
From Karl Pentzlin (Kochel am See, Bavaria, Germany):
"This German phrase is suited for display by a Fraktur (broken letter)
font. It contains: all common three-letter ligatures: ffi ffl fft and all
two-letter ligatures required by the Duden for Fraktur typesetting: ch ck ff
fi fl ft ll ſch ſi ſſ ſt tz (all in a
manner such they are not part of a three-letter ligature), one example of f-l
where German typesetting rules prohibit ligating (marked by a ZWNJ), and all
German letters a...z, ä,ö,ü,ß, ſ [long s]
(all in a manner such that they are not part of a two-letter Fraktur
ligature)."
Otto Stolz notes that "'Schloß' is now spelled 'Schloss', in
contrast to 'größer' (example 4) which has kept its
'ß'. Fraktur has been banned from general use, in 1942, and long-s
(ſ) has ceased to be used with Antiqua (Roman) even earlier (the
latest Antiqua-ſ I have seen is from 1913, but then
I am no expert, so there may well be a later instance)." Later Otto confirms
the latter theory, "Now I've run across a book “Deutsche
Rechtschreibung” (edited by Lutz Mackensen) from 1954 (my reprint
is from 1956) that has kept the Antiqua-ſ in its dictionary part (but
neither in the preface nor in the appendix)."
Diaeresis is not used in Iberian Portuguese. Also this pangram
is missing a-tilde (ã) so it's a pænpangram.
From Yurio Miyazawa: "This poetry contains all the sounds in the
Japanese language and used to be the first thing for children to learn in
their Japanese class. The Hiragana version is particularly neat because it
covers every character in the phonetic Hiragana character set." Yurio also
sent the Kanji version:
色は匂へど 散りぬるを
我が世誰ぞ 常ならむ
有為の奥山 今日越えて
浅き夢見じ 酔ひもせず
Finnish pangrams from Mikko Ristilä.
Accented Cyrillic:
(This section contributed by Vladimir Marinov.)
In Bulgarian it is desirable, customary, or in some cases required to
write accents over vowels. Unfortunately, no computer character sets
contain the full repertoire of accented Cyrillic letters. With Unicode,
however, it is possible to combine any Cyrillic letter with any combining
accent. The appearance of the result depends on the font and the rendering
engine. Here are two examples.
Той видя бялата коса́ по главата и́ и ко́са на рамото и́, и ре́че да и́
рече́: "Пара́та по́ па́ри от па́рата, не ща пари́!", но си поми́сли: "Хей,
помисли́ си! А́ и́ река, а́ е скочила в тази река, която щеше да тече́,
а не те́че."
По пъ́тя пъту́ват кю́рди и югославя́ни.
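The point about combining accents can be seen by comparing a Latin and a Cyrillic vowel under Unicode normalization. This is a sketch using Python's standard `unicodedata` module; the helper name `accent` is hypothetical. A Latin "e" plus the combining acute collapses to a single precomposed character, whereas Cyrillic "и" plus the same accent has no precomposed form and remains two code points.

```python
import unicodedata

def accent(letter, mark="\u0301"):
    """Append a combining mark (default U+0301, COMBINING ACUTE ACCENT)
    and normalize the result to NFC."""
    return unicodedata.normalize("NFC", letter + mark)

latin = accent("e")       # NFC composes this to the single character U+00E9
cyrillic = accent("и")    # no precomposed form, so it stays two code points

for s in (latin, cyrillic):
    print(s, [f"U+{ord(c):04X}" for c in s])
```

Because the Cyrillic result is always a base letter followed by a separate mark, its appearance depends entirely on the font and rendering engine, as noted above.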
HTML Features
Here is the Russian alphabet (uppercase only) coded in three
different ways, which should look identical:
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Literal UTF-8)
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Decimal numeric character reference)
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Hexadecimal numeric character reference)
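The three codings can be generated and compared with a short Python sketch using the standard `html` module (the variable names are illustrative). It builds the decimal and hexadecimal numeric character references for the same 32 letters and confirms that both decode back to the literal text.

```python
import html

# The 32 uppercase letters shown above occupy U+0410..U+042F.
literal = "".join(chr(cp) for cp in range(0x0410, 0x0430))
decimal_refs = "".join(f"&#{ord(c)};" for c in literal)
hex_refs = "".join(f"&#x{ord(c):X};" for c in literal)

print(decimal_refs[:21])   # &#1040;&#1041;&#1042;
print(hex_refs[:21])       # &#x410;&#x411;&#x412;

# Both reference forms decode to exactly the same string as the literal.
assert html.unescape(decimal_refs) == literal
assert html.unescape(hex_refs) == literal
```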
In another test, we use HTML language tags to distinguish Bulgarian, Russian,
and Serbian,
which have different italic forms for lowercase
б, г, д, п, and/or т:
Bulgarian:
[ бгдпт ]
[ бгдпт ]
Мога да ям стъкло и не ме боли.
Russian:
[ бгдпт ]
[ бгдпт ]
Я могу есть стекло, это мне не вредит.
Serbian:
[ бгдпт ]
[ бгдпт ]
Могу јести стакло а да ми не шкоди.
Credits, Tools, and Commentary
Credits:
The "I can eat glass" phrase and the initial collection of translations:
Ethan Mollick.
Transcription / conversion to UTF-8: Frank da Cruz.
Albanian: Sindi Keesan.
Afrikaans: Johan Fourie, Kevin Poalses.
Anglo Saxon: Frank da Cruz.
Arabic: Najib Tounsi.
Armenian: Vaçe Kundakçı.
Belarusian: Alexey Chernyak, Patricia Clausnitzer.
Bengali: Somnath Purkayastha, Deepayan Sarkar.
Bislama: Dan McGarry.
Bosnian: Dmitrij D. Czarkoff.
Braille: Frank da Cruz.
Bulgarian: Sindi Keesan, Guentcho Skordev, Vladimir Marinov.
Burmese: "cetanapa", Sithu Thwin.
Cabo Verde Creole: Cláudio Alexandre Duarte.
Catalán: Jordi Bancells.
Chinese: Jack Soo, Wong Pui Lam.
Chinook Jargon: David Robertson.
Cornish: Chris Stephens.
Croatian: Dmitrij D. Czarkoff, Marjan Baće.
Czech: Stanislav Pecha, Radovan Garabík, Tomáš Malý.
Danish: Morten Due Jorgensen.
Dutch: Peter Gotink, Pim Blokland, Rob Daniel, Rob de Wit.
Erzian: Jack Rueter.
Esperanto: Franko Luin, Radovan Garabík, Andrew Lee.
Estonian: Meelis Roos.
Faroese: Jón Gaasedal.
Farsi/Persian: Payam Elahi.
Fijian: Paul Cannon.
Finnish: Sampsa Toivanen, Mikko Ristilä.
French: Luc Carissimo, Anne Colin du Terrail, Sean M. Burke, Theo Morelli.
Galician: Laura Probaos.
Georgian: Giorgi Lebanidze.
German: Christoph Päper, Otto Stolz, Karl Pentzlin, David Krings,
Frank da Cruz, Peter Keel (Seegras), Elias Glantschnig.
Gothic: Aurélien Coudurier.
Greek: Ariel Glenn, Constantine Stathopoulos, Siva Nataraja, Christos Georgiou.
Hebrew: Jonathan Rosenne, Tal Barnea.
Hausa: Malami Buba, Tom Gewecke.
Hawaiian: na Hauʻoli Motta, Anela de Rego, Kaliko Trapp.
Hindi: Shirish Kalele, Nitin Dahra, Shardul Chiplunkar.
Hungarian: András Rácz, Mark Holczhammer.
Icelandic: Andrés Magnússon, Sveinn Baldursson.
International Phonetic Alphabet (IPA): Siva Nataraja / Vincent Ramos.
Inuktitut: Louise Hope.
Irish: Michael Everson, Marion Gunn, James Kass, Curtis Clark.
Italian: Thomas De Bellis.
Jamaican: Stephen J. Cherin.
Japanese: Makoto Takahashi, Yurio Miyazawa.
Kannada: Sridhar R N, Alok G. Singh.
Karelian: Aleksandr Semakov.
Khmer: Tola Sann.
Kirchröadsj: Roger Stoffers.
Kreyòl: Sean M. Burke.
Korean: Jungshik Shin.
Langenfelder Platt: David Krings.
Lao: Tola Sann.
Lëtzebuergescht: Stefaan Eeckels.
Lingala: Denis Moyogo Jacquerye (Nkóta ya Kɔ́ngɔ míbalé).
Lithuanian: Gediminas Grigas.
Lojban: Edward Cherlin.
Lusatian: Ronald Schaffhirt.
Macedonian: Sindi Keesan.
Malay: Zarina Mustapha.
Malayalam: Anil Matthews, Bobby Jacob.
Maltese: Kenneth Joseph Vella.
Manx: Éanna Ó Brádaigh.
Marathi: Shirish Kalele, Shardul Chiplunkar.
Marquesan: Kaliko Trapp.
Middle English: Frank da Cruz.
Milanese: Marco Cimarosti.
Mongolian: Tom Gewecke.
Montenegrin: Dmitrij D. Czarkoff.
Napoletano: Diego Quintano.
Navajo: Tom Gewecke.
Nórdicg: Yẃlyan Rott.
Nepali: Ujjwol Lamichhane, Rabi Tripathi.
Norwegian: Herman Ranes, Håvard Kvålen.
Odenwälderisch: Alexander Heß.
Old Irish: Michael Everson.
Old Norse: Andrés Magnússon.
Papiamentu: Bianca and Denise Zanardi.
Pashto: N.R. Liwal.
Pfälzisch: Dr. Johannes Sander.
Picard: Philippe Mennecier.
Polish: Juliusz Chroboczek, Paweł Przeradowski, Wlodzislaw Kostecki.
Portuguese: Cláudio Alexandre Duarte, Bianca and Denise
Zanardi, Pedro Palhoto Matos, Wagner Amaral.
Québécois: Laurent Detillieux.
Roman: Pierpaolo Bernardi.
Romanian: Juliusz Chroboczek, Ionel Mugurel.
Romansch: Alexandre Suter.
Ruhrdeutsch: "Timwi".
Russian: Alexey Chernyak, Serge Nesterovitch.
Sami: Anne Colin du Terrail, Luc Carissimo.
Sanskrit: Siva Nataraja / Vincent Ramos.
Sächsisch: André Müller.
Schwäbisch: Otto Stolz.
Scots: Jonathan Riddell.
Serbian: Dmitrij D. Czarkoff, Sindi Keesan, Ranko Narancic, Boris Daljevic, Szilvia Csorba, O. Dag.
Sinhalese: Abdul-Ahad (ASM).
Slovak: G. Adam Stanislav, Radovan Garabík.
Slovenian: Albert Kolar, Primož Gabrijelčič.
Spanish: Aleida Morel, Laura Probaos, Ricardo Cancho Niemietz.
Swahili: Ronald Schaffhirt.
Swedish: Christian Rose, Bengt Larsson.
Taiwanese: Henry H. Tan-Tenn.
Tagalog: Jim Soliven.
Tamil: Vasee Vaseeharan, Vetrivel P.
Tatar: Timur.
Telugu: Arjuna Rao Chavala.
Tibetan: D. Germano, Tom Gewecke.
Thai: Alan Wood's wife.
Turkish: Vaçe Kundakçı, Tom Gewecke, Merlign Olnon.
Ukrainian: Michael Zajac, Oleg Podsadny.
Ulster Gaelic: Ciarán Ó Duibhín.
Uzbek: Daniyar Nurgaliev.
Urdu: Mustafa Ali.
Vietnamese: Dixon Au, [James] Đỗ Bá Phước 杜 伯 福.
Walloon: Pablo Saratxaga.
Welsh: Geiriadur Prifysgol Cymru (Andrew).
Yiddish: Mark David.
Zeneise: Angelo Pavese.
Tools Used to Create This Web Page:
The UTF-8-aware Kermit 95 terminal emulator on
Windows, connected to a Unix host running the EMACS text editor. Kermit
95 displays UTF-8 and also allows keyboard entry of arbitrary Unicode BMP
characters as 4 hex digits, as shown HERE. Hex codes
for Unicode values can be found through the Unicode Consortium
Character Code Charts index.
When submissions arrive by email encoded in some other character set
(Latin-1, Latin-2, KOI, various PC code pages, JEUC, etc), I use the
TRANSLATE command of C-Kermit on a Unix host
(e.g. Linux or NetBSD) to convert the character set to UTF-8 (I could also
use Kermit 95 for this; it has the same TRANSLATE command). That's it -- no
"Web authoring" tools, no locales, no "smart" anything. It's just plain
text, nothing more. Editing in UTF-8 was rather tedious in times past, but
as of about version 23 or so, EMACS handles Unicode pretty well and works
well with Kermit 95 with its terminal character-set set to utf8.
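For readers without Kermit, the same two operations (converting a submission from a legacy character set to UTF-8, and entering a BMP character by its four hex digits) can be sketched in Python. This is a stand-in under my own assumptions, not how Kermit's TRANSLATE command works internally; `to_utf8` and `from_hex` are hypothetical helper names, and the codec names are Python's.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from a legacy character set and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

def from_hex(code: str) -> str:
    """Turn four hex digits (a BMP code point) into the character itself."""
    return chr(int(code, 16))

# Simulate one submission arriving in KOI8-R and one in Latin-1.
koi8_bytes = "Я могу есть стекло".encode("koi8_r")
latin1_bytes = "Grüße aus Bärenhöfe".encode("latin_1")

print(to_utf8(koi8_bytes, "koi8_r").decode("utf-8"))
print(to_utf8(latin1_bytes, "latin_1").decode("utf-8"))
print(from_hex("0410"))   # Cyrillic capital A
```

The conversion only works if the source character set is known; decoding KOI8-R bytes as Latin-1 would succeed silently and produce the wrong letters.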
Commentary:
Date: Wed, 27 Feb 2002 13:21:59 +0100
From: "Bruno DEDOMINICIS" <b.dedominicis@cite-sciences.fr>
Subject: Je peux manger du verre, cela ne me fait pas mal.
I just found out your website and it makes me feel like proposing an
interpretation of the choice of this peculiar phrase.
Glass is transparent and can hurt as everyone knows. The relation between
people and civilisations is sometimes effusional and more often rude. The
concept of breaking frontiers through globalization, in a way, is also an
attempt to deny any difference. Isn't "transparency" the flag of modernity?
Nothing should be hidden any more, authority is obsolete, and the new powers
are supposed to reign through loving and smiling and no more through
coercion...
Eating glass without pain sounds like a very nice metaphor of this attempt.
That is, frontiers should become glass transparent first, and be denied by
incorporating them. On the reverse, it shows that through globalization,
frontiers undergo a process of displacement, that is, when they are not any
more speakable, they become repressed from the speech and are therefore
incorporated and might become painful symptoms, as for example what happens
when one tries to eat glass.
The frontiers that used to separate bodies one from another tend to divide
bodies from within and make them suffer.... The chosen phrase then appears
as a denial of the symptom that might result from the destitution of
traditional frontiers.