Universal Language Dictionary
filename: introduc.txt
version: 1995.09.01
The current version of this project was found at:
http://www.invisiblelighthouse.com/uld/ (July 2002)
The following is from an earlier ULD site that closed with no
forwarding address. That earlier project included a program to
prepare parallel word lists for any two languages, which I don't see
on the new site. I also find the new all-languages-in-one-listing
approach more difficult for those interested only in English. -- jlb 10Aug97
Copyright 1992-1995 by Richard K. Harrison. All rights reserved.
Permission is hereby granted for unrestricted use of these files by
any individual for his/her own pleasure, private research, personal
communication, etc. Use of these files by any government agency,
business entity, educational institution, or any other organization
requires permission.
acknowledgement
---------------
This project is entirely the result of volunteer effort. Special
thanks to those who took the time to type various vocabularies into
their computers and e-mail them to me. (Their names are mentioned at
the beginning of each *.DIC file.)
introduction
------------
The Universal Language Dictionary is an attempt to create a list of
concepts, described in English, along with words to express those
concepts in several "natural" and "artificial" (constructed) languages.
This vocabulary of 1600 terms, along with some knowledge of a
language's grammar, enables one to engage in elementary conversation
and correspondence on a wide variety of topics. And, of course, it
is quite interesting to see the equivalent words listed side-by-side
for comparison. Furthermore, if you are creating an _a_posteriori_
planned language, these wordlists can be extremely helpful.
The list of terms can also be used as a general guide for those who
are in the process of designing artificial languages. A vocabulary
which cannot express most of these concepts is not ready to cope with
the communicative challenges of the modern world.
In their current condition, the wordlists are not really sufficient
for use in automated translation, because they do not contain much
information about irregular verbs, noun gender, inflected forms of
words and so forth.
The list does not include a complete array of function words, because
pronouns, structural particles, articles, derivational affixes and
similar morphemes are difficult to translate from one language to
another, and every language has a different array of them, depending
on its characteristics.
bibliography
------------
The choice of items was not made arbitrarily, but was based upon an
examination of the lists mentioned below. Of course, a few of the
items might seem idiosyncratic, and there are always difficulties
involved in trying to translate a list of terms from one language to
another; sometimes exact equivalents are not available. These flaws
are inevitable in this kind of project.
The selection of vocabulary items was influenced by:
Basic English wordlist; Esperanto baza radikaro; Loglan predicate list;
Lojban predicate list; VOA Special English wordlist; Concise Dictionary
of 26 Languages; Roget's Thesaurus; New Horizon Ladder Dictionary of the
English Language; Minimum Vocabularies of Written Chinese; Jo^yo^ kanji
list.
arrangement
-----------
The items are grouped into 38 topic-categories, which should make it
easier to find a desired item without an index. However, each item
has a three-character hexadecimal serial number, which makes possible
the automated generation of an index for each language included. The
categories are:
1. adpositions (001-020)
2. function words (021-03E)
3. people (03F-05C)
4. titles (05D-060)
5. groupings of people (061-069)
6. body parts and substances (06A-0BE)
7. body terms (0BF-0D1)
8. bodily actions (0D2-0F9)
9. animal species and types (0FA-125)
10. plant species and types (126-15F)
11. natural world (160-188)
12. tools and implements (189-219)
13. clothing (21A-22D)
14. buildings and institutions (22E-24D)
15. government and hierarchy (24E-270)
16. business and transactions (271-2A2)
17. religion and the supernatural (2A3-2B1)
18. mind and emotion (2B2-31F)
19. communication (320-36E)
20. games (36F-380)
21. identity (381-389)
22. numerals (38A-398)
23. quantity (399-3BD)
24. degree (3BE-3CA)
25. dimension, direction (3CB-40E)
26. motion (40F-439)
27. vehicles, etc. (43A-44A)
28. time and sequence (44B-4A0)
29. substances (4A1-4D9)
30. foodstuffs (4DA-4F0)
31. forms of matter (4F1-536)
32. qualities of matter (537-557)
33. matter-related actions (558-58B)
34. misc. matter/energy terms (58C-5A2)
35. light (5A3-5BC)
36. sound (5BD-5C4)
37. heat (5C5-5CC)
38. assorted abstract concepts (5CD-640)
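
Because the category ranges are contiguous, a serial number can be
mapped to its category by searching the list of range starting points.
The following Python sketch is only an illustration and is not one of
the original ULD programs; the names CATEGORIES, STARTS and
category_of are invented here.

```python
from bisect import bisect_right

# (first serial number, category name), copied from the list above.
CATEGORIES = [
    ("001", "adpositions"), ("021", "function words"), ("03F", "people"),
    ("05D", "titles"), ("061", "groupings of people"),
    ("06A", "body parts and substances"), ("0BF", "body terms"),
    ("0D2", "bodily actions"), ("0FA", "animal species and types"),
    ("126", "plant species and types"), ("160", "natural world"),
    ("189", "tools and implements"), ("21A", "clothing"),
    ("22E", "buildings and institutions"),
    ("24E", "government and hierarchy"),
    ("271", "business and transactions"),
    ("2A3", "religion and the supernatural"), ("2B2", "mind and emotion"),
    ("320", "communication"), ("36F", "games"), ("381", "identity"),
    ("38A", "numerals"), ("399", "quantity"), ("3BE", "degree"),
    ("3CB", "dimension, direction"), ("40F", "motion"),
    ("43A", "vehicles, etc."), ("44B", "time and sequence"),
    ("4A1", "substances"), ("4DA", "foodstuffs"),
    ("4F1", "forms of matter"), ("537", "qualities of matter"),
    ("558", "matter-related actions"),
    ("58C", "misc. matter/energy terms"), ("5A3", "light"),
    ("5BD", "sound"), ("5C5", "heat"),
    ("5CD", "assorted abstract concepts"),
]
LAST_SERIAL = 0x640  # 1600 in decimal, the size of the vocabulary

# Range starting points as integers, in ascending order.
STARTS = [int(start, 16) for start, _ in CATEGORIES]


def category_of(serial):
    """Return (category number, category name) for a hex serial number."""
    value = int(serial, 16)
    if not 0x001 <= value <= LAST_SERIAL:
        raise ValueError(f"serial number out of range: {serial}")
    # The last starting point that is <= value identifies the category.
    index = bisect_right(STARTS, value) - 1
    return index + 1, CATEGORIES[index][1]
```

For example, category_of("03E") gives (2, "function words") and
category_of("03F") gives (3, "people"). Storing only the starting
points works because no serial number falls between two ranges.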
how to assemble the dictionary and add more languages
-----------------------------------------------------
Each language's vocabulary is kept in a separate file. These files
are strictly formatted to facilitate automated processing. Although
I've hacked together some crude programs that assemble these files
into sequential dictionaries, I'm hoping that others will be inspired
to create applications to use these files in a more sophisticated
way.
The program entitled COLLATOR.BAS is written in a version of BASIC
called Microsoft QuickBasic version 4.5. This program will also run
under version 3.0 or later of Microsoft BASIC for the Macintosh, and
under QBasic, which comes with version 6 of MS-DOS. The program
entitled COMBINER.C is written in "generic" C.
These programs enable you to assemble any selected group of properly
formatted vocabulary lists into a multi-lingual wordlist. If you
want a German-Novial-UNI dictionary, you can create one using these
programs. Simply start the program, type in the file names of the
first two vocabulary lists you want to include (e.g. DEUTSCH.DIC and
NOVIAL.DIC), then wait for the first batch of interleaving to be
performed; then enter the name of the next vocabulary list to include
(e.g. UNI.DIC) and so on.
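
In essence, the interleaving step is a join on the serial number. The
Python sketch below is NOT a port of COLLATOR.BAS or COMBINER.C, and it
does not read *.DIC files, since their layout is not described in this
introduction. It assumes each vocabulary has already been loaded into a
mapping from serial number to word. The function name, the output line
layout and the placeholder words are all invented for illustration.

```python
def interleave(vocabularies):
    """Merge {tag: {serial: word}} into lines grouped by serial number.

    Hypothetical in-memory representation; the real programs work on
    *.DIC files whose format is not shown here.
    """
    # Collect every serial number that appears in any vocabulary and
    # sort numerically (the serial numbers are hexadecimal strings).
    serials = sorted(
        {serial for words in vocabularies.values() for serial in words},
        key=lambda serial: int(serial, 16),
    )
    lines = []
    for serial in serials:
        for tag, words in vocabularies.items():
            # An empty string marks an item this language has no word for.
            lines.append(f"{serial} {tag}: {words.get(serial, '')}")
    return lines


# Placeholder entries only; these are not real dictionary data.
sample = {
    "Deu": {"001": "deu-word-1", "002": "deu-word-2"},
    "Nov": {"001": "nov-word-1"},
}
for line in interleave(sample):
    print(line)
```

A further language is added by putting one more tag into the mapping,
which corresponds to entering the next file name in the procedure
described above.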
Presently we are limited to 7-bit ASCII text files; diacritical marks
and other non-English characters have to be represented by various
work-arounds. (These typographical work-arounds are described in the
typo_con comment lines at the beginning of each language's .DIC
file.) We are planning to use the ISO 8859-1 character set (when
appropriate) in the future.
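
A contributed word list can be checked against the 7-bit ASCII
restriction before it is sent in. The following Python sketch is one
possible way to do that; the function name find_non_ascii is invented
here and is not part of the project's programs.

```python
def find_non_ascii(text):
    """Return (line, column, character) for each non-7-bit-ASCII character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            # 7-bit ASCII covers code points 0 through 127 only.
            if ord(ch) > 0x7F:
                problems.append((line_no, col_no, ch))
    return problems


# The first line uses an ASCII work-around; the second line contains
# an n-with-tilde (U+00F1), which is outside 7-bit ASCII.
sample = "Espan~ol\nEspa\u00f1ol\n"
assert find_non_ascii(sample) == [(2, 5, "\u00f1")]
```

Reporting the line and column makes it easy to find each character
that still needs a work-around.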
Each language is assigned a 3-character "tag"; this should be the
first 3 letters of the language's actual name (e.g. "Deu" for
Deutsch/German); however, if the first 3 letters would not be
sufficiently distinct -- e.g. "Esp" might mean "Esperanto" or
"Espan~ol," "Int" might mean "Interling" or "Interlingua" or
"Interglossa" -- then something more distinctive must be invented.
Clarifications of definitions should be in (parentheses);
part-of-speech, gender of noun, other grammatical data in [brackets];
lexicographers' comments in {braces}.
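
The three kinds of annotation can be separated from the term itself
with a regular expression. The Python sketch below is an assumption
about how one might do this, not the project's own code; the function
name split_annotations, the result keys and the sample entry are
invented for illustration, and nested brackets are not handled.

```python
import re

# Which opening character introduces which kind of annotation.
KINDS = {"(": "clarifications", "[": "grammar", "{": "comments"}

# One alternative per bracket style; no nesting is supported.
ANNOTATION = re.compile(r"\([^)]*\)|\[[^\]]*\]|\{[^}]*\}")


def split_annotations(entry):
    """Split an entry into its bare text and its three annotation lists."""
    result = {"text": "", "clarifications": [], "grammar": [], "comments": []}
    for match in ANNOTATION.finditer(entry):
        whole = match.group(0)
        # Drop the enclosing brackets and file the content by kind.
        result[KINDS[whole[0]]].append(whole[1:-1].strip())
    # Whatever remains after removing the annotations is the bare term.
    result["text"] = " ".join(ANNOTATION.sub(" ", entry).split())
    return result


# Invented sample entry, not taken from any *.DIC file.
parsed = split_annotations("bank (of a river) [n] {cf. shore}")
assert parsed["text"] == "bank"
assert parsed["grammar"] == ["n"]
```

Keeping the three kinds in separate lists lets a program show or hide
the lexicographers' comments independently of the grammatical data.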
abbreviations
-------------
aj    adjective
aux   auxiliary
av    adverb
cj    conjunction
ij    interjection
n     noun
num   numeral
pfx   prefix
pn    pronoun
pr    preposition
pres  present tense
rel   relative
sfx   suffix
v     verb
vi    intransitive verb
vt    transitive verb
"natural" languages:
Deu  Deutsch (German)
Eng  English
Ned  Nederlands (Dutch)
planned languages:
     Basic English (Jeffrey Henning) Langmaker.
E-o  Esperanto (L. L. Zamenhof)
Igl  Interglossa (Lancelot Hogben)
Nov  Novial (Otto Jespersen)
Tso  Tsolyani (Muhammad Abd-al-Rahman Barker)
UNI  UNI (Elisabeth Wainscott)
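
The tagging rule described earlier (use the first 3 letters of the
language's own name unless they are not distinct enough) can be
checked mechanically when a new language is proposed. The Python
sketch below is an illustration only; the function names default_tag
and find_collisions are invented here, and the names tested are the
ones mentioned in this introduction.

```python
from collections import defaultdict


def default_tag(name):
    """Return the default tag: the first 3 characters of the name."""
    return name[:3]


def find_collisions(names):
    """Return {tag: [names]} for every default tag claimed more than once."""
    by_tag = defaultdict(list)
    for name in names:
        by_tag[default_tag(name)].append(name)
    # Keep only the tags that would be ambiguous.
    return {tag: group for tag, group in by_tag.items() if len(group) > 1}


names = ["Deutsch", "Esperanto", "Espan~ol",
         "Interling", "Interlingua", "Interglossa"]
for tag, group in find_collisions(names).items():
    print(tag, "is ambiguous:", ", ".join(group))
```

Any language that turns up in the result needs a hand-picked tag, as
was done with "E-o" for Esperanto and "Igl" for Interglossa.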
site: ftp.gate.net
directory: pub/users/hrick/dictionary
* * * * welcome to the Universal Language Dictionary project * * * *
Introduction  explains the project and describes the file formats
collator.bas  combines *.DIC files into a multi-lingual dictionary
combiner.C    combines *.DIC files into a multi-lingual dictionary
counter.bas   counts the empty and non-empty entries in a *.DIC file
indexer.bas   helps create an alphabetical index to any *.DIC file
basiceng.dic  BASIC English
english.dic   English
esperant.dic  Esperanto
interglo.dic  Interglossa
nederlan.dic  Dutch
novial.dic    Novial
tsolyani.dic  Tsolyani
uni.dic       UNI
About this Page : DicIntro.html
Text file copied from the Universal Language Dictionary project
just before its web site was deleted.
Last updated December 19, 1996
URL: http://zbenglish.net/sites/basic/dicintro.html
Provided and Thanks to ZbEnglish.net