This file documents the purpose, usage, and technical details of preTeX, a package for preprocessing TeX documents to allow sophisticated typesetting based on natural-language rules (and particularly useful for typesetting Indian language documents written using an English transliteration).
This document applies to version 1.00.
1. Overview                                  What is preTeX?
2. Indian languages                          Using preTeX to typeset Indian languages
3. More details on using preTeX in general
4. Defining map files                        How to control preTeX's behavior through its map files
5. How preTeX translates input to output
A. What is a context-free grammar?
B. Other packages that typeset Indian languages
Concept Index                                Index
PreTeX was originally designed to make it easier to typeset various Indian languages using a standard Roman alphabet, but it can have much broader application.
At its heart, preTeX is a simple preprocessor for TeX and LaTeX documents. It is not a typesetting program; it relies on TeX to actually lay words out on the page. preTeX is a document translator. It converts strings of letters and symbols into different strings of letters and symbols, according to a set of rules laid out in one or more map files. The map files define the conversion using what is called a "context-free grammar," (see section A. What is a context-free grammar?) which allows a lot of fairly smart translations.
This is particularly useful for typesetting Indian languages. All the written Indian languages are alphabetic, like English, with consonant and vowel symbols, but unlike English there are many sets of symbols that are to be written as a single symbol when they appear together. This happens to a lesser extent in typeset English. For instance, when the letters `f' and `i' appear in sequence, they are often typeset as a single glyph in which the letters run together (`fi'). This is called a ligature.
In most Indian languages, there are a vast number of ligatures and complicated rules for combining letters to form them. In general, in each syllable there are one or more consonants followed by a single vowel; all of these letters are typically written together as one or two symbols. We would like to be able to write Indian language text using English letters, so that we could write `kyaa', for instance, in Devanagari, and expect to see the glyph for the consonants `ky' followed by the glyph for the vowel `aa'. However, if the consonant `r' appears before the vowel, then an accent mark should be written beneath the symbol, unless the leading consonants were any of a number of special consonants that have their own glyph when they blend with an `r'. And so on.
It is possible to describe all of these complex rules using a context-free grammar, such that certain letters are converted to a TeX sequence to print a particular glyph when they appear alone, but to a different TeX sequence to print a different glyph when they appear in conjunction with certain other letters. In fact, that is exactly what the map files supplied with preTeX do.
Because each transliteration scheme is completely defined by a map file, it is possible--even easy--for the user to modify the transliteration behavior to suit his or her personal tastes. No programming skill is necessary; it requires only a bit of clearheaded intuition to understand exactly how the map file works.
Furthermore, it is easy to extend preTeX to handle additional languages, or even to adapt it to tasks which are not related to typesetting. For instance, the map file `dnmeter.map' supplied with preTeX scans a bit of Devanagari verse and identifies the short and long syllables--an important characteristic of the verse--according to rules based on the proximity of certain vowels to a certain number of consonants.
PreTeX was designed to typeset Indian languages from English transliterated text, by preprocessing the original, transliterated document to produce a new document that can then be processed directly by TeX to produce the desired output.
It is not the only package that works in this way. Several other Indian language typesetting programs take this approach (see section B. Other packages that typeset Indian languages), most of which are specific to one or two Indian languages. One notable exception is itrans, which, like preTeX, aims to handle almost all Indian languages through a common interface (see B.1 Avinash Chopde's itrans). PreTeX owes a lot to some of these other packages: in addition to the basic philosophy of design, it depends on various fonts and TeX macros that were originally built for them.
As currently shipped, preTeX can be used to typeset two Indian languages: Tamil and Devanagari.
2.1 General rules for typesetting Indian languages
2.2 Tamil                                    Typesetting Tamil documents
2.3 Devanagari                               Typesetting Devanagari documents
Each Indian language transliteration scheme is defined in a different map file, supplied with preTeX. To use a particular scheme, you'll need to include the appropriate \pretex command at the head of your document (see section 3.1 Referencing map files).
Subsequently, the text in your document that is Indian language text should be preceded by a keyword to indicate its language. For instance, if you include \pretex{tamil}, then text following a \tml keyword will be typeset in Tamil; if you include \pretex{devnag}, then text following a \dn keyword will be typeset in Devanagari. (The scoping rules for the keywords are actually a little more complicated than that. See section 3.2 Scoping rules for preTeX conversion.)
Typesetting Tamil requires the wntamil font, a Metafont font designed at the University of Washington. See section B.2 University of Washington's wntamil.
To typeset in Tamil, you must include the sequence \pretex{tamil} at the beginning of your document (see section 3.1 Referencing map files). Subsequently, the keyword \tml may be used to mark text that should be typeset in Tamil.
Typesetting Devanagari script requires the Devanagari font originally developed for Frans Velthuis' Devanagari package. See section B.3 Frans Velthuis' Devanagari package.
To typeset in Devanagari, you must include the sequence \pretex{devnag} at the beginning of your document (see section 3.1 Referencing map files). Subsequently, the keyword \dn may be used to mark text that should be typeset in Devanagari.
Although preTeX is highly extensible through user-defined map files, some behaviors are built into preTeX itself and cannot be changed without modifying the source code.
3.1 Referencing map files                    Referencing map files in your TeX document
3.2 Scoping rules for preTeX conversion      Which text will be converted and which won't
3.3 Precompiled map (`.mpc') files           The implications of precompiled map files
3.4 Running preTeX                           Command-line options and environment variables
A map file cannot be processed by preTeX unless it is explicitly included in the document, by placing an appropriate \pretex command at the beginning of a TeX document (or in the preamble of a LaTeX document). Although it resembles a TeX macro, \pretex is actually a command to preTeX to load and process the indicated map file. The argument to \pretex, which must be enclosed in braces, is the name of the map file to load, without the `.map' extension. For instance, to load the map file `tamil.map', which includes the definitions to typeset Tamil, you would begin your document with \pretex{tamil}.
You can include several \pretex commands in a single document if you need to reference more than one map file within that document (for instance, if your document is written in several Indian languages). Including a map file not only enables the given language for transliteration; it may also implicitly include a number of other TeX commands necessary for typesetting the language, for instance, to load and define fonts. This is all defined within the map file.
The appearance of a map keyword in the text generally indicates that all subsequent text until the next closing brace should be translated according to the rules given in the map file. However, there are a few exceptions.
Math mode, as marked by the appearance of a dollar sign (`$') or a double dollar sign (`$$'), temporarily ends text conversion. Text will not be converted until the following matching dollar sign or double dollar sign, ending math mode.
However, the LaTeX convention of `\(' and `\)' to delimit math mode is not respected.
Text following a percent sign (`%'), until the end of line, is generally treated as a comment by TeX and is thus not converted by preTeX. In fact, the default behavior of preTeX is to strip comments out from the file during the conversion process.
Note that there may be some subtle cases in TeX in which a dollar sign does not indicate math mode, and a percent sign does not begin a comment. PreTeX cannot know about these peculiar cases; it always treats these characters as special (except when they are escaped by `\').
TeX macros, as identified by a backslash (`\') followed by either a single non-alphabetic character or any number of alphabetic characters, are generally not converted (although the map file may explicitly indicate otherwise). Arguments to macros will also not be converted, as long as they appear within braces (this is actually due to the `Intervening braces' rule, below).
If an opening brace appears following a keyword, text following that brace until its matching closing brace is not converted (unless another keyword appears within the braces). This rule allows TeX macros that take arguments within braces to be handled properly. (There are some exceptions: a particular map file can specify that it should be `persistent' through opening braces. See section 4.5 The persistent map file command. It is also possible to specify this on the command line; see 3.4 Running preTeX.)
The following example demonstrates this:
    \pretex{tamil}
    This text will be typeset, unchanged, in English.
    {As will this text.
    \tml ka kaa ki kii ku kuu    % This text is interpreted as Tamil.
    { This text is untranslated, but the font may be incorrect. }
    ke kee kai ko koo kau k      % After the brace closes, we return to Tamil.
    }
    And here we return once more to English.
It is instructive to compare this example with its actual output after processing by preTeX:
    \font\tmlfnt=wntml10
    \def\c#1c{\char"#1{}}
    \def\V#1{{\accent241 #1\discretionary{}{}{}}}
    \def\tml{\tmlfnt}
    This text will be typeset, unchanged, in English.
    {As will this text.
    \tml \c08c \c08ca \c0Ac \c0Bc \c0Cc \c0Dc
    { This text is untranslated, but the font may be incorrect. }
    \c16c\c08c \c17c\c08c \c11c\c08c \c16c\c08ca \c17c\c08ca \c16c\c08c\c80c \V{\c08c}
    }
    And here we return once more to English.
Note that the \pretex{tamil} command is replaced by a sequence of commands that define the Tamil font and declare a pair of macros that are used to reference the Tamil glyphs. They also define the \tml keyword itself as a TeX macro that will activate the Tamil font.
In the subsequent document, all letters following the appearance of the \tml keyword are replaced, as instructed by the map file, with TeX commands to generate the appropriate glyphs in the Tamil font. Since the \tml keyword itself is not removed, TeX will switch to the Tamil font to properly typeset the Tamil glyphs.
Note the scoping rule for the nested passage: text within the nested braces after the \tml keyword is not converted. However, TeX itself does not automatically switch the font back to the Roman font for text within the nested curly braces (remember, the \tml keyword switched to the Tamil font), so the text in this example will almost certainly be typeset incorrectly.
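The scoping rules described in this section can be approximated in a few lines of Python. This is only a rough sketch for a non-persistent language, not preTeX's actual implementation: the function name `convertible_words` and its `keywords` parameter are invented for illustration, and map-file overrides for macros are ignored. It returns the words that would be handed to the map rules.

```python
import re

# A TeX macro: a backslash followed by letters, or by one other character.
MACRO = re.compile(r"\\([A-Za-z]+|.)", re.DOTALL)

def convertible_words(source, keywords=("tml",)):
    """Return the words a non-persistent language would pass to its map rules."""
    scopes = [False]      # one flag per brace level: is a keyword active here?
    in_math = False
    kept = []
    i = 0
    while i < len(source):
        ch = source[i]
        macro = MACRO.match(source, i) if ch == "\\" else None
        if macro:
            if macro.group(1) in keywords:
                scopes[-1] = True          # convert until this group closes
            kept.append(" ")               # the macro itself is not converted
            i = macro.end()
            continue
        if ch == "%":                      # comment: skip to the end of the line
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
            continue
        if ch == "$":                      # `$' or `$$' toggles math mode
            in_math = not in_math
            kept.append(" ")
            i += 2 if source.startswith("$$", i) else 1
            continue
        if ch == "{":
            scopes.append(False)           # intervening braces suspend conversion
        elif ch == "}" and len(scopes) > 1:
            scopes.pop()
        converted = ch not in "{}" and scopes[-1] and not in_math
        kept.append(ch if converted else " ")
        i += 1
    return "".join(kept).split()

SAMPLE = r"""\pretex{tamil}
English. {More English. \tml ka kaa % comment
{ skipped } ke $x$ kai
} English again."""
print(convertible_words(SAMPLE))  # ['ka', 'kaa', 'ke', 'kai']
```

Because `\%` and `\$` are consumed as macros before the comment and math checks run, escaped percent and dollar signs are not treated as special, which matches the rule stated above.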
Some map files are quite large and complex (`devnag.map' in particular), and it may take preTeX several seconds to read such a file and prepare it for processing. To avoid this delay each time you convert a document, preTeX saves a `precompiled' image that contains all of the information in the map file in a predigested form. In later sessions preTeX reads this image, which loads much more quickly, instead of the map file itself.
This file is by default given the same name as the map file with the extension `.mpc' (for instance, `devnag.mpc'), and is typically written to a local temporary directory like `/usr/tmp'. You can change this directory by setting the environment variable PRETEX_MPC to the directory in which you would prefer the `.mpc' files to be written.
When preTeX attempts to load a map file, it first looks for a previously stored `.mpc' file. If an `.mpc' file exists (and the corresponding `.map' file hasn't recently been changed), the `.mpc' file is loaded instead.
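The lookup just described might be sketched as follows. This is an illustration, not preTeX's code: the function name `file_to_load` is invented, and the text does not say how preTeX decides that a map file has "recently been changed", so the sketch assumes a simple comparison of modification times.

```python
import os

def file_to_load(map_path, mpc_dir=None):
    """Pick the precompiled image when it looks current, else the map file."""
    if mpc_dir is None:
        # PRETEX_MPC and the /usr/tmp default are both named in the text above.
        mpc_dir = os.environ.get("PRETEX_MPC", "/usr/tmp")
    stem = os.path.splitext(os.path.basename(map_path))[0]
    mpc_path = os.path.join(mpc_dir, stem + ".mpc")
    # Assumption: "not recently changed" means the .map is no newer than the .mpc.
    if (os.path.exists(mpc_path)
            and os.path.getmtime(mpc_path) >= os.path.getmtime(map_path)):
        return mpc_path
    return map_path
```

Under this assumption, editing a `.map' file is enough to make the stale `.mpc' file be ignored on the next run.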
Occasionally an `.mpc' file may get corrupted or damaged somehow, and it may be necessary to remove it. In general, it is always safe to remove the `.mpc' files, since they can be regenerated at any time from the source `.map' files.
In normal operation, you will create a document with the extension `.ptx' (don't use the extension `.tex', since that's the filename that preTeX will try to generate). Then you may simply run preTeX on the document to generate the associated TeX input file.
For instance, suppose you had created the document `mydoc.ptx'. You would print it using the command sequence:
There are also a number of options that you may use to control preTeX from the command line. If included, they should appear after the command `pretex' but before the input document filename. Many of them are primarily useful for designing map files or for debugging preTeX itself. They are:
Preload the given mapfile before loading the input document. This has the same effect as putting the command `\pretex{mapfile}' at the beginning of the document. It's useful if you want to use a particular map file to process a document, without having to edit the document to make it reference the map file.
A preTeX map file is capable of defining one or more `languages,' where a language is considered to be a single set of rules for converting text, associated with a particular keyword. By convention, only one language is defined in each map file, although this need not be the case.
The map file consists of a sequence of lines. Each line is either empty, a command, an argument for a previous command, or a command and argument pair in a single line. It is also valid to include comments throughout the map file; as in TeX, a comment is marked by a percent sign (`%'), and extends to the end of the line.
In general, commands always begin flush with the left margin in the file: there is no whitespace before a command. On the other hand, arguments are always indented; there must be at least one space before each argument to differentiate it from a command.
The following commands are supported:
4.1 The language map file command            Begins a new language definition
4.2 The keyword map file command             Defines one or more keywords that activate this language
4.3 The top map file command                 Specify a sequence of TeX commands to insert
4.4 The font map file command                Define a TeX attribute to use for certain expansions
4.5 The persistent map file command          Specify that this language is applied through braces
4.6 The alphabet map file command            Define the lexical alphabet that will be used
4.7 The map map file command                 The context-free grammar that defines the translation
In the following descriptions, many of the examples are taken from `tamil.map', which is distributed with preTeX.
4.1 The language map file command
This command begins a new language definition. It must be the first command to appear in a map file. All commands following this point in the map file, up until the next language command, will relate to this language.
The argument is the name of the language, and is usually specified on the same line with the language command. This name is presently not used by preTeX; it is strictly for user information.
Example:
    language Tamil
This appears at the top of `tamil.map', to begin the definition for the Tamil language rules.
4.2 The keyword map file command
This command identifies the keyword or keywords that will activate the use of the conversion rules for this language. The argument, which is usually specified on the same line with the keyword command, consists of one or more words, each of which will be treated as an equivalent keyword. When the keyword appears in the document as a TeX macro (i.e. preceded by a backslash), then subsequent letters in the document up until the next closing brace will be converted according to the rules in this map file (see section 3.2 Scoping rules for preTeX conversion).
Note that the keyword itself is not normally removed from the document; thus, the top command (below) should generally also define the keyword as a TeX macro to enable the appropriate font, or do whatever other setup is necessary.
It is legal, but of questionable value, to omit the keyword command, since a language without keywords cannot ever be used (except via the `-f' command-line option; see 3.4 Running preTeX).
Example:
    keyword tml
This sets up the keyword `\tml' to activate Tamil text. Thus, preTeX would convert the following text:
    ka kaa \tml ki kii ku kuu
To the following:
    ka kaa \tml \c0Ac \c0Bc \c0Cc \c0Dc
The `\tml' keyword marks the beginning of Tamil conversion, but the `\tml' keyword itself is not removed (or converted in any way).
4.3 The top map file command
The arguments to this command (which are typically given in the following lines) are a number of TeX commands that will be written to the output document in place of the `\pretex' command that activated this language. These TeX commands are not interpreted; they are simply copied verbatim to the output file.
This is generally used to do any necessary TeX setup, and to define a TeX macro for each keyword defined above. This command is optional.
Example:
    top
      \font\tmlfnt=wntml10
      \def\c#1c{\char"#1{}}
      \def\V#1{{\accent241 #1\discretionary{}{}{}}}
      \def\tml{\tmlfnt}
The first line, `\font\tmlfnt=wntml10', tells TeX to prepare the Tamil font, `wntml10', for use in this document. The next two lines define some TeX macros that will be used in the map expansion rules for Tamil. The last line, `\def\tml{\tmlfnt}', sets up the `\tml' keyword as a TeX macro that enables the Tamil font.
4.4 The font map file command
Sometimes it is necessary to switch TeX fonts in order to typeset certain symbols in a language (which may not be defined in the main font). This optional command is provided as a convenient way to do this.
The arguments to this command consist of one or more font name / definition pairs, one per line. The font name may be any one-word name. The definition gives the TeX sequence to typeset a particular passage of text in the indicated font (or, for that matter, to apply any special formatting properties to the indicated passage of text); the hash mark (`#') stands for the text passage.
The preTeX `font' defined by this command is not necessarily related to any TeX font, nor indeed is it necessarily a font at all; it just becomes a name to refer to some particular TeX sequence that changes the typesetting properties.
Example:
The wntamil font includes glyphs for all of the Tamil letters, but it does not include glyphs for the digits or punctuation marks. Thus, in order to typeset any of these, it is necessary to switch to the Roman font. We thus define a preTeX font, which we will call `roman':
    font roman {\rm #}
This declaration makes the keyword `roman' available when defining conversions within the map section (below). In `tamil.map', all of the digits and punctuation symbols are declared using this `roman' keyword, which tells preTeX to write punctuation symbols and digits between `{\rm' and `}'.
For instance, the following text:
    \tml ka 123 kaa
Would be translated to:
    \tml \c08c {\rm 123} \c08ca
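The effect of a font definition can be pictured as a plain template substitution. The following Python sketch is only an illustration of that idea; the names `FONTS` and `apply_font` are invented here and do not correspond to anything in preTeX itself.

```python
# Font name -> definition, as declared by the font command above.
FONTS = {"roman": r"{\rm #}"}

def apply_font(definition, text):
    """Substitute the text passage for the hash mark in a font definition."""
    return definition.replace("#", text)

print(apply_font(FONTS["roman"], "123"))  # {\rm 123}
```

Since the definition is just text with a placeholder, the same mechanism serves for any formatting wrapper, not only for real TeX fonts.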
4.5 The persistent map file command
Normally, preTeX will stop text translation when it encounters an opening brace, and resume again when the matching closing brace is encountered (see section 3.2 Scoping rules for preTeX conversion). When the persistent command appears in a map file, however, that language is deemed to be `persistent' past an opening brace, and all text following the keyword (until the next closing brace) will be translated, even text within a nested pair of braces.
This command is optional (in fact, unusual), and takes no arguments.
4.6 The alphabet map file command
This command defines the `letters' that are used in the language. Each individual letter may consist of any number of ASCII characters, but is typically only one or two.
The arguments to this command are the letters of the alphabet, generally one per line. If any two or more letters appear on the same line together, they are considered by preTeX to be alternate ways to write the same letter.
All of the individual ASCII characters are already defined as letters in their own right, so this command is never necessary unless you want to define multiple-character letters, or you need to define two or more letters to be equivalent.
Example:
alphabet k g kh gh "n c ch ~n .d .t .dh .th .n m d t dh th p b ph bh m y r l v zh .l =r 'n "s .s s j jh h k.s a aa i ii u uu e ee ai o oo au H .h M srii .r .a |
With this alphabet in effect, the Tamil word `ko.n.da' would be interpreted as a five-letter word: `k', `o', `.n', `.d', and `a'. Furthermore, it would still be the same five-letter word if it were written as `go.n.tha'.
Note that all of the one-character letters defined above on lines by themselves are unnecessary, because all of the one-character letters are already defined. They are included in `tamil.map' strictly for completeness.
Multiple-character letters are always interpreted as the longest possible letter, even when alternate interpretations are possible, and regardless of the relative priorities given in the map section (below). For example, the Tamil sequence `ai' is always interpreted as the single vowel `ai', and never as the vowel `a' followed by the vowel `i'. If you need to allow `a' and `i' to be recognized individually, you will either need to change the alphabet and remove `ai', or require the user to type `a{}i', or provide a `do-nothing' letter, such as an underscore (`_'), which does not produce any output, and will thus allow the user to type `a_i' to mean `a' followed by `i'.
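The longest-letter rule can be illustrated with a short Python sketch. It is not taken from preTeX: the function name `split_letters` is invented, the set below holds only a few of the multiple-character letters from the alphabet example, and the equivalence of letters listed on the same line (such as `k' and `g') is not modeled.

```python
def split_letters(text, alphabet):
    """Split text into letters, always taking the longest letter available."""
    longest = max((len(letter) for letter in alphabet), default=1)
    letters = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; any single character is a letter.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in alphabet:
                letters.append(candidate)
                i += size
                break
    return letters

# A partial set of multiple-character letters (illustrative subset only).
TAMIL_LETTERS = {"kh", "gh", ".d", ".t", ".dh", ".th", ".n",
                 "aa", "ii", "uu", "ee", "ai", "oo", "au"}

print(split_letters("ko.n.da", TAMIL_LETTERS))  # ['k', 'o', '.n', '.d', 'a']
print(split_letters("ai", TAMIL_LETTERS))       # ['ai']
print(split_letters("a_i", TAMIL_LETTERS))      # ['a', '_', 'i']
```

The last call shows why a `do-nothing' letter works: the underscore sits between `a' and `i', so the two-character letter `ai' never appears in the input.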
4.7 The map map file command
This command is the main body of the map file. It describes the context-free grammar (see section A. What is a context-free grammar?) that defines the translation performed for this language. The grammar consists of a number of nonterminal declarations, one of which must be named `<root>'. See section 5. How preTeX translates input to output.
4.7.1 Basic nonterminal declaration          The basic structure of a nonterminal declaration
4.7.2 Modifying the nonterminal declaration
4.7.3 Specifying a priority
4.7.4 Special predefined nonterminals        Some special predefined nonterminals
4.7.5 Limitations of the grammar             Some things you can't represent
4.7.6 Simple examples of nonterminals        Some simple examples
A nonterminal declaration consists of the nonterminal name, indented at least one space (to differentiate it from the map command), followed by the string `::=', and optionally followed by one or more keywords and/or a relative priority that apply to the nonterminal definition, all on one line.
Each subsequent line defines a match/replacement string pair for the nonterminal. Each line must be indented at least one space, and contain a string that the nonterminal might match in the input document, followed by whitespace and the corresponding replacement string, all on one line.
There are two different kinds of match strings. The first kind consists entirely of literal text. This is called a terminal reference. In this case, when the matched text appears in the input document, it is simply replaced on the output document by the corresponding replacement text.
The second kind consists of some combination of literal text and one or more nonterminal names (possibly the name of the same nonterminal). This is called a nonterminal reference. In this case, a match is made when the text in the input document matches all of the referenced nonterminals, in order, including any literal text in the match string. When the input document text is so matched, it is again replaced by the corresponding replacement text. The replacement text may optionally also include any of the same nonterminal names that were referenced in the match text; in this case, the output string produced by the referenced nonterminal is inserted at that position.
There are a number of optional keywords that may be included on the line that defines a nonterminal, following the `::=' operator. These keywords are not to be confused with the keyword command in the map file (see section 4.2 The keyword map file command); they are special words which modify the nonterminal definition.
The use of most of these keywords is actually discouraged. Generally, they are unnecessary, and only add complexity to a language definition. However, there do exist rare occasions when it is appropriate to use each of them.
Although the keywords are specified on the same line with the `::=' operator, in some cases it makes sense for a keyword to be applied to some but not all of a nonterminal's expansions. To do this, simply repeat the nonterminal definition line with the new keywords listed, e.g.:
    <map> ::=
      abc def
    <map> ::= literal
      ghi
The inline keyword is a hint to preTeX that this nonterminal should always be evaluated inline instead of in the normal way. See section 5.5 Inline expansions. In general, you should not need to use this keyword, because preTeX can do a pretty good job of figuring out by itself which nonterminals should be made inline.
The inline keyword always applies to the entire nonterminal, even if it is only specified for some of the nonterminal's expansions.
A greedy nonterminal is handled in a special way: it always matches the longest possible string in the input document it can, regardless of its priority (see section 5.4 The effect of relative priority).
For example:
    <number> ::= greedy
      <digit>            <digit>
      <digit><number>    <digit><number>
In this example, the nonterminal `<number>', if it matches anything at all, will always match an entire string of consecutive digits--even if some other nonterminal definition with a higher priority might have matched one of the digits.
The use of the greedy keyword is discouraged. It is usually unnecessary, because preTeX generally prefers the longest possible match anyway (see section 5.3 How preTeX chooses the best match of several possible choices), and its use interferes with the proper execution of relative priorities.
The primary advantage of using greedy is performance. Since the rules for matching a greedy nonterminal are a little simpler, it may be slightly faster for preTeX to evaluate a greedy nonterminal than a normal one.
The greedy keyword always applies to the entire nonterminal, even if it is only specified for some of the nonterminal's expansions.
The literal keyword is used to mark one or more expansions of a nonterminal that are to be interpreted literally, i.e. they are not to be scanned for other nonterminal references. This keyword is only necessary if you need your nonterminal to match a string that happens to contain the name of another nonterminal. The use of this keyword is therefore somewhat limited. It's probably a better idea just to rename the nonterminal in question so it doesn't look like any string that you might expect to read from the input.
The literal keyword only applies to those expansions of the nonterminal for which it is explicitly specified.
The except keyword is an extremely powerful keyword, and it is subject to overuse by a beginning map file designer. It marks one or more expansions of a nonterminal that are exceptions to the normal rule: things which the nonterminal should not match.
For example:
    <lucky-number> ::=
      <digit>           <digit>
      <digit><digit>    <digit><digit>
    <lucky-number> ::= except
      13
In this example, the nonterminal `<lucky-number>' will match any one- or two-digit number except the number 13.
The use of the except keyword is discouraged. Generally, if you need a particular string not to be matched by a certain nonterminal, you will be better off defining the correct match for that string using a different nonterminal, and giving it a higher priority (see section 4.7.3 Specifying a priority).
The except keyword only applies to those expansions of the nonterminal for which it is explicitly specified. In fact, it never makes sense to apply this keyword to an entire nonterminal's definition, because the nonterminal could then never match anything.
At times it is necessary to explicitly specify that certain expansions should be preferred over others. This can be done by specifying a relative priority for the expansion(s) in question (see section 5.4 The effect of relative priority).
In general, when a number appears following the `::=' symbol for a nonterminal definition, that number is added to the priority for any string that includes that match. If the priority number is positive, the match will be preferred over other matches; if it is negative, the other matches will be preferred instead.
For example:
    <lucky-number> ::=
      <digit>           <digit>
      <digit><digit>    <digit><digit>
    <unlucky-number> ::= 10
      13                (13)
This is similar to the example for the `except' keyword (see section 4.7.2 Modifying the nonterminal declaration), demonstrating how priority can be used to achieve a similar effect. In this example, the nonterminal `<lucky-number>' will match any one- or two-digit number. However, the nonterminal `<unlucky-number>' will specifically match the number 13, which is a two-digit number. By assigning `<unlucky-number>' the relative priority of 10, we guarantee that preTeX will always use `<unlucky-number>' to match the number 13, rather than `<lucky-number>'.
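As a rough illustration of how a relative priority settles the choice between two nonterminals that both match the same string, consider the Python sketch below. It models only this simple case of competing whole-string matches; the rule table `RULES' and the function `translate_number' are hypothetical, and the way preTeX actually combines priorities is described in section 5.4.

```python
import re

# Hypothetical rule table for illustration:
# (nonterminal name, relative priority, pattern, replacement function)
RULES = [
    ("lucky-number", 0, re.compile(r"[0-9]{1,2}"), lambda s: s),
    ("unlucky-number", 10, re.compile(r"13"), lambda s: "(" + s + ")"),
]

def translate_number(text):
    """Return (nonterminal, replacement) for the highest-priority match, or None."""
    candidates = [
        (priority, name, replace(text))
        for name, priority, pattern, replace in RULES
        if pattern.fullmatch(text)
    ]
    if not candidates:
        return None
    # The candidate with the largest relative priority is preferred.
    priority, name, output = max(candidates, key=lambda c: c[0])
    return name, output
```

With this table, the string `13' is claimed by `unlucky-number' and written as `(13)', while any other one- or two-digit number falls through to `lucky-number'.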
Certain nonterminals are implicitly already defined in every map file. These are provided for the convenience of the map file designer. Some of them provide the only way to reference a particular special character in the map file (like a space character, for instance); others are simply conveniences.
The predefined nonterminals are:
Note that the special nonterminals for matching whitespace and TeX macros are fairly special-purpose. You don't normally need to try to provide definitions for these explicitly in the map file; preTeX's normal behavior is to transmit them to the output document unchanged if you do not mention them. However, it is occasionally useful to build a grammar that can change the whitespace and/or intervening TeX macros as well as the normal text.
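Nonterminal definitions must not be recursive without consuming characters. That is, a nonterminal may refer to itself, directly or through other nonterminals, only after the expansion has matched at least one character of input; otherwise the same nonterminal could be expanded again at the same position indefinitely.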
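One way to picture this restriction is as a check for nonterminals that can reach themselves as the first symbol of an expansion. The Python sketch below is not part of preTeX; it assumes a grammar represented as a dictionary from nonterminal names to lists of expansions (each a list of symbols), and it assumes that no expansion can match the empty string. The function name `find_left_recursive' is made up for this example.

```python
def find_left_recursive(grammar):
    """Return the nonterminals that can reach themselves without consuming a character."""
    def is_nonterminal(symbol):
        return symbol.startswith("<") and symbol.endswith(">")

    # Edge A -> B whenever some expansion of A begins with nonterminal B.
    first = {
        name: {exp[0] for exp in expansions if exp and is_nonterminal(exp[0])}
        for name, expansions in grammar.items()
    }

    bad = set()
    for start in grammar:
        seen, stack = set(), list(first[start])
        while stack:
            symbol = stack.pop()
            if symbol == start:
                bad.add(start)  # start leads back to itself with nothing consumed
                break
            if symbol not in seen:
                seen.add(symbol)
                stack.extend(first.get(symbol, ()))
    return bad
```

Under these assumptions, a definition such as `<A> ::= <A>b' is flagged, while `<P> ::= a<P>a' is not, because a character is matched before `<P>' is expanded again.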
    <root> ::=
        <digit>         <digit>
        a               \c00c
        aa              \c01c
        i               \c02c
        ii              \c03c
        <consonant>     \V{<consonant>}
        <consonant>a    <consonant>
        <consonant>e    \c16c<consonant>
        <consonant>u    <consonant>\c0F2c
In this abbreviated example, the nonterminal `<root>' might match whatever `<digit>' matches, in which case the corresponding replacement is exactly whatever `<digit>' indicated it should be. Or it might match any of the vowels `a', `aa', `i', or `ii', in which case the replacement is one of four different glyphs, which presumably represent the corresponding vowels in Tamil. Finally, it might match anything that `<consonant>' matches, either alone or followed by one of the vowels `a', `e', or `u', in which case the replacement string is whatever `<consonant>' produces, along with some other glyph, either before, after, or around it, according to the vowel.
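The match-and-replace behavior of `<root>' described above can be imitated in a few lines of Python. This is only a sketch: the `CONSONANTS' table and its replacement strings are hypothetical stand-ins for whatever `<consonant>' is defined to match, `<digit>' is assumed to be replaced by itself, and the function `translate_root' is not part of preTeX.

```python
import re

# Replacement strings for the four vowels, taken from the <root> example.
VOWELS = {"a": r"\c00c", "aa": r"\c01c", "i": r"\c02c", "ii": r"\c03c"}

# Hypothetical: a real map file would define <consonant> and its replacements.
CONSONANTS = {"k": "k", "t": "t"}

def translate_root(text):
    """Return the replacement for one string matched by <root>, or None if none matches."""
    if re.fullmatch(r"[0-9]", text):
        return text                       # <digit> -> <digit> (assumed unchanged)
    if text in VOWELS:
        return VOWELS[text]               # a standalone vowel
    for consonant, glyph in CONSONANTS.items():
        if text == consonant:
            return r"\V{" + glyph + "}"   # consonant alone
        if text == consonant + "a":
            return glyph                  # consonant followed by `a'
        if text == consonant + "e":
            return r"\c16c" + glyph       # extra glyph written before the consonant
        if text == consonant + "u":
            return glyph + r"\c0F2c"      # extra glyph written after the consonant
    return None
```

For instance, under these assumptions `ke' becomes `\c16ck' and `ku' becomes `k\c0F2c', showing how the extra glyph can land on either side of the consonant's replacement.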
Once preTeX has started translating text in the input document (i.e. for text in the document that appears after the map file keyword--see 3.2 Scoping rules for preTeX conversion), it follows some fairly complex, but predictable, rules to convert the text, as defined by the `alphabet' (see section 4.6 The alphabet map file command) and `map' (see section 4.7 The map map file command) sections of the map file.
A context-free grammar is a mathematical way to represent a language. In general, it starts with a symbol that may represent any one of a handful of letters, or strings of letters:
    <N> ::=
        a
        b
        c
We say that `<N>' is a nonterminal that may stand for any of `a', `b', or `c'. In this example, `a', `b', and `c' are called terminals, because they don't stand for anything else--just themselves.
The grammar becomes interesting when we let `<N>' stand for other nonterminals as well:
    <N> ::=
        a
        b<E>
        <E><E>

    <E> ::=
        cd
        ef
In this example, `<E>' could stand for either `cd' or `ef', and `<N>' could stand for either the letter `a', or the letter `b' followed by whatever `<E>' stands for, or two occurrences of whatever `<E>' stands for. To be explicit, then, `<N>' could be any of `a', `bcd', `bef', `cdcd', `cdef', `efcd', or `efef'.
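Because this grammar is not recursive, the strings that `<N>' stands for can simply be listed. The short Python sketch below does so by substituting every choice of `<E>' into each expansion of `<N>'; the variable names are chosen for this example only.

```python
from itertools import product

# <E> ::= cd | ef
E = ["cd", "ef"]

# <N> ::= a | b<E> | <E><E>
N = (
    ["a"]
    + ["b" + e for e in E]
    + [e1 + e2 for e1, e2 in product(E, repeat=2)]
)

print(N)  # ['a', 'bcd', 'bef', 'cdcd', 'cdef', 'efcd', 'efef']
```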
Finally, a nonterminal may even stand for itself, which leads to quite a lot of power. Here is a nonterminal that stands for all of the odd palindromes of `a', `b', and `c':
    <P> ::=
        a
        b
        c
        a<P>a
        b<P>b
        c<P>c
If you trace this through, you should be able to see that `<P>' stands for `cac', `bab', `bccbabccb', and `abbccbccbba' (for instance), but not `abc' or `abbbc'.
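The recursive definition of `<P>' translates almost directly into a recursive function. The Python sketch below (the name `matches_p' is an assumption for this example) tests whether a string is one that `<P>' stands for, following the six expansions in order.

```python
def matches_p(s):
    """Return True if s is an odd-length palindrome over the letters a, b, and c."""
    # The three base expansions: a, b, c.
    if s in ("a", "b", "c"):
        return True
    # The recursive expansions: a<P>a, b<P>b, c<P>c.
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "abc":
        return matches_p(s[1:-1])
    return False
```

Each recursive call strips one matching letter from both ends, so the function always terminates; this is the same reason the grammar itself is well behaved.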
PreTeX extends this concept of a context-free grammar by adding an arbitrary replacement string to correspond to each string a nonterminal stands for. In general, preTeX works by repeatedly matching a sequence of letters from the input document against something the root nonterminal stands for, and writing the corresponding replacement string to the output document. See section 5. How preTeX translates input to output.
All of the following packages can be found on CTAN, the Comprehensive TeX Archive Network. You can browse this archive at http://tug2.cs.umb.edu/ctan/.
B.1 Avinash Chopde's itrans                 A general Indian language solution by Avinash Chopde
B.2 University of Washington's wntamil      A Tamil conversion program developed at UW
B.3 Frans Velthuis' Devanagari package      Frans Velthuis' program to typeset Devanagari
The itrans package has much in common with preTeX's goals. In particular, it seeks to provide a common interface to typeset as many different Indian languages as possible. Its transliteration scheme is also to a certain degree user-definable, although it is not quite as flexible as a context-free grammar, and it cannot easily be extended to other applications. But what it does, it does well.
As of this writing, it supports Devanagari, Tamil, Telugu, and Bengali. You can download itrans from the CTAN directory `language/indian/itrans'.
The wntamil package is strictly for typesetting Tamil documents. It is quite speedy, although the transliteration rules are fixed. The Tamil font used by preTeX was originally developed for this package, and may be found here.
You can download wntamil from the CTAN directory `language/tamil/wntamil'.
The Devanagari package is strictly for typesetting Devanagari (and Hindi) documents. Its transliteration rules are quite nice (in fact, preTeX's `devnag.map' file is designed to emulate Devanagari's transliteration rules), but it is not user-extensible. The Devanagari font used by preTeX was developed by Frans Velthuis for use with this package, and may be found here.
Devanagari may be found in the CTAN directory `language/devanagari/distrib'.