Note: I haven't yet written a real version of this document.
What follows is a cursory description to serve until I do.
Description
This library provides procedures for converting multifont text in a
Tk text widget, created with the jrichtext.tcl library (or with
compatible tags), to a variety of other formats. It also provides a
generic `Save As...' panel you can use to prompt your users for a
filename and file type.
The library contains a lot of procedures; currently only the most
important public ones are documented. If all you want to do is let
your users save the contents of a richtext widget in various formats,
the only procedure you'll need is j:tc:saveas.
Currently, the following output formats are supported:
- Tcl-format richtext as supported by jrichtext.tcl
- TeX
- HTML
- PostScript
Except when converting to HTML, only font information is converted;
underlining, colours, and other tags are not (yet :-) converted.
However, when converting to HTML, jdoc or jhtml-mode hypertext links
to other documents (i.e., not within the same document) are preserved
after a fashion. (These can be links to other jdoc documents, which
will be converted to references to HTML documents, or standard Web
URLs, which will be preserved without change.)
The HTML links generated for links to other jdoc documents (as opposed
to standard URLs) may need hand-editing, however, since relative links
in HTML documents and jdoc don't follow the same rules.
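The exact rewriting rule isn't documented here. As a rough sketch only, assuming that any link target with a URL scheme is a standard Web URL and that a jdoc document named NAME ends up as NAME.html, the distinction could be drawn like this. The procedures link_to_href and link_tag_to_anchor are hypothetical names, not part of the library.

```tcl
# Hypothetical sketch, not the library's code: map the target of a
# jdoc:link:TARGET tag to an HTML href. Assumes that any target starting
# with a URL scheme (http:, ftp:, mailto:, ...) is a standard Web URL and
# that any other target names a jdoc document saved as TARGET.html.
proc link_to_href {target} {
    if {[regexp {^[A-Za-z][A-Za-z0-9+.-]*:} $target]} {
        # Standard URL: preserve it unchanged.
        return $target
    }
    # jdoc document: refer to its (assumed) HTML version instead.
    return "$target.html"
}

# Build an HTML anchor around TEXT from a full tag name.
proc link_tag_to_anchor {tag text} {
    # Strip the fixed "jdoc:link:" prefix; the target may itself contain
    # colons (as URLs do), so cut at the prefix length, not at ":".
    set target [string range $tag [string length "jdoc:link:"] end]
    return "<a href=\"[link_to_href $target]\">$text</a>"
}
```

Under these assumptions a tag of jdoc:link:jrichtext yields an href of jrichtext.html, which is exactly the kind of relative link that may still need hand-editing.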
Also, when converting to HTML, up to three levels of unordered
list are supported. (The lists generated may be very strange
if you haven't been careful when typing the list; all text within
the list must be tagged as an unordered list of the appropriate
level, and list item markers must appear within the list in appropriate
places.)
Thanks
Thanks to Miguel Santana <santana@imag.fr> for permission to use the
/reencodeISO procedure from his a2ps program when converting richtext
to PostScript.
Tags
This library considers tags matching the pattern richtext:font:font,
where font is one of roman, italic, bold, bolditalic, typewriter,
heading0 through heading5, l_em, l_cite, l_var, l_dfn, l_strong, l_kbd,
l_code, or l_samp. Any completely untagged text is assumed to be
fixed-pitch (i.e., typewriter-style).
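As an illustration of this convention, here is a hypothetical helper (font_for_tags is not a library procedure) that picks the font for a character from its tag list, such as the list returned by the Tk text widget's `tag names' subcommand. It assumes that the first font tag found wins, and that text carrying no font tag at all is typewriter text; the description above only states the latter for completely untagged text.

```tcl
# Hypothetical sketch, not the library's code: choose the font for a
# character from its list of tags (e.g. [$t tag names $index]), using the
# richtext:font:FONT convention. Assumes the first font tag found wins and
# that text without any font tag is typewriter (fixed-pitch) text.
proc font_for_tags {tags} {
    set prefix "richtext:font:"
    foreach tag $tags {
        if {[string match "${prefix}*" $tag]} {
            # Everything after the prefix is the font name, e.g. "bold".
            return [string range $tag [string length $prefix] end]
        }
    }
    return typewriter
}
```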
HTML conversion also considers tags matching the following patterns:
- jdoc:link:link, where link is a URL or the name of a jdoc document;
- special:anything, for special-purpose things like horizontal rules,
list items, and literal HTML text;
- list:level:number, where number is the depth of list nesting, for
generating <ul> and </ul> tags.
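A minimal sketch of how list:level:number tags might drive the <ul> and </ul> output: keep track of the current depth and emit one tag per level of change. The procedures list_depth and ul_transition are hypothetical, not the library's code; the cap at three levels mirrors the HTML list limit described earlier.

```tcl
# Hypothetical sketch, not the library's code. Depth 0 means "not in a
# list"; depths come from list:level:N tags and are capped at 3, the
# number of list levels the HTML conversion supports.

# Return N for a list:level:N tag, or 0 for any other tag.
proc list_depth {tag} {
    if {[scan $tag "list:level:%d" depth] == 1} {
        return $depth
    }
    return 0
}

# Return the <ul> or </ul> tags needed to go from depth FROM to depth TO.
proc ul_transition {from to} {
    if {$from > 3} {set from 3}
    if {$to > 3} {set to 3}
    set html ""
    while {$from < $to} {
        append html "<ul>\n"
        incr from
    }
    while {$from > $to} {
        append html "</ul>\n"
        incr from -1
    }
    return $html
}
```

Emitting tags only on depth changes is also why carelessly tagged lists come out strangely: a stray untagged character drops the depth to 0 and closes every open list.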
See Also
jtexttags.tcl
jrichtext.tcl
Usage
j:tc:saveas t
Argument
t is the text widget whose contents are to be converted.
Description
This procedure brings up a File Selection panel with an option button
that lets the user choose among the supported file formats. When the
user chooses a format and a name and clicks OK or presses Return, the
contents of the text widget t are saved in the chosen format in the
specified file.
Warning
The File Selection panel seems to cause Tk scripts to crash under
at least some beta versions of Tk 4.0.
Conversion Procedures
All of the following take as their sole argument a text widget whose
contents are to be converted, and return as their value the contents
of that widget converted to the given format.
(Note that this can mean that you're schlepping around some
fairly large strings.)
j:tc:tclrt:convert_text t - convert to Tcl-format richtext; see jrichtext.tcl
j:tc:tex:convert_text t - convert to TeX source
j:tc:html:convert_text t - convert to HTML (without links, currently)
j:tc:ps:convert_text t - convert to PostScript
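Because each conversion procedure hands back the whole converted document as a string, writing it to disk is left to the caller. One way to do that is sketched below, assuming Tcl 8.6 or later (for try/finally); save_converted is a hypothetical helper, not part of the library.

```tcl
# Hypothetical helper, not part of the library (requires Tcl 8.6 for
# try/finally). CONVERTER is a command prefix such as
# j:tc:html:convert_text; it is called with the widget T as its only
# argument and must return the converted document as a string.
proc save_converted {converter t filename} {
    # The whole converted document is held in memory as one string.
    set output [{*}$converter $t]
    set f [open $filename w]
    try {
        puts -nonewline $f $output
    } finally {
        # Close the file even if writing fails.
        close $f
    }
    # Report how many characters were written.
    return [string length $output]
}
```

With the library loaded, something like `save_converted j:tc:html:convert_text .text out.html` would write the HTML version of a text widget named .text; the widget name and filename are only examples.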
Tcl-Format Richtext
Because it's designed to write into a text widget, this is the most
faithful format. The distinction between j:rt:par and two successive
j:rt:cr's is lost when converting text that was generated with the
jrichtext.tcl library, but that distinction isn't reflected in the
text widget in the first place.
TeX
The TeX generated by j:tc:tex:convert_text works, but it's really weird
and unnecessarily verbose. It makes lots of characters active and
changes some standard parameters, so if you try to embed it in TeX
documents of your own you should enclose it in braces. If you don't
use any non-ASCII characters, you can trim off most of the preamble,
which provides support for the ISO 8859-1 character set.
Tabs are converted to a fixed amount of whitespace, and spaces
at the beginning of a line are lost. Multiple blank lines are
also lost.
HTML
When converting to HTML, tabs are lost, as is any spacing at the
beginnings of lines.
Some vertical whitespace may be lost; any sequence of more than one
newline character will be translated to a <p> tag.
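The whitespace rules above can be approximated as follows. This is a hypothetical sketch (html_whitespace is not a library procedure), and it assumes each tab becomes a single space, which a browser would collapse with neighbouring spaces anyway; the library's actual handling may differ.

```tcl
# Hypothetical sketch, not the library's code, of the HTML whitespace
# rules: tabs are lost (assumed here to become single spaces), leading
# spaces on each line are dropped, and any run of two or more newlines
# becomes a <p> tag.
proc html_whitespace {text} {
    set text [string map [list \t " "] $text]
    # With -line, ^ matches at the start of every line.
    regsub -all -line {^ +} $text "" text
    regsub -all {\n\n+} $text "\n<p>\n" text
    return $text
}
```

Dropping leading spaces before collapsing newlines means a line containing only spaces also counts as a paragraph break.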
PostScript
The line-breaking algorithm is hideous, and long words are likely
to be wrapped across lines.
Tabs are rendered as a fixed amount of space. Spaces occasionally
appear at the beginnings of lines when they shouldn't (similarly
to the way they do in the Tk text widget).
What is generated is actually a PostScript program that generates
the formatting, rather than a set of simple page descriptions,
so it makes a lot of demands on your PostScript interpreter, and
may print more slowly than you expect. Also, it doesn't conform
to the PostScript comment conventions (it can't), so tools that
need to work with PostScript files page by page will fail.
The ISO 8859-1 character set is supported only if you have a Level 2
PostScript interpreter (or at least an interpreter that knows
ISOLatin1Encoding).
Bugs and Misfeatures
- The code needs to be reorganised. Code is shared between
different formats that shouldn't be, and code isn't shared that
should be.
- Whitespace is often lost or garbled in many of the formats.
- Much of the code is pretty inefficient.
- Tags other than font tags should be handled; for instance, colour
and underlining should be supported (where possible).
Future Directions
- In addition to improving the existing conversions (and they
really need it!), I'd like to provide modes for plain text (with
lines broken sensibly, and maybe capitalisation for headers) and
formatted text (like nroff(1) output). LaTeX, RTF, and troff are
other possibilities.
- It would be nice to support WYSIWYG writing of manual pages,
or generation of them from jdoc documents. This would probably
require a little additional information beyond what's in the text
widget (e.g., the name, description, and manual section).
- The exact fonts used when generating TeX should be user preferences.
(PostScript fonts already are.)
- The TeX conversion does a lot of work to support ISO 8859-1.
This should only be done if there are actually non-ASCII characters
in the text (or perhaps it should be a user preference). The
PostScript conversion should support ISO 8859-1 even on Level 1
interpreters (it's easy enough).
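The last suggestion depends on knowing whether the text contains any non-ASCII characters at all, which is cheap to check. A hypothetical sketch (needs_latin1 is not a library procedure), meant to be called on the widget's contents, e.g. `needs_latin1 [$t get 1.0 end]`:

```tcl
# Hypothetical helper, not part of the library: return 1 if TEXT contains
# any character outside 7-bit ASCII (so ISO 8859-1 support is needed),
# 0 otherwise. [string is ascii] is true for characters below \u0080.
proc needs_latin1 {text} {
    return [expr {![string is ascii $text]}]
}
```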