Note: I haven't yet written a real version of this document. What follows is a cursory description to serve until I do.

Description

This library provides procedures for converting multi-font text created with the jrichtext.tcl library (or with compatible tags) in a Tk text widget to a variety of other formats. It also provides a generic `Save As...' panel you can use to prompt your users for a filename and file type.

The library contains many procedures, but currently only the most important public ones are documented. If all you want to do is let your users save the contents of a rich-text widget in various formats, the only procedure you'll need is j:tc:saveas.

Currently, the following output formats are supported:

Tcl-format richtext (see jrichtext.tcl)
TeX
HTML
PostScript

Except when converting to HTML, only font information is converted; underlining, colours, and other tags are not (yet :-) converted.

However, when converting to HTML, jdoc or jhtml-mode hypertext links to other documents (i.e., not within the same document) are preserved after a fashion. (These can be links to other jdoc documents, which are converted to references to HTML documents, or standard Web URLs, which are preserved unchanged.) The HTML links generated for links to other jdoc documents (as opposed to standard URLs) may need hand-editing, however, since relative links in HTML documents and jdoc don't follow the same rules.
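The link rule above can be sketched in Tcl as follows. This is a minimal illustration under stated assumptions, not the library's code: html_link_for and html_anchor are hypothetical names, a target is treated as a Web URL if it begins with a URL scheme such as http:, and mapping a jdoc document name to name.html is only a guess at what the generated "references to HTML documents" look like.

```tcl
# Hypothetical helpers illustrating how a jdoc:link target might be turned
# into an HTML link. These are not the library's actual procedures.
proc html_link_for {target} {
    # Anything beginning with a URL scheme (http:, ftp:, mailto:, ...) is
    # treated as a standard Web URL and passed through unchanged.
    if {[regexp {^[A-Za-z][A-Za-z0-9+.-]*:} $target]} {
        return $target
    }
    # Otherwise assume it names a jdoc document. Mapping it to "<name>.html"
    # is an assumption, and relative paths may still need hand-editing.
    return "$target.html"
}

proc html_anchor {target text} {
    # $text is inserted as-is; a real converter would also escape <, > and &.
    return "<a href=\"[html_link_for $target]\">$text</a>"
}

puts [html_anchor http://www.tcl.tk/ "Tcl home page"]
puts [html_anchor jrichtext "rich-text library"]
```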

Also, when converting to HTML, up to three levels of nested unordered lists are supported. (The generated lists may be very strange if you haven't been careful when typing them: all text within a list must be tagged as an unordered list of the appropriate level, and list item markers must appear within the list in the appropriate places.)
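A minimal sketch of how list depth might drive <ul> and </ul> tags. list_transition is a hypothetical helper, not part of the library; it assumes depth 0 means "not in a list" and clamps deeper nesting to the three supported levels. It also suggests why careless tagging produces strange lists: every change in depth between adjacent runs of text opens or closes tags.

```tcl
# Hypothetical helper: return the <ul>/</ul> tags needed when the list
# depth changes from $from to $to between adjacent runs of text.
# Depth 0 means "not in a list"; $to is clamped to 0..3 because only
# three levels are supported (an assumption about deeper tags).
proc list_transition {from to} {
    set to [expr {min(max($to, 0), 3)}]
    set out ""
    # Moving outward: close one list per level left.
    for {set d $from} {$d > $to} {incr d -1} {
        append out "</ul>\n"
    }
    # Moving inward: open one list per level entered.
    for {set d $from} {$d < $to} {incr d} {
        append out "<ul>\n"
    }
    return $out
}

# Example: walk a sequence of {depth text} runs.
set html ""
set depth 0
foreach {d text} {1 "<li>one" 2 "<li>nested" 0 "after"} {
    append html [list_transition $depth $d] $text "\n"
    set depth $d
}
puts -nonewline $html
```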

Thanks

Thanks to Miguel Santana <santana@imag.fr> for permission to use the /reencodeISO procedure from his a2ps program when converting rich-text to PostScript.

Tags

This library considers tags of the form richtext:font:name, where name is one of roman, italic, bold, bolditalic, typewriter, heading0 through heading5, l_em, l_cite, l_var, l_dfn, l_strong, l_kbd, l_code, or l_samp. Any completely untagged text is assumed to be fixed-pitch (i.e., typewriter-style).
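The font-selection rule can be sketched as a small Tcl procedure. richtext_font is hypothetical; it assumes the first recognized richtext:font: tag wins when several are present, and that text carrying no font tag (whether or not it has other tags) falls back to typewriter. The tag list would typically come from the Tk text widget's tag names subcommand.

```tcl
# Hypothetical helper: given the tags on a run of text (for example the
# result of [$t tag names $index]), return the richtext font name it uses.
proc richtext_font {tags} {
    set fonts {roman italic bold bolditalic typewriter
               heading0 heading1 heading2 heading3 heading4 heading5
               l_em l_cite l_var l_dfn l_strong l_kbd l_code l_samp}
    foreach tag $tags {
        # Only accept richtext:font:NAME where NAME is a known font.
        if {[regexp {^richtext:font:(.+)$} $tag -> font]
                && [lsearch -exact $fonts $font] >= 0} {
            return $font
        }
    }
    # No font tag: assume fixed-pitch, as the library does for untagged text.
    return typewriter
}

puts [richtext_font {sel richtext:font:bold}]
```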

HTML conversion also considers tags matching the patterns jdoc:link:target (where target is a URL or the name of a jdoc document), special:anything (for special-purpose things like horizontal rules, list items, and literal HTML text), and list:level:number (where number is the depth of list nesting), which are used to generate <ul> and </ul> tags.
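Sorting tags into these categories might look like this. classify_tag is a hypothetical name, and the returned {kind argument} pairs are an illustrative convention rather than anything the library defines. A link target is taken from everything after jdoc:link:, so URLs containing colons survive intact.

```tcl
# Hypothetical helper: classify a tag name the way HTML conversion needs.
# Returns a two-element list {kind argument}; the kind names are illustrative.
proc classify_tag {tag} {
    if {[regexp {^jdoc:link:(.*)$} $tag -> target]} {
        # Everything after "jdoc:link:" is the target, colons included.
        return [list link $target]
    } elseif {[regexp {^special:(.*)$} $tag -> what]} {
        return [list special $what]
    } elseif {[regexp {^list:level:([0-9]+)$} $tag -> depth]} {
        return [list list $depth]
    }
    return [list other {}]
}

foreach tag {jdoc:link:http://www.tcl.tk/ list:level:2 sel} {
    puts [classify_tag $tag]
}
```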

See Also

jtexttags.tcl
jrichtext.tcl

j:tc:saveas

Usage

j:tc:saveas t

Argument

t is the text widget whose contents are to be converted.

Description

This procedure brings up a File Selection panel with an option button that lets the user choose among the supported file formats. When the user chooses a format and a filename and clicks OK or presses Return, the contents of the text widget t are saved in the chosen format to the specified file.

Warning

The File Selection panel seems to cause Tk scripts to crash under at least some beta versions of Tk 4.0.

Conversion Procedures

Each of the following procedures takes as its sole argument a text widget whose contents are to be converted, and returns those contents, converted to the given format, as its value. (Note that this means you may be schlepping around some fairly large strings.)

j:tc:tclrt:convert_text t - convert to Tcl-format richtext; see jrichtext.tcl
j:tc:tex:convert_text t - convert to TeX source
j:tc:html:convert_text t - convert to HTML (links within the same document are not preserved)
j:tc:ps:convert_text t - convert to PostScript
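Since every converter returns the converted document as a single string, saving to a file is a matter of calling the right one and writing the result. save_converted is a hypothetical helper (j:tc:saveas already does this interactively); it relies only on the j:tc:FORMAT:convert_text naming pattern listed above.

```tcl
# Hypothetical helper, not part of the library: write the contents of text
# widget $t to $filename in one of the formats listed above.
proc save_converted {t format filename} {
    # Format names taken from the j:tc:FORMAT:convert_text procedure names.
    if {[lsearch -exact {tclrt tex html ps} $format] < 0} {
        error "unknown format \"$format\""
    }
    # Each converter returns the whole converted document as one string.
    set data [j:tc:${format}:convert_text $t]
    set f [open $filename w]
    puts -nonewline $f $data
    close $f
}
```

For example, save_converted .t html out.html would write the HTML version of .t to out.html.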

Comments on Formats

Tcl-Format Richtext

Because it's designed to write into a text widget, this is the most faithful format.

The distinction between j:rt:par and two successive j:rt:cr calls is lost when converting text that was generated with the jrichtext.tcl library, but that distinction isn't reflected in the text widget in the first place.

TeX

The TeX generated by j:tc:tex:convert_text works, but it's really weird and unnecessarily verbose. It makes lots of characters active and changes some standard parameters, so if you try to embed it in TeX documents of your own, you should enclose it in braces. If you don't use any non-ASCII characters, you can trim off most of the preamble, which provides support for the ISO 8859-1 character set.

Tabs are converted to a fixed amount of whitespace, and spaces at the beginning of a line are lost. Multiple blank lines are also lost.

HTML

When converting to HTML, tabs are lost, as is any spacing at the beginnings of lines.

Some vertical whitespace may be lost; any sequence of more than one newline character will be translated to a <p> tag.
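The newline rule is simple enough to show with regsub. newlines_to_paragraphs is a hypothetical name, and this is a sketch of the rule as stated, not the library's implementation. Because two newlines and five newlines both collapse to the same <p>, the exact amount of vertical space can't be recovered.

```tcl
# Sketch of the stated rule: single newlines are left alone (HTML treats
# them as ordinary whitespace); any run of two or more becomes a <p> tag.
proc newlines_to_paragraphs {text} {
    regsub -all {\n{2,}} $text "\n<p>\n" text
    return $text
}

puts [newlines_to_paragraphs "First paragraph.\n\n\n\nSecond paragraph."]
```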

PostScript

The line-breaking algorithm is hideous, and long words are likely to be wrapped across lines.

Tabs are rendered as a fixed amount of space. Spaces occasionally appear at the beginnings of lines when they shouldn't (similarly to the way they do in the Tk text widget).

What is generated is actually a PostScript program that does the formatting itself, rather than a set of simple page descriptions, so it makes heavy demands on your PostScript interpreter and may print more slowly than you expect. Also, it doesn't conform to the PostScript comment conventions (the Document Structuring Conventions), and it can't, so tools that need to work with PostScript files page-by-page will fail.

The ISO 8859-1 character set is supported only if you have a Level 2 PostScript interpreter (or at least an interpreter that knows ISOLatin1Encoding).

Bugs and Misfeatures

Future Directions