Development: Alternative Documentation Formats
The wxWidgets documentation system has been under constant scrutiny for the last few years. This article briefly outlines the existing problems with the documentation and the solutions that have been proposed.
The source of the docs is in SVN, in wxWidgets/docs/latex/wx. It is parsed using Tex2RTF, which is also in SVN, in /utils/tex2rtf. To generate the HTML documentation, for example, cd to the /wxWidgets/docs/latex/wx directory and run:
tex2rtf manual.tex test.htm -html
Problems with the Current Approach
The current LaTeX approach, while it works for creating HTML, PDF, and CHM documentation, has some shortcomings:
- The LaTeX source basically contains presentation markup only, not any semantic markup. This means the markup is only visual and doesn't tell us anything about the meaning of the information. It would be useful to have a more structured format, since that would allow us to automatically extract more information from the documentation sources.
- The LaTeX source is not real LaTeX; it is a specialized subset with some limitations. It cannot be parsed by many standard LaTeX publication tools and is currently published with a custom in-house tool called TeX2RTF, which means that any new feature intended to make documentation easier has to be added to TeX2RTF first.
- Nothing guarantees (or checks) that the documentation in the LaTeX files is consistent with the actual code.
- Everything in the LaTeX documentation has to be done by hand, whereas programs like Doxygen generate function headers and similar material automatically. There is a utility in wxWidgets for doing this (HelpGen), but it's far from ideal and has to be maintained right alongside TeX2RTF.
- Tex2RTF is not actively maintained and contains various bugs. wxWidgets developers cannot realistically upgrade and revise its code. Adding new features, or even just keeping it running, requires too much time.
- Tex2RTF generates "old" HTML (similar to HTML4 but probably not compliant) and lots of other formats which were widely used 10-15 years ago, but not anymore (e.g. RTF, WinHelp).
- The generated HTML is quite "raw". The output of other tools (e.g. DocBook and Doxygen) is a lot cleaner and more customizable.
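To make the consistency problem more concrete, here is a minimal sketch of the kind of check that is currently missing. It is only an illustration: the wxExample header, the set of documented names and the naive regular expression are all assumptions made for this example, and a real check would need a proper C++ parser.

```python
import re

# Hypothetical header excerpt; wxExample is not a real wxWidgets class.
HEADER = """
class wxExample
{
public:
    wxString GetLabel() const;
    void SetLabel(const wxString& label);
    void Refresh();
};
"""

# Hypothetical set of method names found in the documentation sources.
DOCUMENTED = {"GetLabel", "SetLabel", "Enable"}


def declared_methods(header_text):
    """Naively collect identifiers followed by an argument list and a ';'."""
    pattern = r"(\w+)\s*\([^()]*\)\s*(?:const)?\s*;"
    return set(re.findall(pattern, header_text))


def compare(documented, header_text):
    """Report methods missing from the docs and docs without a declaration."""
    declared = declared_methods(header_text)
    return {
        "undocumented": sorted(declared - documented),
        "stale": sorted(documented - declared),
    }


if __name__ == "__main__":
    print(compare(DOCUMENTED, HEADER))
```

With the sample data, Refresh is reported as undocumented and Enable as a stale entry. The point is only that such a report needs the documented names in machine-readable form, which presentation-only markup does not readily provide.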
Requirements for a New System and Format
There has been much discussion about how to do this properly. Some requirements have been formulated:
- The new format should be as concise as possible. Our 6MB of LaTeX docs are already hard to handle; using a format which makes them even bigger does not help.
- It should be possible to edit the docs by hand in a simple text editor, but it would be nice to also be able to use GUI tools if desired.
- The new format should be easily parsable by scripts to allow integration with other systems (see section below).
- Installing the system for publishing the documentation shouldn't be difficult and should work for Unix, Windows and Mac.
- The tool chain should be as standard as possible with a low maintenance effort required from developers.
- It should produce valid output in: (X)HTML, PDF, CHM, RTF, and HTB.
- It should be possible to run an automatic conversion of all the existing docs to the new format.
- It should be able to produce customized documentation, i.e. wxWidgets, wxPython, wxPerl, etc. specific docs could be produced from a single master source.
- It should be possible to check for consistency between the reference documentation and the actual source definitions (and vice versa :))
- Possible integration with the wxWiki. Bryan Petty (The Wiki Expert :)) says with an easy-to-parse format it will be possible to create a non-editable version of docs in the wiki. In the future it would be nice to have the possibility of "diffing" official docs against wiki docs and eventually integrate (without too much effort) wiki contents in the official docs.
- A listing of all event macros. (Arnout Engelen has done this from the first XML conversion attempts.)
- A good class hierarchy chart. (Also done by Arnout from XML.)
- Integration with IDEs (e.g. MSVS)
IMPORTANT POINT: there's a tradeoff between ease-of-parse and verbosity. A custom markup solution is a good tradeoff because it's still XML, thus easy to parse, but optimized for our needs and thus less verbose.
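As an illustration of what "easily parsable by scripts" could mean in practice, the following sketch produces a listing of event macros from a small XML fragment. The class, events and event tags and the macro attribute are invented for this example and are not an agreed wxWidgets format; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom markup, invented for illustration only.
SAMPLE_DOC = """
<classes>
  <class name="wxButton">
    <events>
      <event macro="EVT_BUTTON">Button was clicked.</event>
    </events>
  </class>
  <class name="wxTextCtrl">
    <events>
      <event macro="EVT_TEXT">Text changed.</event>
      <event macro="EVT_TEXT_ENTER">Enter was pressed.</event>
    </events>
  </class>
</classes>
"""


def list_event_macros(xml_text):
    """Return sorted (macro, class name) pairs found in the markup."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.iter("class"):
        for event in cls.iter("event"):
            pairs.append((event.get("macro"), cls.get("name")))
    return sorted(pairs)


if __name__ == "__main__":
    for macro, cls_name in list_event_macros(SAMPLE_DOC):
        print(f"{macro}  ({cls_name})")
```

The whole extraction is a dozen lines because the markup says which pieces of text are event macros. With purely visual markup the script would have to guess that from the formatting.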
Benefits of a Concise and Easy-to-Parse Semantic Markup
A concise, semantic format (such as an XML-based one) would make it possible to create a number of auxiliary documentation pages which would improve doc usability.
This is only possible, however, if the format in which the documentation is stored is easy to parse from custom scripts, because no existing tool creates the following things automatically (no tool knows about, e.g., the wxUSE_* defines of wxWidgets, or that we would like a listing of all event macros, etc.).
A non-exhaustive list of things that could be created:
- Automatically create the "Classes by Category" page of the manual, extracting info stored in each class' doc file.
- Automatically create a customized listing of all events, all functions, and all other macros.
- Automatically create various statistics.
- Automatically check documentation consistency with wxWidgets headers.
- Automatically process the documentation to add new features, such as the recently added "Library" section, screenshots for widgets, etc. Basically, it would be simple to make major documentation updates that affect numerous files.
- Automatically create a class hierarchy (with "fake" inheritance as is currently used with wxWidgets docs).
- Provide documentation in this wiki (at least in a read-only format).
- Provide a quick reference lookup bot in the IRC channel (this already exists as "wxOracle", but it is using Arnout's old XML reference, and is quickly becoming out of date).
- Integrate documentation in MSVS (see this page and this one) and DevC++ (as a devpack).
NOTE: such "extractions" are easy to write in XSLT and, in the case of a DocBook-based solution, don't require additional tools.
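The same kind of extraction can also be sketched in a general-purpose scripting language. The example below builds the data for a "Classes by Category" page. It assumes, purely for illustration, that each class doc file has a root class element with name and category attributes; the three "files" are shown inline as strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical per-class doc files; the "category" attribute is an assumed
# convention, not part of any existing format.
CLASS_DOCS = [
    '<class name="wxString" category="Data structures"/>',
    '<class name="wxCheckBox" category="Controls"/>',
    '<class name="wxButton" category="Controls"/>',
]


def classes_by_category(docs):
    """Group class names by category, with categories and names sorted."""
    index = defaultdict(list)
    for doc in docs:
        node = ET.fromstring(doc)
        index[node.get("category")].append(node.get("name"))
    return {cat: sorted(names) for cat, names in sorted(index.items())}


if __name__ == "__main__":
    for category, names in classes_by_category(CLASS_DOCS).items():
        print(category)
        for name in names:
            print("  " + name)
```

Because the category lives in each class's own doc file, the index page never has to be edited by hand when a class is added or moved.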
Discussion Regarding a Custom Format
In a way, the existing LaTeX documentation already uses custom macros and functions, and this has been cited as one of the main reasons for the limitations and problems of the current documentation. Here, however, "custom format" means custom XML markup. This is significantly different, in that XML was built for custom structures and markup while remaining compatible with all standard XML parsers. The custom tags discussed here would let us easily define recurring sections of documentation, like "events", "window styles", and other sections that are more prevalent in wxWidgets than in other libraries.
A customizable format allows:
- Keeping a high SNR (signal-to-noise ratio) in the docs; e.g. pure DocBook has a very low SNR for event sections because it has no dedicated tags for wxWidgets event handling, so documenting events becomes verbose.
- Extending the format in future, e.g. adding tags for screenshots of widgets.
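One rough way to put a number on the SNR idea is to compare how many characters of a fragment are actual content with how many are markup. The sketch below does this for one event description. The measure itself is an assumption made for this example, the event tag is an invented custom tag, and the DocBook fragment is just one plausible way of writing the same entry with standard DocBook list elements.

```python
# Both fragments document the same hypothetical event.
DOCBOOK = """
<variablelist>
  <varlistentry>
    <term>EVT_BUTTON</term>
    <listitem><para>Button was clicked.</para></listitem>
  </varlistentry>
</variablelist>
"""

# Invented custom tag, for illustration only.
CUSTOM = '<event name="EVT_BUTTON">Button was clicked.</event>'

# The information a reader actually cares about.
PAYLOAD = ["EVT_BUTTON", "Button was clicked."]


def signal_ratio(markup, payload):
    """Fraction of non-whitespace characters that belong to the payload."""
    compact = "".join(markup.split())  # ignore indentation and line breaks
    signal = sum(len("".join(text.split())) for text in payload)
    return signal / len(compact)


if __name__ == "__main__":
    print(f"DocBook: {signal_ratio(DOCBOOK, PAYLOAD):.2f}")
    print(f"custom:  {signal_ratio(CUSTOM, PAYLOAD):.2f}")
```

The absolute values mean little, but the custom fragment scores clearly higher than the generic one, which is the effect a dedicated tag is meant to have.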
Proposals for possible solutions:
- Development: DocbookForWxDocs: AKA the custom-markup solution; this makes it possible to keep a high SNR (signal-to-noise ratio) together with the advantages of an XML-based format.
- BoostBook: see comparison below; too low SNR.
- Using Doxygen For The WxWidgets Documentation
- Natural Docs - An alternative to Doxygen with, as its name implies, a more natural syntax and very clean output, although it may not be as full-featured as Doxygen. It suffers, however, from the same problems as noted on the Doxygen documentation discussion page.
- QuickBook - not yet fully explored; neither XML-based nor inline.
- wxWiki Documentation - (see this page for more information and an example)
Comparison of BoostBook vs Custom Markup
Here you can find a "comparison chart" between the following formats:
- BoostBook
- Francesco's proposed wxDoc format
- Arnout's proposed wxClassXML format
All the material is at http://mathdev.sourceforge.net/comparison/
There you can also find the .html output obtained from the source files for the _same_ piece of test documentation. Note that the HTML output of all 3 formats is CSS-less and thus looks somewhat worse than a "final" solution would.
In particular, you should look at the source files (which, I repeat, are for the _same_ piece of test docs):
http://mathdev.sourceforge.net/comparison/boostbook/wxstring.boostbook.html
http://mathdev.sourceforge.net/comparison/wxclassxml/wxstring.wxclassxml.html
http://mathdev.sourceforge.net/comparison/wxdoc/wxstring.wxdoc.html
and at the length of these files:
wxstring.boostbook => 383
wxstring.wxclassxml => 374
wxstring.wxdoc => 176
NOTE: the wxdoc/wxclassxml comparison is not very important. What matters is how the "custom-markup" solution compares with the BoostBook solution.
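For a quick sense of scale, the lengths quoted above can be expressed relative to the shortest file. This is only arithmetic on the three figures given in this article; the function name is made up for the example.

```python
# File lengths as quoted in this article.
LENGTHS = {
    "wxstring.boostbook": 383,
    "wxstring.wxclassxml": 374,
    "wxstring.wxdoc": 176,
}


def relative_lengths(lengths):
    """Express every length as a multiple of the shortest one (1 decimal)."""
    shortest = min(lengths.values())
    return {name: round(value / shortest, 1) for name, value in lengths.items()}


if __name__ == "__main__":
    for name, factor in relative_lengths(LENGTHS).items():
        print(f"{name}: {factor}x")
```

In other words, both the BoostBook and the wxClassXML sources are a little more than twice as long as the wxDoc source for the same piece of documentation.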
Some other important notes about this comparison:
a) BoostBook does NOT produce valid DocBook markup. Francesco lurked in the #boost channel and asked many people, but no one was able to explain why.
It's simply buggy. I've asked why at http://article.gmane.org/gmane.comp.lib.boost.documentation/3056 and I'll let you know of any replies.
This also means that BoostBook, as is, is not 100% reliable: to obtain HTML output for that wxString doc piece I ignored validation errors and forced DocBook to process the invalid markup. But no one can guarantee how the output will look for an invalid piece of markup!
b) Backward manual compatibility: existing users won't like it if we completely change the organization of the manual! E.g. look at http://mathdev.sourceforge.net/comparison/boostbook/wxString.html; it's quite different from the usual wx doc pages, like http://mathdev.sourceforge.net/comparison/wxdoc/wxstring.html (which you can see styled here: http://mathdev.sourceforge.net/wxDocTests/html/wxstring.html)
Hacking the BoostBook XSL files to make the generated docs look more "conventional" to wx users would require a lot of work.
c) There is a simple way to allow DocBook tags inside the wxDoc format without hassle, which easily makes wxDoc a superset of DocBook.
d) The autolink feature of wxDoc. The comparison shown above is not fair: while the wxDoc format provides automatic parsing of function prototypes, the other formats don't. It's important to e.g. autolink the wxString class return type in the following prototype:
wxString GetSomeString(wxObject& anObject, const wxChar* test);
The wxDoc syntax for this prototype (first snippet) is much more concise and readable than the equivalent BoostBook syntax (second snippet):
<function prototype="wxString GetSomeString(wxObject&amp; anObject, const wxChar* test)">
<method name="GetSomeString">
  <type><xref linkend="wxString"/></type>
  <parameter name="anObject">
    <paramtype><xref linkend="wxObject"/>&amp;</paramtype>
  </parameter>
  <parameter name="test">
    <paramtype>const wxChar*</paramtype>
  </parameter>
</method>
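To give an idea of what autolinking a prototype involves, here is a minimal sketch. It is not the wxDoc implementation: the set of known class names and the autolink function are assumptions made for this example, and the tokenizer is deliberately naive (it splits on word characters rather than parsing C++).

```python
import re
from xml.sax.saxutils import escape

# Assumed set of documented classes; a real tool would build it from the docs.
KNOWN_CLASSES = frozenset({"wxString", "wxObject"})


def autolink(prototype, known=KNOWN_CLASSES):
    """Wrap known class names in <xref> links and XML-escape everything else."""
    parts = []
    # The capturing group makes re.split keep the identifiers in the result.
    for piece in re.split(r"(\w+)", prototype):
        if piece in known:
            parts.append(f'<xref linkend="{piece}"/>')
        else:
            parts.append(escape(piece))
    return "".join(parts)


if __name__ == "__main__":
    print(autolink(
        "wxString GetSomeString(wxObject& anObject, const wxChar* test);"
    ))
```

Only whole identifiers are matched, so the "String" inside GetSomeString is left alone, and wxChar is not linked because it is not in the assumed set of documented classes. The writer only types the plain prototype and the links come for free.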
- NDoc Code Documentation Generator for .NET
- The DocBook Open Repository Project: http://docbook.sourceforge.net/
- A Mac OS X DocBook installation: http://www.projectomega.org/subcat.php?lg=en&php=products_docbook
- DocBook documentation: http://www.sagehill.net/xml/docbookxsl/index.html
- An example of DocBook output: http://abx.art.pl/pov/megapov/doc/
- The MegaPOV Tools For Using DocBook
- Synopsis C++ parser for python