Qpy provides a convenient mechanism for generating safely-quoted XML text from Python code. It does this by implementing a quote-no-more string data type and a slight modification of the Python compiler.
XML reserves 5 characters ("<", ">", "&", quote and apostrophe) so that they can be used as markup delimiters. When a document needs to use these characters for some other purpose, they must be escaped, that is, replaced by an equivalent entity or character reference. This package defines an xml_quote() function that, for a string argument, returns a string with these 5 characters replaced by their equivalents: for example, "<" becomes "&lt;".
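As a rough illustration of what escaping the five reserved characters means, the sketch below uses only the Python standard library rather than Qpy itself. The helper name escape_reserved is hypothetical, and the references chosen for the quote and apostrophe are an assumption; xml_quote() may produce different but equivalent ones.

```python
# Illustrative only: this is not Qpy's xml_quote(), just a standard-library
# sketch of escaping the five reserved XML characters.
from xml.sax.saxutils import escape


def escape_reserved(text):
    """Return text with <, >, &, quote and apostrophe escaped."""
    # escape() handles &, < and > on its own; the extra mapping covers the
    # two quote characters. The references used here are an assumption.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(escape_reserved('<a href="x">Tom & Jerry\'s</a>'))
```

The ampersand is handled before the other characters, so the references introduced for them are not escaped a second time.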
When assembling an XML (or similar markup such as HTML) document, it is important to remember to quote everything that should be quoted, such as text that comes from a database or some (untrusted) outside source. In the case of web pages, underquoting is dangerous, as it leaves the door open for cross-site scripting and other attacks.
It would be nice if you could assemble your document as a string and then call xml_quote() on it at the end, just to make sure that everything was quoted, but this generally results in over-quoting, where you lose the intended markup structure. For web pages, over-quoting produces a result that is ugly, but much safer than the underquoted alternative.
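To make the over-quoting problem concrete, here is a small sketch in which the standard library's html.escape() stands in for xml_quote(). The variable names are made up for the illustration.

```python
# Illustrative only: html.escape() from the standard library stands in
# for xml_quote() here.
from html import escape

user_text = "Fish & Chips"              # untrusted text that needs quoting
document = "<p>" + user_text + "</p>"   # assembled without any quoting

# Quoting the finished document escapes the intended markup as well.
over_quoted = escape(document)
print(over_quoted)

# Quoting only the untrusted part keeps the markup intact.
correct = "<p>" + escape(user_text) + "</p>"
print(correct)
```

The first result displays the tags as literal text, which is ugly but harmless; the second is the document that was actually intended.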
Programs that produce XML documents must keep track of exactly what has already been quoted and what has not, and mistakes are common. Our objective is to make quoting errors rare, especially underquoting errors.
Our xml_quote() function always returns an xml instance. The class named "xml" is a subclass of Python's unicode string class. An instance of xml is a string that is known to need no more XML quoting. When the xml_quote() function gets an xml instance as an argument, it just returns the instance immediately, without any changes. When the xml_quote() function gets None as an argument, it always returns an empty xml instance. All other arguments to quote are converted to unicode strings and then the reserved characters are escaped to produce the resulting xml instance.
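The three cases described above can be pictured with a minimal stand-in. This is not Qpy's implementation: it assumes Python 3, where str is the unicode string type, and the escape table (including the reference used for the apostrophe) is an assumption.

```python
# Hypothetical stand-ins, not Qpy's actual implementation.
class xml(str):
    """A string that is known to need no more XML quoting."""


_ESCAPES = (
    ("&", "&amp;"),   # must come first so later references are not re-escaped
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
)


def xml_quote(value):
    if isinstance(value, xml):
        return value      # already quoted: returned unchanged
    if value is None:
        return xml("")    # None becomes an empty xml instance
    text = str(value)     # everything else is converted to a string
    for char, replacement in _ESCAPES:
        text = text.replace(char, replacement)
    return xml(text)


once = xml_quote("a < b")
print(once)
print(xml_quote(once) is once)
```

Because quoted text carries its own type, quoting it a second time is a no-op, which is what makes it safe to call xml_quote() whenever in doubt.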
The xml class defines some functions that make it easy to build quoted documents.
When an xml instance is combined with another object using the + operator, the result is the xml instance formed by concatenating the quoted operands. The value of the expression xml('<x>') + '<' is equal to the value of xml('<x>&lt;').
When an xml instance is used as a format string with the % operator, the (non-number) arguments to the format string are quoted as they are used.
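The + and % behaviors can be sketched with a small stand-in class. Everything here is hypothetical: quote() stands in for xml_quote(), html.escape() does the escaping, and the % handling covers only a single value or a tuple of values, not mappings.

```python
# Hypothetical stand-ins, not Qpy's actual implementation.
from html import escape
from numbers import Number


class xml(str):
    """A string that is known to need no more XML quoting."""

    def __add__(self, other):
        # Quote the right operand, then concatenate.
        return xml(str.__add__(self, quote(other)))

    def __radd__(self, other):
        # Handles plain_string + xml_instance.
        return xml(str.__add__(quote(other), self))

    def __mod__(self, args):
        # Quote non-number arguments before formatting.
        if isinstance(args, tuple):
            args = tuple(a if isinstance(a, Number) else quote(a) for a in args)
        elif not isinstance(args, Number):
            args = quote(args)
        return xml(str.__mod__(self, args))


def quote(value):
    if isinstance(value, xml):
        return value
    return xml(escape(str(value)))


print(xml("<x>") + "<")
print(xml("<b>%s</b> (%d)") % ("a & b", 2))
```

Numbers are passed through untouched so that numeric conversions such as %d keep working.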
The xml class includes a join() method that quotes the items in the sequence before joining them. The common case of using an empty xml instance to join a sequence is implemented in the join_xml() function. The join_str() function acts the same way, except that it does not escape any characters.
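A stand-in for the joining behavior might look like the following. The class and both functions are simplified imitations for illustration only; in particular, treating None as an empty string inside join_str() is an assumption based on "acts the same way".

```python
# Hypothetical stand-ins, not Qpy's actual implementation.
from html import escape


class xml(str):
    """A string that is known to need no more XML quoting."""

    def join(self, items):
        # Quote every item, then join with this (already quoted) separator.
        return xml(str.join(self, [quote(item) for item in items]))


def quote(value):
    if isinstance(value, xml):
        return value
    if value is None:
        return xml("")
    return xml(escape(str(value)))


def join_xml(items):
    # Common case: join with an empty xml separator, quoting each item.
    return xml("").join(items)


def join_str(items):
    # Same shape, but nothing is escaped. None handling is an assumption.
    return "".join("" if item is None else str(item) for item in items)


print(join_xml([xml("<b>"), "a & b", xml("</b>")]))
print(join_str(["<b>", "a & b", "</b>"]))
```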
The Qpy compiler is the Python compiler with an added preprocessor that can best be understood as a source-code transformation. The transformation is limited to the definitions of certain functions we call "templates". An xml template is designated in qpy source code by :xml just after the function name in the function's definition.
For example, this is an xml template:

    def f:xml(x):
        "<div>"
        x
        "</div>"

The Qpy preprocessor essentially replaces this by:

    from qpy import xml as _qpy_xml, join_xml as _qpy_join_xml
    def f(x):
        qpy_accumulation = []
        qpy_append = qpy_accumulation.append
        qpy_append(_qpy_xml("<div>"))
        qpy_append(x)
        qpy_append(_qpy_xml("</div>"))
        return _qpy_join_xml(qpy_accumulation)

There are two main things going on here. One is that every string literal in the body of the function is wrapped by the xml constructor. The assumption is that a literal string, provided by the programmer, does not need any more quoting. The other part of the conversion is that expression values are accumulated on a local list, and the default return value is the xml instance formed by concatenating these values, after quoting them.
The values returned by f are xml instances, and here are some samples:

    f(None)           ⇒  "<div></div>"              None becomes "".
    f("<hr />")       ⇒  "<div>&lt;hr /&gt;</div>"  Quoting happens.
    f(1)              ⇒  "<div>1</div>"             Converted.
    f(xml("<hr />"))  ⇒  "<div><hr /></div>"        Already quoted.

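The samples can be reproduced without Qpy installed by writing the expanded function by hand against simplified stand-ins. The xml class and join_xml() below are hypothetical imitations, with html.escape() doing the escaping; only the shape of f follows the expansion shown above.

```python
# Hypothetical stand-ins for qpy.xml and qpy.join_xml, so that this runs
# without Qpy installed.
from html import escape


class xml(str):
    """A string that is known to need no more XML quoting."""


def join_xml(items):
    parts = []
    for item in items:
        if isinstance(item, xml):
            parts.append(item)               # already quoted
        elif item is None:
            parts.append("")                 # None becomes ""
        else:
            parts.append(escape(str(item)))  # converted, then quoted
    return xml("".join(parts))


def f(x):
    # Hand-written equivalent of the preprocessed template.
    accumulation = []
    append = accumulation.append
    append(xml("<div>"))
    append(x)
    append(xml("</div>"))
    return join_xml(accumulation)


for value in (None, "<hr />", 1, xml("<hr />")):
    print(repr(value), "=>", f(value))
```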
The nice thing about this is that the expressions appearing in a template, possibly including values provided from outside sources, will always be quoted unless they are already instances of the xml class. If the programmer makes a mistake with respect to quoting, it will very likely appear as over-quoting instead of lurking as a security problem.
Templates can't have normal Python docstrings after the arguments, since string literals in a template body are treated as output; we just use comments instead.
A template may also be designated by :str, instead of :xml, appearing after the function name. The difference is that a str template will accumulate the values of expression statements and return the join_str() of the list, and there is no XML-quoting.
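By analogy with the xml case, a :str template with a body of "<div>", x, "</div>" might be pictured as the plain Python below. This is an assumption drawn from the description above, not the preprocessor's literal output; g is a made-up name, and join_str() is a stand-in whose handling of None is also assumed.

```python
def join_str(items):
    # Stand-in: concatenates without escaping; None is treated as "" by assumption.
    return "".join("" if item is None else str(item) for item in items)


def g(x):
    # Hand-written picture of a :str template.
    accumulation = []
    append = accumulation.append
    append("<div>")   # literals stay plain strings
    append(x)
    append("</div>")
    return join_str(accumulation)


print(g("<hr />"))
```

Here the argument passes through unescaped, which is why str templates are only appropriate for output that is not markup or for values already known to be safe.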
Templates can be nested arbitrarily along with other functions. A template's code transformation does not apply inside ordinary functions that are defined inside the template body.
Source code files that include templates should be named with a .qpy suffix and placed in a python package directory. The package __init__.py should contain the following lines to make sure that the compiled versions of the qpy modules are up-to-date:

    from qpy.compile import compile_qpy_files
    compile_qpy_files(__path__[0])

This package also includes qpcheck.py, a script that looks for unknown names and unused imports in directories containing python and qpy source code.
An example package is included in the distribution. To run it, just import the qpy.example.example1 module. Its purpose is to show a complete package, including the required __init__.py and a .qpy module.
Most template systems are designed to embed program-like value-substitution and control flow into what would otherwise be static content. Qpy (like Quixote's PTL templates) uses the opposite pattern, embedding static content in what would otherwise be an ordinary program. This program-centric pattern is especially attractive when the content maintenance team is the same as the programming team.