eulxml.xmlmap – Map XML to Python objects

eulxml.xmlmap makes it easier to map XML to Python objects. The Python DOM does some of this, of course, but sometimes it’s prettier to wrap an XML node in a typed Python object and assign attributes on that object to reference subnodes by XPath expressions. This module provides that functionality.

General Usage

Suppose we have an XML object that looks something like this:

<foo>
  <bar>
    <baz>42</baz>
  </bar>
  <bar>
    <baz>13</baz>
  </bar>
  <qux>A</qux>
  <qux>B</qux>
</foo>

For this example, we want to access the value of the first <baz> as a Python integer and the second <baz> as a string value. We also want to access all of them (there may be lots on another <foo>) as a big list of integers. We can create an object to map these fields like this:

from eulxml import xmlmap

class Foo(xmlmap.XmlObject):
    first_baz = xmlmap.IntegerField('bar[1]/baz')
    second_baz = xmlmap.StringField('bar[2]/baz')
    qux = xmlmap.StringListField('qux')

first_baz, second_baz, and all_baz here are attributes of the Foo object. We can access them in later code like this:

>>> foo = xmlmap.load_xmlobject_from_file(foo_path, xmlclass=Foo)
>>> foo.first_baz
42
>>> foo.second_baz
'13'
>>> foo.qux
['A', 'B']
>>> foo.first_baz=5
>>> foo.qux.append('C')
>>> foo.qux[0] = 'Q'
>>> print foo.serialize(pretty=True)
<foo>
  <bar>
    <baz>5</baz>
  </bar>
  <bar>
    <baz>13</baz>
  </bar>
  <qux>Q</qux>
  <qux>B</qux>
<qux>C</qux></foo>

Concepts

xmlmap simplifies access to XML data in Python. Programs can define new XmlObject subclasses representing a type of XML node with predictable structure. Members of these classes can be regular methods and values like in regular Python classes, but they can also be special field objects that associate XPath expressions with Python data elements. When code accesses these fields on the object, the code evaluates the associated XPath expression and converts the data to a Python value.

XmlObject

Most programs will use xmlmap by defining a subclass of XmlObject containing field members.

class eulxml.xmlmap.XmlObject([node[, context]])

A Python object wrapped around an XML node.

Typical programs will define subclasses of XmlObject with various field members. Some programs will use load_xmlobject_from_string() and load_xmlobject_from_file() to create instances of these subclasses. Other programs will create them directly, passing a node argument to the constructor. If the subclass defines a ROOT_NAME then this node argument is optional: Programs may then create instances directly with no constructor arguments.

Programs can also pass an optional dictionary to the constructor to specify namespaces for XPath evaluation.

If keyword arguments are passed in to the constructor, they will be used to set initial values for the corresponding fields on the XmlObject. (Only currently supported for non-list fields.)

Custom equality/non-equality tests: two instances of XmlObject are considered equal if they point to the same lxml element node.

_fields

A dictionary mapping field names to field members. This dictionary includes all of the fields defined on the class as well as those inherited from its parents.

ROOT_NAME = None

A default root element name (without namespace prefix) used when an object of this type is created from scratch.

ROOT_NAMESPACES = {}

A dictionary whose keys are namespace prefixes and whose values are namespace URIs. These namespaces are used to create the root element when an object of this type is created from scratch; should include the namespace and prefix for the root element, if it has one. Any additional namespaces will be added to the root element.

ROOT_NS = None

The default namespace used when an object of this type is created from scratch.

XSD_SCHEMA = None

URI or file path to the XSD schema associated with this XmlObject, if any. If configured, will be used for optional validation when calling load_xmlobject_from_string() and load_xmlobject_from_file(), and with is_valid().

is_empty()

Returns True if the root node contains no child elements, no attributes, and no text. Returns False if any are present.

is_valid()

Determine if the current document is valid as far as we can determine. If there is a schema associated, check for schema validity. Otherwise, return True.

Return type:boolean
node = None

The top-level xml node wrapped by the object

schema_valid()

Determine if the current document is schema-valid according to the configured XSD Schema associated with this instance of XmlObject.

Return type:boolean
Raises:Exception if no XSD schema is defined for this XmlObject instance
schema_validate = True

Override for schema validation; if a schema must be defined for the use of xmlmap.fields.SchemaField for a sub-xmlobject that should not be validated, set to False.

schema_validation_errors()

Retrieve any validation errors that occured during schema validation done via is_valid().

Returns:a list of lxml.etree._LogEntry instances
Raises:Exception if no XSD schema is defined for this XmlObject instance
serialize(stream=None, pretty=False)

Serialize the contents of the XmlObject to a stream. Serializes current node only; for the entire XML document, use serializeDocument().

If no stream is specified, returns a string. :param stream: stream or other file-like object to write content to (optional) :param pretty: pretty-print the XML output; boolean, defaults to False :rtype: stream passed in or an instance of cStringIO.StringIO

serializeDocument(stream=None, pretty=False)

Serialize the contents of the entire XML document (including Doctype declaration, if there is one), with an XML declaration, for the current XmlObject to a stream.

If no stream is specified, returns a string. :param stream: stream or other file-like object to write content to (optional) :param pretty: pretty-print the XML output; boolean, defaults to False :rtype: stream passed in or an instance of cStringIO.StringIO

validation_errors()

Return a list of validation errors. Returns an empty list if the xml is schema valid or no schema is defined. If a schema is defined but schema_validate is False, schema validation will be skipped.

Currently only supports schema validation.

Return type:list
xmlschema

A parsed XSD schema instance of lxml.etree.XMLSchema; will be loaded the first time it is requested on any instance of this class if XSD_SCHEMA is set and xmlchema is None. If you wish to load and parse the schema at class definition time, instead of at class instance initialization time, you may want to define your schema in your subclass like this:

XSD_SCHEMA = "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
xmlschema = xmlmap.loadSchema(XSD_SCHEMA)
xsl_transform(filename=None, xsl=None, return_type=None, **params)

Run an xslt transform on the contents of the XmlObject.

XSLT can be passed in as an XSLT object generated by load_xslt() or as filename or string. If a params dictionary is specified, its items will be passed as parameters to the XSL transformation, and any string values will automatically be encoded as XSL string parameters.

Note

If XSL is being used multiple times, it is recommended to use :meth`:load_xslt` to load and compile the XSLT once.

Parameters:
  • filename – xslt filename (optional, one of file and xsl is required)
  • xsl – xslt as string OR compiled XSLT object as returned by load_xslt() (optional)
  • return_type – type of object to return; optional, defaults to XmlObject; specify unicode or string for text output
Returns:

an instance of XmlObject or the return_type specified

XmlObjectType

class eulxml.xmlmap.core.XmlObjectType

A metaclass for XmlObject.

Analogous in principle to Django’s ModelBase, this metaclass functions rather differently. While it’ll likely get a lot closer over time, we just haven’t been growing ours long enough to demand all of the abstractions built into Django’s models. For now, we do three things:

  1. take any Field members and convert them to descriptors,
  2. store all of these fields and all of the base classes’ fields in a _fields dictionary on the class, and
  3. if any local (non-parent) fields look like self-referential eulxml.xmlmap.NodeField objects then patch them up to refer to the newly-created XmlObject.

Field types

There are several predefined field types. All of them evaluate XPath expressions and map the resultant XML nodes to Python types. They differ primarily in how they map those XML nodes to Python objects as well as in whether they expect their XPath expression to match a single XML node or a whole collection of them.

Field objects are typically created as part of an XmlObject definition and accessed with standard Python object attribute syntax. If a Foo class defines a bar attribute as an xmlmap field object, then an object will reference it simply as foo.bar.

class eulxml.xmlmap.fields.StringField(xpath, normalize=False, choices=None, *args, **kwargs)

Map an XPath expression to a single Python string. If the XPath expression evaluates to an empty NodeList, a StringField evaluates to None.

Takes an optional parameter to indicate that the string contents should have whitespace normalized. By default, does not normalize.

Takes an optional list of choices to restrict possible values.

Supports setting values for attributes, empty nodes, or text-only nodes.

class eulxml.xmlmap.fields.StringListField(xpath, normalize=False, choices=None, *args, **kwargs)

Map an XPath expression to a list of Python strings. If the XPath expression evaluates to an empty NodeList, a StringListField evaluates to an empty list.

Takes an optional parameter to indicate that the string contents should have whitespace normalized. By default, does not normalize.

Takes an optional list of choices to restrict possible values.

Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.

class eulxml.xmlmap.fields.IntegerField(xpath, *args, **kwargs)

Map an XPath expression to a single Python integer. If the XPath expression evaluates to an empty NodeList, an IntegerField evaluates to None.

Supports setting values for attributes, empty nodes, or text-only nodes.

class eulxml.xmlmap.fields.IntegerListField(xpath, *args, **kwargs)

Map an XPath expression to a list of Python integers. If the XPath expression evaluates to an empty NodeList, an IntegerListField evaluates to an empty list.

Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.

class eulxml.xmlmap.fields.NodeField(xpath, node_class, instantiate_on_get=False, *args, **kwargs)

Map an XPath expression to a single XmlObject subclass instance. If the XPath expression evaluates to an empty NodeList, a NodeField evaluates to None.

Normally a NodeField‘s node_class is a class. As a special exception, it may be the string "self", in which case it recursively refers to objects of its containing XmlObject class.

If an XmlObject contains a NodeField named foo, then the object will automatically have a create_foo() method in addition to its foo property. Code can call this create_foo() method to create the child element if it doesn’t exist; the method will have no effect if the element is already present.

Deprecated instantiate_on_get flag: set to True if you need a non-existent node to be created when the NodeField is accessed. This feature is deprecated: Instead, create your node explicitly with create_foo() as described above.

class eulxml.xmlmap.fields.NodeListField(xpath, node_class, *args, **kwargs)

Map an XPath expression to a list of XmlObject subclass instances. If the XPath expression evalues to an empty NodeList, a NodeListField evaluates to an empty list.

Normally a NodeListField‘s node_class is a class. As a special exception, it may be the string "self", in which case it recursively refers to objects of its containing XmlObject class.

Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.

class eulxml.xmlmap.fields.ItemField(xpath, *args, **kwargs)

Access the results of an XPath expression directly. An ItemField does no conversion on the result of evaluating the XPath expression.

class eulxml.xmlmap.fields.SimpleBooleanField(xpath, true, false, *args, **kwargs)

Map an XPath expression to a Python boolean. Constructor takes additional parameter of true, false values for comparison and setting in xml. This only handles simple boolean that can be read and set via string comparison.

Supports setting values for attributes, empty nodes, or text-only nodes.

class eulxml.xmlmap.fields.DateTimeField(xpath, format=None, normalize=False, *args, **kwargs)

Map an XPath expression to a single Python datetime.datetime. If the XPath expression evaluates to an empty NodeList, a DateTimeField evaluates to None.

Parameters:

For example, given the field definition:

last_update = DateTimeField('last_update', format="%d-%m-%Y %H:%M:%S",
    normalize=True)

and the XML:

<last_update>
    21-04-2012 00:00:00
</last_update>

accessing the field would return:

>>> myobj.last_update
datetime.datetime(2012, 4, 21, 0, 0)
class eulxml.xmlmap.fields.DateTimeListField(xpath, format=None, normalize=False, *args, **kwargs)

Map an XPath expression to a list of Python datetime.datetime objects. If the XPath expression evaluates to an empty NodeList, a DateTimeListField evaluates to an empty list. Date formatting is as described in DateTimeField.

Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.

Parameters:
  • format – optional date-time format. See DateTimeField for more details.
  • normalize – optional parameter to indicate string contents should have whitespace normalized before converting to datetime. By default, no normalization is done.
class eulxml.xmlmap.fields.DateField(xpath, format=None, normalize=False, *args, **kwargs)

Map an XPath expression to a single Python datetime.date, roughly comparable to DateTimeField.

Parameters:
  • format – optional date-time format. Used to convert between XML and Python datetime.date; if no format, then the ISO format YYYY-MM-DD (%Y-%m-%d) will be used.
  • normalize – optional parameter to indicate string contents should have whitespace normalized before converting to date. By default, no normalization is done.
class eulxml.xmlmap.fields.DateListField(xpath, format=None, normalize=False, *args, **kwargs)

Map an XPath expression to a list of Python datetime.date objects. See DateField and DateTimeListField for more details.

class eulxml.xmlmap.fields.SchemaField(xpath, schema_type, *args, **kwargs)

Schema-based field. At class definition time, a SchemaField will be replaced with the appropriate eulxml.xmlmap.fields.Field type based on the schema type definition.

Takes an xpath (which will be passed on to the real Field init) and a schema type definition name. If the schema type has enumerated restricted values, those will be passed as choices to the Field.

For example, to define a resource type based on the MODS schema, resourceTypeDefinition is a simple type with an enumeration of values, so you could add something like this:

resource_type  = xmlmap.SchemaField("mods:typeOfResource", "resourceTypeDefinition")

Currently only supports simple string-based schema types.

get_field(schema)

Get the requested type definition from the schema and return the appropriate Field.

Parameters:schema – instance of eulxml.xmlmap.core.XsdSchema
Return type:eulxml.xmlmap.fields.Field
class eulxml.xmlmap.fields.FloatField(xpath, *args, **kwargs)

Map an XPath expression to a single Python float. If the XPath expression evaluates to an empty NodeList, an FloatField evaluates to None.

Supports setting values for attributes, empty nodes, or text-only nodes.

class eulxml.xmlmap.fields.FloatListField(xpath, *args, **kwargs)

Map an XPath expression to a list of Python floats. If the XPath expression evaluates to an empty NodeList, an IntegerListField evaluates to an empty list.

Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.

Other facilities

eulxml.xmlmap.load_xmlobject_from_string(string, xmlclass=<class 'eulxml.xmlmap.core.XmlObject'>, validate=False, resolver=None)

Initialize an XmlObject from a string.

If an xmlclass is specified, construct an instance of that class instead of XmlObject. It should be a subclass of XmlObject. The constructor will be passed a single node.

If validation is requested and the specified subclass of XmlObject has an XSD_SCHEMA defined, the parser will be configured to validate against the specified schema. Otherwise, the parser will be configured to use DTD validation, and expect a Doctype declaration in the xml content.

Parameters:
  • string – xml content to be loaded, as a string
  • xmlclass – subclass of XmlObject to initialize
  • validate – boolean, enable validation; defaults to false
Return type:

instance of XmlObject requested

eulxml.xmlmap.load_xmlobject_from_file(filename, xmlclass=<class 'eulxml.xmlmap.core.XmlObject'>, validate=False, resolver=None)

Initialize an XmlObject from a file.

See load_xmlobject_from_string() for more details; behaves exactly the same, and accepts the same parameters, except that it takes a filename instead of a string.

Parameters:filename – name of the file that should be loaded as an xmlobject. etree.lxml.parse() will accept a file name/path, a file object, a file-like object, or an HTTP or FTP url, however file path and URL are recommended, as they are generally faster for lxml to handle.
eulxml.xmlmap.parseString(string, uri=None)

Read an XML document provided as a byte string, and return a lxml.etree document. String cannot be a Unicode string. Base_uri should be provided for the calculation of relative URIs.

eulxml.xmlmap.parseUri(stream, uri=None)

Read an XML document from a URI, and return a lxml.etree document.

eulxml.xmlmap.loadSchema(uri, base_uri=None)

Load an XSD XML document (specified by filename or URL), and return a lxml.etree.XMLSchema.

Note that frequently loading a schema without using a web proxy may introduce significant network resource usage as well as instability if the schema becomes unavailable. Thus this function will fail if the HTTP_PROXY environment variable is not set.