eulxml.xmlmap – Map XML to Python objects
eulxml.xmlmap makes it easier to map XML to Python objects. The Python DOM does some of this, of course, but sometimes it’s prettier to wrap an XML node in a typed Python object and assign attributes on that object to reference subnodes by XPath expressions. This module provides that functionality.
General Usage
Suppose we have an XML object that looks something like this:
<foo>
  <bar>
    <baz>42</baz>
  </bar>
  <bar>
    <baz>13</baz>
  </bar>
  <qux>A</qux>
  <qux>B</qux>
</foo>
For this example, we want to access the value of the first <baz> as a Python integer and the second <baz> as a string value. We also want to access all of the <qux> values (there may be lots on another <foo>) as a list of strings. We can create an object to map these fields like this:
from eulxml import xmlmap

class Foo(xmlmap.XmlObject):
    first_baz = xmlmap.IntegerField('bar[1]/baz')
    second_baz = xmlmap.StringField('bar[2]/baz')
    qux = xmlmap.StringListField('qux')
first_baz, second_baz, and qux here are attributes of the Foo object. We can access them in later code like this:
>>> foo = xmlmap.load_xmlobject_from_file(foo_path, xmlclass=Foo)
>>> foo.first_baz
42
>>> foo.second_baz
'13'
>>> foo.qux
['A', 'B']
>>> foo.first_baz=5
>>> foo.qux.append('C')
>>> foo.qux[0] = 'Q'
>>> print foo.serialize(pretty=True)
<foo>
<bar>
<baz>5</baz>
</bar>
<bar>
<baz>13</baz>
</bar>
<qux>Q</qux>
<qux>B</qux>
<qux>C</qux></foo>
Concepts
xmlmap simplifies access to XML data in Python. Programs can define new XmlObject subclasses representing a type of XML node with predictable structure. Members of these classes can be regular methods and values like in regular Python classes, but they can also be special field objects that associate XPath expressions with Python data elements. When code accesses these fields on the object, the code evaluates the associated XPath expression and converts the data to a Python value.
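The idea of a field that evaluates a path on access can be sketched with a plain Python descriptor. This is only an illustration built on the standard library's xml.etree.ElementTree, which supports a limited XPath subset; it is not how xmlmap is implemented, and the names PathField and Wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET


class PathField:
    """Hypothetical descriptor: look up a path on access and convert the text."""

    def __init__(self, path, convert=str):
        self.path = path
        self.convert = convert

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        match = obj.node.find(self.path)
        if match is None:
            return None  # nothing matched the path
        return self.convert(match.text)

    def __set__(self, obj, value):
        match = obj.node.find(self.path)
        if match is None:
            raise AttributeError("no node matches %r" % self.path)
        match.text = str(value)  # write the new value back into the XML


class Wrapper:
    """Hypothetical stand-in for a typed object wrapped around an XML node."""
    first_baz = PathField('bar[1]/baz', int)
    second_baz = PathField('bar[2]/baz', str)

    def __init__(self, node):
        self.node = node


foo = Wrapper(ET.fromstring(
    '<foo><bar><baz>42</baz></bar><bar><baz>13</baz></bar></foo>'))
first = foo.first_baz    # converted to int on access
second = foo.second_baz  # left as a string
foo.first_baz = 5        # updates the underlying element text
```

Because the descriptor reads from the node every time, a value set through the attribute is visible in the XML immediately, and the reverse holds as well.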
XmlObject
Most programs will use xmlmap by defining a subclass of XmlObject containing field members.
- class eulxml.xmlmap.XmlObject([node[, context]])
A Python object wrapped around an XML node.
Typical programs will define subclasses of XmlObject with various field members. Some programs will use load_xmlobject_from_string() and load_xmlobject_from_file() to create instances of these subclasses. Other programs will create them directly, passing a node argument to the constructor. If the subclass defines a ROOT_NAME then this node argument is optional: programs may then create instances directly with no constructor arguments.
Programs can also pass an optional dictionary to the constructor to specify namespaces for XPath evaluation.
If keyword arguments are passed in to the constructor, they will be used to set initial values for the corresponding fields on the XmlObject. (Only currently supported for non-list fields.)
Custom equality/non-equality tests: two instances of XmlObject are considered equal if they point to the same lxml element node.
- _fields
A dictionary mapping field names to field members. This dictionary includes all of the fields defined on the class as well as those inherited from its parents.
- ROOT_NAME = None
A default root element name (without namespace prefix) used when an object of this type is created from scratch.
- ROOT_NAMESPACES = {}
A dictionary whose keys are namespace prefixes and whose values are namespace URIs. These namespaces are used to create the root element when an object of this type is created from scratch; should include the namespace and prefix for the root element, if it has one. Any additional namespaces will be added to the root element.
- ROOT_NS = None
The default namespace used when an object of this type is created from scratch.
- XSD_SCHEMA = None
URI or file path to the XSD schema associated with this XmlObject, if any. If configured, will be used for optional validation when calling load_xmlobject_from_string() and load_xmlobject_from_file(), and with is_valid().
- is_empty()
Returns True if the root node contains no child elements, no attributes, and no text. Returns False if any are present.
- is_valid()
Determine if the current document is valid as far as we can determine. If there is a schema associated, check for schema validity. Otherwise, return True.
Return type: boolean
- node = None
The top-level XML node wrapped by the object.
- schema_valid()
Determine if the current document is schema-valid according to the configured XSD Schema associated with this instance of XmlObject.
Return type: boolean
Raises: Exception if no XSD schema is defined for this XmlObject instance
- schema_validate = True
Override for schema validation; if a schema must be defined for the use of xmlmap.fields.SchemaField for a sub-xmlobject that should not be validated, set to False.
- schema_validation_errors()
Retrieve any validation errors that occurred during schema validation done via is_valid().
Returns: a list of lxml.etree._LogEntry instances
Raises: Exception if no XSD schema is defined for this XmlObject instance
- serialize(stream=None, pretty=False)
Serialize the contents of the XmlObject to a stream. Serializes current node only; for the entire XML document, use serializeDocument().
If no stream is specified, returns a string.
Parameters: - stream – stream or other file-like object to write content to (optional)
- pretty – pretty-print the XML output; boolean, defaults to False
Return type: stream passed in or an instance of cStringIO.StringIO
- serializeDocument(stream=None, pretty=False)
Serialize the contents of the entire XML document (including Doctype declaration, if there is one), with an XML declaration, for the current XmlObject to a stream.
If no stream is specified, returns a string.
Parameters: - stream – stream or other file-like object to write content to (optional)
- pretty – pretty-print the XML output; boolean, defaults to False
Return type: stream passed in or an instance of cStringIO.StringIO
- validation_errors()
Return a list of validation errors. Returns an empty list if the XML is schema valid or no schema is defined. If a schema is defined but schema_validate is False, schema validation will be skipped.
Currently only supports schema validation.
Return type: list
- xmlschema
A parsed XSD schema instance of lxml.etree.XMLSchema; will be loaded the first time it is requested on any instance of this class if XSD_SCHEMA is set and xmlschema is None. If you wish to load and parse the schema at class definition time, instead of at class instance initialization time, you may want to define your schema in your subclass like this:
XSD_SCHEMA = "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
xmlschema = xmlmap.loadSchema(XSD_SCHEMA)
- xsl_transform(filename=None, xsl=None, return_type=None, **params)
Run an XSLT transform on the contents of the XmlObject.
XSLT can be passed in as an XSLT object generated by load_xslt() or as filename or string. If a params dictionary is specified, its items will be passed as parameters to the XSL transformation, and any string values will automatically be encoded as XSL string parameters.
Note: If XSL is being used multiple times, it is recommended to use load_xslt() to load and compile the XSLT once.
Parameters: - filename – xslt filename (optional, one of filename and xsl is required)
- xsl – xslt as string OR compiled XSLT object as returned by load_xslt() (optional)
- return_type – type of object to return; optional, defaults to XmlObject; specify unicode or string for text output
Returns: an instance of XmlObject or the return_type specified
XmlObjectType
- class eulxml.xmlmap.core.XmlObjectType
A metaclass for XmlObject.
Analogous in principle to Django’s ModelBase, this metaclass functions rather differently. While it’ll likely get a lot closer over time, we just haven’t been growing ours long enough to demand all of the abstractions built into Django’s models. For now, we do three things:
- take any Field members and convert them to descriptors,
- store all of these fields and all of the base classes’ fields in a _fields dictionary on the class, and
- if any local (non-parent) fields look like self-referential eulxml.xmlmap.NodeField objects then patch them up to refer to the newly-created XmlObject.
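The second and third steps listed above can be sketched with a small standalone metaclass. This is an illustration only, not the eulxml source: Field, FieldCollector, Base and Section are hypothetical names, and the conversion of fields into descriptors is left out to keep the sketch short.

```python
class Field:
    """Hypothetical field: remembers an XPath and an optional node class."""

    def __init__(self, xpath, node_class=None):
        self.xpath = xpath
        self.node_class = node_class


class FieldCollector(type):
    """Hypothetical metaclass that gathers Field members into _fields."""

    def __new__(mcs, name, bases, attrs):
        fields = {}
        # Start with the fields already collected on the base classes.
        for base in bases:
            fields.update(getattr(base, '_fields', {}))
        # Add the fields declared locally on the class being created.
        local = {key: value for key, value in attrs.items()
                 if isinstance(value, Field)}
        fields.update(local)
        attrs['_fields'] = fields
        cls = super().__new__(mcs, name, bases, attrs)
        # Patch self-referential local fields to point at the new class.
        for field in local.values():
            if field.node_class == 'self':
                field.node_class = cls
        return cls


class Base(metaclass=FieldCollector):
    title = Field('title')


class Section(Base):
    subsection = Field('section', node_class='self')
```

After the classes are created, Section._fields holds both the inherited title field and the local subsection field, and the "self" placeholder has been replaced by the Section class itself.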
Field types
There are several predefined field types. All of them evaluate XPath expressions and map the resultant XML nodes to Python types. They differ primarily in how they map those XML nodes to Python objects as well as in whether they expect their XPath expression to match a single XML node or a whole collection of them.
Field objects are typically created as part of an XmlObject definition and accessed with standard Python object attribute syntax. If a Foo class defines a bar attribute as an xmlmap field object, then code holding an instance foo of that class will reference it simply as foo.bar.
- class eulxml.xmlmap.fields.StringField(xpath, normalize=False, choices=None, *args, **kwargs)
Map an XPath expression to a single Python string. If the XPath expression evaluates to an empty NodeList, a StringField evaluates to None.
Takes an optional parameter to indicate that the string contents should have whitespace normalized. By default, does not normalize.
Takes an optional list of choices to restrict possible values.
Supports setting values for attributes, empty nodes, or text-only nodes.
- class eulxml.xmlmap.fields.StringListField(xpath, normalize=False, choices=None, *args, **kwargs)
Map an XPath expression to a list of Python strings. If the XPath expression evaluates to an empty NodeList, a StringListField evaluates to an empty list.
Takes an optional parameter to indicate that the string contents should have whitespace normalized. By default, does not normalize.
Takes an optional list of choices to restrict possible values.
Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.
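A list-like view whose set, delete and append operations write through to the XML can be sketched with collections.abc.MutableSequence. This is an assumption-laden illustration built on xml.etree.ElementTree, not the NodeList class from eulxml; TextNodeList is a hypothetical name and it only handles integer indexes.

```python
import xml.etree.ElementTree as ET
from collections.abc import MutableSequence


class TextNodeList(MutableSequence):
    """Hypothetical list-like view over the text of child elements named tag."""

    def __init__(self, parent, tag):
        self.parent = parent
        self.tag = tag

    def _nodes(self):
        # Re-query on every call so the view always reflects the current XML.
        return self.parent.findall(self.tag)

    def __len__(self):
        return len(self._nodes())

    def __getitem__(self, index):
        return self._nodes()[index].text

    def __setitem__(self, index, value):
        self._nodes()[index].text = value

    def __delitem__(self, index):
        self.parent.remove(self._nodes()[index])

    def insert(self, index, value):
        nodes = self._nodes()
        new = ET.Element(self.tag)
        new.text = value
        if index >= len(nodes):
            # Past the last match: append at the end of the parent.
            self.parent.append(new)
        else:
            # Insert just before the matching node currently at this index.
            position = list(self.parent).index(nodes[index])
            self.parent.insert(position, new)


root = ET.fromstring('<foo><qux>A</qux><qux>B</qux></foo>')
qux = TextNodeList(root, 'qux')
qux.append('C')   # adds a new <qux> element
qux[0] = 'Q'      # rewrites the text of the first <qux>
```

MutableSequence supplies append, extend, iteration and similar methods once the five methods above are defined, which is why the view behaves much like a regular list.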
- class eulxml.xmlmap.fields.IntegerField(xpath, *args, **kwargs)
Map an XPath expression to a single Python integer. If the XPath expression evaluates to an empty NodeList, an IntegerField evaluates to None.
Supports setting values for attributes, empty nodes, or text-only nodes.
- class eulxml.xmlmap.fields.IntegerListField(xpath, *args, **kwargs)
Map an XPath expression to a list of Python integers. If the XPath expression evaluates to an empty NodeList, an IntegerListField evaluates to an empty list.
Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.
- class eulxml.xmlmap.fields.NodeField(xpath, node_class, instantiate_on_get=False, *args, **kwargs)
Map an XPath expression to a single XmlObject subclass instance. If the XPath expression evaluates to an empty NodeList, a NodeField evaluates to None.
Normally a NodeField’s node_class is a class. As a special exception, it may be the string "self", in which case it recursively refers to objects of its containing XmlObject class.
If an XmlObject contains a NodeField named foo, then the object will automatically have a create_foo() method in addition to its foo property. Code can call this create_foo() method to create the child element if it doesn’t exist; the method will have no effect if the element is already present.
Deprecated instantiate_on_get flag: set to True if you need a non-existent node to be created when the NodeField is accessed. This feature is deprecated: instead, create your node explicitly with create_foo() as described above.
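The create-if-missing behaviour described for create_foo() can be illustrated on a plain ElementTree node. This is a sketch of the pattern only, not the generated eulxml method; create_child is a hypothetical helper, and returning the element is a choice made here for convenience.

```python
import xml.etree.ElementTree as ET


def create_child(parent, tag):
    """Hypothetical helper: add <tag> under parent only if it is missing."""
    existing = parent.find(tag)
    if existing is not None:
        # Already present: leave the document untouched.
        return existing
    return ET.SubElement(parent, tag)


root = ET.fromstring('<foo><bar/></foo>')
first = create_child(root, 'bar')   # present already, nothing is added
second = create_child(root, 'qux')  # missing, so an empty <qux> is appended
```

The explicit "is not None" comparison matters with ElementTree, because an element with no children is falsy in a boolean test.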
- class eulxml.xmlmap.fields.NodeListField(xpath, node_class, *args, **kwargs)
Map an XPath expression to a list of XmlObject subclass instances. If the XPath expression evaluates to an empty NodeList, a NodeListField evaluates to an empty list.
Normally a NodeListField’s node_class is a class. As a special exception, it may be the string "self", in which case it recursively refers to objects of its containing XmlObject class.
Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.
- class eulxml.xmlmap.fields.ItemField(xpath, *args, **kwargs)
Access the results of an XPath expression directly. An ItemField does no conversion on the result of evaluating the XPath expression.
- class eulxml.xmlmap.fields.SimpleBooleanField(xpath, true, false, *args, **kwargs)
Map an XPath expression to a Python boolean. The constructor takes additional true and false parameters: the values used for comparison when reading and for setting in the XML. This only handles simple booleans that can be read and set via string comparison.
Supports setting values for attributes, empty nodes, or text-only nodes.
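Reading and writing a boolean by string comparison can be sketched as a pair of small functions. The names to_bool and from_bool are hypothetical, and raising ValueError for text that matches neither value is an assumption of this sketch rather than documented eulxml behaviour.

```python
def to_bool(text, true, false):
    """Hypothetical reader: compare XML text against the two configured values."""
    if text == true:
        return True
    if text == false:
        return False
    # Assumption of this sketch: anything else is treated as an error.
    raise ValueError("%r is neither %r nor %r" % (text, true, false))


def from_bool(value, true, false):
    """Hypothetical writer: pick the configured string for a Python boolean."""
    return true if value else false


read_yes = to_bool('yes', true='yes', false='no')
read_no = to_bool('no', true='yes', false='no')
written = from_bool(True, true='yes', false='no')
```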
- class eulxml.xmlmap.fields.DateTimeField(xpath, format=None, normalize=False, *args, **kwargs)
Map an XPath expression to a single Python datetime.datetime. If the XPath expression evaluates to an empty NodeList, a DateTimeField evaluates to None.
Parameters: - format – optional date-time format. Used with datetime.datetime.strptime() and datetime.datetime.strftime() to convert between XML text and Python datetime.datetime objects. If no format is specified, XML dates are converted from full ISO date time format, with or without microseconds, and dates are written out to XML in ISO format via datetime.datetime.isoformat().
- normalize – optional parameter to indicate string contents should have whitespace normalized before converting to datetime. By default, no normalization is done.
For example, given the field definition:
last_update = DateTimeField('last_update', format="%d-%m-%Y %H:%M:%S", normalize=True)
and the XML:
<last_update> 21-04-2012 00:00:00 </last_update>
accessing the field would return:
>>> myobj.last_update
datetime.datetime(2012, 4, 21, 0, 0)
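The conversion in the example above can be reproduced with the standard library alone. In this sketch, collapsing whitespace with split and join stands in for the normalization step; that stand-in is an assumption, not the eulxml code path.

```python
from datetime import datetime

fmt = "%d-%m-%Y %H:%M:%S"
raw = " 21-04-2012 00:00:00 "  # element text with surrounding whitespace

# Stand-in for normalize=True: strip the ends and collapse runs of whitespace.
normalized = " ".join(raw.split())

parsed = datetime.strptime(normalized, fmt)  # XML text -> datetime
written = parsed.strftime(fmt)               # datetime -> XML text
```

Without the normalization step, strptime would reject the raw text because of the leading space, which is why the example field sets normalize=True.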
- class eulxml.xmlmap.fields.DateTimeListField(xpath, format=None, normalize=False, *args, **kwargs)
Map an XPath expression to a list of Python datetime.datetime objects. If the XPath expression evaluates to an empty NodeList, a DateTimeListField evaluates to an empty list. Date formatting is as described in DateTimeField.
Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.
Parameters: - format – optional date-time format. See DateTimeField for more details.
- normalize – optional parameter to indicate string contents should have whitespace normalized before converting to datetime. By default, no normalization is done.
- class eulxml.xmlmap.fields.DateField(xpath, format=None, normalize=False, *args, **kwargs)
Map an XPath expression to a single Python datetime.date, roughly comparable to DateTimeField.
Parameters: - format – optional date-time format. Used to convert between XML and Python datetime.date; if no format, then the ISO format YYYY-MM-DD (%Y-%m-%d) will be used.
- normalize – optional parameter to indicate string contents should have whitespace normalized before converting to date. By default, no normalization is done.
- class eulxml.xmlmap.fields.DateListField(xpath, format=None, normalize=False, *args, **kwargs)
Map an XPath expression to a list of Python datetime.date objects. See DateField and DateTimeListField for more details.
- class eulxml.xmlmap.fields.SchemaField(xpath, schema_type, *args, **kwargs)
Schema-based field. At class definition time, a SchemaField will be replaced with the appropriate eulxml.xmlmap.fields.Field type based on the schema type definition.
Takes an xpath (which will be passed on to the real Field init) and a schema type definition name. If the schema type has enumerated restricted values, those will be passed as choices to the Field.
For example, to define a resource type based on the MODS schema, resourceTypeDefinition is a simple type with an enumeration of values, so you could add something like this:
resource_type = xmlmap.SchemaField("mods:typeOfResource", "resourceTypeDefinition")
Currently only supports simple string-based schema types.
- get_field(schema)
Get the requested type definition from the schema and return the appropriate Field.
Parameters: schema – instance of eulxml.xmlmap.core.XsdSchema
Return type: eulxml.xmlmap.fields.Field
- class eulxml.xmlmap.fields.FloatField(xpath, *args, **kwargs)
Map an XPath expression to a single Python float. If the XPath expression evaluates to an empty NodeList, a FloatField evaluates to None.
Supports setting values for attributes, empty nodes, or text-only nodes.
- class eulxml.xmlmap.fields.FloatListField(xpath, *args, **kwargs)
Map an XPath expression to a list of Python floats. If the XPath expression evaluates to an empty NodeList, a FloatListField evaluates to an empty list.
Actual return type is NodeList, which can be treated like a regular Python list, and includes set and delete functionality.
Other facilities
- eulxml.xmlmap.load_xmlobject_from_string(string, xmlclass=<class 'eulxml.xmlmap.core.XmlObject'>, validate=False, resolver=None)
Initialize an XmlObject from a string.
If an xmlclass is specified, construct an instance of that class instead of XmlObject. It should be a subclass of XmlObject. The constructor will be passed a single node.
If validation is requested and the specified subclass of XmlObject has an XSD_SCHEMA defined, the parser will be configured to validate against the specified schema. Otherwise, the parser will be configured to use DTD validation, and expect a Doctype declaration in the XML content.
Parameters: - string – XML content to be loaded, as a string
- xmlclass – subclass of XmlObject to initialize
- validate – boolean, enable validation; defaults to False
Return type: instance of the XmlObject subclass requested
- eulxml.xmlmap.load_xmlobject_from_file(filename, xmlclass=<class 'eulxml.xmlmap.core.XmlObject'>, validate=False, resolver=None)
Initialize an XmlObject from a file.
See load_xmlobject_from_string() for more details; behaves exactly the same, and accepts the same parameters, except that it takes a filename instead of a string.
Parameters: filename – name of the file that should be loaded as an xmlobject. lxml.etree.parse() will accept a file name/path, a file object, a file-like object, or an HTTP or FTP URL; however, a file path or URL is recommended, as these are generally faster for lxml to handle.
- eulxml.xmlmap.parseString(string, uri=None)
Read an XML document provided as a byte string, and return a lxml.etree document. String cannot be a Unicode string. The uri argument should be provided as the base URI for the calculation of relative URIs.
- eulxml.xmlmap.parseUri(stream, uri=None)
Read an XML document from a URI, and return a lxml.etree document.
- eulxml.xmlmap.loadSchema(uri, base_uri=None)
Load an XSD XML document (specified by filename or URL), and return a lxml.etree.XMLSchema.
Note that frequently loading a schema without using a web proxy may introduce significant network resource usage as well as instability if the schema becomes unavailable. Thus this function will fail if the HTTP_PROXY environment variable is not set.