You might have noticed that DB2 10 for Linux, UNIX, and Windows was announced earlier this month, and it contains a bunch of great new features. One of the new features that I am particularly excited about is Time Travel, which enables bitemporal data managenment in DB2.

DB2 10 also includes a variety of useful XML enhancements. Let’s take a quick look at the most important ones here:

New XML Index Data Types: Integer, Decimal

As you know, when you create an index on an XML element or attribute you must specify the data type that DB2 should use to interpret and index the specifies nodes. The reason is that the usage of XML Schema, which might define data types, is optional. And even if an XML Schema was used, it might not define a data type for each and every element and attribute.

Before DB2 10, an XML index for numeric values is specified with the data type DOUBLE, so that any type of numeric values can be indexed. In DB2 10 you can also specify the data types INTEGER and DECIMAL to define a numeric XML index.

If you know that a certain element or attribute can only contain INTEGER (or DECIMAL) values, then you can improve query performance by creating the XML index with the most appropriate type. For example, assume that dept is a table that has an XML column deptdoc. Then you can define the following indexes in DB2 10:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern ‘/dept/employee/@id’ as sql integer;

create index idx2 on dept(deptdoc) generate key using
xmlpattern ‘/dept/employee/salary’ as sql decimal(6,2);

You don’t need to change existing queries to use these new indexes!

XML Indexes for Case-Insensitive Search

DB2 10 allows you to define case-insensitive indexes and search very efficiently for string values regardless of their upper case or lower case spelling. For example, to enable case-insensitive search of employee names, you could create the following index:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern ‘/dept/employee/name/fn:upper-case(.)‘ as sql varchar(40);

This index stores all key values in upper-case. To find employees with names such as Peter Pan, PETER PAN,  Peter PAN, pEtEr pAn, etc. you could write the following query that can use the index above:

SELECT *
FROM dept
WHERE XMLEXISTS(‘$DEPTDOC/dept/employee[name/fn:upper-case(.) = “PETER PAN”]’);

Index support for Prefix Search

Previously, the following query with the fn:starts-with() function could not use an XML index to find all employees whose name starts with “Jo”:

SELECT *
FROM dept
WHERE XMLEXISTS(‘$DEPTDOC/dept/employee[fn:starts-with(name, “Jo”)]’);

In DB2 10, this query can benefit from existing XML indexes of type VARCHAR(n) that may exist on the <name> element.

XML Indexes for Existential (Structural) Predicates

An existential predicate, sometimes called structural predicate, is a predicate that checks the existence of an XML element or attribute in an XML document, regardless of its value.

For example, imagine you want to find all documents that contain a <comment> element for an employee. One of the following queries comes to mind:

SELECT *
FROM dept
WHERE XMLEXISTS(‘$DEPTDOC/dept/employee[comment]‘);

SELECT *
FROM dept
WHERE XMLEXISTS(‘$DEPTDOC/dept/employee[fn:exists(comment)]‘);

To support such queries, DB2 10 allows you to create an XML index with the fn:exists function, like this:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern ‘/dept/employee/fn:exists(comment)‘ as sql varchar(1);

This index must be defined as VARCHAR(1) and each index key represents true or false to indicate the existence of the <comment> element in a given document.

Friendly XML query behavior

DB2 10 also includes several enhancements that make XML queries behave in a more friendly manner. Let’s look at just one of several cases. For example, consider the following query that has a numeric predicate on the @id attribute:

select *
from dept
where xmlexists(‘$DEPTDOC/dept[@id = 15]’);

What happens if this query encounters an XML document where the @id attribute has a non-numeric string value such as “WW Sales”?  In DB2 9.x this situation would cause a type error at run-time and the query would fail and abort with an error message. Although this behavior was compliant with the language standard, it is not very useful. If you look for the department with id = 15, any department with non-numeric id can never be a match. Therefore, DB2 10 simply evaluates the predicate to FALSE in this case, i.e. ignores the document with the non-numeric id, and continues processing the query.

Binary XML client/server communication

With DB2 10, Java applications can choose the format in which XML is transmitted between the application and the DB2 server. You can choose between traditional textual XML and a new binary XML format that is a compressed on-the-wire format for XML.

On the application side, the encoding and decoding of XML to and from the binary format is performed by the JDBC driver (JCC 4.9 or higher). The JDBC driver can convert the binary XML format to and from a text,  DOM, SAX, or StAX representation. The use of binary XML provides the most notable performance gain for Java applications that already consume or produce XML in SAX format.

Applications can exchange XML with the DB2 server (through inserts, queries, etc.) and completely avoid the serialization of XML to text, i.e. neither the client nor the server needs to deal with textual XML for the purpose of insert or retrieving XML. Eliminating the parsing and serialization of textual XML can enable significantly better end-to-end performance.

Other IBM products that support this binary XML communication format include DB2 10 for z/OS, the WebSphere Application Server XML Feature Pack, and the WebSphere Datapower appliance.

Query Performance Enhancements

DB2 10 also includes a number of performance improvements for common types of XML queries, including many types of queries that use the XMLTABLE function.