If you are familiar with SQL you have probably seen quantified SQL predicates that compare a single value with a set of values. For example, the following SQL query returns all rows from table1 where col1 is greater than all values in col2 of table2.

SELECT *
FROM table1
WHERE col1  > ALL (SELECT col2 FROM table2)

And if you replace the keyword ALL with the keyword SOME, then the query above returns all rows from table1 where col1 is greater than at least one of the values in col2 of table2.

These comparisons are also known as universal quantification (“ALL”) and existential quantification (“SOME”), respectively.

Similar quantified comparisons are possible in XQuery too, but the syntax is slightly different. In XQuery you have the keywords “every … satisfies ….” for  universal quantification and “some … satisfies …” for existential quantification. Let’s use the following table and two simple sample documents to explore how they work:

create table customer(doc XML);

insert into customer values(‘
<customer id=”1″>
    <phone type=”cell”>408-516-4963</phone>
    <phone type=”cell”>408-087-1234</phone>
    <phone type=”cell”>408-111-222</phone>
</customer>
‘);

insert into customer values(‘
<customer id=”2″>
    <phone type=”home”>408-516-4963</phone>
    <phone type=”cell”>408-087-1234</phone>
    <phone type=”office”>408-111-222</phone>
</customer>
‘);

Now assume that we want to find customers where all their phone numbers are of type “cell”. We can code a corresponding query in XQuery or in SQL/XML notation as follows:

xquery
for $c in db2-fn:xmlcolumn(“CUSTOMER.DOC”)/customer
where every $p in $c/phone satisfies $p/@type = “cell”
return $c;

select *
from customer
where xmlexists (‘$DOC/customer[ every $p in ./phone satisfies $p/@type = “cell”]’);

Both queries return just the first of the two customer documents. The second document is not selected because not all of its phone elements have a @type attribute with the value “cell”.

The quantified expression every $p in $c/phone satisfies $p/@type = “cell” means that $p iterates over all the items in the sequence $c/phone,  and for each such $p the predicates $p/@type = “cell” must be true.

If we want to return all documents where at least one of the phone number is of type “cell” (existential quantification!), then we can simply replace the keyword “every” with the keyword “some”, such as in the following query:

select *
from customer
where xmlexists (‘$DOC/customer[ some $p in ./phone satisfies $p/@type = “cell”]’);

This query returns both our sample documents, because both documents contain at least one phone element where the @type attribute has the value “cell”. Since XPath uses existential quantification by default, you can often omit the “some … satisfies …” and obtain the same result with an ordinary path expression, like this:

select *
from customer
where xmlexists (‘$DOC/customer[phone/@type = “cell”]’);

For more information, use the following resources.

DB2 Information Center – Quantified Expressions:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.xml.doc/doc/xqrquanexp.html

XQuery specification of quantified expressions:
http://www.w3.org/TR/xquery/#id-quantified-expressions

Generating XML from relational data is not difficult. The SQL standard contains a set of XML construction functions that take column names or other SQL expressions as input and return XML. The most prominent of these XML construction functions are:

  • XMLELEMENT (constructs an XML element)
  • XMLATTRIBUTES (constructs one or more XML attributes)
  • XMLFOREST (constructs a sequence of XML elements)
  • XMLTEXT (constructs an XML text node)
  • XMLAGG (aggregates data from multiple rows into a single XML fragment)
  • XMLNAMESPACES (constructs a namespace declaration)
  • etc.

Here is a simple example:

create table mydata(c1 int, c2 varchar(25));
insert into mydata values (1, ‘This is a test’);

select xmlelement(name “myroot”,
               xmlelement(name “col1”, c1),
               xmlelement(name “col2”, c2) )
from mydata;

<myroot><col1>1</col1><col2>This is a test</col2></myroot>

 1 record(s) selected.

The XML is typically returned in textual (serialized) format, as a single line without any line breaks. That’s perfectly fine, except that line breaks can sometimes be useful to make the XML easier to read for the human eye.

There are several ways in which you add add newline characters to the XML to make it more readable.

If the XML that you construct is less then 32KB in size, then you can use the SQL function REPLACE to replace every occurrence of ‘><‘ with the hex string 3E0A3C. The hex character 3E is the same as ‘>’, the hex character 3C is the same as ‘<‘, and the hex character 0A is a line feed in between them. Before we can apply the REPLACE function to the constructed XML, we need to explicitly convert the  XML to data type VARCHAR, which can be done with the function XMLSERIALIZE.

The following query is extended from the previous example by adding the functions XMLSERIALIZE and REPLACE:

select replace(
       xmlserialize(
          xmlelement(name “myroot”,
               xmlelement(name “col1”, c1),
               xmlelement(name “col2”, c2) ) AS VARCHAR(32000)),
         ‘><‘,
         x’3E0A3C’)       
from mydata;

<myroot>
<col1>1</col1>
<col2>This is a test</col2>
</myroot>

 1 record(s) selected.

The replace function takes three parameters:

  • the constructed serialized XML
  • the string that should be searched and replaced, i.e. ‘><‘
  • the new string that replaces the search string, i.e. x’3E0A3c’

However, this approach works only for smallish XML, up to 32KB.

If you generate larger XML, you have two options. First, you can perform the replace operation at the client that receives the XML. Second, you can explicitly construct XML text nodes that contain newline characters where you want them. For example:

select xmlelement(name “myroot”,
              xmltext(x’0A’),         
              xmlelement(name “col1”, c1),
              xmlelement(name “col2”, c2),
              xmltext(x’0A’) )
from mydata;

<myroot>
<col1>1</col1><col2>This is a test</col2>
</myroot>

 1 record(s) selected.

 

select xmlelement(name “myroot”,
              xmltext(x’0A’),         
              xmlelement(name “col1”, c1),
              xmltext(x’0A’),
              xmlelement(name “col2”, c2),
              xmltext(x’0A’) )
from mydata;

<myroot>
<col1>1</col1>
<col2>This is a test</col2>
</myroot>

 1 record(s) selected.

If you want a line break after every XML element then this approach can be tedious and makes the SQL statement quite a bit longer. Then again, the line breaks are really only needed for human readability, not for any client application that consumes the generated XML.

 

In my previous post I discussed the native XML capabilities provided by the IBM WebSphere Application Server V7.0 Feature Pack for XML. This Feature Pack allows you to manipulate XML in your Java application directly with XQuery, XPath, or XSLT.

If you use the WebSphere XML Feature Pack together with DB2 pureXML, one of the obvious questions is: when should I use XPath or XQuery in the application server and when should I push these operations to DB2 and run XQuery or SQL/XML in the database layer?

The best split between XML processing in one tier vs. the other certainly depends on the exact nature of your application and may vary on a case by case basis. Still, here are some general considerations and examples.

Example 1:
Assume your application needs to retrieve an XML document from the database based on one or multiple XML search conditions, but the decision about which information to extract from the document can only be made at runtime after the document has been retrieved. In this situation it is typically best to issue an XQuery or SQL/XML query to DB2 to retrieve the full document, exploiting DB2’s XML indexes to evaluate the search conditions efficiently. The extraction of specific elements is then performed with XPath or XQuery in the application server as part of the application’s business logic.

Example 2:
As a variation of Example 1, consider a hybrid database schema where some data is in relational columns and some in XML columns. Assume you need to retrieve XML documents based on relational and XML predicates, possibly involving joins between the table that holds the XML and other fully relational tables. It is typically recommended to use a database query that contains both relational and XML predicates to minimize the amount of data that is retrieved from the database and sent to the application. The extraction of XML elements would still happen in the application server, especially if it depends on post-retrieval application logic.

Example 3:
The application needs to retrieve specific elements from one or multiple XML documents, and these elements are fixed and known before the database query is submitted. In this case it is typically best to code the search and the extraction in a single XQuery or SQL/XML statement for DB2. Again, the general idea is to reduce data early and not ship more data around than necessary.

Example 4:
The application receives an XML document from a web service and needs to examine it with an XPath query before it can decide whether the document should be inserted into the database or passed to another service. Typically, this operation should be implemented in the application server instead of calling out to the database when it might be not needed.

Example 5:
The application receives a transient XML document from a web service and based on specific element values in that document the application needs to retrieve related documents from the DB2 database . Although you could pass the full input document as a parameter to a SQL/XML database query and perform the entire processing in DB2, it can be easier and more efficient to first extract the relevant values from the input document in the application layer and then send a simpler and more targeted query to DB2, for potentially better end-to-end performance.

Example 6:
The application needs to retrieve XML documents from the database and apply logic that has already been coded in XQuery modules and XQuery user-defined functions. Such modules are an optional feature of the XQuery language standard. Modules are supported in the WebSphere XML Feature Pack but not in DB2. Obviously, the choice is to retrieve the full documents from DB2 (similar to Examples 1 and 2) and then use the additional XQuery capabilities of the WebSphere XML Feature Pack on these documents.

 

These examples are certainly not an exhaustive of list potential scenarios. Also, they are not meant to be hard rules, but just illustrations of the trade-offs and considerations that can be involved in the decision process when you split XML processing between the application server and the database.

 

For concrete coding examples with the WebSphere XML Feature Pack and DB2 pureXML, see the following articles:

“Programming XML across the multiple tiers: Use XML in the middle tier for performance, fidelity, and development ease” – http://www.ibm.com/developerworks/library/x-xmlfeat1/

“Programming XML across the multiple tiers, Part 2: Write efficient Java EE applications that exploit an XML database server” – http://www.ibm.com/developerworks/xml/library/x-xmlfeat2/