If you -or your colleagues- are new to managing and querying XML data in DB2, then this 5-part article series “Get off to a fast start with DB2 9 pureXML” might be a good starting point. These articles were originally written for DB2 9.1, but recently all five parts have been updated for DB2 9.5 and 9.7. Here they are:

Part 1: XML to the core (Introduction)

Part 2: Create and populate a DB2 XML database

Part 3: Query DB2 XML data with SQL

Part 4: Query DB2 XML data with XQuery

Part 5: Develop Java applications for DB2 XML data

If you want to get a little bit deeper from here, I can recommend a few additional resources:

For example, if your interest is in querying XML data, take a look at “pureXML in DB2 9: Which way to query your XML data?” or “XMLTABLE by Example“.

For a deep dive on XML updates and transformations, see “Updating XML in DB2 9.5“.

In case you’re a Java developer and interested in the new XML features in JBDC 4.0, read my previous blog post on “XML Support in JDBC 4.0: The SQLXML Interface“.

XML performance topics are covered in “15 Best Practices for pureXML Performance in DB2 9” and “Exploit XML Indexes for XML Query Performance in DB2 9“. These two articles provide tuning and configuration guidelines for XML performance in DB2. But, if you’re more interested in performance results and measurements, see “A Performance Comparison of DB2 9 pureXML with CLOB and Shredded XML Storage or the results of the TPoX benchmark.

If these articles don’t provide what you’re looking for, browse through the DB2 pureXML Wiki or the DB2 pureXML Cookbook. – or post a comment to this blog with any requests or questions you might have!

Happy Reading…

If you are using DB2 on the z/OS platform and haven’t had a chance yet to explore its pureXML features, consider attending the DB2 for z/OS pureXML Hands-on Lab at this year’s DB2 Users Group conference (IDUG).

In this 2.5-hour lab you will create tables with XML columns, examine the underlying storage objects, define XML indexes, load XML documents, and use SQL/XML functions -such as XMLTABLE– to query and manipulate XML.

The IDUG 2010 North America conference is on May 10-14 in Tampa/Florida and offers more than 100 technical sessions from DB2 users, experts from third party vendors, DB2 Gold Consultants, and DB2 experts from IBM.

I can highly recommend this IDUG event. I have attended the IDUG conference many times and found it very valuable. There is so much useful information in the sessions, and the networking within the DB2 community is simply priceless (and fun!). The main focus areas are DB2 for Linux, UNIX, Windows and DB2 for z/OS as well as application development and tools for these systems.

If you want to read up on DB2 for z/OS pureXML, here are some good resources:

If you are or will be using XML in a relational database, chances are that you will be using SQL/XML to query XML and relational data in an integrated manner. Most XML applications are not black or white, meaning they do not use XML exclusively without any relationship to any current or existing relational data. More often than not, XML documents somehow relate to some structured data for which the relational data model might be the better choice. This makes the combined management of XML documents and relational data in a single database -often even in a single table!- so appealing.

One of the most popular and most versatile functions in the SQL/XML language standard is the XMLTABLE function. Let’s look a simple example.

First, let’s a create a simple database table that contains a mix of relational data and XML:

create table products(id integer, name varchar(20), details XML);

insert into products
values(1, 'Snow Shovel',
'<product>
  <fullname>Snow Shovel, Deluxe Edition</fullname>
  <category>S5-D</category>
  <color>Silver</color>
</product>');

insert into products
values(2, 'Gloves',
'<product>
  <category>K16-F</category>
  <color>Blue</color>
  <size range="Medium">7</size>
</product>');

Now assume we want to retrieve the id, name, category, color, and size for some or all of the products. We can issue a SQL/XML query such as the following:

select id, name, T.*
from products,
  XMLTABLE('$DETAILS/product'
    COLUMNS
      cat   VARCHAR(10) PATH 'category',
      color VARCHAR(10) PATH 'color',
      size  VARCHAR(10) PATH 'size/@range') as T;

ID          NAME                 CAT        COLOR      SIZE
----------- -------------------- ---------- ---------- ----------
 1          Snow Shovel          S5-D       Silver     -
 2          Gloves               K16-F      Blue       Medium

 2 record(s) selected.

The XMLTABLE function in the “from” clause is a table function that produces zero, one, or multiple rows for each XML document. In this example it produces one row for each XML document in the “product” table. Implicitly there is an inner join between the XMLTABLE function and the “product” table.

Now what happens if there is a row whose XML column is NULL? Let’s insert such a row and repeat the previous query:

insert into products values(3, 'Ice Scraper', NULL);

select id, name, T.*
from products,
  XMLTABLE('$DETAILS/product'
    COLUMNS
      cat   VARCHAR(10) PATH 'category',
      color VARCHAR(10) PATH 'color',
      size  VARCHAR(10) PATH 'size/@range') as T;

ID          NAME                 CAT        COLOR      SIZE
----------- -------------------- ---------- ---------- ----------
 1          Snow Shovel          S5-D       Silver     -
 2          Gloves               K16-F      Blue       Medium

 2 record(s) selected.

You may be surprised that the ID and NAME of the newly inserted product do not show up in this query result. A corresponding row is missing from the result set, because for this row there is no XML document on the XML-side of the inner join.

What we need in this case is a left outer join, so that the relational columns from the base table are returned even if the XML column (the other side of the implicit join) is NULL:

select id, name, T.*
from products LEFT OUTER JOIN
 XMLTABLE('$DETAILS/product'
   COLUMNS
     cat   VARCHAR(10) PATH 'category',
     color VARCHAR(10) PATH 'color',
     size  VARCHAR(10) PATH 'size/@range') as T ON 1=1;

ID          NAME                 CAT        COLOR      SIZE
----------- -------------------- ---------- ---------- ----------
 1          Snow Shovel          S5-D       Silver     -
 2          Gloves               K16-F      Blue       Medium
 3          Ice Scraper          -          -          -

 3 record(s) selected.

The join condition of this left outer join is trivial: 1=1, which always evaluates to true. It means that we want this left outer join to be applied to every row in this table.

You can read more about the XMLTABLE function in the developerWorks article “XMLTABLE By Example“, or in Chapter 7 and Chapter 9 of the DB2 pureXML Cookbook.

The XMLTABLE function is part of the SQL standard. DB2 supports it, Oracle does too, but as far as I know the XMLTABLE function is not available in SQL Server 2008.

More on XML ETL

April 5, 2010

To follow up on my previous post about XML ETL, the second part of the paper on using the InfoSphere Design Studio for ETL with XML has been published:

Using IBM InfoSphere Warehouse Design Studio with pureXML data, Part 2: Create a control flow for multiple ETL jobs involving XML

The first part of this article explained how to define and run an individual ETL data flow that uses XML data as input to populate relational and/or XML columns in a data warehouse. This second part describes how to build a single control flow that calls multiple data flows. Each data flow can extract, transform, and load XML data. A control flow allows you to orchestrate multiple data flows that have to be executed in a specific order. You can develop and test the individual data flows independently and use them as building block to build larger ETL processes, by using a control flow.