December | 2012 | Native XML Database

nativexmldatabase.com – 2008 to 2012

December 28, 2012

This blog on XML database technology and DB2 pureXML has been active for almost 5 years now. My former colleage Conor O’Mahony ran this blog from early 2008 to December 2009, and I took over on Dec 15, 2009. We have received a lot of feedback and many of the blog posts were inspired by questions from XML database users.

I’m not sure yet how much time I will have for blogging in 2013 and beyond. In any case, I liked the idea of producing a summary page that provides a quick overview of all the topics and posts in this blog. But, how to produce such a summary most efficiently?

Well, true to the topic of this blog I decided to load this blog as a piece of XML into DB2 pureXML and run queries on it!

First I downloaded the complete blog as an HTML document. But, HTML is not necessarily wellformed XML because HTML allows elements with missing end tags and other goofy things. Luckily there are free tools available that convert HTML into XHTML. XHTML is a form of HTML that complies with the rules for well-formed XML documents.

After converting the entire blog into an XML (XHTML) document, it took only three steps to create a list of blog post titles, dates, and URLs:

CREATE TABLE nativexmldatabase(doc XML);


LOAD FROM blog.del OF DEL REPLACE INTO nativexmldatabase(doc);


SELECT date, title, url
FROM nativexmldatabase,
   XMLTABLE('$DOC//div[@class="entrytitle"]'
   COLUMNS
     seq     FOR ORDINALITY,
     date    VARCHAR(20)   PATH 'h3',
     title   VARCHAR(105)  PATH 'h2/a',
     url     VARCHAR(145)  PATH 'h2/a/@href') AS T;

Of course, a quick look into the XML document is necessary to discover the required tag names and XPath expressions, such as //div[@class=”entrytitle”] to iterate over all the individual posts, or h2/a and h2/a/@href to retrieve the title and URL of each post.

This query produces a nice list of all blog post titles, dates, and URLs. Instead of posting this listing as-is, I decided to roughly group the blog posts along the following topic areas:

* Design
* Performance
* Migration
* Relational to XML or XML to relational
* XML Queries
* XML Storage and Indexing
* XML Schemas
* Application Development
* Use Cases and Case Studies
* ETL and Warehousing
* DB2 pureXML Product News
* Other Products and Tools
* Publications and Resources
* XML Coverage at IDUG and IOD Conferences
* News, Announcements, Miscellaneous

And here is the result, a table of contents for nativexmldatabase.com from 2008 through 2012. I hope this can serve as a quick reference to topics of interest.

Date	Design
Dec 4, 2012	Good and bad XML design
Jan 21, 2012	Business Records in the 21st Century
Jan 8, 2012	Data Normalization Reconsidered
Jan 21, 2011	Name/Value Pairs – A pretty bad idea in XML as in Relational !
Jan 8, 2011	XML Manipulation in Application code vs. a Native XML Database
Sep 28, 2010	5 Reasons for Storing XML in a Database
Sep 26, 2008	Flexible Schemas: When to Persist Data in XML Instead of Relational
Aug 4, 2008	When to Store in XML Format
Mar 21, 2008	When to use a Native XML Database

Date	Performance
Oct 23, 2012	Overcoming Performance Obstacles in XML Encryption
Nov 15, 2011	TPoX 2.1 has been released!
May 9, 2011	News Flash: Intel publishes TPoX Benchmark results on new 10-core Westmere-EX CPUs
Mar 4, 2011	New TPoX Benchmark Results Available
Jan 8, 2011	XML Manipulation in Application code vs. a Native XML Database
Aug 22, 2010	XML versus Relational Database Performance
July 14, 2010	A 10TB XML Data Warehouse Benchmark
Nov 3, 2008	Benchmark for 1TB Transactional XML System
Mar 20, 2008	Native XML versus CLOB and Shredding

Date	Migration
June 29, 2012	How to migrate from the XML Extender to DB2 pureXML
Oct 3, 2010	How to migrate XML from LOB to XML columns

Date	Relational to XML or XML to relational
Aug 30, 2010	XML Construction in Views and User-Defined Functions
Aug 24, 2011	How to generate XML from relational data – with line breaks!
June 8, 2011	How to quickly produce XML from relational tables
June 5, 2011	How to insert relational data into existing XML documents (Part 2)
May 21, 2011	How to add relational data into existing XML documents (Part 1)

Date	XML Queries
Sep 13, 2012	How to Validate XML Documents in Database Queries
Aug 14, 2012	Result set cardinalities when querying repeating XML elements
Aug 10, 2012	Using the XMLTABLE function with MERGE statements in DB2 for z/OS
July 30, 2012	Using the XMLTABLE function in UPDATE and MERGE statements
May 24, 2012	How to query CDATA sections in XML
Feb 4, 2012	How to list the paths of all elements in an XML document?
Nov 30, 2011	XQuery support in DB2 10 for z/OS
Oct 9, 2011	Advanced SQL/XML: Joins and FLWOR Expressions in the XMLTABLE Function
Aug 29, 2011	Quantified expressions in XQuery: When ‘some’ and ‘every’ satisfy!
Mar 18, 2011	XQuery and SQL/XML: How to convert a date that is not a date into a date?
Aug 11, 2010	How to order query results based on XML values
May 26, 2010	XML Profiling – How to get a list of all elements and attributes
May 9, 2010	How to deal with dirty data
July 24, 2011	XQuery and Other XML Manipulation in SQL Stored Procedures
April 6, 2011	Returning data from one path based on predicates in another (part 2/2)
April 3, 2011	Intra-document references: Returning data from one path depending on predicates in another path
June 17, 2010	An XQuery Cheat Sheet
April 12, 2010	The XMLTABLE function and a case where a left outer join can help
Feb 21, 2010	Get Started with XML Full-Text Search in DB2
Jan 25, 2010	tokenize it !
Jan 11, 2010	Watch your step! Positional predicates in XPath.
Oct 14, 2008	XQuery: Introduction, Tips, and Future Directions
July 25, 2008	XQuery versus SQL/XML – Part Deux
June 26, 2008	XQuery versus SQL/XML

Date	XML Storage and Indexing
July 11, 2012	XML Storage in DB2: To inline or not to inline?
June 8, 2012	XML Indexing in DB2 9.x and DB2 10 for Linux, UNIX, and Windows
July 11, 2011	How to index XML documents in the presence of schema diversity?
July 19, 2010	Structure-agnostic XML indexes in DB2
Dec 26, 2010	How to insert XML documents from the file system?
Feb 21, 2010	Get Started with XML Full-Text Search in DB2
May 6, 2009	DB2 Compresses XML Data by 60% to 80%
May 3, 2009	Multi-Dimensional Clustering for Tables with Native XML Data
May 1, 2009	Use Hash Partitioning for Fast Analysis of XML Data
Oct 16, 2008	DBA Concerns about Native XML Storage
April 8, 2008	Native XML Reduces Storage Costs

Date	XML Schemas
Sep 13, 2012	How to Validate XML Documents in Database Queries
Feb 29, 2012	What is an XML Schema and why should I care?
Aug 22, 2008	Schema Evolution
Aug 21, 2008	Schema Flexibility

Date	Application Development
Sep 29, 2011	At your service: REST with DB2 pureXML!
Aug 8, 2011	Using XQuery across Application Server and Database Server
July 30, 2011	Beyond the Database: Native XML and XQuery in the Application Server
Feb 25, 2011	CICS meets XML: Using CICS Web Servives with DB2 pureXML
Jan 8, 2011	XML Manipulation in Application code vs. a Native XML Database
Oct 8, 2010	Buidling XML Solutions with Rational Software Architect (RSA) and Rational Application Dev
Mar 1, 2010	XML Support in JDBC 4.0: The SQLXML Interface
Nov 9, 2009	Building a JSON and DB2 pureXML Application
Aug 5, 2008	Native XML Storage Reduces Development Costs
Nov 18, 2008	Referential Integrity and XML Data

Date	XML Use Cases and Case Studies
Sep 12, 2011	DB2 pureXML – Rich in Proteins!
June 24, 2011	A true story about using XML forms and DB2 to process millions of tax returns
May 12, 2010	Webcast on XML Use Cases (May 18)
Feb 7, 2010	BJC HealthCare Improves Clinical Research with DB2 pureXML
Nov 2, 2009	Electronic Forms Using Adobe PDF and IBM DB2 pureXML
Sep 8, 2009	The Patient-Centered Medical Home
Aug 24, 2009	Electronic Health Records for Smarter Healthcare
July 17, 2009	Article about XML for Healthcare
June 16, 2009	Building an XML-Based Electronic Forms Solution
June 5, 2009	Short Video Highlighting Early Experiences with DB2 9.7
Mar 30, 2009	GUS Group uses DB2 pureXML for ERP System
Mar 13, 2009	UCLA Health System uses XML and SOA to Optimize Data Integration and Migration
Mar 2, 2009	Storing and Retrieving ACORD Data for Insurance
Feb 26, 2009	Speedy Communications and Collaboration in XML-based Environments
Dec 30, 2008	XForms and Native XML Storage
Mar 27, 2008	How to Choose Solution Areas
Mar 25, 2008	More information about Storebrand
Mar 23, 2008	Productivity Improvements at Storebrand

Date	ETL and Warehousing
April 13, 2011	XML ETL with DataStage 8.5
July 14, 2010	A 10TB XML Data Warehouse Benchmark
June 23, 2010	More on Data Warehousing and XML
April 5, 2010	More on XML ETL
Mar 26, 2010	XML ETL
Jan 31, 2010	New Article on XML in the Data Warehouse
May 28, 2009	XML in the Data Warehouse
April 22, 2009	Fast Performance for Native XML Data in the Warehouse
Aug 24, 2009	Article on Integrating XML Operational Data into a Data Warehouse

Date	DB2 pureXML product news
April 19, 2012	XML Enhancements in DB2 10 for Linux, UNIX, and Windows
Nov 30, 2010	Node-level XML Updates in DB2 10 for z/OS
Nov 18, 2010	New XML Features in DB2 10 for z/OS !
June 8, 2010	Updated version of a free XML database: DB2 Express-C 9.7.2
Aug 7, 2009	SOA Projects: IBM DB2 versus Oracle Database
Feb 11, 2009	XML Database in the Cloud
Feb 10, 2009	pureXML Moves into Core DB2 Product

Date	Other Products and Tools
Dec 5, 2010	XML Tools for DB2 pureXML
Mar 8, 2010	Integration of Altova XMLSpy with DB2 pureXML
Jan 18, 2010	XPath and XQuery Functions in SQL Server vs. DB2
Jan 4, 2010	Shredding and constructing XML in mySQL
Dec 23, 2009	XML Support in mySQL
Dec 17, 2009	Oracle vs. mySQL: The Full Monty
Aug 7, 2009	MyCareTeam and IBM Collaborate to Improve Continuing Care for Diabetes
July 8, 2009	Why Won’t Oracle Publish XML Benchmark Results for TPoX?
Sep 30, 2008	XML in Oracle 11g
Sep 24, 2008	XML in SQL Server 2008
Sep 19, 2008	XML in SQL Server 2005
July 11, 2008	Questions for XML Database Vendors
May 13, 2008	XML-only Databases
May 12, 2008	All Native XML Databases are not Equal

Date	Publications and Resources
Feb 13, 2011	Extremely pureXML in DB2 10 for z/OS
Sep 14, 2010	DB2 pureXML Cookbook – Errata
Aug 13, 2009	DB2 pureXML Cookbook – 45% Discount
June 3, 2009	DB2 pureXML for Dummies . Get Your Copy!
April 29, 2010	Get off to a fast start with DB2 pureXML
April 10, 2009	DB2 pureXML Cookbook
Feb 25, 2009	Intel Insights Magazine covers Native XML Storage
Feb 5, 2009	Henrik’s Blog
Feb 5, 2009	DB2 pureXML leads SOA World Magazine Reader’s Choice Award
Jan 21, 2009	Podcast Series about DB2 pureXML on the Mainframe
Jan 20, 2009	Reports for Information in Native XML Format
Dec 30, 2008	Anant Jhingran on Improving XML Applications
Sep 1, 2008	DB2 pureXML Online Communities
July 27, 2008	Getting Started with Native XML Databases

Date	XML Coverage at IDUG and IOD Conferences
Oct 21, 2012	XML from Vegas to Berlin!
Oct 18, 2011	XML in Las Vegas !
April 25, 2011	XML sessions at IDUG North America 2011
Oct 21, 2010	Next week in Vegas
April 18, 2010	DB2 for z/OS pureXML Hands-on Lab @ IDUG North America
Oct 23, 2009	DB2 pureXML at the IOD Conference
May 28, 2009	IBM IOD Conference Attendees get a Poken for Networking
May 13, 2009	Short Video from the IDUG North America Confernece
Nov 11, 2008	XML at IBM Information on Demand Conference
Sep 11, 2008	DB2 pureXML at the IOD Conference
Sep 4, 2008	Meet Native XML Databases Users

Date	News, Announcements, Miscellaneous
July 30, 2010	DB2 Express-C Light – A reduced footprint XML database
Mar 15, 2010	More DB2 pureXML Bootcamps
Feb 14, 2010	DB2 pureXML Webinars and Bootcamps
Dec 15, 2009	Hi there here is the new guy!
Dec 13, 2009	Welcome Matthias Nicola!
June 9, 2009	Flirting with Poken
April 22, 2009	The XML Rap Superstar
April 22, 2009	IDUG announces winners of XML Challenge
April 20, 2009	Teaser for Upcoming DB2 pureXML Features
Mar 12, 2009	DB2 and Cloud Computing – Chat with the Labs
Feb 5, 2009	Free Online Conference – Data in Action
Jan 20, 2009	DB2 Technology Sandbox
Dec 26, 2008	Native XML Storage for the Mac
Dec 3, 2008	Update on the XML Challenge
Nov 5, 2008	Why IBM DB2 is ideal for Transactional XML
Oct 9, 2008	10 Reasons why DBAs Should Understand Native XML
Oct 2, 2008	XML Challenge Web Site is Live!
Sep 28, 2008	1st Oct is Online Community Action Day
Aug 7, 2008	Webcast: A primer for storing and retrieving XML data
July 1, 2008	Viral Video – Wednesday
June 20, 2008	Viral Video – Tuesday
June 16, 2008	Viral Video – Monday
June 13, 2008	Learn to Use XML with Databases and win Prizes!
Mar 22, 2008	How to Market New Technology
Mar 19, 2008	Welcome

Happy browsing!

Posted by Matthias Nicola
Filed in Uncategorized

Good and bad XML design

December 4, 2012

Whenever people design things –such as houses, software systems, database schemas, or even XML structures– there are good and not so good design options. Roughly speaking, a design is usually considered “good” if it allows for easy and (cost-)efficient use and implementation of whatever is being built.

For XML documents we have seen various cases of “good” and “bad” designs over the years. Previously I wrote about name/value pair design for XML documents, which often is not a good choice.

Recently I received a question regarding another design choice, which I would like to share here in a simplified form.

The question was about an XML message format to transmit a list of order numbers and the corresponding order system identifiers. Two alternative XML structures were proposed:

Option A:

<OrderList>
  <Order>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
  </Order>
</OrderList>

Option B:

<OrderList>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
</OrderList>

The motivation for option B might be a smaller message size. One might argue that XML elements are by definition ordered, and if we know that the message consists of pairs of order number and order system, then XML format B is as easy to interpret as format A. Or, is it not?

The problem that I see with option B is that the absence of the <Order> elements gives up the explicit structure of the data and instead requires implicit knowledge, i.e. one must know that the OrderNumber at position i belongs to the OrderSystem at position i+1.

In my view, this design is against the principles of XML. In XML, the tags and their nesting should describe the data explicitly. XML is meant to be a hierarchical data format so that data items that belong together are grouped under the same parent (as in option A). After all, that is the benefit of XML over a flat file format.

I also think that using the explicit structure of option A is less error-prone than using option B which, requires implicit knowledge based on element positions.

How about queryability?

It turns out that XML format A is considerably easier to query than option B. Let’s assume you need to code a SQL/XML query to list the order numbers and systems. For that exercise we assume that the messages are stored in a table “mytable” with XML column “doc”.

The query for XML option A is very simple, using the XMLTABLE function in its most basic form. The row-generating XPath expression $DOC/OrderList/Order iterates over the orders, and the column expressions extract the order number and system.
http://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/

-- Query for XML option A:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE ('$DOC/OrderList/Order'
       COLUMNS
         OrderSystem VARCHAR(20) PATH 'OrderSystem',
         OrderNumber VARCHAR(20) PATH 'OrderNumber' );

ORDERNUMBER          ORDERSYSTEM
-------------------- --------------------
A0000001             ABC
B0000001             XP1
C0000001             Q-9

3 record(s) selected.

How can you produce the same result set from XML format B? This is remarkably tricky! You need to iterate over the child elements and keep track of which element is at position i versus i+1 to produce the proper pairs in the output.

There are likely several ways to code the query for XML option B, and you see one solution below. It seems hard to query the elements by name alone because the important structural information is by position. Hence, this query iterates over the positions 1 to N, where N is the number of elements. For each even position 2, 4, 6, etc. the query produces the pair of the current ($pos) and the previous ($pos – 1) element to construct the proper output.

-- Query for XML option B:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE('for $pos in (1 to count($DOC/OrderList/*))
               where $pos mod 2 = 0
               return  <pair>
                         {$DOC/OrderList/*[$pos - 1]}
                         {$DOC/OrderList/*[$pos]}
                       </pair>'
       COLUMNS
         OrderNumber VARCHAR(20) PATH 'OrderNumber',
         OrderSystem VARCHAR(20) PATH 'OrderSystem' );

OrderSYSTEM          OrderNUMBER
-------------------- --------------------
ABC                  A0000001
XP1                  B0000001
Q-9                  C0000001

3 record(s) selected.

This query is not only more complex but also potentially less efficient. Note that the return clause of the FLOWR expresion constructs the explicit pairing that it missing in XML format B.

Let me know if you can think of a simpler query to produce the same result from XML format B.

Next, imagine a message that doesn’t have just 2 fields per order but maybe 10 or 100 fields per order. Then option B becomes increasingly more ugly.

Or worse, what if there is a schema change that allows one order to have a different number of fields than the next? Then the query that we coded for XML format B will no longer work because it was based on a specific structural assumption that is no longer true. In contrast, our query for XML format A will work even if you add additional fields to some of the orders in the message.

In summary, the benefits for XML option A include the following:

Explicit structure rather than implicit positional format
Easier and more efficient to query (query by name rather query by postion)
More resilient to schema changes

I think that these benefits far outweigh the concern that XML format A is larger. Also, you could chose to send or store the XML in a compressed format to greatly alleviate space concerns.

Posted by Matthias Nicola
Filed in Uncategorized

2 Comments »

Native XML Database