Computer Science Forum -- How do you extract data from SQL and generate an XML document directly?

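XML APIs for databases
Author: Ramnivas Laddad
How can XML be integrated with databases through the SAX and DOM APIs?
Summary:
Most Web applications need to present information held in a database. XML, because it separates content from presentation, is fast becoming an industry standard for data exchange. Most XML tools work with either the SAX or the DOM API. This article presents a way to combine the strengths of databases and XML. It also provides a simple, pure Java implementation of XML APIs for databases that works with any JDBC data source. With this approach, XML tools can treat a database as a virtual XML document. -- Ramnivas Laddad
Article:
Databases and XML offer complementary functionality for storing data. Databases store data for efficient retrieval, whereas XML offers a simple way to exchange information between applications. To take advantage of XML's features, we can convert database tables into XML documents and then process those documents with XML tools. For example, XML documents can be rendered as HTML pages with XSLT stylesheets, searched with XML-based query languages such as XQL, or used as a data-exchange format. However, converting a database into an XML document is expensive: it incurs not only the initial cost of conversion but also the ongoing cost of keeping the two data sources synchronized.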

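To process XML documents, most XML tools work with the SAX or DOM API. In this article, we'll look at a way to implement the same APIs directly over a database, so that XML tools can treat a database as if it were an XML document. That way, we avoid physically converting the database to XML.

We'll see an implementation of the SAX API for databases that works with any database that has a JDBC driver. Next, we'll examine an implementation of the DOM API for databases that uses the SAX API internally. To demonstrate the SAX API for databases, we'll look at its integration with XT (an XSLT processor). We'll also see an example of that integration, which shows how to apply an XSLT stylesheet directly to a database to create an HTML page and how to convert a database into an XML document. Finally, we'll look at how the DOM API for databases integrates with an XQL processor.

In this article, the author uses existing tools rather than building new ones to illustrate the SAX and DOM APIs for databases, and shows how a number of available XML tools can work with a database. All the XML tools mentioned are available for free or are free for noncommercial use, though you should still check their licensing terms.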

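Overview of the SAX and DOM APIs
SAX is an event-based API for XML. With it, the SAX parser walks over an XML document and reports events, such as the start and end of an element, to the application. Because the parser generates events as it visits different parts of the document, it does not have to build any internal structure. That greatly reduces the demand on system resources and makes SAX well suited to large XML documents. For XML documents received as continuous streams, an event-based API is the only choice.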
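To make the event model concrete, here is a minimal sketch that counts elements as the parser reports them, without building any tree. It uses the SAX2 classes that ship with current JDKs rather than the SAX1 interfaces used elsewhere in this article; the class name SaxEventDemo and the sample element names (row, symbol) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: reacts to startElement events only and keeps
// nothing in memory except a counter.
class SaxEventDemo {
    static int countElements(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace aware, so the
                // element name arrives in qName.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<portfolio><row><symbol>A</symbol></row>"
                   + "<row><symbol>B</symbol></row></portfolio>";
        System.out.println(countElements(xml, "row"));
    }
}
```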

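The DOM API, on the other hand, follows a treelike construct. Elements have parent-child relations with other elements. With this API, the parser builds an internal structure from the XML document so that an application can navigate it as a tree. DOM gives an application random access to the tree-structured document at the cost of increased memory usage.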
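By contrast, a DOM-based application can jump to any node once the tree is built. The following sketch, which uses the standard JAXP DocumentBuilder, reads one cell from an arbitrary row; the class name DomAccessDemo and the row/column element names are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: the whole document is held in memory, so rows
// can be visited in any order.
class DomAccessDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the named column in the row at rowIndex.
    static String cell(Document doc, int rowIndex, String column) {
        Element row = (Element) doc.getDocumentElement()
            .getElementsByTagName("row").item(rowIndex);
        return row.getElementsByTagName(column).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<portfolio>"
            + "<row><symbol>A</symbol><shares>10</shares></row>"
            + "<row><symbol>B</symbol><shares>20</shares></row>"
            + "</portfolio>");
        // Read the second row first, then go back to the first one.
        System.out.println(cell(doc, 1, "symbol"));
        System.out.println(cell(doc, 0, "shares"));
    }
}
```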

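XML APIs for databases: The basics
Because a database has a highly regular data-storage structure, we can map it into data-centric XML documents. For example, we can transform a database table into an XML document with a DTD of the following form:
<!ELEMENT table (rows*)>
<!ELEMENT rows (column1, column2, ...)>
<!ELEMENT column1 (#PCDATA)>
<!ELEMENT column2 (#PCDATA)>
....
In other words, with an XML API for databases, we can make the database look like an XML document: the API presents the database as a virtual XML document. This is the most basic concept of object-oriented design: it is the interface, not the implementation, that matters. Tools that use such an XML API need not care whether they are operating on an XML document or on a database table.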

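Implementing the SAX API for databases
To implement the SAX API for databases, we need a parser that operates on a JDBC data source, iterates over each row and column, and generates the appropriate SAX events. The SAX specification provides the org.xml.sax.InputSource class, which models a data source given as a URL or a byte stream. To represent a table in a database, we implement JDBCInputSource, which extends org.xml.sax.InputSource. Here is JDBCInputSource in detail: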
// JDBCInputSource.java
package dbxml.sax;
import java.sql.*;
import org.xml.sax.InputSource;
public class JDBCInputSource extends InputSource {
    private String _connectionURL;
    private String _userName;
    private String _passwd;
    private String _tableName;
    public JDBCInputSource(String connectionURL, String userName,
                           String passwd, String tableName) {
        super(connectionURL);
        _connectionURL = connectionURL;
        _userName = userName;
        _passwd = passwd;
        _tableName = tableName;
    }
    public String getTableName() {
        return _tableName;
    }
    public Connection getConnection() throws SQLException {
        return DriverManager.getConnection(_connectionURL, _userName, _passwd);
    }
}
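In the code above, the constructor takes the information needed to connect to a database and the name of the table to be parsed. The getConnection() method connects to the database and returns a Connection object.
Next, we need to implement the SAX parser that uses JDBCInputSource to iterate over the rows and columns of the database table and generate SAX events. To simplify the code, we create an abstract ParserBase class, which implements the org.xml.sax.Parser interface and is responsible only for managing the various handlers. We then create JDBCSAXParser, a SAX parser for JDBC sources that extends ParserBase: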
(To view the code for ParserBase.java, click here.)
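ParserBase.java is not reproduced in this post. As a rough guide only, a base class with the responsibilities described above might look like the sketch below. The field name _documentHandler matches its use in JDBCSAXParser; every other name and default here is an assumption, not the author's code. The SAX1 types are deprecated but still part of the JDK.

```java
import java.util.Locale;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

// Assumed shape of ParserBase: it only stores the handlers and leaves both
// parse() methods of org.xml.sax.Parser to subclasses.
@SuppressWarnings("deprecation") // SAX1 interfaces, as used in the article
abstract class ParserBase implements Parser {
    // HandlerBase supplies do-nothing defaults for these three handlers.
    private static final HandlerBase DEFAULTS = new HandlerBase();

    protected EntityResolver _entityResolver = DEFAULTS;
    protected DTDHandler _dtdHandler = DEFAULTS;
    protected ErrorHandler _errorHandler = DEFAULTS;
    // Stays null until a client registers a handler; JDBCSAXParser
    // checks for null before generating events.
    protected DocumentHandler _documentHandler;
    protected Locale _locale = Locale.getDefault();

    public void setLocale(Locale locale) throws SAXException {
        _locale = locale;
    }
    public void setEntityResolver(EntityResolver resolver) {
        _entityResolver = resolver;
    }
    public void setDTDHandler(DTDHandler handler) {
        _dtdHandler = handler;
    }
    public void setDocumentHandler(DocumentHandler handler) {
        _documentHandler = handler;
    }
    public void setErrorHandler(ErrorHandler handler) {
        _errorHandler = handler;
    }

    public static void main(String[] args) {
        // A do-nothing subclass, just to show the handler bookkeeping.
        ParserBase parser = new ParserBase() {
            public void parse(InputSource source) { }
            public void parse(String systemId) { }
        };
        parser.setDocumentHandler(new HandlerBase());
        System.out.println(parser._documentHandler != null);
    }
}
```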
// JDBCSAXParser.java
package dbxml.sax;
import java.io.IOException;
import java.sql.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;
public class JDBCSAXParser extends ParserBase {
    private static final AttributeList _stockEmptyAttributeList
        = new AttributeListImpl();
    //------------------------------------------------------------------
    // Methods from the Parser interface
    //------------------------------------------------------------------
    public void parse (InputSource source) throws SAXException, IOException {
        if (! (source instanceof JDBCInputSource)) {
            throw new SAXException("JDBCSAXParser can work only with source "
                                   + "of JDBCInputSource type");
        }
        parse((JDBCInputSource)source);
    }

    public void parse (String systemId) throws SAXException, IOException {
        throw new SAXException("JDBCSAXParser needs more information to "
                               + "connect to database");
    }

    //------------------------------------------------------------------
    // Additional methods
    //------------------------------------------------------------------
    public void parse(JDBCInputSource source)
        throws SAXException, IOException {
        try {
            Connection connection = source.getConnection();
            if (connection == null) {
                throw new SAXException("Could not establish connection with "
                                       + "database");
            }

            String sqlQuery = getSelectorSQLStatement(source.getTableName());
            PreparedStatement pstmt = connection.prepareStatement(sqlQuery);

            ResultSet rs = pstmt.executeQuery();
            parse(rs, source.getTableName());
            rs.close();

            connection.close();
        } catch (SQLException ex) {
            throw new SAXException(ex);
        }
    }

    public void parse(ResultSet rs, String tableName)
        throws SAXException, SQLException, IOException {
        if (_documentHandler == null) {
            return; // nobody is interested in me, no need to sweat!
        }

        ResultSetMetaData rsmd = rs.getMetaData();
        int numCols = rsmd.getColumnCount();

        String tableMarker = getTableMarker(tableName);
        String rowMarker = getRowMarker();

        _documentHandler.startDocument();
        _documentHandler.startElement(tableMarker, _stockEmptyAttributeList);
        while(rs.next()) {
            _documentHandler.startElement(rowMarker,
                                          _stockEmptyAttributeList);
            for (int i = 1; i <= numCols; i++) {
                generateSAXEventForColumn(rsmd, rs, i);
            }
            _documentHandler.endElement(rowMarker);
        }
        _documentHandler.endElement(tableMarker);
        _documentHandler.endDocument();
    }

    public void parse(String connectionURL, String userName, String passwd,
                      String tableName) throws SAXException, IOException {
        parse(new JDBCInputSource(connectionURL, userName, passwd, tableName));
    }

    //------------------------------------------------------------------
    // Protected methods that derived classes could override to
    // customize the parsing.
    //------------------------------------------------------------------
    protected void generateSAXEventForColumn(ResultSetMetaData rsmd,
                                             ResultSet rs,
                                             int columnIndex)
        throws SAXException, SQLException {
        String columnValue = rs.getString(columnIndex);
        if (columnValue == null) {
            return;
        }
        String columnMarker
            = getColumnMarker(rsmd.getColumnLabel(columnIndex));
        char[] columnValueChars = columnValue.toCharArray();
        _documentHandler.startElement(columnMarker,
                                      _stockEmptyAttributeList);
        _documentHandler.characters(columnValueChars,
                                    0, columnValueChars.length);
        _documentHandler.endElement(columnMarker);
    }

    protected String getTableMarker(String tableName) {
        return tableName;
    }
    protected String getRowMarker() {
        return "row";
    }
    protected String getColumnMarker(String columnName) {
        return columnName;
    }
    protected String getSelectorSQLStatement(String tableName) {
        return "select * from " + tableName;
    }
}
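Let's examine the code in more detail. JDBCSAXParser includes several overloaded parse() methods. Of the methods listed below, the org.xml.sax.Parser interface requires parse(InputSource) and parse(String). The other parse() methods simplify the code and can be overridden in derived classes to change the parser's behavior.
The parse(InputSource) method calls parse(JDBCInputSource) if the argument is of type JDBCInputSource; otherwise, it throws a SAXException because it cannot handle the source.
The parse(String) method throws a SAXException because the information supplied is not sufficient to access the database.
The parse(JDBCInputSource) method gets a Connection object from the input source and executes a query to obtain a ResultSet object. It then calls parse(ResultSet, String) with that object.
The parse(ResultSet, String) method performs the core parsing logic. It iterates over each row in the result set and each column in each row. The row iteration loop is surrounded by calls to startElement() and endElement() with the table marker as the element-name argument. Similarly, each column iteration loop is surrounded by calls to startElement() and endElement() with the row marker as the element-name argument. In both cases an empty attribute list is passed as the second argument to startElement(). On each visit to a column, generateSAXEventForColumn() is called with the result-set metadata, the result set, and the column index. The value of a column is obtained with getString() on the result-set object, because we need a string representation of the column data to report in the characters() event.
The convenience method parse(String, String, String, String) simply creates a JDBCInputSource object from the arguments passed to it and then calls parse(JDBCInputSource) with that object.
The protected methods of JDBCSAXParser offer some customization through overriding:
The generateSAXEventForColumn() method generates events for column data. A null value for a column in a database has a different meaning from an empty string. We capture the difference by not firing any events for a column that has a null value. Another way to represent a null value is a Boolean attribute such as isNull, set to true for null data and false otherwise.
The getTableMarker(), getRowMarker(), and getColumnMarker() methods return reasonable defaults for the table, row, and column markers. Derived classes may override them to provide custom markup.
The getSelectorSQLStatement() method returns a "select * from <tableName>" string. Derived classes can override it to provide a different select query string and so offer database-level filtering.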
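The isNull alternative mentioned for generateSAXEventForColumn() could be sketched as follows. This is not the author's code: the class NullColumnDemo, its method names, and the column name price are hypothetical, and the sketch works on a plain string instead of a ResultSet so that it runs without a database. It uses the same SAX1 DocumentHandler and AttributeListImpl types as JDBCSAXParser.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

@SuppressWarnings("deprecation") // SAX1 types, matching the article's code
class NullColumnDemo {
    // Always emits the column element and marks SQL NULL with
    // isNull="true", so a null value and an empty string stay distinct.
    static void emitColumn(DocumentHandler handler, String columnMarker,
                           String columnValue) throws SAXException {
        AttributeListImpl atts = new AttributeListImpl();
        atts.addAttribute("isNull", "CDATA",
                          String.valueOf(columnValue == null));
        handler.startElement(columnMarker, atts);
        if (columnValue != null) {
            char[] chars = columnValue.toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        handler.endElement(columnMarker);
    }

    // Records the events as text so the result is easy to inspect.
    static String trace(String columnMarker, String columnValue)
        throws SAXException {
        final StringBuilder out = new StringBuilder();
        emitColumn(new HandlerBase() {
            @Override
            public void startElement(String name, AttributeList atts) {
                out.append('<').append(name).append(" isNull=\"")
                   .append(atts.getValue("isNull")).append("\">");
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
            @Override
            public void endElement(String name) {
                out.append("</").append(name).append('>');
            }
        }, columnMarker, columnValue);
        return out.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(trace("price", null));
        System.out.println(trace("price", ""));
    }
}
```

With this representation every row has the same set of child elements, which can make stylesheets simpler, at the price of a more verbose event stream.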

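The JDBCSAXUtils convenience class provides two methods for creating a JDBCInputSource: from a property file or from a Property object. This spares an application that uses the SAX or DOM API for databases from having to supply a long list of database parameters. The class expects the user to supply a property file that contains entries for the database URL, a user name and password for connecting to the database, the JDBC driver used to establish the connection, and the table name. A typical property file looks like this: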
# portfolio.prop
# JDBCSAXSource property file
URL=jdbc:odbc:db1
user=jw
password=jw-passwd
table=portfolio
driver=sun.jdbc.odbc.JdbcOdbcDriver
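The source of JDBCSAXUtils is not shown here. The sketch below only illustrates how such a file could be read with java.util.Properties; the class PortfolioProps and its methods are hypothetical. It stops short of opening a connection, since the JDBC-ODBC bridge driver named in the file was removed from the JDK in Java 8.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: reads the keys used in portfolio.prop.
class PortfolioProps {
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Fails early with a clear message when a required key is missing.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(
            "URL=jdbc:odbc:db1\nuser=jw\npassword=jw-passwd\n"
            + "table=portfolio\ndriver=sun.jdbc.odbc.JdbcOdbcDriver\n");
        // With the article's classes available, the next steps would be
        // Class.forName(require(props, "driver")) followed by
        // new JDBCInputSource(url, user, password, table).
        System.out.println(require(props, "URL") + " -> "
                           + require(props, "table"));
    }
}
```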
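We now have a simple parser that generates appropriate SAX events for a database table. It handles null data and offers some marker customization. That may be enough for some applications, but a complete solution needs additional functionality, because:

The parser does not incorporate relational information. That can be solved by using XPointer/XLink to set a reference to a foreign key in a table.
A text column in a database may contain marked-up data. A SAX parser for databases should parse such data as well and generate the appropriate SAX events. If that matters for an application, it can override generateSAXEventForColumn(), parse the content of the column, and generate additional SAX events.

In a database, a table contains an unordered set of columns; the order in which columns are stored is not important. An XML DTD, on the other hand, has no way to describe an unordered collection of child elements.

We can deal with this problem in a few ways. If the task is to convert the database into another XML document, say an HTML page, the XSLT stylesheet written for that purpose can produce output in the correct order. We could also override getSelectorSQLStatement() to supply an explicit list of columns in the correct order.

It is often desirable to present only a selected part of a table as an XML document, based on some query. XML tools can do the filtering, but databases are better at it. The getSelectorSQLStatement() method can be overridden to return the appropriate select query string.

The parser uses getString() on the result-set object to obtain the string representation of a column value. This works for text, numbers, and so on, but not well for binary data. Binary data can be represented as text, but that may not be suitable for certain tasks. The parser also does not handle the user-defined types available with SQL3/JDBC 2.0.

We can solve both problems by overriding generateSAXEventForColumn() and providing a suitable implementation.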
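For binary columns, one possible implementation is to encode the bytes as Base64 text before reporting them. Base64 is this sketch's choice, not something the article prescribes, and BinaryColumnDemo is a hypothetical helper. An overridden generateSAXEventForColumn() could read the column with rs.getBytes(columnIndex) and pass the resulting characters to characters().

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: converts binary column data to characters that are
// safe to report in a SAX characters() event, and back again.
class BinaryColumnDemo {
    static char[] toCharacters(byte[] columnBytes) {
        return Base64.getEncoder().encodeToString(columnBytes).toCharArray();
    }

    static byte[] fromCharacters(char[] chars) {
        return Base64.getDecoder().decode(new String(chars));
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, 2};
        char[] encoded = toCharacters(original);
        System.out.println(new String(encoded));
        System.out.println(Arrays.equals(original, fromCharacters(encoded)));
    }
}
```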

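Implementing the DOM API for databases
To build a DOM tree that corresponds to a database table, we could iterate over each row and column and build tree nodes as we go, or we could use another library, such as Sun's JAXP implementation, that builds a DOM tree from a SAX event stream. The latter approach is simpler and needs less code because it reuses an existing facility. To implement the DOM API this way, we just need a suitable SAX parser for databases, which we have already implemented.
The JDBCDOMParser class implements the DOM API for databases: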
// JDBCDOMParser.java
package dbxml.dom;
import java.io.IOException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.sun.xml.tree.XmlDocumentBuilder;
import dbxml.sax.*;
public class JDBCDOMParser {
    public static Document createDocument(JDBCInputSource inputSource)
        throws SAXException, IOException {
        XmlDocumentBuilder documentBuilder = new XmlDocumentBuilder();
        JDBCSAXParser saxParser = new JDBCSAXParser();
        documentBuilder.setParser(saxParser);
        saxParser.parse(inputSource);
        return documentBuilder.getDocument();
    }
}
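The implementation of JDBCDOMParser is straightforward. It uses the XmlDocumentBuilder class provided by JAXP to build a DOM document from a SAX event stream. JDBCDOMParser has only one method, createDocument(), which takes a JDBC input source as its argument. The method creates a JDBCSAXParser and sets it as the parser for an XmlDocumentBuilder object. It then starts the parse and returns the document that the XmlDocumentBuilder produces. Behind the scenes, the XmlDocumentBuilder object responds to the SAX events generated by JDBCSAXParser by building a DOM document.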
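com.sun.xml.tree.XmlDocumentBuilder belongs to an early JAXP release and is not part of current JDKs. As a sketch of the same idea with the standard javax.xml.transform API, a TransformerHandler can receive SAX events and build a DOM document from them. The class SaxToDomDemo is hypothetical, and the events are fired by hand here, standing in for what a SAX2 version of JDBCSAXParser would generate.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo: builds a DOM tree from a stream of SAX events for a
// one-column table.
class SaxToDomDemo {
    static Document build(String table, String column, String[] values)
        throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // With no stylesheet, the handler copies its input unchanged.
        TransformerHandler handler = factory.newTransformerHandler();
        DOMResult result = new DOMResult();
        handler.setResult(result);

        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", table, table, none);
        for (String value : values) {
            handler.startElement("", "row", "row", none);
            handler.startElement("", column, column, none);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", column, column);
            handler.endElement("", "row", "row");
        }
        handler.endElement("", table, table);
        handler.endDocument();
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("portfolio", "symbol", new String[] {"A", "B"});
        System.out.println(doc.getElementsByTagName("row").getLength());
    }
}
```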
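Using the SAX API for databases
We have seen one example of using the SAX API for databases: implementing the DOM API. Now we'll look at another. In this section, we integrate the SAX API for databases with XT, an XSLT processor written in Java. With this integration, we can apply an XSLT stylesheet directly to the virtual XML document stored in a database.
The JDBCXSLProcessor class encapsulates the logic of creating a SAX source for a database, processing it with a given XSLT stylesheet, and producing an output file; it uses com.jclark.xsl.sax.Driver from XT. Its main method takes three arguments: a database property file, an XSLT stylesheet file, and an output file.
As we'll see below, this approach lets us generate HTML pages directly, without an intermediate XML file. We'll also see how to use the SAX API for databases with XT to convert a database into a physical XML document.
(To view the code for JDBCXSLProcessor.java, click here.)
Generating HTML pages directly from a database with an XSLT stylesheet
Now let's look at a simple stylesheet that formats a highly regular XML document, one based on a database table. The database table is rendered as an HTML table. The stylesheet createTable.xsl can be used to process any XML document with a table-like structure. It uses the column tag names as table headings.

(To view the source code for createTable.xsl, click here.)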
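Neither JDBCXSLProcessor.java nor createTable.xsl is reproduced in this post. The sketch below shows the same pipeline with the JDK's built-in XSLT processor instead of XT: SAX events go straight into a stylesheet and come out as HTML, with no intermediate XML file. The class SaxToHtmlDemo, the tiny inline stylesheet (which omits the header row), the element names, and the sample values are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

class SaxToHtmlDemo {
    // Hypothetical stylesheet, much simpler than createTable.xsl: one
    // <tr> per row element and one <td> per column element.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/*'><table>"
        + "<xsl:for-each select='*'><tr>"
        + "<xsl:for-each select='*'><td><xsl:value-of select='.'/></td>"
        + "</xsl:for-each></tr></xsl:for-each>"
        + "</table></xsl:template></xsl:stylesheet>";

    static String render(String[][] rows) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(
            new StreamSource(new StringReader(XSL)));
        StringWriter html = new StringWriter();
        handler.setResult(new StreamResult(html));

        // Fire the events a database parser would generate for the table.
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "portfolio", "portfolio", none);
        for (String[] row : rows) {
            handler.startElement("", "row", "row", none);
            for (int i = 0; i < row.length; i++) {
                String name = "column" + (i + 1);
                handler.startElement("", name, name, none);
                handler.characters(row[i].toCharArray(), 0, row[i].length());
                handler.endElement("", name, name);
            }
            handler.endElement("", "row", "row");
        }
        handler.endElement("", "portfolio", "portfolio");
        handler.endDocument();
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(new String[][] {{"IBM", "100"}}));
    }
}
```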
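Converting a database into an XML document with an XSLT stylesheet
Although most XML applications can work directly with the SAX or DOM API, sometimes we still need an actual XML document. For example, a tool that does not use either XML API needs a physical XML document. Here we suggest one way to generate an XML document from a database: write an XSLT stylesheet that performs an identity transformation. Using such a stylesheet together with the SAX API for databases, we can generate an XML document that corresponds to a database table. The author provides a stylesheet, identity.xsl, that performs the identity transformation and works with the current version of XT.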
identity.xsl
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>
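Note that although an XSLT stylesheet makes the identity transformation easy to write, the approach is not very efficient, because the data still passes through the complex logic of full stylesheet processing. The inefficiency is most noticeable for database tables with a large number of records. An alternative is to write an application dedicated to the conversion. Such an application would listen to SAX events and create XML elements and data in response, producing an XML document that corresponds to the database table.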
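A dedicated converter of this kind can be very small. The following sketch is a SAX2 handler that writes each event out as XML text; XmlWritingHandler is a hypothetical class, and it ignores attributes because the database parser described above never emits any.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: turns SAX events into XML text without going
// through a stylesheet.
class XmlWritingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        out.append('<').append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Escape the characters that are special in element content.
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.append(ch[i]);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    String xml() {
        return out.toString();
    }

    public static void main(String[] args) {
        XmlWritingHandler handler = new XmlWritingHandler();
        AttributesImpl none = new AttributesImpl();
        handler.startElement("", "row", "row", none);
        handler.startElement("", "name", "name", none);
        char[] text = "AT&T".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "name", "name");
        handler.endElement("", "row", "row");
        System.out.println(handler.xml());
    }
}
```

For a large table, the StringBuilder would be replaced by a Writer so that rows are flushed to the output as they arrive.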

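Using the DOM API for databases
In most cases the SAX API for databases is more economical with system resources than the DOM API. Some applications, however, require random access to the XML document and therefore need a DOM-like tree structure that represents the database.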

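Integrating the DOM API for databases with an XQL processor
XML Query Language (XQL) is a language for querying XML documents; its syntax is similar to XPath. Here we integrate our DOM parser for databases with GMD-IPSI's XQL Engine. With this integration, we can run SQL-like queries against an XML document that represents a database table.
As an example of the integration, the author provides a simple shell for querying an XML document based on a database table. The JDBCXQLProcessor class creates a shell-like environment that takes queries from the user and prints the result documents. The processQueries() method works with any Document object, not only one created by JDBCDOMParser. It reads query strings from System.in, executes each query, and prints the result to System.out. The main() method creates a JDBCInputSource object from its arguments and passes it to JDBCDOMParser to obtain a Document object that corresponds to the given database table.
(To view the Java code for JDBCXQLProcessor, click here.)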
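The XQL engine and JDBCXQLProcessor are not reproduced here. Since XQL syntax is close to XPath, the sketch below swaps in the XPath support that ships with the JDK to show what querying a table-shaped DOM document looks like. XPathQueryDemo, the column names, and the sample values are illustrative assumptions; the Document could equally come from JDBCDOMParser.createDocument().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: runs an XPath query (standing in for XQL) against a
// DOM document shaped like a database table.
class XPathQueryDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static NodeList query(Document doc, String expression) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<portfolio>"
            + "<row><symbol>A</symbol><shares>50</shares></row>"
            + "<row><symbol>B</symbol><shares>200</shares></row>"
            + "</portfolio>");
        // Roughly: select symbol from portfolio where shares > 100
        NodeList hits = query(doc, "/portfolio/row[shares > 100]/symbol");
        for (int i = 0; i < hits.getLength(); i++) {
            System.out.println(hits.item(i).getTextContent());
        }
    }
}
```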
As a side note, writing a database-to-XML converter with XMLWriter and JDBCDOMParser is straightforward: get a Document object from JDBCDOMParser and write it to the target file with XMLWriter.write(Document).
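Conclusion
In this article, we discussed how XML APIs for databases can be used to access the information in a database. With these APIs, we avoid converting a database into an actual XML document and keeping the two synchronized. We presented Java implementations of the SAX and DOM APIs for databases, and then integrated the SAX API with XT. We demonstrated how that integration produces an HTML page directly from a database table and converts a database table into an XML document. Finally, we integrated the DOM API for databases with an XQL processor.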

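About the author
Ramnivas Laddad is a Sun-certified Java engineer. He holds a master's degree in electrical engineering with a specialization in communication engineering, and has six years of experience in software development (networking, graphical user interfaces, distributed systems, and more). He has extensive experience with object-oriented software systems (five years of C++, two years of Java).
Ramnivas currently works at www.rti.com, where he designs and develops ControlShell, a component-based programming framework for complex real-time control systems.
Original text (page one only)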

XML APIs for databases
Blend the power of XML and databases using custom SAX and DOM APIs
Summary
Most Web applications require the presentation of database-generated information. XML, because of its ability to separate content from presentation, is fast becoming an industry standard for data exchange. Most XML tools work with either the SAX or DOM API. This article presents a way to blend the power of a database with the features of XML. It also provides a simple, pure Java implementation of XML APIs for databases that works with any JDBC data source. With this approach, XML tools can treat a database as a virtual XML document. (3,000 words)
By Ramnivas Laddad

Databases and XML offer complementary functionality for storing data. Databases store data for efficient retrieval, whereas XML offers an easy information exchange that enables interoperability between applications. To take advantage of XML's features we can convert database tables into XML documents. We can then use XML tools with such documents for further processing. For example, XML documents can be presented as HTML pages with XSLT stylesheets, can be searched with XML-based query languages such as XQL, can be used as a data-exchange format, and so on. However, converting a database into an XML document is an expensive approach, one that requires not only the initial cost of conversion but also the subsequent costs of synchronizing both information sources.

Abbreviations in this article
API: application programming interface
DOM: Document Object Model
DTD: document type definition
JAXP: Java API for XML Parsing
JDBC: Java Database Connectivity
SAX: Simple API for XML
XML: Extensible Markup Language
XQL: XML Query Language
XSL: Extensible Stylesheet Language
XSLT: XSL Transformations
XT: An XSLT processor

For processing XML documents, most XML tools work with the SAX or DOM API. In this article, we'll look at a way to implement the same APIs directly over a database, enabling XML tools to treat databases as if they were XML documents. That way, we can obviate the need of converting a database.

We'll see an implementation of the SAX API for Databases that should work with any database with a JDBC driver. Next, we'll examine an implementation of the DOM API for Databases that uses the SAX API internally. To demonstrate the SAX API for Databases, we'll look at its integration with XT (an XSLT processor). We'll also see an example of how such integration can create HTML pages that incorporate an XSLT stylesheet directly from a database and how it can convert a database into an XML document. Finally, we'll look at how the DOM API for Databases integrates with an XQL processor.

In this article, I make use of existing tools rather than create new tools to illustrate the applications of the SAX and DOM APIs for Databases. I show how to leverage a number of available XML tools to work with a database. All the XML tools I mention are either available for free or free for noncommercial use (though you should, of course, check licensing agreements).

Overview of the SAX and DOM APIs
SAX is an event-based API for XML. With it, the SAX parser reports events such as the start and end of elements to the application as it walks over the document. Because the parser reports events as it visits different parts of the document, it does not have to build any internal structure. That reduces the strain on system resources, which makes the parser attractive for large documents. For XML documents received as continuous streams, an event-based API is the only choice.

The DOM API, on the other hand, follows a treelike construct. Elements have parent-child relations with other elements. With this API, the parser builds an internal structure such that an application can navigate it in a treelike fashion. DOM allows an application to have random access to the tree-structured document at the cost of increased memory usage.

XML APIs for databases: The basics
Because of a database's highly regular data-storage structure, we can map it into data-centric XML documents. For example, we can transform a database table into an XML document with a DTD of the following form:

<!ELEMENT table (rows*)>
<!ELEMENT rows (column1, column2, ...)>
<!ELEMENT column1 (#PCDATA)>
<!ELEMENT column2 (#PCDATA)>
....

In other words, with an XML API for databases, we can make the database look like an XML document; these APIs present the database as a virtual XML document. We are at the most basic concept of object-oriented design: it is the interface -- not the implementation -- that matters. In our situation, the tools using such an XML API need not care whether they are operating on a database table or an XML file.

A SAX or DOM parser can enable XML tools to work directly with databases

Implementing the SAX API for Databases
To implement the SAX API for Databases, we need to implement a parser that operates on a JDBC data source, iterates over each row and column, and generates appropriate SAX events while iterating. The SAX specification provides the org.xml.sax.InputSource class that models a data source representing a URL or a byte stream. To represent a database, we need a specialized form of it that can represent a table in a database. We therefore implement JDBCInputSource, which extends the org.xml.sax.InputSource class. Let's look at JDBCInputSource in more detail:

// JDBCInputSource.java
package dbxml.sax;
import java.sql.*;
import org.xml.sax.InputSource;
public class JDBCInputSource extends InputSource {
    private String _connectionURL;
    private String _userName;
    private String _passwd;
    private String _tableName;
    public JDBCInputSource(String connectionURL, String userName,
                           String passwd, String tableName) {
        super(connectionURL);
        _connectionURL = connectionURL;
        _userName = userName;
        _passwd = passwd;
        _tableName = tableName;
    }
    public String getTableName() {
        return _tableName;
    }
    public Connection getConnection() throws SQLException {
        return DriverManager.getConnection(_connectionURL, _userName, _passwd);
    }
}

In the code above, the constructor takes the information needed to connect to a database and the name of the table to be parsed. The method getConnection() connects to the database and returns a Connection object.

Next, we need to implement the SAX parser that uses JDBCInputSource to iterate over database table rows and columns and generates SAX events along the way. To simplify the code, we create an abstract ParserBase class, which implements the org.xml.sax.Parser and has responsibility only for managing various handlers. We then create our SAX parser for the JDBC source JDBCSAXParser that extends the ParserBase class.

(To view the code for ParserBase.java, click here.)

// JDBCSAXParser.java
package dbxml.sax;
import java.io.IOException;
import java.sql.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;
public class JDBCSAXParser extends ParserBase {
    private static final AttributeList _stockEmptyAttributeList
        = new AttributeListImpl();
    //------------------------------------------------------------------
    // Methods from the Parser interface
    //------------------------------------------------------------------
    public void parse (InputSource source) throws SAXException, IOException {
        if (! (source instanceof JDBCInputSource)) {
            throw new SAXException("JDBCSAXParser can work only with source "
                                   + "of JDBCInputSource type");
        }
        parse((JDBCInputSource)source);
    }

    // ... parse(String), parse(JDBCInputSource), parse(ResultSet, String),
    // the convenience parse(...) overload, and the protected marker and
    // SQL helper methods are the same as in the complete
    // JDBCSAXParser.java listing above ...
}


Let's examine the code in more detail. JDBCSAXParser includes several overloaded parse() methods. In the list below, the org.xml.sax.Parser interface requires implementing the parse(InputSource) and parse(String) methods. The other parse() methods simplify the code and allow derived classes to override them to modify the parser behavior.

The parse(InputSource) method calls the parse(JDBCInputSource) method if the argument is of type JDBCInputSource; otherwise, it throws a SAXException as it cannot deal with it.

The parse(String) method throws a SAXException as the information supplied is not sufficient to access the database.

The parse(JDBCInputSource) method gets a Connection object from the input source and executes a query to obtain a ResultSet object. It then calls parse(ResultSet, String) with this object.

The parse(ResultSet, String) method performs the core parsing logic. It iterates over each row in the result set and each column in the rows. The row iteration loop is surrounded by calls to startElement() and endElement() with a table marker as the element-name argument. Similarly, each column iteration loop is surrounded by calls to startElement() and endElement() with a row marker as the element-name argument. In both cases an empty attribute list passes as the second argument to the startElement() methods. On each visit to a column, the generateSAXEventForColumn() method is called with column-name and column-value arguments. The value of a column is accessed by the getString() method on the result-set object, as we need a string representation of the column data to be notified in the characters() SAX event.

The convenience method parse(String, String, String, String) simply creates a JDBCInputSource object with the arguments passed to it and then calls the parse(JDBCInputSource) method with it.
The protected methods of JDBCSAXParser offer some customization possibilities through overriding:

The generateSAXEventForColumn() method generates events for column data. A null value for a column in a database has a different meaning from a column with an empty string. We capture the difference by not firing any events for a column that has a null value. Another choice for representing a null value in a database is to use a binary attribute like isNull. With this option, a true value will be set for null data; otherwise it will be false.

The getTableMarker(), getRowMarker(), and getColumnMarker() methods return reasonable defaults for table, row, and column markers. Derived classes may override these to provide custom markups.

The getSelectorSQLStatement() method returns a "select * from <tableName>" string. Derived classes can override it to provide a different select query string to offer database-level filtering.
The JDBCSAXUtils convenience class provides two methods for creating a JDBCInputSource: it can be done from either a property file or a Property object. There's no need to supply a long list of parameters that describe a database to an application that uses either the SAX or DOM APIs for database. The class expects the user to supply a property file that contains entries for a database URL, a user name and password to connect to the database, a JDBC driver to establish a connection, and a table name. The code below demonstrates a typical property file:

# portfolio.prop
# JDBCSAXSource property file
URL=jdbc:odbc:db1
user=jw
password=jw-passwd
table=portfolio
driver=sun.jdbc.odbc.JdbcOdbcDriver

The story so far ...
We now have a simple parser that can generate appropriate SAX events for the information in a database table. It takes care of null data and offers some marker customization. While such functionality may be sufficient for some applications, the complete solution will consider additional functionality because:

The parser does not incorporate relational information. That can be solved by using a XPointer/XLink to set the reference to a foreign key in a table.

A text column in a database may contain marked-up data. A SAX parser for databases should parse those data as well and generate appropriate SAX events. If such functionality is important for an application, it could override generateSAXEventForColumn() and parse the content of the column and generate additional SAX events.

In databases, a table contains an unordered list of columns; the order in which columns are stored is not important. An XML DTD, on the other hand, does not have a way to describe an unordered collection of child elements.
We can deal with this problem in a few ways. If the task is to convert a database into another XML document, say an HTML page, the XSLT stylesheet written for that purpose can create output in the correct order. We could also override the getSelectorSQLStatement() method to supply an explicit list of columns in the correct order.

It is desirable to present only a selected part of a table as a document based on some query. While XML tools can do the filtering, databases are better at it. The getSelectorSQLStatement() method can be overridden to return the appropriate select query string.

The parser uses the getString() method on the result-set object to obtain the string representation of the value in a column. This works fine for columns with text, numbers, and so on, but it does not work well with binary data. While binary data can be represented as text, that may not be suitable for certain tasks. The parser also does not deal with user-defined types available with SQL3/JDBC 2.0.
We can solve both problems by overriding the generateSAXEventForColumn() method and providing a suitable implementation.