Home    |    Concepts   |   API   |   Samples
Concepts > XML > API Entities
XML Index and Tags
An index in the ArcSDE API is used to enable searches on XML columns, and a tag represents an element.

An ArcSDE XML column can have two types of indexes to support different types of searches.

  • Text Index: A text index associated with the XML document supports full text searches against all content in the XML document. The XML tags in the document are transparent to the text index (in other words, the text index is created in the database on the content with all XML markup removed), thus allowing quick and efficient text searches on the document.

    A text index is created by default on all element/attribute values. ArcSDE parses the XML document for values of all elements and/or attributes and stores them in an SDE_XML_DOC table that supports full doc text queries. For example: In the following XML document, a text index will search only for values of elements such as "New Mexico", "4", and attribute values such as "YES" for the LISTED attribute in MARK tag.

    <gn>
        <coverage>New Mexico</coverage>
        <suptheme>4</suptheme>
        <MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
    </gn>

  • BTREE index: Full text indexes, apart from being bigger and slower than BTREE indexes, are updated outside the transaction scope, which is inefficient when used to index feature property sets. Further, full text indexes are not available on all databases. To overcome the limitation of full text indexes, the tag-indexing model has been extended in ArcSDE 9.2 to include Numbers, Strings, and Text. These would now be indexed using a BTREE, while text uses the full text indexer. Therefore, for coded property sets, the tags would be either numbers of strings and not full text. This implementation supports simple strings or single words using indexing technology that exists in every database.

    In order to implement BTREE indexes, a new column STRING_TAG has been added to the existing sde_xml_idx table. Since. Since the value for for this column will be small, its size is set to 256. At the time of inserting an XMl document, the DOUBLE_TAG, STRING_TAG, and TEXT_TAG columns will be populated with appropriate values based on tag type.

    Thus, we have a new index type

    SE_XML_INDEX_VARCHAR_TYPE.

    In ArcSDE 9.2, by default only string will be indexed using a BTREE and fulltext index will be created on text tag values. The full text index can be activated by calling SE_xmlcolumninfo_set_fulltext_indexing() on text tag values.

In the following XML document, we can consider "metadata/idCitation/resTitle" node for full text index and "metadata/idCitation/citRespParty/rpOrgName" node for BTREE index.
 
<?xml version = \"1.0\" ?>
<metadata>
  <Esri MetaID=\"1000\">
    <idCitation>
      <resTitle>
        Migratory Birds and Spread of West Nile Virus in the Asia.
      </resTitle>
      <citRespParty>
        <rpOrgName>ACASIAN</rpOrgName>
      </citRespParty>
    </idCitation>
  </Esri>\
</metadata>

Example 1

Find all documents with an "metadata/idCitation/resTitle" node equal to ‘West Nile’

Xpath: //metadata/idCitation[resTitle = “West Nile”]

SQL:

SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name = '//metadata/idCitation/resTitle' AND
CONTAINS (x.text_tag, 'West Nile') > 0;

Example 2

Find all documents with an metadata/idCitation/citRespParty/rpOrgName node equal to ‘ACASIAN’.

Xpath: //metadata/idCitation/citRespParty[rpOrgName = “ACASIAN”]

SQL:

SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name =
'//metadata/idCitation/citRespParty/rpOrgName' AND
x.string_tag = 'ACASIAN';

Note that we will use different predicate for searching text and string. For text, we will continue to use CONTAINS function and for string we will use usual text operator ‘=’ or ‘!=’.

All indexes associated with an ArcSDE XML column are CONTEXT indexes, which are designed for indexing articles, reports, and similar documents such as metadata describing GIS resources. It is important to manage these indexes appropriately as new documents are published to the XML column to maintain good search performance. This involves synchronizing and optimizing the text indexes.

For information indexed as text, indexing is accomplished using the database’s full text indexing capabilities. The words that are included in the database’s text index will depend on the indexing rules that are used. Each RDBMS may use slightly different rules to index the same text. Typically, words that have meaning, such as river, pollution, and population will be included in the text index, but words such as and, the, and in won’t be included. Different rules are used to index text written in different languages. Both the database and its text indexing components must be properly configured for the language used to author text in the published documents before they can be indexed and searched correctly. See the ArcSDE Database Configuration and Tuning Guides for each RDBMS to learn more about their individual indexing capabilities.

The following code snippets show how XML index and XML tags are handled using the ArcSDE C and Java APIs.

Top

Top

Send your comments to: Site Administrator | Terms of Use | PRIVACY | Copyright © ESRI.