|
An index in the ArcSDE API is used to enable searches on XML columns, and a
tag represents an element.
An
ArcSDE XML column can have two types of indexes to support different
types of searches.
Text Index: A text index associated with the XML
document supports full text searches against all content in the XML document.
The XML tags in the document are transparent to the text index (in other words, the text
index is created in the database on the content with all XML markup removed), thus allowing
quick and efficient text searches on the document.
A text index is created by default on all element/attribute values. ArcSDE
parses the XML document for values of all elements and/or attributes and
stores them in an SDE_XML_DOC table that supports full doc text queries.
For example: In the following XML document, a text index will search only
for values of elements such as "New Mexico", "4",
and attribute values such as "YES" for the LISTED attribute
in MARK tag.
<gn>
<coverage>New Mexico</coverage>
<suptheme>4</suptheme>
<MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
</gn> |
BTREE index: Full text indexes, apart from being bigger and slower than BTREE indexes, are updated outside the transaction scope, which is inefficient when
used to index feature property sets. Further, full text indexes are not available on all databases. To overcome the limitation of full text indexes,
the tag-indexing model has been extended in ArcSDE 9.2 to include Numbers, Strings, and Text. These would now be indexed using a BTREE, while text uses the full
text indexer. Therefore, for coded property sets, the tags would be either numbers of strings and not full text. This implementation supports simple strings or
single words using indexing technology that exists in every database.
In order to implement BTREE indexes, a new column STRING_TAG has been added to the existing sde_xml_idx table. Since. Since the value for for
this column will be small, its size is set to 256. At the time of inserting an XMl document, the DOUBLE_TAG, STRING_TAG, and TEXT_TAG columns will be populated
with appropriate values based on tag type.
Thus, we have a new index type
SE_XML_INDEX_VARCHAR_TYPE.
In ArcSDE 9.2, by default only string will be indexed using a BTREE and fulltext index will be created on text tag values. The full text index can be activated
by calling SE_xmlcolumninfo_set_fulltext_indexing() on text tag values.
In the following XML document, we can consider "metadata/idCitation/resTitle"
node for full text index and "metadata/idCitation/citRespParty/rpOrgName" node
for BTREE index.
<?xml version = \"1.0\" ?>
<metadata>
<Esri MetaID=\"1000\">
<idCitation>
<resTitle>
Migratory Birds and Spread of West Nile Virus in the Asia.
</resTitle>
<citRespParty>
<rpOrgName>ACASIAN</rpOrgName>
</citRespParty>
</idCitation>
</Esri>\
</metadata> |
Example 1
Find all documents with an "metadata/idCitation/resTitle" node equal to ‘West
Nile’
Xpath: //metadata/idCitation[resTitle = “West Nile”]
SQL:
SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name = '//metadata/idCitation/resTitle' AND
CONTAINS (x.text_tag, 'West Nile') > 0;
Example 2
Find all documents with an metadata/idCitation/citRespParty/rpOrgName node equal
to ‘ACASIAN’.
Xpath: //metadata/idCitation/citRespParty[rpOrgName = “ACASIAN”]
SQL:
SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name =
'//metadata/idCitation/citRespParty/rpOrgName' AND
x.string_tag = 'ACASIAN';
Note that we will use different predicate for searching text and string. For
text, we will continue to use CONTAINS function and for string we will use usual
text operator ‘=’ or ‘!=’.
All indexes associated with an ArcSDE XML column are CONTEXT indexes, which are designed for indexing articles, reports, and similar documents such as metadata
describing GIS resources. It is important to manage these indexes appropriately as new documents are published to the XML column to maintain good search performance. This
involves synchronizing and optimizing the text indexes.
For information indexed as text, indexing is accomplished using the
database’s full text indexing capabilities. The words that are included in the
database’s text index will depend on the indexing rules that are used. Each
RDBMS may use slightly different rules to index the same text. Typically, words
that have meaning, such as river, pollution, and population will be
included in the text index, but words such as and, the, and in won’t be
included. Different rules are used to index text written in different languages.
Both the database and its text indexing components must be properly configured
for the language used to author text in the published documents before they can
be indexed and searched correctly. See the ArcSDE Database Configuration and
Tuning
Guides for each RDBMS to learn more about their individual indexing
capabilities.
The following code snippets show how XML index and XML tags are handled using the ArcSDE C and Java APIs.
LONG rc, in_types[5], i;
SE_COLUMN_DEF column_defs[num_columns];
index_name[SE_QUALIFIED_XML_INDEX_LEN], *in_names[5], *config_keyword[10];
SE_XMLINDEXINFO xml_index;
SE_XMLTAGINFO tag;
SE_XMLCOLUMNINFO xmlcolumn;
/* Create an index definition for the column */
rc = SE_xmlindexinfo_create (&xml_index);
sprintf(index_name, "xml_index");
rc = SE_xmlindexinfo_set_name (xml_index, index_name);
rc = SE_xmlindexinfo_set_type (xml_index, SE_XML_INDEX_DEFINITION);
in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";
in_types[0] = SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SE_XML_INDEX_STRING_TYPE;
in_types[2] = SE_XML_INDEX_STRING_TYPE;
in_types[3] = SE_XML_INDEX_DOUBLE_TYPE;
rc = SE_xmltaginfo_create (&tag);
for (i = 0; i < 4; i++){
rc = SE_xmltaginfo_set_name (tag, in_names[i]);
rc = SE_xmltaginfo_set_data_type (tag, in_types[i]);
rc = SE_xmlindexinfo_add_tag (xml_index, tag);
}
/* Add Index to Column */
rc = SE_xmlcolumninfo_set_index (xmlcolumn, xml_index);
int[] in_types = new int[5];
String[] in_names = new String[5];
//Create XML Index
SeXmlIndex index = new SeXmlIndex(conn);
index.setName("xml_index");
index.setDescription("index description");
index.setType(SeDefs.SE_XML_INDEX_DEFINITION);
in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";
in_types[0] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[2] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[3] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;
//Create tags
SeXmlTag tag = new SeXmlTag();
for (int i = 0; i <4; i++)<br> {
tag.setName(in_names[i]);
tag.setDataType(in_types[i]);
index.addTag(tag);
}
//Add Index to Column
SeXmlColumn xmlCol = ...;
xmlCol.setIndex(index);
|
LONG rc, temp_long, in_aliases[5], i, num_tags, in_types[5];
SE_XMLINDEXINFO xml_index;
CHAR temp_str[256], *in_descs[5], *in_names[5];
SE_XMLTAGINFO in_tag, *out_tags;
BOOL in_exclusions[5];
rc = SE_xmlindexinfo_create (&xml_index);
rc = SE_xmlindexinfo_get_name (xml_index, temp_str);
rc = SE_xmlindexinfo_get_type (xml_index, &temp_long);
rc = SE_xmlindexinfo_get_description(xml_index, temp_str);
rc = SE_xmlindexinfo_get_tags (xml_index, &num_tags, &out_tags);
for (i = 0; i < num_tags; i++)
{
rc = SE_xmltaginfo_get_name(out_tags[i],
temp_str);
rc = SE_xmltaginfo_get_data_type (out_tags[i], &temp_long);
rc = SE_xmltaginfo_get_alias (out_tags[i], &temp_long);
rc = SE_xmltaginfo_get_description (out_tags[i], temp_str);
if(SE_xmltaginfo_is_excluded (out_tags[i]))
printf ("Tag %d is excluded\n", i);
}
SeXmlColumn col = ...;
if(col.hasIndex())
{
System.out.print(" Index Name: " + index.getName()
+
"\n Type: " + index.getType() +
"\n Tags: ");
SeXmlTag[] tags = index.getTags();
System.out.print(tags.length + "\n");
if(tags != null)
{
for(int i = 0; i < tags.length; i++)
{
String isExcluded = null;
if(tag.isExcluded())
isExcluded = "Yes\n";
else
isExcluded = "No\n";
System.out.println(" Name: " + tag.getName()
+
"\n Description: " + tag.getDescription()
+
"\n Data Type: " + tag.getDataType()
+
"\n Alias: " + tag.getAlias() +
"\n Is Excluded? " + isExcluded);
}
}
}
|
LONG rc;
SE_XMLINDEXINFO xml_index;
rc = SE_xmlindexinfo_create (&xml_index);
rc = SE_xmlindexinfo_delete_tags (xml_index);
rc = SE_xmlindexinfo_free(xml_index);
rc = SE_xmlcolumn_delete(handle, tableName, xmlcolumn);
SeXmlColumn col = ...;
SeIndex index = col.getIndex();
index.deleteTags();
xmlCol.delete(tableName, xmlColumnName);
|
|