ArcSDE Compressed Binary

ArcSDE uses a compressed binary format to store geometry in either an Oracle binary LONG RAW or BLOB data type. ArcSDE stores compressed binary spatial data in the POINTS columns of the feature table. Compressing the geometry offers efficient storage and retrieval of spatial data by reducing the size of the geometry. Compressed binary store in the LONG RAW data type is the default storage format for ArcSDE feature classes.

 

After the client verifies the geometry, it compresses it and sends it to the server, where it is stored in compressed binary format in a feature table. Compressing the geometry on the client offloads the task from the ArcSDE server and reduces the transmission time to send the geometry to the ArcSDE server. Storing compressed geometry data reduces the space required to store data as much as 40%.

 

Oracle reads blocks in order to read rows from a table. Compressing the data offers a better row/block ratio over non-compressed formats. Also, Oracle stores the first 4K of binary data in a LONG RAW column in-line, as part of the row in a RDBMS block. In-line LONG RAW data does not require a second hit to the disk to find the remaining portion of the data. By compressing the data, more rows within the 5K to 7K ranges have a better chance of being in-line, reducing read time in the RDBMS.

 

Compressed Binary Metadata Tables

A compressed binary feature class comprises three tables: the business table, the feature table and the spatial index table.

 

The business table contains attributes and a spatial column. The spatial column is a key to the feature and spatial index tables.

 

The relationship between the business table and the feature table is managed through the spatial column and the FID column. This key, which is maintained by ArcSDE, is unique.

NAME

DATA TYPE

NULL?

fid NUMBER(38) NOT NULL
numofpts NUMBER(38) NOT NULL
entity NUMBER(38) NOT NULL
eminx NUMBER(64) NOT NULL
eminy NUMBER(64) NOT NULL
emaxx NUMBER(64) NOT NULL
emaxy NUMBER(64) NOT NULL
eminz NUMBER(64) NULL
emaxz NUMBER(64) NULL
min_measure NUMBER(64) NULL
max_measure NUMBER(64) NULL
area NUMBER(64) NOT NULL
len NUMBER(64) NOT NULL
points LONG RAW NULL
anno_text VARCHAR2(256) NULL

Feature table schema

 

The feature table stores the geometry, annotation and CAD in the LONG RAW POINTS column. The internal Oracle datatype is listed in the table above. The ArcSDE datatype for each column is defined below.

 

·     fid (SE_INTEGER_TYPE)contains the unique ID that joins the feature table to the business table

·    entity (SE_INTEGER_TYPE)the type of geometric feature stored in the shape column (e.g., point, linestring)

·     numofpts (SE_INTEGER_TYPE)the number of points defining the shape

·     eminx, eminy, emaxx, emaxy (SE_FLOAT_TYPE)the envelope of the shape

·    eminz (SE_FLOAT_TYPE)the minimum Z value in the shape

·     emaxz (SE_FLOAT_TYPE)the maximum Z value in the shape

·    min_measure (SE_FLOAT_TYPE)the minimum measure value in the shape

·    max_measure (SE_FLOAT_TYPE)the maximum measure value in the shape

·    area (SE_FLOAT_TYPE)the area of the shape

·     len (SE_FLOAT_TYPE)the length or perimeter of the shape

·    points (SE_SHAPE_TYPE)contains the byte stream of point coordinates that define the shapes geometry

·     anno_text (SE_STRING_TYPE)contains the feature annotation string.

NAME DATA TYPE NULL?
sp_fid NUMBER(38) NOT NULL
gx NUMBER(38) NOT NULL
gy NUMBER(38) NOT NULL
eminx NUMBER(64) NOT NULL
eminy NUMBER(64) NOT NULL
emaxx NUMBER(64) NOT NULL
emaxy NUMBER(64) NOT NULL

Spatial index table schema

The spatial index table defines grid range and extent of all geometry in an ArcSDE feature class.

"   sp_fid (NUMBER)contains the unique ID that joins the feature table to the business table

"   gx/gy  (NUMBER)defines the features extent in grid cells)

"   eminx/eminy/emaxx/emaxy  (NUMBER)Extent of feature in system units

 

In this example the FEATURE-ID column from the WELLS business table references features from the feature and spatial index tables:

WELL_ID DEPTH ACTIVE FEATURE-ID
1 30029 Yes 101
2 13939 No 102
3 92891 No 103
&ldots; &ldots;   &ldots;
 
FID AREA LEN EMINX,EMINY,&ldots; POINTS
101       <compressed feature>
102       <compressed feature>
103       <compressed feature>
&ldots;       &ldots;
 
SP_FID GX GY {EMINX,EMINY,EMAXX,EMAXY}
101 70 100  
102 70 100  
103 71 100  
&ldots;      

                                      A business/feature/spatial index key reference.

 

The spatial index grid system

The spatial index grid system is a two-dimensional index that spans a feature class, like the reference grid you might find on a common road map. You may assign a feature class grid system one, two or three grid levels, each with its own distinct cell size. The first, or lowest, grid level has the smallest cell size. If the feature class needs only one index level, set the second and third level grid cell sizes to zero. Otherwise, set the cell size of each level at least three times larger than the prior level.

 

The spatial index table, (S<feature class_id>), has seven integer columns that store the grid cell values, feature envelopes and corresponding feature IDs. Adding a feature to a feature class, adds one or more grid cells to the spatial index table. The number of records added to the spatial index table depends on the number of grid cells the feature spans.

 

The spatial index table contains two indexes. One index is on the SP_FID column which contains the feature ID. The other is a composite index that includes all of the columns of the spatial index table. Since all of the columns of the spatial index table are indexed, the values of the table are read from the leaf blocks of the index and not the table data blocks. The result is less I/O and better performance.

 

Building the spatial index

Every time a feature class is added to a business table, a persistent spatial index is built for it.

The ArcSDE server manages the spatial index throughout the life of the feature class. As features are inserted, updated, or deleted, the spatial index is automatically updated.

 

A load-only mode is available to disable spatial index management until after loading is complete. This boosts loading performance substantially and is imperative for bulk loading efforts (no queries are allowed in the load-only mode, except native SQL-based queries).

Once loading has been completed, the spatial index is enabled by returning it to normal I/O mode.  The conversion from normal I/O mode to load-only I/O mode reconstructs the spatial index.

 

Inserting, updating or deleting a feature updates the spatial index when the feature class is in normal I/O mode.

 

ArcSDE overlays the extent of each feature onto the lowest grid level to obtain the number of grid cells. If the feature exceeds four cells, ArcSDE promotes the feature to the next highest grid level, if one exists. ArcSDE will continue to promote the feature until it fits within four cells or less or until the highest defined grid level is reached. On the highest defined grid level, geometries can be indexed by more than four grid cells.

 

ArcSDE adds the features grid cells to the spatial index table with their corresponding shape ID and feature envelope. The grid level is encoded with each grid cell.

 

In the example below, the feature class has two grid levels. Area shape 101 is located in grid cell 4 on level 1. A record is added to the spatial index table because the feature resides within four grid cells (in this case it is one). Area feature 102s envelope is located in cells 1 through 8 on level 1. Because the features envelope resides in more than four grid cells, the feature is promoted to level 2 where its envelope fits within two grid cells. Feature 102 is indexed at level 2, and two records are added to the spatial index table.

 

Figure 4.4  Shape 101 is indexed on grid level 1, while shape 102 is indexed on grid level 2 where it is in only two grid cells.

 

Spatial queries and the spatial index

Spatial queries, like finding all the lakes within a state boundary, uses the spatial index. Spatial index is used unless the search order has been set to SE_ATTRIBUTE_FIRST in the SE_stream_set_spatial_constraints function. When the search order is set to SE_ATTRIBUTE_FIRST, ArcSDE ignores the spatial index and the criteria of the attribute where clause determines which records of the feature class the query returns.

 

Whenever the spatial index is used, the ArcSDE service generally uses the following decision process when performing the query:

 

1.     Define the envelope. The envelope could be defined directly by the application such as the extent defined by the ArcMap zoom in tool. Alternatively the envelope may be defined as the envelope of another feature.

2.        Join the spatial index table with the feature table and return all features whose grid cells  intersect the envelope.

3.        Join the feature table with the business table and apply the criteria of the attribute where clause to further refining features returned.

 

Grid cell size impacts the size of the spatial index table. Setting up the spatial index means balancing the cell sizessmaller cell sizes mean more cells per shape, which requires more entries in the spatial index table.

 

Guidelines for tuning the spatial index

Because client applications and spatial data profiles vary from one system to another, no single solution fits all. Experienced users of ArcSDE often experiment with the spatial index, trying different cell sizes and different grid level configurations.

 

The sdelayer command has several operations that can help you optimize the spatial index by  changing the grid cell sizes and adding new grid levels with the alter operation. The stats and si_stats operations, profile your spatial data and current spatial index.

 

The following guidelines can help improve the performance of spatial queries.

 

·        Consider how many grid levels are needed, and remember the ArcSDE server scans the spatial index table once per grid level. Sometimes a single grid level is the best solution for a feature class, despite the notion of distributing geometries evenly across many grid levels to minimize the spatial index entries.

·         Use one grid level for pure point type feature class, and consider making the cell sizes large. Spatial queries generally process point geometries faster than other shape types.

·         Monitor the spatial index. Tuning a spatial index is difficult if the data changes frequently. Tuning depends upon the structure of the spatial data. Periodically assess the spatial index as your spatial data changes.

·         Base the spatial index on the application. Match the spatial index grid cell sizes to the extent of the application window. By doing so, the application is probably viewing exact entries in the spatial index table. This helps to size the spatial index table suitably and reduces the amount of processing because fewer candidate feature IDs must be evaluated against the feature table (see Spatial queries and the spatial index above).

·         Design the feature class around spatial data categories like type, shape size, and distribution. Sometimes a carefully designed feature class, using these categories, can substantially boost the performance of spatial queries.

 

Displaying spatial index statistics

The sdelayer commands spatial index statistics operation, si_stats, can help you determine optimum spatial index grid sizes. Optimum grid cell sizes depend on the spatial size of all features, the variation in spatial feature size, and the types of searches to be performed on the map feature class. Below is a sample output generated by si_stats:

 

$ sdelayer -o si_stats -l victoria,parcels -u av -p mo -i esri_40

 

SDE Feature class Administration Utility Tue Oct 8 17:48:22 1996
--------------------------------------------------------
Feature class 1 Spatial Index Statistics:
Level 1,   Grid Size 200 (Meters)
|-------------------------------------------------------|
| Grid Records: 978341                                  |
| Shape Records: 627392                                 |
| Grids/Shape Ratio: 1.56                               |
| Avg. Shapes per Grid: 18.26                           |
| Max. Shapes per Grid: 166                             |
| % of Shapes Wholly Inside 1 Grid: 59.71               |
|-------------------------------------------------------|
|             Spatial Index Record Count By Group       |
| Grids:     <=4    >4  >10  >25  >50  >100  >250  >500 |
|---------- ------ --- ---- ---- ---- ----- ----- ----- |
| Shapes:   627392   0    0    0    0     0     0     0 |
| % Total:     100%  0%   0%   0%   0%    0%    0%    0%|
|-------------------------------------------------------|

Level 2,   Grid Size 1600 (Meters)
|-------------------------------------------------------|
| Grid Records: 70532                                   |
| Shape Records: 36434                                  |
| Grids/Shape Ratio: 1.94                               |
| Avg. Shapes per Grid: 18.21                           |
| Max. Shapes per Grid: 82                              |
| % of Shapes Wholly Inside 1 Grid: 45.35               |
|-------------------------------------------------------|
|             Spatial Index Record Count By Group       |
| Grids:     <=4    >4  >10  >25  >50  >100  >250  >500 |
|---------- ------ --- ---- ---- ---- ----- ----- ----- |
| shapes:   35682  752   87   17    3     0     0     0 |
| % Total:     97%   2%   0%   0%   0%    0%    0%    0%|
|-------------------------------------------------------|

 

As the output shows, for each defined spatial index level in the map feature class definition, the following values and statistics are printed:

 

"  Grid level and cell size.

"  Total spatial index records for the current grid level.

"  Total geometries stored for the current grid level.

"  Ratio of spatial index records per shape.

"  Shape counts and percentages by group that indicate how geometries are grouped within the spatial index at this grid level. The column headings have the following meaning (where N is the number of grid cells):

<=N      Number of geometries and percentage of total geometries that fall within <= N grid cells
>N         Number of geometries and percentage of total geometries that fall within > N grid cells

Notice that the > groupings include count values from the next group. For instance, the >4 group count represents the number of geometries that require more than four grid records as well as more than ten, and so on.

"  Average number of geometries per grid.

"  Maximum number of geometries per grid. This is the maximum number of geometries indexed into a single grid.

"  Percentage of geometries wholly inside one grid. This is the percentage of all geometries wholly contained by one grid record.

 

The output sample shows spatial index statistics for a map feature class that uses two grid levels: one that specifies a grid size of 200 meters, the other a grid size of 1600 meters. When a geometry requires more than four spatial index records, it is automatically promoted to the next grid level, if one is defined. That is, in no case will a shape generate more than four grid records, if more than one grid level exists. If a higher grid level doesnt exist, then a geometry can have more than four grid records.

 

In the example above, 627392 features are indexed through grid level 1. Because the system automatically promotes geometries that need more than four spatial index records to the next defined grid level, all 627392 geometries for grid level 1 are indexed with four grid records or less. Grid level 2 is the last defined grid level, so geometries indexed at this level are allowed to be indexed with more than four grid records. At grid level 2, there are a total of 36434 geometries and 70532 spatial index records. 35682 geometries are indexed with four grid records or fewer, 752 geometries indexed with more than four grid records, 87 geometries indexed with more than 10, 17 geometries with more than 25, and three geometries with more than 50 grid records. Percentage values below each column show how the geometries are dispersed through the eight groups.

 

Creating tables with Compressed Binary Schema

The default geometry storage for ArcSDE feature classes is ArcSDE compressed binary.  If this is the way you want to store your geometry all the time, then there are no adjustments you need to make. 

 

If you wish to have a mix of geometry types in your schema, populate the DBTUNE table with keywords that have a specific geometry storage types and assign the keywords to the application programs creating the feature class.

 

The GEOMETRY_STORAGE parameter in the DBTUNE table defines the storage method to be used when loading data using ArcSDE. The GEOMETRY_STORAGE value for the default LONG RAW compressed binary is SDEBINARY.  In the following example, we define the PARCELS keyword in a dbtune file to contain a geometry storage definition of SDEBINARY. 

 

##PARCELS

GEOMETRY_STORAGE        SDEBINARY

<other parameters>

END

You can add additional parameters for more control on how your data is stored. 

##PARCELS

 

GEOMETRY_STORAGE "SDEBINARY"

F_STORAGE        "TABLESPACE ftabsp STORAGE (FREELISTS 4 INITIAL 500M

                 NEXT 100M MINEXTENTS 1 MAXEXTENTS 50 PCTINCREASE 0)

F_INDEX_1        "TABLESPACE findex_tsp STORAGE (FREELISTS 4 INITAIL 100M

                 NEXT 25M MINEXTENTS 1 MAXEXTENTS 50 PCTINCREASE 0)

S_STORAGE        "TABLESPACE stabsp STORAGE (FREELISTS 4 INITIAL 300M

                 NEXT 50M MINEXTENTS 1 MAXEXTENTS 50 PCTINCREASE 0)

S_INDEX_1        "TABLESPACE sindex_tsp STORAGE (FREELISTS 4 INITIAL 100M
                 NEXT 50M MINEXTENTS 1 MAXEXTENTS 50 PCTINCREASE 0)

S_INDEX_2        "TABLESPACE sindex_tsp STORAGE (FREELISTS 4 INITIAL 100M
                 NEXT 50M MINEXTENTS 1 MAXEXTENTS 50 PCTINCREASE 0)

END

In this example, the storage schema is compressed binary, it defines the storage and location for the feature and spatial index tables. These and several other configuration parameters are specified in the dbtune file and defined in Appendix B of this document.

 

Referential integrity

Maintaining the referential integrity between the business and feature table is important. You should not edit the records of either the feature table or the spatial index table. Several indexes and constraints have been added to the business, feature and spatial index table to insure that the referential integrity is maintained. However, these indexes and constraints are removed when the feature class is converted to the load only I/O mode, a state that allows data to be quickly inserted into the feature class.

 

When the feature class is placed back into normal I/O mode, the state that allows users to query the feature class through an ArcSDE client, the indexes are created and the constraints are enabled. The conversion to normal I/O mode will fail if the unique indexes cannot be built on the business tables spatial column, or the feature tables FID column. It will also fail if a value exists in the business tables spatial column that is not in the feature tables FID column. In this case a reference to the offending business table record is loaded into the SDE_EXCEPTIONS table.