ArcSDE compressed binary representation

The ArcSDE compressed binary representation is the format in which feature geometry is stored as a binary stream. An offset and a scale are first applied to the coordinates of a geometric object; the resulting integer coordinates are then encoded as deltas from the previous coordinate. An optional CAD and ANNO object may be appended to the geometric object.

Coordinate values

Internally, all ArcSDE coordinates are 32-bit positive integers between 0 and 2147483647. This format provides better data accuracy, data integrity, and processing speed than real numbers. Developers should be aware of the internal integer representation, because it is possible to attempt to store a number that is too large in a layer. In that case, the ArcSDE software returns the error SE_COORD_OUT_OF_BOUNDS. Developers never need to work directly with the integer values.

Because real-world coordinates are often neither positive nor integer, ArcSDE data requires an offset distance (a false origin) to ensure numbers are positive and a minimum resolution multiplier (called the scale) to convert real numbers to integers. Offset distances are specified in the same units as the data. The scale can be any positive value up to 2147483645.
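The role of the offset and scale can be illustrated with a minimal sketch. The exact formula and rounding rule are not given here, so the sketch assumes the common form (value - offset) * scale rounded to the nearest integer; the function names to_internal and to_real are hypothetical, and the ValueError merely stands in for the SE_COORD_OUT_OF_BOUNDS error.

```python
INT32_MAX = 2147483647  # upper bound of the internal integer range


def to_internal(value, offset, scale):
    """Convert a real-world coordinate to an internal integer.

    Assumed formula: subtract the false origin, multiply by the scale,
    round to the nearest integer.
    """
    internal = int(round((value - offset) * scale))
    if not 0 <= internal <= INT32_MAX:
        # Stand-in for the SE_COORD_OUT_OF_BOUNDS error.
        raise ValueError("coordinate out of bounds for this offset and scale")
    return internal


def to_real(internal, offset, scale):
    """Convert an internal integer back to a real-world coordinate."""
    return internal / scale + offset


# A longitude of -120.5 with a false origin of -400 and a scale of 1000.
stored = to_internal(-120.5, -400.0, 1000)
restored = to_real(stored, -400.0, 1000)
```

Under these assumptions, a larger scale preserves more decimal places but shrinks the range of real-world values that fit below 2147483647.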

Logical representation of ArcSDE feature geometry

This section describes the logical view of how an ArcSDE feature’s geometry is represented in a binary stream. There are three issues to present: coordinate ordering, multipart delineation, and point compression.

Coordinate ordering

An ArcSDE feature’s geometry is represented by one or more coordinates. Each coordinate consists of, at a minimum, an x,y pair. A feature might also have a z (zed) or measure (m) value associated with each x,y pair. Each of these values, x, y, z, and m, is represented internally as a 32-bit integer. The order in which the values are stored in a binary stream is x/y, x/y, ..., x/y, z, z, ..., z, m, m, ..., m (the z- and m-values being optional). A one-to-one correspondence exists between the z- or m-values and the x,y pairs. In other words, for each z-coordinate or measure value present in the feature geometry, an x,y pair exists.
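The ordering described above can be sketched for a single-part feature as follows. The function name logical_stream and its argument layout are hypothetical; the values are assumed to be internal integers already.

```python
def logical_stream(xy, z=None, m=None):
    """Return values in the order x/y, ..., x/y, z, ..., z, m, ..., m.

    xy is a list of (x, y) pairs; z and m are optional lists that must
    contain exactly one value per x,y pair.
    """
    for name, values in (("z", z), ("m", m)):
        if values is not None and len(values) != len(xy):
            raise ValueError(name + " values must match the x,y pairs one-to-one")
    stream = []
    for x, y in xy:
        stream.extend((x, y))   # all x,y pairs come first
    stream.extend(z or [])      # then every z-value, if present
    stream.extend(m or [])      # then every measure value, if present
    return stream


ordered = logical_stream([(1, 2), (3, 4)], z=[10, 20], m=[7, 8])
```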

Multipart delineation

An ArcSDE feature may have one or more geometric parts (a single-part or multipart feature). Each part is delineated by a separator coordinate within the binary stream which represents a feature’s geometry. The separator coordinate has a predefined value. Beginning with a feature’s second part, the separator is the first coordinate of the part’s ordered coordinate list. The coordinate list of a multipart feature is stored in a binary stream as x/y, x/y, ..., x/y, <separator>, x/y, x/y, ..., x/y, z, z, ..., z, <separator>, z, z, ..., z, m, m, ..., m, <separator>, m, m, ..., m (again, the z- and m-values are optional).

Point compression

Within the binary stream, each of the x/y, z, and measure values is compressed in a byte-order-independent manner. The compression of feature coordinates is done in two steps. First, all values are converted to a relative-offset scheme; then each relative-offset value is packed into the minimum number of bytes required to represent it.

Physical representation of ArcSDE feature geometry

This section describes the physical view of how an ArcSDE feature’s geometry is stored in a binary stream. There are three issues to present: separators, point compression, and the binary layout.

Part separators

A separator, which delineates the parts of a feature, is physically represented by an x-value of negative one (-1) and a y-value of zero (0); its z- and m-values are undefined. Separators do not require any special logic when being compressed.
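Because valid coordinates are never negative, the separator pair (-1, 0) cannot collide with a real point. The sketch below joins and splits the x,y lists of a multipart feature on that basis; the function names are hypothetical, and the z- and m-lists are left out because their separator values are undefined.

```python
SEPARATOR_XY = (-1, 0)  # x = -1, y = 0 marks the start of a new part


def join_parts_xy(parts):
    """Flatten a list of parts (each a list of (x, y) tuples) into one list,
    placing a separator before every part after the first."""
    stream = []
    for index, part in enumerate(parts):
        if index > 0:
            stream.append(SEPARATOR_XY)
        stream.extend(part)
    return stream


def split_parts_xy(stream):
    """Split a flat list of (x, y) tuples back into parts at each separator."""
    parts = [[]]
    for point in stream:
        if point == SEPARATOR_XY:
            parts.append([])
        else:
            parts[-1].append(point)
    return parts


parts = [[(10, 10), (20, 20)], [(30, 30), (40, 40), (50, 50)]]
flat = join_parts_xy(parts)
```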

Point compression

The compression or decompression of the coordinates stored in the binary stream is a two-step process: the conversion to or from the relative-offset scheme, and the packing or unpacking of bytes. To compress coordinates, the values are converted to relative offsets and then packed into a byte array. To decompress coordinates, the byte array is unpacked and the values are then converted back to absolute values. Each step is described below.

Relative-offset value calculation

The goal of converting coordinate values to a relative offset scheme is to make the values as small as possible so that they require fewer bits to represent them. In an array of relative-offset values, the first value is an absolute value (stored as a 32-bit integer) while each subsequent value is the offset, or difference, from the previous absolute value. Therefore, given N absolute values, the relative-offset values are calculated by:

relative_value[0] = absolute_value[0]

relative_value[1] = absolute_value[1] - absolute_value[0]

[...]

relative_value[N-2] = absolute_value[N-2] - absolute_value[N-3]

relative_value[N-1] = absolute_value[N-1] - absolute_value[N-2]

 

Given N relative values, the absolute values are calculated by:

absolute_value[0] = relative_value[0]

absolute_value[1] = absolute_value[0] + relative_value[1]

[...]

absolute_value[N-2] = absolute_value[N-3] + relative_value[N-2]

absolute_value[N-1] = absolute_value[N-2] + relative_value[N-1]

This method is efficient because points within a feature are usually close to neighboring points.
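The two sets of equations above translate directly into a pair of loops. This is a minimal sketch over a plain list of integers; the function names are hypothetical, and how x- and y-values are grouped into such arrays is not covered here.

```python
def to_relative(absolute):
    """First value stays absolute; each later value becomes the
    difference from the previous absolute value."""
    relative = []
    previous = 0
    for value in absolute:
        relative.append(value - previous)
        previous = value
    return relative


def to_absolute(relative):
    """Rebuild absolute values by accumulating the relative offsets."""
    absolute = []
    running = 0
    for delta in relative:
        running += delta
        absolute.append(running)
    return absolute


values = [1000, 1003, 1001, 1010]
deltas = to_relative(values)  # [1000, 3, -2, 9]
```

Only the first value keeps its full magnitude; the remaining offsets are small enough to pack into one byte each.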

Packing integer values

Relative-offset values are generally represented with fewer bytes than absolute values. The relative-offset values are packed into a series of bytes. The high-order bit of each packed byte acts as a control bit to indicate whether the (integer) value continues into the next byte. For example, if an integer value is packed into three bytes, the high-order bit of bytes one and two is set (indicating the integer value continues into the following byte) and the high-order bit of byte three is not set (indicating that it is the last byte of the integer value). The second bit of the first byte acts as a sign bit. So the first packed byte contains one control bit, one sign bit, and six data bits. All subsequent packed bytes contain one control bit and seven data bits. Because fewer bits are available to represent an integer, up to five packed bytes could be required to represent an integer value (this is a worst case scenario and would only occur when the integer value was greater than 134,217,727).
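The byte counts implied by this layout (six data bits in the first byte, seven in each following byte) can be checked with a short sketch. The function name packed_size is hypothetical, and the sketch assumes that the sign bit carries the sign while the data bits hold the magnitude.

```python
def packed_size(value):
    """Number of packed bytes needed for an integer value."""
    magnitude = abs(value)
    size = 1
    magnitude >>= 6          # the first byte holds six data bits
    while magnitude:
        size += 1
        magnitude >>= 7      # each later byte holds seven data bits
    return size


sizes = [packed_size(v) for v in (63, 64, 134217727, 134217728)]
```

Four bytes provide 6 + 7 + 7 + 7 = 27 data bits, which is why 134,217,727 (2^27 - 1) is the largest magnitude that avoids a fifth byte.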

 

The record layout, by byte, for packed integers

Byte Number    Bit(s)    Value
0              0         Control bit (0 = last byte, 1 = integer value continues into the next byte)
0              1         Sign bit (0 = positive integer, 1 = negative integer)
0              2 – 7     Low-order six bits of the integer
1 – 4          0         Control bit (0 = last byte, 1 = integer value continues into the next byte)
1 – 4          1 – 7     Next low-order seven bits of the integer

Bit 0 is the high-order bit of each byte.

Integer values are packed by taking the low-order six or seven bits, depending on which packed byte the value is being stored in (six bits for the first byte, seven for each subsequent byte), and storing them in the packed byte. The bits are extracted by performing a bitwise AND operation between the value to be packed and the hexadecimal value 3F or 7F, respectively. The original value is then shifted to the right by six or seven bits (i.e., divided by 64 or 128). If the shifted value is nonzero, the control bit in the packed byte is set and the steps are repeated. This process continues until the shifted value is zero. Unpacking is done in a similar manner, but in the reverse order.
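The packing and unpacking steps can be sketched as follows. The function names pack_int and unpack_int are hypothetical, and the sketch assumes that negative values are stored as a sign bit plus the magnitude, which the layout suggests but does not state outright.

```python
def pack_int(value):
    """Pack an integer into 1 to 5 bytes, low-order bits first."""
    magnitude = abs(value)
    current = magnitude & 0x3F        # low-order six bits go in the first byte
    if value < 0:
        current |= 0x40               # sign bit (second-highest bit)
    magnitude >>= 6
    packed = []
    while magnitude:
        packed.append(current | 0x80) # control bit: value continues
        current = magnitude & 0x7F    # next low-order seven bits
        magnitude >>= 7
    packed.append(current)            # last byte: control bit clear
    return bytes(packed)


def unpack_int(data, pos=0):
    """Unpack one integer starting at data[pos].

    Returns (value, position of the byte after the packed integer).
    """
    byte = data[pos]
    negative = bool(byte & 0x40)
    magnitude = byte & 0x3F
    shift = 6
    while byte & 0x80:                # control bit set: read another byte
        pos += 1
        byte = data[pos]
        magnitude |= (byte & 0x7F) << shift
        shift += 7
    return (-magnitude if negative else magnitude), pos + 1


packed = pack_int(100)                # two bytes: 0xA4, 0x01
value, next_pos = unpack_int(packed)
```

Returning the next position from unpack_int lets a caller walk through a stream of consecutive packed integers without knowing their sizes in advance.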

Binary layout

In addition to the compressed coordinate values, additional information is stored within the byte stream to provide information about the stored coordinate values. The first eight bytes of the byte stream are reserved for the additional information. Currently, two pieces of additional information are stored within the byte stream: the size of the compressed point byte stream and the dimension of the stored coordinates. Both values are stored as packed integer values (as described previously). The length of the coordinate byte stream is defined as the total length minus the reserved eight bytes (i.e., the size of the compressed point byte stream) and is stored in the first five bytes.

The coordinate dimension indicates whether z- and m-values are present in the byte stream. The dimension is a one-byte bit vector and is stored in the sixth byte of the byte stream. The first low-order bit of the dimension vector indicates whether z-values are present, and the second low-order bit indicates whether measure values are present. If the bit value is turned off (zero), then the corresponding values are not present in the byte stream. If the bit value is turned on (one), then the corresponding values are present. For example, the dimension vector for two-dimensional coordinates has a hexadecimal value of zero (0), for three-dimensional coordinates a value of one (1), for measured two-dimensional coordinates a value of two (2), and for three-dimensional coordinates with measures a value of three (3). The next two bytes of the byte stream are not used currently, but are reserved for future use. The compressed coordinate values are stored in the byte stream following the reserved eight bytes.

 

The binary representation of a feature’s geometry in ArcSDE

Byte Number    Use
0 – 4          Coordinate stream length, packed integer format (byte stream length minus 8 reserved bytes).
5              Coordinate dimension mask, packed integer format.
6 – 7          Reserved, but currently unused.
8+             Compressed coordinate values, packed relative-offset format.
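Reading the eight reserved bytes might look like the sketch below. The names read_packed and parse_header and the returned dictionary keys are hypothetical. The sketch assumes the length is decoded with the packed-integer layout described under Packing integer values, and that the dimension mask, whose values 0 through 3 fit in the data bits of one packed byte, can be read as a plain byte.

```python
def read_packed(data, pos=0):
    """Decode one packed integer (control bit 0x80, sign bit 0x40)."""
    byte = data[pos]
    negative = bool(byte & 0x40)
    magnitude = byte & 0x3F
    shift = 6
    while byte & 0x80:
        pos += 1
        byte = data[pos]
        magnitude |= (byte & 0x7F) << shift
        shift += 7
    return -magnitude if negative else magnitude


def parse_header(blob):
    """Split a geometry byte stream into header fields and coordinate bytes."""
    if len(blob) < 8:
        raise ValueError("byte stream is shorter than the 8 reserved bytes")
    dimension = blob[5]                      # sixth byte: dimension bit vector
    return {
        "length": read_packed(blob, 0),      # bytes 0-4: coordinate stream length
        "has_z": bool(dimension & 0x01),     # first low-order bit: z-values
        "has_m": bool(dimension & 0x02),     # second low-order bit: measures
        "coords": blob[8:],                  # bytes 8+: compressed coordinates
    }


# Hypothetical stream: length 5, z and m present, five coordinate bytes.
example = bytes([0x05, 0, 0, 0, 0, 0x03, 0, 0]) + b"\x01\x02\x03\x04\x05"
header = parse_header(example)
```

A reader can compare the decoded length against the number of bytes that follow the header as a basic consistency check.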
