Why are common data models required for efficient SDI data sharing?
Spatial data infrastructures (SDIs) are built and maintained for sharing geospatial data that is published on one GIS (the source) and consumed by another (the target). There are many ways of accomplishing this data exchange, including data download and services available through the ArcGIS REST API. However, for an efficient and effective data exchange, it’s important that both systems have a common interpretation of the data model in which the data is being provided. A geospatial data exchange model contains a definition of geometry, topology, attributes and semantics so that the consumed data is an accurate representation of the source data. Read this blog post to find out more about how important the data model is for effective data exchange in an SDI.
The purpose of an SDI is to share geospatial data within organizations and externally to other organizations. There are many ways to move vector geospatial data from one system to another, including data download, the ArcGIS REST API, OGC Web Services, OGC GeoPackage and other application programming interfaces (APIs). However, once the data arrives at the target system, how does that system decode the data in order to use it for visualization, integration, storage and analysis? Both the structure of the information elements sent by the source system and the way those elements are decoded at the target system are controlled by what’s termed the geospatial data model.
The Esri GIS Dictionary defines “data model” in three ways:
- In GIS, a data model is “a mathematical construct for representing geographic objects or surfaces as data. For example, the vector data model represents geography as collections of points, lines and polygons; the raster data model represents geography as cell matrixes that store numeric values and the TIN data model represents geography as sets of contiguous, nonoverlapping triangles.”
- In ArcGIS, a data model is “a set of database design specifications for objects in a GIS application. A data model describes the thematic layers used in the application (for example, hamburger stands, roads and counties); their spatial representation (for example, point, line or polygon); their attributes; their integrity rules and relationships (for example, counties must nest within states); their cartographic portrayal and their metadata requirements.”
- In information theory, a data model is “a description of the rules by which data is defined, organized, queried and updated within an information system (usually a database management system).”
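To make the second definition a little more concrete, here is a minimal Python sketch of a data model written down as a specification that both the source and target systems could check features against. The class names, the `roads` layer and its attributes are all made up for illustration; this is not an ArcGIS or geodatabase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    """One attribute of a thematic layer (hypothetical structure)."""
    name: str
    type: type
    required: bool = True


@dataclass(frozen=True)
class LayerSpec:
    """A thematic layer: its spatial representation and its attributes."""
    name: str
    geometry_type: str  # "Point", "LineString" or "Polygon"
    attributes: tuple


def validate_feature(layer, feature):
    """Return a list of problems; an empty list means the feature conforms."""
    problems = []
    if feature.get("geometry_type") != layer.geometry_type:
        problems.append(
            f"expected {layer.geometry_type}, got {feature.get('geometry_type')}"
        )
    props = feature.get("attributes", {})
    for spec in layer.attributes:
        if spec.name not in props:
            if spec.required:
                problems.append(f"missing attribute {spec.name}")
            continue
        if not isinstance(props[spec.name], spec.type):
            problems.append(f"attribute {spec.name} should be {spec.type.__name__}")
    return problems


# Illustrative layer definition shared by publisher and consumer.
ROADS = LayerSpec(
    name="roads",
    geometry_type="LineString",
    attributes=(AttributeSpec("name", str), AttributeSpec("lanes", int, required=False)),
)

good = {"geometry_type": "LineString", "attributes": {"name": "Main St", "lanes": 2}}
bad = {"geometry_type": "Point", "attributes": {"lanes": "two"}}

print(validate_feature(ROADS, good))
print(validate_feature(ROADS, bad))
```

The point of the sketch is only that the specification is explicit and shared: a feature that passes the check on the source side should be decodable in the same way on the target side.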
Diving a little deeper, some critical aspects of the spatial data exchange in an SDI context include representative models for geometry, topology, attributes and semantics. Let’s examine each of these aspects of the data model individually.
Geometry is the mathematical definition of how the object is located on, under or above the earth’s surface. Most modern GIS technologies provide point positions as a coordinate tuple that locates the point in relation to the centre of the earth and in time. Most often these positions are latitude and longitude coordinate pairs based on a local, regional or global datum (vertical and horizontal), as explained here and here. So, to make the shared data as interoperable and accurate as possible, it’s best to use coordinates defined in a global datum with a known epoch. ArcGIS is capable of converting coordinates between various coordinate reference systems (CRS), including geographic coordinate systems (GCS), projected coordinate systems (PCS), local coordinate systems (LCS) and linear reference systems (LRS). However, every conversion is an opportunity for errors or positional inaccuracies, so for accuracy and interoperability it’s best to exchange geospatial data using coordinates based on a broadly accepted GCS such as WGS-84.
This is the process flow that would be performed automatically by ArcGIS to convert projected x-y coordinates for an incoming SDI data file to projected x-y coordinates within the destination GIS. Based on an image from this document.
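A flow like the one described passes through geographic coordinates on the way from one projected system to another. As a rough illustration of the two projection steps only, the sketch below uses the spherical Web Mercator formulas (EPSG:3857) as a stand-in projection. It is not how ArcGIS implements the conversion, it performs no datum transformation, and the function names are invented for this example.

```python
import math

# WGS 84 semi-major axis in metres; Web Mercator treats it as a sphere radius.
EARTH_RADIUS_M = 6378137.0


def geographic_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_geographic(x, y):
    """Unproject Web Mercator x/y in metres back to longitude/latitude in degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


# Round trip for a point near Ottawa (coordinates are approximate).
x, y = geographic_to_web_mercator(-75.6972, 45.4215)
lon, lat = web_mercator_to_geographic(x, y)
print(round(x, 2), round(y, 2))
print(round(lon, 6), round(lat, 6))
```

Even in this simple case the result depends on both sides agreeing on the projection and its parameters, which is part of what the data model has to carry.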
Next on our list of geospatial data model exchange topics is topology. By its pure mathematical definition, topology means the study of geometrical properties and spatial relations unaffected by the continuous change of shape or size of figures. In the GIS world, topology means the definition of spatial relationships between GIS features so that the representation is as accurate as possible. There are lots of reasons why topology is important, some of which are given in this publication. Today we have plenty of tools in ArcGIS to fix topological errors, but more precision and accuracy are expected than ever before in terms of the topology of the data. Because more 3D and 4D data is being produced and exchanged, the topology between all the spatial entities needs to be well understood and maintained.
When sharing geospatial data, it’s important to ensure that topology rules are enforced to eliminate gaps, overlaps, dangles and undershoots, and that coincident lines and polygons are coincident. Image from “ArcGIS 8.3 Brings Topology to the Geodatabase”.
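One of the checks mentioned above, looking for dangles, can be sketched in a few lines of Python. This is a simplified illustration rather than the geodatabase topology engine: it only compares line endpoints with each other, so an endpoint that touches the middle of another line, or a legitimate dead end, would also be reported and would need review. The function name and the `precision` parameter are assumptions made for this example.

```python
from collections import Counter


def find_dangles(lines, precision=3):
    """Return line endpoints that are not shared with any other line endpoint.

    Each line is a list of (x, y) tuples. Coordinates are rounded to
    `precision` decimal places so that nearly identical endpoints match.
    """
    counts = Counter()
    for line in lines:
        for x, y in (line[0], line[-1]):
            counts[(round(x, precision), round(y, precision))] += 1
    return sorted(point for point, n in counts.items() if n == 1)


# A closed triangle of three lines plus one spur that ends in open space.
lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.0, 0.0), (10.0, 10.0)],
    [(10.0, 10.0), (0.0, 0.0)],
    [(10.0, 0.0), (15.0, 0.0)],
]

print(find_dangles(lines))
```

Running a check of this kind before publishing gives the data provider a chance to fix or document such cases, instead of leaving every consumer to discover them.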
Next on the topic list are attributes, which are defined in the Esri GIS Dictionary as “the nonspatial information about a geographic feature in a GIS, usually stored in a table and linked to the feature by a unique identifier. For example, attributes of a river might include its name, length, and sediment load at a gauging station.” So clearly each system in the GIS data exchange needs to properly understand the values provided in the attributes. For the example above, this might include whether the language of the river name is English or French, whether the river length is measured in miles or kilometres, or whether the sediment load is in units of tons/day or kilograms/hour. To make use of the attributes, both systems need to have a common understanding of the attributes and how they are defined.
These are the Shoreline Water Level attributes from the GeoBase National Hydro Network open data layer. Each of these water levels is based on the datum. Image from the GeoBase National Hydro Network Data Production Catalogue.
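The river example can be turned into a small sketch of what a common understanding of attributes looks like in practice: each value travels with its unit and language, and the target system normalizes it to the units of the common model. The field names and unit codes below are hypothetical, and "t/d" is assumed to mean metric tonnes per day.

```python
# Conversion factors to the units of the (hypothetical) common model.
LENGTH_TO_KM = {"km": 1.0, "mi": 1.609344}  # international mile
LOAD_TO_KG_PER_HOUR = {
    "kg/h": 1.0,
    "t/d": 1000.0 / 24.0,  # assumption: metric tonnes per day
}


def normalize_river(record):
    """Convert a source river record to kilometres and kilograms per hour.

    An unknown unit code raises KeyError rather than being guessed.
    """
    return {
        "name": record["name"],
        "name_lang": record["name_lang"],
        "length_km": record["length"] * LENGTH_TO_KM[record["length_unit"]],
        "sediment_kg_per_h": record["sediment_load"]
        * LOAD_TO_KG_PER_HOUR[record["sediment_unit"]],
    }


source_record = {
    "name": "Example River",
    "name_lang": "en",
    "length": 100.0,
    "length_unit": "mi",
    "sediment_load": 24.0,
    "sediment_unit": "t/d",
}

print(normalize_river(source_record))
```

Failing loudly on an unrecognized unit is a deliberate choice in this sketch: a silent guess would produce values that look valid but mean something different from what the source intended.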
The final topic on the data model exchange list is a common understanding of the data semantics. This is important because the GIS data was initially collected and used for a specific purpose but is now being shared through an SDI, and the SDI data user is likely to be using the data for quite a different purpose. This means that the target data user could have a different understanding of the technical terms used by the data collector. This is termed domain semantics. For example, the Canadian Wetland Classification System differentiates between bogs, fens, marshes, swamps and shallow water. But if wetland data from several agencies is being integrated in an SDI, does all the source data use the same definition for what is a marsh or a swamp or a bog? In transportation applications, some road data could be defined as the centreline of the entire road, while other systems may map the road as the centreline of the lanes or even as the centreline of the road right-of-way.
This is an example of semantics for a road network, where the defined data set is divided where there is a median. This means that the road on the left is a road centreline, while the road on the right is a lane centreline. Given that road networks can be very complicated, it’s important to follow common semantics in a common data model to ensure the data is interoperable. Example taken from the Community Map of Canada’s Community Road Network Specifications document.
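One common way to handle domain semantics when integrating data is an explicit crosswalk from each source vocabulary to the terms of the common model. The sketch below uses the five classes of the Canadian Wetland Classification System mentioned above as the target vocabulary. The two agencies and their terms are invented for illustration, and a term that cannot be mapped unambiguously is left as `None` so that it gets reviewed instead of guessed.

```python
# Target vocabulary: the classes named in the Canadian Wetland Classification System.
CWCS_CLASSES = {"bog", "fen", "marsh", "swamp", "shallow water"}

# Hypothetical source vocabularies; None marks a term that needs manual review.
CROSSWALKS = {
    "agency_a": {
        "bog": "bog",
        "fen": "fen",
        "marsh": "marsh",
        "swamp": "swamp",
        "open water": "shallow water",
    },
    "agency_b": {
        "treed wetland": "swamp",
        "grassy wetland": "marsh",
        "peatland": None,  # could be a bog or a fen; the source term alone cannot say
    },
}


def to_common_class(agency, term):
    """Map a source term to the common class, or None if it is ambiguous.

    Raises KeyError for an unknown agency or an unmapped term.
    """
    mapping = CROSSWALKS[agency]
    key = term.strip().lower()
    if key not in mapping:
        raise KeyError(f"no crosswalk entry for {term!r} from {agency}")
    return mapping[key]


print(to_common_class("agency_a", "Marsh"))
print(to_common_class("agency_b", "peatland"))
```

The crosswalk itself is where the semantic agreement lives, so it is worth publishing alongside the data model rather than keeping it inside one organization's load scripts.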
The creation and use of a vector GIS data model can be time-consuming and complicated, and a new data model does not guarantee interoperability. In fact, new data models are often a barrier to data exchange because the new model needs to be implemented and tested. To improve interoperability, it is best to use all or parts of a good existing data model. For data exchange, these models are sometimes called data formats. Governments provide a lot of geospatial data and have often invested the time and resources to develop broad-based data models. For example, the Canadian federal government has developed the GeoBase series of data models, which it uses to provide GeoBase data layers that include roads, hydrography, railways and geographic names. Esri Canada has also developed the Canadian Municipal Data Model (CMDM). The CMDM can be configured to support specific organizational business needs either by tailoring specific themes, or by adding and modifying fields and layer aliases to reflect terms more widely used in your organization and application. Other application-specific formats (or data models) are available for other uses such as public safety.
Common, well-understood data models are essential for SDI data sharing. An organization using data provided in an SDI must use the same data model as the source organization to decode (extract, transform, load [ETL]) the data, and it must also have the same understanding of the concepts contained within the data model as the source organization. So, if you want to make your data as interoperable as possible with an SDI, put the exchange data into a common data model that will make data interchange easier and less error prone, and therefore more successful.
This post was translated to French and can be viewed here.