Langston University GIS/GPS
Mr. Stover's Teacher Web Page
Langston
GIS Presentation 11-2007
These are Adobe Acrobat pdf files:
1. Using GPS With GIS
2. GIS for the Media
3. Using GIS in Agriculture
4. Modernizing Mapping
5. Being More Productive Using ArcMap
6. Incoming Langston Freshmen
7. Langston Aerial Map
LECTURE NOTES
Week one
Week two
Week three
Week four
Week five
Week six
Week seven
Geodata File Downloads From USDA-NRCS
09/24/2003 12:00AM Directory air_quality
09/24/2003 12:00AM Directory cadastral
10/10/2003 12:00AM Directory census
09/24/2003 12:00AM Directory climate
09/24/2003 12:00AM Directory common_land_unit
09/24/2003 12:00AM Directory conservation_practices
09/24/2003 12:00AM Directory cultural_resources
09/24/2003 12:00AM Directory disaster_events
09/24/2003 12:00AM Directory ecological
09/24/2003 12:00AM Directory elevation
09/24/2003 12:00AM Directory endangered_habitat
09/24/2003 12:00AM Directory environmental_easements
09/24/2003 12:00AM Directory geographic_names
09/24/2003 12:00AM Directory geology
02/09/2004 12:00AM Directory government_units
09/24/2003 12:00AM Directory hazard_site
03/31/2005 12:00AM Directory hydrography
02/06/2004 12:00AM Directory hydrologic_units
09/24/2003 12:00AM Directory imagery
09/24/2003 12:00AM Directory land_site
09/24/2003 12:00AM Directory land_use_land_cover
09/24/2003 12:00AM Directory local_geodata
09/24/2003 12:00AM Directory map_indexes
09/24/2003 12:00AM Directory measurement_services
03/09/2004 12:00AM Directory ortho_imagery
09/24/2003 12:00AM Directory plants
09/24/2003 12:00AM Directory project_data
09/24/2003 12:00AM Directory public_utilities
10/03/2006 12:00AM Directory soils
09/24/2003 12:00AM Directory topographic_images
02/05/2004 12:00AM Directory transportation
09/24/2003 12:00AM Directory wetlands
09/24/2003 12:00AM Directory wildlife
09/24/2003 12:00AM Directory zoning
10/04/2006 12:00AM Directory SSURGO_Downloads
You are now beginning the study of geographic information science. This discipline of study is centered around the fundamentals and applications of geographic information systems or GIS for short.
So what is a GIS, and what can it do? To give you some idea, consider an example in natural resources management. Assume that you have been given the following tasks for a particular region (e.g., a local government area, state or country):
|
| Inventory available forest and mineral resources. |
|
| Obtain flora and fauna requirements. |
|
| Determine water availability and quality. |
|
| Examine extent of disease (e.g., dieback). |
|
| Identify which resources are protected or in short supply (e.g., national heritage listing). |
|
| Evaluate how resources are currently being exploited. |
|
| Predict how availability and quality of these resources will change in the next 10, 20 or even 100 years. |
|
| Assess conflicts with environment, quality of life, populated areas, visual impact, etc. |
|
| Comply with local, regional and national regulations and legislation. |
Quite a task, eh? The more you think about it, the more complex it becomes. Just imagine what you may need: lots (I mean lots!) of data, access to a range of departments and agencies, various software and hardware, many personnel, etc. Well...it can be done - you guessed it - using GIS!
What is a Geographic Information System?
Information system attributes (which also apply to GIS):
| decision-oriented reporting | |
| effective processing of data | |
| effective management of data | |
| adequate flexibility | |
| a satisfying user environment |
How do we formally define a GIS? No one definition exists since there are many different contexts in which GIS exists. A definition of GIS can be seen from a number of points of view.
The definition that we will use in this course takes into account the various components necessary for the successful establishment of any GIS:
| technology (hardware and software) | |
| people | |
| data |
Geographic Information System:
"An organised collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyse, and display all forms of geographically referenced data."
GIS are one of many different types of information systems. The traditional Management Information Systems and Decision Support Systems do not cater for spatial information. There are, however, spatial information systems that are not geographic, such as Computer Aided Design/Computer Aided Manufacturing (CAD/CAM) systems which do not handle a "geographic" component.
![]()
Other terms for GIS
The advantages of GIS are many and relate to the fact that GIS is an integrating technology - one that brings together many different applications, data and users. One word that can be used to describe the benefit of GIS is synergy.
In particular, the following can be cited as advantages of GIS:
| Integrates spatial and other (aspatial) data across a diverse range of applications | |
| Identifies connections between activities based on geographic proximity | |
| Manipulates and displays geographic knowledge | |
| Provides access to administrative records | |
| A tool for enhancing decision making | |
| Increases ability to model science and management problems | |
| A catalyst to further development |
The applications of GIS technology can be categorised into four broad areas:
|
| zoning - urban and regional | |
| subdivision planning and review | |
| environmental impact assessment | |
| water quality management | |
| maintenance of land ownership | |
| land valuation and taxation | |
| town planning schemes |
Infrastructure
| transport route planning | |
| street address matching | |
| location analysis, site selection | |
| disaster planning and evacuation | |
| usage and planning of roads, sewer and water reticulation, drainage, telephone lines, gas and electricity, etc. |
|
| population distribution and forecasting | |
| demographic marketing and analysis | |
| monitoring of patient health | |
| epidemiology | |
| police crime statistics and monitoring | |
| census information public services and access |
GIS have developed over time across a wide range of disciplines. As a matter of fact, the whole foundational concept of GIS is multi-disciplinary.
Many technical and conceptual developments within these areas have converged over time and have been integrated into what now is known as GIS.
Trends in GIS
Hardware
Software
New Application Areas
|
Trends in GIS (continued)
Data Issues
|
Commercial Software
|
The Three Schools of GIS
| Technician crowd |
| Application crowd |
| Computer Science/Programming crowd |
| Michael F. Goodchild. (1997) What is Geographic Information Science?, NCGIA Core Curriculum in GIScience, http://www.ncgia.ucsb.edu/giscc/units/u002/u002.html, posted October 7, 1997. |
all of these and many others are obtainable through online GIS 'bookstores':
| http://www.esri.com | |
| http://www.geoplace.com |
| GIS World - http://www.geoplace.com | |
| Geo Info Systems - http://www.geoinfosystems.com |
Web references
some cool sites that do GIS over the Web
sites of some major GIS software vendors
some other introductions to GIS
|
Using GIS to solve problems in the real world requires interaction between the real world, the GIS and the users.

The real world needs to be represented within a GIS. Users perceive the real world in a manner related to their problem, and hence need to be able to communicate with the GIS in terms related to that problem (i.e., data, functionality, etc.).
Geographic features in the real world can be represented in a number of ways as follows:
1. Analog map
| The traditional analog map has been in use for centuries. | |
| Divided into physical map sheets | |
| Based on the communication paradigm - emphasis is on visual communication |
2. Digital map
| Maps are stored in digital form on computers to create a cartographic database | |
| Still based on the "analog" map concept | |
| Has greatly enhanced the map-making process and the production of various types of maps. |
3. GIS
| A geographic database involves much more than a cartographic database (ie. much more than simply a map or maps) | |
| The emphasis is on the structure and management of data and their relationships | |
| Based on the analytical paradigm - focus is on analysis | |
| The concepts of GIS extend far beyond the map! |
Abstraction and generalisation
The process for obtaining a representation of the real world follows the cartographic process for abstraction and generalisation. The process involves the steps of selection, classification, simplification and symbolisation.
The process for obtaining a GIS representation must consider the purpose, content and detail of the database. This is similar to the cartographic map-making process in which the purpose, content, cartographic scale and presentation must be considered in producing a map.
In many ways, GIS have retained the notion of the map, and many map concepts reappear in GIS. However, the manner in which GIS handle and analyse data is very different from that for maps. This is despite the fact that much of the data input into GIS is derived from maps.
Within GIS, data is often structured in a layered fashion representing the way in which maps have traditionally been handled. Each layer, also known as a coverage, contains some specific data such as a theme (eg. roads, vegetation cover, soils, etc.), time period (eg. years 1970, 1980, 1990) or vertical slices (eg. ground floor, first floor, etc. of a building).
Geographic data includes both spatial data and descriptive (or attribute) data.
Spatial data deals with location, shape and relationships among features.
Attribute data deals with the characteristics of the features.
| data | |
| functionality, and | |
| a user interface. |
The database is the heart of the GIS. It must be structured so that the data can be accessed by functions initiated by users. In the following sections, we will consider the structure of the data as well as the functions that operate on the data.
The following chart illustrates the structure of geographic data.
The spatial component consists of locational information (ie. absolute or relative X,Y coordinates), geometry (ie. shape of point, line and polygon features [or raster cells]) and topology (ie. relationships between points, lines and polygons - adjacency, connectivity, and containment). Attribute data can consist of both descriptive data and cartographic attributes (eg. line color and thickness, point symbol, etc.). A third component is temporal data, which is sometimes considered as a further dimension (eg. fourth dimension) but is often included as another attribute of the data. Never forget metadata!
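As a rough illustration of the structure just described, the following Python sketch groups the spatial, attribute, temporal and metadata components of a single feature. All class and field names (`Feature`, `neighbours`, `symbology`, and so on) and the sample values are assumptions made for this example; they do not come from any particular GIS package.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feature:
    """One geographic feature (hypothetical layout, for illustration only)."""
    # Spatial component: location, geometry and topology
    geometry_type: str                       # "point", "line" or "polygon"
    coordinates: list                        # list of (x, y) tuples
    neighbours: list = field(default_factory=list)   # ids of adjacent/connected features
    # Attribute component: descriptive data and cartographic attributes
    attributes: dict = field(default_factory=dict)
    symbology: dict = field(default_factory=dict)
    # Temporal component, stored here as just another attribute
    valid_year: Optional[int] = None
    # Metadata: data about the data
    metadata: dict = field(default_factory=dict)


# A made-up road segment
road = Feature(
    geometry_type="line",
    coordinates=[(0.0, 0.0), (5.0, 2.0)],
    neighbours=[12, 17],
    attributes={"name": "Main Road", "lanes": 2},
    symbology={"color": "red", "width": 2},
    valid_year=1990,
    metadata={"source": "digitised from a paper map sheet"},
)
```

Keeping the spatial and attribute parts in one record is only one possible design; many systems store them separately and link them through a feature identifier.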
Types of GIS data
| stores pixels (picture elements) in an image and cells in a grid | |
| most closely represents continuous data |
| stores points, lines and polygons to represent the features | |
| most closely represents object-based data |
Note however that either data model can be used to store field-based or object-based data. Both models define a "discretisation" of the features (ie. grid cells or vector objects). In other words, continuous features are represented discretely.
The type of data model used within a GIS will affect not only the database, but also the functionality and the user interface. We will explore the functions for each type of data model in the following sections.
|
Raster vs. Vector
|
The meaning or semantics of the data values stored in a geographic database depend on the scale of measurement chosen:
Ratio:
| values are divisible and multiplicative (an absolute scale defined around zero (0)) |
| eg. rainfall of Region 1 is twice that of Region 2 |
Interval:
| values are additive and subtractive (on a relative scale) |
| eg. Region 1 is 10 degrees warmer than Region 2 |
Ordinal:
| values establish order (ranking) only |
| eg. Region A is most suitable (eg. value of 1) and Region B least suitable (eg. value of 5) |
Nominal:
| numbers establish identity only | |
| eg. lot numbers, postal code zones, etc. |
Note that values progressing from ratio to interval to ordinal to nominal are decreasing in the amount of information contained.
| Different scales of measurement can be used for the same phenomenon. Consider, for example, data representing petrol stations. Note how the scale of measurement cannot be determined from observing the values alone. |
| Ratio: 72.9, 68.5, 67.9, 61.3,... (petrol prices) |
| Interval: 25, 29, 30, 27,... |
| Ordinal: 1, 2, 3, 4,... |
| Nominal: 1, 2, 3, 4,... |
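Because the scale cannot be read off the values themselves, it has to be recorded alongside them. The sketch below is one way to express which comparisons are meaningful at each scale of measurement; the names `Scale` and `allowed_operations` and the operation labels are assumptions chosen for this example.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered by the amount of information they carry."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each scale adds one kind of meaningful operation to those of the scales below it.
_NEW_OPERATION = [
    (Scale.NOMINAL, "identity"),     # same / different
    (Scale.ORDINAL, "order"),        # ranking
    (Scale.INTERVAL, "difference"),  # addition and subtraction
    (Scale.RATIO, "ratio"),          # multiplication and division
]


def allowed_operations(scale):
    """Return the set of operations that are meaningful for values on `scale`."""
    return {op for s, op in _NEW_OPERATION if s <= scale}


# The values 1, 2, 3, 4 look the same whether they are ordinal or nominal,
# but only the ordinal version supports ranking.
print(sorted(allowed_operations(Scale.ORDINAL)))
print(sorted(allowed_operations(Scale.NOMINAL)))
```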
| A grid GIS is based on the raster data model. The foundational unit of storage is the grid cell. Square grid cells are most commonly used to store grid data. | |
| Each cell specifies the type or value of an attribute. Only one value is stored per grid cell. Note that if no data is recorded for that grid cell, then a value must still be stored - usually a zero (0) or a special "no data" symbol. A group of contiguous cells having an identical value is referred to as a region. | |
| Data is arranged in a matrix and located by coordinates which relate to the row and column numbers. Generally speaking, grid cells (matrices) are easy to store, manipulate and display. |
Because only one value is stored per grid cell, how do we store multiple values for a specific location? We use layering.
Data is stored using the layered concept - a theme or closely-related group of data items are stored in one layer. Hence, a grid database may consist of a number of layers, each representing some theme of information (eg. soils, roads, drill holes, etc.).
Each cell can contain one, and only one, data value for a given layer. Therefore, if multiple attributes are found for a particular theme of data (eg. soil type and pH value for the same soil area), then these attributes must be separated into two or more layers (eg. one for soil types, the other for pH values).
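The layering idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `NO_DATA` marker value, the tiny 3 x 3 grids and the function names `values_at` and `combine` are invented and are not taken from any grid GIS package.

```python
NO_DATA = -9999  # assumed "no data" marker; real packages choose their own

# Two layers covering the same area: one value per cell per layer.
soil_type = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, NO_DATA],
]
soil_ph = [
    [6.5, 6.4, 7.1],
    [6.6, 7.0, 7.2],
    [5.9, 6.0, NO_DATA],
]
layers = {"soil_type": soil_type, "soil_ph": soil_ph}


def values_at(layers, row, col):
    """Collect the value of every layer at one cell (row and column numbers)."""
    return {name: grid[row][col] for name, grid in layers.items()}


def combine(layer_a, layer_b, func):
    """Cell-by-cell operation on two layers that produces a new layer."""
    result = []
    for row_a, row_b in zip(layer_a, layer_b):
        new_row = []
        for a, b in zip(row_a, row_b):
            # A cell with no data in either input has no data in the output.
            new_row.append(NO_DATA if NO_DATA in (a, b) else func(a, b))
        result.append(new_row)
    return result


# New layer: 1 where soil type 1 has a pH below 6.6, otherwise 0.
acidic_type1 = combine(soil_type, soil_ph,
                       lambda t, ph: 1 if t == 1 and ph < 6.6 else 0)
```

The multiple values for one location are recovered by reading the same row and column from each layer, which is why all layers must share the same grid.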
Grid cell data
| punctual data: 0-dimensional data |
| lineal data: 1-D |
| areal data: 2-D |
| surficial data: 3-D (or more accurately 2.5-D) |
How do you identify the cell location? (depends on the software)
| Center of the cell? | |
| Top-right-hand corner? | |
| Bottom left-hand corner? |
What value do we give to the cell?
| Dominant feature of cell? |
| Most important feature of cell? |
| Mean value of features within cell? |
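The three questions above correspond to three different rules for assigning a single value to a cell that covers several features. The following is a minimal sketch, assuming each cell is described by a list of (value, area fraction) pairs; that input format, the sample data and the function names are assumptions made for this example.

```python
def dominant_value(parts):
    """Value that covers the largest share of the cell."""
    totals = {}
    for value, fraction in parts:
        totals[value] = totals.get(value, 0.0) + fraction
    return max(totals, key=totals.get)


def most_important_value(parts, priority):
    """First value in `priority` (most important first) that occurs in the cell."""
    present = {value for value, _ in parts}
    for value in priority:
        if value in present:
            return value
    raise ValueError("no prioritised value occurs in this cell")


def mean_value(parts):
    """Area-weighted mean of numeric values within the cell."""
    total_fraction = sum(fraction for _, fraction in parts)
    return sum(value * fraction for value, fraction in parts) / total_fraction


# A made-up cell: 60% forest, 30% water, 10% road.
cell = [("forest", 0.6), ("water", 0.3), ("road", 0.1)]
print(dominant_value(cell))                                      # forest
print(most_important_value(cell, ["road", "water", "forest"]))  # road
```

The two rules give different answers for the same cell, which is why the rule in use has to be known before a grid layer is interpreted.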
classification
area
perimeter
accuracy
etc...
|
A number of types of operations are available for grid data and may involve one or more layers resulting in a new layer being formed. The following list indicates some of the basic functions of a grid GIS.
| integer - eg. soil classes 1, 2, 3,..., or number of farms |
| real (decimal) values - eg. elevation, average persons per home |
| non-numeric or alpha-numeric values - eg. vegetation classes a, b, c,... |
Because vector data consist of three different data primitives (points, lines, polygons) instead of one as for grid (the grid cell), and because all the components (location, topology, attributes) need to be maintained, vector operations are in general more complex than raster operations. Both the spatial data and the attribute data must be handled. Further, the link between spatial and attribute data must be maintained.
Vector operations include:
Display and query
Data generalisation and abstraction
Data manipulation
|
Measurement
Topological overlay
Buffering
|
Generalisation and abstraction may involve:
Data generalisation and abstraction also may involve:
|
The spatial overlay of two coverages results in a new coverage which is subjected to planar enforcement. In an overlay operation, both the spatial and the attribute data must be updated to reflect the new geometry/topology/attributes.
A number of different types of spatial overlay exist:
| point-in-polygon | |
| line-on-polygon | |
| polygon overlay (polygon-on-polygon) |
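The simplest of these overlays, point-in-polygon, can be sketched with the well-known ray-casting test: count how many polygon edges a horizontal ray from the point crosses. This is an illustrative sketch only; the function names and the sample parcel are assumptions, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_point(point, polygons):
    """Return the attributes of the first polygon containing the point, or None.

    `polygons` is a list of (vertices, attributes) pairs, so the point
    inherits the attribute data of the polygon it falls in.
    """
    for vertices, attributes in polygons:
        if point_in_polygon(point[0], point[1], vertices):
            return attributes
    return None


parcel = ([(0, 0), (4, 0), (4, 4), (0, 4)], {"zoning": "residential"})
print(overlay_point((2, 2), [parcel]))
print(overlay_point((5, 2), [parcel]))
```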
Buffering must cater for:
The result of all buffer operations is a polygon coverage which must have appropriate attributes assigned. Each polygon must have an attribute that identifies whether or not it is a polygon inside the buffer or outside the buffer.
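The inside/outside attribute can be illustrated without building the buffer polygon itself: a location is inside the buffer of a line if its distance to the nearest segment is no greater than the buffer width. This is a minimal sketch in planar coordinates; the function names and sample road are assumptions for this example.

```python
import math


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point (px, py) to the segment from (ax, ay) to (bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)   # segment is a single point
    # Position of the closest point along the segment, clamped to its ends
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_line_buffer(point, line, width):
    """True if `point` lies within `width` of any segment of `line`."""
    px, py = point
    return any(
        distance_to_segment(px, py, ax, ay, bx, by) <= width
        for (ax, ay), (bx, by) in zip(line, line[1:])
    )


road = [(0, 0), (10, 0), (10, 10)]
print(in_line_buffer((5, 1), road, 2))   # close to the first segment
print(in_line_buffer((5, 5), road, 2))   # 5 units from either segment
```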
| Projection (assumes a spherical earth). ONLY a mathematical method for drawing a 3D object (the earth) on a 2D surface (map, computer screen). | |
| Datum - a modification on a sphere - a spheroid. In other words, a mathematical representation of the earth's shape | |
| Geoid - even more detail than a spheroid. It's an irregular surface. | |
| Coordinates - locations. Most common include lat/long (geographic), UTM, and State Plane. |
More on datums: the first in common use in the US was Clarke 1866. Following in chronological order are NAD27, GRS80, NAD83, and WGS84. Those from 1980 on are based on the center of the earth; the earlier ones are based on a point on the surface of the earth. To give an idea of the significance of a datum, x,y coordinates can differ by up to 300 meters between the NAD27 and NAD83 datums.
More on the geoid: defining the shape of the earth is the field of geodesy. In theory, the geoid runs through sea level and is a representation of the gravity field of the earth. It describes the earth's shape in far more detail than a datum's spheroid does.
Displaying Map Projections - A map projection is a mathematical method for projecting the surface of a globe onto a sheet of paper (3D to 2D).
Only a few (maybe about 20) are in actual use, although there are hundreds out there.
distortion:
| there is always distortion (period) | |
| there are different types of distortion | |
| there are different degrees of distortion, depending on where you are on the map and which type of projection you select. |
Map Classification
Describing the different projections (2 methods)
1) by geometric construction
| conic - projected onto a cone | |
| cylindrical - projected onto a cylinder | |
| azimuthal - projected onto a plane |
modified by aspect
modified by case
|
2) by Preserved properties:
| Area: correct relative size (equal-area or equivalent projections). Examples: cylindrical equal-area, sinusoidal, Mollweide, Eckert IV | |
| Angle: correct shapes (conformal projections). Note that area and angle are mutually exclusive. Examples: Mercator, Lambert conformal conic | |
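| Distance: distances between points are correct (equidistant). Example: azimuthal equidistant | |
| Azimuth: directions from the center are correct (azimuthal). On the gnomonic projection, great circles are straight lines. Often viewed with the pole at the center. Examples: stereographic, gnomonic | |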
| Compromise: no property is preserved exactly, but none is severely distorted. Example: Robinson |
Note that in both distance-preserving and azimuth-preserving projections, the property holds only to or from the center of the map projection. Also, they usually show only one hemisphere (half the earth) or less, as with the gnomonic.
These can be modified by interruptions
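To make the idea of a projection as "only a mathematical method" concrete, here is a hedged sketch of two of the projections named above, using their standard spherical formulas: Mercator (conformal) and sinusoidal (equal-area). The earth radius constant is an assumed mean value, the function names are invented for this example, and a spherical earth is assumed throughout.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean radius of a spherical earth


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Spherical Mercator (cylindrical, conformal). Undefined at the poles."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def sinusoidal(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Sinusoidal (equal-area): meridians are squeezed by cos(latitude)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam * math.cos(phi)
    y = radius * phi
    return x, y


# The same location lands at different map coordinates in each projection,
# which is one way of seeing that the distortion differs between them.
print(mercator(90.0, 60.0, radius=1.0))
print(sinusoidal(90.0, 60.0, radius=1.0))
```

At 60 degrees latitude Mercator stretches the north-south coordinate beyond the true arc length, while the sinusoidal projection keeps it and halves the east-west coordinate instead.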
We discussed the URISA Ethics in GIS page. It can be found at http://www.urisa.org/ethics/code_of_ethics.htm
|
History of GIS
|
| US Census Bureau | |
| ESRI | |
| Tie in with previous trends in GIS lecture |
Week 5 Lectures
This section was edited by Gary Hunter, Department of Geomatics, University of Melbourne, Australia.
This unit is part of the NCGIA Core Curriculum in Geographic Information Science. These materials may be used for study, research, and education, but please credit the authors Howard Veregin, and the project, NCGIA Core Curriculum in GIScience. All commercial rights reserved. Copyright 1998 by Howard Veregin.
| What is quality? |
| Quality is commonly used to indicate the superiority of a manufactured good or to indicate a high degree of craftsmanship or artistry. We might define it as the degree of excellence in a product, service or performance. | |
| In manufacturing, quality is a desirable goal achieved through management and control of the production process (statistical quality control). (Redman, 1992) | |
| Many of the same issues apply to the quality of databases, since a database is the result of a production process, and the reliability of the process imparts value and utility to the database. |
| Why is there a concern for DQ? |
| Increased data production by the private sector, where there are no required quality standards. In contrast, production of data by national mapping agencies (e.g., US Geological Survey, British Ordnance Survey) has long been required to conform to national accuracy standards (i.e., mandated quality control). | |
| Increased use of GIS for decision support, such that the implications of using low-quality data are becoming more widespread (including the possibility of litigation if minimum standards of quality are not attained). | |
| Increased reliance on secondary data sources, due to the growth of the Internet, data translators and data transfer standards. Thus, poor-quality data is ever easier to get. |
| Who assesses DQ? |
| Minimum quality standards: this is a form of quality control where DQ assessment is the responsibility of the data producer. It is based on compliance testing strategies to identify databases that meet quality thresholds defined a priori. | |
| An example is NMAS, the National Map Accuracy Standards adopted by the US Geological Survey in 1946. | |
| This approach lacks flexibility; in some cases a particular test may be too lax while in others it may be too restrictive. |
| Metadata standards: this model views error as inevitable and does not impose a minimum quality standard a priori. Instead, it is the consumer who is responsible for assessing fitness-for-use; the producer’s responsibility is documentation, i.e., “truth-in-labeling.” | |
| An example is SDTS, the Spatial Data Transfer Standard. | |
| This approach is flexible, but there is still no feedback from the consumer, i.e., there is a one-way information flow that inhibits the producer’s ability to correct mistakes. |
| Market standards: this model uses a two-way information flow to obtain feedback from users on data quality problems. Consumer feedback is processed and analyzed to identify significant problems and prioritize repairs. | |
| An example is Microsoft’s Feedback Wizard, a software utility that lets users email reports of map errors. | |
| This model is useful in a market context in order to ensure that databases match users’ needs and expectations. |
| Accuracy is the inverse of error. Many people equate accuracy with quality but in fact accuracy is just one component of quality. |
| Definition of accuracy is based on the entity-attribute-value model. |
| Entities = real-world phenomena | |
| Attribute = relevant property | |
| Values = Quantitative/qualitative measurements |
| An error is a discrepancy between the encoded and actual value of a particular attribute for a given entity. “Actual value” implies the existence of an objective, observable reality. However, reality may be: |
| Unobservable (e.g., historical data) | |
| Impractical to observe (e.g., too costly) | |
| Perceived rather than real (e.g., subjective entities such as “neighborhoods") |
| In fact, it is not necessary to posit an objective reality in order to assess accuracy, since all geographical data are collected with the aid of a model that specifies -- implicitly or explicitly -- the required level of abstraction and generalization. |
| This is the database “specification” and is closely related to the “terrain nominal” concept of perceived reality (Salgé, 1995). | |
| The specification serves as the standard against which accuracy is assessed. Thus the “actual” value is the value we would expect based on the specification (Brassel et al., 1995). | |
| Accuracy is always a relative measure, since it is always measured relative to the specification. | |
| To judge fitness-for-use, one must judge the data relative to the specification, and also consider the limitations of the specification itself (CEN, 1995). |
| Spatial accuracy is the accuracy of the spatial component of the database. The metrics used depend on the dimensionality of the entities under consideration. |
| For points, accuracy is defined in terms of the distance between the encoded location and “actual” location. |
| Error can be defined in various dimensions: x, y, z, horizontal, vertical, total. | |
| Metrics of error are extensions of classical statistical measures (mean error, RMSE or root mean squared error, inference tests, confidence limits, etc.) (American Society of Civil Engineers 1983; American Society of Photogrammetry 1985; Goodchild 1991a). |
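For points, the metrics mentioned above are straightforward to compute. The sketch below derives horizontal errors from encoded and "actual" coordinates and then the mean error and RMSE; the function names and the sample coordinates are assumptions made for this example.

```python
import math


def horizontal_errors(encoded, actual):
    """Distance between each encoded (x, y) location and its 'actual' location."""
    return [
        math.hypot(ex - ax, ey - ay)
        for (ex, ey), (ax, ay) in zip(encoded, actual)
    ]


def mean_error(errors):
    """Arithmetic mean of the error distances."""
    return sum(errors) / len(errors)


def rmse(errors):
    """Root mean squared error: square, average, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up check points: encoded positions versus surveyed positions.
encoded = [(103.0, 204.0), (50.0, 50.0)]
actual = [(100.0, 200.0), (50.0, 50.0)]
errors = horizontal_errors(encoded, actual)
print(errors, mean_error(errors), rmse(errors))
```

RMSE weights large errors more heavily than the mean error does, so the two can differ noticeably when a few points are badly placed.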
| For lines and areas, the situation is more complex. This is because error is a mixture of positional error (error in locating well-defined points along the line) and generalization error (error in the points selected to represent the line) (Goodchild 1991b). |
| The epsilon band is usually used to define a zone of uncertainty around the encoded line, within which “actual” line exists with some probability. | |
| However, there is little agreement (and little empirical work) on the shape of the band, both planimetrically and in cross-section (Chrisman, 1982; Blakemore, 1983; Honeycutt, 1986; Caspary and Scheuring, 1993). |
| Temporal accuracy is the agreement between the encoded and “actual” temporal coordinates for an entity. |
| Temporal coordinates are often only implicit in geographical data, e.g., a time stamp indicating that the entity was valid at some time. Often this is applied to the entire database (e.g., a map dated “1995”). |
| More realistically, temporal coordinates are the temporal limits within which the entity is valid (e.g., Pothole Q54D-35-021 existed between 2/12/96 and 8/9/96). |
| Temporal accuracy is not the same as “database time”, which is the time the information was entered into the database. |
| Temporal accuracy is not the same as “currentness” (or up-to-dateness) which is actually an assessment of how well the database specification meets the needs of a particular application. A database can be temporally accurate but still out of date; historical applications depend on such data. |
| Thematic accuracy is the accuracy of the attribute values encoded in a database. |
| The metrics used here depend on the measurement scale of the data: |
| Quantitative data (e.g., precipitation) can be treated like a z-coordinate (elevation) and assessed using metrics normally used for vertical error (such as the RMSE). See section 2.1. | |
| Qualitative data (e.g., land use/land cover) is normally assessed using a cross-tabulation of encoded and “actual” classes at a sample of locations. This produces a classification error matrix (confusion matrix). |
| Element in row i, column j of the matrix is the number of sample locations assigned to class i but actually belonging to class j. | |
| The sum of the main diagonal divided by the number of samples is a simple measure of overall accuracy. | |
| An error of omission means a sample that has been omitted from its actual class. An error of commission means a sample that has been included in the wrong class. Every error of omission is also an error of commission. | |
| There is a large body of research on this topic (e.g., van Genderen and Lock, 1977; Congalton et al., 1983; Aronoff, 1985; Rosenfield and Fitzpatrick-Lins, 1986). |
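The classification error matrix and the measures derived from it can be sketched as follows, using the convention stated above (rows are encoded classes, columns are "actual" classes). The two-class matrix is invented sample data and the function names are assumptions for this example.

```python
def overall_accuracy(matrix):
    """Sum of the main diagonal divided by the number of samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def commission_error(matrix, k):
    """Share of samples encoded as class k that actually belong elsewhere (row k)."""
    row_total = sum(matrix[k])
    return (row_total - matrix[k][k]) / row_total


def omission_error(matrix, k):
    """Share of samples actually in class k that were encoded as something else (column k)."""
    column_total = sum(row[k] for row in matrix)
    return (column_total - matrix[k][k]) / column_total


# matrix[i][j] = samples encoded as class i but actually belonging to class j
matrix = [
    [40, 5],    # encoded as class 0
    [10, 45],   # encoded as class 1
]
print(overall_accuracy(matrix))
print(commission_error(matrix, 0), omission_error(matrix, 0))
```

The 10 samples omitted from class 0 are the same 10 samples committed to class 1, which illustrates why every error of omission is also an error of commission.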
3. Resolution (precision)
| Resolution (or precision) refers to the amount of detail that can be discerned in space, time or theme. Resolution is always finite because no measurement system is infinitely precise, and because databases are intentionally generalized to reduce detail (Veregin and Hargitai, 1995). |
| Resolution is an aspect of the database specification that determines how useful a given database may be for a particular application. High resolution is not always better; low resolution may be desirable when one wishes to formulate general models. |
| Resolution is linked with accuracy, since the level of resolution affects the database specification against which accuracy is assessed. Two databases with the same overall accuracy levels but different levels of resolution do not have the same quality; the database with the lower resolution has less demanding accuracy requirements. (For example, thematic accuracy will tend to be higher for general land use/land cover classes like “urban” than for specific classes like “residential”.) |
| Spatial resolution is well-defined in the context of raster data, where it refers to the linear dimension of a cell. |
| For vector data resolution might be defined as the minimum mapping unit size. Sometimes mean polygon size is used instead, but this is erroneous since smaller polygons may be observable but just not present on the map. |
| Temporal resolution is the length (temporal duration) of the sampling interval. |
| For example, the shorter the shutter speed of a camera, the higher the temporal resolution (other factors being equal). | |
| Temporal resolution affects the minimum duration of an event that is discernible. If the duration is less than the resolution, the event is invisible or at best leaves a smudge (like carriages on nineteenth-century daguerreotypes). |
| Temporal resolution is distinct from temporal sampling rate. |
| Resolution is the length of the sampling interval, while sampling rate is the frequency of sampling over time (e.g., once a day, once a week, etc.). | |
| For example, a motion picture camera might have a temporal resolution of 1/1000 second (i.e., the shutter speed used to capture a single frame) and a sampling rate of 24 frames per second. |
| Thematic resolution refers to the precision of the measurements or categories for a particular theme. |
| For categorical data, resolution is the fineness of category definitions (e.g., “urban” vs. “residential” and “commercial”). | |
| For quantitative data, thematic resolution is analogous to spatial resolution in the z-dimension (i.e., the degree to which small differences in the quantitative attribute can be discerned). |
| Consistency refers to the absence of apparent contradictions in a database. Consistency is a measure of the internal validity of a database, and is assessed using information that is contained within the database. |
| Consistency can be defined with reference to the three dimensions of geographical data. |
| Spatial consistency includes topological consistency, or conformance to topological rules, e.g., all one-dimensional objects must intersect at a zero-dimensional object (Kainz, 1995). | |
| Temporal consistency is related to temporal topology, e.g., the constraint that only one event can occur at a given location at a given time (Langran, 1992). | |
| Thematic consistency refers to a lack of contradictions in redundant thematic attributes. For example, attribute values for population, area, and population density must agree for all entities. |
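The population, area and density example of thematic consistency lends itself to an automated check that uses only information inside the database. The sketch below flags entities whose stored density disagrees with population divided by area; the record layout, field names, tolerance and sample values are all assumptions made for this example.

```python
import math


def density_is_consistent(population, area, density, rel_tol=1e-6):
    """True if the stored density agrees with population / area."""
    if area <= 0:
        return False   # a non-positive area is itself a contradiction
    return math.isclose(population / area, density, rel_tol=rel_tol)


def inconsistent_entities(records):
    """Return the ids of records whose redundant attributes contradict each other."""
    return [
        rec["id"]
        for rec in records
        if not density_is_consistent(rec["population"], rec["area"], rec["density"])
    ]


# Made-up records; the second one has a density that does not match.
records = [
    {"id": "A", "population": 1000, "area": 50.0, "density": 20.0},
    {"id": "B", "population": 1000, "area": 50.0, "density": 25.0},
]
print(inconsistent_entities(records))
```

Note that such a check shows only that the attributes disagree, not which of them is wrong, since consistency is a measure of internal validity rather than accuracy.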
| Completeness refers to a lack of errors of omission in a database. It is assessed relative to the database specification, which defines the desired degree of generalization and abstraction (selective omission). |
| There are two kinds of completeness (Brassel et al., 1995) |
| “Data completeness” is a measurable error of omission observed between the database and the specification. Even highly generalized databases can be “data complete” if they contain all of the objects described in the specification. | |
| “Model completeness” refers to the agreement between the database specification and the “abstract universe” that is required for a particular database application. A database is “model complete” if its specification is appropriate for a given application. |
| Incompleteness can be measured in space, time or theme. Consider a database of buildings in Minnesota that have been placed on the National Register of Historic Places as of the end of 1995. |
| Spatial incompleteness: The list contains only buildings in Hennepin County (one county in Minnesota, rather than all of Minnesota). | |
| Temporal incompleteness: The list contains only buildings placed on the Register by June 30, 1995. | |
| Thematic incompleteness: The list contains only residential buildings. |
| Errors of commission can also be assessed. These errors can lead to “over-completeness”. |
| Errors of commission in space, time and theme for the previous example: The list also contains buildings in Wisconsin; the list contains buildings added to the list in 1996; the list contains historic districts as well as buildings. |
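Errors of omission and commission can both be expressed as set differences between what the specification calls for and what the database holds. A minimal sketch, assuming objects can be matched by a unique identifier (the ids below are made up for illustration):

```python
def completeness_errors(specified_ids, database_ids):
    """Compare the objects required by the specification with those stored."""
    specified, stored = set(specified_ids), set(database_ids)
    return {
        "omission": sorted(specified - stored),    # required but missing
        "commission": sorted(stored - specified),  # present but not required
    }


specified = ["bldg-1", "bldg-2", "bldg-3"]
database = ["bldg-1", "bldg-3", "bldg-9"]
errors = completeness_errors(specified, database)
print(errors)
```

The same comparison can be run per dimension, e.g. restricting the specified set to one county, one date range, or one building type.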
| Data quality is the degree of excellence in a database. Quality is assessed relative to the database specification, which defines the desired level of generalization and abstraction. The quality of this specification, and its appropriateness for particular applications, can also be assessed. |
| Quality assessment and reporting is based on minimum quality standards (compliance testing or quality control), metadata standards (truth-in-labeling and fitness-for-use), or market standards (feedback from users). |
| Data quality contains several components, including accuracy, precision, consistency and completeness. Each component can be assessed in space, time and theme (the three basic dimensions of geographical data). |
| Various assessment methods can be used for each component/dimension combination. Some methods are well-developed and others are not. |
| American Society of Civil Engineers (Committee on Cartographic Surveying, Surveying and Mapping Division) 1983 Map uses, scales and accuracies for engineering and associated purposes. New York: American Society of Civil Engineers. |
| American Society of Photogrammetry (Committee for Specifications and Standards, Professional Practice Division) 1985 Accuracy specification for large-scale line maps. Photogrammetric Engineering and Remote Sensing 51: 195-199. |
| Aronoff S 1985 The minimum accuracy value as an index of classification accuracy. Photogrammetric Engineering and Remote Sensing 51: 99-111. |
| Beard M K 1989 Use error: The neglected error component. Proceedings, Auto Carto 9: 808-817. |
| Berry B 1964 Approaches to regional analysis: A synthesis. Annals, Association of American Geographers 54: 2-11. |
| Blakemore M 1983 Generalisation and error in spatial data bases. Cartographica 21: 131-139. |
| Brassel K, Bucher F, Stephan E-M and Vckovski A 1995 Completeness. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 81-108. |
| Burrough P A 1986 Principles of geographical information systems for land resources assessment. Oxford, Clarendon. |
| Campbell W G and Mortenson D C 1989 Ensuring the quality of geographic information system data. Photogrammetric Engineering and Remote Sensing 55: 1613-1618. |
| Caspary W and Scheuring R 1993 Positional accuracy in spatial databases. Computers, Environment and Urban Systems 17: 103-110. |
| Chrisman N R 1982 A theory of cartographic error and its measurement in digital data bases. Proceedings, Auto Carto 5: 159-168. |
| Chrisman N R 1991 The error component in spatial data. In Maguire D J, Goodchild M F and Rhind D W (eds) Geographical information systems. New York, Wiley: 165-174. |
| Comité Européen de Normalisation (CEN) 1995 Geographic Information - Data Description - Quality (Draft). Brussels: CEN Central Secretariat. |
| Congalton R G, Oderwald R G and Mead R A 1983 Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques. Photogrammetric Engineering and Remote Sensing 49: 1671-1678. |
| Duecker G T and Platt J T 1990 The role of automated data checks in the quality assurance of GIS data bases. GIS/LIS '90: 264-271. |
| Federal Geographic Data Committee (FGDC) 1994 Content Standards for Digital Geospatial Metadata (June 8). Washington DC: Federal Geographic Data Committee. |
| Fegeas R G, Cascio J L and Lazar R A 1992 An overview of FIPS 173, The Spatial Data Transfer Standard. Cartography and Geographic Information Systems 19: 278-93. |
| Goodchild M F 1988a Stepping over the line: Technological constraints and the new cartography. The American Cartographer 15: 311-319. |
| Goodchild M F 1988b The issue of accuracy in global databases. In Mounsey H (ed) Building Databases for Global Science. London, Taylor and Francis: 31-48. |
| Goodchild M F 1991a Issues of quality and uncertainty. In Muller J C (ed) Advances in cartography. London, Elsevier: 113-139. |
| Goodchild M F 1991b Keynote address. Proceedings, Symposium on Spatial Database Accuracy: 1-16. |
| Goodchild M F 1995 Sharing imperfect data. In Onsrud H J and Rushton G (eds) Sharing geographic information. New Brunswick NJ, Center for Urban Policy Research: 413-425. |
| Guptill S C 1993 Describing spatial data quality. Proceedings, 16th International Cartographic Conference: 552-560. |
| Honeycutt D M 1986 Epsilon, generalization and probability in spatial data bases. Unpublished manuscript. |
| Kainz W 1995 Logical consistency. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 109-137. |
| Langran G 1992 Time in geographic information systems. London: Taylor and Francis. |
| Lanter D 1991 Design of a lineage-based meta-database for GIS. Cartography and Geographic Information Systems 18(4): 255-261. |
| Lanter D and Veregin H 1992 A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering and Remote Sensing 58: 526-533. |
| Moellering H (ed) 1991 Spatial database transfer standards: Current international status. London: Elsevier. |
| Parkes D N and Thrift N J 1980 Times, spaces, and places: A chronogeographic perspective. New York: Wiley. |
| Redman T C 1992 Data quality. New York: Bantam. |
| Rosenfield G H and Fitzpatrick-Lins K 1986 A coefficient of agreement as a measure of thematic classification accuracy. Photogrammetric Engineering and Remote Sensing 52: 223-227. |
| Salgé F 1995 Semantic accuracy. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 139-151. |
| SDTS 1992 The Spatial Data Transfer Standard (FIPS-173). |
| Sinton D 1978 The inherent structure of information as a constraint in analysis. In Dutton G (ed) Harvard papers on geographic information systems. Reading MA, Addison-Wesley. |
| Stearns F 1968 A method for estimating the quantitative reliability of isoline maps. Annals, Association of American Geographers 58: 590-600. |
| Thapa K and Bossler J 1992 Accuracy of spatial data used in geographic information systems. Photogrammetric Engineering and Remote Sensing 58(6): 835-841. |
| Tychon G G and Johnson M R 1990 GIS data exchange: Standards and formats. In Heit M and Shortreid A (eds) GIS applications in natural resources. Boulder CO, GIS World Inc: 155-161. |
| van Genderen J L and Lock B F 1977 Testing land-use map accuracy. Photogrammetric Engineering and Remote Sensing 43: 1135-1137. |
| Veregin H and Hargitai P 1995 An evaluation matrix for geographical data quality. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 167-188. |
Data Input
| Data input is the number one bottleneck in GIS applications, often accounting for 80% or more of the project cost. |
Need to automate data input process, but that causes
problems. Common problems/considerations include:
| |||||||||||||||
| Modes of data input include the keyboard, scanners, direct conversion from other digital data, and voice input |
Socio-Economic data
Generally include: (may be aggregate or disaggregate)
| |||||||||||||
Sources of Socio-economic data
| |||||||||||||
Issues in using socio-economic data
|
Environmental and Natural Resource Data
Purposes of resource-based GIS are primarily
| |||||||||
| Contents of Environmental databases include not only natural information, but info regarding any human impacts on the area. | |||||||||
General notes on resource data
| |||||||||
| We then went over an example of all the data that might be required for siting a waste incinerator. |
National Spatial Data Infrastructure (NSDI)
| Concept not new | |||||||||||||||||||||||||
| more and more people requiring spatial data | |||||||||||||||||||||||||
| to maximise benefits, datasets must be accessible | |||||||||||||||||||||||||
benefits: decisions only as good as the information upon
which they are based.
| |||||||||||||||||||||||||
Current shortcomings and obstacles
|
| |
The process of
data capture must involve both spatial and attribute data. If they
are considered separately, then more effort usually is required to
integrate both components within a GIS database. It is better to
consider both components in planning the capture process to ensure
that appropriate IDs and tags are assigned to enable attributes to
be correctly attached to the appropriate spatial units.
Essentially three general operations are
required for the capture of geographic data:
- Entering the spatial data
- Entering the non-spatial (attribute) data
- Linking the spatial data to the non-spatial data
Most attribute data is entered via a keyboard into a database. Often, much attribute data exists prior to the GIS being built. In terms of volume, the vast majority of a GIS database usually consists of attribute data. The entry of attribute data and its pitfalls are well known, and hence we will concentrate primarily on the spatial component of geographic data.
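The third operation, linking spatial data to attribute data, depends on the IDs assigned during capture. A minimal sketch, assuming both the spatial features and the attribute rows carry a shared `id` field (the field names and values are hypothetical):

```python
def link_attributes(features, attribute_rows, key="id"):
    """Attach attribute rows to spatial features that share the same ID."""
    table = {row[key]: row for row in attribute_rows}
    linked, unmatched = [], []
    for feat in features:
        row = table.get(feat[key])
        if row is None:
            unmatched.append(feat[key])      # spatial unit with no attributes
        else:
            linked.append({**feat, **row})   # merge geometry and attributes
    return linked, unmatched


features = [
    {"id": 101, "coords": [(0, 0), (5, 0)]},
    {"id": 102, "coords": [(5, 0), (9, 3)]},
]
attribute_rows = [{"id": 101, "width_m": 7.5, "speed_limit": 50}]
linked, unmatched = link_attributes(features, attribute_rows)
print(linked, unmatched)
```

Features that come back as unmatched are the ones whose IDs or tags were not planned consistently across the two capture streams.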
The process of manually
entering spatial data is dependent on whether a grid-based or a
vector-based database is to be generated.
Vector
|
Grid
|
The Global Positioning System, more commonly referred to as GPS,
is used to compute positions in two- or three-dimensional space from
signals obtained from a series of NAVSTAR satellites. It is owned by
the U.S. Department of Defense.
Satellite
details:
|
The Russian equivalent is GLONASS - Global Navigation Satellite System.
| "standard" uncorrected, x and y to 15 meters. Z is a bit worse | |
| Differential GPS - correct using base station data. x and y to 1-5 meters | |
| RTK - Real time kinematic - accuracies at the centimeter level. |
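The idea behind differential correction can be illustrated in a simplified form: a base station at a surveyed location measures its own GPS position, and the observed error is used to correct a nearby rover. The sketch below applies the correction directly to positions, which is an assumption made for clarity; real differential systems correct the satellite range measurements rather than the final coordinates. All coordinates are made up.

```python
def differential_correction(base_known, base_measured, rover_measured):
    """Shift the rover fix by the error observed at the base station."""
    dx = base_known[0] - base_measured[0]
    dy = base_known[1] - base_measured[1]
    return (rover_measured[0] + dx, rover_measured[1] + dy)


corrected = differential_correction(
    base_known=(1000.0, 2000.0),     # surveyed base station position
    base_measured=(1008.0, 1995.0),  # what GPS reported at the base
    rover_measured=(1508.0, 2495.0), # uncorrected rover fix
)
print(corrected)
```

This only helps when the rover is close enough to the base station that both receivers see roughly the same error.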
A digitiser consists of a tablet with an electronic mesh and a cursor (also known as a puck or mouse). A map sheet is mounted on the digitiser and the cursor is used to enter required points or trace the desired lines on the map. Lines are digitised as a series of points which can then be processed by the software (e.g. editing, conversion to raster format, etc.).
Digitising is largely a manual process that depends on the concentration and skill of the digitising operator. It is both time-consuming and tedious.
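Because a digitised line is stored as a series of points, converting it to raster format amounts to finding the grid cells the line passes through. The following is a naive sketch that samples each segment at half-cell intervals; it is an illustration only, not the algorithm any particular GIS uses, and the function name and (row, column) cell convention are assumptions.

```python
import math


def rasterize_line(points, cell_size):
    """Return the (row, col) grid cells touched by a digitised polyline."""
    cells = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        # Sample at least every half cell so no crossed cell is skipped
        # along axis-aligned segments; diagonal corner clips may be missed.
        steps = max(1, math.ceil(length / (cell_size / 2)))
        for i in range(steps + 1):
            t = i / steps
            x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            cell = (int(y // cell_size), int(x // cell_size))
            if cell not in cells:
                cells.append(cell)
    return cells


cells = rasterize_line([(0.5, 0.5), (3.4, 0.5)], cell_size=1.0)
print(cells)
```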
| Source map
- may be: |
| Digitiser resolution: |
| Digitising process
- dependent on factors such as: |
During the digitising process, a number of errors may occur. Some of these errors can be prevented or alleviated during the digitising process, while others can be corrected/edited in a following step (either automatically or manually). Such potential errors (not all are necessarily incorrect) are discussed in the following sections and include:
| Dangle nodes | |
| Pseudo nodes | |
| Multiple or missing labels | |
| Sliver polygons | |
| etc. |
Dangles, also referred to as Dangling Nodes, are identified as nodes with only one line (arc) attached. They include both overshoots (extended too far past another line) and undershoots (not quite reaching another line). In such a case, the line was intended to connect up with another line and hence overshoots and undershoots are errors.
However, dangles are also identified with lines that are not connected to another line at one end, such as cul-de-sacs or dead-end streets in a road network. Obviously, such dangles are NOT errors.
Correcting dangles (that are errors!) can be partially automated by setting a dangle length tolerance value which specifies the maximum distance within which nodes will automatically be "snapped" to a line. Any dangles falling outside this tolerance level will have to be corrected manually in the editing process.
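The dangle test and the tolerance check can be sketched as follows. This is a minimal illustration, assuming each arc is reduced to its two end nodes and that a dangle is a candidate for snapping when it lies within the tolerance of some other arc; the function names are hypothetical and do not correspond to any GIS package.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def find_dangles(arcs):
    """A dangle is a node with only one arc attached."""
    counts = Counter()
    for start, end in arcs:
        counts[start] += 1
        counts[end] += 1
    return [node for node, n in counts.items() if n == 1]


def classify_dangles(arcs, tolerance):
    """Split dangles into those that can be snapped and those left for manual review."""
    snappable, manual = [], []
    for node in find_dangles(arcs):
        others = [arc for arc in arcs if node not in arc]
        nearest = min(
            (point_segment_distance(node, a, b) for a, b in others),
            default=math.inf,
        )
        (snappable if nearest <= tolerance else manual).append(node)
    return snappable, manual


arcs = [
    ((0, 0), (10, 0)),   # a road
    ((5, 5), (5, 0.2)),  # side street that stops 0.2 short of the road
]
snappable, manual = classify_dangles(arcs, tolerance=0.5)
print(snappable, manual)
```

The nodes left in `manual` are not necessarily errors; dead-end streets will show up there too, which is why a person still has to review them.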
Pseudo nodes
are nodes that have only two arcs
attached to them. This often occurs when two lines are digitised and are
connected together at one end, but the connection point does not occur
at a junction with other lines (i.e. the node breaks up a long and
complex line such as a contour line).
If the lines are intended to represent the same entity with the same
attributes, then the pseudo node is an error and should be removed.
In other cases, pseudo nodes do not cause any problems, but can be
removed to simplify the storage and provide a "cleaner"
representation.
In certain cases, pseudo nodes
are REQUIRED and therefore certainly are not errors. The most obvious
example is an island polygon
where the bounding line begins and ends at the SAME node.
Removing a pseudo node involves joining the two (different!) arcs on either side of it into one. This usually means ensuring that the attributes of each arc are the same so that there is no problem in merging them. It may also be possible to specify a "snapping distance" tolerance value which will cause all nodes within a specified distance to "snap" together, thereby eliminating some pseudo nodes.
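The rules above (two arcs at a node, same attributes, island polygons excepted) can be sketched as a small check. This assumes each arc is a record with hypothetical `start`, `end` and `attrs` fields; it only identifies the nodes and does not perform the merge.

```python
from collections import defaultdict


def find_pseudo_nodes(arcs):
    """Return (removable, required) pseudo nodes."""
    ends = defaultdict(list)
    for arc in arcs:
        ends[arc["start"]].append(arc)
        ends[arc["end"]].append(arc)

    removable, required = [], []
    for node, attached in ends.items():
        if len(attached) != 2:
            continue  # not a pseudo node
        first, second = attached
        if first is second:
            required.append(node)   # island: one arc begins and ends here
        elif first["attrs"] == second["attrs"]:
            removable.append(node)  # two different arcs, same entity
    return removable, required


arcs = [
    {"id": "c1", "start": (0, 0), "end": (5, 0), "attrs": {"elev": 100}},
    {"id": "c2", "start": (5, 0), "end": (10, 0), "attrs": {"elev": 100}},
    {"id": "island", "start": (20, 20), "end": (20, 20), "attrs": {"type": "lake"}},
]
removable, required = find_pseudo_nodes(arcs)
print(removable, required)
```

Pseudo nodes joining arcs with different attributes are reported in neither list, since merging them would lose information.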
| be more careful in digitising (enter line once!), | |
| try to use existing (already digitised!) lines for boundaries that are common to two or more features (e.g. soil types and river boundaries, vegetation and lake boundaries, etc.), or | |
| eliminate all polygons with an area less than a specified tolerance value |
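The last remedy, eliminating polygons below an area tolerance, can be sketched with the shoelace formula for polygon area. This is a minimal illustration that simply drops small polygons; the function names are hypothetical, and a real GIS would more likely merge each small polygon into a neighbour so that no gaps are left.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_small_polygons(polygons, min_area):
    """Keep only polygons whose area reaches the tolerance value."""
    return [p for p in polygons if polygon_area(p) >= min_area]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
thin = [(0, 0), (10, 0), (10, 0.01), (0, 0.01)]  # long, very narrow polygon
kept = drop_small_polygons([square, thin], min_area=1.0)
print(len(kept), polygon_area(square))
```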
Such errors may include:
Missing arcs
Too many vertices
Weird polygons
Weird polygons are tiny "knots" usually caused by digitising errors where lines are accidentally crossed. In the "cleaning" process, the intersection points are identified resulting in additional nodes and lines (and the weird polygon). They can be removed by deleting the offending line AND node.
Tolerance values
For the digitising process in a GIS, a number of tolerance values can usually be specified to prevent some errors from occurring:
Most GIS provide a range of tolerance values that can be either preset or specified in the editing operation. For example, weed and snap distances can be preset when digitising using the ArcEdit module of ArcInfo. In a subsequent process of "cleaning" lines and line junctions, and "building" topology, ArcInfo provides the option of setting dangle lengths and fuzzy tolerances. Default values are used if no options are specified.
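As an illustration of how one such tolerance works, the sketch below applies a weed distance to a digitised line. It assumes the weed distance means the minimum spacing allowed between consecutive vertices along an arc, and it is a simplified stand-in rather than the ArcInfo implementation. It requires Python 3.8 or later for `math.dist`.

```python
import math


def weed_vertices(points, weed_distance):
    """Drop intermediate vertices closer than weed_distance to the last kept vertex."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]                 # always keep the start node
    for p in points[1:-1]:
        if math.dist(kept[-1], p) >= weed_distance:
            kept.append(p)
    kept.append(points[-1])            # always keep the end node
    return kept


line = [(0, 0), (0.1, 0), (0.2, 0), (1, 0), (1.05, 0), (2, 0)]
weeded = weed_vertices(line, weed_distance=0.5)
print(weeded)
```

Setting the value too high removes detail from curved lines, so the tolerance should be chosen with the map scale in mind.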
Scanners can be either raster or vector (rare). Vectorization of the scanned images is often necessary.
Another option is to scan an image, bring it into a GIS, and use on-screen digitizing to extract appropriate information from the scanned image.
Photos taken from an airplane.
Good because of their high resolution
Inconvenient because photos of the date of interest are often not available (flying your own photos is costly) and because complex photogrammetric methods are required to eliminate distortion (orthorectification).
Raster imagery collected by a satellite orbiting the earth. Often multiple bands (or layers) of data - depends on how many parts of the electromagnetic spectrum are sampled.
Resolutions vary between 1 meter and 1.1 kilometers - something for almost any application.
Revisit times range from 2x per day to about every 15-20 days.
Radar imagery - good for examining topography, can see through clouds. If long-wavelength and in areas of dry soils, the radar will reflect off subsurface features.
Hyperspectral - hundreds of narrow bands. Too new to be significant at this time - but it will be important in the future.
Important sensor systems
|
Landsat MSS
| |
|
Landsat TM
| |
|
SPOT
| |
|
AVHRR
| |
|
Radarsat
| |
|
Ikonos
|
Capturing attributes of spatial data - usually typed into the computer as columns in a spreadsheet (or database). For example, some attributes linked to a road segment might be: width, construction, speed limit, traffic volume, last date repaved, etc.
Network analysis
A network is a system of connected linear features
through which resources flow. Some examples of features and
resources are as follows:
|
Elements of a network
Links
Barriers
Turns
Centers
Stops
(adapted from ESRI documentation on networking)
|
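To make the network elements concrete, here is a minimal sketch of routing between two stops over a set of links while avoiding barriers. It uses Dijkstra's shortest-path algorithm, which is a common choice but is not stated in these notes; the representation (links as `(from, to, cost)` tuples, barriers as blocked nodes) is an assumption, and turns and centers are not modelled.

```python
import heapq


def shortest_path(links, start, goal, barriers=frozenset()):
    """Return (cost, path) between two stops, or None if no route exists."""
    graph = {}
    for a, b, cost in links:
        if a in barriers or b in barriers:
            continue  # a barrier blocks every link touching that node
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


links = [("A", "B", 1), ("B", "C", 1), ("A", "D", 2), ("D", "C", 2)]
open_route = shortest_path(links, "A", "C")
detour = shortest_path(links, "A", "C", barriers={"B"})
blocked = shortest_path(links, "A", "C", barriers={"B", "D"})
print(open_route, detour, blocked)
```

The link costs can stand for distance, travel time, or any other impedance on the resource flowing through the network.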