Langston University GIS/GPS

Email Mr. Stover:

Mr. Stover's Teacher Web Page
Langston GIS Presentation 11-2007

Lesson One - The Basics

These are Adobe Acrobat pdf files:
 1. Using GPS With GIS
 2. GIS for the Media
 3. Using GIS in Agriculture
 4. Modernizing Mapping
 5. Being More Productive Using ArcMap
 6. Incoming Langston Freshmen
 7. Langston Aerial Map
 

These are Microsoft Word Files
GIS Definitions     Syllabus Intro to GIS/GPS    Syllabus Advanced GIS/GPS

LECTURE NOTES

Week one                Week five
Week two                Week six
Week three              Week seven 
Week four

 

Geodata File Downloads From USDA-NRCS

09/24/2003 12:00AM      Directory air_quality
09/24/2003 12:00AM      Directory cadastral
10/10/2003 12:00AM      Directory census
09/24/2003 12:00AM      Directory climate
09/24/2003 12:00AM      Directory common_land_unit
09/24/2003 12:00AM      Directory conservation_practices
09/24/2003 12:00AM      Directory cultural_resources
09/24/2003 12:00AM      Directory disaster_events
09/24/2003 12:00AM      Directory ecological
09/24/2003 12:00AM      Directory elevation
09/24/2003 12:00AM      Directory endangered_habitat
09/24/2003 12:00AM      Directory environmental_easements
09/24/2003 12:00AM      Directory geographic_names
09/24/2003 12:00AM      Directory geology
02/09/2004 12:00AM      Directory government_units
09/24/2003 12:00AM      Directory hazard_site
03/31/2005 12:00AM      Directory hydrography
02/06/2004 12:00AM      Directory hydrologic_units
09/24/2003 12:00AM      Directory imagery
09/24/2003 12:00AM      Directory land_site
09/24/2003 12:00AM      Directory land_use_land_cover
09/24/2003 12:00AM      Directory local_geodata
09/24/2003 12:00AM      Directory map_indexes
09/24/2003 12:00AM      Directory measurement_services
03/09/2004 12:00AM      Directory ortho_imagery
09/24/2003 12:00AM      Directory plants
09/24/2003 12:00AM      Directory project_data
09/24/2003 12:00AM      Directory public_utilities
10/03/2006 12:00AM      Directory soils
09/24/2003 12:00AM      Directory topographic_images
02/05/2004 12:00AM      Directory transportation
09/24/2003 12:00AM      Directory wetlands
09/24/2003 12:00AM      Directory wildlife
09/24/2003 12:00AM      Directory zoning
10/04/2006 12:00AM      Directory SSURGO_Downloads

Geographic Information Systems

Rough lecture notes

Week 1

What do I do with a GIS?

You are now beginning the study of geographic information science. This discipline of study is centered around the fundamentals and applications of geographic information systems or GIS for short. 

So what is a GIS, and what can it do? To give you some idea, consider an example in natural resources management. Assume that you have been given the following tasks for a particular region (e.g. local government area, state, country, etc.):

 

• Inventory available forest and mineral resources.
• Obtain flora and fauna requirements.
• Determine water availability and quality.
• Examine extent of disease (e.g. dieback).
• Identify which resources are protected or in short supply (e.g. national heritage listing).
• Evaluate how resources are currently being exploited.
• Predict how availability and quality of these resources will change in the next 10, 20 or even 100 years.
• Assess conflicts with environment, quality of life, populated areas, visual impact, etc.
• Comply with local, regional and national regulations and legislation.

Quite a task, eh?  The more you think about it, the more complex it becomes.  Just imagine what you may need: lots (I mean lots!) of data, access to a range of departments and agencies, various software and hardware, many personnel, etc.  Well...it can be done - you guessed it - using GIS!

What is a geographic information system?



An information system applied to geographic data.

System:

A group of connected entities and activities which interact 
for a common purpose.

For a GIS, the "connected" refers to geography, and the "common
purpose" is managing or planning or decision-making.


Information system attributes (which also apply to GIS):
 

• decision-oriented reporting
• effective processing of data
• effective management of data
• adequate flexibility
• a satisfying user environment

How do we formally define a GIS? No one definition exists since there are many different contexts in which GIS exists. A definition of GIS can be seen from a number of points of view.

The definition that we will use in this course takes into account the various components necessary for the successful establishment of any GIS:

• technology (hardware and software)
• people
• data


Geographic Information System:

"An organised collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyse, and display all forms of geographically referenced data."

 

GIS as an information system

GIS are one of many different types of information systems.  The traditional Management Information Systems and Decision Support Systems do not cater for spatial information. There are, however, spatial information systems that are not geographic, such as Computer Aided Design/Computer Aided Manufacturing (CAD/CAM) systems which do not handle a "geographic" component.

Other terms for GIS

 

 

Spatial information system
Land information system
Geo-information system
Geomatics
Natural resources information system
Geoscience information system
Spatial data analysis system
Multipurpose geographic data system
Spatial data handling system
Multipurpose cadastre
AM/FM - Automated mapping and facilities management
Land resources information system
Land-related information system
Planning information system
Environmental information system
Spatial data management system

Advantages of GIS


The advantages of GIS are many and relate to the fact that GIS is an integrating technology - one that brings together many different applications, data and users. One word that can be used to describe the benefit of GIS is synergy. In particular, the following can be cited as advantages of GIS:

• Integrates spatial and other (aspatial) data across a diverse range of applications
• Identifies connections between activities based on geographic proximity
• Manipulates and displays geographic knowledge
• Provides access to administrative records
• Serves as a tool for enhancing decision making
• Increases ability to model science and management problems
• Acts as a catalyst to further development

Areas of application of GIS technology


The applications of GIS technology can be categorised into four broad areas:

Natural resources

• wildlife habitat
• wild and scenic rivers
• recreation resources
• floodplains
• wetlands
• agricultural lands
• aquifers
• forests
• minerals and exploration
• oil and gas

Land parcel-based

• zoning - urban and regional
• subdivision planning and review
• environmental impact assessment
• water quality management
• maintenance of land ownership
• land valuation and taxation
• town planning schemes

Infrastructure

• transport route planning
• street address matching
• location analysis, site selection
• disaster planning and evacuation
• usage and planning of roads, sewer and water reticulation, drainage, telephone lines, gas and electricity, etc.

Socio-economic

• population distribution and forecasting
• demographic marketing and analysis
• monitoring of patient health
• epidemiology
• police crime statistics and monitoring
• census information
• public services and access

GIS-related disciplines

GIS have developed over time across a wide range of disciplines.  As a matter of fact, the whole foundational concept of GIS is multi-disciplinary. 

Disciplines involved:
 

Computer science
Remote sensing
Cartography
Statistics
Geodesy
Photogrammetry
Surveying
Geography
Geosciences - geology, geophysics, minerals and petroleum, etc.
Mathematics: geometry, graph theory
Operations Research
Civil Engineering
Environmental biology
Information systems
Urban and regional planning
etc.....

Many technical and conceptual developments within these areas have converged over time and have been integrated into what now is known as GIS. 

 

Trends in GIS

• Hardware
    • Platform
    • Networks
    • Peripherals
• Software
    • Operating Systems
    • GIS
• New Application Areas
    • 2.5 to 4 dimensional stuff
    • Modelling
    • Decision Support
    • Image Processing

 

Rough lecture notes

Week 2

Trends in GIS (continued)

• Data Issues
    • Sources
    • Error/Uncertainty
    • Data sharing
    • Data standards
    • Metadata
• Commercial Software
    • Full featured GIS
    • Desktop GIS
    • CAD
    • Databases
    • Image Processing
    • Graphics
    • Web-GIS

The Three Schools of GIS

• Technician crowd
• Application crowd
• Computer Science/Programming crowd
    • The software they are using:
        • Macro/Proprietary software (SML, AML, Avenue)
        • Visual Basic
        • Java
        • C++

General GIS references, taken from:

• Michael F. Goodchild (1997) What is Geographic Information Science?, NCGIA Core Curriculum in GIScience, http://www.ncgia.ucsb.edu/giscc/units/u002/u002.html, posted October 7, 1997.

Basic and practical introductions to GIS

John C. Antenucci and others (1991) Geographic Information Systems: A Guide to the Technology. New York : Van Nostrand Reinhold.

Tor Bernhardsen (1992) Geographic Information Systems. Arendal, Norway: Viak (but widely available in the US).

Keith C. Clarke (1997) Getting Started with Geographic Information Systems. Upper Saddle River, NJ: Prentice Hall.

Michael N. DeMers (1997) Fundamentals of Geographic Information Systems. New York: J. Wiley & Sons.

All of these and many others are obtainable through online GIS 'bookstores':

• http://www.esri.com
• http://www.geoplace.com

GIS magazines

• GIS World - http://www.geoplace.com
• Geo Info Systems - http://www.geoinfosystems.com

Web references

• some cool sites that do GIS over the Web
    • http://www.mapquest.com
    • http://www.esri.com and try the live demos
• sites of some major GIS software vendors
    • http://www.esri.com
    • http://www.intergraph.com
    • http://www.autodesk.com
• some other introductions to GIS
    • USGS GIS Tutorial - http://nsdi.usgs.gov/nsdi/pages/what_is_gis.html
    • The Geographer's Craft - http://www.utexas.edu/depts/grg/gcraft/notes/intro/intro.html
    • Nick Chrisman's "What is GIS?" - http://weber.u.washington.edu/~chrisman/G460/Lec02.html
    • ESRI's About GIS - http://www.esri.com/library/gis/abtgis/what_gis.html
    • The Essential Guide to GIS - http://giswww.kingston.ac.uk/ESGUIDE/start.html
    • "What is GIS?" from Australia - http://www.dlsr.com.au/whatgis.htm

Solving real world problems

Using GIS to solve problems in the real world requires interaction between the real world, the GIS and the users.

 

The real world needs to be represented within a GIS. Users perceive the real world in a manner related to their problem, and hence need to be able to communicate with the GIS in terms related to their problem (e.g. data, functionality, etc.).

 

 

How do we represent the real world?

Geographic features in the real world can be represented in a number of ways as follows:

1. Analog map

• The traditional analog map has been in use for centuries!
• Divided into physical map sheets
• Based on the communication paradigm - emphasis is on visual communication

2. Digital map

• Maps are stored in digital form on computers to create a cartographic database
• Still based on the "analog" map concept
• Has greatly enhanced the map-making process and the production of various types of maps

3. GIS

• A geographic database involves much more than a cartographic database (ie. much more than simply a map or maps)
• The emphasis is on the structure and management of data and their relationships
• Based on the analytical paradigm - focus is on analysis
• The concepts of GIS extend far beyond the map!

Abstraction and generalisation

 


The process for obtaining a representation of the real world follows the cartographic process for abstraction and generalisation.  The process involves the steps of selection, classification, simplification and symbolisation.

The process for obtaining a GIS representation must consider the purpose, content and detail of the database.  This is similar to the cartographic map-making process in which the purpose, content, cartographic scale and presentation must be considered in producing a map.

Steps of the generalisation process 

The steps of the process of abstraction and generalisation are described as follows:

 
• Selection. Involves decisions regarding the geographic space to be mapped, map scale, map coordinates and projection, data variables to be mapped, data gathering/sampling techniques.
• Classification. Process in which objects are placed in groups according to similar properties. This reduces the complexity and improves the organisation of a map.
• Simplification. Map features can be simplified by smoothing curves and straightening paths to eliminate unnecessary detail. For example, a straight line between two cities could indicate the connectivity between cities rather than the exact positional location of a road which may be irrelevant for a particular application.
• Symbolisation. A set of marks or symbols is used to represent real world phenomena on a map. Such symbolisation involves defining size, shape, pattern, and color for points, lines, and polygons (areas).

Data representation with GIS 

In many ways, GIS have retained the notion of the map, and many map concepts reappear in GIS. However, the manner in which GIS handle and analyse data is very different from that for maps. This is despite the fact that much data input into GIS is derived from maps.

Within GIS, data is often structured in a layered fashion representing the way in which maps have traditionally been handled. Each layer, also known as a coverage, contains some specific data such as a theme (eg. roads, vegetation cover, soils, etc.), time period (eg. years 1970, 1980, 1990) or vertical slices (eg. ground floor, first floor, etc. of a building).


Geographic data includes both spatial data and descriptive (or attribute) data. Spatial data deals with location, shape and relationships among features. Attribute data deals with the characteristics of the features.

 

Essential GIS components

Every GIS must include: 
• data
• functionality, and
• a user interface.

The database is the heart of the GIS.  It must be structured so that the data can be accessed by functions initiated by users.  In the following sections, we will consider the structure of the data as well as the functions that operate on the data.

Structure of geographic data

The following chart illustrates the structure of geographic data.

 

The spatial component consists of locational information (ie. absolute or relative X,Y coordinates), geometry (ie. shape of point, line and polygon features, or raster cells) and topology (ie. relationships between points, lines and polygons - adjacency, connectivity, and containment). Attribute data can consist of both descriptive data and cartographic attributes (eg. line color and thickness, point symbol, etc.). A third component is temporal data, which is sometimes considered as a further dimension (eg. fourth dimension) but is often included as another attribute of the data. Never forget metadata!

 

Types of GIS data

Two broad types of data can be identified:
  1. Field-based data - The information is a collection of spatial distributions and is referred to as continuous data. Examples include altitude, rainfall, temperature, etc.
  2. Object-based data - The information is composed of identifiable entities and is referred to as discrete data. Examples include roads, rivers, land parcels, etc.

GIS data models

The two types of data - field-based and object-based - are implemented in two geographic data models: 
  1. Raster - stores the space "around" the objects (features)
    • stores pixels (picture elements) in an image and cells in a grid
    • most closely represents continuous data
  2. Vector - stores the objects "in" space
    • stores points, lines and polygons to represent the features
    • most closely represents object-based data

Note however that either data model can be used to store field-based or object-based data.  Both models define a "discretisation" of the features (ie. grid cells or vector objects).  In other words, continuous features are represented discretely.
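
As a rough illustration of the two data models, the sketch below stores the same feature both ways and measures its area from each. The lake, its coordinates, the cell size and the helper names are all made up for this example; they are not taken from any particular GIS package.

```python
# Hypothetical example: one small lake stored under both data models.

# Vector model: the object itself is stored as a polygon (a list of x, y
# vertices) together with its attributes.
lake_vector = {
    "geometry": [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)],
    "attributes": {"name": "Example Lake", "type": "lake"},
}

# Raster model: the whole space is stored as a grid of cells, and each cell
# records what is found there (1 = lake, 0 = not lake).
lake_raster = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
CELL_SIZE = 1.0  # assumed ground length of one cell side


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def raster_area(grid, value, cell_size):
    """Area covered by cells holding `value`: cell count times cell area."""
    count = sum(row.count(value) for row in grid)
    return count * cell_size ** 2


vector_area = polygon_area(lake_vector["geometry"])
grid_area = raster_area(lake_raster, 1, CELL_SIZE)
print(vector_area, grid_area)
```

Here the two areas agree only because the lake was drawn to line up with the cell boundaries. In general the raster answer depends on the cell size, while the vector answer depends on how closely the vertices follow the real shoreline.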

The type of data model used within a GIS will affect not only the database, but also the functionality and the user interface. We will explore the functions for each type of data model in the following sections.

Raster vs. Vector

• Layer (raster) view of the world
• Object (vector) view of the world
• General comparison
• Basic issues
    • Coordinate precision
    • Speed of processing
    • Mass storage requirements
    • Characteristics of phenomena
• Summary

 

 

Rough lecture notes

Week 3

Scale of measurement


The meaning or semantics of the data values stored in a geographic database depend on the scale of measurement chosen:

Ratio:

• values are divisible and multiplicative (an absolute scale defined around zero (0))
• eg. rainfall of Region 1 is twice that of Region 2

Interval:

• values are additive and subtractive (on a relative scale)
• eg. Region 1 is 10 degrees warmer than Region 2

Ordinal:

• values establish order (ranking) only
• eg. Region A is most suitable (eg. value of 1) and Region B least suitable (eg. value of 5)

Nominal:

• numbers establish identity only
• eg. lot numbers, postal code zones, etc.

Note that values progressing from ratio to interval to ordinal to nominal are decreasing in the amount of information contained.

Different scales of measurement can be used for the same phenomenon. 

Consider, for example, data representing petrol stations.

Note how the scale of measurement cannot be determined from observing the values alone.

Ratio: 72.9, 68.5, 67.9, 61.3,...
(petrol prices)

Interval: 25, 29, 30, 27,...
(average temperature of petrol)

Ordinal: 1, 2, 3, 4,...
(ranked by decreasing price of petrol)

Nominal: 1, 2, 3, 4,...
(1=BP, 2=Ampol, 3=Caltex, etc...)
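
One way to keep the scale of measurement from being lost is to record it alongside the values and check it before doing arithmetic. The sketch below is only a teaching aid: the table of "meaningful" operations is a simplified assumption, and the names `ALLOWED` and `is_meaningful` are invented for this example.

```python
# Which summary operations make sense at each scale of measurement.
# This table is a simplified teaching sketch, not an exhaustive rule set.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean", "difference"},
    "ratio": {"mode", "median", "mean", "difference", "ratio"},
}


def is_meaningful(scale, operation):
    """True if `operation` is sensible for data measured on `scale`."""
    return operation in ALLOWED[scale]


prices = [72.9, 68.5, 67.9, 61.3]  # ratio scale: petrol prices
brands = [1, 2, 3, 4]              # nominal scale: 1=BP, 2=Ampol, 3=Caltex, ...

# Dividing two prices is meaningful, so the ratio is computed.
price_ratio = prices[0] / prices[3] if is_meaningful("ratio", "ratio") else None

# Averaging brand codes is not meaningful, so the result is left empty.
brand_mean = sum(brands) / len(brands) if is_meaningful("nominal", "mean") else None

print(price_ratio, brand_mean)
```

The brand codes and the rank values look identical as numbers (1, 2, 3, 4), which is exactly why the scale has to be stored as metadata rather than guessed from the values.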

 

Grid GIS

  A grid GIS is based on the raster data model. The foundational unit of storage is the grid cell. Square grid cells are most commonly used to store grid data. 
  Each cell specifies the type or value of an attribute.  Only one value is stored per grid cell.  Note that if no data is recorded for that grid cell, then a value must still be stored - usually a zero (0) or a special "no data" symbol.  A group of contiguous cells having an identical value is referred to as a region.
  Data is arranged in a matrix and located by coordinates which relate to the row and column numbers. Generally speaking, grid cells (matrices) are easy to store, manipulate and display.
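
A minimal sketch of this storage scheme follows. The grid values, the origin, the cell size and the `NODATA` marker are all assumed for illustration; real grid software chooses its own conventions.

```python
NODATA = -9999  # assumed "no data" marker; real software uses its own value

# A 3 x 4 grid of land-cover codes; row 0 is the northernmost row.
grid = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [NODATA, 3, 3, 2],
]
ORIGIN_X, ORIGIN_Y = 500.0, 1300.0  # assumed map coordinates of the top-left corner
CELL_SIZE = 100.0                   # assumed cell side length in map units


def cell_index(x, y):
    """Convert map coordinates to (row, column) numbers in the matrix."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    return row, col


def value_at(x, y):
    """Value stored for the cell containing (x, y), or NODATA outside the grid."""
    row, col = cell_index(x, y)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return NODATA


print(cell_index(650.0, 1250.0), value_at(650.0, 1250.0))
```

Because a location maps to a row and column by simple arithmetic, lookups need no searching, which is one reason grids are easy to store, manipulate and display.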

 

Grid database and layering

Creating a grid database essentially involves overlaying an empty grid on the original data, reading off the data values for each grid cell and storing them in a matrix. 

Because only one value is stored per grid cell, how do we store multiple values for a specific location? We use layering.

Data is stored using the layered concept - a theme or closely-related group of data items are stored in one layer. Hence, a grid database may consist of a number of layers, each representing some theme of information (eg. soils, roads, drill holes, etc.).

Each cell can contain one, and only one, data value for a given layer. Therefore, if multiple attributes are found for a particular theme of data (eg. soil type and pH value for the same soil area), then these attributes must be separated into two or more layers (eg. one for soil types, the other for pH values).
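
The soil example can be sketched as two small layers that share the same rows and columns. The values and the function names `values_at` and `overlay` are invented for this illustration.

```python
# Two layers covering the same 2 x 3 area: one value per cell per layer.
soil_type = [
    ["clay", "clay", "loam"],
    ["clay", "loam", "loam"],
]
soil_ph = [
    [6.5, 6.8, 7.1],
    [6.4, 7.0, 7.3],
]
layers = {"soil_type": soil_type, "soil_ph": soil_ph}


def values_at(row, col):
    """Collect the value from every layer at one cell location."""
    return {name: layer[row][col] for name, layer in layers.items()}


def overlay(layer_a, layer_b, combine):
    """Combine two layers cell by cell to form a new layer."""
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# New layer: 1 where the soil is loam with pH above 7.0, otherwise 0.
alkaline_loam = overlay(
    soil_type, soil_ph, lambda kind, ph: 1 if kind == "loam" and ph > 7.0 else 0
)
print(values_at(1, 1), alkaline_loam)
```

Multiple values for one location are recovered by reading the same row and column from each layer, and combining layers cell by cell yields a new layer.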

Grid cell data

 

The types of features that can be represented within a grid cell include the following:
  • punctual data: 0-dimensional data
  • lineal data: 1-D
  • areal data: 2-D
  • surficial data: 3-D (or more accurately 2.5-D)

How do you identify the cell location? (depends on the software)

• Center of the cell?
• Top right-hand corner?
• Bottom left-hand corner?

What value do we give to the cell?

• Dominant feature of cell?
• Most important feature of cell?
• Mean value of features within cell?
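
The three choices for a cell value can be compared with a small sketch. The sample values and the function names are assumptions made for this example; which rule a given package applies depends on the software.

```python
from collections import Counter


def dominant_value(samples):
    """Value that occurs most often among the samples falling in one cell."""
    return Counter(samples).most_common(1)[0][0]


def priority_value(samples, priority):
    """Most important value present; `priority` lists values from most to least important."""
    return min(samples, key=priority.index)


def mean_value(samples):
    """Arithmetic mean, sensible only for interval or ratio data."""
    return sum(samples) / len(samples)


# Hypothetical observations that all fall inside a single grid cell.
cover_samples = ["forest", "forest", "road", "forest"]
elevation_samples = [100.0, 104.0, 108.0, 112.0]

print(dominant_value(cover_samples))                      # most common cover
print(priority_value(cover_samples, ["road", "forest"]))  # road outranks forest
print(mean_value(elevation_samples))
```

The same cell ends up coded as forest under the dominant rule and as road under the importance rule, so the rule in use needs to be documented with the data.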

    Grid resolution

    The resolution of a grid database is dependent on the grid cell size. Grid cell size is extremely important and must be chosen with care when the database is designed. Once set, it is difficult or even impossible to change. Changing the resolution affects a whole range of properties including:

    • classification
    • area
    • perimeter
    • accuracy
    • etc.
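
To see how cell size affects area, the sketch below rasterises the same rectangle at two resolutions, coding a cell as "inside" when its centre falls within the feature. The rectangle, the extent and the centre-of-cell rule are assumptions chosen for this illustration.

```python
def rasterised_area(width, height, extent, cell_size):
    """Area of a width x height rectangle (anchored at the origin) after
    rasterising a square extent with a centre-of-cell rule."""
    n = int(extent / cell_size)  # number of cells along each side
    count = 0
    for row in range(n):
        for col in range(n):
            centre_x = (col + 0.5) * cell_size
            centre_y = (row + 0.5) * cell_size
            if centre_x < width and centre_y < height:
                count += 1
    return count * cell_size ** 2


TRUE_AREA = 2.5 * 2.5  # the feature really covers 6.25 square units

coarse = rasterised_area(2.5, 2.5, extent=4.0, cell_size=1.0)
fine = rasterised_area(2.5, 2.5, extent=4.0, cell_size=0.5)
print(TRUE_AREA, coarse, fine)
```

With 1.0-unit cells the feature is recorded as 4.0 square units, while 0.5-unit cells recover 6.25. The finer grid needs four times as many cells to do so, which is the storage side of the same trade-off.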

     

    Cell values

    A cell may store one of the following types of value:

    • integer - eg. soil classes 1, 2, 3,..., or number of farms
    • real (decimal) values - eg. elevation, average persons per home
    • non-numeric or alpha-numeric values - eg. vegetation classes a, b, c,...

    A number of types of operations are available for grid data. They may involve one or more layers and result in a new layer being formed.

     

     

     

    Week 4

     


    Vector functionality

    Because vector data consist of three different data primitives (points, lines, polygons) instead of one as for grid (the grid cell), and because all the components (location, topology, attributes) need to be maintained, vector operations are in general more complex than raster operations. Both the spatial data and the attribute data must be handled. Further, the link between spatial and attribute data must be maintained.

    Vector operations include: 
     

    Display and query

    • of spatial and attribute data, both individually and in combination

    Data generalisation and abstraction

    • building desired points, lines, areas and relationships from input data

    Data manipulation

    • manipulate data coordinates to provide georeferencing, remove distortion, etc.

    Measurement

    • measure distance, shape, volume, etc.

    Topological overlay

    • overlay points, lines and polygons

    Buffering

    • buffer points, lines or polygons to produce new polygons

    Data generalisation and abstraction

    Generalisation and abstraction may involve:

    • matching data across map sheet edges
    • thinning out coordinates (vertices) to simplify lines and polygon boundaries
    • calculation of centroids and label points in polygons
    • automatic contouring
    • proximal mapping - finding areas of proximity
    • vector/raster conversion

    Data generalisation and abstraction also may involve:

    • reclassification of points, lines and areas
    • dissolving polygons and dropping lines between them - note that the attributes of such polygons must be merged
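
Thinning out vertices can be done in several ways. The sketch below uses the Douglas-Peucker algorithm, which is one widely used technique and not necessarily the one intended in these notes. The sample line and the tolerance are made up for the example.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, tolerance):
    """Douglas-Peucker thinning: drop vertices closer than `tolerance`
    to the straight line joining the end points of each stretch."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    distances = [point_line_distance(p, first, last) for p in points[1:-1]]
    max_dist = max(distances)
    if max_dist <= tolerance:
        return [first, last]
    split = distances.index(max_dist) + 1  # index of the farthest vertex
    left = simplify(points[: split + 1], tolerance)
    right = simplify(points[split:], tolerance)
    return left[:-1] + right  # the split vertex appears once


line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
thinned = simplify(line, tolerance=0.1)
print(thinned)
```

The small wiggle at (1.0, 0.05) is removed because it lies within the tolerance, while the sharp peak at (3.0, 2.0) is kept. A larger tolerance removes more detail.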

    Topological overlay

    The spatial overlay of two coverages results in a new coverage which is subjected to planar enforcement. In an overlay operation, both the spatial and the attribute data must be updated to reflect the new geometry/topology/attributes.

    A number of different types of spatial overlay exist:

    • point-in-polygon
    • line-on-polygon
    • polygon overlay (polygon-on-polygon)
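
A point-in-polygon overlay can be sketched with the ray-casting test, a common textbook method; actual GIS packages may do it differently. The parcels, wells and function names below are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray running to the
    right of (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_points(points, polygons):
    """Give each point the identifier of the polygon containing it (or None)."""
    result = {}
    for point_id, (x, y) in points.items():
        result[point_id] = None
        for polygon_id, polygon in polygons.items():
            if point_in_polygon(x, y, polygon):
                result[point_id] = polygon_id
                break
    return result


parcels = {
    "parcel_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "parcel_b": [(4.0, 0.0), (8.0, 0.0), (8.0, 4.0), (4.0, 4.0)],
}
wells = {"well_1": (1.0, 1.0), "well_2": (6.0, 2.0), "well_3": (9.0, 9.0)}

well_parcels = overlay_points(wells, parcels)
print(well_parcels)
```

The output is the attribute update described above: each point gains a new attribute naming the polygon it falls in. Points lying exactly on a boundary need an explicit rule, which this sketch does not attempt.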

    Buffering


    Buffering must cater for:
    • point, line and polygon-based features
    • different buffer shapes
    • variable size buffers
    • interior/exterior buffers (for polygons)

    The result of all buffer operations is a polygon coverage which must have appropriate attributes assigned. Each polygon must have an attribute that identifies whether or not it is a polygon inside the buffer or outside the buffer.
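
For the simplest case, a point feature with a fixed buffer distance, the operation can be sketched as below. The 32-segment circle approximation, the school locations and the function names are assumptions for this example only.

```python
import math


def buffer_point(x, y, distance, segments=32):
    """Approximate the circular buffer around a point as a polygon."""
    return [
        (
            x + distance * math.cos(2 * math.pi * i / segments),
            y + distance * math.sin(2 * math.pi * i / segments),
        )
        for i in range(segments)
    ]


def in_point_buffer(px, py, x, y, distance):
    """True if (px, py) lies within `distance` of the buffered point (x, y)."""
    return math.hypot(px - x, py - y) <= distance


# Buffer a hypothetical well at the origin by 10 map units.
well_buffer = buffer_point(0.0, 0.0, 10.0)

# Assign each school an attribute saying whether it is inside the buffer.
schools = {"school_a": (3.0, 4.0), "school_b": (30.0, 40.0)}
inside_buffer = {
    name: in_point_buffer(px, py, 0.0, 0.0, 10.0)
    for name, (px, py) in schools.items()
}
print(len(well_buffer), inside_buffer)
```

Line and polygon buffers follow the same idea but use the distance to the nearest segment or boundary, and variable size buffers simply take the distance from an attribute of each feature.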
     



     

     

     

    Describing a location on the Earth's surface

    • Projection (assumes a spherical earth). ONLY a mathematical method for drawing a 3D object (the earth) on a 2D surface (map, computer screen).
    • Datum - a modification on a sphere - a spheroid. In other words, a mathematical representation of the earth's shape.
    • Geoid - even more detail than a spheroid. It's an irregular surface.
    • Coordinates - locations. Most common include lat/long (geographic), UTM, and State Plane.

    More on datums: the first in common use in the US was Clarke 1866. Following in chronological order are NAD27, GRS80, NAD83, and WGS84. Everything from 1980 on is based upon the center of the earth. Those earlier are based on a point on the surface of the earth. To give a little info on the significance of a datum, there is up to a 300 meter difference in x,y coordinates between the NAD27 and NAD83 datums.

    More on the geoid: defining the shape of the earth is the field of geodesy. In theory, the geoid runs through mean sea level and is a representation of the gravity field of the earth. It describes the earth's shape in far more detail than a spheroid does.

     

    Displaying Map Projections - A map projection is a mathematical method for projecting the surface of a globe onto a sheet of paper (3D to 2D).

    Only a few (maybe about 20) are in actual use, although there are hundreds out there.

    Distortion:

    • there is always distortion (period)
    • there are different types of distortion
    • there are different degrees of distortion, depending on where you are on the map and which type of projection you select

    Map Classification

    Describing the different projections (2 methods)

    1) by geometric construction

    • conic - projected onto a cone
    • cylindrical - projected onto a cylinder
    • azimuthal - projected onto a flat plane
    • modified by aspect
        • equatorial
        • polar
        • oblique
    • modified by case
        • tangent - the cone touches the earth along one line, with no distortion along this line
        • secant - the cone cuts the earth along two lines, with no distortion along either line
    • polyconic - a number of cones, sort of stacked up

    2) by Preserved properties:
     

    • Area: correct relative size (equal-area or equivalent projections). Cylindrical equal-area, sinusoidal, Mollweide, Eckert IV.
    • Angle: correct shapes (conformal projections). Note that area and angle are mutually exclusive. Mercator, Lambert conformal conic, stereographic.
    • Distance: distances between points are correct (equidistant). Azimuthal equidistant.
    • Azimuth: directions are correct (azimuthal). Often viewed with the pole at the center. On the gnomonic projection, great circles are straight lines.
    • Compromise: nothing is preserved, but nothing is super distorted. Robinson.

    Note that distance-preserving and azimuth-preserving projections are only correct to or from the center of the map projection. Also, they usually only show one hemisphere, or half the earth.
     

    These can be modified by interruptions
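
To make the trade-off between preserved properties concrete, the sketch below projects the same point with the spherical forms of one conformal projection (Mercator) and one equal-area projection (sinusoidal). The earth radius is an assumed mean value and the function names are invented for this example.

```python
import math

R = 6371000.0  # assumed mean earth radius in metres (spherical model)


def mercator(lat_deg, lon_deg):
    """Spherical Mercator (conformal): x = R*lon, y = R*ln(tan(pi/4 + lat/2))."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return R * lon, R * math.log(math.tan(math.pi / 4 + lat / 2))


def sinusoidal(lat_deg, lon_deg):
    """Sinusoidal (equal-area): x = R*lon*cos(lat), y = R*lat."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return R * lon * math.cos(lat), R * lat


# The same meridian (10 degrees east) at the equator and at 60 degrees north.
for latitude in (0.0, 60.0):
    mx, my = mercator(latitude, 10.0)
    sx, sy = sinusoidal(latitude, 10.0)
    print(latitude, round(mx), round(sx))
```

At the equator the two x values agree. At 60 degrees north the Mercator x is about twice the sinusoidal x: Mercator stretches east-west distances by 1/cos(latitude) to keep angles correct, while the sinusoidal projection keeps areas correct and distorts shapes instead.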

     

     

    We discussed the URISA Ethics in GIS page. It can be found at http://www.urisa.org/ethics/code_of_ethics.htm

     

     

    History of GIS

    • Historic use of multi-theme maps
    • Setting the stage for computerised GIS
    • Canadian GIS
    • Harvard Labs
    • US Census Bureau
    • ESRI
    • Tie in with previous trends in GIS lecture

     Week 5 Lectures

    Data Quality Measurement and Assessment

    (Data Quality: this section is modified from the NCGIA Core Curriculum in GIScience.)
    Written by Howard Veregin, Department of Geography, University of Minnesota, Room 414, 267 19th Avenue South, Minneapolis, MN 55455, USA, veregin@atlas.socsci.umn.edu

    This section was edited by Gary Hunter, Department of Geomatics, University of Melbourne, Australia.

    This unit is part of the NCGIA Core Curriculum in Geographic Information Science. These materials may be used for study, research, and education, but please credit the authors Howard Veregin, and the project, NCGIA Core Curriculum in GIScience. All commercial rights reserved. Copyright 1998 by Howard Veregin.

    1. Data Quality

    • What is quality?
        • Quality is commonly used to indicate the superiority of a manufactured good or to indicate a high degree of craftsmanship or artistry. We might define it as the degree of excellence in a product, service or performance.
        • In manufacturing, quality is a desirable goal achieved through management and control of the production process (statistical quality control). (Redman, 1992)
        • Many of the same issues apply to the quality of databases, since a database is the result of a production process, and the reliability of the process imparts value and utility to the database.
    • Why is there a concern for data quality (DQ)?
        • Increased data production by the private sector, where there are no required quality standards. In contrast, production of data by national mapping agencies (e.g., US Geological Survey, British Ordnance Survey) has long been required to conform to national accuracy standards (i.e., mandated quality control).
        • Increased use of GIS for decision support, such that the implications of using low-quality data are becoming more widespread (including the possibility of litigation if minimum standards of quality are not attained).
        • Increased reliance on secondary data sources, due to the growth of the Internet, data translators and data transfer standards. Thus, poor-quality data is ever easier to get.
    • Who assesses DQ?

    Model 1. Minimum Quality Standards.

    • This is a form of quality control where DQ assessment is the responsibility of the data producer. It is based on compliance testing strategies to identify databases that meet quality thresholds defined a priori.
    • An example is NMAS, the National Map Accuracy Standards adopted by the US Geological Survey in 1946.
    • This approach lacks flexibility; in some cases a particular test may be too lax while in others it may be too restrictive.

    Model 2. Metadata Standards.

    • This model views error as inevitable and does not impose a minimum quality standard a priori. Instead, it is the consumer who is responsible for assessing fitness-for-use; the producer’s responsibility is documentation, i.e., “truth-in-labeling.”
    • An example is SDTS, the Spatial Data Transfer Standard.
    • This approach is flexible, but there is still no feedback from the consumer, i.e., there is a one-way information flow that inhibits the producer’s ability to correct mistakes.

    Model 3. Market Standards.

    • This model uses a two-way information flow to obtain feedback from users on data quality problems. Consumer feedback is processed and analyzed to identify significant problems and prioritize repairs.
    • An example is Microsoft’s Feedback Wizard, a software utility that lets users email reports of map errors.
    • This model is useful in a market context in order to ensure that databases match users’ needs and expectations.

    A definition of geographical data includes the three dimensions of space, time and theme (where-when-what). These three dimensions are the basis for all geographical observation. (Berry, 1964; Sinton, 1978)

    Geographical data quality is likewise defined by space-time-theme. Data quality also contains several components such as accuracy, precision, consistency and completeness.

    2. Accuracy

    bulletAccuracy is the inverse of error. Many people equate accuracy with quality but in fact accuracy is just one component of quality.
    bulletDefinition of accuracy is based on the entity-attribute-value model.
    bulletEntities = real-world phenomena
    bulletAttribute = relevant property
    bulletValues = Quantitative/qualitative measurements
    bulletAn error is a discrepancy between the encoded and actual value of a particular attribute for a given entity. “Actual value” implies the existence of an objective, observable reality. However, reality may be:
    bulletUnobservable (e.g., historical data)
    bulletImpractical to observe (e.g., too costly)
    bulletPerceived rather than real (e.g., subjective entities such as “neighborhoods")
    bulletIn fact, it is not necessary to posit an objective reality in order to assess accuracy, since all geographical data are collected with the aid of a model that specifies -- implicitly or explicitly -- the required level of abstraction and generalization.
    bulletThis is the database “specification” and is closely related to the “terrain nominal” concept of perceived reality (Salgé, 1995).
    bulletThe specification serves as the standard against which accuracy is assessed. Thus the “actual” value is the value we would expect based on the specification (Brassel et al., 1995).
    bulletAccuracy is always a relative measure, since it is always measured relative to the specification.
    bulletTo judge fitness-for-use, one must judge the data relative to the specification, and also consider the limitations of the specification itself (CEN, 1995).

    2.1. Spatial Accuracy

    bulletSpatial accuracy is the accuracy of the spatial component of the database. The metrics used depend on the dimensionality of the entities under consideration.
    bullet For points, accuracy is defined in terms of the distance between the encoded location and “actual” location.
    bulletError can be defined in various dimensions: x, y, z, horizontal, vertical, total.
    bulletMetrics of error are extensions of classical statistical measures (mean error, RMSE or root mean squared error, inference tests, confidence limits, etc.) (American Society of Civil Engineers 1983; American Society of Photogrammetry 1985; Goodchild 1991a).
    bulletFor lines and areas, the situation is more complex. This is because error is a mixture of positional error (error in locating well-defined points along the line) and generalization error (error in the points selected to represent the line) (Goodchild 1991b).
    bulletThe epsilon band is usually used to define a zone of uncertainty around the encoded line, within which the “actual” line exists with some probability.
    bulletHowever, there is little agreement (and little empirical work) on the shape of the band, both planimetrically and in cross-section (Chrisman, 1982; Blakemore, 1983; Honeycutt, 1986; Caspary and Scheuring, 1993).
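
    The point metrics above can be illustrated with a short sketch. It assumes each encoded point has been paired with an independently surveyed "actual" location; the function names and the check-point coordinates are hypothetical and are not taken from any of the cited standards.

```python
import math

def positional_errors(encoded, actual):
    """Horizontal distance between each encoded point and its 'actual' location."""
    return [math.hypot(ex - ax, ey - ay)
            for (ex, ey), (ax, ay) in zip(encoded, actual)]

def mean_error(errors):
    """Arithmetic mean of the error distances."""
    return sum(errors) / len(errors)

def rmse(errors):
    """Root mean squared error of the error distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical check points (map units), for illustration only.
encoded = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0)]
actual = [(103.0, 204.0), (150.0, 250.0), (300.0, 395.0)]

errors = positional_errors(encoded, actual)
print(errors)             # [5.0, 0.0, 5.0]
print(mean_error(errors))
print(rmse(errors))       # approximately 4.08
```

    The same functions apply to a single dimension (x, y or z) if the distance calculation is replaced by the difference in that coordinate alone.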

    2.2. Temporal accuracy

    bulletTemporal accuracy is the agreement between the encoded and “actual” temporal coordinates for an entity.
    bulletTemporal coordinates are often only implicit in geographical data, e.g., a time stamp indicating that the entity was valid at some time. Often this is applied to the entire database (e.g., a map dated “1995”).
    bulletMore realistically, temporal coordinates are the temporal limits within which the entity is valid (e.g., Pothole Q54D-35-021 existed between 2/12/96 and 8/9/96).
    bulletTemporal accuracy is not the same as “database time”, which is the time the information was entered into the database.
    bulletTemporal accuracy is not the same as “currentness” (or up-to-dateness), which is actually an assessment of how well the database specification meets the needs of a particular application. A database can be temporally accurate but still out of date; historical applications depend on such data.

    2.3. Thematic Accuracy

    bulletThematic accuracy is the accuracy of the attribute values encoded in a database.
    bulletThe metrics used here depend on the measurement scale of the data:
    bulletQuantitative data (e.g., precipitation) can be treated like a z-coordinate (elevation) and assessed using metrics normally used for vertical error (such as the RMSE). See section 2.1.
    bulletQualitative data (e.g., land use/land cover) is normally assessed using a cross-tabulation of encoded and “actual” classes at a sample of locations. This produces a classification error matrix (confusion matrix).
    bulletThe element in row i, column j of the matrix is the number of sample locations assigned to class i but actually belonging to class j.
    bulletThe sum of the main diagonal divided by the number of samples is a simple measure of overall accuracy.
    bulletAn error of omission means a sample that has been omitted from its actual class. An error of commission means a sample that is included in the wrong class. Every error of omission is also an error of commission.
    bulletThere is a large body of research on this topic (e.g., van Genderen and Lock, 1977; Congalton et al., 1983; Aronoff, 1985; Rosenfield and Fitzpatrick-Lins, 1986).
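
    As a minimal sketch of the classification error matrix described above, the following builds the matrix with rows as encoded classes and columns as "actual" classes, then derives overall accuracy and the per-class errors of commission and omission. The class names and sample values are made up for illustration.

```python
from collections import Counter

def error_matrix(encoded, actual, classes):
    """Rows = encoded class i, columns = actual class j."""
    counts = Counter(zip(encoded, actual))
    return [[counts[(i, j)] for j in classes] for i in classes]

def overall_accuracy(matrix):
    """Sum of the main diagonal divided by the number of samples."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[k][k] for k in range(len(matrix)))
    return diagonal / total

def commission_errors(matrix, k):
    """Samples encoded as class k that actually belong elsewhere (row k, off-diagonal)."""
    return sum(matrix[k]) - matrix[k][k]

def omission_errors(matrix, k):
    """Samples of actual class k that were encoded as something else (column k, off-diagonal)."""
    return sum(row[k] for row in matrix) - matrix[k][k]

# Hypothetical sample of eight locations.
classes = ["urban", "forest", "water"]
encoded = ["urban", "urban", "forest", "forest", "water", "water", "urban", "forest"]
actual = ["urban", "forest", "forest", "forest", "water", "urban", "urban", "water"]

m = error_matrix(encoded, actual, classes)
print(m)                        # [[2, 1, 0], [0, 2, 1], [1, 0, 1]]
print(overall_accuracy(m))      # 0.625
print(commission_errors(m, 0))  # 1
print(omission_errors(m, 0))    # 1
```

    Summed over all classes, the commission and omission counts are equal, which reflects the point that every error of omission is also an error of commission.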

    3. Resolution (precision)

    bulletResolution (or precision) refers to the amount of detail that can be discerned in space, time or theme. Resolution is always finite because no measurement system is infinitely precise, and because databases are intentionally generalized to reduce detail (Veregin and Hargitai, 1995).
    bulletResolution is an aspect of the database specification that determines how useful a given database may be for a particular application. High resolution is not always better; low resolution may be desirable when one wishes to formulate general models.
    bulletResolution is linked with accuracy, since the level of resolution affects the database specification against which accuracy is assessed. Two databases with the same overall accuracy levels but different levels of resolution do not have the same quality; the database with the lower resolution has less demanding accuracy requirements. (For example, thematic accuracy will tend to be  higher for general land use/land cover classes like “urban” than for specific classes like “residential”.)

    3.1. Spatial Resolution

    bulletSpatial resolution is well-defined in the context of raster data, where it refers to the linear dimension of a cell.
    bulletFor vector data resolution might be defined as the minimum mapping unit size. Sometimes mean polygon size is used instead, but this is erroneous since smaller polygons may be observable but just not present on the map.

    3.2. Temporal Resolution

    bulletTemporal resolution is the length (temporal duration) of the sampling interval.
    bulletFor example, the shorter the shutter speed of a camera, the higher the temporal resolution (other factors being equal).
    bulletTemporal resolution affects the minimum duration of an event that is discernible. If the duration is less than the resolution, the event is invisible or at best leaves a smudge (like carriages on nineteenth-century daguerreotypes).
    bulletTemporal resolution is distinct from temporal sampling rate.
    bulletResolution is the length of the sampling interval, while sampling rate is the frequency of sampling over time (e.g., once a day, once a week, etc.).
    bulletFor example, a motion picture camera might have a temporal resolution of 1/1000 second (i.e., the shutter speed used to capture a single frame) and a sampling rate of 24 frames per second.

    3.3. Thematic Resolution

    bulletThematic resolution refers to the precision of the measurements or categories for a particular theme.
    bulletFor categorical data, resolution is the fineness of category definitions (e.g., “urban” vs. “residential” and “commercial”).
    bulletFor quantitative data, thematic resolution is analogous to spatial resolution in the z-dimension (i.e., the degree to which small differences in the quantitative attribute can be discerned). [FIGURE 6]

    4. Consistency

    bulletConsistency refers to the absence of apparent contradictions in a database. Consistency is a measure of the internal validity of a database, and is assessed using information that is contained within the database.
    bulletConsistency can be defined with reference to the three dimensions of geographical data.
    bulletSpatial consistency includes topological consistency, or conformance to topological rules, e.g., all one-dimensional objects must intersect at a zero-dimensional object (Kainz, 1995).
    bulletTemporal consistency is related to temporal topology, e.g., the constraint that only one event can occur at a given location at a given time (Langran, 1992).
    bulletThematic consistency refers to a lack of contradictions in redundant thematic attributes. For example, attribute values for population, area, and population density must agree for all entities.
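
    The population, area and density example can be checked using only information inside the database, which is what makes it a consistency test rather than an accuracy test. The sketch below assumes hypothetical field names (id, population, area_km2, density), a positive area for every record, and an arbitrary 1% relative tolerance for rounding.

```python
def inconsistent_density(records, rel_tol=0.01):
    """Return ids of records whose stored density disagrees with population / area."""
    flagged = []
    for rec in records:
        expected = rec["population"] / rec["area_km2"]  # assumes area_km2 > 0
        if abs(rec["density"] - expected) > rel_tol * expected:
            flagged.append(rec["id"])
    return flagged

# Hypothetical records, for illustration only.
records = [
    {"id": "A", "population": 1000, "area_km2": 10.0, "density": 100.0},
    {"id": "B", "population": 500, "area_km2": 20.0, "density": 30.0},
]
print(inconsistent_density(records))  # ['B']
```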

    5. Completeness

    bulletCompleteness refers to a lack of errors of omission in a database. It is assessed relative to the database specification, which defines the desired degree of generalization and abstraction (selective omission).
    bulletThere are two kinds of completeness (Brassel et al., 1995)
    bullet“Data completeness” is a measurable error of omission observed between the database and the specification. Even highly generalized databases can be “data complete” if they contain all of the objects described in the specification.
    bullet“Model completeness” refers to the agreement between the database specification and the “abstract universe” that is required for a particular database application. A database is “model complete” if its specification is appropriate for a given application.
    bulletIncompleteness can be measured in space, time or theme. Consider a database of buildings in Minnesota that have been placed on the National Register of Historic Places as of the end of 1995.
    bulletSpatial incompleteness: The list contains only buildings in Hennepin County (one county in Minnesota, rather than all of Minnesota).
    bulletTemporal incompleteness: The list contains only buildings placed on the Register by June 30, 1995.
    bulletThematic incompleteness: The list contains only residential buildings.
    bulletErrors of commission can also be assessed. These errors can lead to “over-completeness”.
    bulletErrors of commission in space, time and theme for the previous example: The list also contains buildings in Wisconsin; the list contains buildings added to the list in 1996; the list contains historic districts as well as buildings.
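
    If the objects required by the specification can be enumerated, errors of omission and commission reduce to set differences. The sketch below assumes each object has a unique identifier; the identifiers and the function name are hypothetical.

```python
def completeness_report(database_ids, specification_ids):
    """Compare the objects present in the database with those the specification requires."""
    database_ids = set(database_ids)
    specification_ids = set(specification_ids)
    omitted = specification_ids - database_ids    # required but missing
    committed = database_ids - specification_ids  # present but not required
    return {
        "omission": sorted(omitted),
        "commission": sorted(committed),
        "omission_rate": len(omitted) / len(specification_ids),
    }

# Hypothetical building identifiers.
specification = ["B1", "B2", "B3", "B4"]
database = ["B1", "B2", "B5"]
print(completeness_report(database, specification))
# {'omission': ['B3', 'B4'], 'commission': ['B5'], 'omission_rate': 0.5}
```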

    6. Summary of Important Points

    bulletData quality is the degree of excellence in a database. Quality is assessed relative to the database specification, which defines the desired level of generalization and abstraction. The quality of this specification, and its appropriateness for particular applications, can also be assessed.
    bulletQuality assessment and reporting is based on minimum quality standards (compliance testing or quality control), metadata standards (truth-in-labeling and fitness-for-use), or market standards (feedback from users).
    bulletData quality contains several components, including accuracy, precision, consistency and completeness. Each component can be assessed in space, time and theme (the three basic dimensions of geographical data).
    bulletVarious assessment methods can be used for each component/dimension combination. Some methods are well-developed and others are not.

    7. References and Bibliography

    bulletAmerican Society of Civil Engineers (Committee on Cartographic Surveying, Surveying and Mapping Division) 1983 Map uses, scales and accuracies for engineering and associated purposes. New York: American Society of Civil Engineers.
    bulletAmerican Society of Photogrammetry (Committee for Specifications and Standards, Professional Practice Division) 1985 Accuracy specification for large-scale line maps. Photogrammetric Engineering and Remote Sensing 51: 195-199.
    bulletAronoff S 1985 The minimum accuracy value as an index of classification accuracy. Photogrammetric Engineering and Remote Sensing 51: 99-111.
    bulletBeard M K 1989 Use error: The neglected error component. Proceedings, Auto Carto 9: 808-817.
    bulletBerry B 1964 Approaches to regional analysis: A synthesis. Annals, Association of American Geographers 54: 2-11.
    bulletBlakemore M 1983 Generalisation and error in spatial data bases. Cartographica 21: 131-139.
    bulletBrassel K, Bucher F, Stephan E-M and Vckovski A 1995 Completeness. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 81-108.
    bulletBurrough P A 1986 Principles of geographical information systems for land resources assessment. Oxford, Clarendon.
    bulletCampbell W G and Mortenson D C 1989 Ensuring the quality of geographic information system data. Photogrammetric Engineering and Remote Sensing 55: 1613-1618.
    bulletCaspary W and Scheuring R 1993 Positional accuracy in spatial databases. Computers, Environment and Urban Systems 17: 103-110.
    bulletChrisman N R 1982 A theory of cartographic error and its measurement in digital data bases. Proceedings, Auto Carto 5: 159-168.
    bulletChrisman N R 1991 The error component in spatial data. In Maguire D J, Goodchild M F and Rhind D W (eds) Geographical information systems. New York, Wiley: 165-174.
    bulletComité Européen de Normalisation (CEN) 1995 Geographic Information - Data Description - Quality (Draft). Brussels: CEN Central Secretariat.
    bulletCongalton R G, Oderwald R G and Mead R A 1983 Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques. Photogrammetric Engineering and Remote Sensing 49: 1671-1678.
    bulletDuecker G T and Platt J T 1990 The role of automated data checks in the quality assurance of GIS data bases. GIS/LIS '90: 264-271.
    bulletFederal Geographic Data Committee (FGDC) 1994 Content Standards for Digital Geospatial Metadata (June 8). Washington DC: Federal Geographic Data Committee.
    bulletFegeas R G, Cascio J L and Lazar R A 1992 An overview of FIPS 173, The Spatial Data Transfer Standard. Cartography and Geographic Information Systems 19: 278-93.
    bulletGoodchild M F 1988a Stepping over the line: Technological constraints and the new cartography. The American Cartographer 15: 311-319.
    bulletGoodchild M F 1988b The issue of accuracy in global databases. In Mounsey H (ed) Building Databases for Global Science. London, Taylor and Francis: 31-48.
    bulletGoodchild M F 1991a Issues of quality and uncertainty. In Muller J C (ed) Advances in cartography. London, Elsevier: 113-139.
    bulletGoodchild M F 1991b Keynote address. Proceedings, Symposium on Spatial Database Accuracy: 1-16.
    bulletGoodchild M F 1995 Sharing imperfect data. In Onsrud H J and Rushton G (eds) Sharing geographic information. New Brunswick NJ, Center for Urban Policy Research: 413-425.
    bulletGuptill S C 1993 Describing spatial data quality. Proceedings, 16th International Cartographic Conference: 552-560.
    bulletHoneycutt D M 1986 Epsilon, generalization and probability in spatial data bases. Unpublished manuscript.
    bulletKainz W 1995 Logical consistency. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 109-137.
    bulletLangran G 1992 Time in geographic information systems. London: Taylor and Francis.
    bulletLanter D 1991 Design of a lineage-based meta-database for GIS. Cartography and Geographic Information Systems 18(4): 255-261.
    bulletLanter D and Veregin H 1992 A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering and Remote Sensing 58: 526-533.
    bulletMoellering H (ed) 1991 Spatial database transfer standards: Current international status. London: Elsevier.
    bulletParkes D N and Thrift N J 1980 Times, spaces, and places: A chronogeographic perspective. New York: Wiley.
    bulletRedman T C 1992 Data quality. New York: Bantam.
    bulletRosenfield G H and Fitzpatrick-Lins K 1986 A coefficient of agreement as a measure of thematic classification accuracy. Photogrammetric Engineering and Remote Sensing 52: 223-227.
    bulletSalgé F 1995 Semantic accuracy. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 139-151.
    bulletSDTS 1992 The Spatial Data Transfer Standard (FIPS-173).
    bulletSinton D 1978 The inherent structure of information as a constraint in analysis. In Dutton G (ed) Harvard papers on geographic information systems. Reading MA, Addison-Wesley.
    bulletStearns F 1968 A method for estimating the quantitative reliability of isoline maps. Annals, Association of American Geographers 58: 590-600.
    bulletThapa K and Bossler J 1992 Accuracy of spatial data used in geographic information systems. Photogrammetric Engineering and Remote Sensing 58(6): 835-841.
    bulletTychon G G and Johnson M R 1990 GIS data exchange: Standards and formats. In Heit M and Shortreid A (eds) GIS applications in natural resources. Boulder CO, GIS World Inc: 155-161.
    bulletvan Genderen J L and Lock B F 1977 Testing land-use map accuracy. Photogrammetric Engineering and Remote Sensing 43: 1135-1137.
    bulletVeregin H and Hargitai P 1995 An evaluation matrix for geographical data quality. In Guptill S C and Morrison J L (eds) Elements of spatial data quality. Oxford, Elsevier: 167-188.

     

     

    Week 6 Lectures

     

    Data Types and the NSDI

    Data Input

    bulletthe number 1 bottleneck in GIS applications.  Often 80%+ of the project cost
    bulletNeed to automate data input process, but that causes problems.  Common problems/considerations include:
    bulletmap projections, scale, coordinate system
    bulletraster/vector conversions
    bulletpaper distortions
    bulleterror corrections
    bulletcontrol points
    bulletdiscrepancies across map sheets
    bulletuser fatigue and boredom
    bulletModes of data input include the keyboard, scanners, direct conversion from other digital data, and voice input

    Socio-Economic data

    bulletGenerally include: (may be aggregate or disaggregate)
    bulletdemographics
    bullethousing
    bulletmigration
    bullettransportation
    bulleteconomics
    bulletretail
    bulletSources of Socio-economic data
    bulletfield surveys
    bulletgovernment statistics
    bulletgovernment administrative records
    bulletother stuff, including marketing info, mailing lists, etc.
    bulletIssues in using socio-economic data
    bulletcost
    bulletdocumentation
    bulletdata quality
    bulletdata conversion
    bulletaggregation
    bulletaccuracy of location

    Environmental and Natural Resource Data

    bulletPurposes of resource-based GIS are primarily
    bulletInventory tool
    bulletbetter manage the marketing of the resource
    bulletprotect the resource from improper development
    bulletmodel the complex interactions between phenomena so that predictions can be made
    bulletContents of Environmental databases include not only natural information, but info regarding any human impacts on the area.
    bulletGeneral notes on resource data
    bulletComparatively static
    bulletgenerally low spatial resolution
    bullettypically raster
    bulletOften uses satellite imagery.
    bulletWe then went over an example of all the data that might be required for siting a waste incinerator.

    National Spatial Data Infrastructure (NSDI)

    bulletConcept not new
    bulletmore and more people requiring spatial data
    bulletto maximise benefits, datasets must be accessible
    bulletbenefits: decisions only as good as the information upon which they are based.
    bulletprovide government with reliable, consistent, timely data
    bulletProvide community access to data
    bulletminimise waste & duplication of datasets
    bulletcommon standards allow maximum integration of datasets
    bulletCurrent shortcomings and obstacles
    bulletknowledge of data availability
    bulletmetadata
    bulletaccess impeded or not possible
    bulletinconsistent policies regarding data use
    bulletfundamental datasets incomplete/out of date
    bulletlack of data consistency across nation
    bulletpricing
    bulletduplication among agencies/agency competition
    bulletnational security considerations
    bulletenforcing standards
    bulletlow levels of customer focus
    bulletpoor/inconsistent funding

     

    Process of data capture

    The process of data capture must involve both spatial and attribute data.  If they are considered separately, then more effort usually is required to integrate both components within a GIS database.  It is better to consider both components in planning the capture process to ensure that appropriate IDs and tags are assigned to enable attributes to be correctly attached to the appropriate spatial units.
      Essentially three general operations are required for the capture of geographic data:

    1. Entering the spatial data
    2. Entering the non-spatial (attribute) data
    3. Linking the spatial data to the non-spatial data
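
    The third operation depends on the IDs assigned during capture. As a rough sketch, assuming spatial units are stored in a dictionary keyed by ID and attributes arrive as rows carrying the same ID (all names and values here are hypothetical), the link is a simple key match that also reports anything left unattached:

```python
def link_attributes(spatial, attribute_rows):
    """Attach attribute rows to spatial units by shared ID and report mismatches."""
    attributes_by_id = {row["id"]: row for row in attribute_rows}
    linked = {
        fid: {"geometry": geometry, "attributes": attributes_by_id[fid]}
        for fid, geometry in spatial.items()
        if fid in attributes_by_id
    }
    spatial_without_attributes = sorted(set(spatial) - set(attributes_by_id))
    attributes_without_spatial = sorted(set(attributes_by_id) - set(spatial))
    return linked, spatial_without_attributes, attributes_without_spatial

# Hypothetical road segments and attribute table.
spatial = {
    101: [(0, 0), (10, 0)],
    102: [(10, 0), (10, 8)],
    103: [(10, 8), (0, 8)],
}
attribute_rows = [
    {"id": 101, "speed_limit": 50},
    {"id": 102, "speed_limit": 30},
    {"id": 104, "speed_limit": 80},
]

linked, no_attributes, no_geometry = link_attributes(spatial, attribute_rows)
print(sorted(linked))  # [101, 102]
print(no_attributes)   # [103]
print(no_geometry)     # [104]
```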

    Manual data capture

    Manual data capture includes the entry of data from field notes and maps, data entry sheets, data recordings sheets and graphs, etc.  The process of manual entry can be sped up through by-passing the intermediate paper records and directly recording data through laptop computers in the field, electronic surveying instruments, automated GPS position recording, etc.

    Most attribute data is entered via a keyboard into a database.  Often, much attribute data exists prior to the GIS being built.  In terms of volume, usually the vast majority of a GIS database consists of attribute data.  The entry of attribute data and its pitfalls is well known and hence we will concentrate primarily on the spatial component of geographic data.

    The process of manually entering spatial data is dependent on whether a grid-based or a vector-based database is to be generated. 
     

    Vector
    bulletType in the coordinates of points, lines, areas
    bulletCoordinates can be two dimensional (X and Y coordinates) or three dimensional (X, Y and Z)
    bulletUsually an integer ID must be attached to each coordinate (used to attach attributes)
    Grid
    bulletDetermine the grid cell size
    bulletOverlay the grid on the data
    bulletDetermine the grid cell values
    bulletType in the values
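
    The grid steps can be sketched in code as follows. This assumes square cells, an origin at the lower-left corner of the grid, and one observed value per cell; the function name, the origin convention and the sample observations are illustrative assumptions, not part of any particular GIS.

```python
def build_grid(observations, origin, cell_size, n_rows, n_cols, nodata=None):
    """Assign each (x, y, value) observation to the grid cell that contains it."""
    origin_x, origin_y = origin
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for x, y, value in observations:
        col = int((x - origin_x) // cell_size)
        row = int((y - origin_y) // cell_size)  # row 0 is the bottom row here
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = value  # observations outside the grid are ignored
    return grid

# Hypothetical observations: (x, y, land cover).
observations = [(5, 5, "forest"), (25, 5, "water"), (12, 18, "urban"), (40, 5, "bare")]
grid = build_grid(observations, origin=(0, 0), cell_size=10, n_rows=2, n_cols=3)
print(grid)  # [['forest', None, 'water'], [None, 'urban', None]]
```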

    Global positioning system


    The Global Positioning System, more commonly referred to as GPS, is used to compute positions in 2 or 3 dimensional space from signals obtained from a series of NAVSTAR satellites. It is owned by the U.S. Department of Defense.
     

    Satellite details:
    bullet21 satellites (and 3 spares) provide continuous coverage of the earth
    bullet6 orbits at 20,200 km
    bulletone rotation every 12 hours
    bullet6 satellites are in view at all times at a given location on the earth

    The Russian equivalent is GLONASS - Global Navigation Satellite System.

    GPS Accuracies

    bullet"standard" uncorrected, x and y to 15 meters. Z is a bit worse
    bulletDifferential GPS - correct using base station data. x and y to 1-5 meters
    bulletRTK - Real time kinematic - accuracies to sub-centimeter.

     

     
    Digitising

    A digitiser consists of a tablet with an electronic mesh and a cursor (also known as puck or mouse). A map sheet is mounted on the digitiser and the cursor is used to enter required points or trace the desired lines on the map.  Lines are digitised as a series of points which can then be processed by the software (eg. editing, conversion to raster format, etc.).

    Digitising is largely a manual process involving the concentration and skillfulness of a digitising operator.  It is both a time-consuming and boring task. 

    Accuracy of digitising

    The accuracy of digitising is limited by a number of factors related to the data source (map), equipment being used and human factors.
    bulletSource map - may be:

    - inaccurate in the first place because of missing or misplaced data
    - poor in resolution (ie. how do you digitise on a "thick" road line?)
    - stretched (paper stretch) due to aging and storage conditions

    bulletDigitiser resolution:

    - must exceed the required resolution
    - is usually not a problem for most digitisers

    bulletDigitising process - dependent on factors such as:

    - positioning the cursor and generating points on the line (human factors)
    - representing lines in digital form
    - operator skillfulness and fatigue

    During the digitising process, a number of errors may occur. Some of these errors can be prevented or alleviated during the digitising process, while others can be corrected/edited in a following step (either automatically or manually).  Such (potential - not all are necessarily incorrect) errors are discussed in the following sections and include:

    bulletDangle nodes
    bulletPseudo nodes
    bulletMultiple or missing labels
    bulletSliver polygons
    bulletetc. 

    Data input errors (Dangles)

     
    Dangles, also referred to as Dangling Nodes, are identified as nodes with only one line (arc) attached.  They include both overshoots (extended too far past another line) and undershoots (not quite reaching another line).  In such a case, the line was intended to connect up with another line and hence overshoots and undershoots are errors.  

    However, dangles are also identified with lines that are not connected to another line at one end, such as cul-de-sacs or dead-end streets in a road network. Obviously, such dangles are NOT errors. 

    Correcting dangles (that are errors!) can be partially automated by setting a dangle length tolerance value which specifies the maximum distance within which nodes will automatically be "snapped" to a line. Any dangles falling outside this tolerance level will have to be corrected manually in the editing process.
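
    Dangles can be found by counting how many arc ends meet at each node. The sketch below is a simplification: it assumes arcs are stored as (start node, end node, length) tuples and uses the length of the dangling arc as a stand-in for the tolerance test, so short dangles become candidates for automatic correction and long ones (possibly legitimate dead ends) are left for manual review. The node names and lengths are hypothetical.

```python
from collections import Counter

def classify_dangles(arcs, tolerance):
    """Split dangling arcs into automatic-correction candidates and manual-review cases."""
    degree = Counter()
    for start, end, _length in arcs:
        degree[start] += 1
        degree[end] += 1
    automatic, manual = [], []
    for start, end, length in arcs:
        # A dangle has at least one end node with only this arc attached.
        if degree[start] == 1 or degree[end] == 1:
            (automatic if length <= tolerance else manual).append((start, end))
    return automatic, manual

# Hypothetical network: a closed triangle plus two dangling arcs.
arcs = [
    ("N1", "N2", 120.0),
    ("N2", "N3", 80.0),
    ("N3", "N1", 95.0),
    ("N2", "N4", 0.4),   # tiny overshoot
    ("N3", "N5", 60.0),  # possibly a real dead-end street
]
print(classify_dangles(arcs, tolerance=1.0))
# ([('N2', 'N4')], [('N3', 'N5')])
```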

    Data input errors (Pseudo nodes)

     


    Pseudo nodes are nodes that have only two arcs attached.  This often occurs when two lines are digitised and connected together at one end, but the connection point does not occur at a junction with other lines (ie. the node breaks up a long and complex line such as a contour line).  If the lines are intended to represent the same entity with the same attributes, then the pseudo node is an error and should be removed.  In other cases, pseudo nodes do not cause any problems, but can be removed to simplify the storage and provide a "cleaner" representation.

    In certain cases, pseudo nodes are REQUIRED and therefore certainly are not errors.  The most obvious example is an island polygon where the bounding line begins and ends at the SAME node.  

    Removing a pseudo node involves joining the two (different!) arcs on either side of it into one.  This usually means ensuring that the attributes of each arc are the same so that there is no problem in merging them.  It may also be possible to specify a "snapping distance" tolerance value which will cause all nodes within a specified distance to "snap" together, thereby eliminating some pseudo nodes.
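
    A rough way to list candidate pseudo nodes, assuming arcs are stored as a dictionary of arc ID to (start node, end node, attribute), is to keep the nodes touched by exactly two different arcs and to note whether the two arcs share the same attribute and could therefore be merged. An island polygon, whose single arc begins and ends at the same node, is deliberately excluded. The arc IDs and attributes are hypothetical.

```python
from collections import defaultdict

def pseudo_nodes(arcs):
    """Map each pseudo node to True if its two arcs share an attribute (safe to merge)."""
    touching = defaultdict(list)
    for arc_id, (start, end, _attribute) in arcs.items():
        touching[start].append(arc_id)
        touching[end].append(arc_id)
    result = {}
    for node, arc_ids in touching.items():
        # Exactly two arc ends from two different arcs; a closed island arc is skipped.
        if len(arc_ids) == 2 and arc_ids[0] != arc_ids[1]:
            first, second = arc_ids
            result[node] = arcs[first][2] == arcs[second][2]
    return result

# Hypothetical contour arcs plus one island polygon boundary.
arcs = {
    1: ("A", "B", "contour 100"),
    2: ("B", "C", "contour 100"),
    3: ("C", "D", "contour 120"),
    4: ("E", "E", "island"),
}
print(pseudo_nodes(arcs))  # {'B': True, 'C': False}
```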

    Data input errors (Too many label points)

     


    Label points (often referred to as "centroids") are associated with polygons and are used to attach the attributes for the polygon.  The labels have to be placed in a polygon (either manually or automatically) with the correct attribute (or identification number) attached.  Errors may occur when polygons have no label or more than one label.

    The correction of label errors requires either the removal, editing or addition of labels.  It should be noted that some label errors may be due to existing dangle errors where two adjacent polygons appear as ONE polygon simply because of an undershoot (on the common arc of the two polygon boundaries).  In such a case, the undershoot should be corrected.

    Data input errors (Sliver polygons)

    Sliver polygons, also referred to as spurious polygons, are polygons that result when lines are digitised twice along the same boundary and don't exactly coincide (because of digitising inaccuracies - it is impossible to digitise two lines that coincide exactly!).  This can occur as a result of digitising the same line twice or overlaying two layers (containing the same line which has been captured separately on two different occasions).

    The solution is to either:

    bulletbe more careful in digitising (enter line once!),
    bullettry to use existing (already digitised!) lines for boundaries that are common to two or more features (ie. soil types and river boundaries, vegetation and lake boundaries, etc.), or
    bulleteliminate all polygons with an area less than a specified tolerance value
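
    The last option can be sketched with the standard shoelace formula for polygon area. This is a minimal illustration that simply drops polygons below an area tolerance; it assumes simple (non self-intersecting) polygons given as vertex lists, and it does not merge the removed sliver into a neighbouring polygon as a full GIS would need to. The polygon names and coordinates are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def drop_slivers(polygons, min_area):
    """Keep only polygons whose area is at least min_area."""
    return {pid: verts for pid, verts in polygons.items()
            if polygon_area(verts) >= min_area}

# Hypothetical polygons: one field and one thin sliver along its upper edge.
polygons = {
    "field": [(0, 0), (100, 0), (100, 50), (0, 50)],
    "sliver": [(0, 50), (100, 50), (100, 51)],
}
print(polygon_area(polygons["field"]))   # 5000.0
print(polygon_area(polygons["sliver"]))  # 50.0
print(sorted(drop_slivers(polygons, min_area=100.0)))  # ['field']
```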

    Data input errors (Other)

    Other data input errors are also possible.  Some can be identified and corrected (either automatically or manually) and others are very difficult, if not impossible, to detect.

    Such errors may include:

    Missing arcs

    Arcs are missing.  They may be detected by studying the original map (if they exist on it), otherwise they may be impossible to identify.  Any missing arcs detected must be captured and integrated into the appropriate dataset.
     

    Too many vertices

     
    Having too many vertices within an arc may increase storage space and cause processing problems.  Vertices that are not required (ie. multiple vertices on a straight line) can be eliminated and vertices that are too close together can be thinned out by specifying a "weed" tolerance value. 
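
    A simple form of weeding, shown here only as a sketch, keeps a vertex only if it lies at least the weed tolerance away from the previously kept vertex. The first and last points are always retained because they are the arc's nodes. Real packages may use more elaborate line-simplification methods; the function name and coordinates are hypothetical.

```python
import math

def weed_vertices(vertices, tolerance):
    """Thin out vertices closer than `tolerance` to the previously kept vertex."""
    if len(vertices) < 2:
        return list(vertices)
    kept = [vertices[0]]
    for x, y in vertices[1:-1]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) >= tolerance:
            kept.append((x, y))
    kept.append(vertices[-1])  # the end node is always kept
    return kept

# Hypothetical arc with several redundant, closely spaced vertices.
arc = [(0, 0), (0.2, 0), (0.5, 0), (2, 0), (2.1, 0), (5, 0)]
print(weed_vertices(arc, tolerance=1.0))  # [(0, 0), (2, 0), (5, 0)]
```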
     

    Weird polygons

     

    Weird polygons are tiny "knots" usually caused by digitising errors where lines are accidentally crossed.  In the "cleaning" process, the intersection points are identified resulting in additional nodes and lines (and the weird polygon).  They can be removed by deleting the offending line AND node.

    Tolerance values


     
     
     
     
     
     
     
     
    For the digitising process in a GIS a number of tolerance values can usually be specified to  prevent some errors from occurring:

    Weed distance - determines minimum distance allowed between two points of an arc being digitised 

    Snap distance - determines maximum distance between two nodes which would cause the nodes to be "snapped" into one node for the current arc being digitised

    Dangle length - the maximum length of any dangle which would cause that arc to be removed 

    Fuzzy tolerance - the minimum distance allowed between any two points (all other points are removed or snapped together)

    Most GIS provide a range of tolerance values that can be either preset or specified in the editing operation.  For example, weed and snap distances can be preset when digitising using the ArcEdit module of ArcInfo.  In a subsequent process of "cleaning" lines and line junctions, and "building" topology, ArcInfo provides the option of setting dangle lengths and fuzzy tolerances.  Default values are used if no options are specified.
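
    To make the snap distance concrete, the sketch below maps each incoming node onto the first previously accepted node that lies within the snap distance, and accepts it as a new node otherwise. This is only an illustration of the idea under simple assumptions (planar coordinates, nodes processed in digitising order); it is not how ArcInfo or any other package implements snapping.

```python
import math

def snap_nodes(nodes, snap_distance):
    """Replace each node by an earlier accepted node within snap_distance, if any."""
    accepted = []
    result = []
    for x, y in nodes:
        for ax, ay in accepted:
            if math.hypot(x - ax, y - ay) <= snap_distance:
                result.append((ax, ay))  # snapped onto the existing node
                break
        else:
            accepted.append((x, y))      # no node close enough: keep as new
            result.append((x, y))
    return result

# Hypothetical digitised nodes (map units).
nodes = [(0, 0), (0.3, 0.3), (10, 10), (10, 10.2)]
print(snap_nodes(nodes, snap_distance=0.5))
# [(0, 0), (0, 0), (10, 10), (10, 10)]
```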

    Data Capture - Scanning

    Scanners can be either raster or vector (rare). Vectorization of the scanned images is often necessary.

    Another option is to scan an image, bring it into a GIS, and use on-screen digitizing to extract appropriate information from the scanned image.

    Data Capture - Aerial Photography

    Photos taken from an airplane.

    Good because of their high resolution.

    Inconvenient because photos for the date of interest are often not available (flying your own photos is costly) and because complex photogrammetric methods (orthorectification) are required to eliminate distortion.

    Data Capture - Satellite Imagery

    Raster imagery collected by a satellite orbiting the earth. It often contains multiple bands (or layers) of data, depending on how many parts of the electromagnetic spectrum are sampled.

    Resolutions vary between 1 meter and 1.1 kilometers - something for almost any application.

    Revisit times range from 2x per day to about every 15-20 days.

    Radar imagery - good for examining topography, can see through clouds. Long-wavelength radar in areas of dry soil can penetrate the surface and reflect off subsurface features.

    Hyperspectral - hundreds of narrow bands. Too new to be significant at this time, but it will be important in the future.

    Important sensor systems

    - Landsat MSS
    - Landsat TM
    - SPOT
    - AVHRR
    - Radarsat
    - Ikonos

    Capturing attributes of spatial data - usually typed into the computer as columns in a spreadsheet (or database). For example, some attributes linked to a road segment might be: width, construction, speed limit, traffic volume, last date repaved, etc.
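
    One way to picture the link between spatial data and its attributes is two tables that share a key. The sketch below is only an illustration: the field names, segment ids, and values are invented, and a real GIS would hold the attributes in a database table.

```python
from dataclasses import dataclass

@dataclass
class RoadAttributes:
    width_m: float
    construction: str
    speed_limit_kmh: int
    traffic_volume_per_day: int
    last_repaved: str  # ISO date string

# Spatial side: segment id -> list of (x, y) vertices (made-up coordinates).
geometry = {
    101: [(0.0, 0.0), (120.0, 0.0)],
    102: [(120.0, 0.0), (120.0, 80.0)],
}

# Attribute side: one row per segment, keyed by the same id (made-up values).
attributes = {
    101: RoadAttributes(7.5, "asphalt", 60, 1200, "2005-06-01"),
    102: RoadAttributes(6.0, "gravel", 40, 150, "1999-09-15"),
}

def segments_where(predicate):
    """Return the ids of segments whose attribute row satisfies predicate."""
    return sorted(seg_id for seg_id, row in attributes.items() if predicate(row))

slow_roads = segments_where(lambda row: row.speed_limit_kmh <= 40)
print(slow_roads, [geometry[seg_id] for seg_id in slow_roads])
```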

     

    Week 7 Lectures

     

     

    Network analysis

    A network is a system of connected linear features through which resources flow. Some examples of features and resources are as follows:
     

    Linear Feature      Resource
    streets             vehicles
    pipes               water, sewage
    power lines         electricity
    railroad            trains
    water channels      water (drainage)
    telephone lines     phone calls

     

     

     

    Elements of a network

    Links

    Links are the conduits for movement.

    Attributes:

    - two way impedances such as time or rate of flow
    - demand such as students, customers, water, electricity, etc.

    Barriers

    Barriers prevent movement between links.

    Turns

    Turns indicate all possible turns at an intersection of links.

    Attributes:

    - impedance such as turning time or turning flow rate
    - restrictions such as no left turn

    Centers

    Centers are locations which receive or distribute resources; for example, schools, fire stations, and reservoirs.

    Attributes:

    - resource capacity such as student enrolment, parking spaces, and water volume
    - impedance limit such as maximum distance or time between a center and a link

    Stops

    Stops are locations on a route where resources are picked up or dropped off; for example, bus stops, newspaper drop-off points, and warehouses.

    Attributes:

    - demand for resources to be transported along the links, such as students, products, commuters, etc.
     

    (adapted from ESRI documentation on networking)
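
    The network elements listed above could be represented in code roughly as follows. This is a minimal sketch with invented class and field names, not the ESRI data model; in particular, barriers are modelled here simply as blocked nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    from_node: str
    to_node: str
    impedance_forward: float   # e.g. travel time from_node -> to_node
    impedance_backward: float  # e.g. travel time to_node -> from_node
    demand: float = 0.0        # e.g. students or customers along the link

@dataclass
class Turn:
    from_link: int             # index of the link being left
    to_link: int               # index of the link being entered
    impedance: float = 0.0     # e.g. turning time
    prohibited: bool = False   # e.g. no left turn

@dataclass
class Center:
    node: str
    capacity: float            # e.g. student enrolment
    impedance_limit: float     # e.g. maximum travel time to the center

@dataclass
class Stop:
    node: str
    demand: float              # resources to pick up or drop off

@dataclass
class Network:
    links: list = field(default_factory=list)
    turns: list = field(default_factory=list)
    barriers: set = field(default_factory=set)  # assumption: barriers are blocked nodes
    centers: list = field(default_factory=list)
    stops: list = field(default_factory=list)

    def usable_links(self):
        """Links that do not touch a barrier node."""
        return [l for l in self.links
                if l.from_node not in self.barriers and l.to_node not in self.barriers]

net = Network(
    links=[Link("A", "B", 2.0, 3.0), Link("B", "C", 1.0, 1.0)],
    turns=[Turn(from_link=0, to_link=1, impedance=0.5)],
    barriers={"C"},
    centers=[Center("A", capacity=300, impedance_limit=10.0)],
    stops=[Stop("B", demand=25)],
)
open_links = net.usable_links()
print(len(open_links))
```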

     

    Example of attributes used in network analysis

    These network elements contain both spatial and attribute components which must be represented and analysed using network-specific operations which simulate network characteristics and functions. An example of attribute data included for a street network might be:
    - traffic speed
    - travel time
    - distance
    - intersection conditions
    - type of intersection control (e.g. traffic lights, stop sign)
    - time of day
    - road construction
    - stop-over (parcel delivery)
    - number of turn-offs
    - etc.

    Main network functions

    Routing
    - route resources along an optimal path
      e.g. from fire station to fire, from current location to scene of accident, from warehouse to customers
    - to define the paths the following information is needed: origin, pass-through points, stops, destination
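
    The notes do not say which algorithm a GIS uses to find an optimal path; a common choice is Dijkstra's shortest-path algorithm, sketched here under that assumption. The street graph, node names, and travel times are invented for illustration, and the route function simply chains shortest paths through the stops in the order given.

```python
import heapq

def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm. graph maps node -> list of (neighbor, impedance)."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbor, impedance in graph.get(node, []):
            new_cost = cost + impedance
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if destination not in best:
        return None, float("inf")
    # Walk back from the destination to rebuild the path.
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]

def route(graph, origin, stops, destination):
    """Visit the stops in the given order, using the shortest path for each leg."""
    waypoints = [origin, *stops, destination]
    full_path, total = [origin], 0.0
    for a, b in zip(waypoints, waypoints[1:]):
        leg, cost = shortest_path(graph, a, b)
        if leg is None:
            return None, float("inf")
        full_path.extend(leg[1:])
        total += cost
    return full_path, total

# Hypothetical street network; impedances are travel times in minutes.
streets = {
    "station": [("A", 2.0), ("B", 5.0)],
    "A": [("station", 2.0), ("B", 1.0), ("fire", 6.0)],
    "B": [("station", 5.0), ("A", 1.0), ("fire", 2.0)],
    "fire": [("A", 6.0), ("B", 2.0)],
}
path, minutes = shortest_path(streets, "station", "fire")
print(path, minutes)
```

    Turn impedances and barriers are left out of this sketch; they could be handled by removing blocked nodes from the graph or by adding the turn cost to the impedance of the link being entered.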

    Resource Allocation
    - allocate resources along linear features to or from centers
      e.g. students to schools, voters to polling booths, water from reservoirs
    - the centers must be defined
    - then the resources are allocated to the centers based on such criteria as: distance to/from center, time taken to get to/from center, capacity of center
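
    A simple way to see how distance and capacity interact is a greedy allocation: take the demand/center pairs in order of increasing distance and assign each demand to the first center that still has room. This is only one possible heuristic, not necessarily what any particular GIS does; the block names, school names, capacities, and distances below are invented, and the distances are assumed to have been measured along the network beforehand.

```python
def allocate(demands, centers, distance):
    """Greedy allocation of demand to centers.

    demands:  dict demand_id -> amount (e.g. number of students)
    centers:  dict center_id -> capacity
    distance: dict (demand_id, center_id) -> network distance or travel time
    Returns (assignment, remaining_capacity). Demand that fits nowhere stays unassigned.
    """
    remaining = dict(centers)
    assignment = {}
    # Consider the closest demand/center pairs first.
    pairs = sorted(distance.items(), key=lambda item: item[1])
    for (demand_id, center_id), _ in pairs:
        if demand_id in assignment:
            continue
        amount = demands[demand_id]
        if remaining[center_id] >= amount:
            assignment[demand_id] = center_id
            remaining[center_id] -= amount
    return assignment, remaining

# Hypothetical example: three residential blocks and two schools.
students = {"block1": 30, "block2": 40, "block3": 20}
schools = {"north": 50, "south": 60}
network_distance = {
    ("block1", "north"): 1.0, ("block1", "south"): 4.0,
    ("block2", "north"): 2.0, ("block2", "south"): 3.0,
    ("block3", "north"): 2.5, ("block3", "south"): 1.5,
}
assignment, spare = allocate(students, schools, network_distance)
print(assignment, spare)
```

    In this example block2 is closer to the north school but is sent south, because the north school no longer has room once block1 has been allocated.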

    Address Matching
    - match addresses to locations on streets (and to the correct side of the street)
      e.g. locate customers, site facilities, locate students, site schools, determine market for mailing list/advertising
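
    Address matching is often done by interpolating a house number along a street segment that stores an address range for each side. The sketch below assumes that approach, with odd numbers on one side and even numbers on the other; the segment layout, field names, and address ranges are invented for illustration.

```python
def geocode(house_number, segment):
    """Interpolate a house number along a street segment.

    segment holds the end coordinates and an address range (low, high) per side.
    Returns ((x, y), side), or None if the number is not on this segment.
    """
    for side in ("left", "right"):
        low, high = segment[side]
        # The number must fall in the range and share its odd/even parity.
        if low <= house_number <= high and house_number % 2 == low % 2:
            fraction = 0.0 if high == low else (house_number - low) / (high - low)
            (x0, y0), (x1, y1) = segment["from_xy"], segment["to_xy"]
            return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0)), side
    return None

# Hypothetical street segment with separate ranges for each side.
segment = {
    "name": "Main St",
    "from_xy": (0.0, 0.0),
    "to_xy": (100.0, 0.0),
    "left": (101, 199),   # odd house numbers
    "right": (100, 200),  # even house numbers
}
location = geocode(150, segment)
print(location)
```

    The result is an estimate: it places house 150 halfway along the segment because 150 is halfway through the range 100 to 200, which is only as good as the assumption that houses are evenly spaced.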

    Applications for networking

    Applications of network analysis include the following, with some examples of each.

    Routing
    - bus routes
    - pedestrian routing
    - rubbish collection routes

    Emergency Services
    - evaluate potential emergency vehicle sites
    - route emergency vehicles
    - evacuation plans (disaster/planning)

    Districting
    - determine polling locations
    - analyse distribution of goods to customers

    Facility Siting and Design
    - calculate service and parking demands for libraries, shopping centers, airports, etc.
    - utility companies analyse customer requirements/usage

    Natural Resources Management
    - water availability and distribution
    - simulate water pollution and analyse downstream effects
    - storm runoff
    - forestry logging plans

     

    Framework for GIS selection, implementation and management

    Some of the fundamentals in selecting a GIS are:

    - know what you want
    - cost justification (benefits)
    - long term planning
    - management support
    - commitment
    - DATA is the most important ingredient

    The selection, implementation, and management of a GIS should be viewed from a broader perspective than simply the technology. A general framework to view these issues can be shown as 4 levels:

    - corporate (includes organisational and personnel issues)
    - application
    - technical (includes input, management, analysis, and output components)
    - technology

    Corporate aspects: integration, commitment

    Integration

    - corporate and strategic plan
    - information systems strategy
    - corporate data architecture and standards
    - technology infrastructure

    Commitment

    - currently
    - longer term - GIS requirements grow
    - data required
    - allocation of people
    - management of role/support
    - who else needs system?

    Corporate aspects: people issues, payback

    People Issues

    - system "champion"
    - able to manage change
    - attract right people
    - training for management and users (an often underestimated area which usually involves several months of awareness, familiarity, usage, testing, etc. before the system becomes accepted)
    - user participation

    Payback

    - over what period (generally long-term)
    - requires large initial outlay
    - pilot project to show returns and viability
    - be careful of overselling

    Application aspects: methodologies, data availability

    Methodologies and procedures

    - what are the current procedures?
    - what changes are required?
    - what is the organisational impact?
    - how does the system integrate into existing plans?

    Data availability

    - what exists and in what format?
    - what needs to be captured?
    - priority and schedule
    - custodianship
    - data custodian
    - information custodian

    Definition and requirements of problem

    - well-known
    - vague
    - a mere guess

    Scope and time frame

    Cost justification

    People aspects

    - getting right people
    - training support for concepts and management of GIS
    - getting management support

    Technical aspects

    We will consider the four technical areas:

    input
    management
    analysis
    output

    Technical aspects: data input

    Data Input

    - digitising
    - satellite/image data integration
    - distributed/centralised data capture approach
    - data loading

    - bulk attribute data loading
    - linking spatial and attribute data
    - supported data interchange formats

    - creation of topology

    - automated or manual

    - edge matching
    - transformation and projection changes
    - generalisation
    - ergonomics

    Technical aspects: data management

    Data Management

    - data models and structures

    - vector and/or grid
    - topology

    - how is spatial and attribute data linked?

    - two-way navigation?

    - multi-user/multi-application support
    - how is spatial data managed?

    - seamless data set
    - is partitioning controlled by user or system?
    - retrieval criteria
    - in same database or separate from attribute data?
    - multiple layers supported?

    - database technology

    - relational, network or hierarchical
    - data dictionary facilities
    - query facilities
    - training and support
    - access methods, performance
    - security and protection
    - integrity checking/problem detection
    - recovery and space management
    - data and record structures
    - programming interface support

    Technical aspects: data analysis

    Data Analysis

    - analysis facilities provided as required
    - vector-grid conversion
    - polygon overlay operations
    - buffer creation
    - grid analysis
    - terrain analysis
    - neighborhood and connectivity
    - modelling facilities

    Technical aspects: data output

    Data Output

    - browsing/display (windowing, zooming, panning, etc.)
    - map creation and enhancement
    - ergonomics
    - report generation
    - soft copy preview of maps
    - range of hard-copy devices

    - pen plotter
    - electrostatic plotter
    - ink jet
    - laser printers (postscript, mono, color, ...)

    - interchange data formats supported

    Other issues (Technology)

    Other Issues in GIS Selection, Implementation and Management

    - user friendly

    - help facilities
    - user interface
    - flexibility/consistency

    - what is the life span of the system?
    - distributed or centralised?
    - acquiring system

    - turnkey - allow tailoring?
    - hybrid
    - in-house

    - vendor credibility

    - support of product/hardware
    - track record in industry and application area
    - support facilities
    - philosophy (toolbox, turnkey, etc...)

    - allow for expected growth
    - processing vs. space tradeoff
    - single/multiple vendor approach
    - inventory or project-based

    Costs of GIS

    Costs

    - study leading up to implementation
    - hardware and communications infrastructure
    - software purchase and/or development
    - hardware and software upgrades and maintenance
    - data capture and maintenance
    - operational costs, supplies
    - organisational and administrative costs
    - staff and specialist costs
    - training and development costs

    Benefits of GIS

    Benefits

    Tangible
    - time saving (production, maintenance, administration)
    - staff-saving
    - more efficient use of resources
    - higher standards and accuracy of information
    - improved access to information
    - new products and facilities

    Intangible
    - more and better information - what is the benefit?
    - improved analysis
    - improved decision-making and planning
    - development of people within organisation
    - new market opportunities and a greater competitive edge

    The challenge is to quantify the intangible benefits.

    GIS tradeoffs

    Tradeoffs

    - investment in technology - now or later?
    - level of investment - incremental approach?
    - application focus

    - not too specialised
    - not too generalised
    - begin to aim for specifics and broaden the scope later

    - geographic coverage

    - priorities
    - specific high-activity areas?
    - overall general areas?

    Factors influencing GIS development

    Many factors influence the development of a GIS:

    political
    administrative
    financial
    technical
    educational
    professional

     

     

    http://www.co.yakima.wa.us/gis/

     

     

     


    This site was last updated 02/01/08