In this, the final edition of my three-part series on ‘What is GIS?’, I turn my attention to explaining what GIS can do. After describing the basic building blocks of GIS, we will look at the file types it uses and several example use cases. Finally, I close with an encompassing definition of GIS and/or GIScience based on my experience using the software and the research done for these articles.
As a quick reminder, this G-FAQ series has focused on three core questions:
- What is GIS: is it just computer software, or is it a science?
- How did GIS develop into an established field of study?
- How does GIS work, and what can you use it for?
At a very high level, GIS is a digital tool that lets you take layers of spatial data and overlay them, much as you might have stacked transparent acetate sheets on the overhead projector back in elementary school. By stacking multiple ‘sheets’ of spatial data on top of each other and then adding a legend, compass rose, title, etc., you turn these layers into a map like one you might see in an atlas. When you then apply spatial analysis techniques that combine these layers in various ways, such as adding them together, subtracting one from another or buffering, you can answer questions about the world around us. This is the true power of GIS: an independently verifiable and repeatable scientific methodology.
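To make the idea of combining layers more concrete, here is a minimal Python sketch of raster-style overlay (often called map algebra). It uses plain lists rather than ArcGIS and assumes that every layer is an equally sized grid of numbers covering the same area with the same cell size. Real GIS software also handles projections, extents and missing (NoData) cells, which this sketch ignores. The layer names and values are made up for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]  # one layer: a list of rows, each a list of cell values


def combine(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Overlay two aligned layers by applying `op` cell by cell."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must have the same number of rows and columns")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def buffer(features: Grid, radius: float) -> Grid:
    """Return 1 for every cell within `radius` cells of a feature cell (non-zero), else 0."""
    rows, cols = len(features), len(features[0])
    sources = [(r, c) for r in range(rows) for c in range(cols) if features[r][c]]
    return [
        [
            1 if any((r - sr) ** 2 + (c - sc) ** 2 <= radius ** 2 for sr, sc in sources) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical layers: two suitability scores and a river mapped as 1s.
slope_score = [[1, 2], [3, 4]]
soil_score = [[10, 20], [30, 40]]
river = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

print(combine(slope_score, soil_score, lambda x, y: x + y))  # add the layers
print(combine(soil_score, slope_score, lambda x, y: x - y))  # subtract one from the other
print(buffer(river, radius=1))  # cells within one cell of the river
```

`buffer` compares squared distances between cell centers, so it marks cells by straight-line distance. The diagonal neighbors of the river cell fall just outside a one-cell radius.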
Every object on the planet has a latitude and longitude coordinate, and it is commonly estimated that 80% of all data has some spatial component. With a GIS software application, you can convert this data into information. GIS takes data from spreadsheets, tables, etc. and makes it visual, and thus often more comprehensible, since humans are, after all, spatial, visually oriented creatures. By communicating spatial data in a visual and interactive fashion, GIS can reveal patterns that might otherwise be missed.
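As a small illustration of turning a spreadsheet into spatial data, here is a minimal Python sketch that reads a table with latitude and longitude columns and converts each row into a GeoJSON point feature. The other columns are kept as attributes. GeoJSON is an open format rather than an ArcGIS-specific one, and the city names and values are hypothetical. The point is simply that each row gains a geometry that mapping software can draw.

```python
import csv
import io
import json

# Hypothetical spreadsheet export: one row per city.
CSV_TEXT = """name,lat,lon,population
City A,39.74,-104.99,1200
City B,40.02,-105.27,800
"""


def rows_to_geojson(csv_text: str) -> dict:
    """Turn rows with lat/lon columns into a GeoJSON FeatureCollection of points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) lists coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(row["lon"]), float(row["lat"])]},
            # Every other column becomes an attribute of the point.
            "properties": {k: v for k, v in row.items() if k not in ("lat", "lon")},
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(rows_to_geojson(CSV_TEXT), indent=2))
```

Note the coordinate order: GeoJSON puts longitude (X) before latitude (Y), which is easy to get backwards.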
In this final piece of our look at ‘What is GIS?’, we focus on GIS as a software package, and specifically on ArcGIS since, well, that is the software I learned while at the University of Colorado. The point of the discussion that follows is to give our readers a high-level overview of how GIS software works and what one can do with it. Admittedly, it is impossible for me to cover all of the tools, formats and other intricacies of ArcGIS in a few pages. In fact, I am the first to tell people that the more I know about ArcGIS, the more I realize I don’t know!
GIS Data Formats
In the world of GIS, there are two basic data formats: vectors and rasters. Simply stated, vectors are points, lines and polygons like those you would find in an atlas or on the dusty globe sitting in your basement. Rasters, on the other hand, are grids of cells holding numeric values (or colors), like what you would see in a photograph or on your TV screen if you looked at it really closely.
Just as in geometry class, a vector can be a single point, a line (a series of connected points) or a polygon (a closed shape formed by connected lines). Vectors also have attributes tied to them. For instance, you could have a vector file showing all the cities in your county, with the 2010 population and average income associated with each point. Most vector files are not a single point, line or polygon. Rather, they are a collection of one of these geometric shapes in a single file, as in the example above of all the cities in your county. A vector is a discrete object with a defined boundary. Each point, or each vertex in the case of a line or polygon, has a defined X and Y position (e.g., longitude and latitude, or easting and northing), and in some cases a Z value, or height.
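The pieces described above (geometry, attributes and a collection of features of one type) can be sketched as a few Python data classes. This is a simplified model for illustration, not how ArcGIS stores vectors internally. The class names, field names and city values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Coord = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) or (easting, northing)


@dataclass
class Point:
    xy: Coord
    z: Optional[float] = None  # optional height (Z value)


@dataclass
class Line:
    vertices: List[Coord]  # two or more connected points

    def __post_init__(self) -> None:
        if len(self.vertices) < 2:
            raise ValueError("a line needs at least two vertices")


@dataclass
class Polygon:
    ring: List[Coord]  # closed boundary: the first vertex is repeated at the end

    def __post_init__(self) -> None:
        if len(self.ring) < 4 or self.ring[0] != self.ring[-1]:
            raise ValueError("a polygon needs at least three vertices and a closed ring")


@dataclass
class Feature:
    geometry: Union[Point, Line, Polygon]
    attributes: Dict[str, object] = field(default_factory=dict)


# A vector "file" is a collection of features that share one geometry type.
cities: List[Feature] = [
    Feature(Point((-104.99, 39.74)), {"name": "City A", "pop_2010": 1200, "avg_income": 52000}),
    Feature(Point((-105.27, 40.02), z=5430.0), {"name": "City B", "pop_2010": 800, "avg_income": 61000}),
]

for city in cities:
    print(city.attributes["name"], city.geometry.xy, city.attributes["pop_2010"])
```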
Vectors can also have topology, which describes how the features in a layer relate to one another spatially. For example, polygon A (e.g., the boundaries of an elementary school playground) is always inside polygon B (e.g., the boundaries of the entire elementary school’s grounds). There are four basic spatial topological relationships: adjacency (which features are next to which); connectivity (which features connect to which); containment (which features are contained within other features); and coincidence (which features occupy the same space).
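Two of these relationships can be checked with a few lines of Python. The sketch below uses the classic ray-casting (even-odd) test for containment and treats two polygons as adjacent when they share an identical edge. Both are simplifications, assumed for illustration. Checking that every vertex of polygon A falls inside polygon B is not a complete containment test for concave polygons, and real topology engines also handle tolerances and partially shared edges. The coordinates are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Coord = Tuple[float, float]


def point_in_polygon(pt: Coord, ring: List[Coord]) -> bool:
    """Ray casting: count edges crossed by a ray from pt toward +x; odd means inside."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):  # ring is closed (first == last)
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertices_inside(inner: List[Coord], outer: List[Coord]) -> bool:
    """Simplified containment: every vertex of `inner` lies inside `outer`."""
    return all(point_in_polygon(p, outer) for p in inner)


def edge_set(ring: List[Coord]) -> Set[FrozenSet[Coord]]:
    """Edges as unordered vertex pairs, so A->B and B->A count as the same edge."""
    return {frozenset(edge) for edge in zip(ring, ring[1:])}


def adjacent(a: List[Coord], b: List[Coord]) -> bool:
    """Simplified adjacency: the two polygons share at least one identical edge."""
    return bool(edge_set(a) & edge_set(b))


school_grounds = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
playground = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
parking_lot = [(10, 0), (15, 0), (15, 10), (10, 10), (10, 0)]

print(vertices_inside(playground, school_grounds))  # containment check
print(adjacent(school_grounds, parking_lot))  # both use the edge from (10, 0) to (10, 10)
print(adjacent(playground, parking_lot))
```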
A raster file is a gridded representation of the world, as opposed to points, lines or polygons. Raster files are made up of square cells of a set size, such as 2 feet by 2 feet, organized in rows (horizontal) and columns (vertical). The dimension of each cell is often called the pixel size, which would be 2 feet in this example. Each cell holds a single numeric value that describes some property of the ground the cell covers. In our example, if the raster represents elevation, each 2-foot cell would hold a height above a reference surface such as sea level, e.g., 124.52 feet. As you can imagine, a pixel’s value is typically an average of the phenomenon being described on the ground since, for example, elevation will vary across a 2-foot by 2-foot pixel.
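Here is a minimal Python sketch of the raster model just described. It stores a grid of values along with the location of its upper-left corner and a pixel size, which together let you look up the cell under any ground coordinate. It also shows, in simplified form, how a coarser pixel ends up holding the average of finer measurements. The origin, the coordinate units (feet) and the elevation values are hypothetical. Real raster formats store this georeferencing in more general forms.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]


@dataclass
class Raster:
    origin_x: float  # x of the grid's upper-left corner, in map units (feet here)
    origin_y: float  # y of the grid's upper-left corner
    pixel_size: float  # width and height of each square cell
    values: Grid  # rows run top to bottom, columns left to right

    def cell_of(self, x: float, y: float) -> Tuple[int, int]:
        """Return the (row, column) of the cell containing ground point (x, y)."""
        col = int((x - self.origin_x) // self.pixel_size)
        row = int((self.origin_y - y) // self.pixel_size)  # y decreases going down the rows
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("point falls outside the raster")
        return row, col

    def value_at(self, x: float, y: float) -> float:
        row, col = self.cell_of(x, y)
        return self.values[row][col]


def coarsen(values: Grid, factor: int) -> Grid:
    """Average each factor-by-factor block of cells into one larger cell."""
    rows, cols = len(values), len(values[0])
    return [
        [
            sum(values[r + i][c + j] for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(0, cols - cols % factor, factor)
        ]
        for r in range(0, rows - rows % factor, factor)
    ]


# Hypothetical 2-foot elevation raster (values in feet above a reference level).
dem = Raster(origin_x=1000.0, origin_y=2000.0, pixel_size=2.0,
             values=[[124.5, 124.75], [124.25, 125.0]])
print(dem.value_at(1001.0, 1999.0))  # upper-left cell
print(dem.value_at(1003.0, 1997.0))  # lower-right cell
print(coarsen(dem.values, 2))  # one 4-foot cell holding the average
```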
Rasters are commonly used to describe a continuous surface, as opposed to the discrete objects that vectors define. A single raster file can also contain several layers (often called bands) whose cells are perfectly aligned, each layer representing a different value. For example, a color image is a combination of three layers with red, green and blue pixel values. Because a raster is made up of a fixed grid of cells, it cannot represent feature boundaries as precisely as a vector, whose vertices can fall at any location rather than being tied to a set grid.
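To illustrate the multi-band idea, the short sketch below stores a tiny color image as three aligned grids and reads one pixel's red, green and blue values with a single row and column index. The pixel values are made up. The alignment check is a simplified stand-in for the checks raster software performs when it builds a multi-band file.

```python
from typing import List, Tuple

Band = List[List[int]]  # one layer of cell values, rows top to bottom


def check_aligned(bands: List[Band]) -> None:
    """Simplified check: every band has the same number of rows and columns."""
    shapes = {(len(b), len(b[0])) for b in bands}
    if len(shapes) != 1:
        raise ValueError(f"bands are not aligned: {shapes}")


def pixel(bands: List[Band], row: int, col: int) -> Tuple[int, ...]:
    """Because the bands are aligned, one (row, col) address reads every band."""
    return tuple(band[row][col] for band in bands)


red = [[255, 0], [0, 128]]
green = [[0, 255], [0, 128]]
blue = [[0, 0], [255, 128]]
image = [red, green, blue]

check_aligned(image)
print(pixel(image, 0, 0))  # (255, 0, 0): a pure red cell
print(pixel(image, 1, 1))  # (128, 128, 128): a mid-grey cell
```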
It is possible to convert files between vector and raster formats, but the conversion is never perfectly accurate. Here is an example to illustrate why. Consider a polygon vector file of the boundaries of Chicago, Illinois: it is a very complex shape with many irregular edges. Now let’s convert it to a raster file with 4-foot pixels. When this is done, diagonal lines become stair-stepped, as it is not possible to draw a straight diagonal line in a grid of cells. Further, the positions of other lines shift, since the edge of a vector polygon will not follow the grid exactly, so these boundaries become thicker and move north, south, east or west. Finally, when we convert the raster file of Chicago’s borders back to a vector, all of these stair-stepped edges and positional errors are retained. It also follows that choosing a finer grid size, say 1 inch in our example above, will reduce the distortions of converting from vector to raster and vice versa, but some level of distortion will always be present after these conversions.
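The vector-to-raster conversion described above can be sketched in Python with a simple cell-center rule: a cell is filled if its center falls inside the polygon. This is one common rasterization rule, not necessarily the one ArcGIS uses. A right triangle with a 10-by-9-unit footprint stands in for Chicago's far more complex boundary. Printing the grids shows the diagonal edge turning into stair steps, and measuring the filled cells shows how the converted shape's area and boundary length compare with the original.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]
Grid = List[List[int]]


def point_in_polygon(pt: Coord, ring: List[Coord]) -> bool:
    """Ray casting (even-odd) test; `ring` is closed (first vertex == last)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def rasterize(ring: List[Coord], cell: float) -> Grid:
    """Cell-center rule: a cell is 1 when its center lies inside the polygon."""
    xs, ys = [p[0] for p in ring], [p[1] for p in ring]
    min_x, min_y = min(xs), min(ys)
    ncols = math.ceil((max(xs) - min_x) / cell)
    nrows = math.ceil((max(ys) - min_y) / cell)
    # Build rows from the top (north) down so the printed grid reads like a map.
    return [
        [int(point_in_polygon((min_x + (c + 0.5) * cell, min_y + (r + 0.5) * cell), ring))
         for c in range(ncols)]
        for r in reversed(range(nrows))
    ]


def raster_area(grid: Grid, cell: float) -> float:
    return sum(map(sum, grid)) * cell * cell


def raster_perimeter(grid: Grid, cell: float) -> float:
    """Length of the cell edges that separate filled cells from empty ones."""
    rows, cols = len(grid), len(grid[0])

    def filled(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 1

    exposed = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if grid[r][c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if not filled(r + dr, c + dc)
    )
    return exposed * cell


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 9.0), (0.0, 0.0)]  # stand-in for a city boundary
for cell in (2.0, 0.5):
    grid = rasterize(triangle, cell)
    print(f"cell {cell}: area {raster_area(grid, cell)}, perimeter {raster_perimeter(grid, cell)}")
    for row in grid:
        print("".join("#" if v else "." for v in row))
```

Run as written, the 2-unit grid gives an area of 40 square units against the true 45. The 0.5-unit grid happens to recover 45 exactly because the cells it gains and loses along the diagonal cancel out. Its boundary, though, measures 37 units against a true perimeter of about 32.45 (10 + 9 + √181). Shrinking the cells further does not fix that, because a staircase is always as long as the horizontal and vertical runs it steps through. That is one concrete form of the distortion that survives even a fine grid.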
Common File Formats for ArcGIS
ArcGIS is able to handle a wide variety of vector and raster file formats. Here is a list of the most common ones we use at Apollo Mapping.
- Shapefile (SHP) – this is Esri’s industry-standard vector format and is by far the most common format we use daily. A shapefile is actually a combination of at least three individual files: .shp (feature geometry); .shx (a positional index of the geometry); and .dbf (attribute table). Most shapefiles will also have a .prj, which defines the coordinate system and projection.
- Layer file (LYR) – this does not hold any data itself. Instead, it references a dataset, such as a SHP, and tells ArcGIS how that data should be displayed with regard to color, labels, naming conventions, etc.
- File Geodatabase (GDB) – this format has a rather complex data structure but has many advantages over SHPs, such as faster performance. Each feature in a GDB is stored as a row in a table within the geodatabase, much as in a relational database. Esri is actively encouraging users to create GDBs rather than SHPs.
Other common vector file formats include Smart Data Compression (SDC), ArcInfo coverages, ArcInfo interchange files (E00), triangulated irregular networks (TIN), digital line graphs (DLG) and personal geodatabases.
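Because a shapefile is really a set of sibling files, a shapefile copied without all of its pieces will usually not open correctly. Here is a minimal Python sketch, using only the standard library, that checks which of the shapefile components listed above sit next to a given .shp. It assumes lowercase extensions and only looks for the three required files plus the optional .prj. Real shapefiles may carry other sidecar files (such as .sbn/.sbx spatial indexes or a .cpg code-page file), which this sketch simply ignores. The file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

REQUIRED = (".shp", ".shx", ".dbf")  # geometry, index, attribute table
OPTIONAL = (".prj",)  # coordinate system / projection definition


def check_shapefile(shp_path: str) -> Dict[str, List[str]]:
    """Report which shapefile components are missing alongside `shp_path`."""
    shp = Path(shp_path)

    def sibling(ext: str) -> Path:
        return shp.with_name(shp.stem + ext)  # e.g. cities.shp -> cities.dbf

    return {
        "missing_required": [e for e in REQUIRED if not sibling(e).exists()],
        "missing_optional": [e for e in OPTIONAL if not sibling(e).exists()],
    }


# Demo with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / f"cities{ext}").touch()
    print(check_shapefile(str(Path(folder) / "cities.shp")))  # .shx and .prj are missing
```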
- GeoTIFF (TIF) – this is the most common raster format we use. In its uncompressed form, the image quality is the best you can get. A GeoTIFF is essentially a traditional TIFF file with georeferencing tags embedded in it that tell ArcGIS the data’s projection and coordinates on the ground.
- Compressed Rasters – TIFs can be very large datasets, so a variety of compressed raster formats, such as MrSID, JPEG2000 and ECW, are used to reduce file sizes.
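To see why uncompressed TIFs get large, a quick back-of-the-envelope calculation helps: the pixel data alone is rows × columns × bands × bytes per value. The sketch below assumes a hypothetical 10 km by 10 km image at 0.5-meter pixels with four 8-bit bands. It ignores headers, overviews and any compression.

```python
import math


def uncompressed_bytes(width_m: float, height_m: float, pixel_m: float,
                       bands: int, bytes_per_value: int) -> int:
    """Size of the raw pixel data only (no header, overviews or compression)."""
    cols = math.ceil(width_m / pixel_m)
    rows = math.ceil(height_m / pixel_m)
    return rows * cols * bands * bytes_per_value


size = uncompressed_bytes(10_000, 10_000, 0.5, bands=4, bytes_per_value=1)
print(f"{size:,} bytes = {size / 1e9:.1f} GB")  # 20,000 x 20,000 pixels x 4 bands
```

Doubling the bit depth to 16 bits per value doubles that figure, and halving the pixel size quadruples it. This is why the compressed formats listed above are attractive for large areas.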
As a quick aside, deploying spatial data online, such as in Google Earth or through streaming services like web map services (WMS) and web feature services (WFS), is becoming increasingly popular. These online services are a way to disseminate spatial data in the formats described above to a large group of users. Depending on how the spatial datasets are shared, users with access may only be able to view them, as in Google Earth or a WMS. Alternatively, they may be able to query and, where the server allows it, edit the underlying features, as in a WFS. For a GIS administrator, online and streaming services have the huge advantage of central storage: many users can access the same data, each with their own permissions, and easily collaborate on a project.
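For a sense of what a streaming request looks like, the sketch below builds an OGC WMS 1.3.0 GetMap URL in Python. A server answers a request like this with a rendered map image rather than the underlying data. The endpoint, layer name and bounding box are hypothetical placeholders. One real-world wrinkle worth knowing: in WMS 1.3.0 with CRS EPSG:4326, the bounding box is given in latitude, longitude order.

```python
from typing import Tuple
from urllib.parse import urlencode


def wms_getmap_url(endpoint: str, layer: str, bbox: Tuple[float, float, float, float],
                   width: int, height: int, crs: str = "EPSG:4326",
                   image_format: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap request URL."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty means the server's default style
        "CRS": crs,
        # For EPSG:4326 in WMS 1.3.0: min_lat, min_lon, max_lat, max_lon.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical server and layer name.
print(wms_getmap_url("https://example.com/wms", "imagery", (39.9, -105.4, 40.1, -105.1), 800, 600))
```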
Well, as some of our regular readers have found out, I am what might be called a bit verbose. As such, this G-FAQ is already longer than I planned, and we still have three topics to cover! Given this, you will have to wait until next month for the conclusion of this (now) four-part series on ‘What is GIS?’ In the final edition (I promise it will be the final one!), we will look at the tools available in ArcGIS for spatial analysis, run through three example analyses and then provide my encompassing definition of GIS.
Do you have an idea for a future G-FAQ? If so, let me know by email at [email protected].
Brock Adam McCarty