Mapping Data Through Network Analysis

No matter what the discipline, any body of knowledge is built upon understanding relationships. An appreciation for how ideas, people, planets, physical processes, or geometric concepts interconnect offers students a platform for grounding their understanding of a topic. Today we are blessed (and some say cursed) with copious amounts of data that has been collected both publicly and privately and aggregated in vast electronic databases. On its own, all this information is rather sterile and uninformative, but discerning patterns and relationships within the data allows us to construct applicable knowledge.

The process of translating raw data into coherent information offers three useful stages for learning: 1) Finding data, which allows students to question who controls and collects data; 2) Editing data, which allows students to question ideas of objectivity and the manipulation of information; and 3) Representing data, which allows students to question the legibility of data to diverse audiences.

The capacity and technology for visualizing and mapping data have evolved dramatically in the past two decades, becoming easier to use and more powerful. Many readers have probably come across network graphs, from immensely complex ‘maps of the internet’ to self-indulgent social universes or fundamental mathematics. The software used to generate these graphs is diverse, and some programs are discipline-specific. An emerging open-source platform for lay and scientific endeavor is the Gephi software package. Developed by a French non-profit collaborative, Gephi is freely available for download, is compatible with most data file types, and offers a deep reserve of instructional aids supported by a large community of users (all available here: www.gephi.org).

Aside from the aesthetic stimulation these graphs offer, network analysis is helping scientists uncover hidden properties of genomes, proteins, and bacteria. The analytical virtues of visualizing data in such a manner are apparent, but how might this technology be employed pedagogically? Can Gephi enhance learning? As with any emerging technology adapted to a classroom environment, Gephi remains only a medium through which knowledge may be transmitted. Ultimately the knowledge is the product, not the fancy graph.

With this caveat in mind, the form through which knowledge is conveyed does have a direct impact on its comprehension and retention. Linking together concepts within cognitive space has been shown to enhance retention of information (from the formation of geospatial relationships in navigating environments to simple mnemonic devices). Thus, the sort of conceptual mapping offered by network graphs has great promise as a teaching and learning tool.

For example, history classes could supplement textbook narration of the onset of World War I with a networked visualization of Archduke Franz Ferdinand, how he was connected to other actors in the conflict, and how the events following his assassination are linked. Further, Gephi allows for different nodes (e.g., Ferdinand, his assassin, Woodrow Wilson, etc.) and the links between these nodes to be weighted variably. That is, students or instructors could manipulate the impact of the Archduke’s assassination, unrest in Russia, or employment opportunities in Sarajevo to minimize or emphasize certain ideas as they relate to a topic, allowing students to visualize the contingencies of history. Not only could this device improve retention of events, but the non-linear representation offers an exciting alternative to flat one-dimensional timelines. Similar approaches could be attempted with concepts in physics, mathematics, or the humanities.
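As a rough illustration of the weighting idea above, the following Python sketch builds a small hand-entered edge list and rescales every link touching one node before writing it out as CSV. The actors beyond those named in the text, the weight values, and the helper names `scale_weights` and `to_edge_csv` are all assumptions made for illustration. The `Source`, `Target`, and `Weight` column headers follow the convention that Gephi's spreadsheet import recognizes for edge tables.

```python
import csv
import io

# Hypothetical, hand-assigned links. The weights are illustrative only,
# not historical measurements.
EDGES = [
    ("Franz Ferdinand", "Gavrilo Princip", 5.0),
    ("Franz Ferdinand", "Austria-Hungary", 4.0),
    ("Austria-Hungary", "Serbia", 3.0),
    ("Serbia", "Russia", 2.0),
]


def scale_weights(edges, node, factor):
    """Return a copy of edges with every link touching `node` scaled by `factor`."""
    return [
        (source, target, weight * factor if node in (source, target) else weight)
        for source, target, weight in edges
    ]


def to_edge_csv(edges):
    """Serialize edges as CSV text with Source, Target, and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    # Emphasize the Archduke's role by doubling every link that involves him.
    emphasized = scale_weights(EDGES, "Franz Ferdinand", 2.0)
    print(to_edge_csv(emphasized))
```

Saving the printed text to a .csv file and importing it as an edge table should give students a graph they can re-weight and compare, for example by passing a factor below 1 to de-emphasize an actor.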

Gephi allows the user to input and link data manually (as in the example of World War I), or to transform existing datasets into discernible visualizations. For example, we can download a dataset from NYC Open Data. This reserve of publicly held data has been accrued by the city government, and includes metrics on everything from 311 noise complaints to the location of parks with dog runs. The data can be filtered and searched in a number of ways. For this example we can restrict our search to datasets and type a phrase such as “social media” in the search box. The first search result is a dataset on the social media usage of New York’s city agencies.


You can download (export) this data in a number of formats. For our purposes we can select a .csv file. Gephi has its own native graph format (.gexf), but it can import several other data-specific file types (including the .csv files common to applications like Microsoft Excel). When we open this file in Excel, we may find that the data needs to be “cleaned” a bit. We can delete redundant entries or eliminate city agencies we don’t care about. Once we’ve cleaned the data, we can import it in the Data Laboratory in Gephi.
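The same cleaning can also be scripted instead of done by hand in Excel. The sketch below is one possible approach, not the workflow used for this post. It assumes the exported file has `Agency` and `Platform` columns, which may not match the real export. The sample rows are invented placeholders, and `clean_rows` and `to_edges` are hypothetical helper names.

```python
import csv
import io


def clean_rows(rows, drop_agencies=()):
    """Drop exact duplicate rows and rows for agencies we want to exclude."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen or row.get("Agency") in drop_agencies:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def to_edges(rows):
    """Turn each distinct (Agency, Platform) pair into one Source/Target edge."""
    pairs = sorted({(row["Agency"], row["Platform"]) for row in rows})
    return [{"Source": agency, "Target": platform} for agency, platform in pairs]


# Invented placeholder rows standing in for the downloaded .csv file.
SAMPLE = """Agency,Platform
Agency A,Twitter
Agency A,Twitter
Agency A,Facebook
Agency C,Twitter
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    edges = to_edges(clean_rows(rows, drop_agencies={"Agency C"}))
    writer = csv.DictWriter(
        io.StringIO(), fieldnames=["Source", "Target"], lineterminator="\n"
    )
    for edge in edges:
        print(edge["Source"], "->", edge["Target"])
```

For a real file, the `io.StringIO(SAMPLE)` reader would be replaced with an opened copy of the download, and the edges written out with `csv.DictWriter` before importing them in the Data Laboratory.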


Once the data is imported, we can switch to the Overview tab and see our network graph. Colors, sizes, and labels can all be adjusted much as they would be in an image-editing program.


Other useful sources for freely available datasets include:

To help introduce these concepts, provided below is a rudimentary sample assignment that could serve as a template for bringing data visualization into the classroom.
