How to Navigate the Data Deluge: Ecometrics and the Case for the Urban Social Sciences
The recent emergence of large-scale administrative data sets, or “big data,” has inspired much attention and discussion in the sciences as well as in society more broadly. One of the metaphors for this event has been the “data deluge.” This metaphor seems apt on a number of levels; not only does the amount of newly available information resemble a flood, but we seem to have been completely overwhelmed by it. Indeed, there is a sense that we are wading about in data trying to figure out what exactly to do with it all. To make matters worse, big data are not necessarily more reliable or more valid than data derived from traditional sources.
The question stands, then: What is the vessel that will enable users to intelligently navigate the waters that characterize this post-diluvian age? The answer will likely differ by context, and here we will focus on “urban informatics,” typically defined as the use of communications technology and the data sciences to measure urban environments. We argue that cities make a distinctive case for merging urban informatics with key insights from the social sciences. Cities have a high density of people and technology, and therefore of the data they produce; of public and private entities with the resources to generate and leverage digital data; and of academic institutions that might study them. Cities also provide a shared geographical context for such data, making it possible to examine how objects, people, and institutions interact and, in turn, give rise to societal patterns.
A central goal of the Boston Area Research Initiative (hereafter BARI; see also www.bostonarearesearchinitiative.net) is to pursue and support original research about greater Boston that engages directly with these novel data resources and demonstrates how they might be used to advance both science and public policy. The current essay focuses on one of BARI’s flagship projects—the construction of a library of ecometrics. More specifically, we describe our efforts to measure variations in the physical and social landscape of the city from administrative data as one illustration of how the data deluge might be navigated.
Before detailing the history and products of this project, we want to propose that there are at least two steps to this process, which will then act as the guiding framework for the remainder of this essay. The first step is to address the important but overlooked fact that administrative data are not made for research purposes, and thus do not have any clear connection to existing theories and debates at the forefront of the field. This hard reality requires the development of methodologies that incorporate the data into long-standing traditions while also making them compatible with data from other sources.
But constraining these new data to approaches built for the data resources that preceded them would be to ignore their full potential. The second step, then, is to develop another set of methodologies that can capitalize on the newfound content and detail these data hold. The second step must necessarily build on the work conducted in the first step. Without the former, research based on the data will not be conversant with current knowledge. Without the latter, we are still wading about in data, pretending that they are just the same as before only more numerous, which simply is not the case.
Ecometrics in the Age of Big Data
“Ecometrics,” per their name (eco- = space; -metric = measurement), refer to a statistical approach to reliably and validly describing some characteristic of a particular geography (Raudenbush and Sampson 1999), be it a building, street, or neighborhood. The methodologies that generate ecometrics, including neighborhood surveys and in-person audits, are commonly used in research on neighborhood dynamics, the most prevalent examples being observations of physical and social disorder (e.g., graffiti, public drunkenness) in public spaces and surveys assessing the social relationships between neighbors. Surveys and audits that cover an entire city, however, are extremely expensive to conduct, which is why no city has ever been fully assessed more than twice in a decade.
We have recently argued (O'Brien et al. 2015) that administrative data provide a new potential resource for such work, using as a test case the records of Boston’s 311 system, which logs requests for non-emergency services (e.g., graffiti removal, street light outage). Boston’s 311 system receives over 500 reports per day, each one describing a discrete event or condition occurring at a particular place and time. As a corpus, these reports potentially act as “the eyes and ears of the city,” to paraphrase Jane Jacobs (1961), offering an administratively based window into the urban landscape. For example, 311 reports might be able to measure the deterioration and denigration described as physical disorder or, more colloquially, “broken windows,” in the spirit of the classic criminological theory. But as noted above, administrative data like these are intended to support the operations of basic city services, not researchers. For this reason, we had to develop a new methodology for utilizing 311 reports to measure “broken windows” across the city, one that could also act as a more general model for ecometrics in the age of big data. The methodology had three parts.
First, the data were too rich to be immediately useful, creating a need to isolate relevant content. Of the 178 different types of service requests, 33 reflected physical disorder; these were further organized into two dimensions: private neglect, including housing issues (e.g., pests), the uncivil use of private space (e.g., illegal rooming house), and complaints about big buildings (i.e., condos); and public denigration, including references to graffiti and the improper disposal of trash.
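The isolation step amounts to mapping case types onto dimensions and tallying reports by place. The sketch below is a minimal illustration in Python, assuming each report arrives as a record with a case type and a census block group. The case-type labels, field names, and block group IDs are hypothetical stand-ins, not the actual 311 category names or the full classification used in O'Brien et al. (2015).

```python
from collections import Counter

# Hypothetical subset of a case-type-to-dimension mapping. The actual scheme
# assigns 33 of 178 case types; these labels are illustrative only.
CASE_TYPE_TO_DIMENSION = {
    "Pest Infestation": "private_neglect",
    "Illegal Rooming House": "private_neglect",
    "Big Building Complaint": "private_neglect",
    "Graffiti Removal": "public_denigration",
    "Improper Trash Disposal": "public_denigration",
}


def tally_disorder(reports):
    """Count disorder-related reports per (block group, dimension).

    Each report is a dict with hypothetical keys "case_type" and
    "block_group". Case types outside the mapping are dropped.
    """
    counts = Counter()
    for report in reports:
        dimension = CASE_TYPE_TO_DIMENSION.get(report["case_type"])
        if dimension is not None:
            counts[(report["block_group"], dimension)] += 1
    return counts
```

The resulting counts per block group and dimension are the raw material that the calibration and reliability steps then work on.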
Second, there was the concern of validity. If 311 reports are “the eyes and ears of the city,” it is likely that some eyes and ears are more sensitive to certain types of issues than others. As such, our measures were likely a mixture of objective disorder and subjective responses to disorder. To address this issue, BARI and the City of Boston conducted neighborhood audits (with the help of a team of undergraduates from UMass Boston) that identified how likely users in neighborhoods were to report a street light outage or broken sidewalk. The resultant measure of “civic response rate” enabled us to calibrate the call-based measures to better reflect objective conditions. This calibration was constructed in such a way that it could be re-assessed dynamically without additional neighborhood audits.
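One simple way to picture the calibration: if a neighborhood's users report only a quarter of the street light outages an audit finds, its raw counts likely understate conditions by roughly a factor of four. The following Python sketch assumes exactly that inverse-rate adjustment. It illustrates the logic only and is not necessarily the model used in O'Brien et al. (2015); the function names and the `floor` parameter are hypothetical.

```python
def civic_response_rate(audited_issues, reported_issues):
    """Share of issues found by an audit (e.g., street light outages) that
    were also reported to 311 in a given neighborhood."""
    if audited_issues <= 0:
        raise ValueError("need at least one audited issue")
    return reported_issues / audited_issues


def calibrated_count(observed_reports, response_rate, floor=0.05):
    """Scale an observed report count by the inverse of the response rate,
    so that low-reporting areas are not mistaken for low-disorder areas.
    `floor` is an arbitrary, hypothetical guard against near-zero rates."""
    return observed_reports / max(response_rate, floor)


# Two hypothetical neighborhoods, each with 20 audited issues
quiet = calibrated_count(30, civic_response_rate(20, 5))   # rate 0.25
vocal = calibrated_count(90, civic_response_rate(20, 15))  # rate 0.75
print(quiet, vocal)  # 120.0 120.0
```

Here neighborhoods with 30 and 90 raw reports end up with the same calibrated estimate, because the first reports audited issues at a third of the rate of the second.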
Third, there was a need for criteria for reliability—administrative data do not come with guidelines regarding the spatial or temporal windows at which they are most appropriately measured. Testing across multiple time intervals revealed that both private neglect and public denigration could be reliably measured every six months for census block groups and every two months for tracts.
In the end, not only had we developed a methodology that made administrative data compatible with an existing program of research, but the resultant measures were also distinctive. The 311 data supported two dimensions of physical disorder, whereas previous work had almost exclusively identified only one (e.g., Caughy et al. 2001; Raudenbush and Sampson 1999); this was largely thanks to the ability to “see” problems in both public and private spaces. These measures could also be recalibrated multiple times a year, which was unprecedented. To boot, the cost of each repeated measure is merely the handful of hours it takes to download the data and process it with a relatively modest piece of R code. The methodology is thus highly efficient given the scale of the data.
A Generalized Ecometric Approach: Research, Practice, and Teaching
Our proposed framework for measuring “broken windows” with 311 data could be extended to just about any data that document events or conditions in space and time. This property has led us to pursue a generalized ecometric approach that can translate the diversity of novel digital data (from city service records to social media to public transit usage) into a broad library of ecometrics that comprehensively tracks the dimensions that describe and differentiate the neighborhoods of the city.
In addition to the 311 measures of physical disorder, we have incorporated 911 dispatches and building permits into this library. A recent paper highlighted 911 as a way to measure different aspects of social disorder and crime, debuting a measure of private social disorder, including domestic violence and other conflict between people living together (O'Brien and Sampson 2015). Another project used building permits to measure “the other side of the broken window”—that is, the active efforts of residents and landowners to invest in the neighborhood (O'Brien and Montgomery 2015). There has also been much work on a measure that was originally seen as a byproduct of the 311 project. “Civic response rate” has been found to break out into two separate constructs: engagement, or knowing of and being willing to use the 311 system; and custodianship, or the likelihood of someone reporting an issue in the public space (e.g., street light outage). This latter measure has provided a variety of new insights into the psychology and behavioral dynamics that go into neighborhood maintenance (O'Brien 2015; O'Brien et al. 2014).
The ecometric library, which is continually growing in both content and timespan, is a unique resource for hypothesis testing and can support robust longitudinal research designs. For example, we recently used it to examine the broken windows theory of crime, testing whether physical and social disorder in a neighborhood really did lead to increases in crime (O'Brien and Sampson 2015). It turns out that they did, but not in the way typically proposed. Instead of disorder in public spaces inviting crime, we found that it was private disorder that appeared to escalate and spill out, leading to more serious and visible violations. We called this the social escalation model. Again, the ability of administrative data to “see” the dynamics of both private and public spaces added a new dimension to our understanding of urban neighborhoods.
It is important to note that BARI’s role goes beyond academic exercise; it sits at the intersection of urban science and policy. For example, BARI has been working closely with the City of Boston’s Department of Innovation and Technology and the Mayor’s Office of New Urban Mechanics to incorporate these new measures into the City’s own tracking systems and to use them directly in new projects being undertaken by other departments and agencies.
In addition, “urban informatics” is a burgeoning field that is just beginning to train the first generation of scholars who identify as such, and one of us (O’Brien) is a member of the core faculty for Northeastern University’s new Master’s in Urban Informatics. In partnership with the City of Boston, he has made ecometrics a central component of his courses. One of the great challenges of learning how to work with record-level data is translating them into interpretable measures that can then be used in research, policy, and practice. Students learn this generalizable skill by developing ecometrics from City of Boston data, potentially furthering research and policy in the process.
Ecometrics and the Future of the Urban Sciences
As we proposed at the beginning, incorporating new and so-called “big data” into existing traditions is an essential first step, but it falls short of capitalizing on their full potential. Their novelty includes a richness that existing theories and methodologies were not constructed to handle. In the example of administrative municipal data, events and conditions can be analyzed at spatial and temporal scales that are very rarely accessed or analyzed by social scientists. To further complicate matters, the diversity of measures available is rapidly growing beyond anything we have seen.
BARI and its partners are developing new methodologies that break through current limitations. One could measure patterns of events for streets or even buildings; examine how events on one day relate to those of the next day or week or month in surrounding areas; describe, follow, and predict the trajectories of neighborhoods in a multidimensional manner. Such ideas have often been mere dreams for a field whose most comprehensive data sets contained annual patterns for regions of thousands of residents. In each instance, BARI’s ecometric library is critical as it gives meaning to the different types of events that might be documented. Rather than cherry-picking events according to one logic or another, researchers can select classes of events based on their shared substance and what they are really telling us about a neighborhood.
To return to the metaphor we began with, ecometrics is one vessel that BARI has adopted to navigate the data deluge. It is a tried and true tool that has been important for urban science, and new data have offered an opportunity to markedly advance its use. However, in our opinion, a true transformation still awaits the social sciences in the age of large-scale data. Analogous stories can likely be told in many other disciplines. For transformations to happen we must design the tools that utilize the new data resources as not just “bigger” data but “different” data that can answer questions we have not yet been able to, or, in some cases, haven’t yet imagined.
Caughy, Margaret O., Patricia J. O'Campo and Jacqueline Patterson. 2001. "A Brief Observational Measure for Urban Neighborhoods." Health & Place 7:225-236.
Jacobs, Jane. 1961. The Death and Life of Great American Cities. New York: Random House.
O'Brien, Daniel Tumminelli. 2015. "Custodians and Custodianship in Urban Neighborhoods: A Methodology Using Reports of Public Issues Received by a City's 311 Hotline." Environment and Behavior 47:304-327.
O'Brien, Daniel Tumminelli and Robert J. Sampson. 2015. "Public and Private Spheres of Neighborhood Disorder: Assessing Pathways to Violence Using Large-Scale Digital Records." Journal of Research in Crime and Delinquency 52.
O'Brien, Daniel Tumminelli and Barrett W. Montgomery. 2015. "The Other Side of the Broken Window: A Methodology That Translates Building Permits into an Ecometric of Investment by Community Members." American Journal of Community Psychology 55:25-36.
O'Brien, Daniel Tumminelli, Eric Gordon and Jesse Baldwin-Philippi. 2014. "Caring About the Community, Counteracting Disorder: 311 Reports of Public Issues as Expressions of Territoriality." Journal of Environmental Psychology 40:320-330.
O'Brien, Daniel Tumminelli, Robert J. Sampson and Christopher Winship. 2015. "Ecometrics in the Age of Big Data: Measuring and Assessing 'Broken Windows' Using Administrative Records." Sociological Methodology 45.
Raudenbush, Stephen W. and Robert J. Sampson. 1999. "Ecometrics: Toward a Science of Assessing Ecological Settings, with Application to the Systematic Social Observation of Neighborhoods." Sociological Methodology 29:1-41.