From Data to Models that Improve Urban Policy

Urban settlements are today home to 249 million people in the United States, which is 81% of our population. This is precisely why we find most modern jobs in our most populated cities. The increase in urban population also strains limited infrastructure and space. This is more than simple overcrowding that translates into longer waits in traffic, when running administrative errands or in lines at the central market. Population growth in cities can seriously affect the overall quality of life and the longer-term wellbeing of their citizens. The trade-off between housing affordability and commuting time is a challenge for urbanites trying to build a better life, and the lack of reliable and efficient transportation is often a huge barrier to reaching affordable housing. Today, for millions of households, finding a place to rent in the city is not a trivial matter, if it is possible at all. A reasonable monthly housing cost is about 30% of household income. However, such places are increasingly scarce in booming cities, where the high incomes of a small number of highly qualified employees inflate housing prices beyond what the majority of the population can afford. The choices are reduced to either living in the few affordable zones of the city, which have higher crime rates and more struggling schools, or doubling commuting times at the expense of family and leisure time.
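The 30% guideline above can be made concrete with a small calculation. The sketch below is only an illustration: the function names and the example household figures are hypothetical, and the 30% threshold is simply the rule of thumb quoted in the text.

```python
def housing_cost_share(monthly_housing_cost: float, monthly_income: float) -> float:
    """Fraction of monthly household income spent on housing."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return monthly_housing_cost / monthly_income


def is_affordable(monthly_housing_cost: float, monthly_income: float,
                  threshold: float = 0.30) -> bool:
    """True when housing takes no more than `threshold` of income."""
    return housing_cost_share(monthly_housing_cost, monthly_income) <= threshold


# Hypothetical household earning 4,000 per month (illustrative numbers only).
print(is_affordable(1200, 4000))  # share is 0.30, at the guideline
print(is_affordable(1800, 4000))  # share is 0.45, above the guideline
```

A household in the second situation would have to look for cheaper housing farther away, which is where the commuting trade-off described above begins.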

The right to learning, leisure and affordable housing should be reflected in the essential performance indicators of our cities. Human decisions on how to plan cities are at the core of the problem. Free markets and unregulated real estate development in limited urban spaces do not generate livable cities; we must modernize the way we plan for them. This requires decision-support tools that generate up-to-date, large-scale information on people’s mobility and housing needs. This information can be translated into meaningful knowledge to support and test economic and social theories. This is by definition a complex systems challenge that calls for the interdisciplinary expertise of the social sciences, economics and information sciences in the service of better cities.

Towards an Urban Decision Platform:

The first requirement for planning the city is a reliable travel demand model that allows us to make informed policies regarding housing affordability metrics and commuting needs. To that end, we need to know where citizens live, where they can work and their travel times for work and other types of activities. These models combine meticulous methods of statistical sampling in local [1-2] and national household travel surveys [3-4] to process and infer trip information between areas of a city. The estimates they produce are critically important for understanding the use of transportation infrastructure, planning for its future [5-14] and taking into account the cost of housing. While the surveys that provide the empirical foundation for these models offer highly detailed travel logs for carefully selected representative population samples, they are expensive to administer and to participate in. As a result, the time between surveys can range from 5 to 10 years in even the most developed cities.

The rise of ubiquitous mobile computing has led to a dramatic increase in new, big data resources that capture the movement of vehicles and people in near real time and promise solutions to some of these deficiencies. With these new opportunities, however, come new challenges of estimation, integration, and validation with existing models. While these data are available nearly instantaneously and provide large, long-running samples at low cost, they often lack important contextual demographic information for privacy reasons, lack the resolution needed to infer choice of travel mode, and have their own noise and biases that must be accounted for. Despite these issues, their use for urban and transportation planning has the potential to radically decrease the time between survey updates, increase survey coverage and reduce data acquisition costs. In order to realize these benefits, a number of challenges must be overcome to integrate new data sources into traditional modeling and estimation tools.

Data generated by the pervasive use of cellular phones has offered insights into abstract characteristics of human mobility patterns. Recent work has found that individuals are predictable, unique, and slow to explore new places [15-21]. The availability of similar data nearly anywhere in the world has facilitated comparative studies that show many of these properties hold across the globe despite differences in culture, socioeconomic variables and geography. The benefits of this data have been realized in various contexts such as daily mobility motifs [22, 23], disease spreading [24, 25] and population movement [26].

Combining transportation domain knowledge and data analysis from mobile phone data, we have developed a modular, efficient computational platform that performs many aspects of travel demand estimation with billions of geo-tagged data points as an input. We review and integrate new and existing algorithms to produce validated origin-destination matrices and road usage patterns in several cities [27].
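To make the notion of an origin-destination matrix concrete, here is a minimal sketch of its final aggregation step. It assumes that trips have already been extracted from the raw geo-tagged points as (origin zone, destination zone) pairs; the zone labels and trips are hypothetical, and this is not the platform described in [27], which must also infer the trips themselves and account for the noise and biases discussed above.

```python
from collections import Counter


def od_matrix(trips, zones):
    """Count trips between every ordered pair of zones.

    trips: iterable of (origin_zone, destination_zone) pairs.
    zones: ordered list of zone identifiers (e.g. census tracts).
    Returns a list of rows; entry [i][j] is the number of trips
    from zones[i] to zones[j].
    """
    counts = Counter(trips)
    return [[counts[(o, d)] for d in zones] for o in zones]


# Hypothetical zones and trips, for illustration only.
zones = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]

matrix = od_matrix(trips, zones)
for zone, row in zip(zones, matrix):
    print(zone, row)
```

Each row lists where trips starting in a zone end up, so row sums give trips produced by a zone and column sums give trips attracted to it.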

The next necessary steps are to use the travel demand information online and to integrate it with demographics and housing prices. This platform can be used to create a decision support system for both mobility solutions and urban economics. For example, from the knowledge of congestion bottlenecks we can create a communication system for the phones in our pockets that offers better routing alternatives and informs us about the variability of travel times and its relation to jobs and housing supply. Such an information platform can be particularly useful to assess to what extent information can help us alleviate congestion without introducing new infrastructure, and how to plan better infrastructure, from roads to public transportation. A critical aspect is to keep in mind that the problem is not a purely operational one. The goal of information should not be just to optimize travel times and match existing budgets for housing. Cities are social systems, and the needs of various income groups must be taken into account. By matching existing jobs, income groups, and current house prices with their travel times, the end goal is to develop social policies that make cities more livable for everyone.

Social Media Platforms for Social Good:

Many companies today do not need to produce any goods; instead they are software platforms that connect social and economic relations. Airbnb connects guests with hosts who occasionally rent out their houses when they travel, and Uber connects travelers with drivers who provide commercial rides in their own cars. While these interactions facilitate services that previously required central infrastructures, there are still questions regarding how this new form of platform capitalism can be integrated with social well-being. For example, Uber drivers are contractors who do not have any employee benefits, and it is still unclear whether Airbnb has a negative impact on affordable housing in cities.

If participatory economies are here to stay, we have the responsibility to investigate how to facilitate new connections and on-line communication that help grassroots organizations work toward solutions to our urban challenges. If some of us trust strangers enough to open our houses and our own cars to them, we wonder whether it would be possible to connect us through practices that benefit the social good. Though it is still not clear which demographic groups the sharing economy caters to, important questions are “which kinds of users adopt sharing services?” and “may these trends favor social media applications for community organization and social good?”

Communication technologies also provide us with the opportunity to understand large-scale social interactions in cities [43]. Only in the last decade have social network analysis methods allowed us to uncover local and global patterns of social networks [28], locate influential individuals [29], and examine network dynamics [30]. In these networks, nodes represent individuals, and links are generally defined by friendships or acquaintances among them, inferred from social media connections, email communications or mobile phone communications. Two well-documented structural patterns of these networks are a short diameter (increasing as the natural logarithm of the number of nodes) [32] and network transitivity or clustering, which is the propensity for pairs of nodes to be connected if they share a mutual neighbor [32].
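As an illustration of clustering, the sketch below computes one common measure of it, the local clustering coefficient: for a given person, the fraction of pairs of their contacts who are themselves connected. The tiny friendship network and its names are hypothetical, and other definitions of transitivity exist.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of `node`'s neighbors that are linked to each other.

    adjacency: dict mapping each node to the set of its neighbors
    (undirected network, so links appear in both directions).
    """
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to compare
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return linked / (k * (k - 1) / 2)


# Hypothetical friendship network, for illustration only.
adjacency = {
    "ana": {"ben", "cat", "dan"},
    "ben": {"ana", "cat"},
    "cat": {"ana", "ben"},
    "dan": {"ana"},
}

for person in adjacency:
    print(person, round(local_clustering(adjacency, person), 2))
```

Here only one of the three pairs among ana's contacts is connected (ben and cat), while both of ben's contacts know each other, so ben's neighborhood is fully clustered.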

In addition to physical proximity, social links are also the consequence of similar attributes of their nodes. Similar people tend to select each other [33, 34, 35]; they communicate more frequently and present stronger social interactions [31]. An interesting topic of study, which has started to grow recently, is to combine findings from human mobility to explain the relationship between social networks and geographical space in cities. Evidently, social contacts can exist only if there is the opportunity for such contacts to be created. This explains, for example, the ubiquitous findings showing that geographic proximity favors the existence of social contacts [36, 37]. Within cities, face-to-face interactions are favored [38] due to the nature of urban organization [39, 40]. The geographic distance between two nodes is then defined as the distance between their respective most common locations, typically home or work. It is expected that within cities this distance should not be a strong limiting factor in the creation of social ties, as other factors may define their social distance. Social distance, in turn, is given by differences between groups in society, such as socio-demographic characteristics, race, profession, hobbies and political identity [41]. It is at the core of people’s trips and their social interactions that we can better use information to face urban challenges related to transportation, security and community organization. A promising way of engaging social participation could be to motivate urban choices via serious games. These are “simulations of real-world events or processes designed for the purpose of solving a problem. Although serious games can be entertaining, their main purpose is to train or educate users” [42]. A promising avenue is the design of games in which on-line users of urban platforms can be rewarded for helping the city.

After designing solutions for the most pressing challenges in a city or neighborhood, an important question is how to convince citizens to choose actions that serve the social good. Ideally, the use of social media can benefit residents and the users that generate the data in the first place. Providing useful information regarding urban policy outcomes and letting people communicate about them may be the main ingredients of applications for social good. An example of a game for social good may be to reward changes in commuting times that relieve peak-hour traffic, and to communicate alternative routes that accumulate more points for decongesting busy streets. Based on existing methods of extracting travel demand and road usage per census tract, it is possible to use information on on-line social networks to promote engaging games. In the new data-rich reality of cities, deeper insights into social connections will help make the places we live more sustainable, efficient and fun.


[1] C. F. Daganzo, Optimal sampling strategies for statistical models with discrete dependent variables, Transportation Science 14 (4) (1980).

[2] M. E. Smith, Design of small-sample home-interview travel surveys, Transportation Research Record 701 (1979).

[3] P. R. Stopher, S. P. Greaves, Household travel surveys: Where are we going?, Transportation Research Part A: Policy and Practice 41 (5) (2007).

[4] A. J. Richardson, E. S. Ampt, A. H. Meyburg, Survey methods for transport planning, Eucalyptus Press Melbourne, 1995.

[5] H. J. Van Zuylen, L. G. Willumsen, The most likely trip matrix estimated from traffic counts, Transportation Research Part B: Methodological 14 (3) (1980).

[6] H. Spiess, A maximum likelihood model for estimating origin-destination matrices, Transportation Research Part B: Methodological 21 (5) (1987).

[7] M. Maher, Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach, Transportation Research Part B: Methodological 17 (6) (1983).

[8] H. Lo, N. Zhang, W. H. Lam, Estimation of an origin-destination matrix with random link choice proportions: a statistical approach, Transportation Research Part B: Methodological 30 (4) (1996).

[9] M. L. Hazelton, Some comments on origin-destination matrix estimation, Transportation Research Part A: Policy and Practice 37 (10) (2003).

[10] M. L. Hazelton, Inference for origin-destination matrices: estimation, prediction and reconstruction, Transportation Research Part B: Methodological 35 (7) (2001).

[11] M. L. Hazelton, Estimation of origin-destination matrices from link flows on uncongested networks, Transportation Research Part B: Methodological 34 (7) (2000).

[12] C.-C. Lu, X. Zhou, K. Zhang, Dynamic origin-destination demand flow estimation under congested traffic conditions, Transportation Research Part C: Emerging Technologies 34 (2013).

[13] E. Cascetta, Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator, Transportation Research Part B: Methodological 18 (4) (1984).

[14] M. G. Bell, The estimation of origin-destination matrices by constrained generalised least squares, Transportation Research Part B: Methodological 25 (1) (1991).

[15] M. C. Gonzalez, C. A. Hidalgo, A.-L. Barabasi, Understanding individual human mobility patterns, Nature 453 (7196) (2008).

[16] D. Brockmann, L. Hufnagel, T. Geisel, The scaling laws of human travel, Nature 439 (7075) (2006).

[17] Y.-A. de Montjoye, C. A. Hidalgo, M. Verleysen, V. D. Blondel, Unique in the crowd: The privacy bounds of human mobility, Scientific reports 3.

[18] C. Song, Z. Qu, N. Blumm, A.-L. Barabási, Limits of predictability in human mobility, Science 327 (5968) (2010) 1018.

[19] C. Song, T. Koren, P. Wang, A.-L. Barabási, Modelling the scaling properties of human mobility, Nature Physics 6 (10) (2010).

[20] J. Candia, M. C. González, P. Wang, T. Schoenharl, G. Madey, A.-L. Barabási, Uncovering individual and collective human dynamics from mobile phone records, Journal of Physics A: Mathematical and Theoretical (22) (2008) 224015.

[21] F. Calabrese, M. Diao, G. Di Lorenzo, J. Ferreira Jr, C. Ratti, Understanding individual mobility patterns from urban sensing data: A mobile phone trace example, Transportation research part C: emerging technologies (2013).

[22] C. M. Schneider, V. Belik, T. Couronne, Z. Smoreda, M. C. Gonzalez, Unravelling daily human mobility motifs, Journal of The Royal Society Interface 10 (84) (2013) 20130246.

[23] A. Sevtsuk, C. Ratti, Does urban mobility have a daily routine? Learning from the aggregate data of mobile networks, Journal of Urban Technology 17 (1) (2010).

[24] V. Belik, T. Geisel, D. Brockmann, Natural human mobility patterns and spatial spread of infectious diseases, Physical Review X 1 (1) (2011) 011001.

[25] A. Wesolowski, N. Eagle, A. J. Tatem, D. L. Smith, A. M. Noor, R. W. Snow, C. O. Buckee, Quantifying the impact of human mobility on malaria, Science 338 (6104) (2012).

[26] X. Lu, L. Bengtsson, P. Holme, Predictability of population displacement after the 2010 Haiti earthquake, Proceedings of the National Academy of Sciences 109 (29) (2012) 11576.

[27] Toole, Jameson L., et al. "The path most traveled: Travel demand estimation using big data resources." Transportation Research Part C: Emerging Technologies (2015).

[28] Leskovec, J. & Horvitz, E. Planetary-scale views on a large instant-messaging network. In Proc. of the 17th International Conference on World Wide Web, WWW '08, Beijing, China. New York, NY, USA: ACM. DOI:10.1145/1367497.1367620 (2008).

[29] Kitsak, M. et al. Identification of influential spreaders in complex networks. Nature Phys. 6, (2010).

[30] Rybski, D., Buldyrev, S. V., Havlin, S., Liljeros, F. & Makse, H. A. Scaling laws of human interaction activity. Proc. Natl. Acad. of Sci. U.S.A. 106, (2009).

[31] Onnela, J. et al. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. of Sci. U.S.A. 104, 7332 (2007).

[32] Watts, D. & Strogatz, S. Collective dynamics of 'small-world' networks. Nature 393, (1998).

[33] Newman, M. E. The structure and function of complex networks. SIAM review 45, (2003).

[34] Centola, D. An experimental study of homophily in the adoption of health behavior. Science 334, 1269-1272 (2011).

[35] Christakis, N. A. & Fowler, J. H. The spread of obesity in a large social network over 32 years. NEJM 357, 370-379 (2007).

[36] Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P. & Tomkins, A. Geographic routing in social networks. Proc. Natl. Acad. of Sci. U.S.A. 102, 11623-11628 (2005).

[37] Noulas, A., Scellato, S., Lambiotte, R., Pontil, M. & Mascolo, C. A tale of many cities: universal patterns in human urban mobility. PloS one 7, e37027 DOI:10.1371/journal.pone.0037027 (2012).

[38] Calabrese, F., Smoreda, Z., Blondel, V. D. & Ratti, C. Interplay between telecommunications and face-to-face interactions: A study using mobile phone data. PloS one 6, e20814 DOI:10.1371/journal.pone.0020814 (2011).

[39] Bettencourt, L. M., Lobo, J., Helbing, D., Kuhnert, C. & West, G. B. Growth, innovation, scaling, and the pace of life in cities. Proc. Natl. Acad. of Sci. U.S.A. 104, (2007).

[40] Pan, W., Ghoshal, G., Krumme, C., Cebrian, M. & Pentland, A. Urban characteristics attributable to density-driven tie formation. Nat. Commun. 4 DOI:10.1038/ncomms2961 (2013).

[41] Bogardus, E. S. Measurement of personal-group relations. Sociometry 306-311 (1947).


[43] Carlos Herrera-Yagüe, Christian M Schneider, Thomas Couronne, Zbigniew Smoreda, Rosa M Benito, Pedro J Zufiria, and Marta C Gonzalez, The anatomy of urban social networks and its implications in the searchability problem, Scientific Reports, 2015.

About the Author
Marta C. González is associate professor of Civil and Environmental Engineering at the Massachusetts Institute of Technology.
Posted on September 21st, 2015.