This particular Technology Salon was memorable for me not just because it was on my birthday but also because it was centered on a topic I had been looking forward to learning about. And as a birthday gift, I learnt from pundits in the Big Data space. As usual, I supported Linda Raftree with the notes and summary … kindly read and enjoy the summary of the Salon.
The NYC Technology Salon on February 28th examined the connection between bigger, better data and resilience. We held morning and afternoon Salons due to the high response rate for the topic. Jake Porway, DataKind; Emmanuel Letouzé, Harvard Humanitarian Initiative; and Elizabeth Eagen, Open Society Foundations, were our lead discussants for the morning. Max Shron, Data Strategy, joined Emmanuel and Elizabeth for the afternoon session.
This post summarizes key discussions from both Salons.
What the heck do we mean by ‘big data’?
The first question at the morning Salon was: What precisely do we mean by the term ‘big data’? Participants and lead discussants had varying definitions. One way of thinking about big data is that it consists of small bits of unintentionally produced ‘data exhaust’ (website cookies, cellphone data records, etc.) that add up to a dataset. In this case, the term big data refers to the quality and nature of the data, and we think of non-sampled data that are messy, noisy and unstructured. The mindset that goes with big data is one of ‘turning mess into meaning.’
Some Salon participants understood big data as datasets that are too large to be stored, managed and analyzed via conventional database technologies or managed on normal computers. One person suggested dropping the adjective ‘big,’ forgetting about the size, and instead considering the impact of the contribution of the data to understanding. For example, if there were absolutely no data on something and 1000 data points were contributed, this might have a greater impact than adding another 10,000 data points to an existing set of 10 million.
The point here was that when the emphasis is on big (understood as size and/or volume), someone with a small data set (for example, one that fits into an Excel sheet) might feel inadequate, yet their data contribution may actually be ‘bigger’ than a physically larger data set (aha! it’s not the size of the paintbrush…). There was a suggestion that instead of talking about big data we should talk about smart data.
How can big data support development?
Two frameworks were shared for thinking about big data in development. One from UN Global Pulse considers that big data can improve a) real-time awareness, b) early warning and c) real-time monitoring. Another looks at big data being used for three kinds of analysis: a) descriptive (providing a summary of something that has already happened), b) predictive (likelihood and probability of something occurring in the future), and c) diagnostic (causal inference and understanding of the world).
What’s the link between big data and resilience?
‘Resilience’ as a concept is contested, difficult to measure and complex. In its simplest definition, resilience can be thought of as the ability to bounce back or bounce forward. (For an interesting discussion on whether we should be talking about sustainability or resilience, see this piece). One discussant noted that global processes and structures are not working well for the poor, as evidenced by continuing cycles of poverty and glaring wealth inequalities. In this view, people are poor as a result of being more exposed and vulnerable to shocks; at the same time, their poverty increases their vulnerability, and it is difficult to escape from a cycle in which, over time, small and large shocks deplete assets. An assets-based model of resilience would help individuals, families and communities who are hit by a shock in one sphere — financial, human, capital, social, legal and/or political — to draw on the assets within another sphere to bounce back or forward.
Big data could help this type of assets-based model of resilience by predicting, or helping poor and vulnerable people predict, when a shock might happen and prepare for it. Big data analytics, if accessible to the poor, could help them increase their chances of making better decisions now and for the future. Big data, then, should be made accessible and available to communities so that they can self-organize, decrease their own exposure to shocks and hazards, and increase their ability to bounce back and bounce forward. Big data could also help various actors develop a better understanding of the human ecosystem and contribute to increasing resilience.
Can ivory tower big data approaches contribute to resilience?
The application of big data approaches to efforts that aim to increase resilience and better understand human ecosystems often comes at things from the wrong angle, according to one discussant. We are increasingly seeing situations where a decision is made at the top by people who know how to crunch data yet have no way of really understanding the meaning of the data in the local context. In these cases, the impact of data on resilience will be low, because resilience can only truly be created and supported at the local level. Instead of large organizations thinking about how they can use data from afar to ‘rescue’ or ‘help’ the poor, organizations should be working together with communities in crisis (or supporting local or nationally based intermediaries to facilitate this process) so that communities can discuss and pull meaning from the data, contextualize it and use it to help themselves. Communities can also be better informed about what data exist about them and more aware of how these data might be used.
For the Human Rights community, for example, the story is about how people successfully use data to advocate for their own rights, and there is less emphasis on large data sets. Rather, the goal is to get data to citizens and communities. It’s to support groups to define and use data locally and to think about what the data can tell them about the advocacy path they could take to achieve a particular goal.
Can data really empower people?
To better understand the opportunities and challenges of big data, we need to unpack questions related to empowerment. Who has the knowledge? The access? Who can use the data? Salon participants emphasized that change doesn’t come by merely having data. Rather it’s about using big data as an advocacy tool to tell the world to change processes and to put things normally left unsaid on the table for discussion and action. It is also about decisions and getting ‘big data’ to the ‘small world,’ e.g., the local level. According to some, this should be the priority of ‘big data for development’ actors over the next 5 years.
Though some participants at the Salon felt that data on their own do not empower individuals, others noted that knowing your credit score or tracking how much you are eating or exercising can indeed be empowering. In addition, the process of gathering data can help communities understand their own realities better, build their self-esteem and analytical capacities, and contribute to achieving a more level playing field when they are advocating for their rights or for a budget or service. As one Salon participant said, most communities have information but are not perceived to have data unless they collect it using ‘Western’ methods. Having data to support and back information, opinions and demands can serve communities in negotiations with entities that wield more power. (See the book “Who Counts? The Power of Participatory Statistics” on how to work with communities to create ‘data’ from participatory approaches).
On the other hand, data are not enough if there is no political will to make change to respond to the data and to the requests or demands being made based on the data. As one Salon participant said: “giving someone a data set doesn’t change politics.”
Should we all jump on the data bandwagon?
Both discussants and participants made a plea to ‘practice safe statistics!’ Human rights organizations wander in and out of statistics and don’t really understand how it works, said one person. ‘You wouldn’t go to court without a lawyer, so don’t try to use big data unless you can ensure it’s valid and you know how to manage it.’ If organizations plan to work with data, they should have statisticians and/or data scientists on staff or on call as partners and collaborators. Lack of basic statistical literacy is a huge issue amongst the general population and within many organizations, thought leaders, and journalists, and this can be dangerous.
As big data becomes more trendy, the risk of misinterpretation is growing, and we need to place more attention on the responsible use of statistics and data or we may end up harming people through bad decisions. ‘Everyone thinks they are experts who can handle statistics – bias, collection, correlation’ these days. And ‘as a general rule, no matter how many times you say the data show possible correlation not causality, the public will understand that there is causality,’ commented one discussant. And generally, he noted, ‘when people look at data, they believe them as truth because they include numbers, statistics, science.’ Greater statistical literacy could help people not just read or access data and information but use them wisely, understand and question how data are interpreted, and detect political or other biases. What’s more, organizations today are asking questions about big data that have been on statisticians’ minds for a very long time, so reaching out to those who understand these issues can be useful to avoid repeating mistakes and re-learning lessons that have already been well-documented.
This poor statistical literacy becomes a serious ethical issue when data are used to determine funding or actions that impact people’s lives, or when they are shared openly, accidentally or in ways that are unethical. In addition, privacy and protection are critical elements in using and working with data about people, especially when the data involve vulnerable populations. Organizations can face legal action and liability suits if their data put people at risk of harm, as one Salon participant noted. ‘An organization could even be accused of manslaughter… and I’m speaking from experience,’ she added.
What can we do to move forward?
Some potential actions for moving forward included:
- Emphasizing to donors that having big data does not mean that, in order to cut costs, community-level processes related to data collection, interpretation, analysis, and ownership should be eliminated;
- Evaluations and literature/documentation on the effectiveness of different tools and methods, and when and in which contexts they might be applicable, including cost-benefit analyses of using big data and evaluation of its impact on development and on communities when combined with community-level processes versus used alone, without community involvement. Practitioners’ gut feeling is that big data without community involvement is irresponsible and ineffective in terms of resilience, and it would be good to have evidence to help validate or disprove this;
- More and better tools and resources to support data collection, visualization and use and to help organizations with risk analysis, privacy impact assessments, strategies and planning around use of big data; case studies and a place to share and engage with peers, creation of a ‘cook book’ to help organizations understand the ingredients, tools, processes of using data/big data in their work;
- ‘Normative conventions’ on how big data should be used to avoid falling into tech-driven dystopia;
- Greater capacity for ‘safe statistics’ among organizations;
- A community space where frank and open conversations around data/big data can occur in an ongoing way with the right range of people and cross-section of experiences and expertise from business, data, organizations, etc.
We touched upon all types of data and various levels of data usage for a huge range of purposes at the two Salons. One closing thought was around the importance of having a solid idea of what questions we are trying to answer before moving on to collecting data, and then understanding what data collection methods are adequate for our purpose, which ICT tools are right for which data collection and interpretation methods, what will be done with the data and what the purpose of collecting them is, how we’ll interpret them, and how data will be shared, with whom, and in what format.
Thanks to participants and lead discussants for the fantastic exchange, and a big thank you to ThoughtWorks for hosting us at their offices for this Salon. Thanks also to Hunter Goldman, Elizabeth Eagen and Emmanuel Letouzé for their support developing this Salon topic, and to Somto Fab-Ukozor for support with notes and the summary. Salons are held under Chatham House Rule, therefore no attribution has been made in this post. If you’d like to attend future Salons, sign up here!