Frontier Data Study – Global Stakeholder Survey Background
The UK Department for International Development (DFID) has commissioned the NIRAS Data Futures Hub to carry out a study on how best to use big data in development. We have developed an online survey to help us understand the very latest knowledge and unpick some of the outstanding challenges. The focus of the study is on finding practical solutions to development challenges based on the better integration of big data opportunities with other data sources, including the minimisation of risk. We are looking at what big data tells us about how we should navigate the frontier of a rapidly changing data landscape.
We hope the survey and the study will help drive forward the global debate on the use of big data in development and help everyone optimise the emerging opportunities in the data landscape.
We are interested in views from a wide range of stakeholders, including those using or affected by data, those producing it, and those who are innovating and setting the possibilities for the future.
The Frontier Data Study will be publicly available early in 2020, with your contributions incorporated and, with your agreement, acknowledged.
This page provides some of our background thinking to help you answer the questions. We hope that by taking part, you and/or your organisation will:
- Develop your views on the outstanding issues that need to be addressed
- Help generate practical guidance for development practitioners and others who use data for social good
The study’s web-page already has some useful resources including:
- bibliography of research that has influenced the survey questions
- interactive map of case studies of where big data is being applied to development challenges. Feel free to add to the list (click here) and help us map out the learning.
If you would like to set up a one-to-one discussion, please email Matthew Shearing, at email@example.com
Many thanks for your help in pushing forward this important agenda. Do read through this web page before answering the survey.
Defining our challenges
Big data is now a commonly used term. But it is still unclear whether it is useful to define it differently from other data sources. There have also been many attempts to categorise different types of big data, but again it is unclear how this is helpful in practical terms when it comes to addressing development or policy needs.
The definition of big data has been widely discussed and is evolving. While there has been much discussion of defining big data in terms of a number of Vs (volume, velocity, variety, veracity, value), we may consider that the boundaries between ‘big’ and other data are ambiguous and changing. For example, the digitisation of administration is starting to include AI approaches to provision of social welfare, while in other cases people are being encouraged to use social media channels to participate in decision-making.
It is also likely that the best uses of big data are in combination with other data and in contributing to wider data ecosystems, such as national statistical systems. When considering the challenges big data represents, we may benefit from thinking about how, as development practitioners, official statisticians, data users/beneficiaries, and data producers/innovators, we adapt more broadly to the opportunities of new data sources, new technologies, and new sociologies of data/development (such as the way we form partnerships) rather than isolating attention on big data as one component.
Some research points us towards distinguishing between ‘digital data’ (such as big data and open data) and traditional data (such as official statistics and traditional Monitoring and Evaluation data). There also seems to be a divide in how we allocate our human capital between these categories: ‘data scientists’ to the former, and statisticians and traditional research disciplines to the latter. But will this continue to be useful? Many traditional sources are themselves becoming digitised, and the boundary will become more ambiguous.
To create the right data solutions to pressing problems in human development, including the Sustainable Development Goals (SDGs), perhaps a focus on ‘big data’ is not optimal. But understanding how best we categorise within and between data sources may still be important.
A simple categorisation of big data could be as per the diagram below. But for each category we can anticipate examples (some are happening already) where both categorisations apply, either in the nature of the data source or because we need to combine them for a robust solution to a challenge:
Source: NIRAS Data Futures Hub
A more systematic approach in terms of fitting big data into the wider digital landscape may be as follows:
Source: IFAD/Michael Bamberger
Refining the evidence-base
There is already lots of exciting work available to help us understand what can be done with big data. But there appears to be little evidence of how the potential opportunities have been used to systematically improve decision-making in development programmes and policymaking. At the same time, we need to move towards prioritisation of the most promising opportunities.
Our research aims to develop guidance for DFID project designers in how to select and combine data sources and techniques. Such guidance will also broadly apply to the challenges of government policy implementation and official statistics.
On the one hand, development project and policy objectives should be informed by data availability, i.e. what is measurable. On the other hand, the needs of data users should also drive how we innovate within the possibilities of the data landscape; this crucial user-driven perspective is often missing from approaches to new opportunities for data in international development.
There are many examples of how new techniques in using big data and other emerging sources could be applied or have been applied to development projects/challenges. But we need to move towards interpreting how these examples can inform the development of better models of project or policy design which users/decision-makers could systematically apply to different sectors of development.
We also need to prioritise and consider what is going to be most useful in the near future, moving forward with implementation models, while identifying what is on the frontier and requires careful monitoring (such as developments related to quantum computing).
Dealing with the challenges
Big data brings big challenges in terms of data quality, including ethics and the ‘leave no one behind’ agenda. There is also a significant challenge in terms of developing and maintaining appropriate technical and legal capabilities.
Data quality, including ethics and integrity, is a significant issue with big data and other emerging possibilities in the data landscape. By contrast, official statisticians have spent many decades refining approaches to traditional data that seek to optimise quality to a standard which supports effective decision-making. The challenges of using big data effectively may take many years to address, and, even then, there may be limited examples where we can be confident of using it.
Perhaps the most important lesson from official statistics is to consider that data quality is about understanding the strengths and weaknesses of the data, mitigating risks, and making transparent trade-offs that will inform decision-making. We can then see a framework for a future research agenda around big data sources.
For data in development, below is a summary of dimensions of data quality we need to consider:
There are also a number of cross-cutting challenges around how we develop institutions and inter-institutional cooperation to address these issues efficiently. These include human resource development (both as methodologists and in terms of statistical literacy for data users), internal and external coordination, development of data/meta-data management systems, and legal frameworks for cooperation/data-sharing.
Ethical considerations are worth underlining, given not only their intrinsic value but also their importance to effective international development and sustainable data collection. A new range of challenges is emerging around unintended consequences, commercial and political use of personal data, transparency, manipulation of data, ‘Fake News’, psychographic targeting, and algorithmic and technological biases and errors.