ORIGINAL ARTICLE: QUALITATIVE RESEARCH
Surveys, social licence and the Integrated Data Infrastructure

INTRODUCTION: Statistics New Zealand’s Integrated Data Infrastructure (IDI) is a central repository for researchers to access multiple government agency datasets. The aim of this investigation was to understand social licence for including survey data in the IDI. METHODS: Two convenience samples were recruited: (1) participants in one of 10 focus groups; and (2) respondents to pilot surveys for the 2018 NZ census or a population-based survey on violence experience. Qualitative data were transcribed and analysed using thematic analysis. Analyses were conducted independently by two members of the research team and


Introduction
"[The] quest to understand and explain what works and for whom in what circumstances underpins the notion of evidence-based policy making" (Sanderson, 2002). In 2014, Statistics New Zealand (StatsNZ, Aotearoa New Zealand's (NZ's) independent agency for the collection and delivery of robust, independent statistics (Statistics New Zealand, 2017c)) launched the Integrated Data Infrastructure (IDI), which "contains person-centred data from a range of government agencies, Statistics NZ surveys, and non-government organisations" (see Figure 1). The IDI provides a "one-stop-shop" for researchers to access multiple government datasets to understand health, justice, education, social services and income outcomes. While linkage is undertaken at the individual level, analysis occurs at the aggregate level by academic and policy development researchers (Statistics New Zealand, 2017a). The creation of linked datasets for research purposes is not new (Holman et al., 2008). However, the difference between bespoke linkages to answer specific research questions and large-scale, linked datasets developed as a repository lies in the use, re-use, and potential unknown future use of such datasets.
To date, the IDI has largely eluded the NZ public's consciousness. However, it is increasingly used as evidence for policy decisions, such as restructuring funding mechanisms for state-funded schools (Edwards, 2016). Governmental support for the IDI has resulted in pressure on non-governmental organisations and researchers in receipt of government funding to make client databases available for inclusion in the IDI as a condition of their funding (Kirk, 2016). Since the change in government following the 2017 general election, the discourse around the use of large-scale, linked datasets has subtly changed: "while the numbers are critical, the insight gained from doing the analysis is just as important" (Social Investment Agency, 2018). Further, the NZ government has launched an investigation into how "government currently uses algorithms, to give New Zealanders confidence that their data is being used appropriately" (Curran & Shaw, 2018).
The principles and regulatory mechanisms governing the establishment and use of large-scale, linked datasets such as the IDI are subject to increasing debate (Casanovas, De Koker, Meddelson, & Watts, 2017). In NZ, four Health and Disability Ethics Committees (HDECs) review health and disability research. However, secondary analysis of de-identified administrative data for observational studies is exempt from HDEC review (Health and Disability Ethics Committee, 2014). Because personally identifying information within the IDI is withheld or encrypted, analyses utilising IDI data are considered exempt. While some users of the IDI, particularly university researchers, are required to obtain ethics approval through university ethics committees, StatsNZ does not require approval from an ethical review committee prior to granting access to the IDI, or peer review to ensure research rigour.
There are, however, safety mechanisms in place for use of the IDI:
• Referee checks are conducted on researchers;
• Only data necessary to answer research questions are made available;
• Access to the data is through a controlled data environment;
• Only research questions judged to have a wider public interest are considered;
• Output from research conducted using IDI data is checked to ensure results are not personally identifiable (Statistics New Zealand, 2017a).
International experiences reveal challenges for broad-scale data linkage and use. Developing and employing large datasets containing microdata can engender public distrust (Presser, Hruskova, Rowbottom, & Kancir, 2015), and political anxiety about invasion of privacy (Dudley-Nicholson, 2016). Recent English experience illustrates how a failure to secure public trust can fatally jeopardise efforts to harness the potential of 'big data'. Care.data was a central repository of health and social care data from all English National Health Service funded care settings, developed primarily for research and evaluation purposes. Established in 2013, its implementation became so problematic that it was discontinued in 2014. Contributing issues included inadequate management and communication, a contradictory legislative environment with respect to patient confidentiality (Presser et al., 2015), the intention for Care.data to be accessible to private companies, and an all-or-nothing opt-out consent process (Shaw, 2014). Its collapse was attributed to failure to attend to levels of public trust in health services and research, and the conditions upon which that trust is based, combined with scepticism about Care.data's public good orientation. The concept of social licence has been employed to explain the failure of Care.data (Carter, Laurie, & Dixon-Woods, 2014). It is also invoked by StatsNZ as a necessary condition of the success of the IDI.
The aim of the current investigation was to elucidate the extent of social licence for including survey data in the IDI. As with related concepts such as trust, social licence is not straightforward to delineate or measure (Boutilier & Thomson, 2011). Both the concept of trust and the requirements of social licence are likely to: mean different things to different people (Rooney, Leach, & Ashworth, 2014); vary by context (Hall, Lacey, Carr-Cornish, & Dowd, 2015); and be highly sensitive to changing public perceptions (Heikkinen, Lepy, Sarkki, & Komu, 2016). Despite its ineffability, experience drawn from a range of fields suggests that social licence can be a decisive force in allowing or preventing a range of activities (Rooney et al., 2014). Therefore, it is useful to understand social licence as it pertains to the IDI and what conditions or parameters members of the public might place upon it, whilst acknowledging its dynamic, variable nature.

Social licence
The concept of social licence derives from the work of sociologist Everett Hughes (Hughes, 1958), who explored the conditions under which society was prepared to afford professions permission to adopt practices that violate accepted social norms without incurring social sanction. This permission constitutes social licence: it implies a mandate, empowering the licensed agent to ask things of others in relation to the licensed practice (Hughes, 1958, p. 78).
Social licence has risen to prominence particularly within industries and enterprises that impose harms upon resources or communities, such as mining, forestry and fishing. A "social licence to operate" is predicated upon an agent meeting "the expectations of society regarding the conduct and activities of corporations that go beyond the requirements of formal regulation" (Carter et al., 2014, p. 404). It emerges from a process. Trust is central in enabling and sustaining social licence, representing a willingness to accept vulnerability to the actions of another in some domain (Mollering, 2006; Rousseau, Sitkin, Burt, & Camerer, 1998). We were interested to learn how people conceived themselves and others as vulnerable, and what determined willingness to accept vulnerability (and therefore trust the process) and confer social licence.
We adopted the following working definition of social licence: Societal acceptance that a practice that lies outside general norms may be performed by a certain agent, on certain terms. It is the result of an ongoing process of negotiating terms with a wider societal group, and means that the practice can be performed by that agent without incurring social sanction. Social licence confers a mandate upon the licensee to ask things of others in relation to the licensed practice.
Our definition makes explicit that the practice under consideration lies outside of general norms. Specifically, that practice is the linkage of individual-level information collected by government agencies without the explicit knowledge of the people from whom the data were collected. Our definition differs from that adopted by the Data Futures Partnership, which limited social licence to: "When people trust that their data will be used as they have agreed, and accepted that enough value will be created, they are likely to be comfortable with its use" (Data Futures Partnership, 2016).
It is important to note at the outset that this work is focussed specifically on social licence as it relates to the linkage of data for analysis at the aggregate level for research and policy development purposes. This differs from the sharing of information for the purposes of service delivery, a process regularly undertaken, often with explicit consent from the individual and an understanding of the process involved.

Methods
As social licence cannot be conferred if the relevant community is unaware of the agency seeking it, the research team were interested in investigating whether members of the NZ public were aware that the IDI existed; whether they were able to understand the practice lying outside social norms involved in using the IDI; and what terms they considered necessary for this practice to be engaged in without incurring social sanction. Rather than attempting to quantify the overall level of social licence, we investigated awareness of, and attitudes towards, the IDI amongst two separately selected convenience samples.
One comprised respondents to one of two surveys with different levels of perceived sensitivity; the other comprised participants in focus groups.
Respondents to the survey were "primed" about the nature of the survey by answering a series of questions, administered face-to-face by an interviewer. The goal was to determine if the nature of the survey was associated with the likelihood of consenting to linkage. Survey 1 was a subset of census questions administered by StatsNZ interviewers (n = 31 participants); Survey 2 was a subset of questions designed to assess violence exposure in the general population (n = 32 participants). Age, sex and ethnicity of survey participants are presented in Table 1. There is an important distinction in the recruitment methods employed by StatsNZ and the university. The university research team was particularly interested in obtaining feedback from sections of the population who are either over-represented in violence statistics, or for whom very little information about the prevalence of violence is reported. As such, recruitment methods were designed to encourage participation from Māori, LGBTIQ+ and disabled members of the community.
Following the survey questions, participants were asked open-ended questions about: a) whether they would consent to have the information they had just provided uploaded and linked with government agency data; b) their reasons to provide or withhold consent; c) what safeguards would favourably influence their decision; and d) what information about process (where would the data be stored) and usage (access rights, long-term storage) would assist an informed choice.
Semi-structured focus groups were conducted to contribute to the design of a population-based survey on violence exposure. Participants were asked about:
1. Their understanding of the information NZ government agencies hold about individuals and what is done with it.
2. The degree to which they considered it acceptable to link survey data with information held by government agencies:
   a. in general;
   b. if the data were ones that participants had personally provided (to the survey);
   c. if the data were anonymised;
   d. if the focus of the research was at the population level, rather than focusing on individuals;
   e. if the information they provided could be used again in the future for unknown purposes.
3. Their views on the safeguards that should be in place for the data to be made available for other researchers.
4. Who should store the combined administrative and survey-based data.
5. Whether there are types of data that they felt could be shared without seeking individual permission, or types of data that it would be wrong to seek permission to share.
Interview data were recorded with an electronic data capture programme, REDCap (P. A. Harris et al., 2009) and hand-written notes. Focus group discussions were digitally recorded and transcribed. All data were analysed using thematic analysis, an inductive method of analysis which explores the manifest (content that is noted or mentioned directly by respondents) and latent (implicit or underlying) themes. Analyses were conducted independently by two members of the research team and results compared. Where interpretation differed, the analyses were brought to the wider research team and resolved through discussion.

Results from interviews and focus group discussions were analysed and are presented separately.
Ethics approval for this investigation was granted by the University of Auckland Human Participants Ethics Committee (ref 017300). All participants provided informed consent prior to participating.

Arm 1: Interviews with survey respondents
Most survey respondents (81% of census and 70% of violence survey respondents) indicated they would be willing to have their completed survey linked with government agency data. Where participants did not consent to linkage, they were more likely to have disclosed a social norm breach. Violence survey participants were asked about their pornography use: 30% of those who reported pornography use would consent, compared with 63% of those who reported no pornography use.
Amongst those willing to consent to data linkage there was a perception that this would "help the government make better decisions" (Survey respondent 1 [SR1]) and "provides access to more comprehensive data" [SR6]. Respondents to both surveys also indicated their data were "already out there" [SR23], alluding to information collected through social media, or that they "didn't really have anything to hide" [SR18]. Those who would not consent were concerned with how their information would be used, who would access it, and whether they would be identified:
"Not right to link up my story from separate sources." [SR32]
"No knowledge of who may access it. Concern around confidentiality, inappropriate use, lack of clarity around destruction dates." [SR24]
A total of 23% of the violence survey and 45% of census survey participants wanted additional information to help them decide whether they would consent to having their survey data linked with government agency data, including who would access it, what it would be used for, and what protections were in place:
"Who can access government agency data? What will government use it for?" [SR3]
"Explain the purpose and process of data sharing in the IDI and a guarantee around privacy." [SR16]
Survey participants were provided with a range of options that might influence their willingness to provide consent to have their information linked. The proportions of participants from each survey who agreed to each option (more than one could be selected) are summarised in Table 2. Across the two groups, there was consistent agreement with the importance of knowing that their name and address would be removed, control over access, and being able to withdraw the data.

Table 2. Proportion of participants from each survey who agreed to each option (more than one could be selected)
Option | Census Sample | Sensitive Sample
Assurance that my name and address will be removed from the data I provide | 74% | 81%
Guarantee that the information is only available to bona fide researchers | 65% | 65%
Freedom to be able to withdraw the data whenever I want | 58% | 58%
Knowledge that the information was held by Statistics New Zealand | 58% | 50%
Assurance that the information would be destroyed after a set period of time | 52% | 46%
Knowledge that the information was held by a university | 39% | 46%

Arm 2: Focus groups
All focus group participants were aware that government agencies collected information about them. There was acceptance of information sharing for the purposes of service provision. However, few participants were aware of the IDI. Extended discussion was needed to shift thinking from individual-level data sharing for service delivery, to population-level data linkage for service and policy design.
Five key themes emerged from discussions: (1) good quality data are important; (2) understanding the context of data collection; (3) privacy is important; (4) oversight of the researchers is required; and (5)
The pervading concern expressed by these groups was that the IDI could ultimately use incorrectly recorded data. This would be amplified for community members with a higher degree of interaction with government agencies, who subsequently have more information collected about them.

Understanding the context of data collection
A key concern for Māori was understanding the context of data collection and the meaning of data. These concerns were mirrored by representatives of marginalised populations as well:

Discussion
The input from our participants provides insights into the nature of the vulnerabilities those living in New Zealand perceive in relation to the IDI, along with the conditions under which they may be willing to make themselves vulnerable and place trust in those analysing and using the data for the sake of the good that they anticipate may be derived from use of the IDI. This informs an understanding of the conditions that may be placed upon the IDI's social licence. Whilst little prior awareness of the IDI existed amongst our participants, they developed considered judgements about it, identifying concerns and proposing safeguards that would encourage them to support its maintenance and use. Fairly high levels of institutional trust in the integrity and competence of StatsNZ and related agencies present in our sample were tempered with suspicion, often born of experience. This reinforces the view that social licence is an on-going process of negotiation, dependent on judgements over an agent's integrity and competence (Butler & Cantrell, 1984; Mayer, Davis, & Schoorman, 1995).
The expressed willingness to grant social licence, whilst strong, was not unconditional: guarantees are needed about data quality; researcher awareness of, and sensitivity to, the structural and societal context surrounding data; and that individuals will not be identifiable. Robust oversight mechanisms are required to ensure appropriate use of this taonga (treasure / gift). Further, Māori expressed concern about how decisions around data use were to be made. This speaks to the notions of data ownership and data sovereignty.
A strength of the research presented lies in the methods used to derive an understanding of social licence for linking data in different contexts. When interview participants were provided with the opportunity to report a social norm breach (use of pornography) in confidence, and then asked about whether they would consent to having that data linked with government agency data, they were less likely to do so. These findings suggest there are limits to acceptable sharing of data. Where those data have the potential to be damaging in the future, it appears that individuals are less likely to trust the process. Indeed, those who expressed fewer concerns about sharing data indicated they had "nothing to hide", implying no perceived breaches of social norms. The focus group discussions allowed a discussion amongst peers about these limits. For example, the men's and young mothers' focus groups moved from indicating they had "nothing to hide" to discussing concerns around the use of information they did consider sensitive (financial data and health disorders, respectively).

Good quality data are important
Government agency administrative data form the basis of the IDI. As administrative data vary in quality (Daas, Ossen, & Tennekes, 2010), problems with the raw data collected from service-level interactions are subsequently embedded in the IDI, and carried over into future use. Participants with significant experience of government agencies were readily able to identify this risk.
To date, the IDI has been used as the basis for two predictive risk models (for child maltreatment, and educational non-achievement) that, if implemented, could influence a child's life trajectory. Although developed at the population level, implementation of predictive models at the individual level has been contemplated. Therefore, despite the application of StatsNZ's safety mechanisms that prevent the publication of results that could be personally identifiable, it is possible that the application of research derived from the IDI could impact at the individual level. It is well established that population-level risk measures seldom translate well to the individual level (Rockhill, Kawachi, & Colditz, 2000). The failings of population-level risk prediction based on suboptimal data will be greater still. Examples of incomplete or inconsistent reporting are easily identified in NZ (Gulliver, Cryer, & Langley, 2013; Mansell, Ota, Erasmus, & Marks, 2011) as they are in the majority of developed countries (Jansen, 2012). Because of the non-random nature of missing data in the majority of social measures, imputation is not appropriate where data are absent (Sterne et al., 2009). The drive for the use of linked datasets for policy development needs to be balanced with investment to ensure good quality data are collected, recorded, and transferred appropriately. Without this, there is a risk that both competence-based and integrity-based elements of public trust will be eroded, and social licence correspondingly compromised. Our findings demonstrate the permeable interface between trust of government agencies and the IDI itself: participants' views about the competence and integrity of agencies they had dealings with informed their views about the IDI. Securing social licence for the IDI requires attention to data integrity at the provider agency level, as well as within the IDI.

Context of data collection is important
The promise of the IDI is that previously elusive associations will be revealed through data linkage. However, to understand and interpret identified associations, the original context and purpose of data collection must be understood. Participants who had experienced unfair assumptions by agencies were particularly likely to identify this need.
In NZ, Māori are over-represented in the majority of adverse outcomes (McIntosh, 2011; McIntosh & Coster, 2017), and are more likely to come into contact with government agencies, increasing the data collected about them. Understanding the drivers for over-representation is important (Cram, Gulliver, Wilson, & Ota, 2015). Institutional racism, enacted in government policies, and interpersonal racism, enacted through discriminatory behaviour, are demonstrated to be negatively associated with health outcomes for Māori (R. Harris et al., 2006) and Indigenous and minority populations elsewhere (Jones, 2001). Indeed, some key determinants of health outcomes (ethnicity and gender) are socially defined constructs which influence life experiences (Jones, 2001).
In countries like NZ, whose colonial history impacts upon current health and social outcomes, power within the research space is an important contextual issue. The Treaty of Waitangi establishes the responsibility of the Crown to uphold the rights of Māori. Te Mana Raraunga (the Māori Data Sovereignty Network) have identified three areas of focus to uphold Treaty obligations with respect to the IDI. These include social licence, the expectation that the government will act in the interests of Māori; cultural licence, the impact of data integration and sharing on the social contract that exists through the Treaty; and Māori data sovereignty, recognising that Māori data should be subject to Māori governance (Hudson, 2016).
Māori are often not included in research design even where research focuses upon them, or is particularly salient to Māori. Utilising deficits-focussed data is likely to reproduce deficit perspectives and further entrench power differentials (Walter, 2016).
Recently, StatsNZ has opened its first international data lab, providing access to the IDI for researchers outside NZ (Statistics New Zealand, 2017b). This underscores the need to ensure that researchers understand the NZ cultural context.

Privacy is important
Privacy is central to research ethics, and is established in NZ law through the Privacy Act (1993). Participant concern that data should not enable individuals to be identified is reflected in Principle Ten of the Privacy Act ("Privacy Act," 1993), which states that information obtained by an agency for a given purpose should not be used for another purpose, unless one of seven permitted exceptions applies, including that the information: "is used in a form in which the individual concerned is not identified" (10.(f) (i)); or "is used for statistical or research purposes and will not be published in a form that could reasonably be expected to identify the individual concerned" (10.(f) (ii)). Principle Three states that, where personal information is collected from a person, the agency must take reasonable steps to ensure that the person is aware that information is being collected (3.
While StatsNZ apply safe reporting practices to protect privacy (Statistics New Zealand, 2017a), the potential for re-identification is real (Malin, Karp, & Scheuermann, 2015). Attention to identifiability reflects the role that privacy is seen to play in protecting individuals from adverse consequences. Such concerns might be indicated in the apparent association between disclosed social norm breach (pornography use) and lack of willingness to consent to survey linkage revealed in our findings. This suggests that ensuring information in the IDI cannot be linked back to individuals will be crucial in securing and maintaining social licence for including survey data in the IDI. However, reasons to protect privacy extend beyond adverse consequences for the individual into respect for persons (Benn, 1984; Fried, 1968); preserving liberty (Hallborg, 1986); and delimiting state power (Solove, 2007). The preference our participants expressed for IDI data use to be predicated upon consent suggests that their concerns about privacy extend beyond direct adverse consequences into the social norms establishing the terms of engagement between state and individual. Privacy is often seen as an umbrella concept with implications for how information is collected, stored, used, disseminated and applied (Solove, 2007). Whilst research conducted using the IDI has been at the population level, some proposed applications are at the individual level (Edwards, 2016). The concern about privacy returns the focus to the strong links between trust in the institutions using the data and granting social licence for the IDI. Concerns expressed that such information could once again be used to discriminate and reinforce prejudices are based on historical realities rather than abstract paranoia.

Who watches the researchers?
Oversight from an independent, trusted agency was identified by some participants as a way to minimise potential harms. This finding resonates with experience from mining, in which the existence of a trusted agency with decision-making rights has been seen to support social licence (Prno, 2013). One response to the risks our participants identified would be to require ethics committee approval to access the IDI. Training for committee members in the potential harms and benefits of statistical analysis of large de-identified population datasets would be necessary to ensure that the review process provided the intended protections. However, this measure cannot guarantee protection if the data collected were already prejudiced. Māori have critiqued university ethics procedures as "Eurocentric", privileging liberal notions of the "autonomous individual participant" rather than considering collectivist constructs to guide the research process, resulting in a "condescending ethos" (Tuari, 2014, p. 134).
Aitken et al. (2016) acknowledged the importance of consent as a consistent theme of public responses to the linkage of health data. However, the authors also highlighted that the need for consent appeared to be strongly associated with trust in the institutions, organisations or individuals involved in processing or accessing their data:
"…rather than focussing on which consent mechanisms are most favoured by members of the public, it may be more valuable to focus on how relationships of trust are built up (and conversely eroded) and how trust can be facilitated within research and data-sharing or data-linkage processes including through public / patient engagement or involvement." (Aitken et al., 2016, p. 15)
Once again, this highlights the transient nature of social licence for data linkage if agencies and researchers do not behave as good stewards of the taonga.
Transparency and public engagement appear to have been at the forefront of data linkage initiatives in Scotland, drawing on lessons from the English Care.data experience, and seeking to maintain social licence. The Farr Institute of Health Informatics Research holds four regional public panels that scrutinise and advise on governance systems, public engagement plans and research practices, as well as additional virtual panels, forums and public panels (Farr Institute). The Scottish Primary Care Information Resource has an independent advisory group that reviews requests for the use of data, as well as an opt-out process (NHS National Services Scotland, 2016). The advisory group includes patient representatives, general practitioners and specialist confidentiality advisors.

Yes, I am happy, with my consent
Many participants expected consent to be sought for data use. This was perceived as important even when participants were willing to consent. This may reflect the idea that respect for persons requires acknowledgement of their rights and interests, and consequently permission for activities that may impact upon them (Benn, 1984). The same ideas underpin social licence processes and align strongly with recognition of the significance of Indigenous Data Sovereignty. Data Sovereignty is linked with Indigenous Peoples' rights to:
"maintain, control, protect, and develop their cultural heritage, traditional knowledge and traditional cultural expressions, as well as their right to maintain, control, protect and develop their intellectual property over these… If Indigenous Peoples have control over what and how data and knowledge will be generated, analysed and documented, and over the dissemination and use of these, positive results can come about." (Kukutai & Taylor, 2016, p. xxii)
The consent condition may also reflect a concern for transparency as a check upon agency power, a mark of respect for citizens and a way of protecting public good orientation (Taylor, 2011). The expectation for individual-level consent is a challenge to projects like the IDI, and one that requires serious reflection from agencies such as StatsNZ. Whilst our participants expressed an expectation to consent, they also appreciated the logistical barriers. Whilst individual consent for use of all data is not feasible, governance structures involving lay members and ensuring representation of special interest groups are feasible and embody respect for those whose information is contained within the IDI, and whose interests may be affected by its use. In support of Te Mana Raraunga (Hudson, 2016), we argue that the Treaty of Waitangi, which guarantees Māori control over taonga (treasures), requires strong Māori input into how the IDI is used.

Conclusion
As Rooney et al. (2014, p. 211) have observed, "in many vitally important respects a [social licence] is constituted by knowledge and meaning, rather than by legal documents and permits instituted through a bureaucratic-administrative mechanism." Participants in this study appreciated the purpose and potential of the IDI when it was explained to them, but they lacked pre-existing knowledge of it. They identified concerns and suggested safeguards that would reassure them sufficiently to consent to inclusion of their data within the IDI. We conclude that, while there is the potential for social licence to be granted for the IDI, an on-going, transparent engagement process is also required that provides individuals with the ability to interact with research and policy initiatives being developed in this space. Because Māori are over-represented within government agency data, active, honest engagement with Māori is required, along with safeguards to reduce the risks of further stigmatisation and marginalisation.