1 Introduction
1.1 Overview of the Family Whānau and Wellbeing Project and the New Zealand Census of Population and Dwellings
1.1.1 Research aim
The Family Whānau and Wellbeing Project (FWWP) is a five-year research programme supported from the Social Science funding pool of the Foundation for Research, Science and Technology. The principal goal of this programme is to develop ways to examine and monitor social and economic determinants of family and whānau wellbeing and how these determinants have changed over the 1981–2006 period.
1.1.2 Defining family and wellbeing
Statistics New Zealand defines individuals in a familial relationship as people in a relationship in which ‘a person is related to another person by blood, registered marriage, civil union, consensual union, fostering or adoption’ (Statistics New Zealand 2006). A family can be further defined as a group of people who share personal, social and material resources and who are interdependent.
A full discussion of the definition of wellbeing is available in another publication of this project, Family Wellbeing Indicators from the 1981–2001 New Zealand Censuses (Milligan, Fabian et al. 2006). In summary, wellbeing is linked to quality of life and, according to Hird (2003), can be broken down into two types, subjective and objective wellbeing, which affect each other.
Objective wellbeing is the focus of this project and spans physical, developmental and activity-based, material, social and emotional wellbeing. Each of these types of wellbeing has tangible outputs that can be measured, such as income, access to telecommunications, heating of dwellings and educational qualifications. This report looks at some of the variables available from the New Zealand Census of Population and Dwellings that can be used to measure wellbeing.
1.1.3 The New Zealand Census of Population and Dwellings
The New Zealand Census of Population and Dwellings is a self-administered repeated cross-sectional survey of the entire population of New Zealand. The Statistics Act 1975 prescribes that a census be conducted every five years, and provides an outline of census content (Statistics New Zealand 1998).
Purpose and use in time series analysis
The primary purpose of the census is to provide social, economic and demographic information on the people of New Zealand at a given point in time. This information is used by a variety of organisations to describe the present, to analyse trends and to plan for the future. The census has been described as ‘a primary source of information on the size, composition, distribution, economic activities and state of wellbeing of the population’ (Statistics New Zealand 1998). Respondents are required by law to respond to the census.
The New Zealand census also aims to provide data on a consistent number of measures so that social change may be monitored (Statistics New Zealand 1998; Statistics New Zealand 2003). Utilising census data enables us to create a historical time series for people living in New Zealand at the time of each census. It also allows us to construct benchmarks of family and household wellbeing and to compare levels of wellbeing across time.
The census has dual aims (providing relevant information and historical continuity) that can conflict with each other, and these aims need to be balanced when deciding upon the specific content of each census. As society changes, data needs also change: some topics become less relevant and others become more important. Similarly, things that were considered important to quality of life twenty years ago may no longer be considered as important today. The census needs to keep pace with social change, but also provide the tools to monitor social change. This tension has been acknowledged by Statistics New Zealand (Statistics New Zealand 1998; Statistics New Zealand 2003) and discussed in depth by Morrison (1991).
In determining the content of the census a number of factors have to be kept in mind. The topic needs to be publicly acceptable and have significant community value, the census has to be the most appropriate information source, and inclusion of the topic has to produce high quality information (Statistics New Zealand 2003). For these reasons, census questions (and the variables subsequently constructed from them) may change from census to census, limiting comparability of information between various census years.
Advantages of using the census
- The census achieves broad (almost universal) coverage.
- The census is unique in its ability to provide information on a variety of small groups within the population and small area data (Statistics New Zealand 1997).
- The census is not subject to sampling error. Sampling error is the measure of the variability that occurs because a sample, rather than the entire population, is surveyed.
- Census information is available for a wide variety of topics, spanning many different areas of social concern (for example, income, education, work, housing, assets and health). Other surveys (such as the Household Economic Survey and the Household Labour Force Survey) have a narrower focus.
- A large amount of contextual information is available from census data.
- Information is collected from the entire household, rather than just particular individuals (as is often the case in sample surveys).
- Family and household identifiers are attached to unit records, making it easy to collate information at both the household and family levels.
- Compared with other sources (such as ad hoc surveys, or using data from multiple sources) there is considerable consistency in the information sought between censuses. Statutory requirements ensure certain information must be collected in each census. A major emphasis within Statistics New Zealand is on continuity and consistency of data, making it easier to construct, analyse and interpret time series information.
- Compared with using multiple sources of data, there is reasonable consistency in the type of survey and method of collection. This makes comparing results from period to period easier, as the same potential biases occur within each census dataset. This means the researcher does not have to try to evaluate how other types of biases resulting from different research methods have affected variable measurement.
- A considerable body of metadata surrounds the census, which aids in the analysis and interpretation of variables and data.
- Census data are available over a long time period, which allows analysis of the effects of social and economic change.
- The census provides good quality data, supported by the quality management processes that Statistics New Zealand has in place. Compared with other data sources (and provided the security conditions are met), census data are easily accessible in a readily usable form.
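One of the advantages listed above is that family and household identifiers are attached to unit records. The following is a minimal sketch of how such identifiers allow person-level records to be collated at either level. The records and the field names (`household_id`, `family_id`, `income`) are hypothetical and are not taken from the census datasets.

```python
from collections import defaultdict

# Hypothetical unit records, one per person. Field names and values are
# illustrative only; they are not actual census variable names or data.
people = [
    {"household_id": "H1", "family_id": "F1", "income": 30000},
    {"household_id": "H1", "family_id": "F1", "income": 12000},
    {"household_id": "H1", "family_id": "F2", "income": 20000},
    {"household_id": "H2", "family_id": "F3", "income": 45000},
]


def total_by(records, key, value):
    """Sum the field `value` over records grouped by the identifier `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


# The same person-level records collated at two different levels.
family_income = total_by(people, "family_id", "income")
household_income = total_by(people, "household_id", "income")
```

In this sketch household H1 contains two families, so the household total differs from either family total.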
Limitations of using census data
- There are limits on the nature, quantity and detail of census questions. In order to maximise the response rate, the census seeks to minimise the burden on the respondent (Statistics New Zealand 2003). This means asking a limited number of quick, simple questions. The information generated is not detailed or complex and does not indicate causal links (Statistics New Zealand 2003).
- Sources of miscount: undercounting – The census achieves around 98 percent coverage, so some dwellings and people are not enumerated. Dwellings may be missed entirely, people may deliberately avoid answering the census, or occupied dwellings may be classified as vacant. For example, it is generally acknowledged that temporary private dwellings will be undercounted, due to difficulties in locating these dwellings. For further reasons for non-enumeration see the post-enumeration survey (Statistics New Zealand 2002), which is conducted in order to ascertain the extent of miscounting in the census.
- Sources of miscount: double counting – A person may inadvertently be counted more than once during the census. For example, children in shared custody situations may mistakenly be counted in both residences. Double counting is discussed further in the post-enumeration survey (Statistics New Zealand 2002).
- Individual census questions are subject to non-response. This is known as item non-response, and occurs when the respondent returns the questionnaire but has not responded to all of the questions that were relevant to them. Response rates for specific questions vary. High non-response rates may have an impact on the usefulness of data, and mean that results need to be interpreted with caution, as data may not be as reflective of the population as data from questions with low non-response rates. In some cases, when no response is given, a value for a variable is imputed. This method could introduce bias if the imputed response is very different to the (unknown) actual value.
- The census is subject to forms of non-sampling error. These include errors arising from questionnaire wording and question positions, respondent error (which may result from respondent misunderstanding, mechanical error, or purposeful distortion of information), and errors in the coding and processing of forms. Some respondent error, for example giving multiple responses when only one is allowed, has been mitigated by the introduction of electronic census forms on the Internet.
- Although there are many consistencies in census variables between the census years, there are also some differences in the ways variables are constructed, defined and classified. These differences may impact upon data interpretation and analysis. Such intercensal variation can come from a number of sources, as outlined in table 2.2. The comparability of each census variable used in the construction of wellbeing indicators needs to be assessed.
- The data generated are constrained by the census definitions and classifications. For example, when using family-level census data, it is important to remember that data only exist for families and extended families whose members all live within the one household.
The advantages and disadvantages of using census data need to be evaluated in light of the other data sources available. Although longitudinal studies are generally the best source of time series information, such studies are extremely rare. They are usually geographically based (such as the Christchurch Health and Development Study and the Dunedin Multidisciplinary Health and Development Study) and may only provide information on a particular group of people (for example, a certain age cohort).
1.2 Outline of this report
1.2.1 Report purpose and overview
The purpose of this report is to explore the availability, measurement and comparability of key variables from the 1981–2006 Censuses. The report draws on the experiences of the Family Whānau and Wellbeing Project team, and aims to ease the way for future researchers and technical users of census data, especially those using the data for time series analysis or intercensal comparisons.
The report summarises the most relevant information from the report Family Wellbeing Indicators from the 1981–2001 New Zealand Censuses (Milligan, Fabian et al. 2006), with a focus on a unique time series examination of key census variables. The method used there and here for assessing variable comparability is described in detail in section 2.6. In summary, it involves identifying sources of intercensal variation, assessing the magnitude and effect of their impact on the data, checking whether comparability can be increased, and then applying a consistent scale of terminology to arrive at an overall comparability assessment.
1.2.2 Interpreting tables and appendices within the report
Appendix 7.1
Appendix 7.1 contains names of rebased variables that are available as output variables from Statistics New Zealand. These are variables whose source data have been reclassified according to the classifications and definitions of other census years. For example, the variable labour_force_status91 contains information from the 1981 Census that has been reclassified so that the definitions of part-time and full-time labour force match the 1991 definitions for these concepts.
We have not named all the area variables, which are always available rebased to the most recent census.
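The idea of rebasing can be sketched as follows: the same source responses are classified twice, once under the original definition and once under the definition of the target census year. This is an illustration only. The hours thresholds below are assumptions chosen for the example, not the definitions used by Statistics New Zealand, and the function name is hypothetical.

```python
def labour_force_status(hours_worked, full_time_threshold):
    """Classify an employed person as full-time or part-time."""
    if hours_worked >= full_time_threshold:
        return "full-time"
    return "part-time"


# Illustrative thresholds only; these are NOT taken from census documentation.
ORIGINAL_THRESHOLD = 20  # assumed definition in the source census year
REBASED_THRESHOLD = 30   # assumed definition in the target census year

# Hypothetical weekly hours reported by three respondents.
hours = [10, 25, 40]

original = [labour_force_status(h, ORIGINAL_THRESHOLD) for h in hours]
rebased = [labour_force_status(h, REBASED_THRESHOLD) for h in hours]
```

Under these assumed thresholds the respondent working 25 hours is full-time in the original classification but part-time in the rebased one, which is the kind of shift a rebased variable is designed to remove from a time series.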
Appendix 7.2
Some variables are not generally released by Statistics New Zealand. Appendix 7.2 lists these variables, along with the reasons for non-release.
Appendix 7.3
Appendix 7.3 contains the output variables available from Statistics New Zealand for each census year. The names and descriptions of variables in this appendix are taken directly from the data dictionaries produced by Statistics New Zealand. In 2001 and 2006, there were no abbreviated variable names in the data dictionary, so this field has been left blank.
To show similarities between what has been asked across time, the variables have been grouped under general headings (for example, the heading Age), followed by entries for each year describing the particular variables available. In most cases, a shared heading indicates that the variable provides the same information in different census years. However, as there are many sources of intercensal variation, it does not necessarily mean that the variables under a heading are comparable. On occasion, the heading in the appendix is more general, and a variety of related variables (almost subcategories of the heading) appear underneath (see note 1). For example, the different benefit receipt variables are all listed under the heading ‘sources of personal income’.
Appendix 7.4
Appendix 7.4 contains the output categories that can be used when comparing variables across the 1981–2006 Censuses.
Appendix 7.5
Appendix 7.5 gives the variable names and SAS codes associated with key variable classifications for the 1981–2006 Censuses. The aim is to make the data more accessible, not only for our own analyses but also for external researchers who may wish to use time series census data for comparative purposes in the future. At times, this has involved using different variables for different census years in order to reconstruct a variable classification similar to that produced for other census years. At other times, it has involved aggregating or collapsing categories for some years. The names of the variables and the codes associated with them have been taken from the Statistics New Zealand data dictionaries for each census year (1981–2006). Researchers should be aware that the variable names in the data dictionaries are not necessarily the same as the SAS names that appear in the census datasets. However, the codes for each classification category are the same as those that appear in the census datasets.
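Collapsing categories to a common classification can be sketched as a per-year lookup table. The example below is a minimal illustration in Python rather than SAS. The variable, the codes and the category labels are all hypothetical and do not reproduce any entry in the Statistics New Zealand data dictionaries.

```python
# Hypothetical year-specific codes mapped to a common set of categories.
# In this made-up example the earlier year has two "owned" codes that are
# collapsed into one category so that both years share a classification.
COMMON_CATEGORIES = {
    1981: {"1": "owned", "2": "owned", "3": "rented", "9": "not stated"},
    2006: {"10": "owned", "20": "rented", "99": "not stated"},
}


def harmonise(census_year, code):
    """Map a year-specific code to the common time series category."""
    try:
        return COMMON_CATEGORIES[census_year][code]
    except KeyError:
        # Fail loudly rather than silently dropping an unmapped code.
        raise ValueError(
            f"no mapping for code {code!r} in {census_year}"
        ) from None


same_category = harmonise(1981, "2") == harmonise(2006, "10")
```

Raising an error for unmapped codes is a design choice in this sketch: it makes gaps in the mapping visible instead of letting them distort the harmonised counts.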
1.2.3 Information regarding the ‘variable definitions and variable information’ sections
The information in these sections has been sourced from the 2001 census glossary definitions (Statistics New Zealand 2001) and other Statistics New Zealand classifications and definitions documents. These sections are extremely information- and fact-intensive, so references have been given only where the information came from a different source.
The exception to this is the comparability assessment of each variable (described below), which was developed as part of the FWWP, rather than from information provided by Statistics New Zealand.
Derivation tables
Some variables are derived from more than one question on the census form. For key derived variables, the report provides a derivation table which shows the census questions or variables used in the derivation for the 1981–2006 census years. At this stage of the project, we have confirmation of the content of these tables for the 1991 and 2001 Census years. Information on the other years has sometimes been obtained from ancillary documentation (such as glossary publications and classification documents), and by applying a consistent template across the other census years. Therefore, tables showing the derivation of variables should be interpreted with caution for 1981, 1986 and 1996 Census years.
Interpreting non-response rates
Where non-response rates are available, they are provided and interpreted. Unless otherwise stated, a non-response rate is calculated as the number of responses set to ‘not stated’ (called ‘not specified’ in the 1981–1996 Censuses) as a proportion of the subject population for that question. In this report, non-response rates are interpreted according to the scale shown in table 1.1.
Table 1.1 Non-response rate interpretation scale
| Non-response rate | Interpretation |
| <3.0% | low |
| 3.0–4.9% | relatively low |
| 5.0–6.9% | moderate |
| 7.0–8.9% | relatively high |
| 9.0%+ | high |
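The calculation and the scale in table 1.1 can be sketched as follows. This is a minimal illustration, not the project's own code. The function names are hypothetical, and rounding the rate to one decimal place before applying the bands is an assumption made so that every rate falls into exactly one band of the table.

```python
def non_response_rate(not_stated, subject_population):
    """Return 'not stated' responses as a percentage of the subject population."""
    if subject_population <= 0:
        raise ValueError("subject population must be positive")
    return 100.0 * not_stated / subject_population


def interpret(rate_percent):
    """Apply the table 1.1 scale to a non-response rate given in percent."""
    # Assumption: round to one decimal place, matching the precision of
    # the band boundaries in table 1.1.
    rate = round(rate_percent, 1)
    if rate < 3.0:
        return "low"
    if rate < 5.0:
        return "relatively low"
    if rate < 7.0:
        return "moderate"
    if rate < 9.0:
        return "relatively high"
    return "high"


# Hypothetical counts: 480 'not stated' responses among 10,000 people
# in the subject population for a question.
rate = non_response_rate(480, 10000)
label = interpret(rate)
```

With these made-up counts the rate is 4.8 percent, which falls in the 3.0–4.9% band.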
Comparability assessments
In this report, comparability assessments have been made in accordance with the method outlined in section 2.6. Potential sources of intercensal variation and their likely impacts on the data are contained throughout the various sections for each domain. The sources of intercensal variation that are deemed to impact upon the analysis are then summarised for each indicator under the section ‘Limitations of the data’. It should be noted that in all cases intercensal comparability should not be taken as a stand-alone judgement, but that the limitations of the data should be borne in mind.
For this project, the following scale has been used to summarise the impact of intercensal variation on the comparability of Statistics New Zealand census variables between the census years. For further discussion on intercensal comparability, see section 2.6, ‘Assessing the intercensal comparability of variables’.
Table 1.2 Variable comparability scale
| Terminology | Interpretation |
| Totally comparable | No intercensal variation |
| Highly comparable | Very little intercensal variation. Any variations are likely to have only a minor impact upon data |
| Broadly comparable | Some intercensal variation exists, although basic definitions of the variable are the same. Sometimes there may be differences in some of the classifications, or in the way a particular variable is derived |
| Limited comparability | Enough intercensal variation exists (usually in definition, the concept being measured, or in variable derivations) that comparability of data is severely curtailed |
Note 1. In particular, this applies to situations when separate variables (usually indicating possession, or lack thereof) have been created for what may otherwise have been one variable with many classification categories.