Census Data Guide

A guide to using data from the New Zealand Census: 1981–2006



Chris Errington, Gerard Cotterell & Martin von Randow, The University of Auckland

Sue Milligan, Presbyterian Support, Christchurch



Published by Statistics New Zealand and The University of Auckland, August 2008

0 Disclaimer & Acknowledgements

Disclaimer

The views expressed in this occasional paper are the personal views of the authors and should not be taken to represent the views or policy of Statistics New Zealand or the Government. Although all reasonable steps have been taken to ensure the accuracy of the information, no responsibility is accepted for the reliance by any person on any information contained in this occasional paper, nor for any error in or omission from the occasional paper.


Acknowledgements

The Family and Whānau Wellbeing Project (FWWP) is funded by the Foundation for Research, Science and Technology. Practical support from Statistics New Zealand is also gratefully acknowledged.

The study team would like to express their gratitude to the people who assisted with the preparation of this report. We would also like to thank our reviewers from Statistics New Zealand, as well as June Atkinson and Charles Crothers. Responsibility for the final product, however, rests solely with the authors.

1 Introduction

1.1 Overview of the Family and Whānau Wellbeing Project and the New Zealand Census of Population and Dwellings


1.1.1 Research aim

The Family and Whānau Wellbeing Project (FWWP) is a five-year research programme supported from the Social Science funding pool of the Foundation for Research, Science and Technology. The principal goal of this programme is to develop ways to examine and monitor social and economic determinants of family and whānau wellbeing and how these determinants have changed over the 1981–2006 period.


1.1.2 Defining family and wellbeing

Individuals in a familial relationship are defined by Statistics New Zealand as people in a relationship in which ‘a person is related to another person by blood, registered marriage, civil union, consensual union, fostering or adoption.’ (Statistics New Zealand 2006). Family can be further defined as a group of people where resources are shared, including personal, social and material resources, and interdependency exists between the family members.

A full discussion of the definition of wellbeing is available in another publication of this project, Family Wellbeing Indicators from the 1981–2001 New Zealand Censuses (Milligan, Fabian et al. 2006). In summary, wellbeing is linked to quality of life and, according to Hird (2003), can be broken down into two types, subjective and objective wellbeing, which affect each other.

Objective wellbeing is the focus of this project and includes factors such as physical, developmental and activity-based, material, social and emotional wellbeing. All these types of wellbeing have tangible outputs that can be measured, for example, income, access to telecommunications, heating of dwellings, educational qualifications, etc. This report looks at some of the variables that are available from the New Zealand Census of Population and Dwellings that can be used to measure wellbeing.


1.1.3 The New Zealand Census of Population and Dwellings

The New Zealand Census of Population and Dwellings is a self-administered repeated cross-sectional survey of the entire population of New Zealand. The Statistics Act 1975 prescribes that a census be conducted every five years, and provides an outline of census content (Statistics New Zealand 1998).

Purpose and use in time series analysis

The primary purpose of the census is to provide social, economic and demographic information on the people of New Zealand at a given point in time. This information is used by a variety of organisations to describe the present, to analyse trends and to plan for the future. The census has been described as ‘a primary source of information on the size, composition, distribution, economic activities and state of wellbeing of the population’ (Statistics New Zealand 1998). Respondents are required by law to respond to the census.

The New Zealand census also aims to provide data on a consistent number of measures so that social change may be monitored (Statistics New Zealand 1998; Statistics New Zealand 2003). Utilising census data enables us to create a historical time series for people living in New Zealand at the time of each census. It also allows us to construct benchmarks of family and household wellbeing and to compare levels of wellbeing across time.

The census has dual aims (providing relevant information and historical continuity) that can be in conflict with each other, and these aims need to be balanced when deciding upon the specific content of each census. This is because as society changes, data needs also change. Some topics become less relevant to society and other areas become more important. Similarly, things that were considered important to the quality of life twenty years ago may no longer be considered as important today. The census needs to keep pace with social change, but also provide the tools to monitor social change. This tension has been acknowledged by Statistics New Zealand (Statistics New Zealand 1998; Statistics New Zealand 2003) and referred to in depth by Morrison (Morrison 1991).

In determining the content of the census a number of factors have to be kept in mind. The topic needs to be publicly acceptable and have significant community value, the census has to be the most appropriate information source, and inclusion of the topic has to produce high quality information (Statistics New Zealand 2003). For these reasons, census questions (and the variables subsequently constructed from them) may change from census to census, limiting comparability of information between various census years.

Advantages of using the census

  • The census achieves broad (almost universal) coverage.
  • The census is unique in its ability to provide information on a variety of small groups within the population and small area data (Statistics New Zealand 1997).
  • The census is not subject to sampling error. Sampling error is the measure of the variability that occurs because a sample, rather than the entire population, is surveyed.
  • Census information is available for a wide variety of topics, spanning many different areas of social concern (for example, income, education, work, housing, assets and health). Other surveys (such as the Household Economic Survey and the Household Labour Force Survey) have a narrower focus.
  • A lot of contextual information is available from census data.
  • Information is collected from the entire household, rather than just particular individuals (as is often the case in sample surveys).
  • Family and household identifiers are attached to unit records, making it easy to collate information at both the household and family levels.
  • Compared with other sources (such as ad hoc surveys, or using data from multiple sources) there is considerable consistency in the information sought between censuses. Statutory requirements ensure certain information must be collected in each census. A major emphasis within Statistics New Zealand is on continuity and consistency of data, making it easier to construct, analyse and interpret time series information.
  • Compared with using multiple sources of data, there is reasonable consistency in the type of survey and method of collection. This makes comparing results from period to period easier, as the same potential biases occur within each census dataset. This means the researcher does not have to try to evaluate how other types of biases resulting from different research methods have affected variable measurement.
  • A considerable body of metadata surrounds the census, which aids in the analysis and interpretation of variables and data.
  • Census data are available over a long time period, which allows analysis of the effects of social and economic change.
  • The census provides good quality data. Statistics New Zealand has quality management processes in place. Compared with other data sources (and providing the security conditions are met) census data are easily accessible in a readily utilisable form.


Limitations of using census data

  • There are limits on the nature, quantity and detail of census questions. In order to maximise the response rate, the census seeks to minimise the burden on the respondent (Statistics New Zealand 2003). This means asking a limited number of quick, simple questions. The information generated is not detailed or complex and does not indicate causal links (Statistics New Zealand 2003).
  • Sources of miscount: undercounting – The census achieves around 98 percent coverage, so some dwellings and people are not enumerated: dwellings may be missed entirely, people may deliberately avoid answering the census, or occupied dwellings may be classified as vacant. For example, it is generally acknowledged that temporary private dwellings will be undercounted, due to difficulties in locating these dwellings. For further reasons for non-enumeration see the post-enumeration survey (Statistics New Zealand 2002), which is conducted in order to ascertain the extent of miscounting in the census.
  • Sources of miscount: double counting – A person may inadvertently be counted more than once during the census. For example children in shared custody situations may mistakenly be counted in both residences. Double counting is discussed further in the post-enumeration survey (Statistics New Zealand 2002).
  • Individual census questions are subject to non-response. This is known as item non-response, and occurs when the respondent returns the questionnaire but has not responded to all of the questions that were relevant to them. Response rates for specific questions vary. High non-response rates may have an impact on the usefulness of data, and mean that results need to be interpreted with caution, as data may not be as reflective of the population as data from questions with low non-response rates. In some cases, when no response is given, a value for a variable is imputed. This method could introduce bias if the imputed response is very different to the (unknown) actual value.
  • The census is subject to forms of non-sampling error. These include errors arising from questionnaire wording and question positions, respondent error (which may result from respondent misunderstanding, mechanical error, or purposeful distortion of information), and errors in the coding and processing of forms. Some forms of respondent error, for example giving multiple responses when only one is allowed, have been mitigated by the introduction of electronic census forms on the Internet.
  • Although there are many consistencies in census variables between the census years, there are also some differences in the ways variables are constructed, defined and classified. These differences may impact upon data interpretation and analysis. Such intercensal variation can come from a number of sources, as outlined in table 2.2. The comparability of each census variable used in the construction of wellbeing indicators needs to be assessed.
  • The data generated are constrained by the census definitions and classifications. For example, when using family-level census data, it is important to remember that data only exist for families and extended families whose members all live within the one household.

The advantages and disadvantages of using census data need to be evaluated in light of the other data sources available. In general, it should be noted that although the best source of time series information is generally longitudinal studies, such studies are extremely rare. These studies are usually geographically based (such as the Christchurch Health and Development Study and the Dunedin Multidisciplinary Study) and may only provide information on a particular group of people (for example, a certain age cohort).


1.2 Outline of this report


1.2.1 Report purpose and overview

The purpose of this report is to explore the availability, measurement and comparability of key variables from the 1981–2006 Censuses. The report draws on the experiences of the Family and Whānau Wellbeing Project team, and aims to ease the way for future researchers and technical users of census data, especially those using the data for time series analysis or intercensal comparisons.

The report summarises the most relevant information from the report Family Wellbeing Indicators from the 1981–2001 New Zealand Censuses (Milligan, Fabian et al. 2006), with a focus on a unique time series examination of key census variables. The method used there and here for assessing variable comparability is described in detail in section 2.6. In summary, it involves identifying sources of intercensal variation, assessing the magnitude and effect of their impact on the data, checking whether comparability can be increased, and then applying a consistent scale of terminology to arrive at an overall comparability assessment.

1.2.2 Interpreting tables and appendices within the report

Appendix 7.1

Appendix 7.1 contains names of rebased variables that are available as output variables from Statistics New Zealand. These are variables whose source data has been reclassified according to classifications and definitions of other census years. For example, the variable labour_force_status91 has reclassified information from the 1981 Census so that the definitions of part-time and full-time labour force match the 1991 definitions for these concepts.

We have not named all the area variables, which are always available rebased to the most recent census.

Appendix 7.2

Some variables are not generally released by Statistics New Zealand. Appendix 7.2 lists these variables, along with the reasons for non-release.

Appendix 7.3

Appendix 7.3 contains the output variables available from Statistics New Zealand for each census year. The names and descriptions of variables in this appendix are taken directly from the data dictionaries produced by Statistics New Zealand. In 2001 and 2006, there were no abbreviated variable names in the data dictionary, so this field has been left blank. In order to show similarities between what has been asked across time, these variables have been grouped under general headings, e.g. the heading Age, followed by entries for each year describing the particular variables available. In most cases, this indicates that the variable provides the same information in different census years, but as there are many sources of intercensal variation this does not necessarily mean that the variables under these headings are comparable. On occasion, the heading in the appendix is more general, and a variety of related variables (almost subcategories of the heading) appear underneath.[1] For example, the different benefit receipt variables are all listed under the heading ‘sources of personal income’.

Appendix 7.4

Appendix 7.4 contains the output categories that can be used when comparing variables across the 1981–2006 Censuses.

Appendix 7.5

Appendix 7.5 gives the variable names and SAS codes associated with key variable classifications for the 1981–2006 censuses. The aim of doing this is to make the data much more accessible, not only for the purposes of our analyses but also for external researchers who may wish to use time series census data for comparative purposes in the future. At times, this has involved utilising different variables for different census years, in order to reconstruct a variable classification that is similar to that produced for other census years. At other times, it involves aggregating or collapsing down categories for some years. The names for the variables and the codes associated with them have been taken from the Statistics New Zealand data dictionaries pertaining to each census year (1981–2006). Researchers should be aware that the names of the variables contained in the data dictionaries are not necessarily the same as the SAS names that appear in the census datasets. However, the codes for each classification category are the same as those that appear in the census datasets.
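The collapsing of categories described above can be sketched in code. The following is a minimal illustration only, written in Python rather than SAS. Every code and category label in it is hypothetical and is not taken from any Statistics New Zealand data dictionary; the actual codes are those listed in Appendix 7.5 and the data dictionaries for each census year.

```python
# Hypothetical recode tables: for each census year, map that year's
# classification codes onto a common set of time series categories.
# None of these codes or labels come from a real data dictionary.
RECODES = {
    1981: {"1": "owned", "2": "owned", "3": "not owned", "9": "not specified"},
    2001: {"10": "owned", "20": "not owned", "99": "not stated"},
}

# The residual category was called 'not specified' before 2001;
# treat both names as one category for time series purposes.
RESIDUAL_LABELS = {"not specified", "not stated"}


def harmonise(year, code):
    """Return the common time series category for a census-year code."""
    try:
        category = RECODES[year][code]
    except KeyError:
        raise ValueError(f"no recode defined for year={year}, code={code!r}") from None
    return "not stated" if category in RESIDUAL_LABELS else category


if __name__ == "__main__":
    print(harmonise(1981, "2"))   # two 1981 categories collapse into one
    print(harmonise(1981, "9"))   # residual category renamed
    print(harmonise(2001, "20"))
```

Keeping one recode table per census year makes each aggregation decision explicit, and an unmapped code raises an error rather than passing silently into the time series.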

1.2.3 Information regarding the ‘variable definitions and variable information’ sections

The information in these sections has been sourced from the 2001 census glossary definitions (Statistics New Zealand 2001) and other Statistics New Zealand classifications and definitions documents. These sections are extremely information- and fact-intensive, so references are given only where the information came from a different source.

The exception to this is the comparability assessment of each variable (described below), which was developed as part of the FWWP, rather than from information provided by Statistics New Zealand.

Derivation tables

Some variables are derived from more than one question on the census form. For key derived variables, the report provides a derivation table which shows the census questions or variables used in the derivation for the 1981–2006 census years. At this stage of the project, we have confirmation of the content of these tables for the 1991 and 2001 Census years. Information on the other years has sometimes been obtained from ancillary documentation (such as glossary publications and classification documents), and by applying a consistent template across the other census years. Therefore, tables showing the derivation of variables should be interpreted with caution for 1981, 1986 and 1996 Census years.

Interpreting non-response rates

Where non-response rates are available, these will be provided and interpreted. Unless otherwise stated, non-response rates provided are calculated by working out the number of responses set to ‘not stated’ (for 1981–1996 Censuses this was called ‘not specified’) as a proportion of the subject population for that question. In this report, the interpretations of non-response rates apply as shown in table 1.1.

Table 1.1 Non-response rate interpretation scale

Non-response rate     Interpretation
<3.0%                 low
3.0–4.9%              relatively low
5.0–6.9%              moderate
7.0–8.9%              relatively high
9.0%+                 high
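As a minimal sketch, the calculation described above and the scale in table 1.1 can be written in Python as follows. The function names are illustrative only. Rounding the rate to one decimal place before applying the bands is an assumption, made because the table states its bands to one decimal place.

```python
def non_response_rate(not_stated, subject_population):
    """Responses set to 'not stated' as a percentage of the subject population."""
    if subject_population <= 0:
        raise ValueError("subject population must be positive")
    return 100.0 * not_stated / subject_population


def interpret_non_response(rate_percent):
    """Map a non-response rate (in percent) to its table 1.1 interpretation."""
    # Assumption: round to one decimal place so that every rate falls
    # into exactly one of the bands listed in table 1.1.
    rate = round(rate_percent, 1)
    if rate < 3.0:
        return "low"
    if rate < 5.0:
        return "relatively low"
    if rate < 7.0:
        return "moderate"
    if rate < 9.0:
        return "relatively high"
    return "high"


if __name__ == "__main__":
    # Illustrative counts only, not census figures.
    rate = non_response_rate(not_stated=1200, subject_population=60000)
    print(rate, interpret_non_response(rate))
```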


Comparability assessments

In this report, comparability assessments have been made in accordance with the method outlined in section 2.6. Potential sources of intercensal variation and their likely impacts on the data are contained throughout the various sections for each domain. The sources of intercensal variation that are deemed to impact upon the analysis are then summarised for each indicator under the section ‘Limitations of the data’. It should be noted that in all cases intercensal comparability should not be taken as a stand-alone judgement, but that the limitations of the data should be borne in mind.

For this project, the following scale has been used to summarise the impact of intercensal variation on the comparability of Statistics New Zealand census variables between the census years. For further discussion on intercensal comparability, see section 2.6, ‘Assessing the intercensal comparability of variables’.

Table 1.2 Variable comparability scale

Terminology              Interpretation
Totally comparable       No intercensal variation.
Highly comparable        Very little intercensal variation. Any variations are likely to have only a minor impact upon data.
Broadly comparable       Some intercensal variation exists, although basic definitions of the variable are the same. Sometimes there may be differences in some of the classifications, or in the way a particular variable is derived.
Limited comparability    Enough intercensal variation exists (usually in definition, the concept being measured, or in variable derivations) that comparability of data is severely curtailed.



1. In particular, this applies to situations when separate variables (usually indicating possession, or lack thereof) have been created for what may otherwise have been one variable with many classification categories.

2 Understanding and using census data

2.1 Levels of aggregation and analysis: census statistical units

Four different levels of aggregation of information used by Statistics New Zealand are important to understand for research; these are the dwelling, household, family, and individual levels (the individual is sometimes called the personal level in Statistics New Zealand publications). Each specific variable will be associated with one of these levels of aggregation; the level of a particular variable can be ascertained from Statistics New Zealand data dictionaries.

Figure 2.1 shows the levels of aggregation of census information. Geographic information can relate to any of the levels of aggregation and is itself available at various levels of aggregation. For private occupied dwellings, the dwelling and the household are effectively interchangeable levels of analysis, because each private occupied dwelling contains exactly one household. The sole exception is private occupied dwellings that contain visitors only. The two concepts nevertheless differ: the dwelling refers to the physical structure (that is, the building), and the household refers to the group of people who live within the dwelling.
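To illustrate how individual-level information can be collated at the family and household levels, the following Python sketch sums a personal variable using identifiers attached to each unit record. The field names (person_id, household_id, family_id, income) and all values are hypothetical and do not reflect the structure of actual census datasets. The sketch also assumes that a household member may fall outside any family.

```python
from collections import defaultdict

# Hypothetical individual-level unit records; all names and values are
# invented for illustration.
records = [
    {"person_id": 1, "household_id": "H1", "family_id": "F1", "income": 30000},
    {"person_id": 2, "household_id": "H1", "family_id": "F1", "income": 20000},
    # Assumed case: a household member who is not part of any family.
    {"person_id": 3, "household_id": "H1", "family_id": None, "income": 15000},
    {"person_id": 4, "household_id": "H2", "family_id": "F2", "income": 40000},
]


def total_by(unit_records, id_field, value_field):
    """Sum value_field over records sharing the same identifier.

    Records with no identifier at this level (None) are skipped.
    """
    totals = defaultdict(int)
    for record in unit_records:
        unit_id = record[id_field]
        if unit_id is not None:
            totals[unit_id] += record[value_field]
    return dict(totals)


household_income = total_by(records, "household_id", "income")
family_income = total_by(records, "family_id", "income")

if __name__ == "__main__":
    print(household_income)
    print(family_income)
```

In this sketch the household total for H1 is larger than the family total for F1, because the household contains a person outside the family. This is why the level of aggregation of each variable needs to be checked before analysis.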

The Family and Whānau Wellbeing Project (FWWP) looks at wellbeing indicators at the family and household levels and how these can be used to understand changes in society over time. The interrelationships between the four levels of analysis and key related variables are explained in more depth in figure 2.2.

Figure 2.1 Levels of aggregation of census variables


2.2 Relationships between key variables

Source: The basis of this diagram was sourced from Statistics New Zealand (Statistics New Zealand 1999), standard terms for dwellings, households and families.

Figure 2.2 Key variables and their relationships to the levels of variable aggregation


2.3 How variables are constructed using census data

The variable construction process is integral to the output data generated for each census and consequently affects intercensal comparability.

Figure 2.3 shows the process of variable construction using census information. Respondents answer questions on the individual and dwelling forms. These responses are then processed, and in some instances edited. When no answer is given to certain questions, a value for that variable may be imputed.

* It should be noted that ‘Key Statistics New Zealand resources to consult’ is not an exhaustive list.

Figure 2.3 Variable construction process and relevant Statistics New Zealand resources

The responses are coded into classification categories according to the classification relevant to each variable. A classification assigns data reported for a particular variable into categories according to shared characteristics. This facilitates the accurate and systematic arrangement of data according to common properties, so that the resulting statistics are reproducible, comparable with data from other sources and comparable over time. In some instances, variables take into account answers to more than one census question. In these cases, answers are combined to form a derived variable. Each classification category has an associated ‘SAS code’, a value used in the statistical software package SAS to keep track of variable formats. The classification categories of each output variable and their associated codes are outlined in the relevant data dictionary for each census year.

An example of a variable classification from the 2001 data dictionary is shown below to familiarise the reader with the terminology and format.

Figure 2.4 Sample classification from the 2001 Data Dictionary

The concepts and definitions relevant to the classification for each variable are outlined in concepts, classifications and definitions documentation (see section 2.8.3 for available references).

In some instances (usually when multiple response options are possible), multiple variables are constructed from respondents’ answers to one census question. The income source data from 1986 are an example of this. Other constructed variables take into account respondents’ answers to more than one census question – these are called derived variables. For example in 1991, labour force status was ascertained from responses to five different census questions. The process used to derive variables is sometimes outlined in concepts and classifications documentation, and for 2001 it is also outlined in the census glossary sheets available on the Statistics New Zealand website.

2.4 Factors that may affect variable values

2.4.1 Editing

As collector and custodian of census data, Statistics New Zealand conducts various internal checks on the quality of the data. These checks may also be relevant to our data analysis. Before 2001, there were more consistency edits and Statistics New Zealand had ‘tried to tidy the data by editing every variable to eliminate inconsistency’ (Statistics New Zealand; for further explanation of the quality of census variables and the distinctions between the levels of variables for editing purposes, refer to section 7 of the 2001 Introduction to the Census). In 2001, a different approach was taken, and the level of editing applied to a variable depended on the level of importance of the variable (foremost, defining or supplementary). The effect of such changes in editing over time or between censuses is hard to quantify (in some cases, the effect of consistency edits may be to increase the number of responses that go into residual categories). However, it does mean that small movements in a variable need to be interpreted with caution.

2.4.2 Substitute forms

Substitute forms are created by Statistics New Zealand where there is sufficient evidence that either a person or an occupied dwelling exists, but no census form has been submitted for it (Statistics New Zealand 2001). Substitute forms make up about 2 percent of all census forms. These forms affect non-response rates because information not gained on substitute forms is generally set to ‘not stated’.

2.4.3 Imputation

Imputation is the process by which Statistics New Zealand allocates a value to a variable where no value has been stated by the respondent. Values for variables have been imputed by Statistics New Zealand in the stated census years as shown in table 2.1.

Table 2.1 Variable imputation by census year

* Indicates that this variable was only imputed for the rebased dataset for this year.

The value allocated by Statistics New Zealand is ascertained through a variety of methods. Imputation requires thorough testing before implementation. The following list outlines the process involved for the imputable variables listed in table 2.1 (taken from the Statistics New Zealand website).

Age: Age imputation supplies an age in years where this value is missing for an individual. This means that age will be imputed if it cannot be calculated from the response to date of birth. Age is imputed using various other responses from the individual; for example, whether they are legally married, responses supplied on the dwelling form, and the known distribution of ages in the population.

Sex: Sex imputation supplies a value of male or female where the response for the sex variable is missing. If they are available, the name of the person or their relationship to others in the household may be used to impute a value. Otherwise a value is assigned randomly, with 49 percent being imputed as male.

Work and labour force status: Work and labour force status imputation supplies a value for labour force status where this cannot be derived from the labour force information supplied by the respondent. The labour force status imputation uses whatever labour force information has been given, and various other responses from the individual (for example, age and income). A labour force status is then imputed to equal the known labour force status of a similar person.

Usual residence: Usual residence imputation supplies a value for the usual residence meshblock where a meshblock cannot be coded from the address information supplied by the respondent. The usual residence meshblock imputation uses whatever level of geographic information has been given and various other responses from the individual. A usual residence meshblock is then imputed based on the distribution of known usual residence meshblocks for similar people.
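As an illustration of the simplest of these methods, the following Python sketch covers only the random fallback step of sex imputation, using the 49 percent figure given above. It is not Statistics New Zealand's implementation: the function name and value labels are hypothetical, and the earlier step that uses the person's name or relationship to others in the household is omitted.

```python
import random

P_MALE = 0.49  # share imputed as male, as stated in the text above


def impute_sex(stated_sex, rng):
    """Return the stated sex, or a randomly assigned value if it is missing.

    Sketch of the random fallback step only; imputation from the person's
    name or household relationships is not modelled here.
    """
    if stated_sex in ("male", "female"):
        return stated_sex
    return "male" if rng.random() < P_MALE else "female"


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so the example is repeatable
    imputed = [impute_sex(None, rng) for _ in range(10000)]
    print(imputed.count("male") / len(imputed))  # close to 0.49
```

Passing the random number generator in as an argument allows a run to be repeated with a fixed seed.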

2.5 Other factors affecting census data interpretation

2.5.1 Non-response rates

Non-response rates provided by Statistics New Zealand are generally the percentage of respondents in the ‘not stated’ category for each variable. Before 2001, this residual category was called ‘not specified’. Non-response rates for 1996 and 2001 variables can be found in the census glossary publications.

2.5.2 Rebased datasets

Electronic data from previous censuses (1981–1996) have been rebased according to the current meshblock pattern to allow geography-based comparisons over time. The variables that are altered according to current patterns are usual residence, census night address and workplace address. This allows the different levels of aggregation of geographic variables (meshblock, area unit, territorial authority, regional council and national) to be held constant, so that meshblocks in 1981 and 2001 are defined by the same boundaries.

2.6 Assessing intercensal consistency of variables

As outlined in table 2.2, the information generated from census data may vary from census to census for a number of reasons. In order to accurately monitor and establish empirical relationships, researchers need to establish that any effect is a real effect, rather than one that has been caused by changes in the process of extracting and measuring the information provided. To do so, it is necessary to examine a variety of Statistics New Zealand publications in order to get a thorough understanding of census data and the changes that have taken place across the different census years.

2.6.1 Sources of intercensal variation

Table 2.2 Summary of sources of intercensal variation

Sources of intercensal variation
1) Removal or inclusion of the actual census question
2) Changes in the subject population for a question
3) Differences in the wording of the census questions asked
4) Changes in the layout of the census form
5) Changes in the format of the census question, e.g. single or multiple response, tick box or written response format
6) Differences in the guide note instructions that accompany the census question, although the impact of this is unknown as the number of respondents who read the guide notes is undetermined
7) Differences in the response options used in the census question
8) Changes in the way the data are collected. These changes are reasonably infrequent but do occur. The two major changes recently were in 1996 when the dwelling type variable was ascertained from responses from enumerators rather than respondents and in 2006 when it was possible to complete census forms online.
9) Changes in the classifications and definitions for a variable, which describe variable construction
10) Changes in the instructions given to enumerators, such as which dwellings to give forms to, and enumerator doorstep checks. For example in 1996, enumerators were required to check the whole form for completeness, whereas in 2001 they were only required to check the front page of the individual form.
11) Changes in processing practices, e.g. scanning, recognition and operator instructions
12) Changes in the way a particular variable is edited
13) Changes in the general editing practices from census to census
14) Changes in the variables for which responses are imputed, and changes in the way variables are imputed
15) Changes in the name of a variable
16) Changes in the number of variables constructed from responses to a census question
17) Changes in the way a variable has been derived:

  • alterations in the variables used to derive it
  • changes in the derivation process, e.g. what is done if information from one variable is missing
18) Changes in the classification of a variable:

  • the addition of extra classification categories
  • the deletion of previous classification categories
  • changes in the content of classification categories
  • differences in the way a classification groups things together or splits them up

In New Zealand, very few researchers have looked at intercensal variation. In 1991, Philip Morrison wrote an article entitled Change or Continuity in the Census: Problems of comparability in the New Zealand Census (Morrison 1991). This article provides an overview of the changes in census content and format on the personal form between the 1951 and 1991 Censuses. It also provides a detailed discussion of changes to census questions dealing with employment and work. A more up-to-date source that can be used for an overview of changes in census topics is the Historical Summary of the Scope of the Census (Statistics New Zealand 2001). This provides a basic overview of the different census topics that have been covered on both dwelling and individual forms, from the inception of the census up to (and including) the 2001 Census.

2.6.2 Assessing the impact of intercensal variation

When using census data for time series analysis, all sources of intercensal variation need to be considered, and, where possible, evaluated as to their likely impact on the data (establishing the time series comparability of variables is a key aspect of this report). A good method for assessing the impact of intercensal variation is to use the following steps:

Identify the source of the variation

If there has been a change in instructions given to the respondent, it is necessary to note where this change occurred (i.e. in the guide notes or on the census form itself). Statistics New Zealand acknowledges that instructions in the guide notes are often not read and therefore not followed (Department of Statistics 1991). This report discusses all census instructions (including guide note instructions) as if they are followed by the respondent. However, it must be remembered when reading this document that guide note instructions appear not to be followed as often as instructions on the actual census form. Therefore, when evaluating the impact of changes on intercensal comparability, it will be assumed that changes in the guide notes will probably have had less impact on the data than changes to the questionnaire.

Estimate the magnitude (impact) of the effect

Estimate to what degree census data will be affected as a result of the variation. The impact of intercensal variation may, in many instances, be difficult to assess and quantify. While every attempt is made to minimise errors due to systems and processes, as with any survey it is not possible to know or eliminate all non-sampling error. That said, the impact of changes is often relatively minor.

For the sake of simplicity, it is best to use a binary scale, and assess the impact as either major or minor. The impact may depend on exactly what the source of the variation was. For example, a change in the availability of a variable, the underlying concept being measured, or the variable derivation will generally be assessed as major, whereas a change in the editing process or instructions given to respondents would generally be seen as minor.
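
The binary scale can be recorded as a simple lookup. The following is a minimal Python sketch only: the key names are hypothetical labels for the sources of variation named above, and returning 'unrated' for any source not listed is an assumption of the sketch, not a Statistics New Zealand rule.

```python
# Hypothetical labels for sources of variation. The major/minor ratings
# follow the general guidance given in the paragraph above.
DEFAULT_IMPACT = {
    "variable_availability": "major",
    "underlying_concept": "major",
    "variable_derivation": "major",
    "editing_process": "minor",
    "respondent_instructions": "minor",
}


def rate_impact(source):
    """Return 'major' or 'minor' for a named source of variation.

    Sources not listed above return 'unrated' (an assumption of this
    sketch) so that they are assessed case by case.
    """
    return DEFAULT_IMPACT.get(source, "unrated")


print(rate_impact("variable_derivation"))
print(rate_impact("layout_of_form"))
```

These ratings are starting points only; as noted above, the impact may depend on exactly what the source of the variation was.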

Assess the effect of the variation on census data (direction of the effect)

Identify the likely outcome of the variation on the actual data collected. Make judgements as to whether data for a particular year will be overestimated or underestimated relative to other census years. If there is a best practice method, take this into account.

Identify if any manipulations can be made to increase variable comparability

Sometimes, when a variable is not comparable across different census years, its comparability can be increased by manipulating the data. Methods for doing so are described in section 2.6.3.

Make an overall assessment of the comparability of each variable

The final comparability assessment of key variables across the 1981–2006 Censuses can be made in accordance with the criteria listed in the comparability assessment method (outlined in Table 2.3). This method takes into account the findings from the steps above and applies the variable comparability scale outlined in table 1.2.

Table 2.3 Comparability assessment method

Magnitude and number of variations     Manipulation available     Comparability assessment
Major                                  None                       Limited comparability
Major                                  Available                  Broadly comparable
Minor – many                           None                       Broadly comparable
Minor – many                           Available                  Highly comparable
Minor – few                            None                       Broadly comparable
Minor – one or two                     None                       Highly comparable
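
For researchers who record their assessments programmatically, Table 2.3 can be encoded as a lookup. This is a minimal Python sketch: the function name and the plain-hyphen spelling of the magnitude labels are our own choices, and combinations that do not appear in the table are deliberately left unassessed rather than guessed.

```python
# Table 2.3 encoded as a lookup:
# (magnitude and number of variations, manipulation available) -> assessment.
COMPARABILITY = {
    ("major", False): "Limited comparability",
    ("major", True): "Broadly comparable",
    ("minor - many", False): "Broadly comparable",
    ("minor - many", True): "Highly comparable",
    ("minor - few", False): "Broadly comparable",
    ("minor - one or two", False): "Highly comparable",
}


def assess(magnitude, manipulation_available):
    """Return the Table 2.3 assessment, or None if the table has no such row."""
    key = (magnitude.strip().lower(), bool(manipulation_available))
    return COMPARABILITY.get(key)


print(assess("Major", True))
print(assess("Minor - few", True))  # not a row of Table 2.3, so None
```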


2.6.3 Methods to deal with intercensal variation

When a variable is assessed as being either broadly comparable or of limited comparability, time series analysis of this variable will be affected. As Morrison has pointed out, these changes can sometimes be rectified during analysis of the data (Morrison 1991). This means that, depending on the type of intercensal variation, there may be methods that can be used to make the data more comparable.

When a variable is missing for a particular census year, it may be possible to extract comparable information from a variable of another name. An example of this is tenure of household information, which can be gained from the nature of occupancy variable in earlier censuses.

If multiple variables have been constructed from responses to a question for one census year, but not others, then comparability may be increased by comparing multiple variables from one census year with one variable from another census year. For the 1981 post-school qualifications data, four variables need to be accessed in order to create categories that are comparable with other census years.

If the way in which a variable is derived has changed, this can sometimes be rectified by accessing the variables used to derive it, then re-deriving it according to a consistent method, usually the method used most recently. For example, the highest qualification variable was derived differently in 1996 than in 2001. In 1996, respondents who did not answer at least one of the component questions (on school or post-school qualifications) were put in the ‘not stated’ category. In 2001, if a response was given to either of these two questions, then respondents were allocated that value as their highest qualification. Using these different derivation methods led to an apparent decrease in the non-response rate of the highest qualification variable. In order to make information comparable over time, researchers can access the two component variables and use a consistent derivation process for all the censuses being investigated.
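
The effect of the two derivation rules can be illustrated with a minimal Python sketch. It rests on assumptions that are not Statistics New Zealand practice: qualifications are coded as integers where a larger number means a higher qualification, a missing response is represented by None, and the four example records are invented for illustration.

```python
NOT_STATED = None  # hypothetical marker for a missing response


def highest_qual_1996_rule(school, post_school):
    """1996-style rule as described above: 'not stated' unless both
    component questions were answered."""
    if school is NOT_STATED or post_school is NOT_STATED:
        return NOT_STATED
    return max(school, post_school)


def highest_qual_2001_rule(school, post_school):
    """2001-style rule as described above: use whichever component
    responses were given."""
    answered = [q for q in (school, post_school) if q is not NOT_STATED]
    return max(answered) if answered else NOT_STATED


def non_response_rate(values):
    """Share of derived values that are 'not stated'."""
    return sum(1 for v in values if v is NOT_STATED) / len(values)


# Invented (school, post-school) responses, for illustration only.
records = [(2, None), (None, 5), (3, 4), (None, None)]

rate_1996 = non_response_rate([highest_qual_1996_rule(s, p) for s, p in records])
rate_2001 = non_response_rate([highest_qual_2001_rule(s, p) for s, p in records])
print(rate_1996, rate_2001)  # 0.75 0.25
```

Applying one rule to the component variables from every census removes the apparent change in non-response that comes from the derivation method alone.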

On occasion, Statistics New Zealand has re-derived a variable according to subsequent classifications in order to make time series information more comparable. One example of this is the labour force status variable for 1981. The 1981 variable pertaining to this information had different definitions of part-time and full-time work to subsequent censuses. In 1991, a labour force status variable for the 1981 dataset was re-derived according to subsequent definitions. This variable labour_force_status91 is available from the 1991 rebased dataset. Another variable on this rebased dataset for 1981 (highest_level_educ_attend) is also available.

When there is a change in the definition of a variable, it may be possible to make information more comparable by excluding particular classification categories from the analysis of previous census years. For example, for the 2001 Census, households were defined to exclude visitor-only dwellings. As this was previously a distinct category of the household composition classification, information can be made comparable by excluding this category from the analysis when using household-level information from census years before 2001. When trying to compare information across time, it is essential to devise comparable classification categories for the concept involved, rather than the exact output names attributed to categories for each year. For example, the school qualification variable can be classified according to year of schooling, rather than the exact names of the qualification gained for each year (which, like benefit income source categories, are subject to change).
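
The exclusion of visitor-only dwellings described above can be sketched in Python as follows. The record layout, the field names and the category label are hypothetical; real census datasets use coded classification categories that should be taken from the relevant data dictionary.

```python
VISITOR_ONLY = "visitor-only dwelling"  # hypothetical category label


def comparable_households(records):
    """Drop visitor-only households from census years before 2001, so that
    earlier years match the 2001 household definition."""
    return [
        r for r in records
        if not (r["year"] < 2001 and r["composition"] == VISITOR_ONLY)
    ]


# Invented records, for illustration only.
sample = [
    {"year": 1996, "composition": "one-family household"},
    {"year": 1996, "composition": VISITOR_ONLY},
    {"year": 2001, "composition": "one-person household"},
]
print(len(comparable_households(sample)))  # 2
```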

Similarly, if a variable contains information on a variety of aspects related to the topic, it is possible that some types of information are comparable, while others are not. For example, although the post-school qualification variables do not provide comparable time series data on field of study, they can provide broadly comparable information on level of attainment.

When the number of classification categories for a particular variable changes for different census years, some categories may need to be aggregated to ascertain comparable time series information. This is illustrated in table 1 of appendix 7.3, which shows that in 1991, two classification categories need to be aggregated to ascertain the number of unemployed from the labour force status variable, whereas in 1996 and 2001, comparable information comes from just one classification category.
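
Aggregation of this kind can be expressed as a year-specific mapping onto a harmonised category. The Python sketch below is illustrative only: the 1991 category labels are placeholders, not the actual classification names (see appendix 7.3 for those), and the counts are invented.

```python
from collections import Counter

# Year-specific labels mapped to a harmonised category. The 1991 labels are
# placeholders standing in for the two categories that must be combined.
HARMONISE = {
    1991: {
        "unemployed (category A)": "unemployed",
        "unemployed (category B)": "unemployed",
    },
    1996: {"unemployed": "unemployed"},
    2001: {"unemployed": "unemployed"},
}


def harmonised_counts(year, counts):
    """Sum counts into harmonised categories; unmapped labels pass through."""
    out = Counter()
    for label, n in counts.items():
        out[HARMONISE[year].get(label, label)] += n
    return dict(out)


# Invented counts, for illustration only.
counts_1991 = {
    "unemployed (category A)": 10,
    "unemployed (category B)": 5,
    "employed full-time": 100,
}
print(harmonised_counts(1991, counts_1991))
```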

At times, the definition may not accurately reflect the information that the data contain. For example, the definitions of cigarette smoking in 1981 and 1996 both include cigarettes and roll-your-owns, and exclude pipes and cigars, and so appear to be comparable. However, examining the census questionnaire forms and guide notes shows that cigarettes were never specifically defined on the census form or in the guide notes in 1981. Therefore, people who smoked cigars may have counted themselves as smokers. Moreover, although both definitions include roll-your-owns, neither the 1981 nor the 1996 question says to include them, so some people who smoked roll-your-owns may not have counted themselves as smokers. The data obtained may therefore not exactly fit the definition associated with them. The effect of this wording difference may be quite minimal, but it is difficult to quantify exactly. No action can be taken to make the information more comparable.

Note: Variable definition of cigarette smoking from Statistics New Zealand Concepts classifications and definitions documents:
1981 ‘A regular smoker was defined as a person who currently smokes one or more cigarettes per day, including roll-your-own, but excluding pipe or cigar smokers’.
1996 ‘Cigarette smoking refers to the active smoking of any tobacco products including manufactured and hand-rolled cigarettes (excluding cigars, pipe tobacco and cigarillos). It does not include the smoking of any other substances, for example herbal cigarettes or marijuana, but does include the smoking of home grown tobacco’.
The main difference between these definitions lies in the explicit exclusion of marijuana smokers and passive smokers in the 1996 document (people in either group may not count themselves as cigarette smokers anyway).

Other changes, such as in the wording, subject population, guide note instructions, format of the questionnaire and data collection and processing, are irreversible (Morrison 1991). For example, before 1981 the subject population for most census questions was the ‘de facto’ population, which included overseas visitors and temporary residents. In the 1981 Census and all subsequent censuses, the population was divided into two groups: the ‘de jure’ population, or census night usually resident population count, which excludes overseas visitors, and the ‘de facto’ population, which is everyone in New Zealand on census night.

2.7 Questions in 1981–2006 Censuses

Tables 2.4, 2.5, 2.6 and 2.7 provide a list of the questions asked in the 1981–2006 Censuses that relate to the variables examined in this report. In order to ascertain comparability, these are grouped according to the type of information they seek to extract, rather than according to the exact wording of the question. Grouping census questions according to exact wording would result in a large number of questions that were only asked in one census.

Table 2.4 Socio-demographic questions asked in the 1981–2006 Censuses, from individual forms

Census year
Census question 1981            1986         1991          1996                 2001         2006    
Name q1 + + q1 q2 q2
Sex q2 q4 q4 q6 q3 q3
Date of birth q3 & 4 q5 q5 q7 q4 q4
Census night address q6 + + q5 q8 q8
Usual residential address q7 q1 q1 q2 q5 q5
Usual residential address at previous census / five years ago     q9 q3 q3 q4 q7 q7
Years at usual residence q2 q2 q3 q6 q6
Country of birth q10 q7 q10 q8 q9 q9
Number of years in New Zealand* q10 q8 q9 q10 q10
Religion q11 q10 q12 q15 q18 q18
Ethnic origin/group q12 q9 q7 q10 q11 q11
Māori ancestry/Māori descent q8 q13 q16 q14
Iwi q9 q14 q17 q15
Marital status (legal) q14 q12 q13 q16, q17, q18 & q19 q21 q23
De facto status q14 q11 q11
Social marital status q16 q19 q19
Number of children born q15 q29 q25
Ability to converse in certain languages q12 q13 q13
Highest secondary qualification q26
Highest post-school qualification q27 & q28
Unpaid activities q46

* In 1996, this question was changed to the month and year that the person first arrived to live in New Zealand, and the number of years in New Zealand was derived from this.
+ Unnumbered questions asked at the beginning of the personal or dwelling questionnaire forms.
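
When working across several census years, it can help to hold this concordance in a form that can be queried. The following is a minimal Python sketch: the function name is our own, and only rows of Table 2.4 that have an entry for every census year are included, because the column position of entries in rows with gaps should be confirmed against the census forms.

```python
YEARS = (1981, 1986, 1991, 1996, 2001, 2006)

# Rows copied from Table 2.4 (only rows with an entry for every census year).
ROWS = {
    "Sex": ("q2", "q4", "q4", "q6", "q3", "q3"),
    "Religion": ("q11", "q10", "q12", "q15", "q18", "q18"),
    "Ethnic origin/group": ("q12", "q9", "q7", "q10", "q11", "q11"),
}


def question_for(topic, year):
    """Return the question number for a topic in a given census year."""
    return dict(zip(YEARS, ROWS[topic]))[year]


print(question_for("Religion", 1996))  # q15
```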

Table 2.5 Income- and employment-related questions in the 1981–2006 Censuses

Census year
Census question 1981            1986         1991          1996                 2001         2006    
Availability for work q24 q53 q40 q45
Hours worked q16 q22 q26 q48 q35 q40
Industry q19, q20 & q21 q24, q25 & q26 q28, q29 & q30 q45 & q46 q32 & q33 q37 & q38
Job search methods q23 q52 q39 q44
Main means of travel to work q22 q27 q31 q49 q36 q41
Occupation q18 q23 q27 q43 & q44 q30 & q31 q35 & q36
Seeking work q19 q22 q51 q38 q43
Sources of personal income q23 q13 q14 & q21 q35 q25 q30
Status in employment q17 q21 q25 q42 q29 q34
Total personal income q24 & q25 q14 q15 q36 q26 q31


Table 2.6 Family- and household-related questions in the 1981–2006 Censuses

Census year
Census question 1981 1986 1991 1996 2001 2006
Number of occupants in the dwelling on census night q3 q1 q1 q2 q2
Persons absent on census night q18 q9 q8 q19 q20
Household composition q2, q3, q4, q5, q7 & q14 q1, q4, q5, q6, q11 & q12 q1, q4, q5, q6, q11 & q13 q2, q3 DF, q6, q7, q16, q17, q18, q20, q21, q22 & q23 q4 DF, q3, q4, q5, q19 & q21 q6 DF, q21 DF, q3, q4, q5, q19 & q23
Household composition with child dependency status (uses the household composition variable already derived, and age and labour force status) q4, q16 & q17 q5, q16, q19, q20 & q22 q5, q21, q22, q23, q24 & q26 q7, q40, q48, q51, q52 & q53 q4, q27, q35, q38, q39 & q40 q4, q32, q40, q43, q44 & q45
Living arrangements (including de facto status) q11 q11 q16, q20, q21, q22 & q23 q19 q19
Relationship to reference person* q5 q6 q6 q3 DF q4 DF q6 DF


Table 2.7 Dwelling-related questions in the 1981–2006 Censuses

Census year
Census question 1981 1986 1991 1996 2001 2006
Access to Telecommunications q15 q16 q16 q17
Dwelling Type q4 q2 q2 q5 q4 & q5
Heating Fuels Used q8 q6 q6 q15 q15 q16
Mortgage Payments q9 q4 q4 q9 q8 q13
Motor Vehicles q17 q8 q7 q10 q17 q18
Number of Bedrooms q13 q3 q11 q13 q14
Number of Heating Fuels q8 q6 q6 q15 q15 q16
Sector of Landlord q10 q5 q5 q5 q10 q10
Tenure of Household q9 & q10 q4 & q5     q4 & q5     q4, q7, q8 & q9     q8, q9, q11 & q12     q7, q8, q9, q11, q12 & q13    
Weekly Rent Paid by Household q10 q5 q5 q8 q12 q12


2.8 Accessing Statistics New Zealand census resources

Although all reasonable steps have been taken to ensure that web addresses in this report are up-to-date and accurate, they are subject to change, and at the time of writing Statistics New Zealand was in the process of structural change. All Statistics New Zealand links should be available through the Statistics New Zealand website at www.stats.govt.nz.

Statistics New Zealand has a large variety of metadata surrounding the creation, definition, interpretation and comparability of census variables, especially for recent census years. Metadata is data about data and is used to gain an understanding of data, and to ascertain the most appropriate ways to use it (Statistics New Zealand 2004). However, much of the Statistics New Zealand metadata is spread across many different documents, and contained in publications specific to the census year being covered by the metadata. This metadata is also presented in a variety of formats, with little longitudinal analysis of it. This report intends to make a contribution towards a longitudinal understanding of variables, using the metadata available from Statistics New Zealand.

Metadata for recent censuses is generally available electronically. For less recent censuses, publications can often be accessed through public or university libraries. Statistics New Zealand has its own library, and if certain publications cannot be obtained elsewhere, it is possible to request a copy of the required documentation from Statistics New Zealand – there may be a fee for this service. In order to progress this project and contribute towards ease of use for future researchers, we have compiled a list of some of the resources available surrounding the census, and where these resources can be accessed.

2.8.1 Census questionnaires

All New Zealand census forms from 1906 onwards (both dwelling and individual) can be found in the 2006 Statistics New Zealand publication, Definitions and Questionnaires. This is available in hard copy or on the Statistics New Zealand website at www.stats.govt.nz/census/2006-census-information-about-data/2006-definitions-questionnaires/default.htm. The census forms referred to in this report can also be found at this link - click on 'Forms' near the top of the page.

2.8.2 Census guide notes/help notes

Census guide notes accompany the census forms that are delivered to every dwelling. They provide extra information for respondents on how to fill out the questionnaire. There are guide notes for both individual and dwelling forms. In 2001, the guide notes were called help notes.

Table 2.8 Where to access Statistics New Zealand guide notes/help notes for 1981–2006 Censuses

Year Resource
1981      Refer to the back of hard copy publications from the 1981 Census – Volume 12 Population Perspectives 81: General Report (page 162 for the individual form guide notes, page 169 for the dwelling form guide notes).
1986 Refer to 1986 Census of Population and Dwellings: Questionnaire Contents and Submissions Report, Department of Statistics (1985), Wellington.
1991 Refer to the back of hard copy publications from the 1991 Census, for example, Range and Availability of Statistics (page 110 for the individual form guide notes, page 116 for the dwelling form guide notes) and National Summary (page 55 for the individual form guide notes, page 61 for the dwelling form guide notes).
1996 Individual form help notes; dwelling form help notes
2001 Individual and dwelling forms help notes
2006 The individual and dwelling forms help notes are at the end of the census questionnaires


2.8.3 Concepts, definitions and classifications documentation

In the census variable definitions and classifications sections of this report, the text quoted for definitional purposes is sourced from the 2001 Census definitions. This is then compared to previous definitions in order to highlight similarities and differences. Table 2.9 indicates where to access Statistics New Zealand information on the definitions of variables.

Table 2.9 Where to access Statistics New Zealand classifications and definitions for the 1981–2006 Censuses

Year Resource
1981      New Zealand Census of Population and Dwellings 1981 – Range and Availability of Statistics (see pages 9–15).
1986 New Zealand Census of Population and Dwellings 1986 – General Information (refer to section 2, pages 23–74. Please note that these definitions also contain retrospective information for the 1981 Census).
1991 Concepts, Definitions and Classifications (entire document).
1996 An Introduction to the Census (refer to section 12).
2001 Definitions and Questionnaires available in hard copy or via the Statistics New Zealand website
2006 Information by variable


2.8.4 Other information on census questions, content and processes

Prior to the census, discussion documents are circulated, and end users and interested parties are consulted about the contents of the census. An interim report (preliminary views on content) is then published. The 2001 and 2006 preliminary views on content publications are available from the Statistics New Zealand website. The preliminary reports form the basis for broad discussion about the content of the upcoming census. The exact content of the interim report varies from census to census, but it always covers criteria for determining census content, a brief overview of the main topics covered in the census, and the submissions made that relate to each of these. The 2001 report also provides a brief history of some variables. The 2006 preliminary views on content contain two appendices that are particularly useful to external researchers. These are a survey information table (outlining other surveys conducted by various organisations, including type, frequency and available related products or services) and a series of additional data source tables.

The final report on content outlines Statistics New Zealand’s final decisions on content for the next census, that is, which topics and variables will be included, along with the rationale behind these decisions. Sometimes the final report on content provides useful information as to how and why a variable may have changed between censuses.

2.8.5 Data dictionaries

Data dictionaries contain a list (not necessarily exhaustive) of the variables generated from each census and the coded classification categories for each variable. The 2006 data dictionary is available from the Statistics New Zealand website.

The 1996 data dictionary is also available in electronic form. The 1981, 1986 and 1991 data dictionaries are available in hard copy, and information on certain variables is also available electronically. Any request for data dictionaries should be made to Statistics New Zealand, customer services.

2.8.6 Factual output information from previous censuses

A range of Statistics New Zealand products and services, including Table Builder, can be found on the Statistics New Zealand website.

Some of the resources that may be of use to researchers include:
Statistics New Zealand tabular and analytic reports
Statistics New Zealand produces several reports that can be generally described as either analytical or tabular. Tabular reports contain very little text and predominantly consist of tables. Prior to 2001, these were available in printed form only; for 2001 the tables are also accessible online. Analytical reports contain more description, discussion, graphical presentation and analysis of the data, and often incorporate information from other data sources. A series of analytical reports aimed at a wide general readership, called New Zealand Now, were produced following the 1991 and 1996 Censuses. The 1996 series is available in printed form and many of the component reports are available on the Statistics New Zealand website under the description ‘New Zealand Stories’.

There were no analytical reports produced directly from the 2001 Census, but a series of reference reports, also called topic-based reports, on various topics (for example, ethnic groups, housing) is available for download from the Statistics New Zealand website. These are predominantly tabular reports, but do also contain some pages of highlights. A similar set of reports is available for the 1996 Census.

Table Builder
Table Builder enables the user to access aggregated information in the form of tables and is available on the Statistics New Zealand website. It is a product for building tables not only for population census data, but also for income, injury, agriculture, business and import/export statistics. Tables are interactively built by the user from a selection of variables. For the population census, these include census year (1991, 1996 and 2001), geographic area (regional council, territorial authority and area unit) and a range of output census variables. The tables can be downloaded from the website in several different formats (e.g. Excel). Help notes on how to use Table Builder are also available.

Access to unit record data through the data lab facility
Access to anonymised unit record statistical data is currently managed through Statistics New Zealand’s data laboratory (data lab). There is a data lab in each of Statistics New Zealand’s Auckland, Wellington and Christchurch offices. Access to unit record data may be obtained by submitting a proposal outlining details of the proposed research. Applicants need to provide specific information on the researchers’ backgrounds, the dataset(s) and variables required, methods of analysis and intended outputs. The proposal is then considered by Statistics New Zealand and access is provided at the discretion of the Government Statistician. If the proposal is approved, costs are estimated and conditions of access are negotiated. All researchers are required to sign a declaration of secrecy as specified in the Statistics Act 1975.

2001 Census output information
Information pertaining to 2001 Census outputs.
The 2001 Census snapshots are particularly useful for a quick, broad overview of factual information on particular topics. These documents provide information about different topics from the 2001 Census and give an overview of indicator and variable context, allowing the user to explore associations, trends and patterns in different variables. These documents are available as separate downloadable PDF files for each topic.

2.8.7 New Statistics New Zealand initiatives currently under development

Statistics New Zealand continues to provide new initiatives and services for interested external researchers and users of its data. The three recent schemes presented below provide useful ways for researchers to access Statistics New Zealand data and expertise. The source of information for this section of the report is a communication with the expert data users group, which external researchers and interested parties may join.
To subscribe to this newsletter, send an email to listserv@stats.govt.nz with ‘subscribe expert user’ in the subject line.

The Official Statistics Research and Data Archive Centre (OSRDAC)
OSRDAC will provide a single access point for all Tier 1 unit record data and administrative data for use by government, university and other researchers. Preliminary work has begun on the design of this facility and the process for lodging and processing unit record data.

Note: Tier 1 statistics will be determined primarily by their purpose, not their producer. These statistics will have most of the following attributes: essential to central government decision making, high public interest, meet public expectations of impartiality and statistical quality, require long-term continuity of the data, provide international comparability or meet international statistical obligations.

Official Statistics Portal
Users will be able to access a full list of the statistics produced by government agencies through an official statistics portal, currently being designed.

Confidentialised Unit Record Files (CURFs)
CURFs are datasets that contain individual-level data arranged in a way that does not reveal any individual’s identity (Statistics New Zealand). This enables researchers outside Statistics New Zealand to access individual-level data for research purposes. The data provided differ from the unit-level data accessible in the data lab; some modifications will be made to the data and it is likely that there will be restrictions on the level to which data are available (Statistics New Zealand). The dataset provided is therefore ‘perturbed’ slightly from the real data gained from the census, in order to ensure confidentiality. However, unlike with current census datasets, researchers will be able to analyse data at their own workplace, rather than at the Statistics New Zealand data lab. For authorisation to access CURFs, researchers must comply with the ethical and security obligations set out by Statistics New Zealand.

2.8.8 Further information on variables

Other research tools that provide useful information about census variables are:
Statistical standards – These documents contain guidelines on how to collect and categorise information on a particular topic. They cover aspects such as questionnaire requirements, definitions and classifications. Statistical standards are designed for use in various data collections, including surveys and administrative collections. These standards are guidelines only and lack of data or other complexities associated with the census may mean they are not strictly followed in constructing census variables (or other datasets). The purpose of these standards is to facilitate consistency in the way variables are collected and classified across several surveys and across time. Such consistency enhances comparability, enriching the body of data available for analysis. Statistical standards are available online.

Summary profiles – Another rich source of variable information is the Information about the Census of Population and Dwellings for each census year. This information is particularly useful for the 1981 and 1986 Censuses, as documentation surrounding these earlier census years is scarce, and also generally difficult to access. These documents for earlier census years (1981 and 1986) contain information such as lists of output variables that are available, a description of what output variables entail, and in some instances, whether variables are derived and if so, how. They also contain references to the census questions that variables are constructed from, and in some instances, reprints of the questions and/or classification categories. Summary profiles are available online.

Census classifications for 1996 – This resource consists of a set of documents that provide classifications and standards used in the 1996 Census of Population and Dwellings. It features a mix of introduction, structure, definition and code descriptor sections. These documents can be accessed online.

Variable glossary definitions for the 2001 Census – These are a rich source of information about the main variables used in the 2001 Census. They contain a definition of each variable, a description of the question number and which questionnaire form the question was asked on, and the relevant subject population. Furthermore, they comment on non-response rates to census questions (in 1996 and 2001), the quality level of variables, their comparability with previous censuses, and things to be aware of when using the variables. These documents also note whether variables have been derived, and if so, from what. It should be noted that this list is not exhaustive for all variables and all years, but it does provide a good starting point for thinking about consistency and comparability. It must also be borne in mind that comparability of the 2001 variables is discussed with reference to the 1996 and 1991 variables only, not to variables constructed from any earlier censuses. These documents can be accessed online.