2 Understanding and using census data

2.1 Levels of aggregation and analysis: census statistical units

Four different levels of aggregation of information used by Statistics New Zealand are important to understand for research; these are the dwelling, household, family, and individual levels (the individual is sometimes called the personal level in Statistics New Zealand publications). Each specific variable will be associated with one of these levels of aggregation; the level of a particular variable can be ascertained from Statistics New Zealand data dictionaries.

Figure 2.1 shows the levels of aggregation of census information. Geographic information can relate to any of the levels of aggregation and is itself available at various levels of aggregation. It should be noted that for private occupied dwellings, the dwelling and the household are basically interchangeable levels of analysis because each private occupied dwelling contains a (that is, one) household within it. The sole exception to this is private occupied dwellings that contain visitors only. However, the dwelling refers to the physical structure (that is, the building), and the household refers to the group of people who live within the dwelling.

The Family and Whānau Wellbeing Project (FWWP) looks at wellbeing indicators at the family and household levels and how these can be used to understand changes in society over time. The interrelationships between the four levels of analysis and key related variables are explained in more depth in figure 2.2.

Figure 2.1 Levels of aggregation of census variables


2.2 Relationships between key variables

Source: Adapted from Statistics New Zealand (Statistics New Zealand 1999), standard terms for dwellings, households and families.

Figure 2.2 Key variables and their relationships to the levels of variable aggregation


2.3 How variables are constructed using census data

The variable construction process is integral to the output data generated for each census and consequently affects intercensal comparability.

Figure 2.3 shows the process of variable construction using census information. Respondents answer questions on the individual and dwelling forms. These responses are then processed, and in some instances edited. When no answer is given to certain questions, a value for that variable may be imputed.

* It should be noted that ‘Key Statistics New Zealand resources to consult’ is not an exhaustive list.

Figure 2.3 Variable construction process and relevant Statistics New Zealand resources

The responses are coded into classification categories according to the classification relevant to each variable. A classification assigns data reported for a particular variable into categories according to shared characteristics. This facilitates the accurate and systematic arrangement of data according to common properties, so that the resulting statistics are reproducible, comparable with data from other sources and comparable over time. In some instances, variables take into account answers to more than one census question. In these cases, answers are combined to form a derived variable. Each classification category has an associated ‘SAS code’, a value used in the statistical software package SAS to keep track of variable formats. The classification categories of each output variable and their associated codes are outlined in the relevant data dictionary for each census year.

An example of a variable classification from the 2001 data dictionary is shown below to familiarise the reader with the terminology and format.

Figure 2.4 Sample classification from the 2001 Data Dictionary

The concepts and definitions relevant to the classification for each variable are outlined in concepts, classifications and definitions documentation (see section 2.8.3 for available references).

In some instances (usually when multiple response options are possible), multiple variables are constructed from respondents’ answers to one census question. The income source data from 1986 are an example of this. Other constructed variables take into account respondents’ answers to more than one census question – these are called derived variables. For example in 1991, labour force status was ascertained from responses to five different census questions. The process used to derive variables is sometimes outlined in concepts and classifications documentation, and for 2001 it is also outlined in the census glossary sheets available on the Statistics New Zealand website.
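As a minimal sketch of how one multiple-response question can yield several output variables, the Python below turns a respondent's ticked boxes into one indicator variable per response option. The option names and the function name are hypothetical, chosen for illustration only, and are not taken from any Statistics New Zealand classification.

```python
# Hypothetical response options for a multiple-response question.
# These labels are illustrative only, not an official classification.
OPTIONS = ["wages_salary", "self_employment", "benefit", "no_income"]


def build_indicator_variables(ticked):
    """Return one indicator variable per option for a single respondent.

    `ticked` is the set of options the respondent marked. An empty set
    is treated as a non-response, so every indicator is 'not stated'.
    """
    if not ticked:
        return {option: "not stated" for option in OPTIONS}
    return {option: ("yes" if option in ticked else "no") for option in OPTIONS}


respondent = build_indicator_variables({"wages_salary", "benefit"})
print(respondent)
```

Each key of the returned dictionary plays the role of a separate output variable, which is why one census question can correspond to several entries in a data dictionary.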

2.4 Factors that may affect variable values

2.4.1 Editing

As collector and custodian of census data, Statistics New Zealand conducts various internal checks on the quality of the data. These checks may also be relevant to our data analysis. Before 2001, there were more consistency edits and Statistics New Zealand had ‘tried to tidy the data by editing every variable to eliminate inconsistency’ (Statistics New Zealand; for further explanation of the quality of census variables and the distinctions between the levels of variables for editing purposes, refer to section 7 of the 2001 Introduction to the Census). In 2001, a different approach was taken, and the level of editing applied to a variable depended on the importance of the variable (foremost, defining or supplementary). The effect of such changes in editing between censuses is hard to quantify (in some cases, the effect of consistency edits may be to increase the number of responses that go into residual categories). However, it does mean that small movements in a variable need to be interpreted with caution.

2.4.2 Substitute forms

Substitute forms are created by Statistics New Zealand where there is sufficient evidence that either a person or an occupied dwelling exists, but no census form has been submitted for it (Statistics New Zealand 2001). Substitute forms make up about 2 percent of all census forms. These forms affect non-response rates because information not gained on substitute forms is generally set to ‘not stated’.

2.4.3 Imputation

Imputation is the process by which Statistics New Zealand allocates a value to a variable where no value has been stated by the respondent. Values for variables have been imputed by Statistics New Zealand in the stated census years as shown in table 2.1.

Table 2.1 Variable imputation by census year

* Indicates that this variable was only imputed for the rebased dataset for this year.

The value allocated by Statistics New Zealand is ascertained through a variety of methods. Imputation requires thorough testing before implementation. The following list outlines the process involved for the imputable variables listed in table 2.1 (taken from the Statistics New Zealand website).

Age: Age imputation supplies an age in years where this value is missing for an individual. This means that age will be imputed if it cannot be calculated from the response to date of birth. Age is imputed using various other responses from the individual; for example, whether they are legally married, responses supplied on the dwelling form, and the known distribution of ages in the population.

Sex: Sex imputation supplies a value of male or female where the response for the sex variable is missing. If they are available, the name of the person or their relationship to others in the household may be used to impute a value. Otherwise a value is assigned randomly, with 49 percent being imputed as male.

Work and labour force status: Work and labour force status imputation supplies a value for labour force status where this cannot be derived from the labour force information supplied by the respondent. The labour force status imputation uses whatever labour force information has been given, and various other responses from the individual (for example, age and income). A labour force status is then imputed to equal the known labour force status of a similar person.

Usual residence: Usual residence imputation supplies a value for the usual residence meshblock where a meshblock cannot be coded from the address information supplied by the respondent. The usual residence meshblock imputation uses whatever level of geographic information has been given and various other responses from the individual. A usual residence meshblock is then imputed based on the distribution of known usual residence meshblocks for similar people.
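The sex imputation rule described above can be sketched as follows. This is a simplified illustration, not Statistics New Zealand's implementation: the relationship lookup table, the function name and the record layout are all assumptions, and the name-based step mentioned in the text is omitted. Only the 49 percent male share for random assignment comes from the description above.

```python
import random

# Hypothetical lookup from relationship wording to sex; illustrative only.
RELATIONSHIP_HINTS = {"husband": "male", "wife": "female",
                      "son": "male", "daughter": "female"}

MALE_SHARE = 0.49  # share imputed as male when no other evidence exists


def impute_sex(stated_sex, relationship=None, rng=random):
    """Return (value, was_imputed) for one individual record."""
    if stated_sex in ("male", "female"):
        return stated_sex, False
    hint = RELATIONSHIP_HINTS.get(relationship)
    if hint is not None:
        return hint, True
    # No usable evidence: assign randomly, 49 percent male.
    value = "male" if rng.random() < MALE_SHARE else "female"
    return value, True
```

Returning a flag alongside the value keeps imputed records identifiable, which matters when comparing years in which different variables were imputed.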

2.5 Other factors affecting census data interpretation

2.5.1 Non-response rates

Non-response rates provided by Statistics New Zealand are generally the percentage of respondents in the ‘not stated’ category for each variable. Before 2001, this residual category was called ‘not specified’. Non-response rates for 1996 and 2001 variables can be found in the census glossary publications.
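A non-response rate of this kind can be computed directly from unit record values. The sketch below assumes the denominator is every record supplied (in practice it would be the subject population for the variable), and the function name and sample values are hypothetical.

```python
from collections import Counter


def non_response_rate(values, residual="not stated"):
    """Percentage of records whose value is the residual category."""
    if not values:
        raise ValueError("no records supplied")
    counts = Counter(values)
    return 100.0 * counts[residual] / len(values)


responses = ["yes", "no", "not stated", "yes", "not stated"]
print(non_response_rate(responses))  # 2 of 5 records are 'not stated'
```

For data from before 2001, the residual label would be passed explicitly, for example `non_response_rate(values, residual="not specified")`.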

2.5.2 Rebased datasets

Electronic data from previous censuses (1981–1996) has been rebased according to the current meshblock pattern to allow geography-based comparisons over time. The variables that are altered according to current patterns are usual residence, census night address and workplace address. This allows the different levels of aggregation of geographic variables (meshblock, area unit, territorial authority, regional council and national) to be held constant, so that meshblocks in 1981 and 2001 are defined by the same boundaries.

2.6 Assessing intercensal consistency of variables

As outlined in table 2.2, the information generated from census data may vary from census to census for a number of reasons. In order to accurately monitor and establish empirical relationships, researchers need to establish that any effect is a real effect, rather than one that has been caused by changes in the process of extracting and measuring the information provided. To do so, it is necessary to examine a variety of Statistics New Zealand publications in order to get a thorough understanding of census data and the changes that have taken place across the different census years.

2.6.1 Sources of intercensal variation

Table 2.2 Summary of sources of intercensal variation

Sources of intercensal variation
1) Removal or inclusion of the actual census question
2) Changes in the subject population for a question
3) Differences in the wording of the census questions asked
4) Changes in the layout of the census form
5) Changes in the format of the census question, e.g. single or multiple response, tick box or written response format
6) Differences in the guide note instructions that accompany the census question, although the impact of this is unknown as the number of respondents who read the guide notes is undetermined
7) Differences in the response options used in the census question
8) Changes in the way the data are collected. These changes are reasonably infrequent but do occur. The two major changes recently were in 1996 when the dwelling type variable was ascertained from responses from enumerators rather than respondents and in 2006 when it was possible to complete census forms online.
9) Changes in the classifications and definitions for a variable, which describe variable construction
10) Changes in the instructions given to enumerators, such as which dwellings to give forms to, and enumerator doorstop checks. For example in 1996, enumerators were required to check the whole form for completeness, whereas in 2001 they were only required to check the front page of the individual form.
11) Changes in processing practices, e.g. scanning, recognition and operator instructions
12) Changes in the way a particular variable is edited
13) Changes in the general editing practices from census to census
14) Changes in the variables for which responses are imputed, and changes in the way variables are imputed
15) Changes in the name of a variable
16) Changes in the number of variables constructed from responses to a census question
17) Changes in the way a variable has been derived:

  • alterations in the variables used to derive it
  • changes in the derivation process, e.g. what is done if information from one variable is missing
18) Changes in the classification of a variable:

  • the addition of extra classification categories
  • the deletion of previous classification categories
  • changes in the content of classification categories
  • differences in the way a classification groups things together or splits them up

In New Zealand, very few researchers have looked at intercensal variation. In 1991, Philip Morrison wrote an article entitled Change or Continuity in the Census: Problems of comparability in the New Zealand Census (Morrison 1991). This article provides an overview of the changes in census content and format on the personal form between the 1951 and 1991 Censuses. It also provides a detailed discussion of changes to census questions dealing with employment and work. A more up-to-date source that can be used for an overview of changes in census topics is the Historical Summary of the Scope of the Census (Statistics New Zealand 2001). This provides a basic overview of the different census topics that have been covered on both dwelling and individual forms, from the inception of the census up to (and including) the 2001 Census.

2.6.2 Assessing the impact of intercensal variation

When using census data for time series analysis, all sources of intercensal variation need to be considered, and, where possible, evaluated as to their likely impact on the data (establishing the time series comparability of variables is a key aspect of this report). A good method for assessing the impact of intercensal variation is to use the following steps:

Identify the source of the variation

If there has been a change in instructions given to the respondent, it is necessary to note where this change occurred (i.e. in the guide notes or on the census form itself). Statistics New Zealand acknowledges that instructions in the guide notes are often not read and therefore not followed (Department of Statistics 1991). This report discusses all census instructions (including guide note instructions) as if they are followed by the respondent. However, it must be remembered when reading this document that guide note instructions appear not to be followed as often as instructions on the actual census form. Therefore, when evaluating the impact of changes on intercensal comparability, it will be assumed that changes in the guide notes will probably have had less impact on the data than changes to the questionnaire.

Estimate the magnitude (impact) of the effect

Estimate to what degree census data will be affected as a result of the variation. The impact of intercensal variation may, in many instances, be difficult to assess and quantify. While every attempt is made to minimise errors due to systems and processes, as with any survey it is not possible to know or eliminate all non-sampling error. That said, the impact of changes is often relatively minor.

For the sake of simplicity, it is best to use a binary scale, and assess the impact as either major or minor. The impact may depend on exactly what the source of the variation was. For example, a change in the availability of a variable, the underlying concept being measured, or the variable derivation will generally be assessed as major, whereas a change in the editing process or instructions given to respondents would generally be seen as minor.

Assess the effect of the variation on census data (direction of the effect)

Identify the likely outcome of the variation on the actual data collected. Make judgements as to whether data for a particular year will be overestimated or underestimated relative to other census years. If there is a best practice method, take this into account.

Identify if any manipulations can be made to increase variable comparability

Sometimes when a variable is not comparable across different census years there are ways in which the comparability of variable information can be increased.

Make an overall assessment of the comparability of each variable

The final comparability assessment of key variables across the 1981–2006 Censuses can be made in accordance with the criteria listed in the comparability assessment method (outlined in Table 2.3). This method takes into account the findings from the steps above and applies the variable comparability scale outlined in table 1.2.

Table 2.3 Comparability assessment method

Magnitude and number of variations    Manipulation available    Comparability assessment
Major                                 None                      Limited comparability
Major                                 Available                 Broadly comparable
Minor – many                          None                      Broadly comparable
Minor – many                          Available                 Highly comparable
Minor – few                           None                      Broadly comparable
Minor – one or two                    None                      Highly comparable
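The rows of table 2.3 can be encoded as a simple lookup, sketched below in Python. The dictionary and function names are hypothetical, and the keys use a plain hyphen in place of the dash printed in the table. Combinations that the table does not list return None rather than a guessed assessment.

```python
# Rows of table 2.3, keyed by (magnitude and number of variations, manipulation).
COMPARABILITY = {
    ("major", "none"): "limited comparability",
    ("major", "available"): "broadly comparable",
    ("minor - many", "none"): "broadly comparable",
    ("minor - many", "available"): "highly comparable",
    ("minor - few", "none"): "broadly comparable",
    ("minor - one or two", "none"): "highly comparable",
}


def assess(variation, manipulation):
    """Look up the assessment; return None for combinations not in the table."""
    key = (variation.strip().lower(), manipulation.strip().lower())
    return COMPARABILITY.get(key)


print(assess("Major", "Available"))
```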


2.6.3 Methods to deal with intercensal variation

When a variable is assessed as being either broadly comparable or of limited comparability, time series analysis of this variable will be affected. As Morrison has pointed out, these changes can sometimes be rectified during analysis of the data (Morrison 1991). This means that, depending on the type of intercensal variation, there may be methods that can be used to make the data more comparable.

When a variable is missing for a particular census year, it may be possible to extract comparable information from a variable of another name. An example of this is tenure of household information, which can be gained from the nature of occupancy variable in earlier censuses.

If multiple variables have been constructed from responses to a question for one census year, but not others, then comparability may be increased by comparing multiple variables from one census year with one variable from another census year. For the 1981 post-school qualifications data, four variables need to be accessed in order to create categories that are comparable with other census years.

If the way in which a variable is derived has changed, this can sometimes be rectified by accessing the variables used to derive it, then re-deriving it according to a consistent method, usually the method used most recently. For example, the highest qualification variable was derived differently in 1996 than in 2001. In 1996, respondents who did not answer at least one of the component questions (on school or post-school qualifications) were put in the ‘not stated’ category. In 2001, if a response was given to either of these two questions, then respondents were allocated that value as their highest qualification. Using these different derivation methods led to an apparent decrease in the non-response rate of the highest qualification variable. In order to make information comparable over time, researchers can access the two component variables and use a consistent derivation process for all the censuses being investigated.
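The effect of the two derivation rules on the non-response rate can be illustrated with a small sketch. It assumes, for illustration only, that school and post-school qualifications are coded on a single ordered numeric scale and that the highest answered level is taken when both components are present; neither assumption comes from Statistics New Zealand documentation, and the function names are hypothetical.

```python
# Hypothetical ordered levels: a higher number means a higher qualification.
# None stands for a component question that was not answered.


def derive_strict(school, post_school):
    """'Not stated' if either component is missing (the 1996-style rule)."""
    if school is None or post_school is None:
        return None
    return max(school, post_school)


def derive_lenient(school, post_school):
    """Use whichever components were answered (the 2001-style rule)."""
    stated = [level for level in (school, post_school) if level is not None]
    return max(stated) if stated else None


def not_stated_share(records, derive):
    """Proportion of records whose derived value is 'not stated' (None)."""
    derived = [derive(school, post) for school, post in records]
    return derived.count(None) / len(derived)


records = [(2, 5), (3, None), (None, 4), (None, None)]
print(not_stated_share(records, derive_strict))   # three of four records
print(not_stated_share(records, derive_lenient))  # one of four records
```

Applying a single rule to the component variables from every census year removes the part of the change in non-response that is due to the derivation method alone.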

On occasion, Statistics New Zealand has re-derived a variable according to subsequent classifications in order to make time series information more comparable. One example of this is the labour force status variable for 1981. The 1981 variable pertaining to this information had different definitions of part-time and full-time work to subsequent censuses. In 1991, a labour force status variable for the 1981 dataset was re-derived according to subsequent definitions. This variable, labour_force_status91, is available from the 1991 rebased dataset. Another variable on this rebased dataset for 1981 (highest_level_educ_attend) is also available.

When there is a change in the definition of a variable, it may be possible to make information more comparable by excluding particular classification categories from the analysis of previous census years. For example, for the 2001 Census, households were defined to exclude visitor-only dwellings. As this was previously a distinct category of the household composition classification, information can be made comparable by excluding this category from the analysis when using household-level information from census years before 2001. When trying to compare information across time, it is essential to devise comparable classification categories for the concept involved, rather than the exact output names attributed to categories for each year. For example, the school qualification variable can be classified according to year of schooling, rather than the exact names of the qualification gained for each year (which, like benefit income source categories, are subject to change).
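Excluding a category to align earlier years with the 2001 definition can be sketched as a simple filter. The record layout, the field name 'composition' and the category label 'visitors only' are hypothetical; the actual category names are given in the data dictionary for each census year.

```python
def comparable_households(households, year):
    """Drop visitor-only households for census years before 2001.

    Each household is a dict with a hypothetical 'composition' field.
    From 2001 the household definition already excludes these dwellings,
    so later years are returned unchanged.
    """
    if year >= 2001:
        return list(households)
    return [h for h in households if h["composition"] != "visitors only"]


sample = [{"composition": "one family"}, {"composition": "visitors only"}]
print(len(comparable_households(sample, 1996)))
```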

Similarly, if a variable contains information on a variety of aspects related to the topic, it is possible that some types of information are comparable, while others are not. For example, although the post-school qualification variables do not provide comparable time series data on field of study, they can provide broadly comparable information on level of attainment.

When the number of classification categories for a particular variable changes for different census years, some categories may need to be aggregated to ascertain comparable time series information. This is illustrated in table 1 of appendix 7.3, which shows that in 1991, two classification categories need to be aggregated to ascertain the number of unemployed from the labour force status variable, whereas in 1996 and 2001, comparable information comes from just one classification category.
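Aggregating categories into a common classification can be sketched with a per-year concordance. The category labels below are placeholders rather than the real 1991 or 1996 output categories, and the names CONCORDANCE and aggregate are hypothetical.

```python
from collections import Counter

# Placeholder output categories per census year mapped to common categories.
# The real categories are listed in the relevant data dictionaries.
CONCORDANCE = {
    1991: {"unemployed (category 1)": "unemployed",
           "unemployed (category 2)": "unemployed",
           "employed": "employed"},
    1996: {"unemployed": "unemployed",
           "employed": "employed"},
}


def aggregate(counts, year):
    """Sum category counts for one census year into the common categories."""
    totals = Counter()
    for category, n in counts.items():
        totals[CONCORDANCE[year][category]] += n
    return dict(totals)


counts_1991 = {"unemployed (category 1)": 10,
               "unemployed (category 2)": 5,
               "employed": 85}
print(aggregate(counts_1991, 1991))
```

Keeping the concordance as data, separate from the aggregation function, means that a further census year can be added without changing the code.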

At times, the definition may not accurately reflect the information that the data contains. For example, the definitions of cigarette smoking in 1981 and 1996 both include cigarettes and roll-your-owns, and exclude pipes and cigars, and so appear to be comparable. However, examining the census questionnaire forms and guide notes alerts us to the issue that cigarettes were never specifically defined on the census form or guide notes in 1981. Therefore, people who smoked roll-your-owns may not have counted themselves as smokers, and people who smoked cigars may have counted themselves as smokers. Although both definitions include roll-your-owns, neither the 1981 nor the 1996 question says to include them, so the data obtained may not exactly fit the definition associated with them. The effect of this wording difference may be quite minimal, but it is difficult to quantify exactly. No action can be taken to make the information more comparable.

Note: Variable definition of cigarette smoking from Statistics New Zealand Concepts classifications and definitions documents:
1981 ‘A regular smoker was defined as a person who currently smokes one or more cigarettes per day, including roll-your-own, but excluding pipe or cigar smokers’.
1996 ‘Cigarette smoking refers to the active smoking of any tobacco products including manufactured and hand-rolled cigarettes (excluding cigars, pipe tobacco and cigarillos). It does not include the smoking of any other substances, for example herbal cigarettes or marijuana, but does include the smoking of home grown tobacco’.
The main difference in these definitions lies in the exclusion of marijuana and passive smokers in the 1996 document (both of which may not count themselves as cigarette smokers anyway).

Other changes, such as in the wording, subject population, guide note instructions, format of the questionnaire and data collection and processing, are irreversible (Morrison 1991). For example, before 1981 the subject population for most census questions was the ‘de facto’ population, which included overseas visitors and temporary residents. In the 1981 Census and all subsequent censuses, the population was divided into two groups: the ‘de jure’ population, or census night usually resident population count, which excludes overseas visitors, and the ‘de facto’ population, which is everyone in New Zealand on census night.

2.7 Questions in 1981–2006 Censuses

Tables 2.4, 2.5, 2.6 and 2.7 provide a list of the questions asked in the 1981–2006 Censuses that relate to the variables examined in this report. In order to ascertain comparability, these are grouped according to the type of information they seek to extract, rather than according to the exact wording of the question. Grouping census questions according to exact wording would result in a large number of questions that were only asked in one census.

Table 2.4 Socio-demographic questions asked in the 1981–2006 Censuses, from individual forms

Census year
Census question 1981 1986 1991 1996 2001 2006
Name q1 + + q1 q2 q2
Sex q2 q4 q4 q6 q3 q3
Date of birth q3 & 4 q5 q5 q7 q4 q4
Census night address q6 + + q5 q8 q8
Usual residential address q7 q1 q1 q2 q5 q5
Usual residential address at previous census / five years ago q9 q3 q3 q4 q7 q7
Years at usual residence q2 q2 q3 q6 q6
Country of birth q10 q7 q10 q8 q9 q9
Number of years in New Zealand* q10 q8 q9 q10 q10
Religion q11 q10 q12 q15 q18 q18
Ethnic origin/group q12 q9 q7 q10 q11 q11
Māori ancestry/Māori descent q8 q13 q16 q14
Iwi q9 q14 q17 q15
Marital status (legal) q14 q12 q13 q16, q17, q18 & q19 q21 q23
De facto status q14 q11 q11
Social marital status q16 q19 q19
Number of children born q15 q29 q25
Ability to converse in certain languages q12 q13 q13
Highest secondary qualification q26
Highest post-school qualification q27 & q28
Unpaid activities q46

* In 1996, this question was changed to the month and year that the person first arrived to live in New Zealand, and the number of years in New Zealand was derived from this.
+ Unnumbered questions asked at the beginning of the personal or dwelling questionnaire forms.

Table 2.5 Income- and employment-related questions in the 1981–2006 Censuses

Census year
Census question 1981 1986 1991 1996 2001 2006
Availability for work q24 q53 q40 q45
Hours worked q16 q22 q26 q48 q35 q40
Industry q19, q20 & q21 q24, q25 & q26 q28, q29 & q30 q45 & q46 q32 & q33 q37 & q38
Job search methods q23 q52 q39 q44
Main means of travel to work q22 q27 q31 q49 q36 q41
Occupation q18 q23 q27 q43 & q44 q30 & q31 q35 & q36
Seeking work q19 q22 q51 q38 q43
Sources of personal income q23 q13 q14 & q21 q35 q25 q30
Status in employment q17 q21 q25 q42 q29 q34
Total personal income q24 & q25 q14 q15 q36 q26 q31


Table 2.6 Family- and household-related questions in the 1981–2006 Censuses

Census year
Census question 1981 1986 1991 1996 2001 2006
Number of occupants in the dwelling on census night q3 q1 q1 q2 q2
Persons absent on census night q18 q9 q8 q19 q20
Household composition q2, q3, q4, q5, q7 & q14 q1, q4, q5, q6, q11 & q12 q1, q4, q5, q6, q11 & q13 q2, q3 DF, q6, q7, q16, q17, q18, q20, q21, q22 & q23 q4 DF, q3, q4, q5, q19 & q21 q6 DF, q21 DF, q3, q4, q5, q19 & q23
Household composition with child dependency status (uses the household composition variable already derived, and age and labour force status) q4, q16 & q17 q5, q16, q19, q20 & q22 q5, q21, q22, q23, q24 & q26 q7, q40, q48, q51, q52 & q53 q4, q27, q35, q38, q39 & q40 q4, q32, q40, q43, q44 & q45
Living arrangements (including de facto status) q11 q11 q16, q20, q21, q22 & q23 q19 q19
Relationship to reference person* q5 q6 q6 q3 DF q4 DF q6 DF


Table 2.7 Dwelling-related questions in the 1981–2006 Censuses

Census year
Census question 1981 1986 1991 1996 2001 2006
Access to Telecommunications q15 q16 q16 q17
Dwelling Type q4 q2 q2 q5 q4 & q5
Heating Fuels Used q8 q6 q6 q15 q15 q16
Mortgage Payments q9 q4 q4 q9 q8 q13
Motor Vehicles q17 q8 q7 q10 q17 q18
Number of Bedrooms q13 q3 q11 q13 q14
Number of Heating Fuels q8 q6 q6 q15 q15 q16
Sector of Landlord q10 q5 q5 q5 q10 q10
Tenure of Household q9 & q10 q4 & q5 q4 & q5 q4, q7, q8 & q9 q8, q9, q11 & q12 q7, q8, q9, q11, q12 & q13
Weekly Rent Paid by Household q10 q5 q5 q8 q12 q12


2.8 Accessing Statistics New Zealand census resources

Although all reasonable steps have been taken to ensure that web addresses in this report are up-to-date and accurate, they are subject to change, and at the time of writing Statistics New Zealand was in the process of structural change. All Statistics New Zealand links should be available through the Statistics New Zealand website at www.stats.govt.nz.

Statistics New Zealand has a large variety of metadata surrounding the creation, definition, interpretation and comparability of census variables, especially for recent census years. Metadata is data about data and is used to gain an understanding about data, and to ascertain the most appropriate ways to use it (Statistics New Zealand 2004). However, much of the Statistics New Zealand metadata is spread across many different documents, and contained in publications specific to the census year being covered by the metadata. This metadata is also presented in a variety of formats, with little longitudinal analysis of it. This report intends to make a contribution towards a longitudinal understanding of variables, using the metadata available from Statistics New Zealand.

Metadata for recent censuses is generally available electronically. For less recent censuses, publications can often be accessed through public or university libraries. Statistics New Zealand has its own library, and if certain publications cannot be obtained elsewhere, it is possible to request a copy of the required documentation from Statistics New Zealand – there may be a fee for this service. In order to progress this project and contribute towards ease of use for future researchers, we have compiled a list of some of the resources available surrounding the census, and where these resources can be accessed.

2.8.1 Census questionnaires

All New Zealand census forms from 1906 onwards (both dwelling and individual) can be found in the 2006 Statistics New Zealand publication, Definitions and Questionnaires. This is available in hard copy or on the Statistics New Zealand website at www.stats.govt.nz/census/2006-census-information-about-data/2006-definitions-questionnaires/default.htm. The census forms referred to in this report can also be found at this link: click on ‘Forms’ near the top of the page.

2.8.2 Census guide notes/help notes

Census guide notes accompany the census forms that are delivered to every dwelling. They provide extra information for respondents on how to fill out the questionnaire. There are guide notes for both individual and dwelling forms. In 2001, the guide notes were called help notes.

Table 2.8 Where to access Statistics New Zealand guide notes/help notes for 1981–2006 Censuses

Year Resource
1981      Refer to the back of hard copy publications from the 1981 Census – Volume 12 Population Perspectives 81: General Report (page 162 for the individual form guide notes, page 169 for the dwelling form guide notes).
1986 Refer to 1986 Census of Population and Dwellings: Questionnaire Contents and Submissions Report, Department of Statistics (1985), Wellington.
1991 Refer to the back of hard copy publications from the 1991 Census, for example, Range and Availability of Statistics (page 110 for the individual form guide notes, page 116 for the dwelling form guide notes) and National Summary (page 55 for the individual form guide notes, page 61 for the dwelling form guide notes).
1996 Individual form help notes; dwelling form help notes
2001 Individual and dwelling forms help notes
2006 The individual and dwelling forms help notes are at the end of the census questionnaires


2.8.3 Concepts, definitions and classifications documentation

In the census variable definitions and classifications sections of this report, the text quoted for definitional purposes is sourced from the 2001 Census definitions. This is then compared to previous definitions in order to highlight similarities and differences. Table 2.9 indicates where to access Statistics New Zealand information on the definitions of variables.

Table 2.9 Where to access Statistics New Zealand classifications and definitions for the 1981–2006 Censuses

Year Resource
1981 New Zealand Census of Population and Dwellings 1981 – Range and Availability of Statistics (see pages 9–15).
1986 New Zealand Census of Population and Dwellings 1986 – General Information (refer to section 2, pages 23–74. Please note that these definitions also contain retrospective information for the 1981 Census).
1991 Concepts, Definitions and Classifications (entire document).
1996 An Introduction to the Census (refer to section 12).
2001 Definitions and Questionnaires, available in hard copy or via the Statistics New Zealand website.
2006 Information by variable.


2.8.4 Other information on census questions, content and processes

Prior to the census, discussion documents are circulated, and end users and interested parties are consulted about the contents of the census. An interim report (preliminary views on content) is then published. The 2001 and 2006 preliminary views on content publications are available from the Statistics New Zealand website. These preliminary reports form the basis for broad discussion about the content of the upcoming census. The exact content of the report varies from census to census, but it always covers the criteria for determining census content, a brief overview of the main topics covered in the census, and the submissions relating to each of these. The 2001 report also provides a brief history of some variables. The 2006 preliminary views on content contain two appendices that are particularly useful to external researchers: a survey information table (outlining other surveys conducted by various organisations, including their type, frequency and available related products or services) and a series of additional data source tables.

The final report on content outlines Statistics New Zealand’s final decisions on content for the next census, that is, which topics and variables will be included, along with the rationale behind these decisions. Sometimes the final report on content provides useful information as to how and why a variable may have changed between censuses.

2.8.5 Data dictionaries

Data dictionaries list the variables generated from each census, together with the coded classification categories for each variable. The list of variables is not necessarily exhaustive. The 2006 data dictionary is available from the Statistics New Zealand website.

The 1996 data dictionary is also available in electronic form. The 1981, 1986 and 1991 data dictionaries are available in hard copy, and information on certain variables is also available electronically. Any request for data dictionaries should be made to Statistics New Zealand customer services.
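
As a rough illustration of how the coded classification categories in a data dictionary might be used, the following Python sketch decodes coded values into category labels. The variable name, codes and labels are invented for illustration and are not taken from any Statistics New Zealand data dictionary; the actual layout of a given dictionary may differ.

```python
# Hypothetical data dictionary: variable name -> {code: category label}.
# All names, codes and labels here are invented for illustration only.
DATA_DICTIONARY = {
    "tenure_of_household": {
        "1": "Dwelling owned",
        "2": "Dwelling not owned",
        "9": "Not stated",
    },
}


def decode(variable, code, dictionary=DATA_DICTIONARY):
    """Return the category label for a coded value, or flag it as unlisted."""
    categories = dictionary.get(variable)
    if categories is None:
        raise KeyError(f"Variable not in data dictionary: {variable}")
    # Codes absent from the dictionary are reported rather than guessed.
    return categories.get(code, f"Unlisted code {code}")


# A single (hypothetical) coded record, decoded variable by variable.
record = {"tenure_of_household": "2"}
labels = {var: decode(var, code) for var, code in record.items()}
print(labels)
```

Keeping codes as strings avoids losing leading zeros, and reporting unlisted codes explicitly is one way to notice where a classification has changed between censuses.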

2.8.6 Factual output information from previous censuses

A range of Statistics New Zealand products and services, including Table Builder, can be found on the Statistics New Zealand website.

Some of the resources that may be of use to researchers include:
Statistics New Zealand tabular and analytic reports
Statistics New Zealand produces several reports that can be generally described as either analytical or tabular. Tabular reports contain very little text and predominantly consist of tables. Prior to 2001, these were available in printed form only; for 2001 the tables are also accessible online. Analytical reports contain more description, discussion, graphical presentation and analysis of the data, and often incorporate information from other data sources. A series of analytical reports aimed at a wide general readership, called New Zealand Now, was produced following the 1991 and 1996 Censuses. The 1996 series is available in printed form and many of the component reports are available on the Statistics New Zealand website under the description ‘New Zealand Stories’.

There were no analytical reports produced directly from the 2001 Census, but a series of reference reports, also called topic-based reports, on various topics (for example, ethnic groups, housing) is available for download from the Statistics New Zealand website. These are predominantly tabular reports, but do also contain some pages of highlights. A similar set of reports is available for the 1996 Census.

Table Builder
Table Builder enables the user to access aggregated information in the form of tables and is available on the Statistics New Zealand website. It is a product for building tables not only for population census data, but also for income, injury, agriculture, business and import/export statistics. Tables are built interactively by the user from a selection of variables. For the population census, these include census year (1991, 1996 and 2001), geographic area (regional council, territorial authority and area unit) and a range of output census variables. The tables can be downloaded from the website in several different formats (for example, Excel). Help notes on how to use Table Builder are also available.

Access to unit record data through the data lab facility
Access to anonymised unit record statistical data is currently managed through Statistics New Zealand’s data laboratory (data lab). There is a data lab in each of Statistics New Zealand’s Auckland, Wellington and Christchurch offices. Access to unit record data may be obtained by submitting a proposal outlining details of the proposed research. Applicants need to provide specific information on the researchers’ backgrounds, the dataset(s) and variables required, methods of analysis and intended outputs. The proposal is then considered by Statistics New Zealand and access is provided at the discretion of the Government Statistician. If the proposal is approved, costs are estimated and conditions of access are negotiated. All researchers are required to sign a declaration of secrecy as specified in the Statistics Act 1975.

2001 Census output information
Information pertaining to 2001 Census outputs is also available. The 2001 Census snapshots are particularly useful for a quick, broad overview of factual information on particular topics. They provide information on different topics from the 2001 Census and outline the context of indicators and variables, allowing the user to explore associations, trends and patterns in different variables. These documents are available as separate downloadable PDF files for each topic.

2.8.7 New Statistics New Zealand initiatives currently under development

Statistics New Zealand continues to develop new initiatives and services for external researchers and other users of its data. The three recent schemes presented below provide useful ways for researchers to access Statistics New Zealand data and expertise. The source of information for this section of the report is a communication with the expert data users group, which external researchers and interested parties may join.
To subscribe to the group's newsletter, send an email to listserv@stats.govt.nz with ‘subscribe expert user’ in the subject line.

The Official Statistics Research and Data Archive Centre (OSRDAC)
OSRDAC will provide a single access point for all Tier 1 unit record data and administrative data for use by government, university and other researchers. Preliminary work has begun on the design of this facility and the process for lodging and processing unit record data.

Note: Tier 1 statistics will be determined primarily by their purpose, not their producer. These statistics will have most of the following attributes: essential to central government decision making, high public interest, meet public expectations of impartiality and statistical quality, require long-term continuity of the data, provide international comparability or meet international statistical obligations.

Official Statistics Portal
Users will be able to access a full list of the statistics produced by government agencies through an official statistics portal, currently being designed.

Confidentialised Unit Record Files (CURFs)
CURFs are datasets that contain individual-level data arranged in a way that does not reveal any individual’s identity (Statistics New Zealand). This enables researchers outside Statistics New Zealand to access individual-level data for research purposes. The data provided differ from the unit-level data accessible in the data lab; some modifications will be made to the data and it is likely that there will be restrictions on the level to which data are available (Statistics New Zealand). The dataset provided is therefore ‘perturbed’ slightly from the real data gained from the census, in order to ensure confidentiality. However, unlike with current census datasets, researchers will be able to analyse data at their own workplace, rather than at the Statistics New Zealand data lab. For authorisation to access CURFs, researchers must comply with the ethical and security obligations set out by Statistics New Zealand.

2.8.8 Further information on variables

Other research tools that provide useful information about census variables are:
Statistical standards – These documents contain guidelines on how to collect and categorise information on a particular topic. They cover aspects such as questionnaire requirements, definitions and classifications. Statistical standards are designed for use in various data collections, including surveys and administrative collections. These standards are guidelines only and lack of data or other complexities associated with the census may mean they are not strictly followed in constructing census variables (or other datasets). The purpose of these standards is to facilitate consistency in the way variables are collected and classified across several surveys and across time. Such consistency enhances comparability, enriching the body of data available for analysis. Statistical standards are available online.

Summary profiles – Another rich source of variable information is the Information about the Census of Population and Dwellings for each census year. This information is particularly useful for the 1981 and 1986 Censuses, as documentation surrounding these earlier census years is scarce, and also generally difficult to access. These documents for earlier census years (1981 and 1986) contain information such as lists of output variables that are available, a description of what output variables entail, and in some instances, whether variables are derived and if so, how. They also contain references to the census questions that variables are constructed from, and in some instances, reprints of the questions and/or classification categories. Summary profiles are available online.

Census classifications for 1996 – This resource consists of a set of documents that provide the classifications and standards used in the 1996 Census of Population and Dwellings. It features a mix of introduction, structure, definition and code descriptor sections. These documents can be accessed online.

Variable glossary definitions for the 2001 Census – These are a rich source of information about the main variables used in the 2001 Census. They contain a definition of each variable, a description of the question number and which questionnaire form the question was asked on, and the relevant subject population. Furthermore, they comment on non-response rates to census questions (in 1996 and 2001), the quality level of variables, their comparability with previous censuses, and things to be aware of when using the variables. These documents also note whether variables have been derived, and if so, from what. It should be noted that this list is not exhaustive for all variables and all years, but it does provide a good starting point for thinking about consistency and comparability. It must also be borne in mind that comparability of the 2001 variables is discussed with reference to the 1996 and 1991 variables only, not to variables constructed from any earlier censuses. These documents can be accessed online.