Four different levels of aggregation of information used by Statistics New Zealand are important to understand for research; these are the dwelling, household, family, and individual levels (the individual is sometimes called the personal level in Statistics New Zealand publications). Each specific variable will be associated with one of these levels of aggregation; the level of a particular variable can be ascertained from Statistics New Zealand data dictionaries.
Figure 2.1 shows the levels of aggregation of census information. Geographic information can relate to any of the levels of aggregation and is itself available at various levels of aggregation. For private occupied dwellings, the dwelling and the household are essentially interchangeable levels of analysis, because each private occupied dwelling contains exactly one household. The sole exception is private occupied dwellings that contain visitors only. Conceptually, however, the two differ: the dwelling refers to the physical structure (that is, the building), and the household refers to the group of people who live within it.
The Family and Whānau Wellbeing Project (FWWP) looks at wellbeing indicators at the family and household levels and how these can be used to understand changes in society over time. The interrelationships between the four levels of analysis and key related variables are explained in more depth in figure 2.2.

Source: This diagram is based on the Statistics New Zealand standard terms for dwellings, households and families (Statistics New Zealand 1999).
The variable construction process is integral to the output data generated for each census and consequently affects intercensal comparability.
Figure 2.3 shows the process of variable construction using census information. Respondents answer questions on the individual and dwelling forms. These responses are then processed, and in some instances edited. When no answer is given to certain questions, a value for that variable may be imputed.

* It should be noted that ‘Key Statistics New Zealand resources to consult’ is not an exhaustive list.
The responses are coded into classification categories according to the classification relevant to each variable. A classification assigns data reported for a particular variable into categories according to shared characteristics. This facilitates the accurate and systematic arrangement of data according to common properties, so that the resulting statistics are reproducible, comparable with data from other sources and comparable over time. In some instances, variables take into account answers to more than one census question. In these cases, answers are combined to form a derived variable. Each classification category has an associated ‘SAS code’, a value used in the statistical software package SAS to keep track of variable formats. The classification categories of each output variable and their associated codes are outlined in the relevant data dictionary for each census year.
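To make the coding step concrete, the sketch below represents a classification as a mapping from codes to category labels, with a residual 'not stated' code for missing answers. It is a minimal illustration only: the variable, its codes, labels and coding rules are hypothetical and are not taken from any Statistics New Zealand data dictionary, and real coding of written responses is considerably more involved.

```python
# Hypothetical classification: codes and labels are invented for illustration,
# not copied from any Statistics New Zealand data dictionary.
TENURE_CLASSIFICATION = {
    1: "Owned or partly owned",
    2: "Not owned",
    9: "Not stated",  # residual category
}

# Hypothetical coding rules mapping raw responses to classification codes.
RESPONSE_TO_CODE = {
    "own": 1,
    "own with mortgage": 1,
    "rent": 2,
}

NOT_STATED = 9


def code_response(raw):
    """Code a raw response into the classification. In this sketch, blank or
    unrecognised answers fall into the residual 'not stated' code."""
    if raw is None:
        return NOT_STATED
    return RESPONSE_TO_CODE.get(raw.strip().lower(), NOT_STATED)


responses = ["Own", "rent", None, "own with mortgage"]
codes = [code_response(r) for r in responses]
labels = [TENURE_CLASSIFICATION[c] for c in codes]
print(codes)  # [1, 2, 9, 1]
```

Keeping codes and labels in one lookup means output tables can be relabelled for each census year without touching the coded data.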
An example of a variable classification from the 2001 data dictionary is shown below to familiarise the reader with the terminology and format.
The concepts and definitions relevant to the classification for each variable are outlined in concepts, classifications and definitions documentation (see section 2.8.3 for available references).
In some instances (usually when multiple response options are possible), multiple variables are constructed from respondents’ answers to one census question. The income source data from 1986 are an example of this. Other constructed variables take into account respondents’ answers to more than one census question – these are called derived variables. For example, in 1991, labour force status was ascertained from responses to five different census questions. The process used to derive variables is sometimes outlined in concepts and classifications documentation, and for 2001 it is also outlined in the census glossary sheets available on the Statistics New Zealand website.
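The first case, one multiple-response question producing several variables, can be sketched as splitting each answer into one 0/1 indicator per response option. The option names below are hypothetical and do not reproduce the actual 1986 income source categories.

```python
# Hypothetical tick-box options for a multiple-response income source
# question; the option names are invented for illustration.
INCOME_SOURCES = ["wages", "self_employment", "benefit", "interest", "none"]


def to_indicator_variables(ticked):
    """Split one multiple-response answer into one 0/1 variable per option."""
    ticked = set(ticked)
    unknown = ticked - set(INCOME_SOURCES)
    if unknown:
        raise ValueError(f"unrecognised options: {sorted(unknown)}")
    return {f"income_{source}": int(source in ticked) for source in INCOME_SOURCES}


row = to_indicator_variables(["wages", "interest"])
print(row)
```

Because a respondent who ticks two boxes contributes to two variables, counts summed across these variables can exceed the number of respondents.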
As collector and custodian of census data, Statistics New Zealand conducts various internal checks on data quality, and these checks may also be relevant to our analysis. Before 2001, there were more consistency edits, and Statistics New Zealand had ‘tried to tidy the data by editing every variable to eliminate inconsistency’ (Statistics New Zealand; for further explanation of the quality of census variables and the distinctions between the levels of variables for editing purposes, refer to section 7 of the 2001 Introduction to the Census). In 2001, a different approach was taken: the level of editing applied to a variable depended on its level of importance (foremost, defining or supplementary). The effect of such changes in editing between censuses is hard to quantify (in some cases, consistency edits may increase the number of responses that go into residual categories), but it does mean that small movements in a variable need to be interpreted with caution.
Substitute forms are created by Statistics New Zealand where there is sufficient evidence that either a person or an occupied dwelling exists, but no census form has been submitted for it (Statistics New Zealand 2001). Substitute forms make up about 2 percent of all census forms. These forms affect non-response rates because information not gained on substitute forms is generally set to ‘not stated’.
Imputation is the process by which Statistics New Zealand allocates a value to a variable where no value has been stated by the respondent. Values for variables have been imputed by Statistics New Zealand in the stated census years as shown in table 2.1.

* Indicates that this variable was only imputed for the rebased dataset for this year.
The value allocated by Statistics New Zealand is ascertained through a variety of methods. Imputation requires thorough testing before implementation. The following list outlines the process involved for the imputable variables listed in table 2.1 (taken from the Statistics New Zealand website).
Age: Age imputation supplies an age in years where this value is missing for an individual. This means that age will be imputed if it cannot be calculated from the response to date of birth. Age is imputed using various other responses from the individual; for example, whether they are legally married, responses supplied on the dwelling form, and the known distribution of ages in the population.
Sex: Sex imputation supplies a value of male or female where the response for the sex variable is missing. If they are available, the name of the person or their relationship to others in the household may be used to impute a value. Otherwise a value is assigned randomly, with 49 percent being imputed as male.
Work and labour force status: Work and labour force status imputation supplies a value for labour force status where this cannot be derived from the labour force information supplied by the respondent. The labour force status imputation uses whatever labour force information has been given, and various other responses from the individual (for example, age and income). A labour force status is then imputed to equal the known labour force status of a similar person.
Usual residence: Usual residence imputation supplies a value for the usual residence meshblock where a meshblock cannot be coded from the address information supplied by the respondent. The usual residence meshblock imputation uses whatever level of geographic information has been given and various other responses from the individual. A usual residence meshblock is then imputed based on the distribution of known usual residence meshblocks for similar people.
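The logic behind the sex and labour force status rules can be sketched as a random fallback and a simple hot-deck donor method. This is not Statistics New Zealand's implementation: the matching variables (age group and income band), the record layout and the choice of a random donor among exact matches are assumptions made for illustration, and the earlier sex imputation steps (using name or relationship) are not modelled.

```python
import random


def impute_sex(record, rng):
    """Fallback step only: assign male with probability 0.49 when sex is missing.
    (The earlier steps using name or relationship are not modelled here.)"""
    if record.get("sex") is not None:
        return record["sex"]
    return "male" if rng.random() < 0.49 else "female"


def hot_deck_impute(records, target, match_keys, rng):
    """Fill missing `target` values by copying the value of a randomly chosen
    donor whose `match_keys` are identical (a simple hot-deck method)."""
    donors = {}
    for r in records:
        if r.get(target) is not None:
            key = tuple(r[k] for k in match_keys)
            donors.setdefault(key, []).append(r[target])

    result = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r.get(target) is None:
            pool = donors.get(tuple(r[k] for k in match_keys))
            if pool:  # no similar donor: leave the value missing
                r[target] = rng.choice(pool)
        result.append(r)
    return result


rng = random.Random(2001)
people = [
    {"age_group": "25-34", "income_band": "mid", "lfs": "Employed full-time"},
    {"age_group": "25-34", "income_band": "mid", "lfs": None},
    {"age_group": "65+", "income_band": "low", "lfs": "Not in labour force"},
    {"age_group": "65+", "income_band": "low", "lfs": None},
]
imputed = hot_deck_impute(people, "lfs", ["age_group", "income_band"], rng)
```

The same donor idea extends to the usual residence meshblock, where the imputed value is drawn from the known meshblocks of similar people.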
Non-response rates provided by Statistics New Zealand are generally the percentage of respondents in the ‘not stated’ category for each variable. Before 2001, this residual category was called ‘not specified’. Non-response rates for 1996 and 2001 variables can be found in the census glossary publications.
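As a minimal sketch, the non-response rate is the share of records in the residual category, treating both the current and pre-2001 labels as residual. The example values are illustrative only. Since information not gained on substitute forms is generally set to ‘not stated’, substitute forms feed directly into this rate.

```python
from collections import Counter

# 'Not specified' was the name of the residual category before 2001.
RESIDUAL_LABELS = ("Not stated", "Not specified")


def non_response_rate(values, residual_labels=RESIDUAL_LABELS):
    """Percentage of records whose value falls in the residual category."""
    if not values:
        raise ValueError("no records supplied")
    counts = Counter(values)
    n_missing = sum(counts[label] for label in residual_labels)
    return 100 * n_missing / len(values)


print(non_response_rate(["Employed", "Not stated", "Unemployed", "Employed"]))  # 25.0
```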
Electronic data from previous censuses (1981–1996) has been rebased according to the current meshblock pattern to allow geography-based comparisons over time. The variables that are altered according to current patterns are usual residence, census night address and workplace address. This allows the different levels of aggregation of geographic variables (meshblock, area unit, territorial authority, regional council and national) to be held constant, so that meshblocks in 1981 and 2001 are defined by the same boundaries.
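Once records carry current-pattern meshblock codes, comparing years at a higher geographic level comes down to aggregating every year through the same current lookup. The sketch assumes rebasing has already been done (it does not model how Statistics New Zealand assigns older records to current meshblocks), and the meshblock codes and area unit names are hypothetical.

```python
from collections import Counter

# Hypothetical lookup from current-pattern meshblock codes to area units.
# Codes and names are invented for illustration.
MESHBLOCK_TO_AREA_UNIT = {"0001100": "AU-A", "0001200": "AU-A", "0001300": "AU-B"}


def count_by_area_unit(meshblocks):
    """Aggregate person records, already rebased to current meshblock codes,
    to area units with one fixed lookup so every year uses the same boundaries."""
    return Counter(MESHBLOCK_TO_AREA_UNIT[mb] for mb in meshblocks)


counts_1981 = count_by_area_unit(["0001100", "0001300", "0001300"])
counts_2001 = count_by_area_unit(["0001100", "0001200", "0001300"])
```

Using a single lookup for all years is what holds the area unit boundaries constant; a year-specific lookup would reintroduce boundary changes into the comparison.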
As outlined in table 2.2, the information generated from census data may vary from census to census for a number of reasons. In order to accurately monitor and establish empirical relationships, researchers need to establish that any effect is a real effect, rather than one that has been caused by changes in the process of extracting and measuring the information provided. To do so, it is necessary to examine a variety of Statistics New Zealand publications in order to get a thorough understanding of census data and the changes that have taken place across the different census years.
| Sources of intercensal variation |
| 1) Removal or inclusion of the actual census question |
| 2) Changes in the subject population for a question |
| 3) Differences in the wording of the census questions asked |
| 4) Changes in the layout of the census form |
| 5) Changes in the format of the census question, e.g. single or multiple response, tick box or written response format |
| 6) Differences in the guide note instructions that accompany the census question, although the impact of this is unknown as the number of respondents who read the guide notes is undetermined |
| 7) Differences in the response options used in the census question |
| 8) Changes in the way the data are collected. These changes are reasonably infrequent but do occur. The two major recent changes were in 1996, when the dwelling type variable was ascertained from enumerators rather than respondents, and in 2006, when it became possible to complete census forms online. |
| 9) Changes in the classifications and definitions for a variable, which describe variable construction |
| 10) Changes in the instructions given to enumerators, such as which dwellings to give forms to, and enumerator doorstop checks. For example, in 1996, enumerators were required to check the whole form for completeness, whereas in 2001 they were only required to check the front page of the individual form. |
| 11) Changes in processing practices, e.g. scanning, recognition and operator instructions |
| 12) Changes in the way a particular variable is edited |
| 13) Changes in the general editing practices from census to census |
| 14) Changes in the variables for which responses are imputed, and changes in the way variables are imputed |
| 15) Changes in the name of a variable |
| 16) Changes in the number of variables constructed from responses to a census question |
| 17) Changes in the way a variable has been derived |
| 18) Changes in the classification of a variable |
In New Zealand, very few researchers have looked at intercensal variation. In 1991, Philip Morrison wrote an article entitled Change or Continuity in the Census: Problems of comparability in the New Zealand Census (Morrison 1991). This article provides an overview of the changes in census content and format on the personal form between the 1951 and 1991 Censuses, and a detailed discussion of changes to census questions dealing with employment and work. A more up-to-date source for an overview of changes in census topics is the Historical Summary of the Scope of the Census (Statistics New Zealand 2001). This provides a basic overview of the census topics covered on both dwelling and individual forms, from the inception of the census up to (and including) the 2001 Census.
When using census data for time series analysis, all sources of intercensal variation need to be considered, and, where possible, evaluated as to their likely impact on the data (establishing the time series comparability of variables is a key aspect of this report). A good method for assessing the impact of intercensal variation is to use the following steps:
If there has been a change in instructions given to the respondent, it is necessary to note where this change occurred (i.e. in the guide notes or on the census form itself). Statistics New Zealand acknowledges that instructions in the guide notes are often not read and therefore not followed (Department of Statistics 1991). This report discusses all census instructions (including guide note instructions) as if they are followed by the respondent. However, readers should remember that guide note instructions appear to be followed less often than instructions on the census form itself. Therefore, when evaluating the impact of changes on intercensal comparability, we assume that changes in the guide notes probably had less impact on the data than changes to the questionnaire.
Estimate to what degree census data will be affected as a result of the variation. The impact of intercensal variation may, in many instances, be difficult to assess and quantify. While every attempt is made to minimise errors due to systems and processes, as with any survey it is not possible to know or eliminate all non-sampling error. That said, the impact of changes is often relatively minor.
For the sake of simplicity, it is best to use a binary scale, and assess the impact as either major or minor. The impact may depend on exactly what the source of the variation was. For example, a change in the availability of a variable, the underlying concept being measured, or the variable derivation will generally be assessed as major, whereas a change in the editing process or instructions given to respondents would generally be seen as minor.
Identify the likely outcome of the variation on the actual data collected. Make judgements as to whether data for a particular year will be overestimated or underestimated relative to other census years. If there is a best practice method, take this into account.
Sometimes when a variable is not comparable across different census years there are ways in which the comparability of variable information can be increased.
The final comparability assessment of key variables across the 1981–2006 Censuses can be made in accordance with the criteria listed in the comparability assessment method (outlined in table 2.3). This method takes into account the findings from the steps above and applies the variable comparability scale outlined in table 1.2.
| Magnitude and number of variations | Manipulation available | Comparability assessment |
| Major | None | Limited comparability |
| Major | Available | Broadly comparable |
| Minor – many | None | Broadly comparable |
| Minor – many | Available | Highly comparable |
| Minor – few | None | Broadly comparable |
| Minor – one or two | None | Highly comparable |
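Because the comparability assessment table is a fixed lookup, it can be encoded directly, which also makes its gaps explicit: it gives no assessment for ‘minor – few’ or ‘minor – one or two’ variations where a manipulation is available. The sketch below does not guess those cases and raises an error instead. The ASCII key labels are a representation choice for this example.

```python
# Table 2.3 encoded as a lookup keyed by
# (magnitude and number of variations, manipulation available?).
ASSESSMENT = {
    ("major", False): "Limited comparability",
    ("major", True): "Broadly comparable",
    ("minor - many", False): "Broadly comparable",
    ("minor - many", True): "Highly comparable",
    ("minor - few", False): "Broadly comparable",
    ("minor - one or two", False): "Highly comparable",
}


def assess_comparability(magnitude, manipulation_available):
    """Return the table 2.3 assessment, or raise ValueError for a
    combination the table does not list."""
    key = (magnitude.strip().lower(), bool(manipulation_available))
    if key not in ASSESSMENT:
        raise ValueError(f"combination not covered by table 2.3: {key}")
    return ASSESSMENT[key]


print(assess_comparability("Major", False))  # Limited comparability
```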
When a variable is assessed as being either broadly comparable or of limited comparability, time series analysis of this variable will be affected. As Morrison has pointed out, these changes can sometimes be rectified during analysis of the data (Morrison 1991). This means that, depending on the type of intercensal variation, there may be methods that can be used to make the data more comparable.
When a variable is missing for a particular census year, it may be possible to extract comparable information from a variable of another name. An example of this is tenure of household information, which can be gained from the nature of occupancy variable in earlier censuses.
If multiple variables have been constructed from responses to a question for one census year, but not others, then comparability may be increased by comparing multiple variables from one census year with one variable from another census year. For the 1981 post-school qualifications data, four variables need to be accessed in order to create categories that are comparable with other census years.
If the way in which a variable is derived has changed, this can sometimes be rectified by accessing the variables used to derive it, then re-deriving it according to a consistent method, usually the method used most recently. For example, the highest qualification variable was derived differently in 1996 than in 2001. In 1996, respondents who left either of the two component questions (on school and post-school qualifications) unanswered were put in the ‘not stated’ category. In 2001, if a response was given to either of these two questions, respondents were allocated that value as their highest qualification. This change in derivation method led to an apparent decrease in the non-response rate of the highest qualification variable. To make information comparable over time, researchers can access the two component variables and apply a consistent derivation process to all the censuses being investigated.
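A consistent re-derivation applies one rule to the component variables for every year. The sketch contrasts the two rules described above and shows how, on the same answers, the 2001-style rule produces fewer ‘not stated’ values. Assumptions: the category labels and their ranking are hypothetical, and taking the higher of two answered components is assumed rather than stated in the text.

```python
# Hypothetical ordinal ranking of component answers (invented for
# illustration; real categories are listed in the data dictionaries).
RANK = {"none": 0, "school": 1, "post-school": 2}
NOT_STATED = "not stated"


def highest_qual_1996_style(school, post_school):
    """1996-style rule: a missing answer to either component -> not stated."""
    if school is None or post_school is None:
        return NOT_STATED
    return max(school, post_school, key=RANK.__getitem__)


def highest_qual_2001_style(school, post_school):
    """2001-style rule: use whichever component(s) were answered."""
    answered = [q for q in (school, post_school) if q is not None]
    if not answered:
        return NOT_STATED
    return max(answered, key=RANK.__getitem__)


def not_stated_rate(derive, respondents):
    """Percentage of respondents the given rule places in 'not stated'."""
    values = [derive(s, p) for s, p in respondents]
    return 100 * values.count(NOT_STATED) / len(values)


# (school answer, post-school answer); None means the question was left blank.
respondents = [("school", None), ("none", "post-school"), (None, None), ("school", "none")]
print(not_stated_rate(highest_qual_1996_style, respondents))  # 50.0
print(not_stated_rate(highest_qual_2001_style, respondents))  # 25.0
```

Applying highest_qual_2001_style to the 1996 component variables as well puts both years on the same basis.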
On occasion, Statistics New Zealand has re-derived a variable according to subsequent classifications in order to make time series information more comparable. One example is the labour force status variable for 1981, whose definitions of part-time and full-time work differed from those used in subsequent censuses. In 1991, a labour force status variable for the 1981 dataset was re-derived according to the later definitions; this variable, labour_force_status91, is available from the 1991 rebased dataset. Another 1981 variable, highest_level_educ_attend, is also available on this rebased dataset.
When there is a change in the definition of a variable, it may be possible to make information more comparable by excluding particular classification categories from the analysis of previous census years. For example, for the 2001 Census, households were defined to exclude visitor-only dwellings. As this was previously a distinct category of the household composition classification, information can be made comparable by excluding this category from the analysis when using household-level information from census years before 2001. When trying to compare information across time, it is essential to devise comparable classification categories for the concept involved, rather than the exact output names attributed to categories for each year. For example, the school qualification variable can be classified according to year of schooling, rather than the exact names of the qualification gained for each year (which, like benefit income source categories, are subject to change).
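The exclusion step amounts to filtering one category out of pre-2001 household tables before comparing them with 2001 and later data. The category labels and counts below are hypothetical.

```python
# Hypothetical label for the pre-2001 visitor-only category.
VISITOR_ONLY = "Visitor-only household"


def comparable_household_counts(counts_by_category, census_year):
    """Drop the visitor-only category from pre-2001 tables so they match
    the 2001 definition of a household, which excludes visitor-only dwellings."""
    if census_year < 2001:
        return {cat: n for cat, n in counts_by_category.items() if cat != VISITOR_ONLY}
    return dict(counts_by_category)


# Illustrative counts only.
counts_1996 = {"One-family household": 120, "Other multi-person household": 30, VISITOR_ONLY: 5}
print(comparable_household_counts(counts_1996, 1996))
```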
Similarly, if a variable contains information on a variety of aspects related to the topic, it is possible that some types of information are comparable, while others are not. For example, although the post-school qualification variables do not provide comparable time series data on field of study, they can provide broadly comparable information on level of attainment.
When the number of classification categories for a particular variable changes for different census years, some categories may need to be aggregated to ascertain comparable time series information. This is illustrated in table 1 of appendix 7.3, which shows that in 1991, two classification categories need to be aggregated to ascertain the number of unemployed from the labour force status variable, whereas in 1996 and 2001, comparable information comes from just one classification category.
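Aggregating categories can be expressed as a concordance from each year's category labels to a common set of categories. The labels below are hypothetical placeholders (the actual 1991 categories are given in the data dictionary and appendix 7.3), and the counts are illustrative only.

```python
from collections import Counter

# Hypothetical concordance from each year's category labels to common
# categories. Labels are invented placeholders, not the real 1991 names.
CONCORDANCE = {
    1991: {
        "Unemployed (category 1)": "Unemployed",
        "Unemployed (category 2)": "Unemployed",
        "Employed": "Employed",
        "Not in labour force": "Not in labour force",
    },
    1996: {
        "Unemployed": "Unemployed",
        "Employed": "Employed",
        "Not in labour force": "Not in labour force",
    },
}


def harmonise(counts, census_year):
    """Re-aggregate one year's category counts into the common categories.
    A category missing from the concordance raises KeyError, flagging a gap."""
    mapping = CONCORDANCE[census_year]
    totals = Counter()
    for category, n in counts.items():
        totals[mapping[category]] += n
    return dict(totals)


print(harmonise({"Unemployed (category 1)": 3, "Unemployed (category 2)": 2, "Employed": 10}, 1991))
```

Letting unmapped categories fail loudly, rather than silently dropping them, helps catch classification changes that the concordance has not yet accounted for.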
At times, the definition of a variable may not accurately reflect the information that the data contain. For example, the 1981 and 1996 definitions of cigarette smoking both include cigarettes and roll-your-owns and exclude pipes and cigars, so they appear to be comparable. However, examining the census questionnaire forms and guide notes shows that cigarettes were never specifically defined on the census form or in the guide notes in 1981, so people who smoked cigars may have counted themselves as smokers. In addition, although both definitions include roll-your-owns, neither the 1981 nor the 1996 question tells respondents to include them, so some people who smoked roll-your-owns may not have counted themselves as smokers. The data obtained may therefore not exactly fit the definition associated with them. The effect of these wording differences may be small, but it is difficult to quantify exactly, and no action can be taken to make the information more comparable.
Note: Variable definitions of cigarette smoking from the Statistics New Zealand concepts, classifications and definitions documents:
1981 ‘A regular smoker was defined as a person who currently smokes one or more cigarettes per day, including roll-your-own, but excluding pipe or cigar smokers’.
1996 ‘Cigarette smoking refers to the active smoking of any tobacco products including manufactured and hand-rolled cigarettes (excluding cigars, pipe tobacco and cigarillos). It does not include the smoking of any other substances, for example herbal cigarettes or marijuana, but does include the smoking of home grown tobacco’.
The main difference between these definitions is that the 1996 definition excludes the smoking of marijuana and passive smoking (although people in either group might not have counted themselves as cigarette smokers anyway).
Other changes, such as those in wording, subject population, guide note instructions, questionnaire format, and data collection and processing, are irreversible (Morrison 1991). For example, before 1981 the subject population for most census questions was the ‘de facto’ population, which included overseas visitors and temporary residents. From the 1981 Census onwards, two population bases have been distinguished: the ‘de jure’ population, or census night usually resident population count, which excludes overseas visitors; and the ‘de facto’ population, which is everyone in New Zealand on census night.
Tables 2.4, 2.5, 2.6 and 2.7 provide a list of the questions asked in the 1981–2006 Censuses that relate to the variables examined in this report. In order to ascertain comparability, these are grouped according to the type of information they seek to extract, rather than according to the exact wording of the question. Grouping census questions according to exact wording would result in a large number of questions that were only asked in one census.
| Census year | ||||||
| Census question | 1981 | 1986 | 1991 | 1996 | 2001 | 2006 |
| Name | q1 | + | + | q1 | q2 | q2 |
| Sex | q2 | q4 | q4 | q6 | q3 | q3 |
| Date of birth | q3 & 4 | q5 | q5 | q7 | q4 | q4 |
| Census night address | q6 | + | + | q5 | q8 | q8 |
| Usual residential address | q7 | q1 | q1 | q2 | q5 | q5 |
| Usual residential address at previous census / five years ago | q9 | q3 | q3 | q4 | q7 | q7 |
| Years at usual residence | q2 | q2 | q3 | q6 | q6 | |
| Country of birth | q10 | q7 | q10 | q8 | q9 | q9 |
| Number of years in New Zealand* | q10 | q8 | q9 | q10 | q10 | |
| Religion | q11 | q10 | q12 | q15 | q18 | q18 |
| Ethnic origin/group | q12 | q9 | q7 | q10 | q11 | q11 |
| Māori ancestry/Māori descent | q8 | q13 | q16 | q14 | ||
| Iwi | q9 | q14 | q17 | q15 | ||
| Marital status (legal) | q14 | q12 | q13 | q16, q17, q18 & q19 | q21 | q23 |
| De facto status | q14 | q11 | q11 | |||
| Social marital status | q16 | q19 | q19 | |||
| Number of children born | q15 | q29 | q25 | |||
| Ability to converse in certain languages | q12 | q13 | q13 | |||
| Highest secondary qualification | q26 | |||||
| Highest post-school qualification | q27 & q28 | |||||
| Unpaid activities | q46 |
* In 1996, this question was changed to the month and year that the person first arrived to live in New Zealand, and the number of years in New Zealand was derived from this.
+ Unnumbered questions asked at the beginning of the personal or dwelling questionnaire forms.
| Census year | ||||||
| Census question | 1981 | 1986 | 1991 | 1996 | 2001 | 2006 |
| Availability for work | q24 | q53 | q40 | q45 | ||
| Hours worked | q16 | q22 | q26 | q48 | q35 | q40 |
| Industry | q19, q20 & q21 | q24, q25 & q26 | q28, q29 & q30 | q45 & q46 | q32 & q33 | q37 & q38 |
| Job search methods | q23 | q52 | q39 | q44 | ||
| Main means of travel to work | q22 | q27 | q31 | q49 | q36 | q41 |
| Occupation | q18 | q23 | q27 | q43 & q44 | q30 & q31 | q35 & q36 |
| Seeking work | q19 | q22 | q51 | q38 | q43 | |
| Sources of personal income | q23 | q13 | q14 & q21 | q35 | q25 | q30 |
| Status in employment | q17 | q21 | q25 | q42 | q29 | q34 |
| Total personal income | q24 & q25 | q14 | q15 | q36 | q26 | q31 |
| Census year | ||||||
| Census question | 1981 | 1986 | 1991 | 1996 | 2001 | 2006 |
| Number of occupants in the dwelling on census night | q3 | q1 | q1 | q2 | q2 | |
| Persons absent on census night | q18 | q9 | q8 | q19 | q20 | |
| Household composition | q2, q3, q4, q5, q7 & q14 | q1, q4, q5, q6, q11 & q12 | q1, q4, q5, q6, q11 & q13 | q2, q3 DF, q6, q7, q16, q17, q18, q20, q21, q22 & q23 | q4 DF, q3, q4, q5, q19 & q21 | q6 DF, q21 DF, q3, q4, q5, q19 & q23 |
| Household composition with child dependency status (uses the household composition variable already derived, and age and labour force status) | q4, q16 & q17 | q5, q16, q19, q20 & q22 | q5, q21, q22, q23, q24 & q26 | q7, q40, q48, q51, q52 & q53 | q4, q27, q35, q38, q39 & q40 | q4, q32, q40, q43, q44 & q45 |
| Living arrangements (including de facto status) | q11 | q11 | q16, q20, q21, q22 & q23 | q19 | q19 | |
| Relationship to reference person* | q5 | q6 | q6 | q3 DF | q4 DF | q6 DF |
| Census year | ||||||
| Census question | 1981 | 1986 | 1991 | 1996 | 2001 | 2006 |
| Access to Telecommunications | q15 | q16 | q16 | q17 | ||
| Dwelling Type | q4 | q2 | q2 | q5 | q4 & q5 | |
| Heating Fuels Used | q8 | q6 | q6 | q15 | q15 | q16 |
| Mortgage Payments | q9 | q4 | q4 | q9 | q8 | q13 |
| Motor Vehicles | q17 | q8 | q7 | q10 | q17 | q18 |
| Number of Bedrooms | q13 | q3 | q11 | q13 | q14 | |
| Number of Heating Fuels | q8 | q6 | q6 | q15 | q15 | q16 |
| Sector of Landlord | q10 | q5 | q5 | q5 | q10 | q10 |
| Tenure of Household | q9 & q10 | q4 & q5 | q4 & q5 | q4, q7, q8 & q9 | q8, q9, q11 & q12 | q7, q8, q9, q11, q12 & q13 |
| Weekly Rent Paid by Household | q10 | q5 | q5 | q8 | q12 | q12 |
Although all reasonable steps have been taken to ensure that web addresses in this report are up-to-date and accurate, they are subject to change, and at the time of writing Statistics New Zealand was in the process of structural change. All Statistics New Zealand links should be available through the Statistics New Zealand website at www.stats.govt.nz.
Statistics New Zealand has a large variety of metadata surrounding the creation, definition, interpretation and comparability of census variables, especially for recent census years. Metadata is data about data: it is used to gain an understanding of data and to ascertain the most appropriate ways to use it (Statistics New Zealand 2004). However, much of the Statistics New Zealand metadata is spread across many different documents and contained in publications specific to the census year it covers. This metadata is also presented in a variety of formats, with little longitudinal analysis of it. This report intends to contribute towards a longitudinal understanding of variables, using the metadata available from Statistics New Zealand.
Metadata for recent censuses is generally available electronically. For less recent censuses, publications can often be accessed through public or university libraries. Statistics New Zealand has its own library, and if certain publications cannot be obtained elsewhere, it is possible to request a copy of the required documentation from Statistics New Zealand – there may be a fee for this service. In order to progress this project and contribute towards ease of use for future researchers, we have compiled a list of some of the resources available surrounding the census, and where these resources can be accessed.
All New Zealand census forms from 1906 onwards (both dwelling and individual) can be found in the 2006 Statistics New Zealand publication, Definitions and Questionnaires. This is available in hard copy or on the Statistics New Zealand website at www.stats.govt.nz/census/2006-census-information-about-data/2006-definitions-questionnaires/default.htm. The census forms referred to in this report can also be found at this link - click on 'Forms' near the top of the page.
Census guide notes accompany the census forms that are delivered to every dwelling. They provide extra information for respondents on how to fill out the questionnaire. There are guide notes for both individual and dwelling forms. In 2001, the guide notes were called help notes.
| Year | Resource |
| 1981 | Refer to the back of hard copy publications from the 1981 Census – Volume 12 Population Perspectives 81: General Report (page 162 for the individual form guide notes, page 169 for the dwelling form guide notes). |
| 1986 | Refer to 1986 Census of Population and Dwellings: Questionnaire Contents and Submissions Report, Department of Statistics (1985), Wellington. |
| 1991 | Refer to the back of hard copy publications from the 1991 Census, for example, Range and Availability of Statistics (page 110 for the individual form guide notes, page 116 for the dwelling form guide notes) and National Summary (page 55 for the individual form guide notes, page 61 for the dwelling form guide notes). |
| 1996 | Individual form help notes; dwelling form help notes |
| 2001 | Individual and dwelling forms help notes |
| 2006 | The individual and dwelling forms help notes are at the end of the census questionnaires |
In the census variable definitions and classifications sections of this report, the text quoted for definitional purposes is sourced from the 2001 Census definitions. This is then compared to previous definitions in order to highlight similarities and differences. Table 2.10 indicates where to access Statistics New Zealand information on variable definitions.
| Year | Resource |
| 1981 | New Zealand Census of Population and Dwellings 1981 – Range and Availability of Statistics (see pages 9–15). |
| 1986 | New Zealand Census of Population and Dwellings 1986 – General Information (refer to section 2, pages 23–74. Please note that these definitions also contain retrospective information for the 1981 Census). |
| 1991 | Concepts, Definitions and Classifications (entire document). |
| 1996 | An Introduction to the Census (refer to section 12). |
| 2001 | Definitions and Questionnaires available in hard copy or via the Statistics New Zealand website |
| 2006 | Information by variable |
Prior to the census, discussion documents are circulated, and end users and interested parties are consulted about the contents of the census. An interim report (preliminary views on content) is then published. The 2001 and 2006 preliminary views on content publications are available from the Statistics New Zealand website. The preliminary reports form the basis for broad discussion about the content of the upcoming census. The exact content of this report varies from census to census, but it always covers criteria for determining census content, a brief overview of the main topics covered in the census, and the submissions made that relate to each of these. The 2001 report also provides a brief history of some variables. The 2006 preliminary views on content contain two appendices that are particularly useful to external researchers: a survey information table (outlining other surveys conducted by various organisations, including type, frequency and available related products or services), and a series of additional data source tables.
The final report on content outlines Statistics New Zealand’s final decisions on content for the next census, that is, which topics and variables will be included, along with the rationale behind these decisions. Sometimes the final report on content provides useful information as to how and why a variable may have changed between censuses.
Data dictionaries list the variables generated from each census (though not necessarily exhaustively) and the coded classification categories for each variable. The 2006 data dictionary is available from the Statistics New Zealand website.
The 1996 data dictionary is also available in electronic form. The 1981, 1986 and 1991 data dictionaries are available in hard copy, and information on certain variables is also available electronically. Any request for data dictionaries should be made to Statistics New Zealand, customer services.
A range of Statistics New Zealand products and services, including Table Builder, can be found on the Statistics New Zealand website.
Some of the resources that may be of use to researchers include:
Statistics New Zealand tabular and analytic reports
Statistics New Zealand produces several reports that can broadly be described as either analytical or tabular. Tabular reports contain very little text and consist predominantly of tables. Prior to 2001, these were available in printed form only; for 2001 the tables are also accessible online. Analytical reports contain more description, discussion, graphical presentation and analysis of the data, and often incorporate information from other data sources. A series of analytical reports aimed at a wide general readership, called New Zealand Now, was produced following the 1991 and 1996 Censuses. The 1996 series is available in printed form, and many of its component reports are available on the Statistics New Zealand website under the description ‘New Zealand Stories’.
No analytical reports were produced directly from the 2001 Census, but a series of reference reports (also called topic-based reports) on various topics, for example ethnic groups and housing, is available for download from the Statistics New Zealand website. These are predominantly tabular reports, but they also contain some pages of highlights. A similar set of reports is available for the 1996 Census.
Table Builder
Table Builder enables the user to access aggregated information in the form of tables and is available on the Statistics New Zealand website. It is a product for building tables not only for population census data, but also for income, injury, agriculture, business and import/export statistics. Tables are interactively built by the user from a selection of variables. For the population census, these include census year (1991, 1996 and 2001), geographic area (regional council, territorial authority and area unit) and a range of output census variables. The tables can be downloaded from the website in several different formats (e.g. Excel). Help notes on how to use Table Builder are also available.
Access to unit record data through the data lab facility
Access to anonymised unit record statistical data is currently managed through Statistics New Zealand’s data laboratory (data lab). There is a data lab in each of Statistics New Zealand’s Auckland, Wellington and Christchurch offices. Access to unit record data may be obtained by submitting a proposal outlining details of the proposed research. Applicants need to provide specific information on the researchers’ backgrounds, the dataset(s) and variables required, methods of analysis and intended outputs. The proposal is then considered by Statistics New Zealand and access is provided at the discretion of the Government Statistician. If the proposal is approved, costs are estimated and conditions of access are negotiated. All researchers are required to sign a declaration of secrecy as specified in the Statistics Act 1975.
2001 Census output information
Information pertaining to 2001 Census outputs.
The 2001 Census snapshots are particularly useful for a quick, broad overview of factual information on particular topics. Each snapshot covers a topic from the 2001 Census and gives an overview of the relevant indicators and variables in context, allowing the user to explore associations, trends and patterns across variables. The snapshots are available as separate downloadable PDF files for each topic.
Statistics New Zealand continues to develop new initiatives and services for external researchers and other users of its data. The three recent schemes presented here provide useful ways for researchers to access Statistics New Zealand data and expertise. The information in this section of the report comes from a newsletter sent to the expert data users group, which external researchers and interested parties may join.
To subscribe to this newsletter, send an email to listserv@stats.govt.nz with ‘subscribe expert user’ in the subject line.
The Official Statistics Research and Data Archive Centre (OSRDAC)
OSRDAC will provide a single access point for all Tier 1 unit record data and administrative data for use by government, university and other researchers. Preliminary work has begun on the design of this facility and the process for lodging and processing unit record data.
Note: Tier 1 statistics will be determined primarily by their purpose, not their producer. These statistics will have most of the following attributes: they are essential to central government decision making; they are of high public interest; they meet public expectations of impartiality and statistical quality; they require long-term continuity of the data; and they provide international comparability or meet international statistical obligations.
Official Statistics Portal
Users will be able to access a full list of the statistics produced by government agencies through an official statistics portal, currently being designed.
Confidentialised Unit Record Files (CURFs)
CURFs are datasets that contain individual-level data arranged in a way that does not reveal any individual’s identity (Statistics New Zealand). This enables researchers outside Statistics New Zealand to access individual-level data for research purposes. The data provided differ from the unit record data accessible in the data lab: some modifications will be made to the data, and there are likely to be restrictions on the level at which data are available (Statistics New Zealand). The dataset provided is therefore ‘perturbed’ slightly from the real data gained from the census, in order to ensure confidentiality. However, unlike with current census datasets, researchers will be able to analyse the data at their own workplace rather than at the Statistics New Zealand data lab. To be authorised to access CURFs, researchers must comply with the ethical and security obligations set out by Statistics New Zealand.
Other research tools that provide useful information about census variables are:
Statistical standards – These documents contain guidelines on how to collect and categorise information on a particular topic. They cover aspects such as questionnaire requirements, definitions and classifications. Statistical standards are designed for use in various data collections, including surveys and administrative collections. These standards are guidelines only, and a lack of data or other complexities associated with the census may mean they are not strictly followed in constructing census variables (or other datasets). The purpose of these standards is to promote consistency in the way variables are collected and classified across surveys and over time. Such consistency enhances comparability, enriching the body of data available for analysis. Statistical standards are available online.
Summary profiles – Another rich source of variable information is the Information about the Census of Population and Dwellings publication for each census year. This information is particularly useful for the 1981 and 1986 Censuses, as documentation for these earlier census years is scarce and generally difficult to access. The documents for 1981 and 1986 contain information such as lists of the output variables available, descriptions of what those output variables entail, and in some instances whether variables are derived and, if so, how. They also reference the census questions from which variables are constructed and, in some instances, reprint the questions and/or classification categories. Summary profiles are available online.
Census classifications for 1996 – This resource consists of a set of documents that provide the classifications and standards used in the 1996 Census of Population and Dwellings. It features a mix of introduction, structure, definition and code descriptor sections. These documents can be accessed online.
Variable glossary definitions for the 2001 Census – These are a rich source of information about the main variables used in the 2001 Census. They contain a definition of each variable, the question number and the questionnaire form on which the question was asked, and the relevant subject population. They also comment on non-response rates to census questions (in 1996 and 2001), the quality level of variables, their comparability with previous censuses, and things to be aware of when using the variables. These documents also note whether variables have been derived and, if so, from what. This list is not exhaustive for all variables and all years, but it provides a good starting point for thinking about consistency and comparability. Note also that comparability of the 2001 variables is discussed with reference to the 1996 and 1991 variables only, not to variables constructed from any earlier censuses. These documents can be accessed online.