NZSSDS has been used in teaching at The University of Auckland, and we promote its use for such purposes in general. One postgraduate course in statistics has made extensive use of survey metadata and our teaching data set for the New Zealand Quality of Healthcare Study, while in sociology, postgraduates have been introduced to analysing social survey data in SPSS, using International Social Survey Programme surveys from NZSSDS. For more generic teaching/self-learning modules, we have developed workbooks to support some of our more popular data holdings in introducing basic statistical analysis techniques in SPSS. These are available upon request, and accompany reduced teaching data sets that are freely downloadable from NZSSDS.
The aim of this workbook is to introduce users to survey research analysis using existing survey data. The New Zealand Social Science Data Service administers an online archive of past surveys in the social sciences in New Zealand. Its holdings include a number of surveys from the New Zealand Election Study, several in the area of health and primary care, and a number from the International Social Survey Programme (ISSP) series. These are run in 45 countries including New Zealand, and cover a range of social science topics, many of which are repeated every five years or so, enabling analysis of changes over time.
The example used in this workbook is the Family and Changing Gender Roles Survey from ISSP. This was last carried out in 2002, with the 1994 data set also available for comparison of any changes over that period. Findings can also be analysed by demographic variables such as age and gender.
The survey was carried out by the Department of Marketing, Massey University between August and November 2002. (Gendall, P. 2003: The Roles of Men and Women, International Social Survey Programme, Massey University, Palmerston North.) A nationwide mail survey was conducted of 2,075 people aged 18 and over, randomly selected from the New Zealand electoral roll. Following three reminders, the survey produced 1,025 valid responses after 364 of the initial sample were rejected as ineligible – for example, person no longer living at address. This was an effective response rate of 1,025 out of 1,711, or 60%. (Full details at http://webview.nzssds.org.nz/NZSSDSData/notes/NZSSDS00052%20Study%20Description.pdf.) A sample of this size has a maximum error margin at the 95% confidence level of approximately plus or minus 3%, i.e. for 95 out of 100 samples, the result will be within +/- 3% of the true population value. (For more information, see De Vaus, D. 2002: Surveys in Social Research, 5th ed., Allen & Unwin, p.232.)
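The response rate and error margin quoted above can be checked with a few lines of arithmetic. The sketch below uses the figures from this paragraph; the margin-of-error formula assumes a simple random sample and the worst case of a 50/50 split on a question, which is the usual basis for a "maximum" margin, but the source does not spell out its own method.

```python
import math

# Figures quoted in the paragraph above.
initial_sample = 2075
ineligible = 364
valid_responses = 1025

eligible = initial_sample - ineligible        # people who could have responded
response_rate = valid_responses / eligible    # effective response rate

# Maximum margin of error for a percentage at the 95% confidence level.
# Assumptions: simple random sampling, worst case p = 0.5, z = 1.96.
z = 1.96
p = 0.5
margin = z * math.sqrt(p * (1 - p) / valid_responses)

print(f"Eligible sample: {eligible}")
print(f"Response rate: {response_rate:.1%}")
print(f"Maximum margin of error: +/- {margin:.1%}")
```

The margin shrinks only with the square root of the sample size, so halving it would need roughly four times as many respondents.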
While the full survey data set is available in the data archive, a smaller, more manageable subset of questions has been created for use with this workbook, and for teaching in general – refer to the instructions on page 5 for accessing this data set. For those with a particular interest in this topic, an earlier survey on the same topic was run in 1994 and is held in the data archive – this could be accessed and changes in attitudes to gender roles examined over time. This topic will be the focus of ISSP again in 2012, if you are thinking of future research in this area. Analysis in this workbook is done using the SPSS statistical analysis package. (Statistical Package for the Social Sciences, Version 17 for Windows. Some versions may be known as Predictive Analytics SoftWare (PASW), and in future it will be IBM SPSS Statistics 18 (and beyond).) (Recommended supplementary text: Acton, C. & Miller, R., 2009, SPSS Statistics for Social Scientists, 2nd ed., Palgrave Macmillan.)
There are a number of similar programmes for survey data analysis. SPSS was chosen for the purposes of this workbook as it provides a good combination of being relatively user-friendly as well as being used in real world research environments. SPSS enables us:
Other options, such as SAS, are less commonly taught in the social sciences, but are more likely to be used in large research organisations. However, once the basics of survey data processing and analysis have been learned in SPSS, it will be easier to transfer those to another system. Some others, such as Minitab, are used for teaching purposes but are not found in real world research settings.
Once you have completed the exercises in this workbook you should similarly be able to analyse any of the data sets in the NZSSDS data archive that interest you. Survey topics on which data are currently available include: attitudes to the role of government, politics, social inequality, work orientations, religion, national identity, citizenship, the environment, leisure time and sports, and health.
You can access such data sets at http://webview.nzssds.org.nz/webview/. There you can navigate the tree interface to the left by clicking on the icons, and access survey metadata, i.e. background information, how the sample was selected, and the questionnaire(s) that the survey involved. The list of variables available can also be viewed, and frequency tables for each of them. The online interface also allows for further analysis, and for data sets to be downloaded.
Where teaching data sets have been created, these are freely available for download; for other data sets, you generally need to apply for further access.

The first step in doing survey research is to decide what it is you want to know. What questions do you want to ask of this data set? To decide, you need to become familiar with the scope of the information contained in the data set. There are two ways to do this.
Looking at the questionnaire for the 2002 Family and Changing Gender Roles Survey, you will see that there are five groups of questions, plus a final section of demographic variables such as date of birth, sex, marital and family status, and ethnic origin. (Appendix 2 includes the reduced version of the questionnaire for the teaching data set. Full questionnaires for all data sets are available in the data archive.)
The first group of questions, Q1a to Q1f, is about women’s roles as parents, paid workers and housewives. The second group, Q2a to Q2d, is about men’s roles in similar areas.
The next group are about how much time women should be spending in paid work, depending on the age of their children.
Section four is another set of questions around attitudes to women with children doing paid work, and section five a group of questions about men’s roles as fathers.
In the real world, you would be analysing every question in the survey. For the purposes of the assignment associated with this workbook you will need to be more selective. For example, you might choose to focus on attitudes to men’s roles, on whether New Zealanders think women are more fulfilled as mothers or as paid workers, or on what impact we think women doing paid work has on children and family life.
The examples used in the workbook may be selected from any of these areas and are chosen to illustrate specific analytical points.
The second stage in developing research questions develops out of examining the findings for each question – we call these the univariate or single variable findings. So you may be guided initially by your interest in particular sections of the questionnaire, but once you see the findings on each question your focus may shift. For example, you may find an interesting or unexpected pattern of agreement or disagreement with some questions.
At this point you may also start raising further questions, such as “I wonder whether men and women think differently about this?” or “I wonder if attitudes to this are changing over time – do younger people think differently about this to older people?” or “How did this finding change between the 1994 survey and the 2002 survey?”
Data analysis is about describing and interpreting what your data are telling you. This involves reducing the data from unmanageable details to manageable summaries, which is done by:
Accessing the metadata online
Visit http://webview.nzssds.org.nz/webview
Click on the ⊕ icon next to RESEARCH METHODS TEACHING MATERIALS
Click on the ⊕ icon next to ISSP, New Zealand, 2002: Family and Changing Gender Roles III
Click on the word Metadata to read the full extent of background information
Scroll down to access questionnaires and study description files for further details.
Downloading the teaching data set:
After the 2nd step above you can click on the floppy disc icon on the toolbar, towards the top-right
Click the drop box to select the file format and click SPSS; then click Download
Log in to download the data set
The data set will download as a zip file. You should be able to open the data set from within that – the file nz.org.nzssds.ddi.00052-t_F1.sav.
Once inside SPSS, there are two windows available:


As an example, in variable view, click on q1b – this is the question: A pre-school child is likely to suffer if his or her mother works – if you click the “...” button in the values column of this row, you can see that the possible responses range from ‘1’ for strongly agree to ‘5’ for strongly disagree.
Now click on data view – q1b is in the 3rd column – each row has the numerically coded response for each respondent to that question, e.g. the first person answered “3” or neither agree nor disagree, the second person answered “2” = agree, the third person “5” = strongly disagree, and so on. You can confirm this by clicking on the View menu and selecting Value Labels.
But you want to know how many people agreed or disagreed with this statement. To find out, you need to produce a table of counts or univariate ‘frequency table’ in SPSS.
Producing frequency tables in SPSS
Click on the Analyze menu 
Select Descriptive Statistics
Click on Frequencies...
Choose the variable(s) you want summarised 
Click variables individually to highlight then click the arrow to add them to the Variable box
OR to select a range of variables, click on the first one, hold down the shift key, scroll down to the last one you want, click to highlight the whole range of variables, then click the arrow to move the whole lot into the working list in the variable box.
For example, click on working mom: preschool child suffers, hold down the shift key and scroll down to work is best for women’s independence.
Click OK
Note: You can resize this frequencies dialog and most other windows in SPSS by clicking and dragging from one of the corners of the window. This can be very useful for seeing full variable labels.
Output
You should now have an Output window, displaying the frequency tables you have created. You can look through the output using the right-hand scrollbar or arrows. An example of one of these tables is shown below.
To print
Click File, and then Print… – scroll through using print preview first to check you are going to print what you want. Following are some useful tips for printing multiple pages of output from SPSS.
Click a table heading or Title in the left pane, then click the Insert menu, and click Page Break to ensure that the heading and table print on the same page.

If you are printing the output, click again in the white space below the title before you do so, or SPSS will interpret that you only want to print that one heading. To make sure of what you are printing, and how many pages it will be, click the File menu and then Print Preview. If there is output that is too wide, you may need to change the page orientation to landscape under File menu, Page Setup.
Analysis
Following is a frequency table for the responses to the question of whether people think a pre-school child is likely to suffer if his or her mother works.

Univariate table layout
1st column: the range of possible responses or values or categories for that question
2nd column: the frequency or number of people in each category
3rd column: the number of people in each category as a percentage of the total sample
4th column: the valid percent is the number of people in each category as a percentage of those who answered that question
5th column: cumulative percent, an ongoing addition of the valid percentages from each category.
The response options or possible values for the question are in the first column, ranging from strongly agree to strongly disagree, followed by an option for those who can’t choose, and a total for the number of people who answered this question.
This is followed by a “missing” category, for those who did not respond to this question, and a final total of the number of people who took part in the survey, or the full sample.
The second column presents the number of people in each category, called the “frequency”, and the third column presents these as a percentage of the total number of people in the survey sample. This includes the two people who did not answer this question.
The next column is headed “valid percent” and presents the percentage responses based only on the total number who answered this question – that is, minus the two people who did not answer. In this case the two totals are very similar, and thus the percentages in each column are very similar.
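The four numeric columns described above can be reproduced outside SPSS. The following is a minimal sketch in Python, not SPSS's own procedure: the ten coded responses are hypothetical (only the first three match the respondents mentioned earlier for q1b), `None` stands in for a missing answer, and the "can't choose" option is left out for brevity.

```python
from collections import Counter

# Value labels for a 5-point agreement item such as q1b.
labels = {
    1: "Strongly agree",
    2: "Agree",
    3: "Neither agree nor disagree",
    4: "Disagree",
    5: "Strongly disagree",
}

# Hypothetical coded responses for ten respondents; None = did not answer.
responses = [3, 2, 5, 2, 1, 4, 2, None, 3, 2]

total_n = len(responses)                           # full sample
valid = [r for r in responses if r is not None]    # those who answered
valid_n = len(valid)
freq = Counter(valid)

rows = []
cumulative = 0.0
for code in sorted(labels):
    n = freq.get(code, 0)
    percent = 100 * n / total_n          # base: total sample
    valid_percent = 100 * n / valid_n    # base: those who answered
    cumulative += valid_percent          # running sum of valid percent
    rows.append((labels[code], n, percent, valid_percent, cumulative))

for label, n, pct, vpct, cum in rows:
    print(f"{label:<28}{n:>4}{pct:>8.1f}{vpct:>8.1f}{cum:>8.1f}")

missing_n = total_n - valid_n
print(f"{'Missing':<28}{missing_n:>4}{100 * missing_n / total_n:>8.1f}")
```

The only difference between the Percent and Valid Percent columns is the denominator, which is why the two diverge as the number of missing answers grows.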
In the second example you can see that many more people did not answer this question – n = 38, or 3.7% of the total sample. There is now more of a difference between the percentages for the total sample, and the valid percent based on those who answered the question, but it is still not making a substantial difference.
It is always important to look at the missing data figures. If missing data substantially affect the percentage outcomes, this needs to be reported, and a decision needs to be made about which percentage to report. Are you interested in knowing just the breakdown among those who answered that question, or the breakdown for the whole sample, including the percentage who did not respond?
Sometimes there is a valid reason for a high missing figure – check the questionnaire; it may be that only those who answered the previous question a certain way were eligible to answer the next.
Generally the missing data is small and does not substantially affect the percentage findings, as in the two examples below, so it would not matter which column of percentages you use. But you must be clear in describing your findings whether you are reporting them for the total sample, or just those who answered each question. As a rule of thumb, work with the valid percent column (for more information see De Vaus 2002, p.212), noting where there is high non-response and any apparent explanation for it, such as being a contingency question, i.e. only those answering some previous question in a specified way are eligible. Otherwise it may be an indication that the survey population does not have clear views on this question.
Examples
The response options for most of the questions in this survey are 5-point scales from strongly agree to strongly disagree, plus a ‘can’t choose’ option; the first example shows this format.

Now you can start analysing this table. Questions to ask of the data:
First, look down the valid percent column to see how the responses are distributed across the range of response categories. Look for the highest figure, and the lowest figure. Look for patterns in the way the responses are spread. Are they skewed towards one end of the 5-point agreement scale, or evenly spread over all categories, or do they perhaps form a normal curve with most in the middle and few at either end?
The largest group here are in the ‘agree’ category (34.5%), but this is not much higher than the next largest group of 28.2% who disagree. There are then much smaller proportions in the strongly agree and strongly disagree categories.
Sometimes it is helpful to collapse categories to simplify into “for” and “against”. For example, adding the two agreement categories (43.9%) and the two disagreement categories (35.1%) shows that while there are more in agreement with the statement that a pre-school child is likely to suffer if his or her mother works, the difference is not great. Note that you need to refer to the questionnaire for the actual question wording when reporting findings as an abbreviated form is used in the SPSS printouts.
Collapsing categories is also useful when there are few responses in some categories.
Finally, 20% are neutral on this issue (and a further 1% can’t choose). To sum up, New Zealanders are fairly evenly spread in their attitudes on this issue; overall there is somewhat more agreement than disagreement that women’s paid work impacts on the wellbeing of their pre-school children – only just over a third disagreed.
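Collapsing categories amounts to mapping each original response to a broader group and summing the counts. The sketch below illustrates the idea in Python rather than SPSS; the counts are the valid frequencies from the 5-point agreement table given later in this workbook as an example to try, and the group names ("Agree", "Neutral", "Disagree") are labels chosen here for illustration.

```python
# Valid frequencies from the 5-point agreement table later in this workbook.
counts = {
    "Strongly agree": 84,
    "Agree": 457,
    "Neither agree nor disagree": 329,
    "Disagree": 121,
    "Strongly disagree": 15,
}

# Which collapsed group each original category belongs to.
collapse_map = {
    "Strongly agree": "Agree",
    "Agree": "Agree",
    "Neither agree nor disagree": "Neutral",
    "Disagree": "Disagree",
    "Strongly disagree": "Disagree",
}

collapsed = {}
for label, n in counts.items():
    group = collapse_map[label]
    collapsed[group] = collapsed.get(group, 0) + n

total = sum(collapsed.values())
for group, n in collapsed.items():
    print(f"{group:<10}{n:>6}{100 * n / total:>8.1f}%")
```

Percentages here are based on those who answered, so they correspond to the valid percent column.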
The second example, which explores views on how much time mothers of pre-schoolers should spend in paid work, has a different range of responses, from working full-time outside the home, or working part-time, to staying at home full-time.
Looking at the distribution of responses in the table below shows there is little support for mothers of preschool children working full-time (1.8%) and a majority (56.8% of those who answered the question) think that mothers should stay at home and not go out to work at all. However, almost a third (31%) think it is okay for mothers to work part-time when their children are under school age. There are still 10% who do not have a view on this topic, and another 3.7% of the total sample did not answer this question, but that is not enough to substantially affect the findings.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Work full-time | 18 | 1.8 | 1.8 | 1.8 |
| | Work part-time | 308 | 30.0 | 31.1 | 33.0 |
| | Stay at home | 562 | 54.7 | 56.8 | 89.8 |
| | Can't choose | 101 | 9.8 | 10.2 | 100.0 |
| | Total | 988 | 96.3 | 100.0 | |
| Missing | System | 38 | 3.7 | | |
| Total | | 1025 | 100.0 | | |
Here is an example for you to try:
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Strongly agree | 84 | 8.2 | 8.3 | 8.3 |
| | Agree | 457 | 44.6 | 45.4 | 53.8 |
| | Neither agree nor disagree | 329 | 32.1 | 32.7 | 86.5 |
| | Disagree | 121 | 11.8 | 12.0 | 98.5 |
| | Strongly disagree | 15 | 1.5 | 1.5 | 100.0 |
| | Total | 1006 | 98.1 | 100.0 | |
| Missing | Can't choose | 13 | 1.3 | | |
| | NA, refused | 6 | 0.6 | | |
| | Total | 19 | 1.9 | | |
| Total | | 1025 | 100.0 | | |
The original data file has a number of demographic variables, including marital and family status, religion, a range of socioeconomic status variables and ethnicity. The two included in the teaching data set are age and gender. The sample is overwhelmingly European (82%, n = 830) with Māori + New Zealand European the only other ethnic group with sufficient numbers for any analysis (9%, n = 93).
There are two main purposes for demographic variables. Firstly, you can use them to check the representativeness or bias of your sample by comparing them with census data for the whole population. For example, the table below shows that women are slightly over-represented in our sample, and men under-represented. Age comparison is approximate as census data are in 5-year age groups, and in this range start at 20–24, while the sample was of people aged 18 or over. Taking this into account, the age representation in the sample is very good.
| | Survey Sample (%) | New Zealand population (%) |
| Men | 42.7 | 48.7 |
| Women | 57.3 | 51.2 |
| Age 18–34 | 32.2 | 29.2 |
| 35–49 | 30.6 | 31.8 |
| 50–64 | 21.4 | 21.8 |
| 65+ | 15.7 | 17.2 |
Note: the 2001 census was used for proportions of the New Zealand population as it was the closest to the year when the survey was carried out, 2002.
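One simple way to read the comparison table is to subtract the population percentage from the sample percentage for each group. The sketch below does this in Python with the figures from the table; the group labels and the "over/under-represented" wording are choices made here for illustration.

```python
# Percentages from the comparison table: (survey sample, 2001 census).
comparison = {
    "Men": (42.7, 48.7),
    "Women": (57.3, 51.2),
    "Age 18-34": (32.2, 29.2),
    "Age 35-49": (30.6, 31.8),
    "Age 50-64": (21.4, 21.8),
    "Age 65+": (15.7, 17.2),
}

# Positive = over-represented in the sample, negative = under-represented.
differences = {
    group: round(sample - population, 1)
    for group, (sample, population) in comparison.items()
}

for group, diff in differences.items():
    status = "over-represented" if diff > 0 else "under-represented"
    print(f"{group:<10}{diff:>+6.1f} points ({status})")
```

The gaps are measured in percentage points, so the difference for sex stands out against the much smaller differences for the age groups.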
The second major function of demographic variables in survey data analysis is to provide a breakdown of findings by these different groups within society, to see if the survey findings, in this case attitudes of New Zealanders to the roles of men and women, differ by gender, age, etc. To do this we carry out crosstabulation – looking at two variables or questions at the same time, also called bivariate analysis.
SPSS – a computer package for analysing survey research data
Research questions – define what it is you want to find out from the survey data
Missing values – what is the proportion of non-response to each question – is it large enough to affect the response outcomes – are there any explanations for a high non-response – report a high non-response and the different outcomes for the total sample compared with just those who did respond to the question.
Univariate analysis – one question or variable – looking for patterns in the data – identifying the range of responses and the distribution of responses.
Collapsing categories – combining categories to clarify patterns in the data – for example, on a 5-point agreement scale combine strongly agree + agree to compare with strongly disagree + disagree. Collapsing categories is also useful when there are very few responses in some categories.
Graphing distribution patterns – often it is easier to see patterns in the distribution of data by presenting it visually using graphs.
Demographic variables – e.g. age, sex. We can investigate group differences in response patterns.
Sample comparison – another purpose for demographic variables is to identify survey sample bias.
As its name suggests, bivariate analysis involves looking at how two variables or questions relate to each other. This is done by tabulating them in a two way format known in SPSS as a crosstab.
You will be crosstabulating the attitude questions in sections 1 to 5 of the survey questionnaire with age and sex. But first we need to recode the age data into more manageable groups.
If you look at your univariate/frequency printout you will see that the values for age are in single years, ranging from 18 to 90, and take up a couple of pages.
Recoding variables involves two steps. Firstly, deciding on the values to go into each group, and then carrying out the process in SPSS. To decide on the groupings, look at the cumulative percent column at the right of the frequency printout for ‘age’. Three or four groups would be a manageable number, of roughly even numbers of people in each group. The second criterion for deciding on grouping ranges is some kind of meaningful social sense and/or roughly even number of years. So there is no right or wrong way to do it, but these principles provide guidelines.
For the examples in this workbook, recode age into four groups: 18–34, 35–49, 50–64 and 65+.
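The recoding rule itself is just a set of cut-off points. As a rough illustration outside SPSS, the Python sketch below assigns an age in single years to one of the four groups; the function name `age_group` and the sample ages are hypothetical.

```python
def age_group(age):
    """Return the workbook's age band for an age in whole years."""
    if age < 18:
        raise ValueError("the survey sampled people aged 18 and over")
    if age <= 34:
        return "18-34"
    if age <= 49:
        return "35-49"
    if age <= 64:
        return "50-64"
    return "65+"

# Hypothetical ages, chosen to sit on either side of each boundary.
ages = [18, 27, 34, 35, 49, 50, 64, 65, 90]
recoded = [age_group(a) for a in ages]
print(list(zip(ages, recoded)))
```

Checking the boundary ages (34 and 35, 49 and 50, 64 and 65) is a quick way to confirm that no one falls into the wrong group or into none at all.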
What do you want to know from the crosstabulated data?
How does the dependent variable vary in relation to the independent variable? In our analyses age and sex are independent variables – they are not affected by the attitude questions. The attitudes to gender roles are the dependent variables – their values may vary depending on the value of the independent variable, that is, on whether a person is male or female, or older or younger.
In which response categories are the greatest differences between the percentage of males and the percentage of females?
Note: it is helpful to compare the %s for male/female with the percentage in the total column – look for higher or lower values. For example, looking across the row for the ‘agree’ category in example 1, you can see that 36% of the total sample agree that a preschool child suffers if the mother works outside the home, but for men the figure is higher at 41.9% and for women it is lower at 31.6%.
So we can describe these findings as “men are more likely than women to agree that pre-school children suffer if a mother works outside the home”.
What other gender differences can you see in this example?
What happens if you collapse the two agreement categories together and compare with the two disagreement categories?
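To make the logic of a crosstab concrete, the sketch below builds one by hand in Python. The ten records are hypothetical and are not taken from the survey; each cell is expressed as a percentage of its column (sex), with a total column alongside for comparison, which is the layout the note above describes.

```python
from collections import Counter

# Hypothetical (sex, collapsed attitude) pairs; not taken from the survey.
records = [
    ("Male", "Agree"), ("Male", "Agree"), ("Male", "Neutral"),
    ("Male", "Disagree"), ("Female", "Agree"), ("Female", "Neutral"),
    ("Female", "Disagree"), ("Female", "Disagree"), ("Female", "Disagree"),
    ("Male", "Agree"),
]

cell_counts = Counter(records)
column_totals = Counter(sex for sex, _ in records)
attitudes = ["Agree", "Neutral", "Disagree"]
sexes = ["Male", "Female"]

# Column percentages: each cell as a percentage of its sex column,
# plus the same percentage for the total sample.
table = {}
for attitude in attitudes:
    row = {
        sex: 100 * cell_counts[(sex, attitude)] / column_totals[sex]
        for sex in sexes
    }
    row_total = sum(cell_counts[(sex, attitude)] for sex in sexes)
    row["Total"] = 100 * row_total / len(records)
    table[attitude] = row

print(f"{'':<10}" + "".join(f"{c:>8}" for c in sexes + ["Total"]))
for attitude, row in table.items():
    print(f"{attitude:<10}" + "".join(f"{row[c]:>8.1f}" for c in sexes + ["Total"]))
```

Percentaging within the independent variable (here, sex) is what allows the male and female columns to be compared directly even when the two groups differ in size.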
Examples

Questions to ask

Here is an example by age group. Apply the same approach as above.

Collapsing categories and graphing data
(See Appendix 3 for instructions on graphing data using Microsoft Excel.)
Did you find it quite hard to analyse this age crosstab in table format? Four groups across five categories is quite difficult isn’t it? There are two ways to make the analysis simpler and clearer.
One is to collapse your response categories from five into three by combining the two agreement categories into one, and do the same for the two disagreement categories.
Next try graphing the data, first for the full table, then for the collapsed category version.

It is more clearly apparent that agreement with this statement increases with age and disagreement generally decreases as people get older. So we can sum up by saying older people are more likely to agree that a preschool child is likely to suffer if his or her mother works. It is even clearer below where the categories have been collapsed.

We have now described the simple relationship between our two variables, attitudes to women’s roles and sex or age, that is, how attitude varies depending on the value of sex or age.
To take our analysis a step further we need to apply some statistical tests to tell us about the strength of the relationship, and whether it can be generalised from our survey sample to the general population.
We have seen in the examples above that agreement with the statements about attitudes to gender roles varies according to sex and age. So we can say there is a relationship or association between both sex and age and attitudes to the effect of a mother’s paid work on her preschool child. But how strong is that relationship? How well can we predict an attitude by knowing someone’s age or sex? What is the level of correlation between sex or age and attitudes?
We can use statistical tests called measures of association to give us an indication of the strength of the relationship between two variables. Associations are virtually never 100%, so we are measuring the degree or extent of the association between the variables.
Which test we use depends on the type of variable we are looking at.
Types of variables
In this survey data set we have two types of variables: nominal and ordinal.
Nominal variables are variables whose values are simply names or categories. For example, sex.
Ordinal variables, as the name suggests, have some kind of order from high to low – for example, age groups – but you cannot measure the actual distance between them as you can with individual years of age. The agreement scales we have been analysing in this data set can also be considered as ordinal variables as they move in a linear kind of direction from strongly agree through neutral to strongly disagree.
The table below sets out the statistical test to use with each type of variable. You can read about this and see a larger example of this table in de Vaus, 2002, Chapter 15, p.293.
Choosing what statistic to use
| Type of variable | Example | Statistic |
| Nominal | Sex, ethnicity | Phi & Cramer's V |
| Ordinal | Age group, attitude scale | Gamma |
| Nominal + Ordinal | Sex + attitude | Use the statistic for the lower level variable, i.e. nominal = Phi/Cramer's V |
Measures of association can have two components
For nominal variables there is only a value for the strength of the relationship, as there is no order to the values of these variables. But for ordinal variables like age group and attitude scale, there is a direction from high to low for age, and from agreement to disagreement for attitudes.
The measure of association test for nominal variables is Phi / Cramer’s V. Phi is recommended for simple two by two cell tables, Cramer’s V for tables with more rows and columns.
The test result has a value between 0 and 1. To interpret this, a value near 0 means there is a very weak relationship between the two variables, while a value close to 1 means there is a very strong association between the two variables. For example,
The measure of association for ordinal variables is gamma. The value of gamma ranges from -1 to 1. Gamma can have a negative value as well as a positive value. If the value is positive (0 to 1) the two variables are moving in the same direction. If the value is negative (0 to -1), as one variable increases, the other decreases. For example a value of -.346 means a moderate negative relationship between age and attitude, so as age increases, the attitude scale moves towards 1, and as age decreases, the attitude scale moves towards 5. This is somewhat confusing as 5 is strongly disagree and 1 is strongly agree.
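For readers curious about where the sign of gamma comes from, the sketch below computes Goodman and Kruskal's gamma from a crosstab by counting concordant and discordant pairs, using gamma = (C - D) / (C + D). It is an illustration in Python, not SPSS's implementation, and the 3 x 3 table of counts is hypothetical.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a crosstab given as a list of rows with ordered categories."""
    concordant = 0
    discordant = 0
    n_rows = len(table)
    n_cols = len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical counts: rows = age group (youngest to oldest),
# columns = attitude code (1 = agree, 2 = neutral, 3 = disagree).
table = [
    [20, 30, 50],
    [30, 30, 40],
    [50, 30, 20],
]
gamma = goodman_kruskal_gamma(table)
print(f"gamma = {gamma:.3f}")
```

In this made-up table the older groups cluster in the low (agree) codes, so discordant pairs outnumber concordant ones and gamma comes out negative, the same direction as described above.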
Some examples
For example, our crosstabulation of the variable working mom: pre-school child suffers by age group has a gamma value of -.194. This indicates a relatively weak association between age and this attitude, but the negative sign tells us that as age increases, the attitude scale tends towards 1, which represents agreement.
| | | Value | Asymp. Std Error | Approx. T | Approx. Sig. |
| Ordinal by Ordinal | Gamma | -.194 | .034 | -5.725 | .000 |
| N of Valid Cases | | 1012 | | | |
Inferential statistics or tests of significance
It is the total population we are really interested in.
Inferential statistics are based on the underlying assumption that the sample is representative of the population from which it is drawn – in this case the population of New Zealand. To be representative the sample must be a random sample.
We know from information provided that this survey was based on a randomly selected sample. However, we also know from our sample/population comparison that due to non-sampling error, such as voluntary participation by the initially selected sample, it has become a little bit biased towards women. So we will need to take account of this in considering our findings – for topics where there is a gender difference, this will distort the findings on the total sample. For example, if women were more likely to agree with a statement, and we have more women in our sample, then our sample findings will be skewed towards agreement compared with the total population.
The logic underlying inferential statistics, also known as tests of significance, is that where there is a discrepancy between the observed and expected distribution of a sample on a variable or question, it is due to either sampling error or a true relationship between the variables.
For example, if there were no difference between men and women, we would expect the same proportion of each group to answer Yes to a question, so:
Expected Yes response: men = 50%; women = 50%
But if we observed
Actual Yes response: men = 80%; women = 35%
There is a difference, but is it a ‘true’ difference? Do men and women really think differently on this issue, or is the difference just due to sampling error?
This is where despite using random sampling techniques, we end up with a sample that has some bias and is not truly representative of the population it is supposed to represent. You can read more about this in de Vaus, 2002: Chapter 14, p.263–266.
Significance levels are expressions of the likelihood of the relationship between two variables being due to sampling error. For example, a significance level of p = .05 means that the likelihood of a relationship between two variables being due to sampling error is 5 out of 100.
If the chance of sampling error is less, for example 1 out of 100 (p = .01) or 1 out of 1000 (p = .001), then the significance level is higher. That means we can have more confidence that the relationship we have found between two variables on our sample data will hold true for the population from which our sample is drawn.
By convention, p < .05 is the minimum significance level usually accepted for claiming a relationship found in a survey sample holds true in the total population; p < .0001 is very strongly significant. If the significance level on your printout is .0000, this is even higher and stronger.
| p < .05 | 5 out of 100 | minimum significance level |
| p < .01 | 1 out of 100 | higher significance level |
| p < .001 | 1 out of 1000 | strongly significant |
Where the p value is greater than .05 (p > .05), the relationship is not statistically significant. Such results are not normally reported, as they apply only to the sample and cannot be generalised to the population.
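The reporting convention in the table above can be written as a small lookup. The Python sketch below is only an illustration of that convention; the function name `significance_label` and the example p values are hypothetical.

```python
def significance_label(p):
    """Map a p value to the conventional thresholds in the table above."""
    if p < 0.001:
        return "p < .001 (strongly significant)"
    if p < 0.01:
        return "p < .01 (higher significance level)"
    if p < 0.05:
        return "p < .05 (minimum significance level)"
    return "not significant (p >= .05)"

# Hypothetical p values, one for each band.
for p in (0.0004, 0.007, 0.03, 0.2):
    print(p, "->", significance_label(p))
```

Report the strictest threshold that the printed value meets, so a printed significance of .007 is reported as p < .01 rather than p < .05.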
The Chi-Square Test
The Chi-Square test is a frequently used test of inferential significance in social science research. It is suitable for both nominal and ordinal variables. You can read about this test in de Vaus, Chapter 14, page 254. Here we are concerned with interpreting and reporting the test result from your computer printout.
When you use the Cramer’s V or gamma measures of association outlined above, they produce their own significance level in the right-hand column of the printout. This is interpreted and reported the same as for Chi-Square.
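If you would like to see what lies behind these printout figures, here is a minimal Python sketch of the Chi-Square statistic, its significance level and Cramer's V for a table of counts. The counts are made up for illustration and are not from the survey, and the function names are hypothetical. The sketch assumes every row and column contains at least one case.

```python
import math


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability of the chi-square distribution (whole-number df >= 1)."""
    y = stat / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step up to df / 2 using the recurrence for the incomplete gamma function
    while a < df / 2 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def cramers_v(table):
    """Cramer's V: chi-square rescaled to run from 0 to 1."""
    stat, _ = chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))


# Made-up counts: rows = men, women; columns = Yes, No
observed = [[30, 20], [20, 30]]
stat, df = chi_square(observed)
p = chi_square_p_value(stat, df)
v = cramers_v(observed)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}, Cramer's V = {v:.2f}")
```

For a two-by-two table like this one, Phi and Cramer's V have the same size, which is why the SPSS printout shows the same value for both.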
Interpreting and reporting results of significance tests
Example 1
The measure of association and significance test results for sex differences in attitudes to preschool children suffering if mother works outside the home.
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .118 | .007 |
| | Cramer's V | .118 | .007 |
| N of Valid Cases | | 999 | |
The Value column = .118 – this indicates a slight relationship between sex and attitude.
The significance level = .007, which is highly significant (p < .01), which means the relationship is generalisable to the population.
We could report this as:
There is a gender difference in attitudes to whether pre-school children suffer if the mother works outside the home (Cramer’s V = 0.118, p < .01). Men (53%) were more likely than women (41%) to agree with this statement.
OR
Men (53%) were more likely than women (41%) to agree that pre-school children suffer if the mother works outside the home (Cramer’s V = 0.118, p < .01).
Example 2
The measure of association and significance test results for age group differences in attitudes to preschool children suffering if mother works outside the home.
| | | Value | Asymp. Std Error | Approx. T | Approx. Sig. |
| Ordinal by Ordinal | Gamma | -.194 | .034 | -5.725 | .000 |
| N of Valid Cases | | 1012 | | | |
The Value column = -.194 – this indicates a slight negative relationship between age and attitude. As age increases, attitude scale tends towards 1 = agreement.
The significance level = .000, which is very significant (p < .001), which means the relationship is generalisable to the population.
We could report this as:
Older people were more likely to agree that pre-school children suffer if the mother works outside the home (gamma = -0.194, p < .001). For example, 60% of those aged 65+ agreed compared with 32% of those aged under 35.
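To show where a gamma value and its sign come from, here is a small Python sketch that counts concordant and discordant pairs in a crosstab of two ordinal variables. The counts are invented for illustration (they are not the survey results), and the function name is hypothetical. It assumes both the rows and the columns are listed in their natural order.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D), where C and D are concordant and discordant pairs."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell with every cell in a later row
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)


# Made-up counts: rows = younger, older; columns = 1 agree, 2 neutral, 3 disagree
table = [[10, 20, 30],
         [30, 20, 10]]
gamma = goodman_kruskal_gamma(table)
print(f"gamma = {gamma:.3f}")
```

In this made-up table the older group leans towards the low end of the scale (1 = agree), so gamma is negative, the same direction as in Example 2.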
Example 3
Men should do larger share of childcare, by sex.
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .077 | .215 |
| | Cramer's V | .077 | .215 |
| N of Valid Cases | | 984 | |
The result of the significance test for this relationship is .215, which is greater than .05. So the relationship between these two variables is not statistically significant. There is a likelihood that 215 times in 1,000 the result found in the sample is due to sampling error and is not a true relationship for the population from which the sample is drawn.
We could report this as “the relationship between attitudes to men doing a larger share of childcare and gender was not statistically significant at p < .05, so is not generalisable to the population”.
There are various accepted ways of reporting statistical significance of results, depending on the intended audience of the document. The above example is how it would be done in an academic journal article. For a non-academic report, for a government department for example, you might just have a statement in the introduction, methods section, or appendix where you state that “all reported results are statistically significant at the p < .05 level”.
To create crosstabs in SPSS
This time you want to create tables or crosstabs that enable you to look at one variable or question in relation to another, e.g. attitudes towards women with preschool children working (Q3b) in relation to gender (Q40).
Exercise 1
Click on the Analyze menu
Select Descriptive Statistics
Click Crosstabs…
Select row variables – all except ID number, version, age, sex and household work hours
Click variables individually to highlight them, and then click the top arrow to add them to the row box
OR to select a range of variables, click on the first one, hold down the shift key, scroll down to the last one you want (Fathers better at disciplining kids than mothers (Q38d)), click to highlight the whole range of variables and then click the top arrow to move the whole lot into the row box.

Select the column variable sex, and then click the second arrow to move it into the column box. 
There are various options available for crosstabs including to display additional percentage summaries and to output certain statistics about the variables.
First, click Cells…, and click to check the box to output column percentages. Then click Continue. 
Next, click Statistics… and click to check the box to output Phi and Cramer’s V, noting that sex is a nominal variable. Then click Continue.
Don’t worry about the other options at this point. Click OK and await your output.
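For readers who want to see what the column percentages option is doing, here is a minimal Python sketch of a crosstab with column percentages. The responses are made up for illustration, and the function name `column_percentages` is hypothetical rather than anything from SPSS.

```python
from collections import Counter


def column_percentages(records):
    """records: (row value, column value) pairs. Returns % of each column in each cell."""
    records = list(records)
    cell_counts = Counter(records)
    column_totals = Counter(column for _, column in records)
    return {
        (row, column): 100 * count / column_totals[column]
        for (row, column), count in cell_counts.items()
    }


# Made-up responses: (attitude, sex)
responses = (
    [("Agree", "Male")] * 3
    + [("Disagree", "Male")] * 1
    + [("Agree", "Female")] * 1
    + [("Disagree", "Female")] * 3
)
percent = column_percentages(responses)
for (attitude, sex), value in sorted(percent.items()):
    print(f"{sex:6} {attitude:8} {value:5.1f}%")
```

Each column (here, each sex) adds up to 100%, so you compare across a row to see how the groups differ.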

Exercise 2
Repeat the process of exercise 1 for age group, choosing gamma as the test statistic.
To do this you will first need to recode the age variable into a new one, agegroup.
Recode instructions
Click on the Transform menu and then select Recode into Different Variables…

Scroll down the variable list to find Age, click to highlight it and click the arrow to move it into the input variable box.
Click in the box on the right under Name and Enter agegroup then click Change.

Click Old and New Values… to get started on the actual recoding. Here you can enter individual values from the old variable and tell SPSS how you want them to be grouped in the new one.
Remember that the sample was of people 18 years and over, and the first age group we want is 18–34. Click to highlight.
Click the button for Range, LOWEST through value and enter 34.
Click in the box beside Value on the right under New Value and enter 1.
Then click Add to make the change. This will give all 18–34 year olds a value of 1 for agegroup.

For the next age groups, click on Range and enter the value range of your next recode age group: 35 in the first box, 49 in the second box, then enter 2 in the new value box, and click on Add.
Repeat for 50–64 = new value 3.
For the last age group, click on Range thru Highest value and enter 65. Enter 4 in the new value box, and click on Add.
Then click Continue, then Change and then OK to finish up. You will see an output window with SPSS describing in code what you have done. Don’t worry about that; click the Window menu and choose the dataset window to return to your data file.
Click on Variable View and you will see the newly created variable agegroup in the left hand column.

You can set up labels for the values of your new variable agegroup by clicking in the Values column in that row and clicking the button that appears marked …
Similarly to in the recode interface, type 1 in the Value box and 18–34 in the Label box, click Add, repeat for the other age groups, then click OK.
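The recode and value labels set up above can be sketched in a few lines of Python. This is only an illustration of the logic, not SPSS syntax, and the names `recode_age`, `UPPER_BOUNDS` and `VALUE_LABELS` are made up for this example.

```python
from bisect import bisect_left

UPPER_BOUNDS = [34, 49, 64]  # top age of groups 1, 2 and 3; 65 and over is group 4
VALUE_LABELS = {1: "18-34", 2: "35-49", 3: "50-64", 4: "65+"}


def recode_age(age):
    """Return the agegroup value (1 to 4) for an age in years; missing stays missing."""
    if age is None:
        return None
    return bisect_left(UPPER_BOUNDS, age) + 1


for age in (18, 34, 35, 49, 50, 64, 65, 90, None):
    group = recode_age(age)
    print(age, "->", group, VALUE_LABELS.get(group, "missing"))
```

As with the "LOWEST through 34" range in SPSS, anything up to and including 34 goes into group 1.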

You can now produce a frequency table for agegroup. Remember:
Click on the Analyze menu, choose Descriptive Statistics then click Frequencies…
Scroll to the bottom of the variable list, click agegroup, click the arrow, and click OK.
Now your output window will show the distribution of age groups in the data set. And you can now use agegroup to run crosstabs of the questions in the survey to see how the findings vary by age.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | 18-34 | 194 | 18.9 | 19.2 | 19.2 |
| | 35-49 | 324 | 31.6 | 32.0 | 51.2 |
| | 50-64 | 293 | 28.6 | 29.0 | 80.2 |
| | 65+ | 200 | 19.5 | 19.8 | 100.0 |
| | Total | 1011 | 98.6 | 100.0 | |
| Missing | System | 14 | 1.4 | | |
| Total | | 1025 | 100.0 | | |
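The three percentage columns in this printout differ only in what they divide by. The Python sketch below rebuilds them from the counts in the table; the function name `frequency_table` is made up for this illustration.

```python
def frequency_table(counts, n_missing):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent)."""
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows, cumulative = [], 0
    for label, count in counts.items():
        cumulative += count
        rows.append((
            label,
            count,
            round(100 * count / n_total, 1),       # Percent: out of everyone
            round(100 * count / n_valid, 1),       # Valid Percent: missing excluded
            round(100 * cumulative / n_valid, 1),  # Cumulative Percent
        ))
    return rows


# Counts copied from the agegroup frequency table above
age_counts = {"18-34": 194, "35-49": 324, "50-64": 293, "65+": 200}
rows = frequency_table(age_counts, n_missing=14)
for row in rows:
    print(row)
```

Valid Percent is normally the column to report, because it leaves out the respondents who did not answer.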
If you are carrying on from Exercise 1, which used sex, simply shift sex from the column box back to the main variable list by clicking it and then clicking the arrow pointing back. Then click agegroup in the left list and click the arrow to move it into the column box. Remember to click Statistics…, check the box for gamma and uncheck that for Phi and Cramer’s V.
Recoding variables – before producing crosstabulations for interval variables like age in single years, we need to reduce them to a small number of groups – we do this by recoding in SPSS.
Creating crosstabs in SPSS – Analyze menu, Descriptive Statistics, Crosstabs….
Comparing findings between groups – look across the response category rows to compare the % of men with the % of women and the total sample %. Look for where the greatest differences are found. You may need to collapse categories to identify patterns more clearly.
Choosing and interpreting statistical tests for strength and direction of relationship – we can apply statistical tests to tell us about the strength and in some cases direction of the relationship between two variables. The test used depends on the type of variable. We use Cramer’s V for nominal variables, e.g. sex, and gamma for ordinal variables, e.g. age group. Cramer’s V ranges from 0 to 1, while gamma ranges from -1 to +1, with the sign showing the direction of the relationship – a result near zero indicates a weak relationship – a result near 1 (or -1 for gamma) indicates a strong relationship.
Interpreting and reporting statistical significance tests for generalising from sample findings to survey population – tests of significance or inferential statistics tell us whether our results can be generalised from our survey sample to the population from which our sample is drawn. These test results are expressed in terms of probability that the result is due to sampling error and not a true result: p < .05 is the minimum level required for generalisability.
Further exercises
If you are interested in this topic you could try crosstabulations using the education variable ‘highest formal qualification’ – treat it like age group for analytical procedures. If using the teaching subset you might want to recode the three university level qualifications into one group as there are relatively few in each group compared to the lower levels.
You could also look at the 1994 Family and Gender Roles data set on NZSSDS and compare how attitudes have changed over time, or access the full 2002 data set to look at other variables.
Interval variables are those whose values are actually measurable and have a real measurable distance, or interval, between values – that is, a quantifiable numerical amount. For example, age in individual years, or hours spent in household work.
There are two ways to approach analysis of interval variables.
Univariate analysis of interval variables
Click on the Analyze menu
Select Descriptive Statistics
Click on Frequencies…
Choose to analyse the ‘how many hours do you spend on household work Q9a’ variable
Click to highlight then click the arrow
Click on Statistics…
Click to check the boxes for the mean, median and mode under central tendency
Click to check the box for Std. deviation under dispersion, then click Continue
Click on Charts…
Click the button for Histograms, then click Continue

As you can see, looking at the frequency printout it is very hard to get an idea of the distribution of values on this variable. Looking at the histogram makes it much clearer that most people spend 20 hours or less on household work. However, from the histogram it also looks like there is an outlier value of several people appearing to do nearly 100 hours. Referring back to the frequency printout you can see that in fact the high number close to 100 is those who do none at all, which is 17 people or 1.7%, who have been coded with the value 96. This is a problem with using existing data sets – you have to be very careful to find out how the data is coded and what each value means. Use all forms of data presentation to get the best picture.
So we need to recode these values so that 0 represents those doing no hours. Click the Transform menu and then Recode into different variables… and then follow the steps from the previous example. Change old value 96 to new value 0; all other values are okay, so you can just click under old values, ‘All other values’ and then under new values, ‘Copy old value(s)’.
Now create the frequency table and histogram for your recoded variable for household hours with none as 0 – see you have now got rid of the outlier.
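The logic of this recode (old value 96 becomes 0, everything else is copied) is simple enough to sketch in Python. The names below are invented for illustration, and the code 96 is the one described above for this data set; other data sets may use different codes.

```python
NONE_CODE = 96  # value used in this data set for "does no household work"


def recode_household_hours(value):
    """Turn the special code 96 into 0 hours; copy all other values unchanged."""
    if value is None:  # missing stays missing
        return None
    return 0 if value == NONE_CODE else value


raw = [96, 10, 8, None, 95]
recoded = [recode_household_hours(v) for v in raw]
print(recoded)
```

Note that a genuine answer of 95 hours is left alone; only the exact code 96 is changed.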

Next, look at the summary statistics box. From this you can see that the average or mean time people spend on household work is 11 hours a week. However, the median number of hours – the number at which half the people do more and half do less – is 8. This indicates a degree of skewing of the data towards the lower end of the values range, as can be seen in the histogram.
The mode is the value or number of hours at which you find the largest group of people, which is 10.
So look at the histogram – you can see the highest bar or number of people is at around 10 – go to the frequency printout and look for the highest number and this confirms that it is 10, with 95 people stating this number of hours. Then there are a number of relatively equal secondary modes between 2 and 8 hours, and a couple of others at 14 and 20.
Another useful column on the frequency printout is the cumulative percent. From this you can also confirm the median (50%) of respondents, at 8 hours. You can also see that two-thirds or 67% do 12 hours or less, and three-quarters do less than 15 hours.
To sum up, you could say that most people do less than 12–15 hours of household work per week, with 11 hours being the average.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | .00 | 17 | 1.7 | 2.4 | 2.4 |
| | 1.00 | 26 | 2.5 | 3.7 | 6.1 |
| | 2.00 | 59 | 5.8 | 8.4 | 14.5 |
| | 3.00 | 39 | 3.8 | 5.5 | 20.1 |
| | 4.00 | 50 | 4.9 | 7.1 | 27.2 |
| | 5.00 | 50 | 4.9 | 7.1 | 34.3 |
| | 6.00 | 35 | 3.4 | 5.0 | 39.3 |
| | 7.00 | 33 | 3.2 | 4.7 | 44.0 |
| | 8.00 | 43 | 4.2 | 6.1 | 50.1 |
| | 9.00 | 6 | .6 | .9 | 50.9 |
| | 10.00 | 95 | 9.3 | 13.5 | 64.4 |
| | 12.00 | 21 | 2.0 | 3.0 | 67.4 |
| | 14.00 | 44 | 4.3 | 6.3 | 73.7 |
| | 15.00 | 31 | 3.0 | 4.4 | 78.1 |
| | 16.00 | 4 | .4 | .6 | 78.7 |
| | 17.00 | 2 | .2 | .3 | 78.9 |
| | 18.00 | 13 | 1.3 | 1.8 | 80.8 |
| | 20.00 | 46 | 4.5 | 6.5 | 87.3 |
| | 21.00 | 10 | 1.0 | 1.4 | 88.8 |
| | 22.00 | 1 | .1 | .1 | 88.9 |
| | 24.00 | 6 | .6 | .9 | 89.8 |
| | 25.00 | 14 | 1.4 | 2.0 | 91.7 |
| | 28.00 | 11 | 1.1 | 1.6 | 93.3 |
| | 30.00 | 18 | 1.8 | 2.6 | 95.9 |
| | 32.00 | 2 | .2 | .3 | 96.2 |
| | 35.00 | 8 | .8 | 1.1 | 97.3 |
| | 40.00 | 8 | .8 | 1.1 | 98.4 |
| | 42.00 | 4 | .4 | .6 | 99.0 |
| | 50.00 | 2 | .2 | .3 | 99.3 |
| | 60.00 | 2 | .2 | .3 | 99.6 |
| | 65.00 | 1 | .1 | .1 | 99.7 |
| | 86.00 | 1 | .1 | .1 | 99.9 |
| | 95.00 | 1 | .1 | .1 | 100.0 |
| | Total | 703 | 68.6 | 100.0 | |
| Missing | System | 322 | 31.4 | | |
| Total | | 1025 | 100.0 | | |
Note: the high level of missing data on this variable is because the question was only applicable to people living as a couple, so single respondents were excluded.
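As a cross-check on the summary statistics discussed above, the mean, median and mode can be recomputed from the frequency table. The Python sketch below uses the counts copied from that table and the standard `statistics` module; the variable names are made up for this illustration.

```python
import statistics

# hours: number of respondents, copied from the frequency table above
hours_frequency = {
    0: 17, 1: 26, 2: 59, 3: 39, 4: 50, 5: 50, 6: 35, 7: 33, 8: 43, 9: 6,
    10: 95, 12: 21, 14: 44, 15: 31, 16: 4, 17: 2, 18: 13, 20: 46, 21: 10,
    22: 1, 24: 6, 25: 14, 28: 11, 30: 18, 32: 2, 35: 8, 40: 8, 42: 4,
    50: 2, 60: 2, 65: 1, 86: 1, 95: 1,
}

# Expand the table back into one value per respondent
hours = [value for value, count in hours_frequency.items() for _ in range(count)]

mean_hours = statistics.mean(hours)
median_hours = statistics.median(hours)
mode_hours = statistics.mode(hours)
sd_hours = statistics.stdev(hours)

print(f"N = {len(hours)}")
print(f"mean = {mean_hours:.1f}, median = {median_hours}, mode = {mode_hours}")
print(f"standard deviation = {sd_hours:.1f}")
```

The mean sits above the median because a small number of people report very long hours, which pulls the average up.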
Bivariate analysis of interval variables
Now it would be interesting to know whether there is any difference between men and women in the number of hours spent on household work, and/or by age.
The type of analysis and tests depends on whether the other variable is also interval, or whether it is nominal or ordinal, and how many values there are – a few or many.
| Variable types | Examples | Statistics |
| Interval + interval | Age in individual years + hours spent on household work | Regression/Pearson's R |
| Interval + nominal | Hours spent on household work + sex | Eta/ANOVA/F-test |
| Interval + ordinal | Hours spent on household work + age group | Eta/Kendall's Tau/Spearman's Rho |
Analysis of an interval variable by a nominal variable involves comparison of means, and statistical testing for the difference between the means.
Example
This comparison of means of hours spent on household work per week shows that women spend more than twice as much time as men, on average: 14.8 hours compared with 7.1 hours. This result is statistically significant, with the p-value from an ANOVA F test (the Sig. column below) of .000. This means the result can be generalised to the population from which the sample was drawn. The Eta test result of 0.368 indicates a moderate correlation between sex and hours spent on household work. Note that these are the tests suggested in the table above for interval & nominal variables.
ANOVA is an abbreviation for Analysis of Variance, a general statistical technique for looking for statistical significance in the difference between the means of two or more independent groups of observations – males and females here are a good example of two such groups.
| R: Sex | Mean | N | Std. Deviation |
| Male | 7.1420 | 324 | 6.60272 |
| Female | 14.7876 | 372 | 11.66516 |
| Total | 11.2284 | 696 | 10.36657 |
| | | | Sum of Squares | df | Mean Square | F | Sig. |
| hhld hrs recoded none as 0 * R: Sex | Between Groups | (Combined) | 10122.984 | 1 | 10122.984 | 108.809 | .000 |
| | Within Groups | | 64565.692 | 694 | 93.034 | | |
| | Total | | 74688.677 | 695 | | | |
| | Eta | Eta Squared |
| hhld hrs recoded none as 0 * R: Sex | .368 | .136 |
Steps for the above exercise
Click on the Analyze menu
Select Compare Means
Click on Means…
Scroll to bottom of the variable list, click the new variable RecQ9a0
Click the arrow to move it to the dependent variable box
Find and click the sex variable, and click the arrow to move it to the independent variable box
Click the Options… button, and click to tick the box to produce an ANOVA table and eta statistic
Click Continue and then OK.
You could also try hours on household work (RecQ9a0) with age group
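To see how the ANOVA table and eta relate to the comparison of means, here is a Python sketch that rebuilds the F ratio and eta from the group means, sample sizes and standard deviations in the Means table above. The function name `one_way_anova` is made up for this illustration, and because the inputs are rounded printout figures, the results agree with the SPSS output only approximately.

```python
import math

# (mean, n, standard deviation) for each group, copied from the Means table above
groups = {
    "Male": (7.1420, 324, 6.60272),
    "Female": (14.7876, 372, 11.66516),
}


def one_way_anova(groups):
    """One-way ANOVA from group summary statistics."""
    n_total = sum(n for _, n, _ in groups.values())
    grand_mean = sum(mean * n for mean, n, _ in groups.values()) / n_total
    # Between groups: how far each group mean is from the overall mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, n, _ in groups.values())
    # Within groups: spread of individuals around their own group mean
    ss_within = sum((n - 1) * sd ** 2 for _, n, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return {
        "df_between": df_between,
        "df_within": df_within,
        "f": f_ratio,
        "eta_squared": eta_squared,
        "eta": math.sqrt(eta_squared),
    }


result = one_way_anova(groups)
print(f"F({result['df_between']}, {result['df_within']}) = {result['f']:.1f}")
print(f"eta = {result['eta']:.3f}, eta squared = {result['eta_squared']:.3f}")
```

Eta squared is the share of the total variation in household hours that is accounted for by the difference between the groups.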
Section key points
Interval variables – Numerically quantifiable variables with values that have real measurable distances between them, like individual age in years, and number of hours spent on household work.
Recoding – Interval variables can be recoded/grouped to create ordinal variables, e.g. age groups, for use in crosstabulations as explored in previous modules.
Descriptive statistics – With interval variables you can produce averages/means, medians, modes, and measures of spread such as standard deviation, to summarise distributions. These figures can then be compared for different nominal groups, e.g. gender, when looking for differences.
To be able to present your survey results in three forms: tables, text and graphs.
It is easier to describe and analyse your data if you transfer it from the computer printout to a table or graph first. Visual representation of data makes it easier to see the patterns in the data – this is so for both the analyst and the reader – but there should also be text to describe the key points in a table or graph.
Tables
Tables follow a basic format:
Example
| | Preschool % | School % | Left home % |
| Full-time | 2 | 12 | 64 |
| Part-time | 31 | 71 | 19 |
| Stay at home | 57 | 6 | 2 |
| Can't choose | 10 | 10 | 15 |
| Total % | 100 | 99* | 100 |
| Total responses | 988 | 983 | 972 |
* Not 100% due to rounding percentages to whole numbers.
Text
Handy hint
A passage full of % this and % that can get monotonous. There are alternative ways of writing a lot of percentage figures, e.g. 31% = almost a third; 40% = two out of five; 49% = almost half; 52% = just over half.
Graphs
Graphs are a particularly effective form of presenting data. Where there are definite patterns and differences a graph displays them clearly. See Appendix 2 of this workbook for detailed instructions for creating graphs using Microsoft Excel.
Presenting graphs
Include a figure number (to refer to in the text) above or below – graphs and other images in a document are usually referred to as “figures”.
Axis labels should be included to make it clear to the reader what is being summarised.
All information required to understand, read and interpret the graph should be on the graph.
Example 1

Example 2
Sometimes results are clearer if you collapse categories. This is particularly so when there are lots of categories in both variables, e.g. 4 age-groups by attitudes on a 5-point scale (see examples in Module 2), and when the pattern in the smaller extreme categories is not substantially different to that in the main categories, as in this example. Collapsing categories accentuates differences in agreement versus disagreement.

Combining several variables in one graph
Sometimes it is useful to use a graph to compare simple univariate responses to a set of statements on a common topic, for example, what percentage of respondents agreed or strongly agreed with a set of statements related to men. You could go further and break the graph down by gender or some other demographic variable.

Text
This clearly shows that while there is wide support for men being more involved in their children’s lives, and quite strong support for them doing more childcare and household work, there is little agreement that they are better at disciplining children or are treated as second class citizens.
Table or graph?
How do you decide whether to present your data in a table or a graph? It is usually a good idea to have a mixture of tables and graphs in your document, not all tables or all graphs.
When is a table better?
When is a graph better?
Example
| | 18–34 % | 35–49 % | 50–64 % | 65+ % |
| Agree | 54 | 44 | 25 | 16 |
| Neutral | 21 | 15 | 13 | 12 |
| Disagree | 25 | 41 | 62 | 71 |
| Total % | 100 | 100 | 100 | 99* |
* Not 100% due to rounding percentages to whole numbers.
Graph
Now you can clearly see the linear downward gradient for agreement as age increases, and the upward gradient for disagreement. Neutrality also declines as age increases.

Text
Support for both parents getting parental leave declines with age, from 54% of those under 35 to 16% of those 65+ (Gamma = 0.364, p < .001), as can be seen in the graph.
OR
You might look at the graph and see that the biggest decline occurs around age 50, and collapse your age group data to present those findings as: Almost half (49%) of those under 50 agreed that both parents should get parental leave, compared with only 21% of those over 50.
Creating tables
Click Insert on the ribbon and then click Table in Microsoft Word 2007 or later.
Select the number of columns you want – you can keep adding rows by using the TAB key.
If you click inside your table you can then access further options under Design and Layout on the ribbon toolbar.
Section key points
To become familiar with the standard format of a research report.
Writing a research report: Common components
These are the key components of a research report. Many empirical social science theses also follow this pattern. Unlike an essay, a report is written in sections with headings and subheadings, which are often numbered for ease of reference. You can look at reports on the websites for various government departments, such as Statistics New Zealand, the Ministry of Health, or the Ministry of Social Development.
Links to specific reports
http://www.bigcities.govt.nz/Quality_of_Life_2008.pdf
http://www.nzfamilies.org.nz/sites/default/files/downloads/Give-and-Take.pdf
The headings given in bold face in the above list form the core body of any report, beginning with the Introduction, which outlines the purpose of the research and the structure of the report. This should include the aims of the study, clearly set out, or the research questions that the study set out to investigate.
This is usually followed by a review of the literature, or what is already known about the topic being studied. This enables you to situate your work within the existing body of knowledge on the topic. You can conclude this section by saying how your work will contribute or add to what is already known.
The next section sets out the methods used for your study. It should say what research method has been used and why, followed by a detailed description of how the data were collected. This is necessary so that others can critique your findings and conclusions, or replicate your study. For example, if describing survey research, you need to say how the sample was selected, what the response rate was, and how questionnaires were administered, e.g. face-to-face, by phone, by mail, or via the Internet. You should present details of the final sample and how it compares to the population from which it was drawn. Any limitations of the method used should be acknowledged. Ethical issues and how they were dealt with are also best included here.
Then the results or findings of the study can be reported. This section can be broken down into subtopics. The reporting of survey research findings should begin with a statement of the findings for the total sample, before moving into subgroup analyses, e.g. looking at differences related to age group or gender.
In order to describe your findings using text, you first need to present the data in either a table or a graph. This makes it easier to see the key patterns or points about which you are writing.

This is a simpler format for presenting percentage distributions in a table:
| | 18–34 % | 35–49 % | 50–64 % | 65+ % |
| Strongly agree | 6 | 9 | 11.5 | 13 |
| Agree | 26 | 33 | 39 | 47 |
| Neutral | 27 | 21 | 14 | 16 |
| Disagree | 31 | 30 | 29.5 | 20 |
| Strongly disagree | 10 | 7.5 | 5.6 | 4 |
And this is a graph of the data with categories collapsed. Which format is easiest to read/interpret?
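Collapsing categories is just a matter of adding percentages together. The Python sketch below does this for the youngest and oldest age groups in the table above; the names `COLLAPSE` and `collapse_categories` are made up for this illustration.

```python
# Which collapsed category each point on the 5-point scale belongs to
COLLAPSE = {
    "Strongly agree": "Agree",
    "Agree": "Agree",
    "Neutral": "Neutral",
    "Disagree": "Disagree",
    "Strongly disagree": "Disagree",
}


def collapse_categories(percentages):
    """Add up the percentages that fall into each collapsed category."""
    collapsed = {}
    for category, value in percentages.items():
        target = COLLAPSE[category]
        collapsed[target] = collapsed.get(target, 0) + value
    return collapsed


# Percentages copied from the first and last columns of the table above
by_age = {
    "18-34": {"Strongly agree": 6, "Agree": 26, "Neutral": 27,
              "Disagree": 31, "Strongly disagree": 10},
    "65+": {"Strongly agree": 13, "Agree": 47, "Neutral": 16,
            "Disagree": 20, "Strongly disagree": 4},
}

collapsed = {age: collapse_categories(values) for age, values in by_age.items()}
for age, values in collapsed.items():
    print(age, values)
```

The collapsed figures make the contrast between the youngest and oldest groups much easier to state in a single sentence.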

Straight descriptions of findings in the data can be followed by analysis and interpretation. This involves considering how findings on one topic relate to findings on another, or why the findings may be as they are. Make it clear when you are moving from description into interpretation, e.g.
Findings
Older people are more likely than younger people to agree that preschool children suffer if their mother works.
OR
Believing that preschool children suffer if their mother works increases steadily with age, from 32% of 18–34 year olds to 60% of those aged 65 and over.
Interpretation
This is likely to be due to changing gender roles over time. When those aged 65+ were parents of young children it would have been unusual for mothers to be in paid work. Today’s mothers are more educated and it is more accepted for them to continue working after having children.
Often interpretation is left to the discussion section of the report. This saves repetition as changing gender roles over time can likely explain the gender differences in a number of variables.
The final section of a research report is the conclusions and/or discussion. In this section you use your findings to answer the research questions asked at the beginning in your aims. You might also discuss these findings in relation to the literature reviewed – are your findings consistent with or different from previous findings on the topic area, and why might that be? You should also consider how the limitations of your methodology might have affected your findings. For example, what effect would using the electoral roll as the sampling frame have on the representation of those who took part in the survey, or using a postal survey rather than a phone survey?
This final section also usually includes any recommendations coming out of your research, such as for policy change, and suggestions for further research.
After the end of the main body of the report, include appendices and references. Appendices for survey research should include a copy of the questionnaire so readers can refer to the exact wording of questions and their response options. The reference list should give full references for any books or articles or reports referred to in the report.



Excel defines four main types of graphs: column, bar, pie and line graphs
To open Excel
Click START in bottom left of screen
Go to MICROSOFT at top of list and in the list on the right, click on Microsoft Office Excel
A blank EXCEL worksheet will appear.

Filling in the worksheet
Title: Click on cell A1 (top left box).
You know what cell you’re in because it will have a border around it.
Type the title of your worksheet – usually the same as you will use on your graph, although you will re-enter it on the graph.
What you type will appear in the formula bar above the worksheet.
First we’ll look at how to get data in and use it to produce a fully marked up graph as below.

Start with the single column frequency example on sheet 1.
So the TITLE will be “Both parents should get parental leave”
Press ENTER when you have finished.
The title should now appear on your worksheet.
Category labels
Move down to A3 and click.
Following the example worksheet above, TYPE in the name of your first category, i.e. strongly agree – press ENTER.
You should now be in cell A4, so type your next category, – agree – press ENTER.
And so on, down the list.
To widen Column A so that the whole label shows, move cursor to line between A and B in top bar – the cursor should now look like a black cross – CLICK AND DRAG to the right. You can also use the ‘auto-fit to contents’ feature and double-click with the cross cursor.
Now use your mouse to move the arrow up to B2, CLICK, and type in your value label – percent – press ENTER.
Now you should be in B3. Type in the percentage number corresponding to the category in A3 – for strongly agree type 11 – press ENTER, and so on, down the column.
Highlighting worksheet data to make a graph
Click on CELL A2 (blank) and, holding down on the mouse button, drag so that all the cells you have typed in, below the title, are highlighted. Lift your finger off the button to freeze the highlight on the selected cells.

Now you are ready to convert your worksheet into a GRAPH or CHART.
Creating an Excel graph (chart)
Click on Insert in the ribbon toolbar.
Click on Column in the Charts section, and click the very first layout option.

Click on the graph and note the new toolbars that appear – Design, Layout, Format. Click Layout and then you can add titles to your graph and its axes, for example as below. Then just click on each title and type to fill it in – you can also change font, size, etc. by going back to Home on the toolbar.
The graph legend on the right can be useful for more complex graphs, but for this case you can happily just click it and press delete – the graph should resize to fill its space.

If you want to display exact values for the bars on the graph, you can right-click on one of them and then click Data Labels. You can size and drag your graph by clicking on it so that the border comes up, and then positioning your cursor on the dots in the corners or sides, and dragging.
You can similarly resize and move different elements within the graph window: titles, the legend and the graph area itself. Finally for the original screenshot, I had adjusted the y-axis – right-click on for instance ‘20%’ and click Format Axis.... I changed the Maximum to be fixed at 0.4 and the Major Unit to be fixed at 0.1 – resulting in only 10% increments being displayed on the graph.

You can click on the graph to copy and paste to your report document in Word.
To Print the whole page with data and graph, click outside the graph to remove the blue border. If you have clicked on the graph only the graph will print.
Collapsing categories
You can make the three main categories of agree, neutral, disagree, clearer by combining strongly agree and agree, and then strongly disagree and disagree.
Set up a new worksheet doing this with your data, as in example 2, sheet 1 by moving further down the page, or by clicking on Sheet 2 at bottom of screen.
Try it as column graph and then as a pie graph.

Pie graph
If you have already made a column graph, click to highlight it, then click DESIGN under Chart Tools in the top toolbar, and then click Change Chart Type and choose a pie graph from the list.
OR
Go back and highlight the worksheet data again
Click Insert on the toolbar
This time click Pie instead of column
When you have created the graph, the graph toolbars appear again.
Graph by gender

To follow through such an example, decide whether you want to enter the data across all five points on the agreement scale, or collapsed into agree, neutral and disagree.
Enter data in a worksheet following previous instructions.
Note that you now have two columns of data, one for men and one for women, so this time you will want a legend to show which bars are for which sex. You can choose settings for this under Layout on the toolbar.
Graph by age
Note: Because there are now 4 categories on the X axis (4 age groups) it is best to collapse the attitude data into 3 categories (agree, neutral, disagree) or your graph will look too cluttered and be too hard to read.
You can try it with all five attitude points to see what I mean.
Bar graph
This represents several questions in one graph by taking only the data that represent agreement with each statement.
What Excel calls a BAR graph has the data lines going sideways. Most other pieces of software call Excel’s column graph a bar graph.
The benefit of the horizontal style here is when your category labels are long. Try a bar graph by filling in a new worksheet as in the example given.

This time choose to insert a BAR graph instead of a COLUMN.
If you want to get your bars in ascending or descending order, click outside the graph to remove its blue border, then highlight the two filled in columns in your spreadsheet, click on DATA in the toolbar, and then SORT.
Click on the arrow in the sort by box, and choose % agreeing. In the third box, Order, you can choose either smallest to largest, or largest to smallest – try both.

ANALYSING EXISTING SURVEY RESEARCH DATA SETS
Developed for COMPASS
by Alex Marks
2010
The aim of this workbook is to introduce users to survey research analysis using existing survey data. The New Zealand Social Science Data Service (NZSSDS, http://www.nzssds.org.nz) is administered from within the COMPASS Research Centre, in the Faculty of Arts at The University of Auckland. The centre consists of a team of researchers with experience and expertise in a range of disciplines and research methodologies.
NZSSDS is a multi-functional entity, providing in the first instance a space for holding data sets and metadata related to social sciences surveys in New Zealand, for the purposes of encouraging secondary analysis and specifically for aiding in the teaching of research methods in the social sciences. 'Enhanced publications' holdings are also being developed, adding value to journal articles and other publications around the surveys held.
NZSSDS also administers an online archive of past surveys in the social sciences in New Zealand. Its holdings include a number of surveys from the New Zealand Election Study, several in the area of health and primary care, and a number from the International Social Survey Programme (ISSP) series. These are run in 45 countries including New Zealand, and cover a range of social science topics, many of which are repeated every five years or so, enabling analysis of changes over time.
The example used in this workbook is the Social Inequality Survey from ISSP. This was last carried out in 2009, with the 1999 and 1992 surveys also available, allowing us to compare and identify changes and trends that have occurred during this period. Findings can also be analysed by demographic variables such as age and gender.
The survey was carried out by the Department of Marketing, Massey University, between July and November 2009. It was a nationwide mail survey of 2,250 people aged 18 and over, randomly selected from the New Zealand electoral roll. The population was first stratified by age (under 35, 35 to 54, 55 and over), then equal samples of 750 were randomly selected from within each age group. Following three reminders, the survey produced 935 valid responses, and 136 of the initial sample were rejected as ineligible. This was an effective response rate of 935 out of 2,114, or 44.2%.
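The response-rate arithmetic can be checked with a short Python sketch. The function name is hypothetical; the figures are the ones quoted in the paragraph above.

```python
def effective_response_rate(valid_responses, initial_sample, ineligible):
    """Valid responses divided by the eligible part of the initial sample."""
    eligible = initial_sample - ineligible
    return valid_responses / eligible


# Figures quoted for the 2009 survey: 2,250 sampled, 136 ineligible, 935 valid responses.
rate = effective_response_rate(valid_responses=935, initial_sample=2250, ineligible=136)
print(f"{rate:.1%}")  # 44.2%
```

Ineligible cases are removed from the denominator because those people could never have responded.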
The full survey data set is available for use in the data archive; here you can also find valuable metadata available for the 2009 ISSP social inequality survey.
Accessing the metadata online:
Visit http://webview.nzssds.org.nz/webview
Downloading the teaching data set:
Why SPSS?
There are a number of programmes for survey data analysis. SPSS was chosen for the purposes of this workbook as it provides a good combination of being relatively user-friendly as well as being used in real world research environments. SPSS enables us:
Other options, such as SAS, are less commonly taught in the social sciences, but are more likely to be used in large research organisations. However, once the basics of survey data processing and analysis have been learned in SPSS, it will be easier to transfer those to another system. Some others, such as Minitab, are used for teaching purposes but are not found in real world research settings.
Once you have completed the exercises in this workbook you should similarly be able to analyse any of the data sets in the NZSSDS data archive that interest you. Survey topics on which data are currently available include: attitudes to the role of government, politics, social inequality, work orientations, religion, national identity, citizenship, the environment, leisure time and sports, and health.
You can access such data sets at http://webview.nzssds.org.nz/webview/. There you can navigate the tree interface to the left by clicking on the icons, and access survey metadata, i.e. background information, how the sample was selected, and the questionnaire(s) that the survey involved. The list of variables available can also be viewed, and frequency tables for each of them. The online interface also allows for further analysis, and for data sets to be downloaded.
Where teaching data sets have been created, these are freely available for download; for other data sets, you generally need to apply for further access.
If you are new to SPSS, a detailed tutorial is available using another ISSP survey, the Family and Changing Gender Roles Survey. That workbook not only introduces you to the SPSS software package for analysing survey research data, but also includes detailed instructions on univariate and bivariate analysis, analysis of interval variables, and how best to present your findings. It is available from the NZSSDS website by following the link http://www.nzssds.org.nz/node/53.
Univariate analysis is a simple form of statistical analysis that involves only a single variable. There are two main ways of analysing univariate data: a numeric method and a graphic method. The numeric method uses descriptive statistics to summarise the main features of the data in table form, while the graphic method uses various graphs and charts to visualise the main aspects of the variable.
Univariate analysis is used mainly for descriptive purposes, and most commonly involves frequency tables, graphs and descriptive statistics.
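To make the idea of a frequency table concrete, here is a minimal Python sketch that counts each category of a single variable and converts the counts to percentages. The responses are invented for illustration, and `frequency_table` is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

# Hypothetical responses to one categorical question (not survey data).
responses = ["Agree", "Disagree", "Agree", "Neutral",
             "Agree", "Disagree", "Agree", "Neutral"]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


table = frequency_table(responses)
for category, n, pct in table:
    print(f"{category:<10}{n:>5}{pct:>8.1f}%")
```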
We can use univariate analysis to find out specific information about each variable, such as its range, median, mean, skewness and standard deviation.
Descriptive statistics are used to provide a brief summary of the basic features and information on the chosen variable, including summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of the data.
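As a rough illustration of what these descriptive statistics measure, the sketch below computes them for a small set of made-up values using Python's standard `statistics` module. The data and the `describe` helper are assumptions. The skewness line uses one common adjusted sample formula, so treat its exact value as indicative only; other software may use a different formula.

```python
import statistics

# Hypothetical values for one numeric variable (not survey data).
values = [2, 4, 4, 4, 5, 5, 7, 9]


def describe(data):
    """Return a few common descriptive statistics for a list of numbers."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    # Adjusted sample skewness; a positive value means a longer right-hand tail.
    skewness = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / sd) ** 3 for x in data)
    return {
        "n": n,
        "range": max(data) - min(data),
        "median": statistics.median(data),
        "mean": mean,
        "std_dev": sd,
        "skewness": skewness,
    }


summary = describe(values)
for name, value in summary.items():
    print(name, value)
```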
Click on the Analyze menu
Select Descriptives under the Descriptive statistics menu - this will open the Descriptives window. Select your variable of interest; this can be done in the list of variables to the left, by simply scrolling until you have found the intended variable. In this instance we will use Q4a Actually earn: How much do you think a doctor in general practice earns. Then click the icon; this will add your selected variable to the Variable(s) panel.
Click on the Options button on the right - this will display the various descriptive statistics that are available. Check the boxes of the statistics you would like to view for your variable, then click Continue.
Click OK to finish. - The Output window will automatically open, and display your statistics.
Graphs provide a simple and easy way to visually represent the key information relating to a variable. This workbook will show you how to represent your data using two common graphs, a bar graph and a pie graph.
Click on the Graphs menu
Select Chart builder
Click on the gallery tab- the gallery includes many predefined charts organised by type.
Select the Bar option- icons representing the various types of bar graphs available will appear in the panel.
For a simple bar graph select the icon and drag it onto the “canvas”, the large area above the gallery. You should now see a preview of a bar graph on the “canvas”.
To add your analysis variable to the graph, select your variable of interest - this can be done in the variables list to the left, by simply scrolling until you have found the intended variable. In this instance we shall use R: Religious denomination [RELIG].
After selecting your variable, drag it into the blue box labelled X-Axis? in the “canvas”.
The Y-Axis? box should default to Count. To change the unit of measurement for the Y-Axis, open Element Properties; under Edit the properties of: select Bar1, and open the drop-down menu for statistics. Here you can define the measure you want on the Y-Axis. For example, to view the proportion of the sample that falls into each religious denomination, select the option Percentage - do this now. Then click Apply; percentage will now be shown on the Y-Axis of the “canvas”.
To finish click OK - The Output window will automatically open, and display your completed chart.
In the output window double click on your bar chart- this will open the chart editor.
Under the options menu click on the Title option- this will insert a text box at the top of your chart, where you can write the title of your graph. It will also open a properties menu; here you can change the aesthetics of your title, including size, font colour and orientation.
In the chart editor right click on your chart and open the Properties window- from here you can change the background and bar colour.
To change the background colour, simply click on the Fill and border tab. Choose your colour, then click Apply.
To change your bar colour, keep the Properties window open and click on the bars in the editor - this will highlight the bars. Choose your colour, then click Apply.
Click on the Graphs menu
Select Chart builder
Click on the gallery tab- the gallery includes many predefined charts organised by type.
Select the Pie/Polar option - icons representing the various types of pie charts available will appear in the panel.
For a simple pie chart select the icon and drag it onto the “canvas”, the large area above the gallery. You should now see a preview of a pie chart on the “canvas”.
To add your analysis variable to the graph, select your variable of interest - this can be done in the variables list to the left, by simply scrolling until you have found the intended variable. In this instance we shall use Q6d The government should spend less on benefits for the poor [V35].
After selecting your variable, drag it into the blue box labelled Slice by? in the “canvas”.
The Angle variable? box should default to Count. To change the unit of measurement for the Angle variable, open Element Properties; under Edit the properties of: select Polar-Interval1, and open the drop-down menu for statistics. Here you can define the measure used to determine the pie sections. For example, to view the proportion of the sample giving each response, select the option Percentage - do this now. Then click Apply; percentage will now be shown in the Angle variable? box on the “canvas”.
To finish click OK - The Output window will automatically open, and display your completed chart.
In the output window double click on your pie chart- this will open the chart editor.
Right click on your chart, then select Properties Window - from this menu you can change the look of your graph, including the colour, arrangement of variables, and various effects.
Right click on your chart and select Show Data Labels - this will automatically label your pie with the percentage associated with each slice, and open a properties menu allowing you to edit various aspects of your chart.
For example, to change the position of your labels, choose the Custom option under Label Position and select the left image. This will move your labels outside the pie. Click Apply.
Your final chart should look like this.
Bivariate analysis is one of the simplest forms of quantitative analysis. It involves analysing two variables together to determine whether an empirical relationship exists between them. This analysis can reveal whether an association exists, how strong it is, and whether there is a difference between the variables.
There are various methods we can use to perform bivariate analysis. For the purposes of this workbook we will use fairly common and simple techniques to guide you through using SPSS for bivariate analysis. The workbook includes instructions on how to perform bivariate analysis using graphical methods, including 3-D bar graphs, boxplots and scatterplots, as well as numeric methods, including crosstabulations with Chi-square analysis and one-way ANOVA.
Using graphs and charts is an easy way to visually represent the relationships and patterns that may be present between two variables. Here we will use three graphical methods for bivariate analysis. A 3-D Bar graph - used to show the relationship between two categorical variables; a boxplot - for the relationship between one categorical and one continuous variable; and a scatterplot - for analysing two continuous variables.
Click on the Graphs menu
Select Chart builder
Click on the gallery tab – the gallery includes many predefined charts organised by type.
Select the Bar option – icons representing the various types of bar graphs available will appear in the panel.
For a simple 3D bar graph select the icon and drag onto the “canvas” or large area above the gallery. You should now see a preview of a bar graph on the “canvas”
For a 3D bar graph you need to select two categorical variables of interest.
To select your variables of interest, simply scroll through the list of variables in the panel on the left until you have found the variables you wish to use.
Once you have selected your first variable, drag it into the blue box labelled X-Axis? in the “canvas” panel.
Do the same for your second variable, but drag it into the blue box labelled Z-Axis? in the “canvas” panel.
The Y-Axis should default to count.
For this demonstration we will use the two variables:
"Q17 About how many books were there around your family’s house when you were 15 years old?" on the Z-Axis and
"Q20 Which social class would you say you belong to?" on the X-Axis
To change the unit of measurement for the Y-Axis, open Element Properties; under Edit the properties of: select Bar1, and open the drop-down menu for statistics. Here you can define the measure you want on the Y-Axis. For example, to view the proportion of the sample that falls into each combination of social class and number of books, select the option Percentage - do this now. Then click Apply; percentage will now be shown on the Y-Axis of the “canvas”.
To finish click OK – The Output window will automatically open, and display your completed chart.
In the output window double click on your graph - this will open the chart editor.
Right click on your chart, then select 3-D Rotation – this will open the 3-D rotation tool; to rotate simply click and drag your graph and the image will rotate around its central axis.
Click on the Graphs menu
Select Chart builder
Click on the gallery tab – the gallery includes many predefined charts organised by type.
Select the Boxplot option – icons representing the various types of boxplots available will appear in the panel.
For a simple boxplot select the icon and drag it onto the “canvas”, the large area above the gallery. You should now see a preview of a boxplot on the “canvas”.
For a boxplot we need to select two variables: a categorical variable for the X-Axis and a continuous variable for the Y-Axis. To select your variables use the variables list on the left, by simply scrolling until you have found the intended variables.
In this instance we shall use:
R: Marital status [MARITAL] as the categorical variable. After selecting the variable, drag it into the blue box labelled X-Axis? in the “canvas”.
R: Age as the continuous variable. Drag it into the blue box labelled Y-Axis? in the “canvas”.
To finish, click OK. The Output window will automatically open, and display your completed chart.
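A boxplot is drawn from a handful of summary values for each group: the minimum, lower quartile, median, upper quartile and maximum. The Python sketch below computes these for two invented groups of ages. The data and the `five_number_summary` helper are assumptions, and quartile conventions differ between packages, so the quartiles SPSS draws may not match these exactly.

```python
import statistics

# Hypothetical ages for two marital-status groups (not survey data).
ages_by_status = {
    "Married": [34, 45, 52, 58, 61, 70],
    "Never married": [19, 22, 25, 31, 40],
}


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


summaries = {status: five_number_summary(ages) for status, ages in ages_by_status.items()}
for status, summary in summaries.items():
    print(status, summary)
```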
Click on the Graphs menu
Select Chart builder
Click on the gallery tab- the gallery includes many predefined charts organised by type.
Select the Scatter/Dot option - icons representing the various types of scatterplots available will appear in the panel.
For a simple scatterplot, select the icon and drag it onto the “canvas”, the large area above the gallery. You should now see a preview of a scatterplot on the “canvas”.
To add your analysis variables to the plot, select your variables of interest - this can be done in the variables list to the left, by simply scrolling until you have found the intended variables.
In this instance we shall use:
"Q4b Actually earn: How much do you think a chairman of a large national corporation earns?" as our first variable. Drag it into the blue box labelled Y-Axis? in the “canvas”.
"Q5b Should earn: How much do you think a chairman of a large national corporation should earn?" as our second variable. Drag it into the blue box labelled X-Axis? in the “canvas”.
To finish, click OK. The Output window will automatically open, and display your completed chart.
Numeric methods for analysing data involve using statistical tests to evaluate a hypothesis. They provide a more in-depth and flexible way of analysing the data, in a way that is specific and meaningful to you.
A crosstabulation is a joint frequency distribution based on two categorical variables. It allows you to explore the relationship between two variables, and form simple conclusions.
A crosstabulation can be analysed using the Chi-square statistic. In this workbook we will use the Chi-square test for independence to investigate whether distributions of the categorical variables differ from each other, i.e. to see if there is a relationship between the two variables. The null hypothesis for the Chi-square test for independence is H0: The two variables are independent; the alternative hypothesis is H1: The two variables are not independent.
Analysis of Variance (ANOVA) is perhaps the most commonly used technique for drawing comparisons between the means of various groups of data. One-Way ANOVA consists of one measurement (continuous) variable and one nominal (categorical) variable. Multiple observations of the continuous variable are made for each factor level of the categorical variable. The test involves calculating the mean of the observations within each group of the categorical variable, then comparing the variance amongst these means to the average variance within each group. The null hypothesis for One-Way ANOVA is H0: all the underlying means are the same; the alternative hypothesis is H1: at least two underlying means are different.
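The comparison of between-group and within-group variation described above can be sketched in Python. This is a minimal illustration on invented numbers, not a substitute for the SPSS output; the group labels and the `one_way_anova_f` helper are assumptions, and the sketch stops at the F statistic and its degrees of freedom rather than computing a p-value.

```python
import statistics

# Hypothetical observations of a continuous variable in three groups (not survey data).
groups = {
    "Group A": [4.0, 5.0, 6.0],
    "Group B": [6.0, 7.0, 8.0],
    "Group C": [9.0, 10.0, 11.0],
}


def one_way_anova_f(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (statistics.fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual observations around their own group mean.
    ss_within = sum(
        (v - statistics.fmean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


f_statistic, df_between, df_within = one_way_anova_f(groups)
print(f_statistic, df_between, df_within)
```

A large F statistic means the group means differ by more than the spread within groups would lead you to expect.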
Click on the Analyze menu
Under the Descriptive statistics option, select Crosstabs
For a crosstabulation, you need to select two variables of interest to compare. To select your variables of interest, simply scroll through the list of variables in the panel on the left until you have found the variables you wish to use.
Once you have selected your first variable, click the top icon to add this variable to the Row(s) panel.
Do the same for your second variable and again click on the second icon to add your second variable to the Column(s) panel.
In this instance we will use:
"Q1i Getting ahead: How important is a person’s race?" as the row variable and
"Q1j Getting ahead: How important is a person’s religion?" as the column variable.
Click the Cells button on the right - this will open the Crosstabs: Cell Display window. Select Column under the heading Percentages, then click Continue.
Click OK to finish. The Output window will automatically open, and display your completed Crosstabulation.
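To see what the column percentages in a crosstabulation represent, here is a minimal Python sketch. Each pair is one invented respondent's (row variable, column variable) answers; the data and the `crosstab_column_percent` helper are assumptions.

```python
from collections import Counter

# Hypothetical (row answer, column answer) pairs, one per respondent (not survey data).
respondents = [
    ("Important", "Important"),
    ("Important", "Important"),
    ("Important", "Not important"),
    ("Not important", "Important"),
    ("Not important", "Not important"),
    ("Not important", "Not important"),
    ("Not important", "Not important"),
    ("Not important", "Not important"),
]


def crosstab_column_percent(pairs):
    """Return {(row, column): percent of that column's respondents}."""
    cell_counts = Counter(pairs)
    column_totals = Counter(column for _, column in pairs)
    return {
        (row, column): 100 * n / column_totals[column]
        for (row, column), n in cell_counts.items()
    }


table = crosstab_column_percent(respondents)
for cell, pct in sorted(table.items()):
    print(cell, round(pct, 1))
```

Within each column the percentages add up to 100, so you read a column-percentage table by comparing across the columns.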
Click on the Analyze menu.
Under the Descriptive statistics option, select Crosstabs.
To select your variables of interest, simply scroll through the list of variables in the panel on the left until you have found the variables you wish to use.
Once you have selected your first variable, click the top icon to add this variable to the Row(s) panel.
Do the same for your second variable and again click on the second icon to add your second variable to the Column(s) panel.
In this instance we will use R: Sex [SEX] as the row variable and
R: Marital status [MARITAL] as the column variable.
Click on the Statistics icon, on the right. This will open the Crosstabs: Statistics window.
Select the Chi-square option at the top, and then click Continue.
Click the Cells icon, on the right. This will open the Crosstabs: Cell Display. Select the Expected option, and then click Continue.
Click OK to finish. The Output window will automatically open, and display your completed Chi-square.
The Chi-square test for independence tests the null hypothesis H0: The two variables are independent against the alternative hypothesis H1: The two variables are not independent.
Firstly we need to be sure that our two-way table of counts passes the assumptions required for the Chi-square test. These assumptions are:
We can see that under our output table SPSS has produced a summary for us. This summary indicates that only 1 of our cells (8.3%) has an expected count less than five, with the minimum expected count being 4.45, and so in this instance our table does not violate any of the assumptions.
From the Chi-square output above we see that the p-value is .001. At the 5% level of significance, this gives us very strong evidence against the null hypothesis, in favour of the alternative hypothesis that the two variables are not independent.
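The quantities behind this output (expected counts, the Chi-square statistic and its degrees of freedom) can be sketched in Python. The 2 x 3 table below is invented and is not the Sex by Marital status table from the survey; `chi_square_independence` is a hypothetical helper. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability is exp(-x/2); a table of any other size needs statistical tables or software.

```python
import math

# Hypothetical 2 x 3 table of observed counts (rows x columns); not survey data.
observed = [
    [30, 20, 10],
    [20, 30, 40],
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_independence(observed)

# Summary of the kind SPSS prints under the table: cells with expected count below 5.
cells = [e for row in expected for e in row]
share_below_5 = 100 * sum(e < 5 for e in cells) / len(cells)
min_expected = min(cells)

# Closed-form p-value, valid only because this table has df = 2.
p_value = math.exp(-statistic / 2)

print(statistic, df, share_below_5, min_expected, p_value)
```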
Click on the Analyze menu
Under the Compare Means option, select One-Way ANOVA.
To select your variables of interest, simply scroll through the list of variables in the panel on the left until you have found the variables you wish to use.
Once you have selected your quantitative (dependent) variable, click the top icon to add this variable to the Dependent List. In this instance we will use R: Age [AGE].
Do the same for your categorical (factor) variable, and click on the second icon to add it to the Factor panel. In this instance we will use the variable R: Marital status [MARITAL].
Selecting the relevant outputs:
Click on the Post Hoc icon, on the right. This will open the One-Way ANOVA: Post Hoc window.
Select the Tukey option under the Equal Variances Assumed heading, and then click Continue.
Click Options, then select the Descriptives option under the Statistics heading, then click Continue.
Click OK to finish. The Output window will automatically open, and display your completed One-Way ANOVA.
For the dependent variable (AGE), the descriptive output gives the sample size, mean, standard deviation, minimum, maximum, standard error, and confidence interval for each level of the independent variable. In our example, 584 respondents were married, and their mean age was 54, with a standard deviation of 15.391.
The ANOVA table is used to test the overall significance of the model, testing the null hypothesis H0: µ(Married) = µ(Widowed) = ... = µ(NA, refused) against the alternative hypothesis H1: at least two means are different.
In this example the p-value is shown as .000 (that is, less than .001), which is significant at the 5% level of significance. We have strong evidence against the null hypothesis, in favour of the alternative.
The Multiple comparisons output gives the results of the Post-Hoc tests (for the Tukey multiple comparisons). This output compares each level of the independent variable to all the other levels of the independent variable.
Consider the first row (Married). This tells us that the mean age of married respondents is 18.441 years lower than that of widowed respondents. The third column measures the significance of this comparison via a p-value (any value less than 0.05 is considered significant at the 5% level of significance). In this instance the p-value is .000, indicating that the difference in mean age between married and widowed respondents is statistically significant. The final two columns give the 95% confidence interval. This interval of -24.46 to -12.42 indicates that, with 95% confidence, we estimate that the mean age of married people is between 12.42 and 24.46 years less than the mean age of widowed people.
This workbook is an introductory level guide to analysing New Zealand Election Study (NZES) data sets held by the New Zealand Social Science Data Service (NZSSDS). Aimed at postgraduate students with little or no previous experience in survey data analysis, the workbook progresses through examples of basic analyses of the 2008 NZES, using the SPSS/PASW statistical software package.
The aim is to provide students with an introduction to:
Examples and exercises are provided in each section of the workbook to achieve these aims.
The New Zealand Election Study (NZES) runs a recurring national random sample survey. It has been carried out in each of the seven New Zealand government election years since 1990. The survey enables monitoring of voting behaviour, which over this period includes the significant shift in the electoral system from First Past the Post (FPP) to Mixed Member Proportional (MMP). The surveys also collect information on public attitudes to politics and a range of issues.
There is a core set of questions common to all of the NZES surveys, enabling monitoring of trends over time; and there are also some unique/different questions at each election to capture public perceptions of issues relevant at the time.
The 2008 election was held on November 8. The result was a change from a Labour-led to a National-led government. The sample size was 3,042, with an oversample of 643 from the seven Māori electorates. A sample of this size has a sampling error of around +/–1.75%. Part of the sample was a panel of respondents who also completed the 2002 and 2005 surveys (n=948). The response rate for the remainder of the sample was 40%.
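As a rough check on the quoted sampling error, the sketch below applies the usual simple-random-sample formula for a proportion at the 95% confidence level, using the most conservative case of p = 0.5. The function name is hypothetical, and the formula ignores the oversample and weighting, so it only approximates the published figure. For n = 3,042 it comes out a little under 1.8%, close to the 'around 1.75%' quoted above.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(3042)
print(f"{moe:.2%}")
```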
All of the NZES surveys contain weighting variables which need to be applied in order to reflect properly the makeup of the population. Survey sampling strategies often intentionally include oversamples of specific populations of interest, e.g. Māori, to better represent their views. This means that you no longer have a random sample of the population, but a biased one. Weighting variables are then used to adjust the proportions on a number of (typically) demographic variables, to bring them into line with those from the census population data. Instructions on applying weighting variables are provided throughout this workbook. For the 2008 NZES data set, the weighting variable ZZWT6 is the one you need to use.
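The effect of a weighting variable can be illustrated with a small Python sketch. Each respondent carries a weight, and a weighted percentage adds up weights instead of counting people. The answers and weights below are invented (they are not ZZWT6 values), and `weighted_percentages` is a hypothetical helper.

```python
# Hypothetical (answer, weight) pairs; over-sampled respondents carry weights below 1.
respondents = [
    ("Yes", 0.5), ("Yes", 0.5), ("Yes", 0.5), ("No", 0.5),
    ("Yes", 1.5), ("No", 1.5), ("No", 1.5), ("No", 1.5),
]


def weighted_percentages(rows):
    """Return {answer: percent}, where each respondent counts as their weight."""
    totals = {}
    for answer, weight in rows:
        totals[answer] = totals.get(answer, 0.0) + weight
    grand_total = sum(totals.values())
    return {answer: 100 * total / grand_total for answer, total in totals.items()}


# Giving every respondent a weight of 1 reproduces the unweighted percentages.
unweighted = weighted_percentages([(answer, 1.0) for answer, _ in respondents])
weighted = weighted_percentages(respondents)
print(unweighted)
print(weighted)
```

In this made-up example the unweighted split is 50/50, but after weighting 'Yes' falls to 37.5%, because most of the 'Yes' answers came from the over-sampled group.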
Data sets from the surveys are available for download through NZSSDS. In addition, we have available a range of supporting documentation (metadata) detailing the survey methodology, research team, questionnaires, publications and variables.
The topic areas covered in the 2008 survey were as follows:
A. Personal interest in politics and the campaign
B. The election campaign and media
C. Issues, problems, parties and leaders
D. Opinions on a range of policy issues
E. Party preferences and voting behaviour
F. Attitudes to government and the electoral system
G. Representation and participation (attitudes and behaviour) - this section also has variables ZOLDELG and ZOLDELM, which enable analysis by electorate
H. Demographic characteristics of survey respondents.
After section H, several variables are included for defining administrative areas such as territorial local authorities, regional councils, district health boards, and census meshblocks to allow customised analysis by fine level geographic area.
You can access the teaching data set to accompany this workbook from NZSSDS WebView. There you can navigate the tree interface to the left by clicking the icons, and access survey metadata, i.e. background information, how the sample was selected, and the questionnaire(s) that the survey involved. The list of variables available can also be viewed, and frequency tables for each of them. The online interface also allows for further analysis, and for data sets to be downloaded.
Where teaching data sets such as this one have been created, these are freely available for download; for other data sets, you generally need to apply for further access.
There is a variable to separate out Māori roll from general roll respondents (ZMAOREL). There are separate variables for each individual ethnic group, as multiple responses were allowed, i.e. someone could tick more than one ethnicity and be recorded once in each group. There is also a variable that attempts to summarise 'with which ethnicity do you identify most?' But if you want to include all the responses, you can use the multiple responses procedure in SPSS, which will be outlined later in Module 2.
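Because respondents could tick more than one ethnicity, the counts across groups add up to more than the number of respondents. The Python sketch below shows the idea on invented data. The respondents and the `multiple_response_table` helper are assumptions; this is not the SPSS multiple responses procedure itself.

```python
from collections import Counter

# Hypothetical respondents; each set holds every ethnicity that person ticked.
respondents = [
    {"European"},
    {"Māori", "European"},
    {"Māori"},
    {"Pacific"},
    {"European", "Asian"},
]


def multiple_response_table(rows):
    """Return {label: (count, percent of respondents)}; percents can exceed 100 in total."""
    counts = Counter(label for row in rows for label in row)
    n_respondents = len(rows)
    return {label: (n, 100 * n / n_respondents) for label, n in counts.items()}


table = multiple_response_table(respondents)
for label, (n, pct) in sorted(table.items()):
    print(label, n, pct)
```

Here five respondents give seven responses in total, so the percentages of respondents sum to 140%.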
While the full survey data set is available in the data archive, a smaller more manageable subset of questions has been created for teaching purposes.
Some of the earlier data sets, e.g. 2002 and 1999, collected data by both mail and phone, and some questions were only asked in the mail survey, e.g. QE9a on whether MMP is fairer than FPP. This left a large number of observations coded with a value of '817' in 2002 and '887' in 1999. In order to find out what these mean, you need to browse the appropriate survey metadata in Nesstar on the NZSSDS website. Documentation is not complete for 2002, but there is some information about phone and mail survey responses.
Click to view the metadata for the NZES 2008 teaching data set. You can read the full extent of background information, and if you scroll further down you will find links to questionnaires and study description files for further details.
Click to download the NZES 2008 teaching data set. Clearly for use with this workbook you will want to set the file format as SPSS. Then click Download.
Log in to download the data set. It will download as a zip file. You should be able to open the data set from within that - the file nz.org.nzssds.ddi.00015-t_F1.sav.
SPSS enables us:
SPSS was chosen for this workbook as it is relatively user-friendly and is used in real world research environments. Once the basic processes of survey data processing and analysis have been learned in SPSS, it is easier to transfer those to another system, such as SAS or Stata.
The first step in doing survey research is to decide what it is you want to know. What questions do you want to ask of this data set? To do that you need to become familiar with the scope of the information in the data set. There are two ways to do this.
Looking at the actual questionnaire for the 2008 New Zealand Election Study, you will see there are eight groups of questions (A–H) covering topics from the campaign to issues to attitudes to the role of government and the electoral system itself, as outlined in the background section above. It is a very large survey with over 400 questions or variables, which produce hundreds of pages of output if a frequency procedure is run for all of them. If you are working with the full data set, it is recommended that you produce frequency tables for the variables in one section at a time. In particular, the demographic section contains very detailed data, for example meshblock area codes and links to panel data from the 2005 and 2002 surveys. The data set also contains a number of open questions, particularly in sections C and G, which produce textual response data for each respondent. These data can be analysed qualitatively, but have also been thematically coded for quantitative analysis.
Ideally every question in the survey would be analysed by those who carried out the survey. However, the purpose of making these data sets available for others to use is so they can come in with specific research questions that might pertain only to one or some topics. For example, someone researching in the area of electoral campaigns might just focus on sections A & B along with some demographics from section H and some voting behaviour from section E. Another researcher might be looking for information on attitudes and hypothetical behaviour regarding MMP (section F). Yet another might be interested in what issues have been identified as most important by voters (section C).
For the purposes of this workbook we have created a more manageable teaching data set with a smaller number of variables in order to illustrate specific analytical and data processing procedures. We have selected questions on the topic of attitudes to social welfare and the government's role in that, attitudes to MMP and the Maori seats, voting behaviour, and some key demographics - gender, age, occupation, education, income and ethnicity.
The first group of questions from section D are about attitudes to a range of social issues and the role of the government versus personal responsibility. The second group of questions, from Section E are about people's attitudes to MMP and voting behaviour in 2008 and 2005. There are also some questions from section F relating to MMP.
Just five questions from section G are included: G10 on the future of Maori seats, G4a&b on parents' party preferences, and electorates for general (zoldelg) and Maori rolls (zoldelm). The final group of variables from section H is a selection of demographic variables such as age, gender, educational qualification, income, occupation and ethnicity.
The second stage in developing research questions arises from examining the findings for each question - we call these the univariate or single variable findings. Initially you may be guided by your interest in particular sections of the questionnaire, but once you see the findings on each question your focus may shift. For example, you may find an interesting or unexpected pattern of agreement or disagreement with some questions.
At this point you may also start raising further questions, such as "I wonder how attitudes to this issue vary by party vote or socioeconomic status?" or "I wonder if attitudes to this are changing over time - do younger people think differently about this to older people?" or "How did this finding change between the 2008 survey and earlier surveys?" or "What is the relationship between parents' voting preference and voters' party vote?"
Data analysis is about describing and interpreting what your data are telling you. This involves reducing the large array of data to manageable summaries, and is done by:
Most of the variables in the NZES data sets, and in social sciences surveys in general, are ordinal in nature; that is, questions require categorical rather than numerical responses, but the categories have a sense of order from high to low. A clear example is age group - you can say that one is older than another, but you cannot measure the actual distance between them as you can with individual years of age. Opinion questions also have ordinal responses, usually a range from 'Strongly Agree' to 'Strongly Disagree'. We will be analysing several of these later on. There is also a number of open questions, particularly in section C on identifying issues of importance to voters, which require qualitative analysis.
Once inside SPSS, there are two windows available:
Data, which has two different views
Output, which appears when you instruct SPSS to carry out a process on the data, shows the results in a separate window, which you can save or export to another format.
As an example, in variable view, scroll down the label column to find D10a to D10k. These are from QD10 in the questionnaire, which asks respondents to rate their agreement or disagreement on a range of socioeconomic issues. For example, click on D10h: Welfare benefits make people lazy and dependent - if you click the '...' button in the values column of this row, you can see that the possible responses range from '1' for strongly agree to '5' for strongly disagree. Note in the first column that the shortform name for this variable is zwlflaz.
Now click on data view and scroll along the columns until you find zwlflaz - each row has the numerically coded response for each respondent to that question, e.g. the first person answered '3' = neither agree nor disagree, the second person answered '2' = agree, the third person '5' = strongly disagree, and so on. You can confirm this by clicking the view menu and selecting Value labels.
To find out how many people agreed or disagreed with this statement, you need to produce a frequency table.
Click on the Analyze menu and select Descriptive Statistics
Click on Frequencies...
Choose the variable(s) you want to summarise
Click variables individually to highlight then click on the arrow to add them to the Variable box working list.
OR to select a range of variables, click on the first one, hold down the shift key, scroll down to the last one you want and click it to highlight the whole range, then click the arrow to move the lot into the working list in the variable box.
For example, click on D10a Remove treaty references, hold down the shift key, scroll down to D10h Welfare makes people lazy and click it. Then click OK.
Note: You can resize this frequencies dialog and most other windows in SPSS by clicking and dragging from one of the corners of the window. This can be very useful for seeing full variable labels.
You will now have an Output window displaying the frequency tables you have created. You can look through the output using the right-hand scrollbar or arrows. Examples of these tables are shown below. The output window may also show a warning about the weighting variable that has been applied - essentially, any records with negative or missing weights are excluded from analyses automatically, so you do not need to worry about it.
Click on the File menu, and then click Print...
It is a good idea to first scroll through the output listings, and look at the Print Preview, to check that you are going to print what you want to print, and in a sensible layout.
If you click on a table heading or Title in the left pane of the Output window and then click on the Insert menu, you can put a page break before a given item of output to ensure that the heading and table print on the same page.
If you want to print all the output, click again in the white space below the title before you go to print, or SPSS will interpret that you only want to print that one heading. To make sure of what you are printing, and how many pages it will be, click the File Menu and then Print Preview. If there is output that is too wide, you may want to change the page orientation to landscape under File menu, Page Setup.
Data analysis is about describing and interpreting what the data is telling you. This often involves reducing a lot of detail into manageable summaries, by collapsing data into categories or values, adding them together, and converting to proportions.
Following is a frequency table, or single variable printout, of the responses to the question of whether people think that welfare benefits make people lazy and dependent.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Strongly agree | 710 | 26.3 | 26.9 | 26.9 |
| | Agree | 928 | 34.4 | 35.2 | 62.1 |
| | Neutral | 463 | 17.1 | 17.5 | 79.6 |
| | Disagree | 355 | 13.2 | 13.5 | 93.1 |
| | Strongly disagree | 127 | 4.7 | 4.8 | 97.9 |
| | DK | 55 | 2.0 | 2.1 | 100.0 |
| | Total | 2638 | 97.7 | 100.0 | |
| Missing | System | 62 | 2.3 | | |
| Total | | 2700 | 100.0 | | |
Univariate table layout
1st column: the range of possible responses or values or categories for that question
2nd column: the frequency or number of people in each category
3rd column: the number of people in each category as a percentage of the total sample
4th column: the valid percent is the number of people in each category as a percentage of those who answered that question
5th column: the cumulative percent, adding up the valid percentages for each category so far.
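The column definitions above can be checked by recomputing them from the frequencies. The following is a minimal Python sketch, not part of the SPSS workflow; the function name `frequency_table` is an illustrative assumption, and the counts are copied from the D10h table. Recomputed figures may differ from the SPSS printout by 0.1 in places, presumably because the printed frequencies are rounded weighted counts.

```python
# Frequencies copied from the D10h table; 'missing' is the System missing row.
counts = {
    "Strongly agree": 710,
    "Agree": 928,
    "Neutral": 463,
    "Disagree": 355,
    "Strongly disagree": 127,
    "DK": 55,
}
missing = 62


def frequency_table(counts, missing):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent)."""
    valid_total = sum(counts.values())    # those who answered the question
    sample_total = valid_total + missing  # the whole sample
    rows = []
    running = 0
    for label, n in counts.items():
        running += n
        rows.append((
            label,
            n,
            round(100 * n / sample_total, 1),       # percent of total sample
            round(100 * n / valid_total, 1),        # valid percent
            round(100 * running / valid_total, 1),  # cumulative valid percent
        ))
    return rows


for row in frequency_table(counts, missing):
    print(row)
```

The only difference between the percent and valid percent columns is the denominator: the whole sample (2700) for the first, and those who answered (2638) for the second.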
A similar question asked people why they thought there are people in New Zealand who live in need. Looking at the table below you can see there is now a bigger proportion of missing data, and consequently more of a difference between the percentages for the total sample, and the valid percent based on those who answered the question, but it is still not making a substantial difference. The overall pattern of distribution of responses remains similar.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Lazy, etc. | 962 | 35.6 | 38.2 | 38.2 |
| | Unfair Society | 546 | 20.2 | 21.7 | 59.9 |
| | Neither / DK | 1010 | 37.4 | 40.1 | 100.0 |
| | Total | 2518 | 93.3 | 100.0 | |
| Missing | System | 182 | 6.7 | | |
| Total | | 2700 | 100.0 | | |
It is always important to look at the missing data figures. If the missing data substantially affects the percentage outcomes, this needs to be reported, and a decision needs to be made about which percentage to report. Are you interested in knowing just the percentages among those who answered a question, or the breakdown for the whole sample, including the percentage who did not respond?
Sometimes there is a valid reason for a high missing figure - check the questionnaire, as it may be that only those who answered the previous question a certain way were eligible to answer the next question.
Generally the missing data is small and does not substantially affect the percentage findings, as in the two examples below, so it would not matter which column of percentages you use. But you must be clear in describing your findings whether you are reporting them for the total sample, or just those who answered each question. As a rule of thumb, work with the valid percent column (see De Vaus, 2002, p.212, for more information) noting where there is a high non-response, and any apparent explanation for it, such as being a contingency question - only those answering previous questions a specified way are eligible. Otherwise it may be an indication that the survey population does not have clear views on the question.
The range of response options for many of the questions in the survey are 4- or 5-point scales, for example, from strongly agree to strongly disagree, plus a 'don't know' option, so the first example is based on this format.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Strongly agree | 710 | 26.3 | 26.9 | 26.9 |
| | Agree | 928 | 34.4 | 35.2 | 62.1 |
| | Neutral | 463 | 17.1 | 17.5 | 79.6 |
| | Disagree | 355 | 13.2 | 13.5 | 93.1 |
| | Strongly disagree | 127 | 4.7 | 4.8 | 97.9 |
| | DK | 55 | 2.0 | 2.1 | 100.0 |
| | Total | 2638 | 97.7 | 100.0 | |
| Missing | System | 62 | 2.3 | | |
| Total | | 2700 | 100.0 | | |
Now you can start analysing this table.
Questions to ask of the Data
First look down the valid percent column to see how the responses are distributed across the range of response categories. Look for the highest figure, and the lowest figure. Look for patterns in the way responses are spread. Are they skewed towards one end of the 5-point agreement scale, or evenly spread across all categories, or do they perhaps form a normal curve with most in the middle and a few at either end?
The largest group here are in the 'agree' category (35.2%), and the next largest group are those who 'strongly agree' (26.9%). There are then smaller proportions in the neutral and disagree categories, and a very few who strongly disagree or don't know.
Sometimes to simplify the analysis it is helpful to collapse categories into "for" and "against". For example, adding the two agreement categories (62.1%) and the two disagreement categories (18.3%) shows that there are more than three times as many in agreement as in disagreement with the statement that welfare benefits make people lazy and dependent.
Note that you need to refer to the questionnaire for the actual question wording when reporting findings, as an abbreviated form is used in SPSS printouts. Collapsing categories is also useful when there are very few responses in some categories.
Finally, 17.5% are neutral on this issue (and a further 2.1% don't know). To sum up, over half of New Zealanders think that welfare benefits make people lazy and dependent, but the rest don't necessarily disagree: they are just as likely to be neutral or to have no opinion on the issue (19.6%) as to disagree (18.3%).
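The collapsing arithmetic described above can be written out as a short Python sketch. The valid percentages are copied from the D10h table; the grouping labels and the `collapse` helper are illustrative assumptions rather than anything produced by SPSS.

```python
# Valid percentages copied from the D10h frequency table.
valid_percent = {
    "Strongly agree": 26.9,
    "Agree": 35.2,
    "Neutral": 17.5,
    "Disagree": 13.5,
    "Strongly disagree": 4.8,
    "DK": 2.1,
}

# Which collapsed group each original category belongs to (illustrative labels).
collapse_map = {
    "Strongly agree": "Agreement",
    "Agree": "Agreement",
    "Neutral": "Neutral / DK",
    "DK": "Neutral / DK",
    "Disagree": "Disagreement",
    "Strongly disagree": "Disagreement",
}


def collapse(percentages, mapping):
    """Add up the percentages of all categories mapped to the same group."""
    collapsed = {}
    for label, pct in percentages.items():
        group = mapping[label]
        collapsed[group] = round(collapsed.get(group, 0.0) + pct, 1)
    return collapsed


collapsed = collapse(valid_percent, collapse_map)
print(collapsed)  # Agreement 62.1, Neutral / DK 19.6, Disagreement 18.3
print(round(collapsed["Agreement"] / collapsed["Disagreement"], 1))  # about 3.4
```

Keeping the mapping separate from the data makes it easy to try a different grouping, for example leaving 'DK' as its own category.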
The second example explores a similar issue, that is: why some people in New Zealand are living in need, but has a different range of responses. This time it is not a scale format, but two opposing options: that people are 'poor because of laziness and lack of willpower' which is contrasted with people are 'poor because of an unfair society', and a third 'neither of these/don't know' position.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Lazy, etc. | 962 | 35.6 | 38.2 | 38.2 |
| | Unfair Society | 546 | 20.2 | 21.7 | 59.9 |
| | Neither / DK | 1010 | 37.4 | 40.1 | 100.0 |
| | Total | 2518 | 93.3 | 100.0 | |
| Missing | System | 182 | 6.7 | | |
| Total | | 2700 | 100.0 | | |
Looking at the distribution of responses in the table above shows that the largest group of respondents (40.1%) did not hold either of the two opposing positions on laziness versus unfairness. But almost as many (38.2%) believed need was due to laziness. There was least support for the idea that need results from an unfair society (21.7%) - just over half as much as for each of the other positions.
It can be easier to see patterns of response distribution if you graph the data.
Here is an example for you to try.
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Strongly agree | 867 | 32.1 | 32.7 | 32.7 |
| | Agree | 1106 | 41.0 | 41.7 | 74.5 |
| | Neutral | 386 | 14.3 | 14.6 | 89.0 |
| | Disagree | 188 | 7.0 | 7.1 | 96.1 |
| | Strongly disagree | 63 | 2.3 | 2.4 | 98.5 |
| | DK | 40 | 1.5 | 1.5 | 100.0 |
| | Total | 2650 | 98.1 | 100.0 | |
| Missing | System | 50 | 1.9 | | |
| Total | | 2700 | 100.0 | | |
Questions to ask
SPSS – a computer package for analysing survey research data.
Research questions – define what it is you want to find from the survey data.
Missing values – what is the proportion of non-response to each question - is it large enough to affect the response outcomes - are there any explanations for a high non-response? Report a high non-response and the different outcomes for the total sample compared with just those who did respond to the question.
Univariate analysis – one question or variable - looking for patterns in the data - identifying the range of responses and the distribution of responses.
Collapsing categories – combining categories to clarify patterns in the data - for example, on a 5-point agreement scale combine strongly agree + agree to compare with strongly disagree + disagree. Collapsing categories is also useful when there are very few responses in some categories.
Graphing distribution patterns – often it is easier to see patterns in the distribution of data by presenting it visually using graphs.
As its name suggests, bivariate analysis involves looking at how two variables or questions relate to each other. This can be done by putting them in a two-way table format known in SPSS as a crosstab.
There are a number of demographic variables in Section H of this survey which would provide interesting breakdowns of who holds particular views. We will consider four here which illustrate different processes needed to prepare the data for this type of analysis: gender, age, income/ education/ occupation to represent socioeconomic status and party vote of respondents and their parents.
Gender (H1) is usually relatively easy to use as it has only two main categories: 1=Male and 2=Female. However, 2% (n=56) in this survey are coded as '9'. There is no label for what the code '9' represents; such codes are often used for 'don't know', but this is unlikely for gender. It is possible that it represents those individuals who don't identify as male or female, but we don't know for sure.
For better analysis of differences between men and women, you can restrict analysis to these two groups quite easily using 'select cases' in the Data menu. Fortunately, this has already been done in the teaching data set, with a 'best gender code' variable produced after matching with gender on the electoral roll.
The other demographic variables require some preparation known as 'recoding' to get them into fewer, more manageable categories for analysis in relation to other variables. Or, again, you could use 'Select Cases' - the steps for this are described later on in analysing the variables for parents' party preferences.
Age is known as an interval variable. Interval variables are those whose values are actually measurable and have a real measurable distance, or interval, between values - that is, a quantifiable numerical amount. Examples are age in individual years, or hours spent in household work. They can be used in this form for regression analysis, but for crosstabulation we will probably want to group them into fewer, larger categories. Instructions on how to do this are given in the next section on describing relationships between variables.
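To illustrate what grouping an interval variable into categories involves, here is a minimal Python sketch. The band boundaries and labels are assumptions chosen for illustration (only '50-64' and '65+' appear in the findings reported in this workbook), and the workbook itself does this step with the SPSS recode procedure.

```python
import bisect

# Hypothetical age bands: each number is the first age of the NEXT band.
BAND_STARTS = [30, 50, 65]
BAND_LABELS = ["18-29", "30-49", "50-64", "65+"]


def age_group(age):
    """Return the label of the band that an age in whole years falls into."""
    # bisect_right counts how many band starts are <= age,
    # which is the index of the band the age belongs to.
    return BAND_LABELS[bisect.bisect_right(BAND_STARTS, age)]


ages = [18, 29, 30, 49, 50, 64, 65, 90]
print([age_group(a) for a in ages])
```

Once grouped, age behaves as an ordinal variable: the bands have an order, but the exact distance between two respondents is no longer known.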
Ethnicity is collected as a 'multiple response variable'. This means that people can choose more than one category, for example, Māori and European. Each category is processed as a separate variable, so when you run your frequencies you will have five variables, h21a1 to H21a5: NZ European, NZ Māori, Pacifica, Asian and other.
If you want to do bivariate analysis by ethnicity to see how attitudes and behaviour vary by ethnic affiliation, you need to combine these into one variable. To do this in SPSS you use the Multiple Response procedure in the Analyze menu. An outline and examples are provided at the end of the section on describing the relationship between two variables, after age group analysis.
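The idea behind tabulating a multiple response variable can be sketched in Python as follows. The four respondents are invented purely for illustration, the 0/1 coding of each indicator is an assumption about how the five variables are stored, and the function name is hypothetical; it is not the SPSS procedure itself.

```python
ETHNIC_GROUPS = ["NZ European", "NZ Māori", "Pacifica", "Asian", "Other"]

# Invented example data: one row per respondent, one 0/1 indicator per group.
respondents = [
    [1, 0, 0, 0, 0],  # NZ European only
    [1, 1, 0, 0, 0],  # NZ European and NZ Māori
    [0, 0, 0, 1, 0],  # Asian only
    [0, 1, 1, 0, 0],  # NZ Māori and Pacifica
]


def multiple_response_counts(rows, labels):
    """Count how many respondents ticked each category."""
    counts = {label: 0 for label in labels}
    for row in rows:
        for label, ticked in zip(labels, row):
            counts[label] += ticked
    return counts


counts = multiple_response_counts(respondents, ETHNIC_GROUPS)
n_cases = len(respondents)
n_responses = sum(counts.values())
for label, n in counts.items():
    # Percent of cases can add up to more than 100 because people may tick several boxes.
    print(label, n, round(100 * n / n_cases, 1), round(100 * n / n_responses, 1))
```

The key point is that the number of responses (6 here) can exceed the number of respondents (4), so you need to say which of the two you are using as the base for percentages.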
Party vote (QE3), income and education are similar in that they are categorical variables with several categories, many of which have few responses. I will use party vote to illustrate how to prepare these for crosstabulation.
What do you want to know from the crosstabulated data?
How does the dependent variable vary in relation to the independent variable? In our analyses age and sex are independent variables - they are not affected by the attitude questions. The attitudes to issues are the dependent variables - their values may vary depending on the value of the independent variable, that is, on whether a person is male or female, or older or younger.
Differences between the views of men and women on the questions in the teaching database for NZES 2008 are not great, but we provide the following example as it is easier to get the basic idea of analysing bivariate tables with only two categories in the independent variable.
F10: How easy or hard do you think it is for people like you to understand MMP?
| | Male % | Female % | Total % |
| Easy | 20.1 | 12.5 | 16.2 |
| 2 | 20.3 | 16.2 | 18.2 |
| 3 | 23.8 | 24.3 | 24.0 |
| 4 | 17.3 | 18.9 | 18.1 |
| Hard | 11.6 | 18.2 | 15.1 |
| Don't know | 6.8 | 9.9 | 8.4 |
| Total n | 1273 | 1376 | 2649 |
Look for differences between males and females across categories or rows. For example, looking at the first row in the table above, you can see that 20.1% of males think it is easy to understand MMP compared with only 12.5% of females. So men are more likely than women to think that MMP is easy to understand.
Conversely, looking across the 'hard' row, you can see that women (18.2%) are more likely than men (11.6%) to think that MMP is hard to understand. About equal proportions of men and women are neutral on this issue (value=3).
Another helpful way of comparing the two groups is to look at the 'total' column, which is the average across the whole sample, and see which group is above and which is below. So 16.2% of the whole sample think MMP is easy to understand - at 12.5% women are below this and at 20.1% men are above. In contrast, 24% of the total sample are neutral on this issue (value=3), and at 24.3% and 23.8% women and men are both very close to this.
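The sense in which the 'total' column is an average across the whole sample can be made concrete: it is the male and female percentages weighted by the size of each group. The Python sketch below uses the figures from the F10 table; the function name is an illustrative assumption, and because the inputs are already rounded to one decimal place the recomputed totals agree with the printed ones only to within about 0.1.

```python
ROWS = ["Easy", "2", "3", "4", "Hard", "Don't know"]
MALE_PCT = [20.1, 20.3, 23.8, 17.3, 11.6, 6.8]
FEMALE_PCT = [12.5, 16.2, 24.3, 18.9, 18.2, 9.9]
N_MALE, N_FEMALE = 1273, 1376


def pooled_percent(pct_a, n_a, pct_b, n_b):
    """Weighted average of two groups' percentages, row by row."""
    return [round((a * n_a + b * n_b) / (n_a + n_b), 1)
            for a, b in zip(pct_a, pct_b)]


totals = pooled_percent(MALE_PCT, N_MALE, FEMALE_PCT, N_FEMALE)
# Positive gap: men are more likely than women to give this response.
gaps = [round(m - f, 1) for m, f in zip(MALE_PCT, FEMALE_PCT)]

for label, m, f, t, g in zip(ROWS, MALE_PCT, FEMALE_PCT, totals, gaps):
    print(f"{label:<12} male {m:5.1f}  female {f:5.1f}  total {t:5.1f}  gap {g:+5.1f}")
```

Listing the gaps row by row shows where the two groups differ most, which here is at the 'Easy' and 'Hard' ends of the scale.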
Another way of summarising key findings is to collapse categories on the dependent variable - for example, combine 'easy' & '2' as easy, leave '3' as neutral, and combine '4' & 'hard' as hard.
| | Male % | Female % | Total % |
| Easy | 40.4 | 28.7 | 34.4 |
| Neutral | 23.8 | 24.3 | 24.0 |
| Hard | 28.9 | 37.1 | 33.2 |
| Don't know | 6.8 | 9.9 | 8.4 |
| Total n | 1273 | 1376 | 2649 |
Then graph the data to see key patterns more clearly
Looking at the two presentations of the data in graph form, two things become apparent. First, it is easier to get a sense of the differences between men and women when the data is collapsed. Second, the full presentation in the first graph shows that the greatest differences between men and women occur in the two extreme categories, 'easy' and 'hard', at either end of the scale; there is less difference between the male bar and the female bar at values '2', '3' and '4'. So sometimes we lose important information when the data is collapsed.
The party vote variable is a bit more complex - you could argue that our attitudes influence the way we vote. However, my research question in this case is whether attitudes to welfare vary according to, or depending on, party vote. So in this case we are making attitude the dependent variable, and party vote the independent variable.
There are also fifteen categories or parties in Question E3 'party vote in 2008'; presentation in the table below has been restricted to the six main parties.
What are the greatest differences in the percentages between the parties for each value or row?
Note: it is helpful to compare the percentages for each party with those in the total column - look for higher or lower values. For example, looking across the row for the 'strongly agree' category in example 1, you can see that 28.3% of the total sample strongly agree that welfare makes people lazy and dependent, but for National and Act the figure is higher at 37.2% and 33.3% respectively, and for the Greens it is lowest at 11.2%.
So we can describe these findings as "people who voted National or Act are most likely to strongly agree that welfare makes people lazy and dependent, and those who voted Green are least likely to".
What other party differences can you see in this example?
With so many categories in the independent variable, parties, as well as the 5-point agreement scale, it is hard to get a good feel for the overall findings. So it is useful to collapse the two agreement categories together and compare them with the two disagreement categories.
But in this case, with most people in the two agreement categories, do not ignore the strength of that agreement in your analysis; that is, the comparative proportions in each party saying 'strongly agree'.
Attitudes to statement: welfare makes people lazy and dependent, by party vote 2008
| | Labour % | National % | Green % | NZ First % | Act % | Māori party % | Total % |
| Agreement | 52.3 | 75.8 | 33.6 | 63.6 | 85.2 | 24.5 | 63.3 |
| Neutral / DK | 21.7 | 16.2 | 27.3 | 16.7 | 11.1 | 20.9 | 19.0 |
| Disagreement | 26.1 | 8.0 | 39.2 | 19.8 | 3.7 | 29.7 | 17.8 |
| Total n | 831 | 1023 | 143 | 96 | 81 | 91 | 2268 |
Both the table and the top graph make it clear that Act and National voters are more likely than the national average to agree that welfare benefits make people lazy and dependent; Labour, Māori Party and Green voters are less likely; and NZ First voters are about the same. But the bottom graph shows that, perhaps surprisingly, National voters are more likely than Act voters to 'strongly agree', and that Māori Party voters, although relatively low on overall agreement, are quite likely to 'strongly agree' if they do agree. Green voters, by contrast, are least likely both to agree and to strongly agree.
Question G10 asks people their opinion on the future of the Māori seats, but how do opinions vary by party vote?
You might want to go further and consider these findings in relation to the makeup of the current government, and parliament, and what this might mean for policy making on this issue.
| | Labour % | National % | Green % | NZ First % | Act % | Māori % | Total % |
| Get rid of Māori seats | 26.6 | 52.3 | 22.9 | 38.4 | 71.4 | 1.1 | 39.1 |
| Keep current 7 seats | 44.6 | 33.8 | 52.8 | 35.4 | 22.1 | 36.4 | 38.7 |
| Have more Māori seats | 17.1 | 3.3 | 9.7 | 13.1 | 0 | 61.4 | 11.3 |
| Don't know | 11.7 | 10.7 | 14.6 | 13.1 | 6.5 | 1.1 | 10.9 |
| Total n | 802 | 1004 | 144 | 99 | 77 | 88 | 2214 |
We have seen in the examples above that agreement with the statements about attitudes to gender roles varies according to sex and age. So we can say there is a relationship or association between both sex and age and attitudes to the effect of a mother's paid work on her preschool child. But how strong is that relationship? How well can we predict an attitude by knowing someone's age or sex? What is the level of correlation between sex or age and attitudes?
We can use statistical tests called measures of association to give us an indication of the strength of the relationship between two variables. Associations are virtually never 100%, so we are measuring the degree or extent of the association between the variables.
The test we use depends on the type of variable we are looking at.
In this survey data set we have two types of categorical variables: nominal and ordinal.
Nominal variables are those whose values are simply names or categories, for example, sex. Ordinal variables, as the name suggests, have some kind of order from high to low - for example, age groups - but you cannot measure the actual distance between them as you can with individual years of age. The agreement scales we have been analysing in this data set can also be considered as ordinal variables as they move in a linear kind of direction from strongly agree through neutral to strongly disagree. The table below sets out the statistical test to use with each type of variable. You can read about this and see a larger example of this table in de Vaus, 2002, Chapter 15, p.293.
| Type of variable | Example | Statistics |
| Nominal | Sex, Ethnicity | Phi, Cramer's V |
| Ordinal | Age group, Attitude scales | Gamma |
| Nominal + Ordinal | Sex + Attitudes | Use the statistic for the lower level variable, i.e. nominal=Phi/Cramer's V |
Most attitude scales qualify as being ordinal as they run from strongly agree at one end to strongly disagree at the other end. When they have a 'don't know' category at the end, they are no longer strictly ordinal. Before running a crosstab with a gamma statistic, such a variable should be recoded to omit the 'don't know' category or combine it with the central 'neutral' category to keep the variable ordinal in nature. Alternatively, treat the attitude scale as nominal and use the Cramer's V statistic.
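The two recoding options just described can be sketched in Python as follows. The numeric codes used here (3 for neutral, 9 for 'don't know') are assumptions for illustration only; check the value labels of each variable in variable view for the actual codes. The function names are likewise hypothetical.

```python
NEUTRAL = 3     # assumed code for the mid-point of a 5-point scale
DONT_KNOW = 9   # assumed code for 'don't know'; check the value labels


def omit_dont_know(values, dont_know=DONT_KNOW):
    """Option 1: treat 'don't know' as missing (None) so it drops out of the scale."""
    return [None if v == dont_know else v for v in values]


def fold_into_neutral(values, dont_know=DONT_KNOW, neutral=NEUTRAL):
    """Option 2: recode 'don't know' to the central neutral category."""
    return [neutral if v == dont_know else v for v in values]


responses = [1, 2, 9, 5, 3, 9, 4]
print(omit_dont_know(responses))
print(fold_into_neutral(responses))
```

Either way the recoded variable runs in a single direction from agreement to disagreement, which is what an ordinal statistic such as gamma requires.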
Measures of Association can have two components: strength and direction.
For nominal variables there is only a value for the strength of the relationship, as there is no order to the value of these variables.
But for ordinal variables like age group and attitude scale, there is a direction: from high to low for age, and from agreement to disagreement for attitudes.
The measure of association test for nominal variables is Phi/Cramer's V. Phi is recommended for simple two by two cell tables, Cramer's V for tables with more rows and columns. This test is also used where one variable is ordinal and the other is nominal. The test result has a value between 0 and 1. To interpret this, a value near 0 means there is a very weak relationship between the two variables, while a value close to 1 means there is a very strong association between the two variables. For example:
0.1 is weak
0.5 is moderate
0.8 is strong
For example, our cross tabulation of the variable 'future of Māori seats' by age group has a Cramer's V value of .103. This indicates a relatively weak association between age and this attitude. But when we look at that variable by party vote we get a stronger relationship, with Cramer's V = .264. Thus we can say that party vote is a stronger predictor of attitude to the future of Māori seats than age, but neither is very strong.
Both de Vaus (2002:258) and Acton and Miller (2009:149) point out that social survey data rarely results in the kind of strong association between two variables as occurs in the physical sciences as there are usually many variables affecting outcomes. However, it is still useful to compare relative strength of different variables within a data set.
G10: Future of Māori Seats * agegroup
Symmetric measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .179 | .000 |
| | Cramer's V | .103 | .000 |
| N of Valid Cases | | 2576 | |
G10: Future of Māori Seats * main parties voted for
Symmetric measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .457 | .000 |
| | Cramer's V | .264 | .000 |
| N of Valid Cases | | 2214 | |
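For readers who want to see how these two statistics relate, the sketch below computes Phi and Cramer's V from a table of counts using the standard chi-square based formulas. It is a minimal Python illustration, not the SPSS implementation: the function names and the small 2 by 2 table are invented, and it assumes no row or column total is zero. The final lines check the two SPSS results above under the assumption that the smaller dimension of each crosstab is 4 (the four G10 response categories).

```python
import math


def chi_square(table):
    """Pearson chi-square for a table of counts (a list of rows); returns (chi2, n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi_and_cramers_v(table):
    """Phi = sqrt(chi2 / n); Cramer's V = sqrt(chi2 / (n * (k - 1))), k = min(rows, cols)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / n), math.sqrt(chi2 / (n * (k - 1)))


# Invented 2 x 2 example: in a 2 x 2 table Phi and Cramer's V coincide.
phi, v = phi_and_cramers_v([[30, 10], [10, 30]])
print(round(phi, 3), round(v, 3))

# Since V = Phi / sqrt(k - 1), the printed SPSS values can be cross-checked
# (assuming k = 4 for both crosstabs).
print(round(0.179 / math.sqrt(4 - 1), 3))  # by age group
print(round(0.457 / math.sqrt(4 - 1), 3))  # by party vote
```

This also explains why Phi can exceed Cramer's V in larger tables: V rescales Phi so that it stays between 0 and 1 whatever the table size.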
The measure of association for ordinal variables is gamma. The value of gamma ranges from -1 to 1. Gamma can have a negative value as well as a positive value. If the value is negative (0 to -1), as one variable increases the other decreases. For example, a value of -.346 means a moderately negative relationship between age and attitude, so as age increases, the attitude scale moves towards one, and as age decreases, the attitude scale moves towards 5. This is somewhat confusing as 5 is strongly disagree and 1 is strongly agree.
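Gamma is based on comparing pairs of respondents: a pair is concordant if the person who is higher on one variable is also higher on the other, and discordant if they are lower. The following minimal Python sketch computes it as (C - D) / (C + D) from a table of counts whose rows and columns are both in ascending order. The function name and the example table are invented for illustration, and the function assumes at least one concordant or discordant pair exists.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table of counts with ordered rows and columns."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a later (higher) row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs   # higher on both variables
                    elif m < j:
                        discordant += pairs   # higher on one, lower on the other
    return (concordant - discordant) / (concordant + discordant)


# Invented example: rows = younger, older; columns = agree, disagree.
print(goodman_kruskal_gamma([[20, 10], [10, 20]]))
```

Pairs that are tied on either variable are ignored, which is one reason gamma can look stronger than other ordinal measures on the same table.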
Most of the attitude scales in the NZES 2008 data set have a 'don't know' category at the end, so they are no longer strictly ordinal. So to apply the Gamma statistic to the crosstab they have been recoded to omit the 'don't know' category or combine it with the central 'neutral' category to keep the variable ordinal in nature. (Instructions on recoding are provided later in Exercise 2.)
D7 taxes versus spending on health and education, by age group, with 'don't know' omitted.
The result is a low, positive gamma of 0.120. This means there is a weak relationship between age and attitudes to taxes and spending on health and education: as age increases, there is more support for increasing taxes.
The result is similar where the 'don't know' category is combined with 'neutral=4' (gamma=.111). When the 'don't know' category is retained and the Cramer's V statistic is applied, the result is 0.133. This also shows that the 'don't know' option is more common for those aged 65+ than for younger ones.
In fact when you collapse categories into more tax, less tax and the neutral or mid-point, and graph it, it is clear why the relationship between age and attitudes to taxes versus health and education spending is not strong at gamma=.120: within most age groups views are fairly evenly spread, except for the 65+ group. Those aged 65 and over are much more likely to favour increasing taxes and spending more on health and education than those under 65.
Gamma= -0.149 without 'don't know' - a weak, negative relationship.
In this example, as age increases, support for government responsibility for the unemployed moves to the lower end of the scale of values, i.e. 1= definitely should or 2= should, and away from values at the higher end of the scale, 3= shouldn't, 4= definitely shouldn't. [This relationship remains if 'don't know' is recoded as the central neutral value, albeit at a weaker gamma= -.135]
We could report this as: older people were more likely to think that the government should be responsible for a decent standard of living for the unemployed (gamma= -.149). Over half of those aged 50-64 (56.5%) and nearly two-thirds of those aged 65+ (63.9%) thought they should compared with 48% of those under 50.
Graphing the data shows a similar pattern to example 1, with the main difference in views occurring in the 65+ age group.
Moving into interpretation and explanation of findings, you might now reconsider your views on the findings in example 1:
Possible explanations might be that older people have a greater sense of social justice across a range of areas. You could check this by working through the findings by age for all the variables concerning attitudes to areas of government responsibility.
Or it could be that the older generation lived through the Great Depression of the 1930s and have a keener sense of the impact of widespread unemployment caused by factors outside the control of the individual. You could explore this by looking at questions such as D12 on whether people live in need because of individual or societal factors.
[In fact this shows that those aged 65+ are more likely to believe living in need is due to individual laziness, so this refutes that theory!].
You can see how findings on one question lead to further research questions, some of which you can explore further by going to other questions in the data set. Others may suggest the need for further research. This is the research process, which moves in a cyclical way from theory or research questions to empirical research evidence, which in turn leads on to further theory or research questions. You can read more about the links between theory and research in de Vaus, 2002, Chapter 2, especially Figure 2.2, p.16.
Inferential statistics are based on the underlying assumption that the survey sample is representative of the population from which it is drawn - in this case, the population of New Zealand. To be representative the sample must be a random sample.
We know from information provided that this survey was based on a randomly selected sample. However, we also know from our sample/population comparison that due to non-sampling error such as voluntary participation of initially selected sample, it has become a little bit biased towards women. So we will need to take account of this in considering our findings - for topics where there is a gender difference, this will distort the findings on the total sample. For example, if women were more likely to agree with a statement, and we have more women in our sample, then our sample findings will be skewed towards agreement compared with the total population.
The logic underlying inferential statistics, also known as tests of significance, is that where there is a discrepancy between the observed and expected distribution of a sample on a variable or question, it is due to either sampling error or an actual true relationship between the variables. For example, if there were no difference between men and women, we would expect the same proportion of each group to answer Yes to a question:
Expected Yes response: men= 50% women= 50%
But if we observed
Actual Yes response: men= 80% women= 35%.
we would see this as a difference from the expected, but is that difference a true difference? Do men and women really think differently on this issue, or is the difference just due to sampling error? Sampling error is where, despite using random sampling techniques, we end up with a sample that has some bias and is not truly representative of the population. You can read more about this in de Vaus, 2002, Chapter 14, pp.263-266.
Significance levels are expressions of the likelihood of the relationship between two variables being due to sampling error. For example, a significance level of p=0.05 means that the likelihood of a relationship between two variables being due to sampling error is 5 in 100.
If the chance of sampling error is less, for example 1 in 100 (p=0.01) or 1 in 1000 (p= 0.001), then the significance level is higher. That means we can have more confidence that the relationship we have found between two variables on our sample data will hold true for the population from which our sample is drawn.
By convention, p<0.05 is the minimum significance level accepted for claiming that a relationship observed in a survey sample holds true in the population: p<.0001 is very strongly significant. If the significance printout is .0000, this is even higher and stronger.
p<.05 5 out of 100 minimum significance level
p<.01 1 out of 100 higher significance level
p<.001 1 out of 1000 strongly significant
Where the p value is greater than .05 (p>.05), results are not normally reported, as they apply only to the sample and cannot be generalised to the population.
Note: statistical significance or chance of results being due to sampling error are affected by the size of the sample. Sampling error is more likely in small samples than large samples - this means you are more likely to find examples of non-significant (p>.05) results if you are analysing survey data with a small sample size. If you have a large sample size, as you do with the NZES surveys (n=3000+), nearly all results are statistically significant (p<.05); that is, they are unlikely to be due to sampling error.
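The effect of sample size can be illustrated with a small calculation outside SPSS. The sketch below is in Python and uses hypothetical counts. It computes the Pearson chi-square (without continuity correction) for a 2 x 2 table, with a p value that is only valid for one degree of freedom. The function name is an assumption made for this illustration.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p value for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = men, women; columns = Yes, No.
small = [[27, 23], [23, 27]]      # 54% vs 46% Yes, n = 100
large = [[270, 230], [230, 270]]  # the same percentages, n = 1000

for label, table in (("n=100", small), ("n=1000", large)):
    chi2, p = chi_square_2x2(table)
    print(label, round(chi2, 2), round(p, 3))
```

The percentages are identical in both tables, but only the larger sample falls below p=.05. This is the pattern described in the note above: with a large sample, even small differences tend to be statistically significant.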
The Chi-square test is a frequently used test of significance in social science research. It is suitable for both nominal and ordinal variables. You can read about this test in de Vaus, Chapter 14, page 254. Here we are concerned with interpreting and reporting the test result from your computer printout.
When you use the Cramer's V or gamma measures of association outlined above, they produce their own significance level in the right hand column of the printout. This is interpreted and reported in the same way as for the Chi-square.
Interpreting and reporting significance tests: As noted above, because the NZES 2008 survey has such a large sample (n=3042) it was difficult to find an example of a non-significant result (p>.05), but there is considerable variability in the strength of significance.
D10h: Welfare Makes People Lazy * Best Gender Code
Symmetric measures
| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .066 | .043 |
| | Cramer's V | .066 | .043 |
| N of Valid Cases | | 2638 | |
The Value column =.066 - this indicates very little relationship between sex and attitude - that gender is not a good predictor of whether people think welfare makes people lazy.
The significance level =.043 - which is only just at the level (p<.05), means the relationship is generalisable to the population, but only just.
F10: Easy or Hard to Understand MMP * Best Gender Code
Symmetric measures
| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .147 | .000 |
| | Cramer's V | .147 | .000 |
| N of Valid Cases | | 2649 | |
The value=.147 - this indicates a stronger relationship between sex and attitude - that gender is a better predictor of whether people think MMP is easy or hard to understand than it is of whether they think welfare makes people lazy.
The significance level =.000 - this is highly significant at the p<0.001 level, which means the relationship is strongly generalisable to the population.
We could report this as:
There is a gender difference in whether people think MMP is easy or hard to understand (Cramer's V =0.147, p<.001). Men (40.4%) were more likely than women (28.7%) to think it was easy.
OR
Men (40.4%) were more likely than women (28.7%) to agree that MMP is easy to understand (Cramer's V = 0.147, p<.001).
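To show where a Cramer's V value comes from, here is a minimal Python sketch based on the standard formula: the square root of chi-square divided by n times (the smaller of the number of rows and columns, minus 1). The function name and the counts are hypothetical; they are not the NZES figures reported above.

```python
import math


def cramers_v(table):
    """Cramer's V for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller_dimension = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (smaller_dimension - 1)))


# Hypothetical counts: rows = men, women; columns = easy, hard.
print(round(cramers_v([[270, 230], [230, 270]]), 3))
```

Because the statistic divides by the sample size, it measures the strength of the relationship rather than its significance. A very large sample can give a highly significant result alongside a small Cramer's V.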
As I could not find an example of significance level greater than .05 in the NZES08 survey, here is an example from an ISSP survey with a smaller sample (n=984) so you can see what it would look like on a printout.
Men should do larger share of childcare, by sex.
| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .077 | .215 |
| | Cramer's V | .077 | .215 |
| N of Valid Cases | | 984 | |
The result of the significance test for this relationship is .215, which is greater than .05. So the relationship between these two variables is not statistically significant. There is a likelihood that 215 times in 1000 the result found in the sample is due to sampling error and is not a true relationship for the population from which the sample is drawn.
We could report this as "the relationship between attitudes to men doing a larger share of childcare and gender was not statistically significant at p<.05, so is not generalisable to the population".
There are various accepted ways of reporting statistical significance of results, depending on the intended audience of the document. The above example is how it would be done in an academic journal article. For a non-academic report, for a government department for example, you might just have a statement in the introduction, methods section, or appendix where you state that "all reported results are statistically significant at the p<.05 level".
This time you want to create tables or crosstabs that enable you to look at one variable or question in relation to another, e.g. attitudes towards welfare beneficiaries (QD10h) in relation to gender (Best gender code).
Click on the Analyze menu, select Descriptive Statistics and then click on Crosstabs...
Select row variables - these are your dependent variables, such as attitudes or behaviours, e.g. D10h welfare makes people lazy
Click variables individually to highlight them, then click the arrow to add them to the row box.
OR to select a range of variables, click on the first one (e.g. D14a), hold down the shift key, scroll down to the last one you want (D14f), click to highlight the whole range of variables and then click the top arrow to move the whole lot into the row box
Select the column variable to be sex - 'Best Gender code', and click the second arrow to move it into the column box.
There are various options available for crosstabs, including displaying additional percentage summaries and outputting certain statistics about the variables. First, click Cells..., and click to check the box to output column percentages.
Then click Continue.
Next, click Statistics... and click to check the box to output Phi and Cramer's V, noting that sex is a nominal variable.
Don't worry about the other options at this point. Click OK and await your output.
You should now have an Output window, displaying the crosstab and statistics you requested. You can scroll through the output using the arrows at the right of the window.
Repeat the process of exercise 1 for age group, choosing gamma as a statistic. First you will need to recode single year of age data into age groups, as described below.
You may also need to recode the dependent variable, attitudes to welfare making people lazy, in order to apply the gamma statistic for ordinal variables. Instructions are provided following the age recoding instructions.
Age data was collected by asking for year of birth. You can select this variable (nzborn - best year born) and run a frequency, which will give values from 1902 to 1990. This has been converted to a variable for age in single years (zage - age of respondent); running a frequency on this will produce a list of ages ranging from 18 to 106 and take up a couple of pages. To produce crosstabs of age by our variables of interest, such as attitudes to welfare or MMP, we need to recode the age data into more manageable groups.
Recoding variables involves two steps: deciding on values to go into each group, and carrying out the process in SPSS. To decide on the groupings, look at the cumulative percent column at the right of the frequency printout for 'age'. Three or four groups of roughly even numbers of respondents would be manageable. The second criterion for deciding on grouping ranges is some kind of meaningful social sensibility and/or roughly even numbers of years. There is no right or wrong way to do it, but these principles provide guidelines. Here we will recode age into four groups: 18-34, 35-49, 50-64 and 65+.
Click on the Transform menu and then select Recode into Different Variables....
Scroll down the variable list to find Age of respondent (ZAGE), click to highlight it and click the arrow to move it into the input variable box.
Click in the box on the right under name and enter agegroup then click Change.
Click Old and New Values... to get started on the actual recoding
Here you can enter individual values from the old variable and tell SPSS how you want them to be grouped in the new one. Remember that the sample was of people 18 years and over, and the first age group we want is 18-34. Click to highlight
Click on Old and New Values
Click the button for Range, LOWEST through value and enter 34
Click in the box beside Value on the right under New Value and enter 1
Then click Add to make the change
This will give all 18-34 year olds a value of 1 for agegroup.
For the next age groups, click on Range and enter the value range of your next recode age group: 35 in the first box, 49 in the second box, then enter 2 in the new value box, and click on Add; Repeat for 50-64 = new value 3
For the last age group, click the button for Range, value through HIGHEST and enter 65, enter 4 in the new value box, and click on Add
Then click Continue, then Change and then OK to finish up
You will see an output window in SPSS describing in code what you have done. Don't worry about that; click the Window menu and choose the dataset window to return to your data file.
Click on Variable View. You will see the newly created variable, agegroup, in the first column.
You can set up labels for the values of your new variable agegroup by clicking in the Values column in that row and clicking the ... button.
As in the recode interface, type 1 in the Value box and 18-34 in the Label box, click Add, repeat for the other age groups, then click OK.
You can now produce a frequency table for agegroup. Remember: Click on the Analyze menu, choose Descriptive Statistics then click Frequencies...
Scroll to the bottom of the variable list, click agegroup, click the arrow, and click OK.
Your output window will show the distribution of age groups in the data set. You can now use agegroup to run the crosstabs of the questions in the survey to see how the findings vary by age.
Agegroup
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 18-34 | 678 | 25.1 | 25.1 | 25.1 |
| | 35-49 | 809 | 29.9 | 29.9 | 55.0 |
| | 50-64 | 701 | 26.0 | 26.0 | 81.0 |
| | 65+ | 513 | 19.0 | 19.0 | 100.0 |
| | Total | 2700 | 100.0 | 100.0 | |
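The logic of the age recode can also be written out as a short rule, which is a useful check on the cut-points before you click through the SPSS dialog. This is a minimal Python sketch, not SPSS syntax; the function name and the list of ages are hypothetical.

```python
from collections import Counter

AGE_LABELS = {1: "18-34", 2: "35-49", 3: "50-64", 4: "65+"}


def age_group(age):
    """Return the agegroup code: 1 = 18-34, 2 = 35-49, 3 = 50-64, 4 = 65+."""
    if age <= 34:   # LOWEST through 34
        return 1
    if age <= 49:   # 35 through 49
        return 2
    if age <= 64:   # 50 through 64
        return 3
    return 4        # 65 through HIGHEST


# Hypothetical ages chosen to sit on either side of each cut-point.
ages = [18, 34, 35, 49, 50, 64, 65, 106]
frequencies = Counter(AGE_LABELS[age_group(age)] for age in ages)
print(frequencies)
```

Testing the values on either side of each boundary (34 and 35, 49 and 50, 64 and 65) is a quick way to confirm that no age falls into the wrong group or is left out.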
If you are carrying on from Exercise 1, which used sex, simply shift sex from the column box back to the main variable list by clicking the arrow pointing back. Then click agegroup in the left list and click the arrow to move it into the column box. Remember to click Statistics... and check the box for gamma, but also leave the box for Phi and Cramer's V checked, as some of the dependent variables will be nominal and not suited to gamma. You need both variables to be ordinal to use gamma - like age group and an agreement scale, but not age group and party vote, which needs Cramer's V.
You may also need to recode the dependent variable, for example, attitudes to welfare making people lazy, in order to apply the gamma statistic for ordinal variables - either remove the 'don't know=9' category or add it to the neutral category.
To remove '9 = don't know' use Select Cases... from the Data menu, and select only cases where the variable (here D10h) is not '9'. Remember to turn this procedure off once you have done your crosstab and before proceeding to your next analysis. Or you can use the Transform - Recode into Different Variables procedure and make '9' into 'system missing'.
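The two ways of handling 'don't know' amount to two simple rules, sketched below in Python. The function names and the example responses are hypothetical, and the code for the neutral category is passed in as a parameter because it differs between questions.

```python
DONT_KNOW = 9


def drop_dont_know(values):
    """Treat 9 as missing (None), like recoding it to system-missing."""
    return [None if value == DONT_KNOW else value for value in values]


def fold_into_neutral(values, neutral):
    """Recode 9 to the neutral category so the scale stays ordinal."""
    return [neutral if value == DONT_KNOW else value for value in values]


# Hypothetical responses on a 1-5 agreement scale where 3 is assumed to be neutral.
responses = [1, 9, 5, 3, 9]
print(drop_dont_know(responses))
print(fold_into_neutral(responses, neutral=3))
```

The first option reduces the number of valid cases, while the second keeps every case but assumes that 'don't know' respondents sit in the middle of the scale. As the gamma results earlier in this section show, it is worth running both and comparing.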
You can scroll through the output looking for interesting examples of differences by age. For example, there are few obvious differences by age in attitudes to welfare making people lazy, unless you combine 'strongly agree' and 'agree', in which case those aged 65+ are a bit higher at 67.4%. The difference is more apparent in the next table, on why people live in need, with 50.4% of those aged 65+ attributing it to laziness compared with less than 40% for the other age groups.
But when you get to the table on MMP, some interesting findings emerge. Young people (18-34) are less likely to prefer FPP (14.7% compared to 19.7% of the total sample) and more likely not to have an opinion on keeping MMP or returning to FPP (45.8% compared with 25% of the total sample). Why do you think this might be? FPP was last used in the 1993 election. So most of those aged 18-34 in 2008 would never have voted under FPP.
Question G10 asks people their opinion on the future of the Māori seats, but how do opinions vary by age?
You will need to use the crosstabs procedure in SPSS as described above. Scroll down the variable list to find G10 Future of Māori seats and move it to the row box. The age group recode variable should still be in the column box if you are carrying on from above; otherwise move it across. Remember to tick column percentages in the Cells procedure and Phi/Cramer's V in the Statistics procedure.
You might want to go further and consider these findings in relation to the future, and what this might mean for policy making on the issue.
You can recode occupation in a similar way to age. Occupation is coded for each respondent using a 4-digit code. The first digit signifies a simpler level of classification (see the Statistics NZ website at http://search.stats.govt.nz/search?w=occupation%20classification for information on their occupational classification), for example, those beginning with '1' are managerial positions; those beginning with '2' are professional; through to '9' are elementary, unskilled. So occupations can simply be recoded so that all codes beginning with '1' become '1', all codes beginning with '2' become '2', etc through to '9'. These can be further combined and collapsed if necessary.
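Taking the first digit of a 4-digit code is a one-line calculation. The Python sketch below assumes that occupation codes are stored as whole numbers between 1000 and 9999; the function name and the example codes are hypothetical.

```python
def major_group(occupation_code):
    """Return the first digit of a 4-digit occupation code, or None if out of range."""
    if occupation_code is None or not 1000 <= occupation_code <= 9999:
        return None
    return occupation_code // 1000  # integer division keeps only the first digit


# Hypothetical codes, used only to illustrate the rule.
print([major_group(code) for code in (1234, 2411, 9111, 99)])
```

Codes outside the 4-digit range are returned as missing rather than guessed at, so you would need to check how missing or unclassifiable occupations are coded in the data set before relying on this.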
E3: party vote in 2008 asked which party the respondent voted for in the 2008 election. Results are for 14 parties, and 119 people (4%) did not make a party vote.
A simple frequency table shows that eight of the parties had less than 1% of the vote, and six of them had fewer than 10 people in the survey vote for them. United Future and Progressive also had fewer than 30 votes among the survey respondents; this is the cutoff that has been used for the results presented here, so those two are excluded.
The party vote can be recoded to comprise just the six parties with sufficient numbers for crosstabulation with other variables/questions in the survey to see how attitudes and behaviour vary according to party voted for. The parties above the cutoff as mentioned are: National, Labour, NZ First, Act, Green and Māori party.
To do the recode you need the code or value number for each of these parties. Go to Variable View in SPSS, scroll down the Label column to E3: Party vote in 2008, click in the Values column there and then click on ... - here you will see the values that have been coded for the variable. This shows Labour = 1, National = 2 and so on to Māori Party = 7.
Click on the Transform menu and then select Recode into Different Variables....
Scroll down the variable list to E3: Party Vote in 2008 (zvt08p), click to highlight it, and click the arrow to move it into the input variable box, where it will show up as zvt08p, its variable name (see first column in SPSS variable view screen - E3: Party vote in 2008 is in the variable label column).
Click in the box on the right under Name and enter partyrecode and then click Change. You could also fill in the label box as 'party vote 2008 recoded'.
Click Old and New Values... to get started on the actual recoding. Here you can enter individual values from the old variable and tell SPSS how you want them to be grouped in the new one.
Remember that we want to retain the first five parties, Labour to Act, plus the Māori Party (value/code 7).
Click the button for range, and enter 1 in the first box, through 5 in the second box.
Click in the box beside Copy old value(s) on the right under New Value.
Then click Add to make the change.
This will give parties Labour, National, Green, NZ First and Act the values 1-5.
For the Māori party, click on Value on the left under Old Value, and enter the value 7, then Click in the box beside Copy old value(s) on the right under New Value, and click on Add.
For the remaining parties that you no longer want to include in your crosstabs, scroll down to the bottom of the left column and click on All other values, then click on System-missing on the right in the new value box, and click on Add.
Then click Continue, then Change and then OK to finish up.
You will see an output window with SPSS describing in code what you have done. Don't worry about that; click the Window menu and choose the dataset window to return to your datafile. Click on Variable View and you will see the newly created variable, partyrecode in the left hand column.
You can set up labels for the values of your new variable partyrecode by clicking in the Values column in that row and clicking the ... button.
Type 1 in the Value box and Labour in the Label box, click Add, repeat for the other party names, then click OK.
You can now produce a frequency table for partyrecode. Remember: Click on the Analyze menu, choose Descriptive Statistics then click Frequencies...
Scroll to the bottom of the variable list, click partyrecode, click the arrow, and click OK. Now your output window will show the distribution of party votes in the data set. You can now use partyrecode to run crosstabs of the questions in the survey to see how the findings vary by party vote.
Party vote 2008 recoded
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Labour | 847 | 31.4 | 36.6 | 36.6 |
| | National | 1047 | 38.8 | 45.2 | 81.8 |
| | Green | 146 | 5.4 | 6.3 | 88.1 |
| | NZ First | 99 | 3.7 | 4.3 | 92.4 |
| | Act | 82 | 3.0 | 3.5 | 95.9 |
| | Māori Party | 94 | 3.5 | 4.1 | 100.0 |
| | Total | 2314 | 85.7 | 100.0 | |
| Missing | System | 386 | 14.3 | | |
| Total | | 2700 | 100.0 | | |
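The party recode keeps six codes unchanged and sends everything else to missing. A minimal Python sketch of that rule follows, using the codes described in the text (1 to 5 for Labour, National, Green, NZ First and Act, and 7 for the Māori Party). The function name and the example values are hypothetical.

```python
PARTY_LABELS = {
    1: "Labour",
    2: "National",
    3: "Green",
    4: "NZ First",
    5: "Act",
    7: "Māori Party",
}


def recode_party(value):
    """Copy the old value for the six retained parties; everything else is missing."""
    return value if value in PARTY_LABELS else None


# Hypothetical original codes, including some that fall below the cutoff.
original = [1, 2, 6, 7, 12, 5]
print([recode_party(value) for value in original])
```

Any value not listed in the label dictionary becomes missing, which mirrors the 'All other values' to 'System-missing' step in the SPSS dialog.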
To recode income or education use a similar process starting with transform and recode into a different variable, but this time you will not be excluding any categories, except Don't Know=9. Instead you might combine values with small numbers of respondents so you end up with fewer categories all with reasonable numbers.
For example, for H7: Highest Educational Qualification, looking at the way responses are distributed over the categories, you might combine the first two into "primary or less", the next two into "secondary qualification", the next two into "tertiary nondegree", and Degree and Postgraduate into "Degree qualification", then exclude 9= don't know.
In Variable View, scroll down the Label column to H7: Highest Educational Qualification, click in the Values column and then on '...'. The coded values start at 1= minimal primary and continue through to 8= postgraduate.
Unfortunately it doesn't tell you what '9' means. In such a situation, you might try going to the questionnaire to see if it is listed there as a response option with a label, but it is not there in either case. We have to assume it means 'don't know' as this is a stylistic convention, but it is a bad practice not to label all the values. For some existing data sets there is a coding book you can go to for this information, but not for this one.
Fill in the recode for different variables as follows:
H7: Highest Educational Qualification
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Minimal Primary | 66 | 2.4 | 2.4 | 2.4 |
| | Complete Primary | 496 | 18.4 | 18.4 | 20.8 |
| | SC | 745 | 27.6 | 27.6 | 48.4 |
| | UE | 201 | 7.4 | 7.4 | 55.9 |
| | Non degree qualification | 375 | 13.9 | 13.9 | 69.8 |
| | Some University | 148 | 5.5 | 5.5 | 75.3 |
| | Degree | 248 | 9.2 | 9.2 | 84.4 |
| | Postgraduate | 95 | 3.5 | 3.5 | 88.0 |
| | 9.00 | 325 | 12.0 | 12.0 | 100.0 |
| | Total | 2700 | 100.0 | 100.0 | |
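Before carrying out the education recode, you can check what the collapsed categories will look like by adding up the frequencies in the table above. This Python sketch applies the grouping suggested in the text (values 1-2, 3-4, 5-6 and 7-8, with 9 excluded). The names used are hypothetical; the counts are those shown in the frequency table.

```python
from collections import Counter

# Old value -> new value; 9 is deliberately left out so that it becomes missing.
COLLAPSE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}
NEW_LABELS = {
    1: "Primary or less",
    2: "Secondary qualification",
    3: "Tertiary non-degree",
    4: "Degree qualification",
}


def recode_education(value):
    """Return the collapsed category, or None for 9 and any unlisted value."""
    return COLLAPSE.get(value)


# Frequencies for the original values 1-9, as in the H7 table.
original_counts = {1: 66, 2: 496, 3: 745, 4: 201, 5: 375, 6: 148, 7: 248, 8: 95, 9: 325}

collapsed = Counter()
for value, count in original_counts.items():
    collapsed[recode_education(value)] += count

for code, label in NEW_LABELS.items():
    print(label, collapsed[code])
print("Excluded (9)", collapsed[None])
```

All four new categories have several hundred respondents each, which is the aim of collapsing: fewer categories, all with reasonable numbers.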
You can follow a similar process to recode personal income or household income.
In SPSS, click the Analyze menu, select Descriptive Statistics and click Crosstabs... Put E3: Party vote in 2008 in the row box and E14: Party vote in 2005 into the column box.
Click on Statistics and select Phi/Cramer's V. Click on Cells and select Column.
In the resulting table, looking down the columns we can see how people who voted for a particular party in 2005 then voted in 2008.
Results from the SPSS output are presented in table form below. As an exercise, transfer the data for the remaining columns, Act and Māori party vote 2005 by party vote in 2008.
| | Labour 2005 % | National 2005 % | Green 2005 % | NZ First 2005 % | Act 2005 % | Māori party 2005 % | No Vote 2005 % |
|---|---|---|---|---|---|---|---|
| Labour 2008 % | 63.9 | 1.4 | 17.5 | 5.2 | | | 16.4 |
| National 2008 % | 17.7 | 89.4 | 16.5 | 33 | | | 27.7 |
| Green 2008 % | 4.9 | 0.3 | 55.7 | 3.5 | | | 6.2 |
| NZ First 2008 % | 2.7 | 1.2 | 2.1 | 32.2 | | | 4.0 |
| Act 2008 % | 0.7 | 4.5 | 0 | 5.2 | | | 1.1 |
| Māori party 2008 % | 3.5 | 0.2 | 2.1 | 10.4 | | | 4.2 |
| Other 2008 % | 2.1 | 1.0 | 5.1 | 2.7 | | | 1.8 |
| No Vote 2008 % | 4.5 | 2.0 | 1.0 | 7.8 | | | 39.0 |
| Total N | 1072 | 662 | 97 | 115 | | | 354 |
We know that Labour lost party votes in 2008 compared to 2005, when they were able to form the government. According to the NZES, as shown in the table above, only 63.9% of those who voted Labour in 2005 did so again in 2008. But where did the other 2005 Labour voters go in 2008 - who did they vote for instead of Labour?
Looking at the table above it is clear that the largest group of disaffected Labour voters from 2005 went to National in 2008 (17.7%). Others went to the Greens (4.9%), the Māori party (3.5%), NZ First (2.7%) or didn't vote (4.5%). Less than 3% were spread across the other parties. NZ First had a big loss, retaining only a third of its 2005 voters and losing just as many to National. National, on the other hand, retained 89.4% of its voters from 2005. Of the 2005 National voters who changed their party vote, the largest group went to Act (4.5%).
Exercise: Describe how 2005 Green, Māori and Act voters voted in 2008.
To find out where 2008 National voters came from, we need to turn the table around in SPSS - put the 2008 vote variable in the column box and the 2005 vote in the row.
Other than National, what party did most of National's 2008 votes come from? The largest group of the new National voters in 2008 had voted Labour in 2005 (18.3%). Others came from NZ First (3.7%), Act (2.7%) and non-voters in 2005 (9.5%).
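The difference between the two tables is simply which variable defines the columns, and therefore which totals the percentages are based on. The Python sketch below shows this with a handful of hypothetical respondents. The function name and the data are illustrative only and do not reproduce the NZES figures.

```python
from collections import Counter


def column_percentages(pairs):
    """Return {column_value: {row_value: percent}} for (row, column) pairs."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)
    column_totals = Counter(column for _, column in pairs)
    table = {}
    for (row, column), count in cell_counts.items():
        table.setdefault(column, {})[row] = round(100 * count / column_totals[column], 1)
    return table


# Hypothetical respondents: (vote in 2008, vote in 2005).
votes = [
    ("Labour", "Labour"), ("Labour", "Labour"), ("National", "Labour"),
    ("No vote", "Labour"), ("National", "National"), ("National", "National"),
    ("National", "National"), ("Act", "National"),
]

# 2005 vote in the columns: where did each party's 2005 voters go?
by_2005_vote = column_percentages(votes)
# Table turned around, 2008 vote in the columns: where did 2008 voters come from?
by_2008_vote = column_percentages([(v2005, v2008) for v2008, v2005 in votes])

print(by_2005_vote["Labour"])
print(by_2008_vote["National"])
```

In this made-up data a quarter of 2005 Labour voters moved to National, and a quarter of 2008 National voters had come from Labour. The two percentages happen to match here, but in general they answer different questions and have different bases.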
Using Select Cases to limit categories
Parents' party preference variables (G4a: father's preference), (G4b: mother's preference), and (H16: family political learning).
There are a large number of categories here, mostly with very few responses. There are also a large number of 'don't knows' and 'no preference' categories. One could recode these out similarly to how we did earlier on.
An alternative would be to use Data menu - Select Cases... to restrict analysis to a comparison between parents with a known Labour preference and those with a National preference.
Click the Data menu, and then Select Cases... Click the button for 'If condition is satisfied', and then click on If..., as shown in the screenshot below.
Scroll down the variable list, click the G4a: Father party preference, and then click the arrow to put the variable name into the calculation box. It will appear under its variable name zfspan. If you know the short name of the variable you want to use here, you can just type it in rather than having to find it on the list.
For this example, you only want to choose Labour and National, which are coded as values 2 and 3. So click on '=' on the keypad shown; it is then easiest to complete the expression by typing, as shown below, to finish with: zfspan = 2 or zfspan = 3.
Click on Continue and then OK to finish up.
If you then produce a frequency table of G4a: father's party preference, you will only get results for National and Labour.
G4a: Father Party Preference
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | National | 687 | 47.0 | 47.0 | 47.0 |
| | Labour | 774 | 53.0 | 53.0 | 100.0 |
| | Total | 1461 | 100.0 | 100.0 | |
If you wanted to include No preference (value=1), you could use the expression zfspan<4
G4a: Father Party Preference
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | No preference | 110 | 7.0 | 7.0 | 7.0 |
| | National | 687 | 43.7 | 43.7 | 50.7 |
| | Labour | 774 | 49.3 | 49.3 | 100.0 |
| | Total | 1571 | 100.0 | 100.0 | |
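The two filter expressions can be thought of as conditions that each case either passes or fails. Here is a minimal Python sketch using a few hypothetical records. The variable name zfspan follows the text, while the function name and the data are made up for illustration.

```python
def select_cases(records, condition):
    """Return only the records for which condition(record) is true."""
    return [record for record in records if condition(record)]


# Hypothetical records: 1 = no preference, 2 and 3 = the two main parties, 5 = another party.
records = [
    {"id": 1, "zfspan": 1},
    {"id": 2, "zfspan": 2},
    {"id": 3, "zfspan": 3},
    {"id": 4, "zfspan": 5},
]

# Equivalent of: zfspan = 2 or zfspan = 3
two_parties = select_cases(records, lambda r: r["zfspan"] == 2 or r["zfspan"] == 3)
# Equivalent of: zfspan < 4
with_no_preference = select_cases(records, lambda r: r["zfspan"] < 4)

print([r["id"] for r in two_parties])
print([r["id"] for r in with_no_preference])
```

Note that a condition such as zfspan < 4 selects every value below 4, so it is worth checking the coded values in Variable View first to make sure nothing unexpected (such as a 0) is included.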
The 'select cases' filter will stay on until you turn it off; with it on, you can now run crosstabs for father's party preference compared with, say, respondent's party vote (partyrecode).
Produce a crosstab with E3: party vote 2008 in the row box and G4a: father's party preference in the column box. In statistics options, select Phi & Cramer's V; in cell options, select to display column percentages.
In the resulting table, looking down the columns we can see how respondents' 2008 votes were related to their fathers' party preferences - here displayed for National, Labour, or no preference, filtered as described above. For example, 48.1% of those whose father preferred Labour also voted Labour in 2008.
Remember to go back into the Select Cases interface and turn off the filter before running any other procedures, or they will all run on the subset rather than the full sample.
Ethnicity: As outlined earlier, ethnicity data for NZES were entered in five binary variables, one for each of the main ethnic groups - allowing people to identify with more than one ethnicity. For bivariate analysis, as described in this module, you could simply tabulate selected dependent variables against each ethnicity variable separately, or you could use the Multiple Response procedure in SPSS to create a single ethnicity variable.
As an individual can be in more than one category, it is not possible to run statistical tests on the strength of an association with, or ability to predict outcomes on, the dependent variable from ethnicity, or on the reliability of generalising from the sample data to the population (Acton and Millar 2009, p176).
Before we begin, consider the specifics of the ethnicity variables. Frequency tables show that the variables for NZ European, NZ Māori, Pacifica and Asian are all presented in the same format, with a dichotomous response - either 'Yes' or 'No'. The 'Other ethnicity' variable has a large number of response options corresponding to different specific ethnicities. As all variables need to be of the same response format for creating a single multiple response variable, we will be excluding the 'Other' ethnicity variable here.
The first step is to define your 'variable set'. Click the Analyze menu, select Multiple Response and click Define Variable Sets...
Select H21a1 to H21a5 from the variable list and click the arrow to move these into the 'Variables in Set' box.
Enter 1 in the Counted value box - this is the value that appears for each individual ethnicity variable if that ethnicity was ticked on the questionnaire.
Type the name you want to give your new multiple response variable, and a label explaining it.
Click Add, on the right, and then Close.
Unlike your recoded variables, the variable created here will not appear in the variable list of your original data set. It is a temporary variable and can only be used to create frequencies and crosstabs through the Multiple Response procedure.
Go back to the Analyze menu, Multiple Response. You will see that you are now offered the options of producing Frequencies or Crosstabs here.
First, choose Frequencies... Click on the arrow to move your Ethnicity variable into the Tables box, and then click OK.
Go to the Output window to see how your five separate ethnicity variables now appear as one.
Now go to produce crosstabs under Multiple Response
You will see your ethnicity variable sitting in the Multiple Response Sets box below the main variable list box. Click on it, and click the arrow to move it into the column box.
Select G10: Future of Māori seats from main variable list and move into row box.
You now have to define the value range for this variable. You can do this in SPSS by producing a frequency table for it, or by checking the coded values in the Values column in Variable View.
You will find that the variable has values 1, 2, 3 and 9 (get rid of the Māori seats, keep the current 7, Have more Māori seats, and Don't know).
With this information in mind, click Define Ranges - enter 1 as Minimum and 9 as Maximum, and then click Continue. Click on Options and click to tick the box to display column percentages.
Note that the percentages will be based on the total number of cases or respondents. With a multiple response variable you can base percentages on the total number of cases or on the total number of responses, e.g. I am one case or respondent, but if I select both NZ European and NZ Māori as responses to the ethnicity question, then I contribute two responses.
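The difference between percentages of cases and percentages of responses is easiest to see with a tiny worked example. The Python sketch below uses four hypothetical respondents, one of whom ticked two ethnicities. The function name and the data are illustrative only.

```python
from collections import Counter


def multiple_response_summary(respondents):
    """respondents: list of sets, one set of ticked categories per person."""
    counts = Counter(category for ticked in respondents for category in ticked)
    n_cases = sum(1 for ticked in respondents if ticked)  # people who ticked at least one box
    n_responses = sum(counts.values())                    # total number of ticks
    return {
        category: {
            "count": count,
            "percent_of_cases": round(100 * count / n_cases, 1),
            "percent_of_responses": round(100 * count / n_responses, 1),
        }
        for category, count in counts.items()
    }


# Four hypothetical respondents; the second ticked two ethnicities.
people = [
    {"NZ European"},
    {"NZ European", "NZ Māori"},
    {"NZ Māori"},
    {"Asian"},
]

summary = multiple_response_summary(people)
for category, figures in summary.items():
    print(category, figures)
```

Here four cases produce five responses, so the percentages of cases add up to more than 100 while the percentages of responses add up to exactly 100. Keep this in mind when reading the column totals in a multiple response crosstab.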
Look across the rows in the resulting table as above to compare responses by ethnic group. The first row shows the percentage from each ethnic group who thought we should get rid of the Māori seats. Not surprisingly, very few Māori (9.5%) believed this, and those who identified as Pacifica were also not generally in agreement (7.5%). NZ Europeans were most in favour of getting rid of the Māori seats (43%). Pacifica were most supportive of keeping the status quo, that is, seven Māori seats. Māori were most supportive of having more than the current seven, with nearly half (46.2%) choosing this option.
Further exercises:
You can look at how findings on a question in the 2008 NZES compare with those from earlier years by consulting other data sets via NZSSDS http://webview.nzssds.org.nz/webview/ . You can either produce frequency tables right there on the website or sign up to download the data set, as you did the teaching data set for this workbook.
There are a number of steps you need to work through first.
Step 1 Check whether the question you want to compare was included in earlier surveys by going to the questionnaire or variable list in NZSSDS. Note that the questions are not necessarily in the same sections each year, e.g. party vote was in section E in 2008 but section D in 2002. Questionnaires are available under metadata for each survey. An example of how to navigate the interface to find a questionnaire is shown below.
Step 2 While there, you can also read through the rest of the supporting documentation, which may provide pertinent information on sample and questionnaire design, variable coding etc. To check variable coding and distributions you can go into Variable Description and expand the list - an example is shown on the following page.
Two questions that have been included in most surveys since the 1990s and are still topical are Support for MMP, and Whether to keep or abolish the Māori seats. To make it easier for you to work through this example, the variables from the earlier surveys discussed have been included at the end of the teaching data set for 2008. Note that these variables relate to a very different sample and as such cannot be analysed in any more detail without their original data sets. You will also be unable to weight observations to the population without those data sets.
Questions on whether voters think MMP should be retained, or whether New Zealand should return to the previous First Past the Post (FPP) system or try a third option, have been included in the NZES since MMP was introduced at the 1996 election. That change followed a referendum on whether a change from FPP was desired, and then the actual referendum in 1993 on whether to change to MMP.
Create frequency tables for the variables for each year - remember, you will find the variables for the previous years at the end of the teaching data set for 2008. Your output should include tables like those below, for the original question in 1993 and that in 2005 respectively.
Vote in 1993 Referendum
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | NONVOTE | 203 | 9.0 | 9.1 | 9.1 |
| | FPP | 879 | 39.0 | 39.6 | 48.8 |
| | MMP | 1137 | 50.5 | 51.2 | 100.0 |
| | Total | 2219 | 98.6 | 100.0 | |
| Missing | System | 32 | 1.4 | | |
| Total | | 2251 | 100.0 | | |
E2 Hypothetical Electoral Referendum - 2005
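| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Keep MMP | 977 | 34.8 | 35.2 | 35.2 |
| | FPP | 953 | 33.9 | 34.4 | 69.5 |
| | Alternative | 195 | 6.9 | 7.0 | 76.6 |
| | PV | 4 | .2 | .2 | 76.7 |
| | STV | 93 | 3.3 | 3.4 | 80.1 |
| | List PR | 0 | .0 | .0 | 80.1 |
| | PR General | 3 | .1 | .1 | 80.2 |
| | Open List | 1 | .0 | .0 | 80.2 |
| | DK | 549 | 19.5 | 19.8 | 100.0 |
| | Total | 2775 | 98.8 | 100.0 | |
| Missing | System | 33 | 1.2 | | |
| Total | | 2808 | 100.0 | | |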
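To see how the Percent, Valid Percent and Cumulative Percent columns relate to the frequencies, the arithmetic can be reproduced in a short Python sketch. It uses the counts from the 1993 referendum table (203 non-voters, 879 FPP, and the 1,137 MMP voters implied by the valid total of 2,219, with 32 missing). The function name is illustrative only and is not part of SPSS.

```python
def frequency_table(valid_counts, n_missing):
    """Return rows of (label, frequency, percent, valid %, cumulative %)."""
    n_valid = sum(valid_counts.values())  # cases with a usable answer
    n_total = n_valid + n_missing         # all cases, including missing
    rows = []
    cumulative = 0.0
    for label, count in valid_counts.items():
        percent = 100 * count / n_total        # base: all cases
        valid_percent = 100 * count / n_valid  # base: valid cases only
        cumulative += valid_percent            # running total of valid %
        rows.append((label, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


rows = frequency_table({"NONVOTE": 203, "FPP": 879, "MMP": 1137}, n_missing=32)
for label, count, percent, valid_percent, cumulative in rows:
    print(f"{label:<8}{count:>6}{percent:>7.1f}{valid_percent:>7.1f}{cumulative:>7.1f}")
```

Percent uses all 2,251 cases as its base, while Valid Percent uses only the 2,219 cases with a valid answer. This is why the valid percentages are slightly higher.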
To get a better view of how opinions have changed over time, you could take figures from the tables and put them into a graph, like the one below, which was created using Excel.
You can see from the graph that support for MMP declined slightly from the 1993 referendum to 1996. Then there was a drop in 1999, before support bounced back in 2002. It then dropped significantly in 2005, and fell again in 2008. At the same time, FPP support increased markedly in 2005, following a steady decline from 1993.
Note that the question as asked in more recent years also covered other options for an election model, such as STV, and included a 'don't know' option.
Exercise
Why might support for MMP have been lower prior to the 1999 election than after it? What happened in the 1996 post-election period when coalitions were being negotiated? How did that process compare to that after the 1999 election?
What was happening in the political landscape in NZ between 2002 and 2005?
Looking at election results, the change in 2005 coincided with a surge in support for National (see http://en.wikipedia.org/wiki/Elections_in_New_Zealand ).
What was happening with the third parties during this time that might have created disaffection with MMP?
As politics students, use your study from other papers in your degree to help interpret and explain the shift in support for MMP and theorise about what might have caused it. You can then suggest further research that might test your theory. This is an example of the ongoing cyclical link between research and theory.
This example is a little more complicated. There was no question about Māori seats in 1996. The response options were slightly different in the 1975 post-election survey, with only 'abolish' or 'no change'; there was no option of having more Māori seats, as was offered in later years. And in 1999 (using the pre-election survey) there was the complication of coding value '887' for n=809 respondents. The supporting documentation in the metadata (Other study description materials - response rates, weights and error; see http://webview.nzssds.org.nz/NZSSDSData/notes/NZSSDS00012%20Note.pdf ) explains that telephone respondents were not asked this question. So to get the correct percentage among those who were asked this question, you need to subtract 809 from the total (5909 - 809 = 5100) to get a valid total on which to base the percentage in each category. The percentage saying 'get rid of Māori seats' was therefore 39.7%, not 34.3%.
This example is interesting in that it allows us to go back over 30 years to when a question on the Māori seats was first asked in a national election survey. It was not included again until almost twenty years later, in 1993. By this time there had been a huge decline in those wanting the Māori seats abolished, from two thirds (67.8%) in 1975 to less than half (42.3%) in 1993. There had also been a fourth response option included for those who wanted 'more Māori seats', with 14.3% choosing this option.
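The adjustment for the 1999 figure can be checked with a short Python sketch. The number of respondents in the category is not quoted in this workbook, so the sketch approximates it from the unadjusted percentage (34.3% of 5,909). The function name is illustrative.

```python
TOTAL_RESPONDENTS = 5909  # all respondents in the 1999 pre-election survey
NOT_ASKED = 809           # telephone respondents, coded 887
PCT_OF_ALL = 34.3         # 'get rid of Māori seats' as a % of all respondents


def rebase_percentage(pct_of_all, n_all, n_not_asked):
    """Re-express a percentage of all cases as a percentage of those asked."""
    n_asked = n_all - n_not_asked
    # Assumption: category count approximated from the rounded percentage.
    approx_count = pct_of_all / 100 * n_all
    return 100 * approx_count / n_asked


adjusted = rebase_percentage(PCT_OF_ALL, TOTAL_RESPONDENTS, NOT_ASKED)
print(f"Valid base: {TOTAL_RESPONDENTS - NOT_ASKED}")
print(f"Adjusted percentage: {adjusted:.1f}%")
```

The same rebasing applies to every response category for that year, since all of them share the reduced base of 5,100.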
What might explain this huge shift in opinion among the public, most of whom were Pākehā? What was happening for Māori in New Zealand during these years [e.g. Māori renaissance]?
Changes since 1993 have been more moderate, with the biggest decline in support for abolition occurring in the most recent inter-electoral period.
Section key points: