ALSWH sampling scheme
Selection of the sample
The study sample was selected by Medicare Australia
(previously known as the Health Insurance Commission)
from three zones: urban, rural and remote. The age
groups sampled from the Medicare database in April 1996
were 18-22 years, 45-49 years and 70-74 years. By the
time the invitations to participate were mailed later
in 1996, some women at the upper limit of each age group
had had their birthday and were a year older. Hence
some women recruited were 23, 50 or 75 years old, so
the cohort age ranges in the study are 18-23, 45-50
and 70-75 years (although relatively few women fall in
the oldest year of each cohort).
Sampling from the population was random within each age
group, except that women from rural and remote areas were
selected at twice the proportion of the Australian
population living in these areas. Women from capital
cities and other metropolitan areas made up the balance
of the samples.
A small number of women who were sent an invitation to
participate had ages outside the cohort ranges (by a year
or two), probably because of errors in dates of birth in
the Medicare database. The survey data for these women
have nevertheless been retained. When using the data, we
recommend that these women either be excluded or have
their age set to the nearest valid age.
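The two options recommended above can be sketched in a few lines of Python. This is a minimal illustration, not ALSWH code: the cohort keys "young", "mid" and "old" and the function name clean_age are hypothetical, and only the age ranges come from this section.

```python
# Cohort age ranges (inclusive) as given in this section.
# The cohort keys "young", "mid" and "old" are hypothetical labels.
COHORT_AGE_RANGES = {
    "young": (18, 23),
    "mid": (45, 50),
    "old": (70, 75),
}


def clean_age(age, cohort, exclude=False):
    """Return a valid cohort age, or None if the woman should be excluded.

    Ages inside the cohort range are returned unchanged. Ages outside it
    are dropped when exclude=True, otherwise set to the nearest valid age.
    """
    low, high = COHORT_AGE_RANGES[cohort]
    if low <= age <= high:
        return age
    if exclude:
        return None
    return min(max(age, low), high)


ages = [17, 20, 24]
print([clean_age(a, "young") for a in ages])                # [18, 20, 23]
print([clean_age(a, "young", exclude=True) for a in ages])  # [None, 20, None]
```

Whichever option is chosen, it should be applied consistently across all waves for the same woman.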
Calculation of the sample weights
The women were selected based on the postcode recorded
by Medicare. The first three digits of each woman's Study
ID number reflect the selection (age group code, state
code, area code). The variable ‘inarea’ in the datasets
records the area from which the women were sampled
(urban, rural, remote). However, by the time the survey
was mailed, some women, particularly in the younger age
group, had moved. The variable ‘y1area’ records their
actual area of residence when completing the survey.
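A rough Python sketch of how the selection codes and the inarea/y1area distinction might be used. It assumes each of the three selection codes is a single digit, as implied by "the first three digits", and that ‘inarea’ and ‘y1area’ use the same area codes; the function names and the example ID "1320456" are made up for illustration.

```python
def selection_codes(study_id):
    """Split the first three digits of a Study ID into its selection codes.

    Assumes one digit per code, in the order given in the text:
    age group code, state code, area code. Pass the ID as a string so
    that any leading zeros are kept.
    """
    prefix = str(study_id)[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError(f"unexpected Study ID: {study_id!r}")
    age_group, state, area = (int(d) for d in prefix)
    return {"age_group": age_group, "state": state, "area": area}


def count_movers(women):
    """Count women whose Survey 1 area ('y1area') differs from 'inarea'.

    Assumes both variables use the same area codes.
    """
    return sum(1 for w in women if w["inarea"] != w["y1area"])


print(selection_codes("1320456"))
# {'age_group': 1, 'state': 3, 'area': 2}
print(count_movers([{"inarea": 1, "y1area": 1}, {"inarea": 2, "y1area": 1}]))
# 1
```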
The number of respondents living in urban, rural and
remote areas at the time of completing the first survey
(wave 1 area) was compared with the most recent census
figures (1991) to create sample weights for each age
group within each area (urban, rural, remote). The
sample weights appear in the datasets and are labelled
y1wtarea, m1wtarea and o1wtarea.
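The text does not give the exact formula for "comparing" respondent numbers with census figures. One common choice, shown here only as an assumption, is a post-stratification weight equal to the area's census share divided by its respondent share within an age group. The counts below are made up for illustration and are not ALSWH or census figures.

```python
def area_weights(respondents, census):
    """Post-stratification weights for one age group.

    respondents: number of respondents per area at Survey 1 (wave 1 area).
    census: census population per area for the same age group.
    weight = census share of the area / respondent share of the area,
    so weighted respondent counts reproduce the census distribution.
    """
    n_total = sum(respondents.values())
    pop_total = sum(census.values())
    return {
        area: (census[area] / pop_total) / (respondents[area] / n_total)
        for area in respondents
    }


# Made-up counts for illustration only (not ALSWH or census figures).
respondents = {"urban": 500, "rural": 400, "remote": 100}
census = {"urban": 7000, "rural": 2800, "remote": 200}
weights = area_weights(respondents, census)
print({area: round(w, 3) for area, w in weights.items()})
# {'urban': 1.4, 'rural': 0.7, 'remote': 0.2}
```

With this form the weights average 1 across respondents, so the weighted sample size equals the actual sample size; oversampled rural and remote women receive weights below 1. An alternative is census count divided by respondent count, which scales the weighted total up to the population. Check the Data Dictionary Supplement for the form actually used in y1wtarea, m1wtarea and o1wtarea.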
Representativeness and attrition:
The International Journal of Epidemiology paper is the
best reference for current retention rates and representativeness
(Lee C, Dobson AJ, Brown WJ, Bryson L, Byles J, Warner-Smith
P, Young AF. (2005) Cohort Profile: The Australian Longitudinal
Study on Women’s Health. International Journal
of Epidemiology; 34: 987-991.)
Updated information is also published annually on the
ALSWH website under Project / Sample.
Longitudinal analysis:
When doing longitudinal analyses, remember to weight
for area of residence at Survey 1 (y1wtarea, m1wtarea,
o1wtarea) in all crosstabs, frequencies and analyses
to adjust for the deliberate initial oversampling of
women in rural and remote areas. Weighting is not
required when running models that include area of
residence.
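A minimal Python sketch of a weighted frequency table, where each woman contributes her Survey 1 area weight instead of a count of 1. The variable "smoker" and the weight values are hypothetical; in practice the weight would be y1wtarea, m1wtarea or o1wtarea for the cohort being analysed.

```python
from collections import defaultdict


def weighted_frequencies(records, variable, weight_var):
    """Weighted percentage distribution of a categorical variable.

    Each record adds its weight (e.g. y1wtarea) to the total for its
    category; missing responses (None) are left out of the denominator.
    """
    totals = defaultdict(float)
    for row in records:
        value = row[variable]
        if value is None:
            continue
        totals[value] += row[weight_var]
    grand_total = sum(totals.values())
    return {value: 100 * total / grand_total for value, total in totals.items()}


# Hypothetical records: one woman with weight 1.0, two with weight 0.5.
records = [
    {"smoker": "yes", "y1wtarea": 1.0},
    {"smoker": "no", "y1wtarea": 0.5},
    {"smoker": "no", "y1wtarea": 0.5},
]
print(weighted_frequencies(records, "smoker", "y1wtarea"))
# {'yes': 50.0, 'no': 50.0}
```

Unweighted, the same records would give one third "yes"; the weights pull the estimate back towards the population distribution.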
Missing data:
Some participants completed a short survey instead of
the full survey, which accounts for some missing data.
The type of survey completed is identified by variables
such as y2survey (Survey 2 of the Younger cohort).
Mid 2 Q70 on income is missing the first category
($1-$119), and there are large amounts of missing data
in some income questions. Mid 2, Mid 3 and Mid 4 are
missing the question about being admitted to hospital.
Young 2 is missing the question about ability to manage
on income. Mid 2 Q67 is unreliable because the
instruction was incorrectly stated as “mark one only”
rather than “mark all that apply”. Many participants
realised that this was an error and answered the
question as intended, but others may not have done so.
Extra resources to support data analysis:
Check the data map, the data dictionary and the Data
Dictionary Supplement for further information about
survey items and derived variables. They are available
by following this link.
Check the survey databooks if unsure about response frequencies.
Electronic copies of the surveys and databooks are available
at the following link.
Several reports available via the web may be useful,
for example Changes Report 1: “Transitions in Selected
Variables, Surveys 1, 2 and 3” (December 2004) and
Changes Report 2: “Examples from the Australian
Longitudinal Study on Women’s Health for Analysing
Longitudinal Data” (June 2005). See the reports page.
See the Data Dictionary Supplement for information on
cleaning and coding of the anthropometric variables
(height, weight, body mass index). These variables were
originally provided in a dataset separate from the
survey data; since 2008 the anthropometric data have
been included in all the survey datasets.
Notes about specific variables
Menopause - the menopause status variable is
recalculated for Mid-age women as each new dataset
becomes available. Make sure you use the most recent
menopause status dataset.
Child data set - the fourth survey for the Younger
cohort included a set of questions relating to
childbirth. These questions have been placed in a
separate Child data set.
Items that form part of a scale - be careful not to
analyse single items from a scale inappropriately. For
example, the 36 items in the SF-36 should not be
analysed as separate items, other than the first
(self-rated health) item. The Data Dictionary Supplement
has details about which scales have been included in
the surveys.
Measure of depressive symptomatology - the 10-item CES-D
scale has an extra item at the end (“I felt terrific”)
which is not included in the calculation of the CES-D
score. The CES-D score is available in the datasets.
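Because the CES-D score is already supplied, this Python sketch only makes concrete the point that the eleventh item is left out. It assumes the eleven responses are held in survey order and coded 0-3. The optional reverse-scoring of positively worded items and the rule of returning no score when a scored item is missing are assumptions for illustration, not the ALSWH scoring rules; check the Data Dictionary Supplement before relying on them.

```python
def cesd10_score(responses, reverse=frozenset()):
    """Sum the first ten CES-D items; the 11th ("I felt terrific") is ignored.

    responses: eleven item scores coded 0-3, in survey order.
    reverse: zero-based positions of positively worded items to
             reverse-score (an assumption; check the documentation).
    Returns None if any of the ten scored items is missing.
    """
    if len(responses) != 11:
        raise ValueError("expected 11 CES-D responses")
    scored = responses[:10]  # drop the extra 11th item
    if any(r is None for r in scored):
        return None
    return sum(3 - r if i in reverse else r for i, r in enumerate(scored))


print(cesd10_score([1] * 10 + [3]))  # 10
```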
Counting symptoms - when looking at symptoms, the general
rule is to count the number of women who had the symptom
“sometimes” or “often”.
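The counting rule can be written as a small Python helper. The response labels ("never", "rarely", "sometimes", "often") and the choice to exclude missing answers from the denominator are assumptions for illustration; check the actual codes in the data dictionary.

```python
# Responses that count as "having the symptom" under the general rule.
SYMPTOM_PRESENT = {"sometimes", "often"}


def count_with_symptom(responses):
    """Number of women who reported the symptom "sometimes" or "often"."""
    return sum(1 for r in responses if r in SYMPTOM_PRESENT)


def symptom_prevalence(responses):
    """Proportion with the symptom among women who answered the item."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return count_with_symptom(answered) / len(answered)


responses = ["never", "sometimes", None, "often", "rarely"]
print(count_with_symptom(responses))  # 2
print(symptom_prevalence(responses))  # 0.5
```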
Measures of exercise - the exercise questions were
changed after Survey 1, so the exercise measures from
Survey 2 onwards are not comparable with those from
Survey 1 in longitudinal analyses. Refer to the Data
Dictionary Supplement for more information.
Summary variables - there are a few “standard” ways
to collapse some of the main categorical variables we
collect. For example, education (highest qualification)
can be dichotomised as “school only” or “post school”,
or collapsed into three categories: “no formal
qualifications”, “school qualifications” and
“trade/tertiary qualifications”, and so on for other
variables. Several variables have also been created to
summarise sets of items in the surveys (e.g. the illicit
drug use items), and it is important that data analysts
become familiar with these derived variables (see the
Data Dictionary Supplement).
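A Python sketch of the education collapsing described above. The detailed qualification labels are hypothetical placeholders, not the actual ALSWH response categories. Grouping "no formal qualifications" with "school only" in the two-level version is also an assumption.

```python
# Hypothetical detailed labels mapped to the three standard categories.
# Check the data dictionary for the real response options and codes.
THREE_LEVEL = {
    "no formal qualifications": "no formal qualifications",
    "school certificate": "school qualifications",
    "higher school certificate": "school qualifications",
    "trade/apprenticeship": "trade/tertiary qualifications",
    "certificate/diploma": "trade/tertiary qualifications",
    "university degree": "trade/tertiary qualifications",
    "higher university degree": "trade/tertiary qualifications",
}


def education_three_level(label):
    """Three-category education, or None for unknown/missing labels."""
    return THREE_LEVEL.get(label)


def education_binary(label):
    """"school only" vs "post school", derived from the three-level version."""
    level = education_three_level(label)
    if level is None:
        return None
    if level == "trade/tertiary qualifications":
        return "post school"
    return "school only"


print(education_binary("university degree"))   # post school
print(education_binary("school certificate"))  # school only
```

Deriving the binary version from the three-level one keeps the two groupings consistent with each other.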
Area of residence - the main areas are urban, rural
and remote, but few women in the study live in remote
areas and many live in rural areas. A 4-level variable
used in the databooks, based on the RRMA (Rural, Remote
and Metropolitan Areas) classification, is: Urban (RRMA
1, 2), Large rural centre (RRMA 3), Small rural centre
(RRMA 4) and Other rural and remote (RRMA 5, 6, 7).
Other classifications can also be justified.
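The 4-level databook grouping translates directly into a lookup table. In this Python sketch only the RRMA-to-category mapping comes from the text. The function name is hypothetical, and the name of the RRMA variable in the datasets is not given here, so the function simply takes the code.

```python
# RRMA codes grouped into the 4-level variable used in the databooks.
RRMA_TO_AREA4 = {
    1: "Urban",
    2: "Urban",
    3: "Large rural centre",
    4: "Small rural centre",
    5: "Other rural and remote",
    6: "Other rural and remote",
    7: "Other rural and remote",
}


def area4(rrma):
    """Map an RRMA code (1-7) to the 4-level area; None if missing/invalid."""
    try:
        return RRMA_TO_AREA4[int(rrma)]
    except (KeyError, TypeError, ValueError):
        return None


print([area4(code) for code in (1, 3, 4, 6)])
# ['Urban', 'Large rural centre', 'Small rural centre', 'Other rural and remote']
```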
Use of general practitioners - in Young 2 (and Young
3) there are two items about frequency of use of GPs
(for “Pap tests, contraception, routine pregnancy
tests” and for “all other reasons”). Responses to
these two items have been combined into a single
measure of GP use. Refer to the Data Dictionary
Supplement for further details.
ATSI status - Aboriginal and Torres Strait Islander
(ATSI) status was asked at Survey 1 in all age groups.
This variable can be used in statistical models, but
results should not be reported separately by ATSI status
in any papers, because the study does not have a
representative sample of ATSI women.
Coding issues - for some variables, the category coded
as 1 (the reference category) is not the first of the
ordered categories. For example, the reference category
for alcohol risk is “low risk”. Similarly, the reference
BMI category is “acceptable weight”. In the question
about how much you would like to weigh now, the response
option “Happy as I am” generally appears as the first
option, except in Young 1 where it is the third option.
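When building indicator variables for a model, it helps to name the reference category explicitly rather than rely on category order. This Python sketch does that. The alcohol-risk category labels are hypothetical placeholders; only the choice of "low risk" as the reference comes from the text.

```python
# Hypothetical alcohol-risk categories, listed in their natural order.
# The reference category ("low risk") is not the first of them.
ALCOHOL_RISK = ["non-drinker", "rarely drinks", "low risk", "risky/high risk"]


def dummy_code(value, categories, reference):
    """Indicator (0/1) variables for every category except the reference."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}


print(dummy_code("non-drinker", ALCOHOL_RISK, reference="low risk"))
# {'non-drinker': 1, 'rarely drinks': 0, 'risky/high risk': 0}
```

A woman in the reference category gets 0 on every indicator, so model coefficients are read as contrasts with “low risk” regardless of where it sits in the ordering.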