2003 Master Sample (MS) Design
Beginning July 2003, the former National Statistics Office (NSO), now the Philippine Statistics Authority (PSA), has used the 2003 master sample (MS) design in the conduct of its household surveys. The design drew extensively on the results of the 2000 Census of Population and Housing as well as the results of past national surveys, such as the 2000 Family Income and Expenditure Survey (FIES), the 2001 Labor Force Survey (LFS), and the 1997 Family Planning Survey (FPS).
This note provides an overview and general description of the different aspects of the 2003 MS. More thorough discussions are given in the main technical documentation (The 2003 Master Sample Documentation).
With the availability of updated information for the general household population from the 2000 Census of Population and Housing, a redesign of the master sample was done.
The 2003 master sample design covers all households in the Philippines, excluding institutional households and households in the Least Accessible Barangays (LABs).
For the 2003 MS, a barangay is classified as a LAB if: (a) there is no regular means of transportation (transportation is available fewer than three times a week); (b) the cost of a one-way fare is more than 500 pesos; or (c) it takes more than 8 hours of walking to reach the barangay. The LABs were identified by the PSA (former NSO) field offices. The final list was determined after further consultation between the PSA (former NSO) Central Office MS project team and the field offices. A total of 350 barangays were classified as LABs and were excluded from the MS frame.
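The three LAB criteria can be read as a simple screening rule. The sketch below is illustrative only: the record type and field names are hypothetical, and the final LAB list was settled through consultation with the field offices, not by a mechanical rule.

```python
from dataclasses import dataclass


@dataclass
class BarangayAccess:
    """Hypothetical access record for one barangay (field names are assumptions)."""
    trips_per_week: float      # how often transportation is available
    one_way_fare_pesos: float  # cost of a one-way fare
    walking_hours: float       # hours of walking needed to reach the barangay


def is_least_accessible(b: BarangayAccess) -> bool:
    """Return True if at least one of the three LAB criteria holds."""
    no_regular_transport = b.trips_per_week < 3   # criterion (a)
    fare_too_high = b.one_way_fare_pesos > 500    # criterion (b)
    walk_too_long = b.walking_hours > 8           # criterion (c)
    return no_regular_transport or fare_too_high or walk_too_long


# A barangay served daily with a cheap fare is kept in the frame;
# one served only twice a week is screened out under criterion (a).
print(is_least_accessible(BarangayAccess(7, 50, 0)))
print(is_least_accessible(BarangayAccess(2, 50, 0)))
```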
Primary Sampling Units (PSU)
- There are 41,942 barangays in the country, 350 of which were considered least accessible barangays (LABs) and were excluded from the frame.
- The total number of PSUs formed from the remaining 41,592 barangays is 16,586.
- The average number of households in a 2003 MS PSU (or PSU size) is 923.
A master sample is a sample of PSUs. A PSU, in turn, is a cluster of households with clear and stable boundaries, that is, boundaries that do not change rapidly over time. A PSU should also contain a sufficient number of households to support all the household surveys for which it will be used as a sample. The 2003 MS, for instance, needs PSUs with at least 500 households.
The barangays were found to be the most suitable administrative unit (in terms of number) to form the PSUs for the 2003 MS. However, more than half of the barangays do not satisfy the minimum size requirement (number of households) of an ideal PSU. "Small" barangays were therefore grouped with contiguous barangays within the municipality to form the desired PSUs.
A list of all the PSUs formed and their characteristics in terms of the stratification variables used is contained in the Master Sample Frame (MSF).
Survey estimates are generally needed for the nation as a whole as well as for various subgroups. These subgroups may be socio-demographic subdivisions that are usually spread throughout the population, such as female-headed households by age of head or educational levels by age and sex, or geographic subdivisions such as regions or provinces. Thus, the survey may be designed to provide estimates with an adequate level of precision for such subdivisions. At the design stage, geographic subdivisions are usually treated as domains. A domain is a subdivision for which estimates of adequate precision are desired.
Based on past surveys and other available resources, most national surveys are able to produce estimates of adequate precision at the regional level only. The precision of estimates may be measured in several ways. One way is to construct a 95% confidence interval estimate; a wider interval indicates a less precise, and hence less useful, estimate.
Example: The estimated proportion of poor families for a given domain is 30%
|Coefficient of Variation (CV)||Standard Error (SE)||95% Confidence Interval Estimate|
|10%||3%||30% ± (2*3%) ⇒ 24% to 36%|
|20%||6%||30% ± (2*6%) ⇒ 18% to 42%|
The example above means that, with a CV of 10%, an interval constructed in this way (24% to 36%) is expected to contain the true proportion of poor families about ninety-five percent of the time. With a CV of 20%, the corresponding interval is 18% to 42%. In both rows the SE is the CV multiplied by the estimate (for example, 10% of 30% is 3%). Notice that the interval widens as the CV or SE increases. A summary of the provincial and regional level CV values of the estimated proportion of poor families is shown in Table 1.
Table 1. Distribution of Regional and Provincial Estimates of the Proportion of Poor Families Based on the Results of the 2000 Family Income and Expenditure Survey (FIES)
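The arithmetic behind the two rows of the example can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the multiplier of 2 is the rounded value used in the example.

```python
def interval_from_cv(estimate: float, cv: float, multiplier: float = 2.0):
    """Return (SE, lower, upper), with SE = CV * estimate and
    interval = estimate +/- multiplier * SE."""
    se = cv * estimate
    return se, estimate - multiplier * se, estimate + multiplier * se


# Estimated proportion of poor families of 30%, at the two CV levels in the example.
for cv in (0.10, 0.20):
    se, lower, upper = interval_from_cv(0.30, cv)
    print(f"CV = {cv:.0%}, SE = {se:.0%}, interval = {lower:.0%} to {upper:.0%}")
```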
|Range of CV Values||Number of Regional Estimates||%|
|5% - 10%||11||64.7|
|Range of CV Values||Number of Provincial Estimates||%|
|5% - 10%||36||43.9|
|10% - 15%||33||40.2|
|20% - 25%||1||1.2|
Source of Primary Data: PSA (former NSO), 2000 FIES
For domain specification, an estimate is considered precise if the CV value of the estimated proportion of poor families does not exceed 10%. This criterion was used in specifying regions as domains of the MS. Note that in Table 1, only 39 out of 82 provincial estimates of the proportion of poor households yielded CV values less than 10%.
The importance of generating provincial level estimates was seriously considered in defining major sampling domains for the MS. However, generating provincial level estimates with adequate precision requires a larger sample size that is usually neither feasible nor sustainable given the resources available for the survey.
With regions as domains, the computed total sample size that would give the desired reliability in the estimates for each domain is manageable. In particular, the required sample size per region was computed so that the expected CV of the estimated proportion of poor households would not exceed 5%, except in the National Capital Region (NCR), where the CV value was set to 10%. The exception was based on the observation that the estimated proportion of poor households in NCR is small (around 8%). The total sample size that satisfies this reliability condition is about 43,000 households. If provinces were to be specified as domains, the total sample size requirement would be much larger than this.
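To see why a small proportion such as the roughly 8% in NCR drives the sample size up, the sketch below uses the textbook relation for a proportion, CV² = deff × (1 − p) / (n × p). This is an assumption for illustration, not the computation used for the MS: the function name is hypothetical and the design effect `deff` defaults to 1 (simple random sampling).

```python
import math


def households_needed(p: float, target_cv: float, deff: float = 1.0) -> int:
    """Smallest n for which CV(p_hat) <= target_cv, assuming
    CV^2 = deff * (1 - p) / (n * p). deff = 1 means simple random sampling."""
    return math.ceil(deff * (1 - p) / (p * target_cv ** 2))


# A proportion of about 8% at the general 5% target and at the relaxed 10% target,
# compared with a 30% proportion at the 5% target.
for p, cv in ((0.08, 0.05), (0.08, 0.10), (0.30, 0.05)):
    print(f"p = {p:.0%}, target CV = {cv:.0%}: n >= {households_needed(p, cv)}")
```

Under this relation, halving the target CV roughly quadruples the required sample, and a rarer characteristic needs a larger sample for the same CV, which is consistent with the relaxed target set for NCR.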
The procedure for allocating the total sample size across domains directly affects the precision of the estimates. Two purposes have to be served:
- The need to generate precise estimates at the national level or for subclasses of the population that cut across domains. Examples of subclass estimates are the proportion of poor households among female-headed households or the employment rate by major industry classification (e.g. agriculture, manufacturing, etc.). For this purpose, allocating the sample proportionally to the total number of households in the domain is considered the best solution.
- The need to generate precise estimates at the domain level for purposes of comparison. In this case, allocating the total sample size equally across domains is the best solution.
Clearly, the best solutions for each of the two concerns are not consistent with one another. Because of this, a compromise allocation scheme was used. In particular, the Kish Allocation Scheme was used to allocate the total sample size to each domain.
The symbols used in the Kish allocation are:
n - total sample size (about 43,000);
H - number of specified domains/regions (=17); and
Wd = Nd / N - proportion of the total household population (N) found in region d.
The final sample size per region was further adjusted (increased) to account for projected non-response and population growth. These adjustments resulted in a total sample size of about 47,000 households.
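A compromise between proportional and equal allocation can be sketched as follows. The code assumes the commonly cited equal-importance form of the Kish allocation, in which the sample for domain d is proportional to sqrt(Wd² + 1/H²); this note does not state the exact expression used for the 2003 MS, so the formula, the function name, and the household counts are all illustrative assumptions.

```python
import math


def kish_allocation(n: float, households: dict) -> dict:
    """Allocate n across domains in proportion to sqrt(W_d^2 + H^-2).

    ASSUMPTION: equal-importance Kish compromise; not confirmed by this note.
    """
    H = len(households)                      # number of domains
    N = sum(households.values())             # total household population
    score = {d: math.sqrt((Nd / N) ** 2 + H ** -2) for d, Nd in households.items()}
    total = sum(score.values())
    return {d: n * s / total for d, s in score.items()}


# Hypothetical household counts for three domains (not real regional figures).
example = {"A": 600_000, "B": 300_000, "C": 100_000}
for domain, size in kish_allocation(3000, example).items():
    print(domain, round(size))
```

In this example the largest domain receives less than its proportional share and the smallest receives more, while neither reaches the equal share of 1,000, which is the compromise behavior described in the text.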
Number of PSUs per Domain/Region
The number of sampled PSUs per domain was computed by dividing the sample size allocated to the domain by the desired sample size per PSU. The desired sample size per PSU was determined using: (1) information on the cost of data collection in the region; and (2) the degree of similarity or homogeneity of the households within the PSU. The basic idea is to take fewer households per PSU when the households within a PSU are homogeneous and when data collection is more expensive. With this information gathered from past survey results, the number of sample households from each PSU was set at 16 for areas outside the National Capital Region (NCR) and 12 for the NCR. This means that for NCR, the total number of PSUs is equal to the allocated sample size divided by 12. For the other regions, it is equal to the allocated sample size divided by 16.
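The division described above can be sketched in a few lines. The function name is hypothetical, and rounding to the nearest whole PSU is an assumption; applied to the original (unadjusted) allocated sample size, it reproduces the "No. of Sample PSU" column for the rows listed in Table A.

```python
HOUSEHOLDS_PER_PSU_NCR = 12
HOUSEHOLDS_PER_PSU_OTHER = 16


def number_of_sample_psus(allocated_households: int, in_ncr: bool) -> int:
    """Allocated sample size divided by the take per PSU, rounded to the
    nearest whole number (the rounding rule is an assumption)."""
    per_psu = HOUSEHOLDS_PER_PSU_NCR if in_ncr else HOUSEHOLDS_PER_PSU_OTHER
    return int(allocated_households / per_psu + 0.5)


# Rows taken from Table A: (area, original allocated sample size, in NCR?)
rows = [
    ("Zamboanga del Norte", 601, False),
    ("Cagayan de Oro City", 314, False),
    ("Las Pinas City", 197, True),
]
for area, allocated, in_ncr in rows:
    print(area, number_of_sample_psus(allocated, in_ncr))
```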
SR PSU or Self-Representing Primary Sampling Unit – a very large PSU in the region/domain with a selection probability of approximately 1 or higher, which is included in the MS outright; it is properly treated as a stratum; also known as a certainty PSU
NSR PSU or Non-Self-Representing Primary Sampling Unit – a regular to small sized PSU in a region/domain; also known as a non-certainty PSU
The final number of sample PSUs for each domain was determined by first classifying PSUs as either self-representing (SR) or non-self-representing (NSR). In addition, to facilitate the selection of subsamples, the total number of NSR PSUs in each region was adjusted to make it a multiple of 4.
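The two steps above can be sketched as follows. Both functions are hypothetical illustrations: the cut-off of exactly 1 for the expected selection probability stands in for "approximately 1 or higher", and rounding ties upward to the next multiple of 4 is an assumption (Table A contains rows where the adjustment goes in either direction).

```python
def split_sr_nsr(psu_households: dict, n_sample_psus: int):
    """Flag as SR every PSU whose PPS selection probability
    n * size / total is at least 1 (threshold is an assumption)."""
    total = sum(psu_households.values())
    sr = {p for p, size in psu_households.items()
          if n_sample_psus * size / total >= 1}
    return sr, set(psu_households) - sr


def adjust_to_multiple_of_four(n_nsr: int) -> int:
    """Round to the nearest multiple of 4; ties go up (assumption)."""
    return 4 * int(n_nsr / 4 + 0.5)


# Hypothetical domain with one very large PSU and four ordinary ones.
sizes = {"big": 5000, "a": 500, "b": 500, "c": 500, "d": 500}
sr, nsr = split_sr_nsr(sizes, n_sample_psus=2)
print(sorted(sr), sorted(nsr))
print(adjust_to_multiple_of_four(9), adjust_to_multiple_of_four(10))
```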
The 2003 MS consists of a sample of 2,826 PSUs. The sample size distribution across regions and provinces is shown in the attached Table A.
Stratification of PSUs
Stratification involves the division of the entire population into non-overlapping subgroups called strata, from which samples are selected independently. This procedure is done to:
- Improve the efficiency of the estimates as a result of combining units that are similar in characteristics. This means improving the precision of the estimates for a given sample size.
- Provide samples for specific subgroups of the population for which separate estimates are desired.
The stratification procedure used in the 2003 MS is described in Diagram A.
A total of 955 explicit strata were formed, 330 of which were the SR PSUs.
2 Proportion of strongly built houses
3 An indication of the proportion of households engaged in agriculture
4 Per capita municipal income
In each explicit stratum, a sample of PSUs, and then a sample of enumeration areas (EAs) within PSUs, was selected with probability proportional to size (PPS), where size is the number of households enumerated in the 2000 Census of Population and Housing (CPH). Within each sampled EA, a sample of housing units was selected with equal probability. All households in the sampled housing units are completely enumerated, except in a few cases where a housing unit has more than three households. For operational considerations, the maximum number of households that could be enumerated in each sampled housing unit is three. In the case of SR PSUs, the EAs served as the primary sampling units, and a minimum of two EAs were selected with PPS to ensure valid estimation of the variances.
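One common way to draw a PPS sample is systematic selection along the cumulated sizes. The text only states that selection was PPS, so the systematic scheme, the function names, and the household counts below are assumptions for illustration.

```python
import random


def inclusion_probability(sizes: dict, unit: str, n: int) -> float:
    """Expected PPS selection probability of one unit: n * size / total."""
    return n * sizes[unit] / sum(sizes.values())


def pps_systematic(sizes: dict, n: int, rng=None) -> list:
    """Systematic PPS: a random start, then equally spaced selection points
    laid over the cumulated sizes (one possible PPS scheme, assumed here)."""
    rng = rng or random.Random()
    interval = sum(sizes.values()) / n
    start = rng.random() * interval                 # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, i = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        while i < n and points[i] < cumulative:     # point falls inside this unit
            selected.append(unit)
            i += 1
    return selected


# Hypothetical census household counts for five PSUs in one stratum.
psu_households = {"P1": 600, "P2": 900, "P3": 1200, "P4": 800, "P5": 500}
print(pps_systematic(psu_households, 2, random.Random(2003)))
print(inclusion_probability(psu_households, "P3", 2))
```

A unit larger than the sampling interval would be hit more than once under this scheme, which is one reason very large PSUs are set aside as self-representing.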
Formation of Replicates
Another important feature of the 2003 MS design is its flexibility to meet the needs of different surveys. Some surveys require only a smaller sample, hence the need to subsample from the master sample. To facilitate the selection of subsamples, the MS was divided into four replicates. A replicate is defined as a subsample that possesses the properties of the full master sample such that each replicate is able to generate national level estimates of adequate precision.
For the NSR PSUs, each of the four PSUs in every stratum is assigned to one replicate. In the case of SR PSUs, on the other hand, the EAs were distributed to the replicates in such a way that a balance between two half samples (each consisting of two replicates) can be achieved. A balanced distribution of the EAs of the SR PSUs across the four replicates cannot be achieved because most of the SR PSUs have only two EAs.
Selection of Subsamples
Several options are available in the selection of subsamples from the new master sample. These options depend on whether the survey is done together with the regular Labor Force Survey (LFS) or as a stand-alone survey.
- If a survey that requires only a subsample is conducted together with the LFS, then it is more efficient to select a subsample of housing units within a PSU. For instance, if the total number of sampled housing units within a PSU is 16, a quarter sample is drawn by selecting 4 housing units from among the 16 with equal probability.
- If the survey is to be conducted independently of the LFS, then it is more efficient to select a subsample of PSUs rather than a subsample of housing units in all PSUs. The subsampling of PSUs can be done by selecting one or more replicates. For instance, if a 50% sample is desired, then this can be achieved by selecting two replicates. This applies to both SR and NSR PSUs.
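The two subsampling options can be sketched as below. The function names, PSU labels, and replicate assignments are hypothetical; the sketch only illustrates drawing 4 of 16 housing units with equal probability and taking whole replicates for a 50% sample.

```python
import random


def quarter_sample_of_housing_units(housing_units: list, rng=None) -> list:
    """Option 1 (with the LFS): select a quarter of the sampled housing units
    in a PSU with equal probability, without replacement."""
    rng = rng or random.Random()
    return rng.sample(housing_units, len(housing_units) // 4)


def psus_in_replicates(psu_replicate: dict, chosen: set) -> list:
    """Option 2 (stand-alone survey): keep the PSUs of the chosen replicates."""
    return [psu for psu, rep in psu_replicate.items() if rep in chosen]


# 16 sampled housing units in one PSU: a quarter sample keeps 4 of them.
units = [f"HU-{i:02d}" for i in range(1, 17)]
print(quarter_sample_of_housing_units(units, random.Random(1)))

# Four NSR PSUs of one stratum, one per replicate: two replicates give a 50% sample.
assignment = {"PSU-01": 1, "PSU-02": 2, "PSU-03": 3, "PSU-04": 4}
print(psus_in_replicates(assignment, {1, 2}))
```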
The generation of the survey weights for each responding element is one of the key activities in generating estimates using the MS. The weight may be interpreted as the relative importance given to the responding unit in the generation of estimates. It can also be interpreted as the number of units in the population that each responding unit represents. Basically, the final survey weight is defined as the product of: (1) the base weight; (2) the nonresponse adjustment weight; and (3) the weight adjustment based on known population totals, or simply the post-stratification weight. The base weight is determined by taking the inverse of the selection probability of each unit of analysis. The nonresponse adjustment weight is determined by taking the inverse of the response rate.
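The product of the three weight components can be sketched as follows. The function and parameter names are hypothetical, the numbers are made up, and taking the nonresponse adjustment as the inverse of a response rate is the usual convention assumed here; the level at which that rate is computed is not specified in this note.

```python
def final_weight(selection_probability: float,
                 response_rate: float,
                 poststratification_factor: float) -> float:
    """Final weight = base weight * nonresponse adjustment * post-stratification.

    base weight            = 1 / selection probability
    nonresponse adjustment = 1 / response rate (assumed convention)
    """
    base_weight = 1.0 / selection_probability
    nonresponse_adjustment = 1.0 / response_rate
    return base_weight * nonresponse_adjustment * poststratification_factor


# Illustrative values: a household selected with probability 1 in 100,
# an 80% response rate, and a post-stratification factor of 1.1.
print(final_weight(0.01, 0.80, 1.10))
```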
Rotation of Samples
The MS will be used for a period of 10 years. As such, sample elements need to be replaced by a new set at certain points in time. Retaining the original sample elements would create problems such as response burden that would eventually affect the overall quality of the survey results. In addition, repeatedly interviewing the same units increases the likelihood of non-response. A solution to this problem is to devise a sample rotation plan so that a unit stays in the sample for some period and is then permanently replaced by a new set of sample units. To facilitate a sample replacement scheme, each replicate will form a panel. In each PSU, all units were divided into rotation groups of equal size. The sample replacement scheme is such that, every quarter of the year, a new rotation group in each panel will be selected. However, to maximize the effect of the correlation of the estimates between years, 50% of the panels will have common samples for a quarter in consecutive years. For illustration, refer to the proposed sample rotation design in Table 2.
The completion of the research for the 2003 master sample design led the PSA (former NSO), through the Statistical Methodology Unit (SMU), to conduct other related research studies. For 2004, the research lineup is as follows:
- Validation of Raking Procedure used for LFS Estimates;
- Provincial Estimation of Unemployment Using Aggregated Four Quarter Samples;
- Comparison of Estimates (levels/rates and precision) Using Old and New Nonresponse Adjustment Procedure; and
- Comparison of the number of households obtained in C2K and CA/CF listing by EA.
Table 2. Sample rotation design from 2004 to 2008
* Numbers represent rotation groups formed for the housing units within the sampled EAs and letters represent rotation clusters. Rotation cluster A includes replicates 1 and 2, while rotation cluster B includes replicates 3 and 4.
Table A: Sample size distribution by region and province. 2003 PSA (former NSO) Master Sample.
|Region / Province||Total Pop'n||No. of Hhlds||No. of PSU||Allocated Sample Size (Original)||Allocated Sample Size (Adj. for Non-Response)||No. of Sample PSU||Final PSU Allocation: SR PSU||Final PSU Allocation: NSR PSU||Final PSU Allocation: Total PSU|
|Zamboanga del Norte||809,672||159,463||222||601||627||38||0||40||40|
|Zamboanga del Sur||831,504||161,751||222||610||636||38||0||40||40|
|Lanao del Norte||476,106||90,675||136||290||305||18||0||16||16|
|Cagayan de Oro City||482,310||98,131||32||314||330||20||10||8||18|
|Davao del Norte||743,592||150,627||139||459||504||29||3||24||27|
|Davao del Sur||745,401||154,484||168||471||517||29||1||28||29|
|General Santos City||411,110||86,524||25||290||319||18||9||8||17|
|Las Pinas City||440,315||92,203||20||197||218||16||14||4||18|
|Lanao del Sur||749,325||108,711||177||441||463||28||0||28||28|
|Agusan del Norte||285,755||53,506||75||259||276||16||0||16||16|
|Agusan del Sur||551,212||103,621||131||502||535||31||3||28||31|
|Surigao del Norte||476,597||93,517||122||453||482||28||2||24||26|
|Surigao del Sur||502,700||95,706||118||464||494||29||2||28||30|
Column Description in Table A
Column 1 - Regions and provinces in the Philippines
Column 2 - Total household population based on Census 2000 counts
Column 3 - Number of households based on Census 2000 counts
Column 4 - Number of PSUs formed per region/province/city
Column 5 - Number of sample households allocated per region/province/city
Column 6 - Number of sample households allocated per region/province/city adjusted to cover for the non-response
Column 7 - Number of sample PSUs per region/province/city from which sample households will be drawn
Column 8 - Number of sample self-representing PSUs per region/province/city
Column 9 - Number of sample non-self-representing PSUs rounded off to the nearest multiple of four
Column 10 - Total of Columns 8 and 9