1.1 Original Sample Design
The design of the original ELSOC sample aimed to harmonise the multiple research interests of the researchers associated with the Center. Among the most relevant considerations were the following:
A sample design that allowed combining the variables measured in the questionnaire with spatial variables, recorded at the block and commune level, contained in the databases developed by the Territorial Intelligence Center (CIT) of the Adolfo Ibáñez University. The CIT data are not available for all blocks in the country, particularly those located in rural localities, so we decided to include only urban areas in the sample. This decision also coincided with the preferences of many researchers at the Center, who were mainly interested in an urban sample.
Some researchers requested a design that allows the estimation of multilevel (or hierarchical) models grouped by city and commune. Therefore, we resolved that the sample should contain a sufficient number of cities and communes, as well as a sufficient number of cases within each city and commune, to allow such analysis (Snijders & Bosker, Chapter 10).
Other researchers were interested in comparing inhabitants of the three largest cities in the country, which resulted in a non-proportional design that increased the number of respondents in the areas of Greater Valparaíso (cities of Viña del Mar and Valparaíso) and Greater Concepción (Concepción, Talcahuano and others).
Finally, some researchers requested a design that allows the comparison of respondents living in large and small cities, favouring increasing the sample size of households in small cities (Kish, 1965, Section 3.5), particularly in those with between 30,000 and 100,000 inhabitants.
COES researchers worked with a sampling design manager, Stephanie Eckman, to develop a design that reasonably met these substantive needs and interests. The final ELSOC-COES Wave 1 sampling design provides adequate coverage of the country’s largest cities (Greater Santiago, Greater Valparaíso and Greater Concepción) and of smaller cities. It also ensures the representation of people in the north and south of the country. Overall, the sampling design covers approximately 77% of the country’s total population and 93% of the urban population. The following subsections detail the different steps of the sampling design.
Sampling Frame Setup
The sampling process for the original sample was based on the 2011 pre-census data, which the CIT formatted. Although the 2012 census population counts are not accurate, the pre-census work that collected housing information on all blocks is of high quality. The dataset contained a total of 155,757 blocks, but we eliminated four types of blocks before selection began.
1. Following the analytical interests of the Center’s researchers, only urban blocks were used. We used the locality type code (urban or rural) in the 2011 pre-census database to determine which blocks were urban. Consequently, we excluded 22,188 blocks (14.2%) in this step.
2. Also based on the analytical interests of the Center’s researchers, only the blocks that CIT had previously geo-referenced were retained for sampling. We therefore removed 1,971 blocks (1.5% of the urban blocks) that were not geo-referenced.
3. We retained only blocks containing five or more households (according to the 2011 pre-census). A total of 503 blocks (less than 1% of the blocks remaining after steps 1 and 2) did not meet this threshold and were removed.
4. Only blocks in cities with more than 10,000 inhabitants were eligible for selection. This excluded 10,238 blocks (7.8% of the remaining blocks) from the sampling frame.
Thus, the final sampling frame contains 120,857 blocks. The COES sample will represent only these blocks and not those excluded. Estimates derived from the sample data will apply only to this target population and should not be applied to the entire Chilean population. The respondent selection process was developed in four stages, although a fifth stage was added during fieldwork.
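The exclusion counts reported above can be verified with simple arithmetic. The following sketch is only a consistency check on the figures in this section; the variable names and labels are illustrative and do not come from any ELSOC code.

```python
# Consistency check of the sampling-frame exclusions reported in the text.
# All counts come from this section; names and labels are illustrative.
total_blocks = 155_757

exclusions = [
    ("rural blocks", 22_188),
    ("urban blocks not geo-referenced by CIT", 1_971),
    ("blocks with fewer than five households", 503),
    ("blocks in cities of 10,000 inhabitants or fewer", 10_238),
]

remaining = total_blocks
shares = {}
for label, removed in exclusions:
    # Share of the blocks still in the frame before this step.
    share = 100 * removed / remaining
    shares[label] = share
    remaining -= removed
    print(f"{label}: -{removed:,} ({share:.1f}%), {remaining:,} blocks left")

frame_size = remaining
print(f"final sampling frame: {frame_size:,} blocks")
```

Each percentage is computed over the blocks that remain after the previous steps, which is how the percentages in the text are expressed.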
Stage 1: City Selection
The universe of blocks (the 120,857 blocks mentioned above) was aggregated at the city level, resulting in 122 cities. The three largest cities (Greater Santiago, Viña del Mar-Valparaíso and Concepción-Talcahuano) were selected with certainty. The remaining cities were stratified by population size. Table 1.1 shows the definitions of the strata and the population and sample sizes in each one.
Stratum | Definition (N° inhabitants) | Cities population size | Cities sample size | North stratum population size | North stratum sample size | South stratum population size | South stratum sample size |
---|---|---|---|---|---|---|---|
Greater Santiago | | 1 | 1 | | | | |
Greater Valparaíso | | 1 | 1 | | | | |
Greater Concepción | | 1 | 1 | | | | |
Big cities | > 100 thousand | 18 | 8 | 8 | 4 | 10 | 3 |
Medium cities | > 30 thousand | 28 | 10 | 15 | 6 | 13 | 3 |
Small cities | > 10 thousand | 73 | 19 | 24 | 6 | 49 | 13 |
The strata of large cities, medium-sized cities and small cities were stratified geographically by North or South zone to ensure that the sample contained cities in the north and south of Chile. This resulted in a total of nine strata. The sample was distributed between the two areas in proportion to their population size in the universe. See Table 1.1 for details about the population and sample sizes in each of the northern and southern strata.
The selection of cities within each stratum was in proportion to the population size of each city. This method gives a higher probability of selection to large cities.
The probability of selection of a city \(i\) within stratum \(h\) was:
\[\pi_i=\frac{(nc_h)(pop_i)}{\sum_{i' \in h} pop_{i'}}\]
where \(nc_h\) is the number of cities selected in stratum \(h\), \(pop_i\) is the population of city \(i\), and the sum in the denominator runs over all cities \(i'\) in stratum \(h\).
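As a minimal illustration of this formula, the sketch below computes the selection probabilities for one stratum. The city names and population figures are hypothetical and were chosen only to make the arithmetic easy to follow; the function name is likewise illustrative.

```python
def city_inclusion_probabilities(populations, n_selected):
    """Return nc_h * pop_i / (total stratum population) for every city."""
    total = sum(populations.values())
    return {city: n_selected * pop / total for city, pop in populations.items()}


# Hypothetical stratum of four cities, two of which are to be selected.
stratum = {"A": 40_000, "B": 30_000, "C": 20_000, "D": 10_000}
probs = city_inclusion_probabilities(stratum, n_selected=2)

for city, p in probs.items():
    print(city, round(p, 3))
```

Within a stratum these probabilities add up to the number of cities selected, which is a quick way to check the calculation. The formula applies only when no city is large enough for its probability to exceed 1.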
Stage 2: Block Selection
The 40 selected cities contained 87,839 blocks. In the second stage, we selected blocks in each city with probability proportional to size, where size was the pre-census household unit count. The selection was systematic: the list of blocks in the selected cities was ordered by census sub-district and block number to ensure that the selected blocks were spread throughout the city.²
Table 1.2 shows the number of blocks selected in each city, according to stratum. The sample of blocks was disproportionately distributed so that areas outside of Santiago would be over-represented relative to their size in the target population. Several COES researchers requested this distribution to ensure that the sample was diverse with respect to city size.
The probability of selection of a block \(j\) in city \(i\), conditional on city selection, was:
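\[\pi_{j|i}=\frac{(nb_i)(hu_j)}{\sum_{j' \in i} hu_{j'}}\]
where \(nb_i\) is the number of blocks selected in city \(i\), \(hu_j\) is the pre-census number of household units in block \(j\), and the sum in the denominator runs over all blocks \(j'\) in city \(i\).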
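The text does not give the exact routine used for the systematic selection, so the following is only a generic textbook sketch of systematic sampling with probability proportional to size. It assumes the list is already sorted (here, by census sub-district and block number) and that every block is smaller than the selection interval. The function name and the household counts are hypothetical.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Select n units by systematic PPS from an ordered list of sizes.

    Returns the indices of the selected units. Every size must be smaller
    than the selection interval, so that no unit can be selected twice.
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) >= interval:
        raise ValueError("a unit reaches the interval; handle it as a certainty selection")

    start = rng.random() * interval            # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected, cumulative, idx = [], 0, 0
    for point in points:
        # Advance until the cumulative size range of unit idx contains the point.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical household counts for eight blocks, listed in sorted order.
sizes = [40, 25, 60, 35, 50, 30, 45, 15]
chosen = systematic_pps(sizes, n=3, rng=random.Random(2016))
print(chosen)
```

Because the selection points are one interval apart, the selected blocks are spread along the sorted list, which is the reason for sorting by geography before selection.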
Stratum | Definition (N° inhabitants) | Cities sample size | Number of blocks per city | Number of blocks |
---|---|---|---|---|
Greater Santiago | | 1 | 200 | 200 |
Greater Valparaíso | | 1 | 100 | 100 |
Greater Concepción | | 1 | 100 | 100 |
Big cities | > 100 thousand | 8 | 26 | 208 |
Medium cities | > 30 thousand | 10 | 25 | 250 |
Small cities | > 10 thousand | 19 | 11 | 209 |
Total | | 40 | 27 | 1067 |
In 4 cities, some blocks were so large that they were certain to be selected: their household unit counts exceeded the selection interval, so they would be selected in any sample and could even be selected twice. To avoid duplicate selections, we first selected these blocks with certainty. Additional blocks were then selected from the remaining blocks in those cities to reach each city’s desired total sample size (see Table 1.2). \(\pi_{j|i}\) for these blocks is 1.
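One common way to identify such certainty selections is sketched below. It is an illustration rather than the procedure actually used: the repeated check (the interval shrinks once large blocks are set aside) and the use of "greater than or equal to" are assumptions of this sketch, and the function name and block sizes are hypothetical.

```python
def split_certainty_units(sizes, n):
    """Separate units whose size reaches the PPS selection interval.

    Returns (certainty_indices, remaining_indices, remaining_n). The check is
    repeated because removing large units shrinks the interval for the rest.
    """
    certainty = []
    remaining = list(range(len(sizes)))
    while n - len(certainty) > 0:
        n_left = n - len(certainty)
        interval = sum(sizes[i] for i in remaining) / n_left
        large = [i for i in remaining if sizes[i] >= interval]
        if not large:
            break
        certainty.extend(large)
        remaining = [i for i in remaining if i not in large]
    return certainty, remaining, n - len(certainty)


# Hypothetical city with five blocks, three of which are to be selected.
block_sizes = [500, 150, 40, 60, 50]
certain, rest, n_rest = split_certainty_units(block_sizes, n=3)
print(certain, rest, n_rest)
```

The blocks in `rest` would then go through the usual systematic selection with `n_rest` selections, while the blocks in `certain` enter the sample with conditional probability 1.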
In the field, we registered the household units in the 1,067 blocks selected in the 40 cities so that households could be selected using the most up-to-date information possible. The CIT provided maps of each selected block. CMD field staff visited each block in person and created a registry of all household units in those blocks. We carefully reviewed the listings for errors and duplicates.
During the registration process, the CMD found that some blocks had more than 100 households, which made the procedure excessively difficult. Consequently, we divided these blocks into sub-blocks of approximately equal size (40 to 50 households) and selected one for registration. Because the sub-blocks were of similar size, we selected them with equal probability. In total, we subsampled 301 blocks. This step did not affect the remaining blocks.
Stage 3: Households Selection
As shown in Table 1.3, the number of households selected in each block varied by stratum. This design yielded 4,001 selected household units, a number intended to produce approximately 3,000 completed interviews under the assumption of a 75% response rate in all strata.
Stratum | Definition (N° inhabitants) | Number of households per block |
---|---|---|
Greater Santiago | | 5 |
Greater Valparaíso | | 5 |
Greater Concepción | | 5 |
Big cities | > 100 thousand | 3 |
Medium cities | > 30 thousand | 3 |
Small cities | > 10 thousand | 3 |
Total | | 4001 |
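
The household total follows from the per-stratum block counts in Table 1.2 and the households per block listed above. The short check below reproduces that arithmetic together with the interview target implied by the assumed 75% response rate; the variable names are illustrative.

```python
# (blocks selected in the stratum, households selected per block)
design = {
    "Greater Santiago": (200, 5),
    "Greater Valparaíso": (100, 5),
    "Greater Concepción": (100, 5),
    "Big cities": (208, 3),
    "Medium cities": (250, 3),
    "Small cities": (209, 3),
}

households = {stratum: b * h for stratum, (b, h) in design.items()}
total_blocks = sum(b for b, _ in design.values())
total_households = sum(households.values())

assumed_response_rate = 0.75
expected_interviews = total_households * assumed_response_rate

print(f"blocks: {total_blocks:,}")
print(f"households: {total_households:,}")
print(f"expected interviews: {expected_interviews:,.0f}")
```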
We drew a simple random sample of households in each block. The combination of sampling with probability proportional to size in the first two stages and simple random sampling in the third and fourth stages resulted in a sample of households with approximately equal probability within each of the nine strata.
The probability of selection of a household \(k\) in block \(j\) in city \(i\) and stratum \(h\) was:
\[\pi_{k|j,i}=\frac{nh_j}{NH_j}\]
where \(nh_j\) is the number of households selected in block \(j\), and \(NH_j\) is the number of households listed in block \(j\).
Stage 4: Individuals Selection
Interviewers visited each selected household and attempted to perform the interview. The first step in the interview process was to identify the target respondent. When more than one adult was in the household, we selected one using a simple random sample through a Kish table.
The probability of selection of a person \(l\) in household \(k\) was:
\[\pi_{l|k,j,i}=\frac{1}{NP_k}\]
where \(NP_k\) is the number of adults (over 18 and under 75) living in household \(k\).
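Because each stage's probability is conditional on the previous stage, the overall selection probability of a respondent is the product of the four probabilities defined above. The sketch below illustrates that product with hypothetical values. Two elements are assumptions of this sketch and are not described in this section: the optional `pi_subblock` factor for blocks that were split into sub-blocks, and the base weight taken as the inverse of the overall probability, which is a common convention.

```python
from fractions import Fraction


def overall_probability(pi_city, pi_block, pi_household, n_adults,
                        pi_subblock=Fraction(1)):
    """Multiply the conditional selection probabilities of every stage.

    pi_subblock is an assumption of this sketch: it stands for the
    equal-probability choice of one sub-block in blocks that were split
    during registration, and defaults to 1 for blocks that were not split.
    """
    pi_person = Fraction(1, n_adults)        # one adult chosen per household
    return pi_city * pi_block * pi_subblock * pi_household * pi_person


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
pi = overall_probability(
    pi_city=Fraction(3, 5),
    pi_block=Fraction(1, 40),
    pi_household=Fraction(3, 45),
    n_adults=2,
)
base_weight = 1 / pi                          # inverse-probability weight
print(pi, base_weight)
```

Exact fractions are used so that the product can be checked by hand without rounding.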
Stage 5: Increasing sample size
During the fieldwork of the first wave (2016), we observed that the assumption of a 75% response rate for all strata was incorrect. First, the overall response rate was lower than 75%, and second, response rates differed considerably between regions. Because of this, we decided to increase the number of households per block to reach the target of 3,000 interviews.
The increase in households per block has a limited effect on the probability of selection of each household. It affects only the probability calculated in Stage 3, because more households are selected from each block’s listing. The probabilities calculated in Stages 1 and 2 do not change, because the additional households came from the blocks already selected in Stage 2 and no new blocks were introduced.
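The effect on the Stage 3 probability can be illustrated with a small calculation. This sketch assumes that the additional households were drawn by simple random sampling from the listed households not yet selected in the block, which the text does not state explicitly. Under that assumption a household's probability becomes \((nh_j + a_j)/NH_j\), where \(a_j\) (a symbol introduced here) is the number of households added in block \(j\). The block figures are hypothetical.

```python
from fractions import Fraction


def household_probability(n_selected, n_listed, n_added=0):
    """Stage 3 probability after adding n_added households to a block."""
    return Fraction(n_selected + n_added, n_listed)


# Hypothetical block: 45 listed households, 3 selected, then 2 added.
before = household_probability(3, 45)
after = household_probability(3, 45, n_added=2)

# Same result obtained in two steps: selected in the first draw, or missed
# in the first draw and then selected among the 42 remaining households.
two_step = before + (1 - before) * Fraction(2, 45 - 3)

print(before, after, two_step)
```

The two-step calculation gives the same value as the direct formula, which shows why only the numerator of the Stage 3 probability changes.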
We added 1,082 new households to the study sample during this process, all located within the selected blocks. The allocation of these new households was not uniform across all blocks in the country. Instead, they were concentrated in four regions: Coquimbo, O’Higgins, Metropolitana, and Biobío, where interviewers had the most problems contacting respondents. Table 1.4 details the communes in which the number of households increased relative to the initial design and the number of households added per block.
Region | Commune | Total added households | Households added per block |
---|---|---|---|
Coquimbo | Coquimbo | 24 | 2 |
Coquimbo | La Serena | 28 | 2 |
Coquimbo | Salamanca | 22 | 2 |
O’Higgins | Doñihue | 10 | 1 |
O’Higgins | Rancagua | 42 | 2 |
O’Higgins | Santa Cruz | 11 | 1 |
Biobío | Chiguayante | 24 | 3 |
Biobío | Concepción | 75 | 3 |
Biobío | Coronel | 11 | 1 |
Biobío | Penco | 4 | 1 |
Biobío | Quillón | 6 | 1 |
Biobío | San Pedro de la Paz | 28 | 2 |
Metropolitana | Cerrillos | 9 | 3 |
Metropolitana | Colina | 12 | 3 |
Metropolitana | Curacaví | 14 | 2 |
Metropolitana | El Bosque | 8 | 2 |
Metropolitana | Estación Central | 12 | 3 |
Metropolitana | Huechuraba | 6 | 2 |
Metropolitana | Independencia | 6 | 2 |
Metropolitana | Isla de Maipo | 39 | 3 |
Metropolitana | La Cisterna | 9 | 3 |
Metropolitana | La Florida | 24 | 2 |
Metropolitana | La Granja | 6 | 2 |
Metropolitana | La Pintana | 12 | 2 |
Metropolitana | La Reina | 9 | 3 |
Metropolitana | Las Condes | 33 | 3 |
Metropolitana | Lo Barnechea | 9 | 3 |
Metropolitana | Lo Espejo | 6 | 2 |
Metropolitana | Lo Prado | 6 | 2 |
Metropolitana | Macul | 8 | 2 |
Metropolitana | Maipú | 32 | 2 |
Metropolitana | Ñuñoa | 16 | 2 |
Metropolitana | Padre Hurtado | 6 | 3 |
Metropolitana | Pedro Aguirre Cerda | 6 | 2 |
Metropolitana | Peñaflor | 30 | 2 |
Metropolitana | Peñalolén | 14 | 2 |
Metropolitana | Providencia | 7 | 3 |
Metropolitana | Pudahuel | 14 | 2 |
Metropolitana | Puente Alto | 32 | 2 |
Metropolitana | Quilicura | 12 | 2 |
Metropolitana | San Bernardo | 16 | 2 |
Metropolitana | San Joaquín | 6 | 2 |
Metropolitana | San Miguel | 9 | 3 |
Metropolitana | San Ramón | 6 | 2 |
Metropolitana | Santiago | 120 | 3 |
Metropolitana | Vitacura | 9 | 3 |
Successive wave sampling design in Original Sample
The sample design for the successive waves is equivalent to the original sample design. The households added and respondents selected during 2016 and those added during Stage 5 were re-interviewed in the follow-up waves.
² Matías Garretón, CIT researcher, provided census block and census district numbers. Census sub-districts are smaller geographic units than communes but larger than blocks.