The expected factor structure stands out when there is a sufficient difference between the within correlations and the between correlations. We find that the survey data have an average within correlation of 0. This compares to a within and a between correlation of 0. The fact that the difference between the between and within correlations is greater in the internet data is consistent with its more discernible factor structure.
This can further be inferred from Table S8, which shows the item-by-item correlation coefficients for the survey database combining STEP and other surveys versus those of the internet data and the United States.
For the internet data and the United States, correlations between items meant to capture the same PT are consistently much higher than correlations with any other items, with the sole exception of the first Openness item. By contrast, for the survey database, several items show higher correlations with items meant to measure other PTs than with items meant to measure the same PT, including two items of Conscientiousness.
For the internet and U. But for the survey data, despite averaging over a large number of datasets, of the 10 highest correlations, 4 are between correlations and 6 are within. The fact that many questions correlate more with items intended to measure a different PT than with items intended to measure the same PT makes it arguably hard to interpret items as capturing the intended PT.
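The within- and between-trait item correlations discussed above can be sketched as follows; this is a minimal illustration, not the authors' code, and the item-to-trait mapping and response data are hypothetical:

```python
from itertools import combinations

def pearson(x, y):
    # Plain Pearson correlation between two equal-length response lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def within_between(items, traits):
    # items: dict item_name -> list of responses (same respondents, same order)
    # traits: dict item_name -> the PT the item is intended to measure
    within, between = [], []
    for a, b in combinations(items, 2):
        r = pearson(items[a], items[b])
        # A pair is "within" if both items target the same trait, else "between".
        (within if traits[a] == traits[b] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)
```

A clear factor structure then shows up as an average within correlation well above the average between correlation.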
For a given number of items, internal consistency increases when the correlation between items of the same PT increases. Hence, it is higher when the noise of each item is low and when the items measure the same underlying factor. A minimum threshold of 0. While all PTs measured in surveys show relatively low values, results indicate that internal consistency in the survey data is lowest for Agreeableness and somewhat better for Emotional Stability.
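Internal consistency of this kind is conventionally summarized by Cronbach's alpha, which rises with both the number of items and their average intercorrelation. A minimal sketch of the standard formula (the response data in the usage example are hypothetical):

```python
def cronbach_alpha(item_scores):
    # item_scores: list of per-item response lists (same respondents, same order).
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total score per respondent across the k items.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(var(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / var(totals))
```

Perfectly parallel items yield an alpha of 1; weakly correlated items pull it toward 0, which is why noisy items measuring different underlying factors produce the low values reported here.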
This core set of results highlights that many Big Five survey data collected in low- and middle-income countries do not follow the FFM. Conceptually, this implies that some items correlate less with items that aim at measuring the same PT than with items that belong to different PTs. The evidence that led psychometricians to conclude that the Big Five are universal tends to hold for the internet data, but it is far from apparent in the survey data from developing countries.
Moreover, this appears not to be driven by differences in the average age of respondents or sample size in the internet data, as limiting the internet sample to ages and sample sizes similar to those of the survey data yields results broadly similar to those for the full sample of internet data (bottom panel of Table 2). We investigate possible explanations for the low validity of the PT measures, including the number of items, the cognitive ability and education levels of the respondents, the administration method, and systematic response patterns.
One might be concerned that it is difficult to recover the factor structure with only 15 items or that the 15 items selected for STEP were poorly chosen. Note, however, that validity with the same 15 items is much higher in the internet data. Moreover, the congruence coefficient, if anything, tends to decrease with the number of items, because the overfitting that occurs when the number of components is high relative to the number of items is reduced.
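The congruence coefficient referred to here is presumably Tucker's congruence coefficient between two factor-loading vectors (e.g., an estimated solution versus a reference solution); a minimal sketch, with hypothetical loading vectors in the usage example:

```python
def tucker_congruence(x, y):
    # x, y: factor loadings of the same items on two factor solutions.
    # phi = sum(x_i * y_i) / sqrt(sum(x_i^2) * sum(y_i^2))
    num = sum(a * b for a, b in zip(x, y))
    den = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    return num / den
```

The coefficient is scale-invariant: proportional loading vectors give a value of 1, while orthogonal patterns give 0, so it captures similarity of the loading *pattern* rather than its magnitude.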
In addition, the between and within correlations are on average not affected by the number of items. Second, Fig.
Last, the middle panel of Table 2 presents statistics using the full set of 44 items available in other surveys and still shows overall low validity, substantially lower than the levels observed for the internet data. The estimates for the survey data are based on surveys with at least six items per PT, while the estimates for the internet data are based on data from the 14 STEP countries, using all 44 items of the BFI.
The U. The educational level of the respondents is another potential driver of the differences in the reliability and internal validity of survey data compared to datasets used by psychologists in the United States or the internet data. The bottom panel of Table 2 therefore presents a set of psychometric indicators when restricting the STEP data to respondents who have had some college education. This increases comparability between the STEP and internet data and brings the sample of STEP respondents closer to the convenience samples of university students often used in psychometric studies.
Unexpectedly, we find no improvement in any of the indicators, suggesting that the level of education of respondents may not be a primary driver of the low validity. Similar results are obtained when restricting the STEP samples to individuals with white-collar jobs. We analyze the role of cognitive ability with the STEP database, using as a proxy for cognition the measure of functional literacy that is comparable between individuals and countries of the STEP surveys.
Figure 3 presents the relationship between psychometric indicators and the cognitive ability of the respondents. In this figure, the unit of observation is the region, corresponding to the largest geographical division within each country as indicated in STEP, resulting in between 2 and 15 regions per country. Each panel depicts one of the indicators, separately calculated for each region, and its correlation with the regional-level average cognitive ability of the respondents.
The analysis is limited to the nine countries for which good cognitive measures are available. Of course, since regions with low average cognitive ability are likely to differ along many other dimensions, these correlations need not have any causal interpretation.
The variation is substantial, with the congruence coefficient varying from about 0. Hence, survey data for regions with lower average cognitive ability show factor structures that are less consistent with the FFM and less internally valid.
Yet, this is not the entire story because even the regions with the highest average cognitive ability remain below acceptable psychometric standards. Moreover, once we account for average differences between countries by including country-fixed effects in the estimations, these relationships are no longer significant, making it unclear whether they capture differences in cognition or rather other cross-country contextual differences that could affect responses in face-to-face surveys.
In each figure, the level of observation is the largest possible geographical division in the country (regions, provinces, or districts). We apply a weight that is the inverse of the number of geographical divisions to give the same weight to each country. Enumerator bias measures the share of the variation in responses by PT that can be explained by systematic biases due to which enumerator administered each survey.
Cognitive ability is measured by the full literacy test, also described in the Supplementary Materials. The nine countries in the regression are the nine countries with the full literacy test included in the STEP surveys. Low validity of the PT measures could also be related to systematic response biases and answering patterns that are potentially more prevalent in survey data.
Social desirability bias, for instance, could help explain why Conscientiousness and Agreeableness are the most problematic PTs and why Conscientiousness has little predictive power in the survey data. The bottom left panel of Fig. This result highlights that, within a country, respondents with lower cognitive skills are more likely to agree with statements that are mutually inconsistent.
It is in line with Soto et al. Overall, Fig. To better understand this contrast, we focus on the possible role of enumerators and of the administration method. Enumerator-fixed effects explain on average 5. The bottom right panel of Fig. Unexpectedly, enumerator effects are, if anything, more prevalent in regions with higher average levels of cognitive ability.
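The share of response variation explained by enumerator-fixed effects can be sketched as the between-enumerator sum of squares over the total sum of squares, which equals the R² of a regression of responses on enumerator dummies. This is a minimal illustration, not the authors' estimation code, and the data in the usage example are hypothetical:

```python
def enumerator_r2(scores, enumerators):
    # scores: one PT score per respondent.
    # enumerators: the enumerator who administered each respondent's survey.
    # Returns SS_between / SS_total, i.e., the R^2 of enumerator dummies.
    grand = sum(scores) / len(scores)
    groups = {}
    for s, e in zip(scores, enumerators):
        groups.setdefault(e, []).append(s)
    ss_total = sum((s - grand) ** 2 for s in scores)
    # Between-group SS: each enumerator's mean deviation, weighted by group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return ss_between / ss_total
```

A value near 0 means responses do not depend on who asked the questions; values well above 0 indicate systematic enumerator effects of the kind discussed here.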
Because enumerators are rarely randomly allocated to subjects, the enumerator results may reflect other factors if the assignment of respondents to enumerators captures other within-region variation, for instance, differences in gender, language, or access. We therefore use two of the non-STEP datasets analyzed, from Kenya and Colombia, where we randomly allocated enumerators to respondents, to isolate the enumerator effect from the selection effects previously mentioned. We find significant response biases of similar orders of magnitude as those found for most STEP regions.
The fact that the explanatory power of enumerator effects remains relatively strong even when enumerators are randomly assigned further points to the administration mode as a plausible explanation for the difference between face-to-face interviews and internet surveys. This is in line with other studies of Big Five data and other psychological data, using face-to-face interviews and internet data from Germany and Austria, which also conclude that response patterns such as acquiescence bias can depend on the characteristics of the particular measurement occasion (51). To directly test the importance of the mode of administration in developing country settings, we set up a survey experiment to test the effect of administering a face-to-face survey compared to a paper-and-pencil survey.
Our survey in Colombia included farmers who completed primary education. That said, even with self-administration, these indicators remain below standards and below the ones obtained with internet data.
By contrast, a similar experiment comparing self-administration and face-to-face interviews using the Big Five data in the German socioeconomic panel (53) found that the anticipated Big Five factor structure was present in the data, irrespective of the mode of data collection, but Rammstedt et al. This then helps to reconcile the different findings: The survey method possibly affects response styles more in non-WEIRD populations with lower levels of education.
Overall, we cannot point to one single factor that explains the lower validity of PT questions in developing country surveys. The evidence presented and reviewed highlights several factors jointly at play. Previous research has emphasized the wording of the questions, the quality of the translations, and how those questions are interpreted in the culture. More generally, our findings suggest that many of the known potential response pattern biases are accentuated in field survey data among less educated populations, making it hard to identify the intended latent traits.
This association might be further explained by typical survey respondents' lower interest in the survey topic in these contexts, as their incentives and expectations are likely quite different from those of people filling in personality tests on the internet or in other WEIRD populations. We show that although the BFI has been validated in specific countries and languages, one cannot assume that validity holds when it is administered in large-scale surveys among non-WEIRD populations. The lack of support for the FFM across a large set of surveys in diverse contexts indicates that the issues identified are not unique to a specific data collection exercise but point to a general problem in the measurement of PTs through survey data in developing countries.
Although the FFM may well be universal, it appears hard to uncover this factor structure with survey data in contexts other than those for which it was developed. The various psychometric indicators analyzed suggest that measurement error is correlated with cognition, which, in turn, is associated with income and other factors.
Therefore, a set of items used as a proxy for a specific PT can also capture other factors and lead to incorrect inferences. This psychometric evidence on the lack of a clear Big Five factor structure in survey data points to an interpretation of Fig. Previous literature has found Conscientiousness and Emotional Stability to be the strongest predictors of income, and yet, we find Emotional Stability, but not Conscientiousness, to be a strong predictor. As illustrated in Table 1 , Emotional Stability is the PT that best differentiates itself from the others and hence is likely to be the least affected by systematic response biases.
Conscientiousness, on the other hand, is the PT that accumulates the highest number of misplaced items, with their highest factor loadings on other PTs. Together, this can explain why Emotional Stability, but not Conscientiousness, stands out as a strong predictor. Besides this, and in contrast to most findings in WEIRD populations, Openness appears to be a strong predictor of income.