Exploring work with NAPP microdata

July 26, 2014

Census microdata, such as that produced by the NAPP project, can help illuminate interesting issues surrounding work and labor.

Investigating the history of work using microdata must begin with the questions that census enumerators asked about employment. These varied by country and year, but were generally quite simple. For example, the 1880 U.S. census form had two questions pertaining to work, and a third that hinted at labor as well. Question 13 recorded the “Profession, Occupation, or Trade of each person, male or female,” and Question 14 asked the “Number of months this person has been unemployed during the Census year.” According to the instructions, these two questions were not supposed to be asked of any individual “under 10 years of age.”¹ Question 15 also implied work, asking if the person had been sick or disabled, “so as to be unable to attend to ordinary business or duties.” The recorded occupation (Question 13 in the 1880 U.S. census) was written free-form in the blank.

Transcribing the occupation

In the NAPP dataset, the occupation as it was transcribed by volunteers is found in the field OCCSTRNG. Because space on the form was limited and enumerators were often creative in their spelling and abbreviations, this field can give unreliable results if used on its own.

Let’s look at mining engineers in the 1880 US microdata as produced by NAPP. We have 417 entries where OCCSTRNG is MINING ENGINEER. Good! But we also have other variations, such as:


MINING ENGINEAR	MINING ENGERNEER	MINING ENG.
MINGING ENGINEER	MG. ENGINEER	ENGINEER MINING
MINING ENGR	MINING EXPERT	MININING ENGINEER

Some of these might be small variations on the overall category, worth noting and investigating. (What’s the difference, in 1880, between a Mining Expert and a Mining Engineer?) But many are clearly mining engineers, just spelled differently.

While this variable is valuable by itself, the spelling differences make it difficult to use even for simple operations, such as counting the number of mining engineers in a particular state. To address this limitation, the NAPP team created many additional variables to allow researchers to compare occupational information more broadly.
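To make the spelling problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a table named `data` with an `occstrng` column (names borrowed from the query used in this post's example) and fills it with a few invented rows, so the counts are illustrative only and are not NAPP results.

```python
import sqlite3

# Toy stand-in for the NAPP table: invented rows, not real counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occstrng TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [
        ("MINING ENGINEER",), ("MINING ENGINEER",), ("MINING ENGINEAR",),
        ("MINING ENGR",), ("ENGINEER MINING",), ("MINING EXPERT",), ("FARMER",),
    ],
)

# An exact match finds only the canonical spelling.
exact = conn.execute(
    "SELECT count(*) FROM data WHERE occstrng = 'MINING ENGINEER'"
).fetchone()[0]

# A hand-maintained list of variants catches more, but it has to be curated
# (MINING EXPERT is deliberately left out here) and is never complete.
variants = ["MINING ENGINEER", "MINING ENGINEAR", "MINING ENGR", "ENGINEER MINING"]
placeholders = ", ".join("?" for _ in variants)
with_variants = conn.execute(
    f"SELECT count(*) FROM data WHERE occstrng IN ({placeholders})", variants
).fetchone()[0]

print(exact, with_variants)
```

Every new misspelling means another entry in the list, which is exactly the maintenance burden that the constructed variables remove.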

Constructing variables about work

From these tiny bits of inconsistent information about each person’s occupation, NAPP adds tremendous value by creating new variables derived from this information. These are “constructed variables.”

As you can see in the full list, not all variables are available for each sample, in part because the census questions asked about labor could vary depending on country and year. Additionally, some of the variables have essentially similar information, but NAPP offers a variety in order to make it easier to connect NAPP data with other data sets.

Some of these constructed variables are simple and intended to help with other comparisons. For example, the LABFORCE variable simply records if a person participated in the work force or not (or if it was impossible to tell).² By itself, this might not seem very useful, but it could be helpful in conjunction with other variables. For example, you might want to compare people in one occupational sub-group – farmers, for example – with all people who were a part of the labor force rather than the public as a whole.
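The farmer comparison above might be sketched as follows. The codes are assumptions to check against the NAPP documentation: `labforce = 2` is taken to mean "in the labor force" and `occhisco = 61110` to mean general farmers, while 99999 is only a placeholder for "no occupation." The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occhisco INTEGER, labforce INTEGER)")
# Invented rows. Assumed codes: labforce 2 = in the labor force, 1 = not;
# occhisco 61110 = general farmers; 99999 = placeholder for no occupation.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(61110, 2), (61110, 2), (2700, 2), (71120, 2), (99999, 1), (99999, 1)],
)

# SQLite evaluates a comparison to 1 or 0, so sum() counts the matching rows.
farmers, in_labor_force, everyone = conn.execute(
    """
    SELECT sum(occhisco = 61110 AND labforce = 2),
           sum(labforce = 2),
           count(*)
    FROM data
    """
).fetchone()

share_of_labor_force = farmers / in_labor_force
share_of_everyone = farmers / everyone
print(f"{share_of_labor_force:.2f} of the labor force, "
      f"{share_of_everyone:.2f} of everyone")
```

The two shares differ only in their denominator, which is the point of having LABFORCE available as a filter.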

Other constructed variables group together workers by their occupation. This helps solve the sort of problem we faced above with misspellings of “mining engineer.” OCCHISCO and OCC50US are two of these variables. Each uses a mid-twentieth-century list of occupations as a starting point, with adaptations to better represent historical occupations. It is important to remember that not all occupations would have fit neatly into one of these later occupational categories, and conversely, sometimes very different occupations get lumped together inadvertently.³ Even so, this can be an important way to identify relatively fine-grained occupational information.

Some occupations have a built-in hierarchy of status that might be difficult to capture using the OCCHISCO codes alone. A “Mine Laborer” and a “Miner” are different things, but both might reasonably belong in OCCHISCO category 71120. The OCSTATUS variable records any known hierarchical information from the occupation field. So while a miner and a mine laborer would both be in the same OCCHISCO category, a “Laborer” would have OCSTATUS 33, where the miner might have an OCSTATUS of 32 (“Worker or works in”), 00 (“None”), or 99 (“Unknown”).
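A breakdown of one OCCHISCO category by OCSTATUS could look something like this sketch. The table layout is an assumption, the rows are invented, and the codes are assumed to be stored as integers, so “00” appears as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (occstrng TEXT, occhisco INTEGER, ocstatus INTEGER)"
)
# Invented rows; the status codes follow the values described above.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("MINER", 71120, 0),
        ("MINER", 71120, 0),
        ("WORKS IN MINE", 71120, 32),
        ("MINE LABORER", 71120, 33),
        ("FARMER", 61110, 0),
    ],
)

# Count the people in OCCHISCO 71120 at each recorded status level.
status_counts = conn.execute(
    """
    SELECT ocstatus, count(*) AS n
    FROM data
    WHERE occhisco = 71120
    GROUP BY ocstatus
    ORDER BY n DESC, ocstatus
    """
).fetchall()

for status, n in status_counts:
    print(status, n)
```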

NAPP also includes constructed variables that suggest, in a relative way, the wealth or status associated with an occupation. SEIUS uses the “Duncan Socioeconomic Index,” which was developed in the 1950s, to create an occupational rank that considers income, education, and prestige. These scores are tied to the way those factors were perceived in the 1950s. This means that they can be compared across decades, as the scores will always be the same for a given occupation; but it also anachronistically frames prestige in 1950 terms. For example, a “miner” receives a SEIUS score of 10, which is fairly low.⁴ But perhaps mining carried more prestige in 1880? For obvious reasons, there is considerable debate about the usefulness of this measure, but like all of these constructed variables, it may be helpful if used carefully.⁵

The NAPP variable OCSCORUS provides a related measurement, classifying each occupation according to its economic status in 1950. Unlike the SEIUS score, which factored prestige or status of an occupation into its calculations, the OCSCORUS is based only on the earning power of that job classification in 1950. As with SEIUS, there are obvious problems with anachronistically comparing 1880 job types based on what those job types earned 70 years later. However, if used carefully, OCSCORUS, like SEIUS, can put all workers somewhere on a universal scale in order to compare them.
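One way to use a score like SEIUS is to compare an occupation's value against the average for the whole labor force. The sketch below assumes a `data` table with `occhisco`, `labforce`, and `seius` columns and that `labforce = 2` marks labor force members. The rows are invented: 85 for mining engineers and 10 for miners follow the values mentioned in this post, but the farmer score and the resulting averages are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occhisco INTEGER, labforce INTEGER, seius REAL)")
# Invented rows. 61110 (farmers) and its score of 14 are placeholders;
# 99999 stands in for a person with no occupation.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (2700, 2, 85), (71120, 2, 10), (71120, 2, 10),
        (61110, 2, 14), (61110, 2, 14), (99999, 1, 0),
    ],
)

# Average score across everyone counted as part of the labor force.
overall = conn.execute(
    "SELECT avg(seius) FROM data WHERE labforce = 2"
).fetchone()[0]

# Average score per occupation code, for comparison with the overall figure.
by_occupation = dict(
    conn.execute(
        """
        SELECT occhisco, avg(seius)
        FROM data
        WHERE labforce = 2
        GROUP BY occhisco
        """
    )
)

print(round(overall, 2), by_occupation[2700])
```

Because the score is fixed per occupation, the per-occupation average is simply that occupation's score; the overall average is what changes from sample to sample.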

A simple example

Let’s take a look at a simple example and the SQL code needed to calculate it. (Here’s how I set up a SQLite database with NAPP data.)

Where did most mining engineers live?

Let’s begin with a straightforward question: Where did mining engineers live in 1880? We will use the OCCHISCO variable to look for them, noticing that the value “02700” is “Mining engineers.” Let’s group them by state, but notice that NAPP provides a variety of geographic levels that could be used here, from simple measures of urbanity (URBAN), to small divisions such as enumeration district (ENUMDIST) and county (COUNTYUS), up to regional groupings of states (REGIONNA).

This SQL code will produce the table below, using a JOIN to grab each state’s name from the auxiliary table. (Note: I have manually folded the table to take up less vertical space.)

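-- Count mining engineers per state, largest counts first.
SELECT stateus.desc AS State
, count(*) AS Engineers
FROM data
-- Look up each state's name in the auxiliary stateus table.
JOIN stateus ON data.stateus = stateus.id
-- OCCHISCO 02700 ("Mining engineers"), compared here as the number 2700.
WHERE occhisco = 2700
GROUP BY State
ORDER BY Engineers DESC
;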

State	Engineers	State	Engineers
California	146	Colorado	83
Pennsylvania	66	New York	57
Michigan	24	West Virginia	23
Arizona Territory	22	Illinois	20
Nevada	20	Massachusetts	17
Utah Territory	17	Missouri	14
New Jersey	14	Ohio	10
New Mexico	9	Virginia	8
Idaho Territory	7	Montana Territory	7
North Carolina	7	Iowa	6
Georgia	5	District of Columbia	4
New Hampshire	4	Tennessee	4
Arkansas	3	Connecticut	3
Kentucky	3	Maine	3
South Dakota	3	Alabama	2
Indiana	2	Maryland	2
Rhode Island	2	South Carolina	2
Delaware	1	Nebraska	1
North Dakota	1	Oregon	1
Wyoming Territory	1

Unsurprisingly, many mining engineers were found in the American West, where mining was booming. The strong numbers in Pennsylvania reflect the importance of the anthracite and bituminous coal industry there. But these numbers can also help remind us of the close association of engineering expertise with capital, as in the case of those located in New York, New Jersey, and Massachusetts. Similarly, they might help remind a researcher who focuses on Western mining of the growing importance of coal production in the Midwest, in states such as Ohio, Illinois, and Iowa. Unexpectedly high or low numbers can help prompt deeper investigation. For instance, are a mere two engineers in Alabama a sign that the state’s major coal industry had yet to reach substantial levels of development, or that the mines were worked without significant engineering oversight?

Caveats about microdata and labor history

As with any data derived from the historical manuscript census, there are sometimes problems with NAPP’s occupational data. Some of these problems arise from the recording, transcription, and coding phases. If the enumerator heard the person incorrectly (or could not spell well), or if a volunteer could not make out the handwriting or assumed the word was a different one, or if the occupation did not clearly fit any one category and a “best guess” had to be made in classification, errors might be introduced. It might be particularly troublesome to imagine that job categories, and their relative status, remained consistent over the decades.

Other issues stem from the nature of the census itself. Most census forms only permitted one occupation to be listed.⁶ Occasionally enumerators tried to squeeze two occupations in the space, such as “MINING & CIVIL ENGINEER.” But most frequently the other work was simply not counted. What if a person was a farmer during the summer months and worked at mining during the winter? Only one occupation could be recorded.
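The squeezed-in second occupations can at least be located by searching OCCSTRNG for joining words. This is a rough sketch with invented rows and an assumed `data` table; the two patterns are only a starting point, since enumerators could have joined occupations in other ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occstrng TEXT)")
# Invented rows; BAND LEADER checks that "AND" inside a word is not matched.
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [
        ("MINING & CIVIL ENGINEER",), ("SURVEYOR AND MINER",),
        ("FARMER",), ("BAND LEADER",), ("MINER",),
    ],
)

# Look for an ampersand anywhere, or the word AND with a space on each side.
dual_occupations = [
    row[0]
    for row in conn.execute(
        """
        SELECT occstrng
        FROM data
        WHERE occstrng LIKE '%&%' OR occstrng LIKE '% AND %'
        ORDER BY occstrng
        """
    )
]

print(dual_occupations)
```

A list like this still needs reading by hand, because only one of the two jobs will have been coded into the constructed occupation variables.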

Census workers were supposed to record a person where they were found on a particular day or month, such as June 1880. This specificity could contribute to errors in recording people whose work was seasonal or took them far from home, such as a mining engineer on a summer-long consulting trip in the mountains of the American West.

Similarly, the census takers did a poor job understanding and accounting for the work of women and children. A woman who took in boarders or washing made a tangible contribution to her household’s economic prosperity, but this was often overlooked by enumerators who would frequently simply record “Keeping House.”

Job insecurity is difficult to determine in the NAPP data. It was not well recorded by the census, especially in older censuses, which provide only crude measures of unemployment and no data at all about underemployment. In the 1880 US census, for example, there is a column to mark how many months a person has been unemployed, but this could at best unevenly reflect the cycle of on-again, off-again work that typified many labor categories, such as anthracite miners in Pennsylvania. Compounding the issue, most of the NAPP data sets do not include this information, even if it had originally been recorded on the census. (Perhaps a potentially sensitive issue such as unemployment was deemed unnecessary to record by the genealogist volunteers who, in some cases, originally compiled the data sets that were further extended by NAPP.)

Conclusion

NAPP microdata derived from the census can offer important information about historic patterns of work and labor. The data is by no means a perfect representation of work activity, and it can contain noteworthy errors. Even so, when used judiciously, this microdata can shed light on important questions about work that were central to life in the past.

¹ Some quick work with the database shows that this rule was hardly observed universally. While sometimes enumerators filled this field with age-appropriate information such as “ATTENDING SCHOOL,” or crossed it out with an “X” (and typos in the age field may also have occurred), it is clear that children under 10 years of age worked in small numbers in a wide variety of occupations in 1880.
² As usual, caveats apply and the documentation for each variable must be read carefully. For example, the LABFORCE variable is designed so that people who are listed as having an occupation, but are 15 years old or younger, are automatically reported as not having an occupation. This would make it impossible to use LABFORCE to pursue certain kinds of occupational questions, such as those about child labor.
³ One example of inadvertent lumping of dissimilar work in the same category can be found in OCCHISCO value 03030, “Mine surveyors.” In the 1880a US data set, only 15 individuals are placed in this category. With occupations such as “MINE SURVEYOR” (3), “MINING SURVEYOR” (2), and “SURVEYOR AND MINER” (1), some of these appear to be people who worked for mining companies conducting underground surveying, which was often done by beginning-level mining engineers. Others in this category, such as “U.S. MIN’L SURVEYOR” (1) and “U.S. DEPUTY MINER SURV.” (1), are quite different. These were experienced land surveyors appointed by the federal government to carefully survey the surface boundaries of any mining claim staked on public land. They would create reports and plat maps, swearing under oath as to their accuracy. US Mineral Surveyors, then, would seem to be a very different type of occupation from a mine surveyor, but because of the need to classify occupations they ended up in the same OCCHISCO category.
⁴ The average SEIUS value for all members of the labor force in 1880 is 19.70. By way of comparison, mining engineers (OCCHISCO = 02700) have an SEIUS score of 85.
⁵ See this cautionary note from the IPUMS documentation. Note: NAPP’s SEIUS is called SEI in the IPUMS-USA data and documentation.
⁶ Among NAPP data sets, Norway is an exception, and allowed census takers to record two occupations.