Exploring work with NAPP microdata
Census microdata, such as that produced by the NAPP project, can help illuminate interesting issues surrounding work and labor.
Investigating the history of work using microdata must begin with the questions asked by the census enumerators about employment. These varied depending on the country and year, but were generally quite simple. For example, the U.S. 1880 census form had two questions pertaining to work, and a third that hinted toward labor as well. Question 13 recorded the “Profession, Occupation, or Trade of each person, male or female,” and Question 14 asked the “Number of months this person has been unemployed during the Census year.” These two questions were not supposed to be asked of any individual “under 10 years of age,” according to the instructions.1 Question 15 also implied work, asking if the person had been sick or disabled, “so as to be unable to attend to ordinary business or duties.” The recorded occupation, Question 13 in the case of the US 1880 census, was written free-form in the blank.
Transcribing the occupation
In the NAPP dataset, the occupation as it was transcribed by volunteers is found in the field OCCSTRNG. The small space on the form and the frequently creative spelling and abbreviating style of the enumerators can lead to some unreliable results if taken alone. Let’s look at mining engineers in the 1880 US microdata as produced by NAPP. We have 417 entries where OCCSTRNG is MINING ENGINEER.
Good! But we also have other variations, such as:
MINING ENGINEAR | MINING ENGERNEER | MINING ENG. |
MINGING ENGINEER | MG. ENGINEER | ENGINEER MINING |
MINING ENGR | MINING EXPERT | MININING ENGINEER |
Some of these might be small variations on the overall category, worth noting and investigating. (What’s the difference, in 1880, between a Mining Expert and a Mining Engineer?) But many are clearly mining engineers, just spelled differently.
While this variable by itself is valuable, it can be difficult to use even for simple operations – say, counting the number of mining engineers in a particular state – because of the spelling differences. To address this limitation, the NAPP team created many additional variables to allow researchers to compare occupational information more broadly.
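The spelling problem can be sketched in a few lines. This is only an illustration, not the author's workflow: it uses Python's built-in sqlite3 module with a tiny in-memory table, the row counts are invented, and the table name data and the lowercase column name occstrng are assumptions borrowed from the query later in this post.

```python
import sqlite3

# Toy stand-in for the NAPP table. The spellings come from the list above,
# but the rows and their counts are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occstrng TEXT)")
conn.executemany(
    "INSERT INTO data (occstrng) VALUES (?)",
    [
        ("MINING ENGINEER",),
        ("MINING ENGINEER",),
        ("MINING ENGINEAR",),
        ("MINGING ENGINEER",),
        ("MINING ENGR",),
        ("MINING EXPERT",),
        ("FARMER",),
    ],
)

# An exact match finds only the standard spelling.
exact = conn.execute(
    "SELECT count(*) FROM data WHERE occstrng = 'MINING ENGINEER'"
).fetchone()[0]

# A loose LIKE pattern lists candidate variants for review by hand.
variants = conn.execute(
    """
    SELECT occstrng, count(*) AS n
    FROM data
    WHERE occstrng LIKE 'MIN%ENG%'
    GROUP BY occstrng
    ORDER BY n DESC, occstrng
    """
).fetchall()

print(exact)
for spelling, n in variants:
    print(spelling, n)
```

Even a loose pattern like this one is a blunt tool. It would miss entries such as MG. ENGINEER and ENGINEER MINING, and it says nothing about whether MINING EXPERT belongs in the group.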
Constructing variables about work
From these tiny bits of inconsistent information about each person’s occupation, NAPP adds tremendous value by creating new variables derived from this information. These are “constructed variables.”
As you can see in the full list, not all variables are available for each sample, in part because the census questions asked about labor could vary depending on country and year. Additionally, some of the variables have essentially similar information, but NAPP offers a variety in order to make it easier to connect NAPP data with other data sets.
Some of these constructed variables are simple and intended to help with other comparisons. For example, the LABFORCE variable simply records if a person participated in the work force or not (or if it was impossible to tell).2 By itself, this might not seem very useful, but it could be helpful in conjunction with other variables. For example, you might want to compare people in one occupational sub-group – farmers, for example – with all people who were a part of the labor force rather than the public as a whole.
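As a rough sketch of that kind of comparison, the example below computes the share of farmers in the whole population and in the labor force only. Everything specific here is an assumption made for illustration: the toy rows, the LABFORCE value 2 for "in the labor force," the OCCHISCO value 61110 for farmers, and the placeholder code 99999 for people without an occupation. Check the NAPP documentation for the real codes before adapting it.

```python
import sqlite3

# Assumed codes, for illustration only (verify against the NAPP codebook).
IN_LABOR_FORCE = 2      # assumed LABFORCE value for "in the labor force"
FARMER = 61110          # assumed OCCHISCO value for farmers
NO_OCCUPATION = 99999   # invented placeholder code

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occhisco INTEGER, labforce INTEGER)")
conn.executemany(
    "INSERT INTO data (occhisco, labforce) VALUES (?, ?)",
    [
        (FARMER, 2), (FARMER, 2),              # two farmers
        (71120, 2), (2700, 2),                 # a miner and a mining engineer
        (NO_OCCUPATION, 1), (NO_OCCUPATION, 1),
        (NO_OCCUPATION, 0), (NO_OCCUPATION, 0),
    ],
)

# In SQLite a comparison evaluates to 1 or 0, so sum() counts matching rows.
share_of_public, share_of_labor_force = conn.execute(
    """
    SELECT 1.0 * sum(occhisco = :farmer) / count(*),
           1.0 * sum(occhisco = :farmer) / sum(labforce = :in_lf)
    FROM data
    """,
    {"farmer": FARMER, "in_lf": IN_LABOR_FORCE},
).fetchone()

print(share_of_public, share_of_labor_force)
```

With these invented rows the farmers are a quarter of the public but half of the labor force, which is the kind of difference the choice of denominator can make.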
Other constructed variables group together workers by their occupation. This helps solve the sort of problem we faced above with misspellings of “mining engineer.” OCCHISCO and OCC50US are two of these variables. Each uses a mid-twentieth century list of occupations as a starting point, with adaptations to better represent historical occupations. It is important to remember that not all occupations would have fit neatly into one of these later occupational categories, and conversely, sometimes very different occupations get lumped together inadvertently.3 Even so, this can be an important way to identify relatively fine-grained occupational information.
Some occupations have a built-in hierarchy of status that might be difficult to capture using the OCCHISCO codes alone. A “Mine Laborer” and a “Miner” are different things, but both might reasonably belong in OCCHISCO category 71120. The OCSTATUS variable records any known hierarchical information from the occupation field. So while a miner and a mine laborer would both be in the same OCCHISCO category, a “Laborer” would have OCSTATUS 33, where the miner might have an OCSTATUS of 32 (“Worker or works in”), 00 (“None”), or 99 (“Unknown”).
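To see how OCSTATUS splits a single OCCHISCO category, one could tabulate the status codes within category 71120. The sketch below does this on invented rows. The codes 71120, 33, 32, and 00 come from the paragraph above, while the table layout and the extra occupation code 61110 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (occstrng TEXT, occhisco INTEGER, ocstatus INTEGER)"
)
# Invented rows: several mining entries share OCCHISCO 71120 but differ
# in OCSTATUS. The 61110 row is an assumed code for an unrelated occupation.
conn.executemany(
    "INSERT INTO data (occstrng, occhisco, ocstatus) VALUES (?, ?, ?)",
    [
        ("MINER", 71120, 0),
        ("MINER", 71120, 0),
        ("WORKS IN MINE", 71120, 32),
        ("MINE LABORER", 71120, 33),
        ("FARMER", 61110, 0),
    ],
)

# Count the status codes found inside one occupational category.
status_counts = conn.execute(
    """
    SELECT ocstatus, count(*) AS n
    FROM data
    WHERE occhisco = 71120
    GROUP BY ocstatus
    ORDER BY ocstatus
    """
).fetchall()

for status, n in status_counts:
    print(status, n)
```

Adding occstrng to the SELECT and GROUP BY clauses would show which transcribed strings sit behind each status code.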
NAPP also includes constructed variables that suggest, in a relative way, the wealth or status associated with an occupation. SEIUS uses the “Duncan Socioeconomic Index,” which was developed in the 1950s, to create an occupational rank that considers income, education, and prestige. These scores are tied to the way those factors were perceived in the 1950s – this means that they can be compared across decades, as the scores will always be the same for a given occupation; but it also anachronistically frames prestige in 1950 terms. For example, a “miner” receives a SEIUS score of 10, which is fairly low.4 But perhaps mining carried more prestige in 1880? For obvious reasons, there is considerable debate about the usefulness of this measure, but like all of these constructed variables, it may be helpful if used carefully.5 The NAPP variable OCSCORUS provides a related measurement, classifying each occupation according to its economic status in 1950. Unlike the SEIUS score, which factored prestige or status of an occupation into its calculations, the OCSCORUS is based only on the earning power of that job classification in 1950. As with SEIUS, there are obvious problems with anachronistically comparing 1880 job types based on what those job types earned 70 years later. However, if used carefully, OCSCORUS, like SEIUS, can put all workers somewhere on a universal scale in order to compare them.
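A comparison on that universal scale might look like the following sketch, which averages SEIUS overall and per occupation. The two scores (10 for miners, 85 for mining engineers) are the ones quoted in this post. The rows themselves are invented, so the overall average here has nothing to do with the real 1880 figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occhisco INTEGER, seius INTEGER)")
# Invented rows: three miners (71120) and one mining engineer (2700),
# using the SEIUS scores mentioned in the text.
conn.executemany(
    "INSERT INTO data (occhisco, seius) VALUES (?, ?)",
    [(71120, 10), (71120, 10), (71120, 10), (2700, 85)],
)

# Average score across everyone in the toy table.
overall = conn.execute("SELECT avg(seius) FROM data").fetchone()[0]

# Average score for each occupational category.
by_occupation = conn.execute(
    """
    SELECT occhisco, avg(seius) AS mean_sei
    FROM data
    GROUP BY occhisco
    ORDER BY mean_sei DESC
    """
).fetchall()

print(overall)
for code, mean_sei in by_occupation:
    print(code, mean_sei)
```

Swapping seius for ocscorus in the same query would give the earnings-based version of the comparison, assuming the table has that column.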
A simple example
Let’s take a look at a simple example and the SQL code needed to calculate it. (Here’s how I set up a SQLite database with NAPP data.)
Where did most mining engineers live?
Let’s begin with a straightforward question: Where did mining engineers live in 1880? We will use the OCCHISCO variable to look for them, noticing that the value “02700” is “Mining engineers.” Let’s group them by state, but notice that NAPP provides a variety of geographic levels that could be used here, from simple measures of urbanity (URBAN), to small divisions such as enumeration district (ENUMDIST) and county (COUNTYUS), up to regional groupings of states (REGIONNA).
This SQL code will produce the table below, using a JOIN to grab each state’s name from the auxiliary table. (Note: I have manually folded the table to take up less vertical space.)
SELECT stateus.desc AS State
, count(*) AS Engineers
FROM data
JOIN stateus ON data.stateus = stateus.id
WHERE occhisco = 2700
GROUP BY State
ORDER BY Engineers DESC
;
State | Engineers | State | Engineers |
---|---|---|---|
California | 146 | Colorado | 83 |
Pennsylvania | 66 | New York | 57 |
Michigan | 24 | West Virginia | 23 |
Arizona Territory | 22 | Illinois | 20 |
Nevada | 20 | Massachusetts | 17 |
Utah Territory | 17 | Missouri | 14 |
New Jersey | 14 | Ohio | 10 |
New Mexico | 9 | Virginia | 8 |
Idaho Territory | 7 | Montana Territory | 7 |
North Carolina | 7 | Iowa | 6 |
Georgia | 5 | District of Columbia | 4 |
New Hampshire | 4 | Tennessee | 4 |
Arkansas | 3 | Connecticut | 3 |
Kentucky | 3 | Maine | 3 |
South Dakota | 3 | Alabama | 2 |
Indiana | 2 | Maryland | 2 |
Rhode Island | 2 | South Carolina | 2 |
Delaware | 1 | Nebraska | 1 |
North Dakota | 1 | Oregon | 1 |
Wyoming Territory | 1 |
Unsurprisingly, many mining engineers were found in the American West, where mining was booming. The strong numbers in Pennsylvania reflect the importance of the anthracite and bituminous coal industry there. But these numbers can also help remind us of the close association of engineering expertise with capital, as in the case of those located in New York, New Jersey, and Massachusetts. Similarly, they might help remind a researcher who focuses on Western mining of the growing importance of coal production in the Midwest, in states such as Ohio, Illinois, and Iowa. Unexpectedly high or low numbers can help prompt deeper investigation. For instance, are a mere two engineers in Alabama a sign that the state’s major coal industry had yet to reach substantial levels of development, or that the mines were worked without significant engineering oversight?
Caveats about microdata and labor history
As with any data derived from the historical manuscript census, there are sometimes problems with NAPP’s occupational data. Some of these problems arise from the recording, transcription, and coding phases. If the enumerator heard the person incorrectly (or could not spell well), or if a volunteer could not make out the handwriting or assumed the word was a different one, or if the occupation did not clearly fit any one category and a “best guess” had to be made in classification, errors might be introduced. It might be particularly troublesome to imagine that job categories, and their relative status, remained consistent over the decades.
Other issues stem from the nature of the census itself. Most census forms only permitted one occupation to be listed.6 Occasionally enumerators tried to squeeze two occupations in the space, such as “MINING & CIVIL ENGINEER.” But most frequently the other work was simply not counted. What if a person was a farmer during the summer months and worked at mining during the winter? Only one occupation could be recorded.
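One rough way to look for squeezed-in double occupations is to search the transcribed string for a joining word or symbol. The sketch below is a heuristic under stated assumptions, not a NAPP feature: the rows are a toy sample, and BOOT AND SHOE DEALER and STAGE HAND are invented strings included to show where the pattern does and does not misfire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (occstrng TEXT)")
conn.executemany(
    "INSERT INTO data (occstrng) VALUES (?)",
    [
        ("MINING & CIVIL ENGINEER",),
        ("SURVEYOR AND MINER",),
        ("BOOT AND SHOE DEALER",),   # invented: one job, but contains AND
        ("STAGE HAND",),             # invented: contains the letters AND
        ("FARMER",),
        ("MINER",),
    ],
)

# Match an ampersand, or the word AND with a space on each side.
candidates = [
    row[0]
    for row in conn.execute(
        """
        SELECT occstrng
        FROM data
        WHERE occstrng LIKE '%&%' OR occstrng LIKE '% AND %'
        ORDER BY occstrng
        """
    )
]

for occupation in candidates:
    print(occupation)
```

Requiring spaces around AND keeps STAGE HAND out of the results, but BOOT AND SHOE DEALER still appears even though it describes a single job, so the output is a list of candidates to read rather than a count to trust.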
Census workers were supposed to record a person where they were found on a particular day or month, such as June 1880. This specificity could contribute to errors in recording people whose work was seasonal or took them far from home, such as a mining engineer on a summer-long consulting trip in the mountains of the American West.
Similarly, the census takers did a poor job understanding and accounting for the work of women and children. A woman who took in boarders or washing made a tangible contribution to her household’s economic prosperity, but this was often overlooked by enumerators who would frequently simply record “Keeping House.”
Job insecurity is difficult to determine in the NAPP data. It was not well recorded by the census, especially in older censuses, which provide only crude measures of unemployment and no data at all about underemployment. In the 1880 US census, for example, there is a column to mark how many months a person has been unemployed, but this could at best unevenly reflect the cycle of on-again, off-again work that typified many labor categories, such as anthracite miners in Pennsylvania. Compounding the issue, most of the NAPP data sets do not include this information, even if it had originally been recorded on the census. (Perhaps a potentially sensitive issue such as unemployment had been deemed unnecessary to record by the genealogist volunteers who, in some cases, originally compiled the data sets that were further extended by NAPP.)
Conclusion
NAPP microdata derived from the census can offer important information about historic patterns of work and labor. The data is by no means a perfect representation of work activity, and it can contain noteworthy errors. Even so, when used judiciously, this microdata can shed light on important questions about work that were central to life in the past.
1. Some quick work with the database shows that this rule was hardly observed universally. While sometimes enumerators filled this field with age-appropriate information such as “ATTENDING SCHOOL,” or crossed it out with an “X” (and typos in the age field may also have occurred), it is clear that children under 10 years of age worked in small numbers in a wide variety of occupations in 1880.
2. As usual, caveats apply and the documentation for each variable must be read carefully. For example, the LABFORCE variable is designed so that people who are listed as having an occupation, but are 15 years old or younger, are automatically reported as not having an occupation. This would make it impossible to use LABFORCE to pursue certain kinds of occupational questions about child labor, for example.
3. One example of inadvertent lumping of dissimilar work in the same category can be found in OCCHISCO value 03030, “Mine surveyors.” In the 1880a US data set, only 15 individuals are placed in this category. With occupations such as “MINE SURVEYOR” (3), “MINING SURVEYOR” (2), and “SURVEYOR AND MINER” (1), some of these appear to be people who work for mining companies conducting underground surveying, which was often done by beginning-level mining engineers. Others in this category, such as “U.S. MIN’L SURVEYOR” (1) and “U.S. DEPUTY MINER SURV.” (1), are quite different. These were experienced land surveyors appointed by the federal government to carefully survey the surface boundaries of any mining claim staked on public land. They would create reports and plat maps, swearing under oath as to their accuracy. US Mineral Surveyors, then, would seem to be a very different type of occupation than a mine surveyor, but because of the need to classify occupations they ended up in the same OCCHISCO category.
4. The average SEIUS value for all members of the labor force in 1880 is 19.70. By way of comparison, mining engineers (OCCHISCO = 02700) have a SEIUS score of 85.
5. See this cautionary note from the IPUMS documentation. Note: NAPP’s SEIUS is called SEI in the IPUMS-USA data and documentation.
6. Among NAPP data sets, Norway is an exception, and allowed census takers to record two occupations.