Omeka, Collecting, & Crowdsourcing – Center for Public History + Digital Humanities at Cleveland State University

At CPHDH, we use Omeka. Here are two projects built on it, both with nice (though largely out-of-the-box) designs, that demonstrate how Omeka can be useful for collaboratively identifying and mapping a city (Sao Tome) and for collecting (Bracero): Sao Tome Map Project; Bracero History Archive. Both are engaging projects that depend on crowdsourcing methodologies, which are powerful in theory and even in practice (Wikipedia remains the most robust example, if you ask me).

Bracero, which has won several awards, including a National Council on Public History Award, is designed as a collecting tool for stories of the Bracero Program, which brought millions of Mexican guest workers to the United States and ended in the 1960s. Importantly, the site is meant to collect “oral histories and artifacts” from this era. Clearly, the project is succeeding in collecting materials from the community, over 3000 to date, which is remarkable given both the age of the program, which ran from the 1940s through the 1960s, and the difficulty of collecting materials from underserved populations.

Even so, the Bracero Project reveals much about the challenges facing digital historians. First among them is the challenge of collecting materials from the community. Artifacts can be added by anyone, and the project is careful to distinguish between contributions made by users and those items that were “curated by a project historian.” In practice, this makes great sense, as it allows users to know something about the quality of the data (especially metadata) being collected. Yet it also undercuts the very notion of collaboration, raising questions, albeit subtly, about the veracity and/or verifiability of the materials being collected. I am not fond of this distinction, but it seems to me almost unavoidable. The problem of how to bring contributions from the broader community to bear on a serious scholarly question is not an easy one to address. Indeed, our professionalism as historians, librarians, and digital humanists depends in some degree on the quality of the materials, and here it is not just the stuff being collected but also the metadata being collected, archived, and presented. Confronting these aspects of collaborative collecting and community collaboration will remain a challenge going forward.

Perusing the site also reveals just how little metadata is being added with the items, which is troubling from both a digital humanities and a research perspective. The contributors are not adding much (underscoring the need for the distinction mentioned above). This, however, is a different problem from that of the “quality” of metadata. It is about the presence of metadata at all, which is what makes collections searchable and sensible. Once a collection grows, the implications of collecting items that do not possess metadata are the same as those of having bad metadata: the archival collections become a black hole from which nothing, or little, escapes. In this sense, digital collecting mimics the problems of traditional collecting in archives. We collect volumes of material that disappear under their own weight.

The Bracero Project speaks to an emerging critical challenge in digital humanities, specifically how to make sense of oral history in digital collections. Putting aside the fragments and reminiscences added by contributors, the oral histories being collected appear to be standard hour-long oral history interviews. Each record consists of a .wav file of the entire interview, with a short description of the interview in the brief view and the longer transcript in the full view. This is the standard archival presentation of oral history in paper archives or some digital archives, whether in a ContentDM system or some other more standard library format. Additionally, of the dozen or so interviews I perused, I did not find any tags (see above on metadata) or segmenting of the interviews. This does not make full use of the capabilities of Omeka, and it also does not live up to standard library cataloging, which at least gives interviews some search terms (even if they are Library of Congress terms), or to the increasingly common practice of offering excerpts of the interview within the body of the record or as a separate record (the Veteran’s Oral History Project has done this with subsets of its interviews).

If the emergence of digital tools has reinvigorated the aurality of oral history, the power of the voice and its import to the field, as well as its availability through digital resources, we digital historians have not yet fully caught up with the tools for dealing with this revolution. We remain subject to the text of the interview (both for searching and for exhibiting purposes). We continue to struggle, as the field itself does, with how to categorize interviews and make sense of oral history across interviews and across collections. These are not inconsequential intellectual problems, and we should be confronting the intellectual problems of a field as we break into the digital age. Merely replicating the challenges faced by archivists of oral history in a digital domain may represent a digital advance, but it does not necessarily confront the core problem.

Finally, I would suggest that crowdsourcing might not be the panacea that it is sometimes made out to be within our digital humanities community. Crowdsourcing is democratizing. That is great as an ideal and as a practice, but it depends on a crowd, which suggests a large audience. The question is how large? From my perspective, the value of crowdsourcing, in terms of collecting and collaborating, remains in question with smaller and largely non-digital audiences. At a macro level, for major national institutions such as the Smithsonian, crowdsourcing may offer a different scale of benefits than it does for smaller institutions. Either way, both the Sao Tome Project (which I have not spoken about at all but will in a subsequent post) and the Bracero Project are pushing the boundaries of crowdsourcing as a methodology, as well as of how to build robust and insightful archival collections.

I am eagerly watching them as we continue to review and experiment with many of these same issues in our work in Cleveland. With Teaching & Learning Cleveland, we have confronted these same problems of metadata and collaboration. And, as we collect oral histories (over 500 to date), we continue to struggle with their representation, opting in the short term for developing interpretive segments and struggling with how to represent those (along with the main interview) in digitally engaging formats.

It is exciting to see these and other projects begin to reveal best practices and confront major issues in digital history head on, in practice and not merely in theory.