
Assignment 3: Curation (30 points). Working in groups, students will complete a curation project designed to provide a concrete application of best practices covered in the course. Further instructions will be provided.

Curation Project Report

The Unbroken Line: An Exploration of Female Cartoonists Since 1896

Kim Cook, Jennifer Faus, Andre Robinson, Brian Whitmer

Emporia State University

Introduction

In 1896, teenager Rose O'Neill published the first known work by a woman cartoonist. Since that time, women have been continuously present in the world of comics: from the art nouveau beauties of the 1920s, to the 1940s Miss Fury (who predated Wonder Woman as the first female superhero), to the underground comics movement, to the unrestrained world of graphic novels. Today, the number of women cartoonists and comics by women is on the rise. To understand what the future holds for women in comics, however, we must review the contributions women have made to cartoons and comics in the past. The Archive-It collection developed by LI-809 Group E presents a chronology of the 120-year history of women in this industry. The collection includes biographies, images, websites, and blogs intended to provide patrons with a representative group of women cartoonists and comic artists. Each woman, from Rose O'Neill to Alison Bechdel, is meant to represent her respective time period, with the hope of showing how cartoons and comics have been influenced by women over the last century.

Theme

The overarching theme for the collection was selected when a common interest in comics was identified among the group members. Using an iterative process, the theme was explored, tested, and refined, evolving from comic books, to comic book superheroes, to influential female cartoonists such as Alison Bechdel and Marie Severin who pioneered the genre for women cartoonists and fans alike. Crawl results, reviews of previously archived collections, issues and ideas the group believed would interest patrons, and collection cohesion all contributed to the final theme. The final collection offers a chronology of women's entry into previously male-dominated fields such as cartooning, editorial cartooning, and comic book writing and illustration. Women cartoonists in the collection include Rose O'Neill, Marie Severin, Marjane Satrapi, Marguerite Abouet, Ramona Fradon, and Alison Bechdel. The collection represents a substantive contribution to the preservation of the history of the comic and cartoon genre as a whole, in that there is no digital repository, museum, or archives dedicated to preserving the history of women in this industry. With the exception of a few print resources and some scattered websites, this is a singular effort to develop a comprehensive collection documenting the history of women and their contributions to the fields of illustration, comics, graphic novels, and the cartooning profession as a whole.

Criteria for Inclusion

According to Milligan, Ruest, and Lin (2016), "Any preservation effort must begin with an assessment of what content to preserve: archivists refer to this as appraisal, which is related to what librarians call collection development. This process remains inescapable, even in the digital context." Appraisal for this collection began with the group's collective knowledge of cartooning and comics. From there, the group discussed and debated the reasoning behind their choices and came to consensus on the value and evidence provided by the collection. In selecting seeds for inclusion, the group sought to provide a variety of site types, including standard websites, blogs, social media accounts, reference works, and image sites. Rather than settling for a simplistic series of wiki pages, we sought to include sites that demonstrated the history, skills, and personality of the individual. For example, an in-depth article about Rose O'Neill that included illustrations drawn by the artist and lively storytelling was chosen over a site that presented a linear biography.

It was also important to capture material that would be unique to a digital collection and not heavily duplicated by existing print resources. For the museum and organization sites, this meant capturing one-time-only events; for individual cartoonists, it meant finding blogs and other sources that featured real-time communication streams, including commentary, as a unique record helping to provide background, relevance, and evidence of the importance of that cartoonist or comic to the collection. These choices were made on the assumption that the projected patron or end user would be a person with an informed interest in comics and cartooning.

Seeds

Seeds for the collection base were chosen using three criteria:

1. Is the seed lively, interesting, and informative?

2. Does the seed powerfully represent the person and the era?

3. Does the seed contribute cohesively to the whole?

Crawls

In the course of conducting test crawls, group members learned that crawls required more time than expected. Initial crawls of one hour to one day returned disappointing results, whereas three-day crawls were more successful. Crawls blocked by robots.txt exclusions were rare; rather than override those exclusions, the group selected alternative seeds. Alternative seeds were also chosen when the desired results (fully rendered sites) were not returned.
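Since respecting robots exclusions was the group's policy, checking a candidate seed against a site's robots.txt before scheduling a multi-day crawl can save wasted time. The short Python sketch below is illustrative only, not an Archive-It feature; the user agent string is a placeholder, and a production check would need error handling for unreachable sites.

  # Illustrative pre-crawl check: is this candidate seed blocked by the
  # site's robots.txt? Not part of Archive-It; the user agent string is
  # a placeholder assumption.
  from urllib import robotparser
  from urllib.parse import urljoin, urlparse

  def seed_is_crawlable(seed_url, user_agent="example-archive-crawler"):
      site_root = "{0.scheme}://{0.netloc}/".format(urlparse(seed_url))
      rp = robotparser.RobotFileParser()
      rp.set_url(urljoin(site_root, "robots.txt"))
      rp.read()  # fetch and parse the live robots.txt
      return rp.can_fetch(user_agent, seed_url)

  # Example: screen a candidate list before committing crawl time.
  for seed in ["http://dykestowatchoutfor.com/"]:
      print(seed, "->", "crawlable" if seed_is_crawlable(seed) else "blocked")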

The results returned by the crawls were not so much surprising as they were new. The majority of group members were unfamiliar with Archive-It, so there was a time-consuming, and often frustrating, learning curve to the software that made collecting and deciphering data difficult at first. Archive-It has a tendency to provide novice users with more data than information, which can be problematic when new users do not have a comprehensive understanding of the technology. The crawl reports break down the numbers of items collected, but it still falls to the collector to perform quality assurance and scope/rescope cycle testing. Fortunately, some institutions have created their own QA tools to assist new users with the process by performing a "visual review of embedded documents and determine whether they should be in scope for future crawls" (Bragg et al., 2013).

Capturing images was another challenge. For example, when the program crawled a Twitter account, text was displayed but many of the images did not materialize. Given that the images were often illustrations by the subject, the group felt that their omission did not produce a complete record. However, there were instances where illustrated images did appear to have been archived, such as those on the Alison Bechdel site, dykestowatchoutfor.com. Perhaps the image issue depends upon the website's settings and type, as well as the length of the crawl; repeated testing over an extended period would be required to fully explore and resolve it.

Metadata proved to be another major challenge during the course of this project. In part this was because the recommended Dublin Core standard is documented on the dublincore.org website in the most general way possible. This allows for flexibility in implementation but can also be difficult for novice practitioners to interpret and navigate without a full understanding of the element set. Novice users with little knowledge or experience of metadata element sets risk a high degree of uncertainty and hesitation when faced with something like Dublin Core.

The Dublin Core Metadata Initiative (DCMI) attempts to address this through a wiki page, which is helpful but not sufficient to answer many of the practical questions that arise for users new to Archive-It. Physical entry of metadata into Archive-It was relatively easy given the fields provided; it was the content needed to complete those fields that group members found challenging. The group experienced issues similar to those raised by Dooley, Farrell, Kim, and Venlet (2017) in "Developing Web Archiving Metadata Best Practices to Meet User Needs." The questions the group raised were, in fact, almost identical to those raised in the article, for example, "Which types of content are most important to include in a metadata record that describes an archived website or a group of sites?" and:

"Website creator/owner: Is this the publisher? Creator? Subject? All three?

Title: Should it be transcribed verbatim from the head of the site? Or edited to clarify the nature/scope of the site? Should acronyms be spelled out?

Dates: Which dates are both important and feasible to record? Beginning/end of the site's existence? Date(s) of capture by a repository? Content?

Format: Is it important that the description clearly states that the resource is a website? If so, how best to do this? In the title? Extent? Description?" (p. 11)
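To make those questions concrete, the sketch below shows one way a seed-level record for the Alison Bechdel seed might be expressed using the Dublin Core element set. The element names come from the standard; the values and the mapping decisions (cartoonist as creator, a hypothetical capture date rather than a content date) are our own assumptions, exactly the judgment calls the article describes, not prescriptions of the standard.

  # Illustrative seed-level Dublin Core record for one seed in the collection.
  # Element names follow the DC element set; the values and mapping choices
  # (creator vs. publisher, capture date vs. content date) are assumptions.
  bechdel_seed = {
      "dc:title": "Dykes to Watch Out For (Alison Bechdel)",
      "dc:creator": "Bechdel, Alison",
      "dc:subject": "Women cartoonists; Comic strips; Graphic novels",
      "dc:description": "Blog and archive for Alison Bechdel's long-running "
                        "comic strip Dykes to Watch Out For.",
      "dc:publisher": "Alison Bechdel",
      "dc:date": "2017-11-01",           # hypothetical date of capture
      "dc:type": "InteractiveResource",  # a DCMI Type Vocabulary term
      "dc:format": "text/html",
      "dc:identifier": "http://dykestowatchoutfor.com/",
      "dc:language": "en",
      "dc:coverage": "United States",
      "dc:rights": "Copyright Alison Bechdel",
  }

Drafting even one such record forces the group to answer the title, creator, and date questions quoted above, and writing the answers down makes them repeatable for every subsequent seed.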

Group members also agreed with Millar (2017) in her discussion of controlling language, in which she states, "Regardless of which approach the archivist takes to arrangement and description, the language used to present information in the description needs to be as consistent as possible" (p. 225). As a whole, we assert that metadata language should be consistent. In Archive-It, consistent seed metadata language is particularly important if one is concerned about the look and feel of the user-facing content.
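As a small illustration of what consistent language can mean in practice, the sketch below (our own idea, not an Archive-It feature) normalizes dates and subject terms before records are entered, so every seed uses one date format and one spelling for each subject. The controlled vocabulary shown is a made-up example.

  # Illustrative normalization pass applied to seed records before entry,
  # keeping dates and subject terms consistent across the collection.
  # The controlled vocabulary here is a made-up example.
  from datetime import datetime

  CONTROLLED_SUBJECTS = {
      "women cartoonists": "Women cartoonists",
      "comics": "Comic books, strips, etc.",
      "graphic novels": "Graphic novels",
  }

  def normalize_record(record):
      out = dict(record)
      # Accept a few common date spellings but store one format (ISO 8601).
      for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
          try:
              parsed = datetime.strptime(record.get("dc:date", ""), fmt)
              out["dc:date"] = parsed.strftime("%Y-%m-%d")
              break
          except ValueError:
              continue
      # Map free-text subjects onto the controlled vocabulary where possible.
      subjects = [s.strip() for s in record.get("dc:subject", "").split(";")]
      out["dc:subject"] = "; ".join(
          CONTROLLED_SUBJECTS.get(s.lower(), s) for s in subjects if s
      )
      return out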

There was another issue with Archive-It that the group did not discover until later in the crawling process. As members started to pull seeds together into a complete collection, we found collaboration to be rather difficult. In theory, Archive-It allows different collaborators to add seeds to existing collections. However, when one member of the group tested this, the system did not allow them to add their data to the collection. Despite numerous visits to the "Help" section of the site, it remained unclear what caused this error, and it became very difficult for this group member to add content to the collaboration. This was problematic, as Archive-It offers no explanation of such an error or of how to troubleshoot and correct it when it arises. One solution might be for Archive-It to employ an on-demand technician who can troubleshoot these kinds of issues in real time, so that collaborators do not lose time trying to solve a problem beyond their skill set.

Archive-It

The prospective uses of Archive-It for site comparison, website redesign, and big data analysis are encouraging. Using Archive-It to mine data for website redesign, in particular, could be a powerful application of the tool. However, Archive-It is probably most useful for institutions with a specific interest in documenting, storing, and curating information found on the internet. For instance, it is hard to imagine the Special Collections and Archives Department at Wichita State University having much interest in using Archive-It when its main interest lies in the collection, preservation, and curation of the physical collections in its care. Archive-It offers the most value for capturing born-digital and digitized information, which now includes many government, corporate, and institutional sites, as well as the individual sites and blogs that have replaced the paper-based diary as documentary evidence of creative processes, responses to current events, and moment-to-moment ruminations.

Archive-It: Problems/Solutions

Overall, the available training for Archive-It was lacking in theory and content. It skipped substantive portions of the project, including how to engage in crawl analysis and how to derive meaning from the data returned. Also missing were the functions required to move beyond the crawl to create the final product, including collection- and seed-level metadata and user-facing, or "public," content. The multiple tab views were confusing and several times resulted in duplicated data entry.

For example, when entering metadata, it can take time for the text to populate on the user-facing page. As a result, initial entries were duplicated until the group discovered this issue: seed metadata was entered from the crawl tab and then again from the seed tab, because the metadata entered under the crawl tab did not immediately populate, leading the user to think the entry had been made under the incorrect tab. Ten minutes later there was duplicate information. After consulting the "Help" section, members learned that metadata can take up to fifteen minutes to populate. This was also a collaboration issue: we did not realize at first that it was necessary to announce when one was editing metadata, to prevent that work from being overwritten by another user. Even though the work was delegated among the group members, these issues surfaced only as we learned and tested the software. It was also frustrating that the "Help" screenshots were images of earlier releases and did not always reflect the screens we were looking at. Given the importance of metadata in effective searching, these can be significant shortcomings for users who are not aware of the issues.

One solution would be a clearer and more cohesive training section that includes definitions of current terminology such as "crawl," "seed," and "collection scope." The lack of clear terminology made it very difficult to fully understand and use the concepts we were required to work with. Another solution would be a training program that walks users through a basic crawl and test crawl and explains each step as it happens; this real-time training would be particularly useful in explaining why each step in the process is necessary. Finally, Archive-It could make the system less computer-science heavy by developing terminology and workflows that are more generalized and interdisciplinary. Many people who use Archive-It are not especially familiar with the basics of computer science, yet certain elements of the system (crawls, the Wayback Machine, and technical metadata) require at least a basic knowledge of computer science terminology and theory. This is problematic for a system that advertises itself as a "user friendly web application" but is, in practice, anything but. Revamping the site to be legitimately user-friendly, even for those with little to no background in computing, would be very useful.

While the Archive-It videos and training materials covered each step in the crawl cycle, there was very little material covering the rules of thumb that make up the narrative of real-world usage, such as helpful hints on how long a process should take or how best to approach the creation of metadata. The documentation would be much improved by a scenario-based example that explains a best practice and then walks through each step and its results, so novice users could compare what they did against the model scenario. Providing downloadable best-practice forms to help manage crawls would also be helpful. Creating a teaching curriculum based on the Web Archiving Life Cycle Model (WALCM), which was itself developed from feedback and real-world usage reports from various Archive-It partner institutions (Bragg et al., 2013), would help spread best practices that can be scaled to meet the requirements of different types and sizes of institutions. The graphical depiction of the WALCM could help guide novice users in developing policies and practices most congruent with their institution and its archiving goals.

Future Management

Managing this collection going forward, we would suggest including more pioneers of cartooning, adding new publications that cover this topic, and monitoring the types of information being captured on a quarterly basis, since the technologies of the web are constantly changing and evolving in response to new conditions. For example, will a new cyber threat eliminate access to a key website because of a technology change? We would also implement an ongoing transfer of the most promising digital material to a separate digital repository, in case the use of Archive-It is curtailed by future management or other constraints.

Another precaution that future archivists for this collection may want to keep in mind is backing up information. Archive-It appears to be able to both import and export certain pieces of information, such as PDF files for metadata and HTML files, which can be stored in a backup system such as an external hard drive or the cloud. A backup of this kind means that if anything were to happen to the Archive-It collection (e.g., the site experiences a crash that deletes information, or the original sites are lost and can no longer serve as backups should the archived copies fail), there would at least be enough information available to restore the lost data in some fashion.
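One way to automate such a backup is through the WASAPI data-transfer API, which Archive-It partners can use to retrieve the WARC files behind a collection. The Python sketch below is a rough outline under stated assumptions: the endpoint and response fields follow the published WASAPI specification as we understand it, and the collection ID and credentials are placeholders.

  # Rough sketch: download a collection's WARC files for offline backup via
  # the WASAPI data-transfer API. Endpoint and response fields follow the
  # WASAPI spec as we understand it; the collection ID and credentials are
  # placeholders, and error handling/fixity checks are omitted for brevity.
  import os
  import requests

  WASAPI_URL = "https://warcs.archive-it.org/wasapi/v1/webdata"
  AUTH = ("account_user", "account_password")  # placeholder credentials

  def backup_collection(collection_id, dest_dir):
      os.makedirs(dest_dir, exist_ok=True)
      url, params = WASAPI_URL, {"collection": collection_id}
      while url:
          page = requests.get(url, params=params, auth=AUTH).json()
          for f in page.get("files", []):
              target = os.path.join(dest_dir, f["filename"])
              # Stream each WARC to disk from its first listed location.
              with requests.get(f["locations"][0], auth=AUTH, stream=True) as r:
                  with open(target, "wb") as fh:
                      for chunk in r.iter_content(chunk_size=1 << 20):
                          fh.write(chunk)
          url, params = page.get("next"), None  # follow pagination, if any

  backup_collection(12345, "warc_backup")  # hypothetical collection ID

Verifying each downloaded file against the checksums the API reports would round out such a backup routine, giving the collection an existence independent of any one service.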

Issues of Preserving and Providing Access to Records

Each age has seen the primary medium for inscribing its profound thoughts and mundane activities change: from stone, to wood, to animal skins, to papyrus, to paper, and now to electronic storage. Many of the records of interest to historians, journalists, and others will only be available in this last format. Unless material is printed, stored, and preserved, the bulk of this digital material, and the opportunities it affords, will be lost. As the volume of data created increases exponentially, digital processing methods and tools are the only reasonable hope for archivists, historians, and data scientists to be able to sort, sift, and use this material. Much of the existing paper-based record will also be digitized and enter this stream. The application of artificial intelligence and machine learning technologies will help keep professionals from drowning in the data, but only if they have the experience, skills, and insights to build new tools and approaches for the digital future. Now is the time to help shape the future of Archive-It and similar software by increasing its ease of use and ease of analysis. It is also necessary that digital collections have proper legal, institutional, and cultural support. Serendipity is a horrible default strategy for preserving the past.

Conclusion

Archive-It provides an important tool for modern times. Many sites from the early days of the web have been lost for various reasons, from intentional deletions to unintentional server problems. Most content creators on the web either treat their work as ephemeral or assume that everything on the internet is eternal, so there is a real need to ensure that important content is saved in some form. Services like Archive-It help archivists and researchers preserve webpages that can add to our understanding of contemporary culture and to our knowledge base. As more records and content become exclusive to the web, this need will only grow, and it is important that technologies such as Archive-It continue striving to create an experience that is user-friendly and streamlined enough for archivists to meet it.

References

Bragg, M., Hanna, K., Donovan, L., Hukill, G., & Peterson, A. (2013). The web archiving life cycle model [White paper]. Archive-It. Retrieved from http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf

Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2017). Developing web archiving metadata best practices to meet user needs. Journal of Western Archives, 8(2), 1-14. Retrieved from http://digitalcommons.usu.edu/westernarchives/vol8/iss2/5

Millar, L. (2017). Archives: Principles and practices (2nd ed.). New York, NY: ALA Neal-Schuman.

Milligan, I., Ruest, N., & Lin, J. (2016). Content selection and curation for web archiving: The gatekeepers vs. the masses. In JCDL '16: Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries. Retrieved from http://dx.doi.org/10.1145/2910896.2910913
