Hi everyone. In this lecture, we will begin to discuss one of the most fundamental organizational tools in existence, the library catalog. We will try to understand what a library catalog is and what it is designed to do. Think of this exercise as trying to get behind a catalog, to see it from the inside out, to see its inner workings. I’d like you to come away from this lecture thinking about catalogs differently than perhaps you already do.

We’ll begin by thinking through the basic concept of a library catalog. A catalog is an index, or a set of indices. I’ll explain what that means. In explaining what a catalog is, we’ll also define the basic objectives of a catalog. The ultimate objective of any information organization tool is intellectual access to some collection. In talking about what a catalog is, we’ll talk about what intellectual access is. We’ll also talk about more specific and defined objectives of the library catalog, especially those put forth by Cutter in 1876, since Cutter’s objects of a catalog are an important keystone in the field. After we discuss the basic concepts and objectives of the catalog, we’ll think with Wilson, following his article “catalog as access mechanism,” by reflecting on the fundamental problems facing any catalog. I call these the problem of scope, the problem of description, and the problem of arrangement. This brings us to where we can begin to talk meaningfully about how catalogs today actually work.

So what is a library catalog?

A catalog is an index or a set of indices that enable intellectual access to a collection. It’s somewhat redundant to say it this way because there is no index that doesn’t provide intellectual access to a collection in some way. But this definition reveals the different elements that must be discussed further: indices, intellectual access, and collection.

An index is something that points, something that provides direction, like a map.
Think of your index finger: you use it to point “over there” or wherever, depending on where some interlocutor tells you they want to go. Someone tells you where they want to go, their desired destination, and you, like an index, point them there.

Just like you pointing your finger, an index provides directions for a searcher, directing the searcher to some final address. This address is where they may find what they are searching for. The final address is somewhere new, some place in the collection that may be unknown to the searcher. Think of an index as a linking mechanism: it connects a searcher to a destination in a collection, some final address there. This feature of an index implies that an index is not of much help to someone who doesn’t know anything about what they are looking for. To use an index, the searcher has to know something: there has to be some desired destination, some place they’d like to go.

How does an index connect a searcher to the destination? First, there is some list of possible destinations. These destinations are described somehow in words. These words are called the headings. They are the index’s access points, the places that lead to somewhere in the collection. The other feature of the index is the pointers, which are the addresses of the destinations. To use the giving-directions metaphor again, someone may know they want to go to the post office, or the bank, or the supermarket, but at the same time they may not know where these places are. Like an index, you tell them where they are located.

We should also mention that, besides connecting or linking “travelers” to their “destinations,” there is a second use of an index. That use is this: an index can also serve as a list of all the significant topics or authors or titles housed in a collection. In other words, an index is the list of all the potential places a searcher can go.
A searcher can browse an index to determine whether that collection has anywhere they want to travel. If a desired destination is absent from an index, the searcher knows to try a different collection.

Why are indices useful? For the same reason maps are. If someone has the time and patience, of course they can wander around without knowing where they are going, and eventually they may find what they are looking for. But if a searcher uses a map, they can just go to their destination directly. Likewise, an index is designed to save time and substitute for the inherently limited knowledge and memory of searchers. For many collections, it is impossible to browse them fully or understand or remember everything that’s in them.

We’ve said that a library catalog is an index or a set of indices that provide intellectual access to a collection. We also said that a catalog is a linking mechanism: it is a navigational tool that directs a searcher from something known to something unknown. The overarching purpose of an index is intellectual access to a collection through its list of headings. It succeeds in this purpose by supplying a sufficient set of starting places, and by accurately directing a searcher from the starting places to relevant destinations. But there are some more specific objectives for a library catalog. These were set forth by Cutter in his “Rules for a Dictionary Catalog.” These three objectives, or what he calls objects, are important to know because they form the basis of library catalogs today.

The first object is “to enable a person to find a book of which either the author, title, or subject is known.” This rule is significant because it establishes a set of three indices for a catalog: an author index, a title index, and a subject index. Any library catalog, therefore, will contain a list of author headings, a list of title headings, and a list of subject headings.
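To make the idea of headings and pointers concrete, here is a minimal sketch in Python of the three indices that Cutter’s first object implies. It is only an illustration, not how any real catalog is built: the sample records, the shelf addresses, and the function names `build_index`, `look_up`, and `browse` are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample collection; the shelf addresses are invented for illustration.
COLLECTION = [
    {"author": "Cutter, Charles A.", "title": "Rules for a Dictionary Catalog",
     "subjects": ["Cataloging"], "address": "Shelf 12"},
    {"author": "Melville, Herman", "title": "Moby-Dick",
     "subjects": ["Whaling", "Sea stories"], "address": "Shelf 40"},
    {"author": "Melville, Herman", "title": "Typee",
     "subjects": ["Sea stories"], "address": "Shelf 41"},
]


def build_index(records, heading_field):
    """Map each heading (access point) to the addresses (pointers) it leads to."""
    index = defaultdict(list)
    for record in records:
        value = record[heading_field]
        # A record has one author and one title but may have several subjects.
        headings = value if isinstance(value, list) else [value]
        for heading in headings:
            index[heading].append(record["address"])
    return dict(index)


def look_up(index, heading):
    """Direct a searcher from a known heading to its addresses (empty if absent)."""
    return index.get(heading, [])


def browse(index):
    """List every heading alphabetically, so a searcher can see what is covered."""
    return sorted(index)


# One index per kind of heading named in Cutter's first object.
author_index = build_index(COLLECTION, "author")
title_index = build_index(COLLECTION, "title")
subject_index = build_index(COLLECTION, "subjects")

print(look_up(author_index, "Melville, Herman"))
print(browse(subject_index))
```

Looking up a heading stands for the linking function, and browsing the sorted headings stands for the second use of an index, the list of all the places a searcher can go. A heading that is missing simply returns nothing, which tells the searcher to try a different collection.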
The next object is “to show what the library has by a given author, on a given subject, and in a given kind of literature.” This object is significant for two reasons. First, it establishes a structure or order to the author and subject lists. To show what a library has by a given author or on a given subject, all entries for any particular author or subject have to be located together, or collocated. Second, the rule establishes distinctions between different kinds of literature, such as different languages or forms. By forms, Cutter may mean format. Format is the size of a book as determined by how its printed sheets were folded. When books were published, multiple pages were printed on large sheets of paper, front and back. Then the large sheets were folded and cut to form the pages. So by form, Cutter could mean folio size, which is one fold, quarto size, which is two folds, or octavo size, which is three folds.

Finally, the third object is “to assist in the choice of a book as to its edition (bibliographically) or as to its character (literary or topical).” This is important because a catalog must somehow, within its author, title, and subject indices, also distinguish between different editions of a book and between fiction and non-fiction. A catalog must offer a searcher choices, if there are some available. How a catalog did this then and does it now is important to keep in mind.

Now that we’ve basically established what a catalog is and what it should do, it’s necessary to bring attention to the problems that any catalog must confront. Wilson touches on these problems in his article “catalog as access mechanism.”

The first problem is the problem of scope. The problem of scope is actually a set of several problems. The first is the problem of what final destinations an index should include. In other words, what should an index cover? Another sub-problem is, what is the definition or nature of the collection an index describes? What counts as the collection? What is included there?
These problems also presuppose another, more metaphysical question about the nature of a collection. What is a collection a collection of? This will be a recurring issue in this course, for instance when we examine FRBR and RDA and the semantic web. Is it a collection of books? If so, are we just talking about physical or virtual items? Or is a collection a collection of works? Are we talking about bibliographic units or literary units, as Verona asks? Or is a collection a collection of data? These kinds of questions matter because we have to imagine how searchers conceptualize a collection, what they are really searching for and what kinds of access points are most appropriate. So all these questions fall under the question of scope.

As soon as we answer any of the questions associated with the problem of scope, we encounter problems of description. Problems of description are problems like, well, supposing we want to provide access to authors in the collection, how do we go about creating author headings? What rules do we follow? Do we use this language or that one? Do we use real names or pseudonyms? What about authors that are organizations or agencies? Authorship is an especially difficult means of entry because names are so complicated. In Panizzi’s 91 rules, for example, how many of them pertain to author entries? I count at least 46. That means rules for describing author names constitute more than 50% of his rules. Rules for authorship can become very complex, and subject cataloging is no less difficult. How is it possible to say what something is about? How does someone determine that? The main idea here is that for an index to work successfully, its access points have to be created in a logical, consistent, and non-arbitrary way. Otherwise, searchers will become lost or will not find what they are looking for. But is this an impossible task, to describe collections well?
So this is the problem of description.

The third problem is the problem of arrangement. The problem of arrangement raises questions about the order of the headings in the headings list or lists, on the one hand, and the physical or virtual addresses of things in the collection, on the other hand. Designers of catalogs can, at least to some extent, control both the access points and the physical or virtual arrangement of the collection. In terms of the headings list or lists, if there are multiple lists, should they be combined into one big list? Or should there be separate lists maintained? For instance, should the headings lists for author, title, and subject be combined into one big index, as in the case of a dictionary catalog, or should there be separate lists for each set of headings? A single, combined catalog is usually preferred because questions of authorship may lead to questions of title, and so on, so having the headings all together allows for this connectedness. We take order for granted, because most lists we use are alphabetized, and after alphabetization, they are ordered chronologically. Indices like library catalogs therefore rely on features of language, something built in to Anglo-American culture. But on the other end of the index, the destination addresses, we might still ask, how should things be ordered in the collection? What are the addresses of things? How should the collection be arranged? So these are all considerations that fall under the problem of arrangement.

I hope you were able to understand the lecture and think differently about what catalogs are. How to solve the problems of scope, description, and arrangement is a central question that we will discuss over the course of this class.
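As a small illustration of the arrangement question raised above, the sketch below in Python contrasts keeping separate author, title, and subject lists with interfiling them into one alphabetical, dictionary-style list. The headings and the function name `dictionary_arrangement` are assumptions invented for this example, and the filing rule (plain case-insensitive alphabetical order) is far simpler than real filing rules.

```python
# Hypothetical heading lists, kept separately by type.
AUTHOR_HEADINGS = ["Melville, Herman", "Cutter, Charles A."]
TITLE_HEADINGS = ["Rules for a Dictionary Catalog", "Moby-Dick"]
SUBJECT_HEADINGS = ["Whaling", "Cataloging"]


def dictionary_arrangement(**heading_lists):
    """Interfile several heading lists into one alphabetical list.

    Each entry keeps a label saying which list it came from, so a searcher
    can still tell an author heading from a title or subject heading.
    """
    entries = [
        (heading, kind)
        for kind, headings in heading_lists.items()
        for heading in headings
    ]
    # Simplified filing rule: case-insensitive alphabetical order.
    return sorted(entries, key=lambda entry: (entry[0].casefold(), entry[1]))


combined = dictionary_arrangement(
    author=AUTHOR_HEADINGS,
    title=TITLE_HEADINGS,
    subject=SUBJECT_HEADINGS,
)

for heading, kind in combined:
    print(f"{heading}  [{kind}]")
```

In the combined list, an author heading such as “Melville, Herman” files right next to the title heading “Moby-Dick,” which is the kind of connectedness between authorship and title that a single list allows.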
Just to summarize what we discussed in this lecture:

A catalog is an index or a set of indices that provide intellectual access to a collection.

The main objective of a catalog is a linking function: it uses its headings to direct searchers to a desired location in a collection.

Cutter’s three objects are the foundation of catalogs today.

The three problems facing any catalog are the problem of scope, the problem of description, and the problem of arrangement. The problem of scope is, what should an index cover? The problem of description is, how should the headings of an index be constructed? And the problem of arrangement is, how should the headings of an index and the collection it links to be ordered?

This concludes the lecture, thank you.
Welcome back everyone. In this lecture, we’ll briefly examine how the technologies, the infrastructure of library catalogs, developed from the mid-19th century to the present. This history is useful to know because in knowing the form that catalogs once took, how they were built, it’s easier to understand why catalogs are the way that they are today and how the technologies have affected the means of access.
There are basically three main forms that library catalogs have taken in the last 150 or so years. These are the book catalog, the card catalog, and the online catalog. We’ll look at these one by one. You might say, well, what about microform or what about CD-ROMs? I don’t believe there are significant differences between these and, say, book catalogs or card catalogs; they are just displayed differently.

The first form of a catalog was the book catalog.

A book catalog is basically just that: a list of all the items in a collection, arranged alphabetically by author and title, where the entries are hand-written or typed on pages.

Those leaves are then bound together in monograph form or they may be loosely bound. A book catalog is an inventory list of what a library has.

You’ve already seen an example of a book catalog, and that is the Catalogue of Printed Books in the British Museum. That particular catalog is an example of what’s called a dictionary catalog because the author index and title index are combined to form one large alphabetically ordered master index, complete with cross-references.

Book catalogs are swell because they are small, they don’t take up much space, and books are pretty durable things.

What’s of course not so great about book catalogs is that they don’t work well with large, dynamic collections. Let’s say your collection is expanding or shrinking: it’s difficult to alter the entries in the book catalog without having to re-create the entire book. You could leave extra, blank leaves interspersed to anticipate changes, but that’s going to get messy. So book catalogs are only good for relatively small, static collections.

The next innovation in library catalog technology was the card catalog. Card catalogs became prevalent in libraries in the latter half of the 19th century. In a card catalog, each entry of the index is written on a small slip, what are known as index cards.
Once again, the entries could be hand-written or typed.

The cards are filed alphabetically in drawers, usually in the dictionary style where author index, title index, and subject index are included together. Any item might be listed on three or more cards, one for each index.

But only one card might be what’s called the main entry, the card with all the bibliographic information on it. The main entry card might have not only the author, title, and subject, but also place of publication, maybe a short abstract or summary of the work, maybe some kind of notes about the item. Because remember, for Cutter, the library catalog should help searchers decide on the most appropriate item. The more information on the main entry card, the more complete it is, the better the catalog can do that.

Just so we’re clear on what a card catalog looks like, here’s a Creative Commons picture of the card catalog at Columbia University’s Butler Library. I travelled there for a tour maybe 5 or 6 years ago, and these drawers were still there. I’m old enough to remember the card catalogs in my middle school and high school libraries, and the public library, but most card catalogs have been removed by now to make room in library buildings for other things.

Getting back to the discussion, the advantage of a card catalog is, it’s dynamic. If the collection changes, new entries can be added easily, old entries can be removed. Another advantage we have to keep in mind was, at the time card catalogs became popular, at the turn of the 20th century, bibliographic agencies like the Library of Congress began creating card sets for books. This meant libraries could order these unit card sets; they didn’t have to create them themselves. This saved time and labor, the library knew the entries were done correctly, and it knew its entries were the same as those of the library one town over.
So the card catalog enabled a national, centralized, and standardized cataloging system to develop in the US.

The disadvantage of card catalogs was, they could become nasty in size. For a small, rural public library with a small collection, the card catalog could take up the space of a large refrigerator. For a large, academic, research institution, imagine just banks and banks of file drawers, like in Columbia’s library. Card catalogs were the 20th century version of today’s data centers, or data farms.

Card catalogs were gradually phased out in the 1990s and 2000s as library catalogs became automated. Today, the standard form of library catalog is the online catalog, which you all know. Online catalogs are also called OPACs, or Online Public Access Catalogs.

An online catalog is basically a database. Imagine a spreadsheet with rows and columns. The rows are the different items. The columns are the access points, or heading types. They might be labeled author, title, subject. When you choose to search by author, for instance, you’re limiting your search to that author column in the spreadsheet.

On the back end of an online catalog is a MARC record. MARC stands for Machine Readable Cataloging. The MARC format is a standard way, a standard structure used to create a bibliographic record. You can think of it as the grammar of the record.

Many library systems allow you to view the back end of the catalog from the Web. In the University of Pittsburgh system, for example, if you find an item’s record and you select “staff info,” the MARC record appears. You can see the field codes in blue. The fields of particular note are the 100-field, where the author is entered, and the 245-field, where the title and author are entered as they appear on the title page. 260 is the publication and date field. 300 is the physical description. The 650 fields are subject fields.
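If it helps to picture the spreadsheet idea together with the field codes just mentioned, here is a rough sketch in Python. It is a simplified stand-in, not the actual MARC format and not how any particular library system works: the two records, the publishers, and the names `RECORDS`, `SEARCH_FIELDS`, `search`, and `titles` are all made up for illustration. Only the field numbers 100, 245, 260, 300, and 650 come from the discussion above.

```python
# Simplified, hypothetical records: each maps a MARC field tag to a list of values.
# Real MARC fields also carry indicators and subfield codes, which are left out here.
RECORDS = [
    {
        "100": ["Doe, Jane"],                              # author
        "245": ["A history of card catalogs / Jane Doe"],  # title and author as on the title page
        "260": ["Pittsburgh : Example Press, 1999"],       # publication and date
        "300": ["210 p."],                                 # physical description
        "650": ["Card catalogs", "Library catalogs"],      # subjects
    },
    {
        "100": ["Roe, Richard"],
        "245": ["Online catalogs explained / Richard Roe"],
        "260": ["Boston : Sample Books, 2005"],
        "300": ["150 p."],
        "650": ["Online library catalogs", "Library catalogs"],
    },
]

# Which "column" each kind of search is limited to.
SEARCH_FIELDS = {"author": "100", "title": "245", "subject": "650"}


def search(records, search_type, term):
    """Return the records whose chosen field contains the term, ignoring case."""
    tag = SEARCH_FIELDS[search_type]
    needle = term.casefold()
    return [
        record
        for record in records
        if any(needle in value.casefold() for value in record.get(tag, []))
    ]


def titles(records):
    """Pull out the 245 field of each record for display."""
    return [record["245"][0] for record in records]


print(titles(search(RECORDS, "author", "doe")))
print(titles(search(RECORDS, "subject", "library catalogs")))
```

Each record is a row, each field tag is a column, and choosing “search by author” just restricts the match to the 100 column. A search for “doe” as a subject finds nothing, because that word only appears in the author and title columns.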
This is the kind of record that a cataloger works with all the time.

Getting back to our conversation, MARC formatting alone doesn’t say what the record is, it only specifies what the record should look like, in what order things are described by the cataloger, how it is displayed on the screen to the user, and how the records can be searched. MARC is a transmission standard, a way to ensure that records can be shared in a uniform way. MARC formatting was originally created by the Library of Congress for use with card catalogs. The LC would create the entries, which could be transmitted by computer to libraries and printed out on index cards for use in their card catalogs. The cards could also be printed out en masse and distributed to libraries around the country and world. Today, MARC is important because most cataloging is copy-cataloging. This means only a few agencies, like OCLC, the LC, vendors, or academic libraries with specialized collections, actually create original records. Most records are actually pulled from these centralized agencies. So this MARC formatting is, in one sense, a continuation of the trend begun with the card catalog in terms of the national, centralized, standard way of doing cataloging. In another sense, though, because MARC was created for use with card catalogs and at an early time in the history of computers, it is a legacy format, a hangover from a previous time. It wasn’t designed specifically for the online environment today, it was only adapted to serve a purpose here.

There are some other differences between online catalogs today and card catalogs. In online catalogs now, a searcher is not limited to a specified set of indexing terms; any word or phrase is searchable.

Also, the distinction between main entry and added entry is no longer necessary. A search for title retrieves an item’s entire record just as well as a search for author does.

What are the advantages of an online catalog?
There are many. Catalogs are viewable remotely. They can be updated remotely. They are easily modified. In most cases, records have already been created and can be pulled. There is no longer just a predefined headings list; users can also search by full text and keyword. The catalog is not housed in the library, so in a sense the size of the catalog is not an issue. I say “in a sense” because if the servers are housed offsite, there’s more room in the library building. But sometimes libraries maintain a server room in the same building. Also, the catalog can be integrated with other databases the library maintains, like circulation and account information. These all can be linked in the library’s integrated library system, or ILS. And finally, items in the collection can be linked directly to the catalog entry if they are online.

But there are several disadvantages. The first relates to energy, storage, and access. I said that physical size is not an issue. It’s more accurate to say that the physicality of an online catalog is not an issue in the same way as it is for a card catalog or a book catalog. The servers that host a library catalog are still physical things. They are located somewhere, they consume electricity, they require fast Internet access. They can go offline, and they can become expensive to maintain. Let’s say the servers are in West Virginia, and West Virginia is experiencing bad weather: even if it’s sunny where you are, you might not have access to a catalog. So then there are the economic issues of continued maintenance. Unlike a card catalog system, which might be a large investment up front, online catalogs incur maintenance costs indefinitely. We also need to consider the environmental impact of computing. A lot of CO2 is emitted while keeping all the computers running.

So I hope you’ve gotten a sense of how catalogs have developed.
I know it’s a lot to think about and a lot to take in, but understanding how catalogs have been built will help you to understand the readings better. It will help you understand how it is we’ve come to where we are today. We might end with some general conclusions. The first is that each of the catalog forms was created in a specific moment in time to handle a certain type of collection. The second is every catalog form has certain advantages and disadvantages. And thirdly, I think we can see how previous catalog technologies inform succeeding ones. Thanks for listening.
Hi everyone. In last week’s lectures, we looked at a quintessential information organization tool, the library catalog. We discussed the basic idea of a catalog, we examined how the technologies for catalogs developed over time, from the mid-19th century to the present, and we defined a catalog as an index or a set of indices that provide intellectual access to a collection. We looked at some of the fundamental problems that library catalogs face. These problems are the problem of scope, the problem of description, and the problem of arrangement. The problem of scope is the problem of what is included in an index, what is included in a collection, and what the nature of that collection is. The problem of description is, now that we’ve decided what’s in the index and what’s in the collection, how do we represent that collection in an index, how are access points created. And the problem of arrangement is the problem of how the index is ordered and how the entries are connected, and also how the collection is ordered. Moving forward this week, we will begin to look at how these basic problems have been addressed.

In this lecture, we will examine how library codes have developed over time in order to address the problems of scope, description, and arrangement. We will focus on descriptive cataloging codes. Descriptive cataloging is typically treated as a different animal than subject cataloging and classification, which we will examine later in the course.

In tracing the history of the development of library cataloging codes, we will touch on two important concepts that are central to understanding cataloging practice. These concepts are the principle of authorship, and standards, standardization, and interoperability.

So what is descriptive cataloging? Descriptive cataloging is the process of creating some record or entry for an object in the collection. It includes the choice of author and title headings.
It also includes other kinds of description of the object, such as size, publication information, edition, translation, language, and whether it is part of a series. Know also that it includes authority work, though we’re not going to go deeply into authority work at this time. Descriptive cataloging does not include subject access, and it does not include classification.

As library collections grew, as the number of libraries grew, and as librarianship became professionalized, there was a need to understand how best to provide access to collections. Descriptive cataloging rules, or codes, were developed to instruct librarians on how to provide access. A descriptive cataloging code is an attempt to solve the three problems of cataloging: scope, description, and arrangement. It does this by defining, even if implicitly, what a collection is, how best to represent it in an index, and how to order the index and the collection. Remember that the order or arrangement of the index does not always match the order or arrangement of the collection itself.

One of the first sets of rules was Panizzi’s 91 rules, in his preface for the Catalogue of Printed Books in the British Museum, published in 1841. In the rules, Panizzi implicitly defines the collection as composed of bibliographic units, or individual books or publications. The two types of indexes Panizzi establishes are an author index and a title index. The author index directs searchers to authorized headings using “see” references. There is no subject index in Panizzi’s catalog.

The next significant descriptive cataloging code was Cutter’s Rules for a Dictionary Catalog, first published in 1876, the same year the American Library Association was founded. Cutter adds a subject index on top of title and name entry. He also identifies what is now known as a literary unit, which is, as Verona explains, a set of closely interrelated works.
Say, for example, there is a book that is a first edition in English, there is a second edition of the same book in English, there is an Italian translation whose title is in Italian, there is an audio book, an e-book: these are all separate and distinct bibliographical units, but they are a single literary unit. According to Cutter, all these works should be assembled or collocated under a uniform heading to identify the literary unit. The best example of a literary unit assembled under a uniform heading is the Bible.

Following Cutter, there was an increased emphasis on national, committee-led initiatives to develop descriptive cataloging rules. As we mentioned last week, at the turn of the 20th century, many libraries had begun to use the innovation of the card catalog. As more and more libraries began to approach cataloging in the same way, it became possible to further standardize cataloging practices. The Library of Congress began its printed card service, where it manufactured index cards for particular books that it distributed for libraries to use in their catalogs. So to do this task well and to do it transparently, it developed its own set of rules in 1899.

At the same time, libraries wanted to know how to create their own cards that they could use with the LC’s cards. So in 1901, a committee was formed by the ALA and the Library Association, which is the British equivalent of ALA. This committee’s work resulted in the Catalog Rules in 1908.

These rules were not revised until 1941 when the ALA revised them to form the ALA Cataloging Rules. A revised edition was published in 1949.

The LC also published its own set of rules in 1949.

In 1951, ALA invited Seymour Lubetzky to comment on and revise its rules. Lubetzky was a cataloging specialist at the LC. In the readings for this week, we’re reading Lubetzky, and so the readings are part of his critique. So his study started in 1951.
Then, in 1961, there was the International Conference on Cataloging Principles held in Paris. A new draft of the ALA code was prepared there, and it came to be known as the “Paris Principles.” They were based on Lubetzky’s work.

The Paris Principles led to the first edition of the Anglo-American Cataloging Rules in 1967. This was called AACR. There was then a second edition in 1978. This was called AACR2. Following that, there was a second edition revised in 1988. This was called AACR2r. Then there was a second revised second edition. You can call this AACR2r2. Most people just call it AACR2r. The last revision was in 2002. What’s important to note about AACR, AACR2, and so on is that it is an effort toward international standardization of cataloging. It was jointly developed by representatives of the US, Canada, Britain, and Australia.

So let’s pause for a second and consider why there is this development of standardization. We noted last week that the transmission standards for catalogs were the index card, followed by MARC. But the rules for descriptive cataloging became standardized as well. It goes without saying that standards are extremely important in everyday life. We use standards on roads and when driving. Standards for building houses. Standards are used in all sorts of ways. Why are standards needed in an information organization system like a catalog?

They’re needed because catalog records are a shared pool of resources.

Cataloging is done collaboratively.

If different libraries do cataloging according to the same standard, the same set of rules or specifications, then the records created using that standard are interoperable.

So there is this mutual gain that libraries have by creating records in a standardized way and by pulling records from a standardized pool. Imagine the extra time and labor required for a library to catalog according to its own, idiosyncratic set of rules.
There are also processes based on cataloging that might not function correctly without standardized practices, such as inter-library loan.

We also should step back and recognize an important feature of the AACR rules and the rules that preceded them, and that is the principle of authorship.

The principle of authorship assumes there to be a main entry. As we mentioned last week, when card catalogs were used, there was a need for a main entry card that contained all the bibliographic information of a work: its author, title, publication info, maybe an abstract, and finally there would be a pointer to where the work was in the collection.

Following the principle of authorship, the heading of the main entry card should be the author, if it is known.

This does two things in the catalog. It assembles all works by a particular author under that author heading.

It also locates in the catalog other editions of the work by the author.

So we’ve charted descriptive cataloging rules, from Panizzi’s 91 rules to AACR2. But recently, AACR2 was succeeded by a new descriptive standard, RDA.

RDA stands for Resource Description and Access. RDA is the standard, the code or the set of rules catalogers should now follow.

This standard was based on what’s called FRBR, or Functional Requirements for Bibliographic Records.

RDA superseded AACR2r in 2010 when it was published. It is now the standard for descriptive cataloging. The LC called for “full implementation” of RDA in 2013.

RDA began as AACR3, but it was seen during the revision process in 2005 that a radical departure from AACR was needed.

FRBR was published in 1998 by IFLA, which is the International Federation of Library Associations and Institutions.

RDA was begun in 1997 at the International Conference on the Principles & Future Development of AACR. The conference was held by the Joint Steering Committee for Revision of AACR (JSC).
The JSC was composed of people from around the world, including those from the US, Canada, Britain, Germany, and Australia. The JSC had been in charge of revising AACR since 1974.35 RDA was prompted by FRBR, which offered a new way to conceptualize bibliographic entities, and it was also prompted by technological change. The World Wide Web offered the potential to re-think and build catalogs using a database structure.36 So that concludes this lecture. I know I left it at a bit of a cliffhanger because we did not get to discuss FRBR and RDA in depth. That will be the topic of the next lecture. But in this lecture, we traced the development of descriptive cataloging rules. These rules have been attempts to grapple with the problems of scope, description, and arrangement. Two important concepts that characterized this development were standardization and the principle of authorship.37
Hi everyone. Last time we discussed the history of descriptive cataloging codes. We saw how the rules developed by Panizzi and Cutter grew into national- and international-level standards developed by committees, whose members included representatives from the LC, ALA, the Library Association, and other national-level organizations. But we didn’t discuss what FRBR and RDA are, how they were implemented, or why the shift to RDA may or may not represent a fundamental shift in information organization.1 So in this lecture, we have one main objective: we’re going to discuss FRBR and RDA in depth.2 So we mentioned last time that RDA is the product of what’s called a Joint Steering Committee whose members include representatives from key bibliographic agencies in several countries. This committee’s duty has been, since the 1970s, to revise and improve AACR, the Anglo-American Cataloging Rules. An iteration of revision of AACR began around 1997, and at that time, it was thought that whatever it became, it would just be called AACR3 and would not be such a radical departure from AACR2. But in 2005, the committee decided that building on and improving AACR was no longer possible, so the new set of rules came to be known as RDA, or Resource Description and Access. RDA was published in 2010. At the time that it was published, it became the de facto descriptive standard.3 But RDA, the set of instructions, is based on FRBR, which is not a set of instructions, but a conceptual framework. FRBR stands for Functional Requirements for Bibliographic Records. It was published in 1998 by representatives from IFLA, which is the International Federation of Library Associations and Institutions. Keep in mind that IFLA is responsible for initiating what came to be called the Paris Principles, which later led to the Anglo-American Cataloging Rules.
They are also responsible for creating ISBD, or International Standard Bibliographic Description, the backbone of AACR. But to return to the discussion, FRBR informs RDA, so in order to understand RDA, we first have to understand FRBR.4 Now consider one of the fundamental questions we’ve been asking ourselves since the beginning of the course and since reading Wilson. It is the question, what is the bibliographic universe? In particular, it’s the question of what’s in it. What makes it up? What does the bibliographic universe consist of? Is it books, as Cutter said? Is it works or texts, as Wilson said? Are there bibliographic or literary units, as Verona asked? What exactly is it we mean when we talk about a collection? FRBR tackles this question directly by defining what the bibliographic universe is: what lives there and how the things that live there are related. FRBR is, then, a model of the bibliographic universe. According to FRBR, there are three main types of things in the bibliographic universe. They are…5 …entities…6 …attributes…7 …and relationships.8 Within entities, there are 3 groups, conveniently labeled…9 …Group 1 entities…10 …Group 2 entities…11 And, you guessed it, group 3 entities.12 Let’s begin looking at the group 1 entities. There are 4 of them. They are…13 …Work…14 …Expression…15 …Manifestation…16 …and Item.17 These entities, collectively, are called “WEMI” for short. Just remember: WEMI.18 So what are these things? Well, first, it’s helpful to think about them on a scale of abstraction.19 On the scale of abstraction, a work is the most abstract, the most general, group 1 entity in the FRBR universe. Going down the scale of abstraction from work, an expression is still general, but it’s more concrete than a work, more particular. A manifestation is even more particular than an expression, and finally there is an item, which is alone, by itself, unique, one of a kind.
20 Another way to think of the entities is as groupings.21 So a work includes expressions, manifestations, and items…22 …An expression includes manifestations and items…23 …A manifestation includes items…24 …And an item is singular. It includes only itself. There’s only one of it.25 There are other ways of describing these entities. This is done by identifying the relationships between group 1 entities.26 For instance, an item is an example of a manifestation. This is the first type of relationship.27 …A manifestation is an embodiment of an expression, another relationship.28 …And an expression is a realization of a work, a third type of relationship.29 So this is the start of the ontology of the bibliographic universe, composed of entities, attributes, and relationships.30 Still looking at group 1 entities, a work is defined as “a distinct intellectual or artistic creation.”31 An expression is defined as “the intellectual or artistic realization of a work.”32 A manifestation is defined as “the physical embodiment of an expression of a work.”33 And an item is defined as “a single exemplar of a manifestation.” So, again, these are the group 1 entities of FRBR. Just remember: WEMI.34 It’s also useful to use these entities as a way to approach another major distinction in descriptive cataloging, the distinction between content and carrier.35 Content and carrier were not clearly distinguished in AACR. But FRBR, and consequently RDA, emphasizes this distinction. Within the FRBR model, the carrier is represented by two entities: manifestation and item. The content is represented by the other two entities: work and expression. Carrier refers to the data related to some physical object, for the purposes of inventory control. Attributes of a carrier include things like the publisher, page count, and ISBN. Carrier data helps find the holdings in a collection. Content, by contrast, refers to the intellectual content of a carrier, more intangible things like author and title.
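One way to make the group 1 entities and their relationships concrete is to sketch them as a small data model. The Python below is only an illustration under stated assumptions: the class and function names, the handful of attributes chosen, and the sample data are all invented here and are not taken from FRBR or RDA. Each entity points one step up the scale of abstraction, so an item is an exemplar of a manifestation, a manifestation embodies an expression, and an expression realizes a work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Content: a distinct intellectual or artistic creation."""
    title: str


@dataclass(frozen=True)
class Expression:
    """Content: a realization of a work, e.g. in a given language."""
    work: Work
    language: str


@dataclass(frozen=True)
class Manifestation:
    """Carrier: the physical embodiment of an expression."""
    expression: Expression
    publisher: str
    year: int


@dataclass(frozen=True)
class Item:
    """Carrier: a single exemplar of a manifestation."""
    manifestation: Manifestation
    barcode: str


def work_of(item: Item) -> Work:
    """Climb the scale of abstraction from one copy up to its work."""
    return item.manifestation.expression.work


def items_of(work: Work, holdings: list) -> list:
    """Collect every held item that ultimately belongs to the given work."""
    return [item for item in holdings if work_of(item) == work]


# Hypothetical data, invented for illustration only.
novel = Work(title="An Example Novel")
english = Expression(work=novel, language="English")
french = Expression(work=novel, language="French")
english_edition = Manifestation(english, publisher="Hypothetical Press", year=2001)
french_edition = Manifestation(french, publisher="Imaginary House", year=2003)
holdings = [
    Item(english_edition, barcode="0001"),
    Item(english_edition, barcode="0002"),
    Item(french_edition, barcode="0003"),
]

for item in items_of(novel, holdings):
    m = item.manifestation
    print(item.barcode, m.expression.language, m.publisher, m.year)
```

Notice that the carrier-side classes hold the inventory-style data (publisher, year, barcode), while the content-side classes hold the title and language. A search that starts from the work can then gather every copy in the collection, whatever its edition or language.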
Hamlet is the classic example used to describe the differences between work, expression, manifestation, and item, and consequently the differences between content and carrier. Let’s say someone is searching for Hamlet. Hamlet is a work under FRBR. There are different expressions of Hamlet. Maybe there’s a Spanish translation. That would be one expression of the singular work, Hamlet. There may be a certain Spanish version of Hamlet published by a certain publisher, by a certain translator, in a certain year. That’s a manifestation of the Spanish-language expression of the work Hamlet. Now, the library may have a copy of that particular Spanish-language version. The copy in the library’s holdings is the item. In the actual FRBR document, there are plenty of other examples like these that show how these group 1 entities relate to one another in practice. But the point of making these distinctions is to clarify that when we talk about books or texts or whatever, we can mean different things, from something very concrete and particular, like the actual, physical book held by a library, to something more intangible, more abstract, like a work. So that is the promise of FRBR, the hope at least, that these distinctions that have always been there in some way are now brought to the fore and clarified.36 As I said, those four entities are what are called the “group 1 entities” in FRBR. There are also group 2 entities and group 3 entities.37 Group 2 entities are “those responsible for the intellectual or artistic content, the physical production and dissemination, or the custodianship of the entities in the first group.” This is what most people consider authors. Group 2 entities include person and corporate body.38 At this point, I’m just going to defer to the images in FRBR.
As you can see in this entity-relationship diagram, an item is owned by a person or corporate body, a manifestation is produced by a person or corporate body, an expression is realized by a person or corporate body, and a work is created by a person or corporate body. So this image shows the relationships between group 1 entities, at the top, and group 2 entities at the bottom.39 That takes us to group 3 entities. Group 3 entities are defined as “entities that serve as the subjects of works.” They include concept, object, event, and place.40 Again, this entity-relationship model is from FRBR. You can see a work can have as its subject any of the group 3 entities at the bottom: concept, object, event, place. A work can also have as its subject any of the group 1 entities or group 2 entities. So a work can have as its subject another work, expression, manifestation, item, person, or corporate body. So, technically, if things aren’t confusing enough already, group 1 entities are also, or can be, group 3 entities because works, expressions, manifestations, and items can be the subject of a work.41 So we’ve talked about entities and relationships, but we haven’t mentioned attributes, which is the third type of metaphysical thing that inhabits the bibliographic universe. So what’s an attribute? An attribute is basically a characteristic of an entity. The attributes of an entity help to distinguish one entity from another. Each type of entity can have different types of attributes.42 For instance, attributes of a work include title, form, date…43 Attributes of an expression include title, date, language…44 Attributes of a manifestation include title, statement of responsibility, edition, and place of publication…45 Attributes of an item include identifier, which is a unique number or code; provenance, meaning who owned the item and where it came from; and condition of the item. These are not exhaustive lists, by the way.
The full set of attributes for each entity is detailed in FRBR.46 And similarly, there are attributes for group 2 and 3 entities as well.47 Before we conclude our discussion of FRBR and move on to RDA, you should know the four main objectives found in FRBR. Cutter called these the objects of a catalog, but in FRBR, these are actually called user tasks, the tasks that bibliographic description should support. The four tasks are: find, identify, select, and obtain.48 To find means “to find entities that correspond to the user’s stated search criteria.”49 To identify means “to confirm that the entity described corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics.”50 To select means “to choose an entity that meets the user’s requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user’s needs.”51 And to obtain means “to acquire an entity through purchase, loan, etc., or to access an entity electronically through an online connection to a remote computer.” Now, in stating these user tasks, FRBR is making the implicit claim that these tasks, and these tasks alone, are those that should concern the cataloger. This list is claimed to be exhaustive, and the entities, attributes, and relationships are claimed to describe the bibliographic universe completely and unproblematically. It is the attributes of entities and the relationships among entities that are supposed to help users accomplish these tasks. Again, this is the promise of FRBR.52 This concludes our discussion of FRBR. So let’s move on to RDA. What does FRBR mean for RDA?53 First, RDA is a set of instructions for establishing access points to the entities in the bibliographic universe. In AACR2-speak, and in the context of, say, card catalogs, the access points were called headings. RDA doesn’t use this language.
It tells catalogers how to construct authorized access points. FRBR provides the language and structure for creating the access points. FRBR is the basis for the entities, attributes, and relationships that are used in the process outlined in RDA.54 Second, recall how FRBR makes the distinction between carrier and content. Works and expressions are entities in the content category; manifestations and items are in the carrier category. When using RDA to describe a resource, the cataloger starts by considering the carrier entities: the item and manifestation attributes. Attributes of manifestations and items are section 1 of RDA, chapters 1 to 4.55 Then, the cataloger moves on to considering the expression and work attributes. This is section 2 of RDA, chapters 5, 6, and 7. So basically the first two sections of RDA are about the attributes of FRBR’s group 1 entities.56 The next section of RDA, section 3, chapters 8 to 11, is about attributes of group 2 entities, which include persons, families, and corporate bodies.57 Section 4 is about the attributes of FRBR group 3 entities, which are concept, object, event, and place.58 The next sections of RDA, sections 5 through 10, are about describing relationships between all the entities. So while going through RDA, a cataloger must keep in mind these distinctions between entities, attributes, and relationships.59 FRBR and RDA are very confusing for beginners. They’re confusing because you’re learning a new language, a new way to divide up the world. I’ve tried to explain the central ideas of FRBR and to explain how FRBR informs RDA. But I realize the actual workflow, the actual process of bibliographic description, is going to vary by person and by institution. So if you are interested in learning more about FRBR or RDA, I would encourage you to read them for yourself. FRBR is available for download on the IFLA site. Likewise, you can try out RDA online.
There are also physical copies here and there; I know ESU has a physical book copy. But this concludes the lecture. I’m happy to answer any questions you may have. Take care.60
Welcome back, everyone. Last time, we discussed FRBR and RDA in depth. As I said, RDA is the current descriptive standard used in libraries for describing the bibliographic universe. If you choose to go on and take cataloging, or if you receive training in cataloging, you will likely learn how to catalog in RDA, though there are still institutions that use AACR2. So knowing the basics of FRBR and RDA is important, even though we won’t go on to actual cataloging in this class. But this is a foundation for you should you choose to go on. Last time, we discussed, in depth, what FRBR is and how it informs RDA. This is part of the story, but there are some other things to consider, in terms of library administration, personnel training, and catalog maintenance, that I believe everyone should know, regardless of whether you are a cataloger. As I said at the beginning of the course, organization of information is a human practice. We have to consider not only the standard in the abstract but also its implications for actual practices that go on in the library.1 So in this lecture, we’ll reflect on where we are in the transition from AACR2 to RDA. Then, we’ll look at the significant similarities and differences between the two standards, and what implications these similarities and differences have.2 The first thing you should know, as a practical matter, if you are new to cataloging or new to libraries, is that the library community is still very much in the transition phase from AACR2 to RDA. While most people have heard of FRBR and RDA by now, when I surveyed school librarians in Pennsylvania in 2011 and 2012, many of them had not heard of FRBR or RDA, and many were not aware that there was to be a new descriptive standard. My guess is that this has changed, but still, I also wonder how many practicing librarians have actually read FRBR and RDA and can explain the entity-relationship-attribute model coherently.
Honestly, I don’t know.3 Here are some of the key dates you may want to keep in mind. RDA was published in 2010.4 There was a 6-month testing period from 2010 to 2011. During this time, it was tested by the LC, the National Library of Medicine, the National Agricultural Library, and another 23 libraries in the US. The testing was done, in part, because there was significant push-back from the library and cataloging community about adopting RDA without also revising the MARC format. I’m not going to belabor this point by detailing all the reasons some people were reluctant to adopt RDA, because it’s a moot point now. If you are really interested in the controversy, one starting point is a 2016 article by Michael Gorman called “RDA: The emperor’s new code.” It’s in volume 7, issue 2 of JLIS.it. In any case, some people celebrated the arrival of RDA, and some people thought RDA stood for “retirement day at last.” But you should know that there was and still is some tumult, some upheaval in the library community as a result of the change.5 After the testing, the Library of Congress decided to “fully implement” RDA beginning in 2013. Basically, the national and international library community is a huge ship to steer, so this implementation is something that is happening gradually.6 So what does this mean for you, if you are stepping into a new library environment?7 First, it means that likely, most if not all of the existing records you will see in your catalog will have been created using AACR2, not RDA.8 Now, RDA records are compatible with AACR2 records. By this I mean they’re friendly, they play together well in the same system, and you don’t have to use different systems to display them or some such thing. Both of them are constructed and transmitted and displayed using MARC structure. You still use Dewey Decimal Classification or LC Classification for call numbers. But you may notice slight differences in the bibliographic descriptions of AACR2 records versus RDA records.
There are lots of differences when you look closely and compare, but one difference that will be immediately visible to you is that an RDA record has 3 new MARC fields: the 336, 337, and 338 fields. These are called the CMC fields, or “content media carrier.” These are the content type, media type, and carrier type fields. Remember how RDA carefully distinguishes between carrier and content? This is one way it does this, by creating new fields. These three fields replace what in AACR2 was called the GMD, or general material designation. The GMD was an optional addition in the 245 field, and its lists of descriptors were seen to be outdated and arbitrary, which is why the 3 new RDA fields superseded it.9 So there are certainly similarities between AACR2 and RDA. Because RDA is still based on AACR2, there is no radical change in the way records look or the way cataloging is done. There are also certainly differences between the records that use RDA and those that use AACR2. And certainly there are differences in vocabulary and terminology. The real difference, I believe, is not so much the appearance of the records, but the thought process that goes into creating original records, the actual practice of cataloging. One of the reasons RDA superseded AACR2 was that AACR2 was not seen to be suitable for describing online and electronic resources. AACR2, the actual set of rules, is organized by format. There are general rules for description, but then following those there are more specific rules for books, cartographic materials, music, sound recordings, motion pictures and video recordings, microforms, and so on, by format. So what this means is that a cataloger using AACR2 is preoccupied with these types of formats. RDA abandons this preoccupation with format. Instead, catalogers using RDA are interested in, as we said, entities, their attributes, and their relationships, the elements regardless of format.
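To picture the visible difference between the two kinds of record, here is a rough Python sketch. It is an illustration only, under several assumptions: the dict-based record layout, the function names, and the sample titles are made up for this example and do not reflect any particular MARC library or ILS. The field tags (336, 337, 338, and subfield $h of the 245 for the GMD) are the ones discussed above; the sample values shown are typical of a printed book and a sound recording.

```python
from typing import Optional

# A record is sketched as a dict: MARC tag -> list of fields, where each
# field is a dict of subfield code -> value. This layout is an assumption
# made for illustration, not the data model of any real MARC library.
rda_style = {
    "245": [{"a": "An example title /", "c": "Example Author."}],
    "336": [{"a": "text", "b": "txt", "2": "rdacontent"}],    # content type
    "337": [{"a": "unmediated", "b": "n", "2": "rdamedia"}],  # media type
    "338": [{"a": "volume", "b": "nc", "2": "rdacarrier"}],   # carrier type
}

aacr2_style = {
    # In this sketch the GMD sits in subfield $h of the 245 field.
    "245": [{"a": "An example title", "h": "[sound recording] /",
             "c": "Example Author."}],
}

CMC_TAGS = ("336", "337", "338")


def has_cmc_fields(record: dict) -> bool:
    """True when all three content/media/carrier fields are present."""
    return all(record.get(tag) for tag in CMC_TAGS)


def gmd(record: dict) -> Optional[str]:
    """Return the general material designation from 245 $h, if any."""
    for field in record.get("245", []):
        if "h" in field:
            return field["h"]
    return None


def describe(record: dict) -> str:
    """Summarize which format-related data the record carries."""
    if has_cmc_fields(record):
        types = ", ".join(record[tag][0]["a"] for tag in CMC_TAGS)
        return f"content/media/carrier: {types}"
    if gmd(record) is not None:
        return f"GMD: {gmd(record)}"
    return "no GMD or content/media/carrier fields found"


print(describe(rda_style))
print(describe(aacr2_style))
```

A check like this would not be enough to decide reliably whether a real record follows RDA or AACR2; it only shows where the GMD used to live and where the three newer fields sit.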
So the process of cataloging is changed because a new grid is used to cover the bibliographic universe.10 But the bigger question for you will likely not be, how do I catalog in RDA? This is because, as I may have said, much cataloging today is copy cataloging. Probably, someone has already created the RDA record for whatever resource it is you’re dealing with, and you or the catalogers you’re working with will just pull that record from the consortial pool. The bigger question for you may be, what do we do with all these records that are AACR2? Do we go back and convert them? Do we enrich them to create “hybridized” records? Do we just let them be? If we want to do something about them, what do we do? How do we do it? Will we do it or will we hire someone to do it? Is it worth the time and money? How will any changes benefit users? How will changes affect the consortium? An RDA project is a large-scale project that is time-, labor-, and money-intensive. There must be a careful cost-benefit analysis, as well as careful technical planning. There is a growing literature about RDA implementation. Kent State, Chicago, Houston, and other academic libraries, for example, have published case reports on their experiences in either converting or enriching existing records. From what I’ve seen in the literature, my sense is that on the whole, most libraries are not doing much of anything with RDA. This is for reasons of money and personnel. Enrichment projects are typically done by larger academic libraries. But I don’t have data to support this at this time. These are all important decisions you may face or have already faced at your institution.11 So before we wrap up this lecture, let me introduce the first Easter egg of the course. For those of you who don’t know what an Easter egg is, it’s a secret, a hidden message, a bonus feature, usually found in a game. In this course, an Easter egg is the opportunity for you to earn extra credit points.
The Easter egg is presented in the form of a question.12 The question is this: Imagine computerized catalogs today. They’re part of an ILS, an integrated library system. The integrated library system was introduced during the latter part of the 20th century and the early 21st century as part of the library automation movement. An ILS connects records from the catalog to patron information and circulation information. On the patron side, an ILS shows if something is checked out or available. Today, most libraries use bar codes to check out books and associate them with a particular patron, who also uses a card with some unique code. So now, imagine the time before all this was done. Imagine a time before there was an ILS. What did circulation records look like? How were circulation records organized before there was an ILS? What was the relationship between circulation records and catalog records?13 Take some time to think about it. You can use whatever means available to you to produce an answer. If you know the answer, please email it to me before the next weekend meeting. You can earn up to 5 extra-credit points depending on the completeness and detail of your answer. Winners will be announced at the next weekend meeting.14 So I know this lecture was relatively brief, but I hope you were able to gain a better understanding of where we are in the process of the transition from AACR2 to RDA. I also hope you were able to see some of the areas of significance for cataloging practice, whether cataloging is your specialty or not. Whatever your role in the library is, you may have some difficult decisions to make regarding RDA. This concludes the lecture. Thank you for listening.15
Welcome back everyone. So far in the course, we’ve talked about what a catalog is. We’ve focused on the library catalog because this is a library and information management program, and as working or aspiring professionals, it benefits you to know what a catalog is and how it works. You may manage a catalog directly, or you may work with those who do. Besides professional-level training, we’ve also concentrated on catalogs because they represent information organization tools more broadly. The problems and issues within the subfield of cataloging and classification resound in other related areas, such as artificial intelligence, programming, map-making, architecture, web design, and web searching. I hope you’ve begun to focus deeply and carefully on cataloging issues, but at the same time dial out to see cataloging issues as issues that are fundamental to any organization endeavor. We’ve defined a catalog as “an index or a set of indices that provide intellectual access to a collection.” So far, we’ve talked about the descriptive aspects of a catalog: how to represent, for example, authors, titles, and publication information. These aspects are used to create access in some ways. There are some seemingly intractable problems here, as we’ve seen, but since the mid- to late-nineteenth century, there have been increasingly large-scale and centralized efforts to manage catalog description in a standardized way. This week, we will move on to another type of access, one emphasized by Cutter, and that is subject access. Cataloging is traditionally approached as a trifecta: descriptive cataloging, subject cataloging, and classification. We will follow this structure in this course.1 In this lecture, we will do some stage-setting. We will ask, what is a subject? How is it defined? We will also make the distinction between subject access and retrieval versus subject cataloging. Subject cataloging is done using an indexing language.
So we will have to step back and examine indexing languages from afar, then zoom in to focus on indexing languages used for subject cataloging. We will define what indexing languages are, and we will look at a few types of indexing languages as a way to better understand what all the options are. Finally, we will introduce ways to evaluate subject searches. The two most important measures of a search are precision and recall. So we will examine the history of these measurements and examine how they work.2 So, what is a subject?3 A subject is defined as a department of knowledge, a theme, a topic.4 It is what something is about.5 So we immediately confront this notion of “aboutness,” which is the problem of determining what something is about. As we will see, determining what something is about is not easy. In fact, it’s pretty complicated.6 But let’s not let ourselves get too bogged down yet with philosophical problems. There’s another distinction that needs to be made. It is between subject search and retrieval (subject access) on the one hand, and subject cataloging on the other.7 Subject access is any means of locating material on some topic. This can be done either in a structured way, using some set of terms, numbers, or codes, or in an unstructured way, like using words in the title, abstract, or body.8 Subject cataloging is the process of determining what a work is about and how best to represent its “aboutness” in a catalog. In other words, subject cataloging is concerned with providing subject access, with making subject access better. The challenge is figuring out how to do this.9 To understand how subject cataloging is done, we have to step back for a second and understand what an indexing language is.10 An indexing language is an organizational tool used to provide effective subject access.11 It is a systematic guide to the topics and contents of a collection.
The term “language” is used because the systems share characteristics with language, including vocabulary, syntax, and semantic relationships.12 Library of Congress Subject Headings, or LCSH, and Dewey Decimal Classification, or DDC, are two examples of indexing languages.13 First, I think it’s helpful to get a broad overview of the different types of indexing languages.14 I am going to use this hierarchy to describe the types of indexing languages. The types of indexing languages are shown here. We’re going to go through these and make sense of these distinctions one by one, in order to have a common understanding of what we mean when we say, for instance, colon classification, or Sears.15 First, let’s look at this broad distinction between alphabetical and systematic indexing languages.16 We’ll look at systematic first.17 Systematic indexing means it is based on classification.18 To classify means to organize into some order. It means to group things into classes based on some shared characteristics. Taxonomy is the general study of how classification is done. In a library, in order to group like with like, materials of a certain type are stored together, whether in separate rooms or together on the shelves. To do this, libraries use notation. Notation is a system of symbols, such as letters and numbers, which can be combined to represent the divisions of the classification system. Think of DDC or LCC. They use notation to ensure that all the materials of a certain subject area are stored together. We are going to set classification aside for the moment, because we will return to it and explore it in more depth later in the course. But before we do that, just know that classification was what the first libraries used to organize their collections. Some collections were organized using only classification.
So in terms of evaluating the effectiveness of a classified organization, an advantage is that things are organized logically, and like things are brought together with other like things. The disadvantage is, if you don’t know the classification system, you can’t find anything. In a classification system, objects in the collection are not organized by a name, but according to an order, which is unhelpful if all you know is the name of what something is about, not where that subject is located within the order.19 Within systematic indexing languages, there are enumerative and synthetic types. An enumerative type of systematic organization means that the system lists all the possible classes something could fit into. Library of Congress Classification, for example, lists all the possible classes. It’s exhaustive. A synthetic type of classification means that classes can be created on the go, as something is cataloged. This is done using smaller elements, or building blocks, and by following rules.20 That brings us to alphabetical indexing languages.
Alphabetical indexing languages were developed after classified-only collections failed to serve users, especially in the early 20th century when closed-stacks libraries became open-stacks libraries and the public library let the public search on their own.22 Alphabetical indexing differs from systematic indexing because alphabetical indexing doesn’t use notation; it uses natural language (words) to express subjects. These words are called terms, descriptors, or subject headings, but they all have the same meaning. In an alphabetical index, the searcher can search by the name of the subject.23 Within alphabetical indexing, there are two types: uncontrolled and controlled. Let’s look at uncontrolled first.24 Uncontrolled means that no effort is made to use words consistently to represent particular subjects.25 There are several examples of uncontrolled indexing. They are KWIC, KWAC, and KWOC. All of these are examples of catch-word indexing.26 KWIC stands for Keyword in Context. KWIC indexing was developed in 1958 by IBM engineer Hans Peter Luhn. This was before keyword searching, or full text searching, was possible. Luhn’s objective was to use computers to perform automatic indexing of scientific and technical literature. A KWIC index is a type of catch-word index. In a catch-word index, significant words or phrases from titles are used as the index terms. Because the resulting index depends on what the titles are, the index is uncontrolled: there is no attempt to systematize or consistently use the terms that appear in the index. But again, KWIC means Keyword In Context.27 This is an example of a KWIC index from Wikipedia. You can see the significant words in the center, ordered alphabetically. On the far right is the place where that title can be found. KWIC is called “in context” because the indexing terms are displayed together with and within their titles.28 There are two other very similar catch-word indexing techniques. They are KWAC and KWOC.
KWAC means Keyword and Context, or alternatively Keyword alongside Context. KWOC means Keyword out of Context. These two are basically just variations of KWIC, the only difference being the position of the keywords in the index.

This is an example of a KWAC index, again pulled from Wikipedia. What you see is a bit different from the KWIC index because the significant words in the titles are not aligned centrally but left-justified, and the keywords are extracted from the titles and placed before them.

This is an example of a KWOC index. As you can see, the keywords are extracted from the titles and placed on the left. The titles then appear in full, unlike the KWAC index, where titles were displayed with their keywords removed.

Uncontrolled indexing can be done quickly and automatically, as the KWIC acronym was perhaps meant to imply. But there are three significant problems with uncontrolled indexing.

The first is the problem of referential semantics. This is the problem of homonyms. Homonyms are words that are spelled and pronounced alike but have different meanings. An example is band. There are marching bands, but also rubber bands. Homographs are words that are spelled alike but differ in derivation, meaning, or pronunciation. An example is China. There is China the country and then there is porcelain china. Homophones are words that are pronounced alike but have different meanings, derivations, or spellings. An example is red, as in the color red, and read, as in “I read a book.” Homonyms and homographs are a problem in written catalogs because it is unclear to the user which meaning of the word is meant. Say I’m looking for a work on alternator belts for cars. I have to sort through all other types of belts and disambiguate them from what I want: pants belts, asteroid belts, rock belts, trophy belts.

The second problem has to do with relational semantics.
Relational semantics includes several things: equivalences (or synonyms), hierarchical relationships, and related-term relationships. In other words, there’s no connective tissue in an uncontrolled index, what Cutter called a syndetic structure. We’ll come back to this when we talk about controlled vocabularies in more depth. But the problem is, say, in the case of synonyms, if I’m searching for something about cars, I need to think of all the other indexing terms that could be used to refer to the same thing: automobile, vehicles, transportation, Oldsmobile. Are these more general or more specific terms? How are they related in the index? There’s no way to tell.

The third problem is what’s called the problem of fanciful titles. Cutter, in his chapter on library catalogs from 1876, identified this problem. He says, “we cannot always take the ‘author’s own definition of his book.’ He knows what the subject is, but he may not know how to express it for cataloguing purposes; he may even choose a title that misleads or is unintelligible, especially if his publisher insists on a striking title, as is the manner of publishers; and different writers, or even the same writers at different times, may choose different words to express the same thing” (Cutter, 1876, pp. 536-537).

So, given the shortcomings, first, of systematic cataloging, then of uncontrolled alphabetical cataloging, indexing languages were developed that are alphabetical but controlled. Again, alphabetical means that natural language words are used as the indexing terms, not notation. And controlled means that words are carefully thought out and used consistently in order to avoid the problems of referential semantics, relational semantics, and fanciful titles that occur when an indexing language is uncontrolled.
A controlled indexing language uses an authorized list of subject vocabulary to represent subject content.

The first distinction within controlled alphabetical indexing languages is between precoordinate and postcoordinate systems. In precoordinate indexing, a single heading is used for a complex subject. For example, if I’m looking for something about public works projects in the Tokyo region of Japan, the heading for that subject in a precoordinate system would be Public works—Japan—Tokyo. Contrast this with a postcoordinate index, where a single work with a complex subject might be cataloged under several different headings, each representing a different aspect of the work. For example, Public works, Japan, and Tokyo might all be separate headings, not a single one.

Let’s look at precoordinate systems. Some of the earliest precoordinate indices tried to combine the advantages of both systematic and alphabetical indexing into a single system, called an alphabetico-classed system. An alphabetico-classed system uses what’s called indirect entry, because subject headings are listed within their entire hierarchy. Let’s say I wanted something about frogs. I’d have to look up frogs by looking up its hierarchy, which starts with Zoology.

The advantages of the alphabetico-classed system are that recall and precision are high. The system avoids the referential and relational problems, as well as the fanciful title problem, of uncontrolled systems.

The disadvantage is that finding is difficult. There is no clear way to go about using the catalog to find things: because of indirect entry, and because specific subjects are nested within general classes, it’s hard to know where to look.
Cutter criticized the Harvard College Library in 1876, which at the time was alphabetico-classed.

The disadvantages of the alphabetico-classed system led Cutter to develop his own subject catalog. It uses a direct-entry subject index so that users can find topics using words they know. In order to avoid the referential and relational semantic problems, he also added a syndetic structure. Syndetic means connected. He connected headings in the catalog using cross-references. “See” references directed users from unused synonyms to authorized headings. “See also” references were used to express hierarchy or related terms. Basically, what Cutter did was incorporate a thesaurus within the index.

Homographs were controlled for using qualifying words, phrases, or notes that appeared next to the headings. The two meanings of Mercury could be distinguished by stating that one was an element, the other a planet.

Cutter’s subject catalog is an example of an enumerative index, where all the authorized headings are established, or enumerated, from the beginning by the cataloger. Cutter’s ideas were incorporated into the Library of Congress Subject Headings list and the Sears subject headings list, which are still used today.

Under precoordinate, the other option is a synthetic list. Recall that in a precoordinate system, a single heading is used for complex subjects. If an indexing language is precoordinate and enumerative, as it was in Cutter’s catalog, or LCSH, or Sears, this means that all the headings are established by the cataloger from the start. In a precoordinate system that is synthetic, there is still a single heading used for works of a complex nature, but smaller building blocks, parts, or elements are used to construct headings.

PRECIS is the best example of this type of indexing language. It stands for Preserved Context Indexing System.
PRECIS is alphabetical, controlled, precoordinate, and synthetic because a single heading is constructed, or synthesized, out of an established set of authorized words and an authorized syntax. A set of already-established building blocks is used to construct a string, which serves as the heading or headings. Importantly, this construction depends not only on words but also on syntax, which means order or arrangement.

So, for example, suppose there is a work about the management of libraries in Canada. All three of the elements in this subject are significant: management, libraries, and Canada. Management is the “what” of the action, libraries is the “to whom or to what” of the action, and Canada is the “where” of the action. So all three elements form their own headings. But the nature of PRECIS is that context is retained through syntax and through the rules for applying it. So the first string would be Canada. Libraries. Management. The second string would be Libraries. Canada. Management. And the third string would be Management. Libraries. Canada. These strings are constructed using the rules established by PRECIS.

We said earlier that LCSH and Sears are enumerative, precoordinate systems. But actually, they have synthetic aspects as well. They have become more synthetic because synthesis offers more flexibility. These systems are synthetic because they allow for free-floating subdivisions. We’ll leave it at that for now.

That takes us to postcoordinate systems. They are so called because there is no single heading established by the cataloger. Instead, the onus shifts to the user, who must use Boolean operators to construct a string. For example, where in a precoordinate system the heading for the history of philosophy might be Philosophy—History, and the heading for the philosophy of history might be History—Philosophy, in a postcoordinate system the user is expected to create a Boolean search such as philosophy AND history.
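To make the contrast concrete, here is a minimal Python sketch of postcoordinate retrieval. The three sample records and the function name `boolean_and` are made up for illustration and do not come from any real catalog. Each work carries separate descriptors, and the user coordinates them at search time with AND.

```python
# Hypothetical mini-index: each work is assigned separate descriptors
# (postcoordinate), not one combined heading.
records = {
    "A History of Philosophy": {"philosophy", "history"},
    "The Philosophy of History": {"philosophy", "history"},
    "Public Works in Tokyo": {"public works", "japan", "tokyo"},
}


def boolean_and(index, *terms):
    """Return the titles whose descriptors include every search term."""
    wanted = {t.lower() for t in terms}
    return sorted(
        title for title, descriptors in index.items() if wanted <= descriptors
    )


hits = boolean_and(records, "philosophy", "history")
print(hits)
```

Notice that both philosophy titles come back. The Boolean search cannot tell the history of philosophy from the philosophy of history, which is exactly the distinction the precoordinate headings make through word order.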
The thesaurus of ERIC descriptors is an example of a postcoordinate alphabetical controlled vocabulary.

This concludes our discussion of all the different types of indexing languages. Let’s move on to how to evaluate the effectiveness of different indexing languages. Two criteria that are often used are precision and recall.

Precision is defined as the proportion of relevant documents retrieved to the total number of documents retrieved. Stated as a formula, precision is calculated as relevant documents retrieved divided by total documents retrieved. Let’s say you retrieved a total of 20 documents, and only 10 of those are relevant. Your precision is .5, or one-half. Basically, precision means getting what you want, and nothing else. It means you’re not getting any garbage results, stuff you don’t want. It means your search doesn’t miss the mark.

Recall is defined as the proportion of relevant documents retrieved to the total number of relevant documents in the database. Stated as a formula, recall is calculated as relevant documents retrieved divided by total relevant documents in the database. Suppose you retrieved 8 relevant documents, and there are 10 relevant documents in the database. Your recall is .8, or 80 percent. It means you didn’t retrieve everything relevant that you could have. So precision and recall, if they were perfect, would mean that you could get everything you want and nothing that you don’t.

Another way to think about precision and recall is to think about a target with multiple bull’s eyes. In the image, the red dots represent bull’s eyes. The bull’s eyes, in this analogy, represent the documents you want, the relevant ones. The black circle represents the database. There are 20 total bull’s eyes in the database. The black dots represent garbage, stuff you don’t want. These are the documents that aren’t relevant to you. Now let’s say you perform a search. The contents of the blue circle represent your search results.
You retrieved 9 relevant results and one irrelevant one. That’s 10 total hits, and a precision of .9. But if you look at the bigger picture, you see that this search excluded a lot of relevant results: all the red dots outside the blue circle. There are 20 total relevant documents in the database. So that’s a recall of .45. So then let’s say you try to expand your search to include more relevant results. You cast a wider net.

This time, you retrieved 19 out of the 20 total relevant results. That’s a recall of .95, which is a marked improvement from last time. But you also retrieved a lot of junk. Of the 31 total hits, 12 are irrelevant. So precision went down from .9 to .61.

The criteria of recall and precision were defined by Cyril Cleverdon in 1962 in the Cranfield Experiments. This and subsequent analyses show an inverse relationship between recall and precision. There is a tradeoff: as one goes up, the other goes down. The challenge, then, and the purpose of these experiments, was to determine what features of indexing languages and what types of searches enable high precision and high recall. In general, languages that incorporate a hierarchy exhibit high recall because like documents are collocated in classes. They also have low precision because any general class also includes more specific terms that are not relevant. Any uncontrolled language that does not control for homographs will have low precision, and if it doesn’t control for synonyms, it will have low recall. Any keyword search will typically have low precision and low recall because it is just a shot in the dark. Keep in mind that there is also the criterion of usability of a catalog that must be considered. Even if an indexing language can enable high precision and high recall, it also has to be easy to use.

For reasons that include but are not limited to precision, recall, and usability, libraries have gradually settled into this area of indexing languages.

This concludes the lecture.
Thank you for listening.
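As a supplement to this lecture, the precision and recall arithmetic from the target example can be checked with a few lines of Python. This is only a sketch: the function and variable names are my own, and the counts are the ones given for the two searches in the target example.

```python
def precision(relevant_retrieved, total_retrieved):
    """Relevant documents retrieved / total documents retrieved."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Relevant documents retrieved / total relevant documents in the database."""
    return relevant_retrieved / total_relevant


TOTAL_RELEVANT = 20  # the red dots (bull's eyes) in the database

# First search: 10 hits, 9 of them relevant.
narrow = (precision(9, 10), recall(9, TOTAL_RELEVANT))
# Second, wider search: 31 hits, 19 of them relevant.
wide = (precision(19, 31), recall(19, TOTAL_RELEVANT))

print(f"narrow: precision={narrow[0]:.2f} recall={narrow[1]:.2f}")
print(f"wide:   precision={wide[0]:.2f} recall={wide[1]:.2f}")
```

Running it shows the tradeoff in numbers: the wider search raises recall from .45 to .95 while precision falls from .9 to about .61.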
Hi everyone. Last time we took a broad look at the history and types of indexing languages used for representing what works are about. This is all part of the bigger goal to understand how to organize information, and how to provide access to the bibliographic universe. In the end, we found that due to considerations that include, but are not limited to, recall, precision, and usability, current cataloging practices have settled into alphabetical, controlled, and precoordinate systems. The two systems that are most prevalent today are LCSH and Sears. Both are highly enumerative systems, but they also exhibit characteristics of synthetic languages.

In this lecture, we will look more closely at LCSH and Sears, but especially LCSH. Sears was developed after LCSH and is largely based on it, so much of what will be said about LCSH also applies to Sears. So first, we will examine the history of LCSH and Sears. We will then look more closely at the principles of LCSH and how LC subject headings are constructed.

LCSH is a little over a century old. LC had a subject catalog from 1869, but following the innovations developed by Cutter, the LC decided to revise its subject catalog to better provide access to its collection. LCSH, then, was designed with the LC’s holdings in mind.

There was an ALA list of subject headings published in 1895, so the LC catalogers, led by J. C. M. Hanson, started with that ALA list. Cutter was a member of the committee that developed this ALA list in 1895. Of course, the list followed his principles.

Keep in mind that around this same time, at the turn of the 20th century, the LC was also preparing its index card printing service, the unit card. So it was looking for a standard way to create subject access, not just for its own collections, but for any libraries using its cards. Standardization had a number of implications, including how the same indexing terms would come to be used at different libraries.
Before standardization, different libraries might use different terms for the same subject. Standardization also allowed for cooperative cataloging, the sharing and reuse of catalog records. Standardization also enabled other joint processes, such as interlibrary loan.

It was 1909 when publication of the new LCSH list began, and publication continued until 1914. The ALA list, which had already gone through three editions (1895, 1898, and 1911), was discontinued. At first, the LC list was called Subject Headings Used in Dictionary Catalogues of the Library of Congress. It wasn’t until 1975 that the name changed to Library of Congress Subject Headings.

The LCSH emphasized much of what Cutter stressed in 1876: an alphabetical list, using common words everyone knew, with direct entry of the words, not inverted, and with cross-references to control for synonyms. In other words, a syndetic, thesaurus structure. The list also contained qualifiers to control for homographs, which is something Cutter did not explicitly mention.

Of course, there have been significant revisions of LCSH over the last century. It’s now in its 39th edition. It contains over 280,000 headings. It comes in a 4-volume set, distinctively known as the big red books.

Though LCSH is the preferred type of subject index worldwide, the development of LCSH was haphazard. There are many inconsistencies that resulted from mixing various practices. One prominent problem has been inconsistent syntax, meaning that the order and arrangement of words is not consistent. Some headings are direct entry, others indirect. Revisions have corrected some of these inconsistencies in syntax. There has been a shift of emphasis away from adjectives and toward nouns. There has also been significant criticism of the racism, sexism, and other biases inherent in LCSH. These criticisms, many of which came from Sanford Berman, have resulted in modifications.
For more on the history of LCSH, I suggest reading about it in either Lois Mai Chan’s or Arlene Taylor’s Cataloging and Classification texts, or reading the 2000 article by Alva Stone in Cataloging and Classification Quarterly. Lois Mai Chan was responsible for revising parts of LCSH. The revision and critique of LCSH is ongoing.

Now let’s shift to Sears. Sears was first published in 1923. It was intended for small public and school libraries. It is based on LCSH, so it has some of the same headings and the same general structure; it’s just not as specific. Some terms differ from LCSH. We will look more closely at LCSH, but since Sears and LCSH are so similar, the points about LCSH will translate over to Sears.

Sears was successful in part because H.W. Wilson, its publisher, used the list on the cards it produced for its card distribution service. Unlike the LCSH list, which is produced in a multi-volume set, Sears is sold in a single volume. Headings lists also include DDC schedules, which makes it easier to know what the call number will be in Dewey once you determine what the subject heading is.

But back to LCSH. Recall that LCSH, like Sears, is an alphabetical, controlled, precoordinate system. It is a standard list of terms, called subject headings, that are used for indexing. Precoordinate means that only catalogers construct a subject heading, not users.

It follows the principle of uniform and unique headings. The same principle applied when we discussed descriptive cataloging: there is an authorized heading used for an author, and alternate headings point to this authorized one. Likewise, in a subject index, a single, authorized heading is used to collocate all works about a subject. This improves recall and precision. For example, if the heading Agricultural chemistry is used, then Chemistry, Agricultural is not used, and there may be a “see” reference from it to the authorized heading.
A corollary of uniform headings is unique headings. This means a heading should be used to refer to one subject only; it is not used more than once.

LCSH also follows the principle of common usage. In other words, the terms are not scientific terms, not Latin ones. It uses words that people use widely every day. This principle follows from the emphasis on the user of a catalog, not librarians. And not just any user, but someone who may not use catalogs often, what Cutter calls the “desultory” user. The principle of common usage, however, presents its own problems, as word usage changes over time. Once-acceptable terms become culturally unacceptable. Also, there are regional language variations.

Then there is the principle of specificity. This means that the subject headings should be as specific as the topics they are intended to cover. In other words, the heading should be coextensive with the subject content of the work, no broader and no narrower. Now, what happens when a work is about more than one thing, or when something is viewed from multiple perspectives, or other situations? In these cases, there may be a need for more than one subject heading, or there may be a need to subdivide the heading.

And as we mentioned, LCSH follows direct entry. Contrast this with how classified entry works: some subject is nested within a hierarchy, and in order to find the subject, you have to know the broadest term of the hierarchy.

There are a variety of ways that LCSH headings are constructed. It can be challenging to find order within the chaos.

Headings take the form of a single noun… Art… An adjective and a noun… American drama… A noun and an adjective, separated by a comma…
Chemistry, Organic… A noun and a noun… Ocean currents… A noun and a noun, separated by a comma… Insurance, Life… A noun with another noun qualifier in parentheses… Children (Roman law)… A noun with an adjective qualifier in parentheses… Germany (West)… A noun, preposition, noun construction… Social work with youth… And a noun, conjunction, noun construction… Good and evil.

One distinctive characteristic of LCSH is its subdivisions. We know that LCSH is enumerative, which means that the designers of the list enumerate all the possible headings for catalogers to use. In a fully enumerated list, catalogers have no flexibility in creating their own headings. But LCSH is not totally enumerative; it is also synthetic, which means there is some flexibility in the types of headings catalogers can create. LCSH enables this kind of flexibility by allowing for subdivisions. Historically, LCSH was first a fully enumerated system, but subdivisions were added first in the 1920s, then in the 1970s, making the system more synthetic.

Subdivisions are created using syntax. And this syntax takes the form of an em dash. The basic form of synthesizing syntax in LCSH to create subdivisions is Main heading—Subdivision—Subdivision. Subdivisions allow for better specificity of heading, because they allow a single heading to express multiple aspects of a work’s content. This specificity improves precision. Main headings are taken from LCSH, and the subdivisions are taken from various lists.

There are four types of subdivisions. There are form subdivisions, which express what a work is. Is it a bibliography, is it fiction, is it an index? There are geographic subdivisions, for when a work is about a subject in a particular place. Geographic subdivisions are entered hierarchically from largest to smallest. For example, Kansas, Emporia. There are period subdivisions, for works about specific times, specific coverage. And topical subdivisions express a subtopic of the main subject.
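The subdivision pattern can be sketched in a few lines of Python. This is only an illustration: `build_heading` is a hypothetical helper, not part of any cataloging tool, and it checks nothing against the real LCSH lists. It simply joins a main heading and its subdivisions with the em dash separator described above, using the Public works, Japan, Tokyo example from the previous lecture.

```python
EM_DASH = "\u2014"  # the separator this lecture describes for subdivisions


def build_heading(main_heading, *subdivisions):
    """Join a main heading and any subdivisions into one precoordinate string."""
    parts = [main_heading, *subdivisions]
    if any(not p.strip() for p in parts):
        raise ValueError("headings and subdivisions must be non-empty")
    return EM_DASH.join(p.strip() for p in parts)


# Geographic subdivisions run from the larger place to the smaller one.
heading = build_heading("Public works", "Japan", "Tokyo")
print(heading)
```

A real cataloger would also have to confirm that each subdivision is permitted under the chosen main heading, which this sketch makes no attempt to do.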
The application of subdivisions may be limited to specific main headings. In other words, some headings specify which subdivisions can be used with them. Other subdivisions, called “free-floating” subdivisions, can be used by the cataloger even when the heading doesn’t explicitly specify them. Other subdivisions are used with certain categories of headings.

As we said, a controlled vocabulary has to solve the problem of relational semantics. This includes the problem of synonyms and the problem of how a term relates to other, similar terms, including those that are broader and narrower. LCSH uses symbols to express these relationships.

The first is UF. UF stands for Used For. It expresses equivalence. It shows what the authorized subject heading is and what its synonyms are. In the index, any non-authorized synonyms will have “see” references that lead to the authorized term.

The next is BT. BT stands for broader term. It shows a hierarchical relationship by showing a term that is broader than the term shown. The broader term will include “see also” references to terms narrower in meaning.

NT stands for narrower term. Like broader term, it expresses hierarchy and specifies “see also” references from broader terms to narrower terms. Notice that as a rule, there are no “see also” references from narrower terms to broader terms.

SA means “see also.” This also specifies a hierarchical relationship below the term to terms too numerous to list.

Finally there is RT, which means related term. RT does not specify equivalence or hierarchy but terms that are similar in meaning.

Here is the entry for Magicians. You know that a geographical subdivision can be used because it says “may subdivide geographically.” There are two suggested LC classification call numbers there. Then there are the semantic relationships. It says to use the Magicians heading for conjurers, enchanters, and sorcerers. So imagine you catalog something with this heading.
You should also add conjurers, enchanters, and sorcerers in the index as headings with “see” references that direct the user to Magicians, which is the authorized heading. Conjuring and Entertainers are listed as broader terms. This means any entries for Conjuring and Entertainers should include “see also” references that lead to Magicians, since Magicians is the narrower term. A related term is Wizards. This means both Magicians and Wizards should have reciprocal “see also” references. And a narrower term is Miracle workers. So suppose Miracle workers and Magicians appear as headings in your index. You should have a “see also” reference leading from Magicians to Miracle workers. Remember, hierarchical references only go down, not up.

Here we have another example. It is the heading for Motion pictures. You can see that it can be subdivided geographically. You see the LCC call number. What’s distinctive about this entry is its scope notes. There is an actual direction written out for catalogers. It says “Here are entered general works on motion pictures themselves, including motion pictures as an art form, copyrighting, distribution, editing, plots, production, etc. Works on the technical aspects of making motion pictures and their projection onto a screen are entered under Cinematography.” Then you can see what alternate headings direct to this authorized heading. Motion pictures is used for cinema, films, movies. You can see that Moving-pictures was the former heading, which is interesting. There are some broader terms, and then there are many, many narrower terms, all listed out. Some of these narrower terms are “Dinosaurs in motion pictures,” “Star Wars films,” “Ghosts in motion pictures.” There are also many specific subdivisions listed under this heading.

LCSH is complex and confusing. It takes practice to learn. There are four main guidelines for applying LCSH.

First, identify the main subject of the work. Consider how the user might view it.
Use a heading only if it covers at least 20 percent of the work. It’s fine if a work is about more than one thing, but don’t assign more than 6 headings, to say nothing of 10.

Next, identify any other aspects or facets of the work that should be expressed. Is it a specific form, like a bibliography or index? Is it about a particular time or place? These aspects could be expressed using subdivisions.

Next, assign the most specific heading. Assign more than one heading if the work is about more than one thing.

Finally, use the subdivisions to express the aspects of the work.

That brings us to the end of the lecture. Just to recap, we examined the history of LCSH and Sears. We also looked at the principles that underlie LCSH. And we examined how LC subject headings are constructed, in general. Thanks for listening.
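As a supplement to this lecture, the cross-reference rules described for the Magicians entry can be sketched in Python. The dictionary layout and the function `cross_references` are assumptions made for illustration, not an actual LCSH data format; the terms are the ones read out from the entry in the lecture.

```python
# Hypothetical representation of one authority entry.
entry = {
    "heading": "Magicians",
    "UF": ["Conjurers", "Enchanters", "Sorcerers"],  # unused synonyms
    "BT": ["Conjuring", "Entertainers"],             # broader terms
    "RT": ["Wizards"],                               # related terms
    "NT": ["Miracle workers"],                       # narrower terms
}


def cross_references(e):
    """List the (from_term, reference_type, to_term) tuples an entry implies."""
    h = e["heading"]
    refs = []
    # Unused synonyms point to the authorized heading.
    refs += [(uf, "see", h) for uf in e.get("UF", [])]
    # Hierarchical references only go down: broader -> heading -> narrower.
    refs += [(bt, "see also", h) for bt in e.get("BT", [])]
    refs += [(h, "see also", nt) for nt in e.get("NT", [])]
    # Related terms refer to each other in both directions.
    for rt in e.get("RT", []):
        refs += [(h, "see also", rt), (rt, "see also", h)]
    return refs


refs = cross_references(entry)
for source, kind, target in refs:
    print(f"{source} {kind} {target}")
```

The sketch never emits a reference from a narrower term up to a broader one, which mirrors the rule that hierarchical references only go down.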