Commit | Line | Data |
---|---|---|
f9995f31 MG |
1 | \r |
2 | \r | |
3 | The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS\r | |
4 | \r | |
5 | \r | |
6 | \r | |
7 | \r | |
8 | WORKSHOP ON ELECTRONIC TEXTS\r | |
9 | \r | |
10 | PROCEEDINGS\r | |
11 | \r | |
12 | \r | |
13 | \r | |
14 | Edited by James Daly\r | |
15 | \r | |
16 | \r | |
17 | \r | |
18 | \r | |
19 | \r | |
20 | \r | |
21 | \r | |
22 | 9-10 June 1992\r | |
23 | \r | |
24 | \r | |
25 | Library of Congress\r | |
26 | Washington, D.C.\r | |
27 | \r | |
28 | \r | |
29 | \r | |
30 | Supported by a Grant from the David and Lucile Packard Foundation\r | |
31 | \r | |
32 | \r | |
33 | *** *** *** ****** *** *** ***\r | |
34 | \r | |
35 | \r | |
36 | TABLE OF CONTENTS\r | |
37 | \r | |
38 | \r | |
39 | Acknowledgements\r | |
40 | \r | |
41 | Introduction\r | |
42 | \r | |
43 | Proceedings\r | |
44 | Welcome\r | |
45 | Prosser Gifford and Carl Fleischhauer\r | |
46 | \r | |
47 | Session I. Content in a New Form: Who Will Use It and What Will They Do?\r | |
48 | James Daly (Moderator)\r | |
49 | Avra Michelson, Overview\r | |
50 | Susan H. Veccia, User Evaluation\r | |
51 | Joanne Freeman, Beyond the Scholar\r | |
52 | Discussion\r | |
53 | \r | |
54 | Session II. Show and Tell\r | |
55 | Jacqueline Hess (Moderator)\r | |
56 | Elli Mylonas, Perseus Project\r | |
57 | Discussion\r | |
58 | Eric M. Calaluca, Patrologia Latina Database\r | |
59 | Carl Fleischhauer and Ricky Erway, American Memory\r | |
60 | Discussion\r | |
61 | Dorothy Twohig, The Papers of George Washington\r | |
62 | Discussion\r | |
63 | Maria L. Lebron, The Online Journal of Current Clinical Trials\r | |
64 | Discussion\r | |
65 | Lynne K. Personius, Cornell mathematics books\r | |
66 | Discussion\r | |
67 | \r | |
68 | Session III. Distribution, Networks, and Networking: \r | |
69 | Options for Dissemination\r | |
70 | Robert G. Zich (Moderator)\r | |
71 | Clifford A. Lynch\r | |
72 | Discussion\r | |
73 | Howard Besser\r | |
74 | Discussion\r | |
75 | Ronald L. Larsen\r | |
76 | Edwin B. Brownrigg\r | |
77 | Discussion\r | |
78 | \r | |
79 | Session IV. Image Capture, Text Capture, Overview of Text and\r | |
80 | Image Storage Formats\r | |
81 | William L. Hooton (Moderator)\r | |
82 | A) Principal Methods for Image Capture of Text: \r | |
83 | direct scanning, use of microform\r | |
84 | Anne R. Kenney\r | |
85 | Pamela Q.J. Andre\r | |
86 | Judith A. Zidar\r | |
87 | Donald J. Waters\r | |
88 | Discussion\r | |
89 | B) Special Problems: bound volumes, conservation,\r | |
90 | reproducing printed halftones\r | |
91 | George Thoma\r | |
92 | Carl Fleischhauer\r | |
93 | Discussion\r | |
94 | C) Image Standards and Implications for Preservation\r | |
95 | Jean Baronas\r | |
96 | Patricia Battin\r | |
97 | Discussion\r | |
98 | D) Text Conversion: OCR vs. rekeying, standards of accuracy\r | |
99 | and use of imperfect texts, service bureaus\r | |
100 | Michael Lesk\r | |
101 | Ricky Erway\r | |
102 | Judith A. Zidar\r | |
103 | Discussion\r | |
104 | \r | |
105 | Session V. Approaches to Preparing Electronic Texts\r | |
106 | Susan Hockey (Moderator)\r | |
107 | Stuart Weibel\r | |
108 | Discussion\r | |
109 | C.M. Sperberg-McQueen\r | |
110 | Discussion\r | |
111 | Eric M. Calaluca\r | |
112 | Discussion\r | |
113 | \r | |
114 | Session VI. Copyright Issues\r | |
115 | Marybeth Peters\r | |
116 | \r | |
117 | Session VII. Conclusion\r | |
118 | Prosser Gifford (Moderator)\r | |
119 | General discussion\r | |
120 | \r | |
121 | Appendix I: Program\r | |
122 | \r | |
123 | Appendix II: Abstracts\r | |
124 | \r | |
125 | Appendix III: Directory of Participants\r | |
126 | \r | |
127 | \r | |
128 | *** *** *** ****** *** *** ***\r | |
129 | \r | |
130 | \r | |
131 | Acknowledgements\r | |
132 | \r | |
133 | I would like to thank Carl Fleischhauer and Prosser Gifford for the\r | |
134 | opportunity to learn about areas of human activity unknown to me a scant\r | |
135 | ten months ago, and the David and Lucile Packard Foundation for\r | |
136 | supporting that opportunity. The help given by others is acknowledged on\r | |
137 | a separate page.\r | |
138 | \r | |
139 | 19 October 1992\r | |
140 | \r | |
141 | \r | |
142 | *** *** *** ****** *** *** ***\r | |
143 | \r | |
144 | \r | |
145 | INTRODUCTION\r | |
146 | \r | |
147 | The Workshop on Electronic Texts (1) drew together representatives of\r | |
148 | various projects and interest groups to compare ideas, beliefs,\r | |
149 | experiences, and, in particular, methods of placing and presenting\r | |
150 | historical textual materials in computerized form. Most attendees gained\r | |
151 | much in insight and outlook from the event. But the assembly did not\r | |
152 | form a new nation, or, to put it another way, the diversity of projects\r | |
153 | and interests was too great to draw the representatives into a cohesive,\r | |
154 | action-oriented body.(2)\r | |
155 | \r | |
156 | Everyone attending the Workshop shared an interest in preserving and\r | |
157 | providing access to historical texts. But within this broad field the\r | |
158 | attendees represented a variety of formal, informal, figurative, and\r | |
159 | literal groups, with many individuals belonging to more than one. These\r | |
160 | groups may be defined roughly according to the following topics or\r | |
161 | activities:\r | |
162 | \r | |
163 | * Imaging\r | |
164 | * Searchable coded texts\r | |
165 | * National and international computer networks\r | |
166 | * CD-ROM production and dissemination\r | |
167 | * Methods and technology for converting older paper materials into\r | |
168 | electronic form\r | |
169 | * Study of the use of digital materials by scholars and others\r | |
170 | \r | |
171 | This summary is arranged thematically and does not follow the actual\r | |
172 | sequence of presentations.\r | |
173 | \r | |
174 | NOTES:\r | |
175 | (1) In this document, the phrase electronic text is used to mean\r | |
176 | any computerized reproduction or version of a document, book,\r | |
177 | article, or manuscript (including images), and not merely a machine-\r | |
178 | readable or machine-searchable text.\r | |
179 | \r | |
180 | (2) The Workshop was held at the Library of Congress on 9-10 June\r | |
181 | 1992, with funding from the David and Lucile Packard Foundation. \r | |
182 | The document that follows represents a summary of the presentations\r | |
183 | made at the Workshop and was compiled by James DALY. This\r | |
184 | introduction was written by DALY and Carl FLEISCHHAUER.\r | |
185 | \r | |
186 | \r | |
187 | PRESERVATION AND IMAGING\r | |
188 | \r | |
189 | Preservation, as that term is used by archivists,(3) was most explicitly\r | |
190 | discussed in the context of imaging. Anne KENNEY and Lynne PERSONIUS\r | |
191 | explained how the concept of a faithful copy and the user-friendliness of\r | |
192 | the traditional book have guided their project at Cornell University.(4) \r | |
193 | Although interested in computerized dissemination, participants in the\r | |
194 | Cornell project are creating digital image sets of older books in the\r | |
195 | public domain as a source for a fresh paper facsimile or, in a future\r | |
196 | phase, microfilm. The books returned to the library shelves are\r | |
197 | high-quality and useful replacements on acid-free paper that should last\r | |
198 | a long time. To date, the Cornell project has placed little or no\r | |
199 | emphasis on creating searchable texts; one would not be surprised to find\r | |
200 | that the project participants view such texts as new editions, and thus\r | |
201 | not as faithful reproductions. \r | |
202 | \r | |
203 | In her talk on preservation, Patricia BATTIN struck an ecumenical and\r | |
204 | flexible note as she endorsed the creation and dissemination of a variety\r | |
205 | of types of digital copies. Do not be too narrow in defining what counts\r | |
206 | as a preservation element, BATTIN counseled; for the present, at least,\r | |
207 | digital copies made with preservation in mind cannot be as narrowly\r | |
208 | standardized as, say, microfilm copies with the same objective. Setting\r | |
209 | standards precipitously can inhibit creativity, but delay can result in\r | |
210 | chaos, she advised.\r | |
211 | \r | |
212 | In part, BATTIN's position reflected the unsettled nature of image-format\r | |
213 | standards, and attendees could hear echoes of this unsettledness in the\r | |
214 | comments of various speakers. For example, Jean BARONAS reviewed the\r | |
215 | status of several formal standards moving through committees of experts;\r | |
216 | and Clifford LYNCH encouraged the use of a new guideline for transmitting\r | |
217 | document images on Internet. Testimony from participants in the National\r | |
218 | Agricultural Library's (NAL) Text Digitization Program and LC's American\r | |
219 | Memory project highlighted some of the challenges to the actual creation\r | |
220 | or interchange of images, including difficulties in converting\r | |
221 | preservation microfilm to digital form. Donald WATERS reported on the\r | |
222 | progress of a master plan for a project at Yale University to convert\r | |
223 | books on microfilm to digital image sets, Project Open Book (POB).\r | |
224 | \r | |
225 | The Workshop offered rather less of an imaging practicum than planned,\r | |
226 | but "how-to" hints emerge at various points, for example, throughout\r | |
227 | KENNEY's presentation and in the discussion of arcana such as\r | |
228 | thresholding and dithering offered by George THOMA and FLEISCHHAUER.\r | |
229 | \r | |
230 | NOTES:\r | |
231 | (3) Although there is a sense in which any reproductions of\r | |
232 | historical materials preserve the human record, specialists in the\r | |
233 | field have developed particular guidelines for the creation of\r | |
234 | acceptable preservation copies.\r | |
235 | \r | |
236 | (4) Titles and affiliations of presenters are given at the\r | |
237 | beginning of their respective talks and in the Directory of\r | |
238 | Participants (Appendix III).\r | |
239 | \r | |
240 | \r | |
241 | THE MACHINE-READABLE TEXT: MARKUP AND USE\r | |
242 | \r | |
243 | The sections of the Workshop that dealt with machine-readable text tended\r | |
244 | to be more concerned with access and use than with preservation, at least\r | |
245 | in the narrow technical sense. Michael SPERBERG-McQUEEN made a forceful\r | |
246 | presentation on the Text Encoding Initiative's (TEI) implementation of\r | |
247 | the Standard Generalized Markup Language (SGML). His ideas were echoed\r | |
248 | by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL. While the\r | |
249 | presentations made by the TEI advocates contained no practicum, their\r | |
250 | discussion focused on the value of the finished product, what the\r | |
251 | European Community calls reusability, but what may also be termed\r | |
252 | durability. They argued that marking up--that is, coding--a text in a\r | |
253 | well-conceived way will permit it to be moved from one computer\r | |
254 | environment to another, as well as to be used by various users. Two\r | |
255 | kinds of markup were distinguished: 1) procedural markup, which\r | |
256 | describes the features of a text (e.g., dots on a page), and 2)\r | |
257 | descriptive markup, which describes the structure or elements of a\r | |
258 | document (e.g., chapters, paragraphs, and front matter).\r | |
259 | \r | |
260 | The TEI proponents emphasized the importance of texts to scholarship. \r | |
261 | They explained how heavily coded (and thus analyzed and annotated) texts\r | |
262 | can underlie research, play a role in scholarly communication, and\r | |
263 | facilitate classroom teaching. SPERBERG-McQUEEN reminded listeners that\r | |
264 | a written or printed item (e.g., a particular edition of a book) is\r | |
265 | merely a representation of the abstraction we call a text. To concern\r | |
266 | ourselves with faithfully reproducing a printed instance of the text,\r | |
267 | SPERBERG-McQUEEN argued, is to concern ourselves with the representation\r | |
268 | of a representation ("images as simulacra for the text"). The TEI proponents'\r | |
269 | interest in images tends to focus on corollary materials for use in teaching,\r | |
270 | for example, photographs of the Acropolis to accompany a Greek text.\r | |
271 | \r | |
272 | By the end of the Workshop, SPERBERG-McQUEEN confessed to having been\r | |
273 | converted to a limited extent to the view that electronic images\r | |
274 | constitute a promising alternative to microfilming; indeed, an\r | |
275 | alternative probably superior to microfilming. But he was not convinced\r | |
276 | that electronic images constitute a serious attempt to represent text in\r | |
277 | electronic form. HOCKEY and MYLONAS also conceded that their experience\r | |
278 | at the Pierce Symposium the previous week at Georgetown University and\r | |
279 | the present conference at the Library of Congress had compelled them to\r | |
280 | reevaluate their perspective on the usefulness of text as images. \r | |
281 | Attendees could see that the text and image advocates were in\r | |
282 | constructive tension, so to say.\r | |
283 | \r | |
284 | Three non-TEI presentations described approaches to preparing\r | |
285 | machine-readable text that are less rigorous and thus less expensive. In\r | |
286 | the case of the Papers of George Washington, Dorothy TWOHIG explained\r | |
287 | that the digital version will provide a not-quite-perfect rendering of\r | |
288 | the transcribed text--some 135,000 documents, available for research\r | |
289 | during the decades while the perfect or print version is completed. \r | |
290 | Members of the American Memory team and the staff of NAL's Text\r | |
291 | Digitization Program (see below) also outlined a middle ground concerning\r | |
292 | searchable texts. In the case of American Memory, contractors produce\r | |
293 | texts with about 99-percent accuracy that serve as "browse" or\r | |
294 | "reference" versions of written or printed originals. End users who need\r | |
295 | faithful copies or perfect renditions must refer to accompanying sets of\r | |
296 | digital facsimile images or consult copies of the originals in a nearby\r | |
297 | library or archive. American Memory staff argued that the high cost of\r | |
298 | producing 100-percent accurate copies would prevent LC from offering\r | |
299 | access to large parts of its collections.\r | |
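
To give a rough sense of what such accuracy figures mean on the page, the sketch below estimates the expected number of wrong characters per page. It assumes that accuracy is measured per character and that a page holds about 2,000 characters. Both are illustrative assumptions, not figures reported at the Workshop, and the function name is hypothetical.

```python
def expected_errors_per_page(accuracy: float, chars_per_page: int = 2000) -> float:
    """Expected number of wrong characters on one page.

    accuracy is the fraction of characters transcribed correctly (0.0 to 1.0).
    chars_per_page is a hypothetical page size chosen for illustration.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * chars_per_page


if __name__ == "__main__":
    # Compare a 99-percent "browse" text with stricter targets.
    for accuracy in (0.99, 0.999, 1.0):
        errors = expected_errors_per_page(accuracy)
        print(f"{accuracy:.1%} accurate: about {errors:.0f} errors per page")
```

Under these assumptions a 99-percent text carries roughly 20 wrong characters per page, which suggests why such versions were offered for browsing and reference rather than as faithful copies.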
300 | \r | |
301 | \r | |
302 | THE MACHINE-READABLE TEXT: METHODS OF CONVERSION\r | |
303 | \r | |
304 | Although the Workshop did not include a systematic examination of the\r | |
305 | methods for converting texts from paper (or from facsimile images) into\r | |
306 | machine-readable form, nevertheless, various speakers touched upon this\r | |
307 | matter. For example, WEIBEL reported that OCLC has experimented with a\r | |
308 | merging of multiple optical character recognition systems that will\r | |
309 | reduce errors from an unacceptable rate of 5 characters out of every\r | |
310 | 1,000 to an unacceptable rate of 2 characters out of every 1,000.\r | |
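
One simple way to merge the output of several OCR systems is a per-character majority vote. The sketch below is illustrative only: the summary does not describe OCLC's actual method, and the code assumes the readings have already been aligned to the same length, which real OCR output generally is not. The function names and sample readings are hypothetical.

```python
from collections import Counter


def majority_vote(readings: list[str]) -> str:
    """Merge equal-length OCR readings by per-character majority vote.

    Ties are broken in favor of the earliest reading, so the systems
    should be listed from most to least trusted.
    """
    if not readings:
        raise ValueError("need at least one reading")
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("readings must be aligned to the same length")

    merged = []
    for chars in zip(*readings):
        counts = Counter(chars)
        best = max(counts.values())
        # Take the first character (in reading order) that reaches the top count.
        merged.append(next(c for c in chars if counts[c] == best))
    return "".join(merged)


def errors_per_thousand(text: str, truth: str) -> float:
    """Character errors per 1,000 characters against a known-correct text."""
    if len(text) != len(truth):
        raise ValueError("texts must have the same length")
    if not truth:
        raise ValueError("truth must not be empty")
    wrong = sum(a != b for a, b in zip(text, truth))
    return 1000 * wrong / len(truth)


if __name__ == "__main__":
    truth = "Library of Congress"
    readings = [
        "Libraru of Congress",  # hypothetical system A
        "Library of Congrcss",  # hypothetical system B
        "Library 0f Congress",  # hypothetical system C
    ]
    merged = majority_vote(readings)
    print(merged, errors_per_thousand(merged, truth))
```

Voting helps only when the systems make different mistakes. Where two of three systems misread the same character, the error survives the merge, which may be one reason a merged result can still fall short of an acceptable rate.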
311 | \r | |
312 | Pamela ANDRE presented an overview of NAL's Text Digitization Program and\r | |
313 | Judith ZIDAR discussed the technical details. ZIDAR explained how NAL\r | |
314 | purchased hardware and software capable of performing optical character\r | |
315 | recognition (OCR) and text conversion and used its own staff to convert\r | |
316 | texts. The process, ZIDAR said, required extensive editing and project\r | |
317 | staff found themselves considering alternatives, including rekeying\r | |
318 | and/or creating abstracts or summaries of texts. NAL reckoned costs at\r | |
319 | $7 per page. By way of contrast, Ricky ERWAY explained that American\r | |
320 | Memory had decided from the start to contract out conversion to external\r | |
321 | service bureaus. The criteria used to select these contractors were cost\r | |
322 | and quality of results, as opposed to methods of conversion. ERWAY noted\r | |
323 | that historical documents or books often do not lend themselves to OCR. \r | |
324 | Bound materials represent a special problem. In her experience, quality\r | |
325 | control--inspecting incoming materials, counting errors in samples--posed\r | |
326 | the most time-consuming aspect of contracting out conversion. ERWAY\r | |
327 | reckoned American Memory's costs at $4 per page, but cautioned that fewer\r | |
328 | cost-elements had been included than in NAL's figure.\r | |
329 | \r | |
330 | \r | |
331 | OPTIONS FOR DISSEMINATION\r | |
332 | \r | |
333 | The topic of dissemination proper emerged at various points during the\r | |
334 | Workshop. At the session devoted to national and international computer\r | |
335 | networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG\r | |
336 | highlighted the virtues of Internet today and of the network that will\r | |
337 | evolve from Internet. Listeners could discern in these narratives a\r | |
338 | vision of an information democracy in which millions of citizens freely\r | |
339 | find and use what they need. LYNCH noted that a lack of standards\r | |
340 | inhibits disseminating multimedia on the network, a topic also discussed\r | |
341 | by BESSER. LARSEN addressed the issues of network scalability and\r | |
342 | modularity and commented upon the difficulty of anticipating the effects\r | |
343 | of growth in orders of magnitude. BROWNRIGG talked about the ability of\r | |
344 | packet radio to provide certain links in a network without the need for\r | |
345 | wiring. However, the presenters also called attention to the\r | |
346 | shortcomings and incongruities of present-day computer networks. For\r | |
347 | example: 1) Network use is growing dramatically, but much network\r | |
348 | traffic consists of personal communication (E-mail). 2) Large bodies of\r | |
349 | information are available, but a user's ability to search across their\r | |
350 | entirety is limited. 3) There are significant resources for science and\r | |
351 | technology, but few network sources provide content in the humanities. \r | |
352 | 4) Machine-readable texts are commonplace, but the capability of the\r | |
353 | system to deal with images (let alone other media formats) lags behind. \r | |
354 | A glimpse of a multimedia future for networks, however, was provided by\r | |
355 | Maria LEBRON in her overview of the Online Journal of Current Clinical\r | |
356 | Trials (OJCCT), and the process of scholarly publishing on-line. \r | |
357 | \r | |
358 | The contrasting form of the CD-ROM disk was never systematically\r | |
359 | analyzed, but attendees could glean an impression from several of the\r | |
360 | show-and-tell presentations. The Perseus and American Memory examples\r | |
361 | demonstrated recently published disks, while the descriptions of the\r | |
362 | IBYCUS version of the Papers of George Washington and Chadwyck-Healey's\r | |
363 | Patrologia Latina Database (PLD) told of disks to come. According to\r | |
364 | Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul\r | |
365 | Migne's definitive collection of Latin texts to machine-readable form. \r | |
366 | Although everyone could share the network advocates' enthusiasm for an\r | |
367 | on-line future, the possibility of rolling up one's sleeves for a session\r | |
368 | with a CD-ROM containing both textual materials and a powerful retrieval\r | |
369 | engine made the disk seem an appealing vessel indeed. The overall\r | |
370 | discussion suggested that the transition from CD-ROM to on-line networked\r | |
371 | access may prove far slower and more difficult than has been anticipated.\r | |
372 | \r | |
373 | \r | |
374 | WHO ARE THE USERS AND WHAT DO THEY DO?\r | |
375 | \r | |
376 | Although concerned with the technicalities of production, the Workshop\r | |
377 | never lost sight of the purposes and uses of electronic versions of\r | |
378 | textual materials. As noted above, those interested in imaging discussed\r | |
379 | the problematical matter of digital preservation, while the TEI proponents\r | |
380 | described how machine-readable texts can be used in research. This latter\r | |
381 | topic received thorough treatment in the paper read by Avra MICHELSON.\r | |
382 | She placed the phenomenon of electronic texts within the context of\r | |
383 | broader trends in information technology and scholarly communication.\r | |
384 | \r | |
385 | Among other things, MICHELSON described on-line conferences that\r | |
386 | represent a vigorous and important intellectual forum for certain\r | |
387 | disciplines. Internet now carries more than 700 conferences, with about\r | |
388 | 80 percent of these devoted to topics in the social sciences and the\r | |
389 | humanities. Other scholars use on-line networks for "distance learning." \r | |
390 | Meanwhile, there has been a tremendous growth in end-user computing;\r | |
391 | professors today are less likely than their predecessors to ask the\r | |
392 | campus computer center to process their data. Electronic texts are one\r | |
393 | key to these sophisticated applications, MICHELSON reported, and more and\r | |
394 | more scholars in the humanities now work in an on-line environment. \r | |
395 | Toward the end of the Workshop, Michael LESK presented a corollary to\r | |
396 | MICHELSON's talk, reporting the results of an experiment that compared\r | |
397 | the work of one group of chemistry students using traditional printed\r | |
398 | texts and two groups using electronic sources. The experiment\r | |
399 | demonstrated that in the event one does not know what to read, one needs\r | |
400 | the electronic systems; the electronic systems hold no advantage at the\r | |
401 | moment if one knows what to read, but neither do they impose a penalty.\r | |
402 | \r | |
403 | DALY provided an anecdotal account of the revolutionizing impact of the\r | |
404 | new technology on his previous methods of research in the field of classics.\r | |
405 | His account, by extrapolation, served to illustrate in part the arguments\r | |
406 | made by MICHELSON concerning the positive effects of the sudden and radical\r | |
407 | transformation being wrought in the ways scholars work.\r | |
408 | \r | |
409 | Susan VECCIA and Joanne FREEMAN delineated the use of electronic\r | |
410 | materials outside the university. The most interesting aspect of their\r | |
411 | use, FREEMAN said, could be seen as a paradox: teachers in elementary\r | |
412 | and secondary schools requested access to primary source materials but,\r | |
413 | at the same time, found that "primariness" itself made these materials\r | |
414 | difficult for their students to use.\r | |
415 | \r | |
416 | \r | |
417 | OTHER TOPICS\r | |
418 | \r | |
419 | Marybeth PETERS reviewed copyright law in the United States and offered\r | |
420 | advice during a lively discussion of this subject. But uncertainty\r | |
421 | remains concerning the price of copyright in a digital medium, because a\r | |
422 | solution remains to be worked out concerning management and synthesis of\r | |
423 | copyrighted and out-of-copyright pieces of a database.\r | |
424 | \r | |
425 | As moderator of the final session of the Workshop, Prosser GIFFORD directed\r | |
426 | discussion to future courses of action and the potential role of LC in\r | |
427 | advancing them. Among the recommendations that emerged were the following:\r | |
428 | \r | |
429 | * Workshop participants should 1) begin to think about working\r | |
430 | with image material, but structure and digitize it in such a\r | |
431 | way that at a later stage it can be interpreted into text, and\r | |
432 | 2) find a common way to build text and images together so that\r | |
433 | they can be used jointly at some stage in the future, with\r | |
434 | appropriate network support, because that is how users will want\r | |
435 | to access these materials. The Library might encourage attempts\r | |
436 | to bring together people who are working on texts and images.\r | |
437 | \r | |
438 | * A network version of American Memory should be developed or\r | |
439 | consideration should be given to making the data in it\r | |
440 | available to people interested in doing network multimedia. \r | |
441 | Given the current dearth of digital data that is appealing and\r | |
442 | unencumbered by extremely complex rights problems, developing a\r | |
443 | network version of American Memory could do much to help make\r | |
444 | network multimedia a reality.\r | |
445 | \r | |
446 | * Concerning the thorny issue of electronic deposit, LC should\r | |
447 | initiate a catalytic process in terms of distributed\r | |
448 | responsibility, that is, bring together the distributed\r | |
449 | organizations and set up a study group to look at all the\r | |
450 | issues related to electronic deposit and see where we as a\r | |
451 | nation should move. For example, LC might attempt to persuade\r | |
452 | one major library in each state to deal with its state\r | |
453 | equivalent publisher, which might produce a cooperative project\r | |
454 | that would be equitably distributed around the country, and one\r | |
455 | in which LC would be dealing with a minimal number of publishers\r | |
456 | and minimal copyright problems. LC must also deal with the\r | |
457 | concept of on-line publishing, determining, among other things,\r | |
458 | how serials such as OJCCT might be deposited for copyright.\r | |
459 | \r | |
460 | * Since a number of projects are planning to carry out\r | |
461 | preservation by creating digital images that will end up in\r | |
462 | on-line or near-line storage at some institution, LC might play\r | |
463 | a helpful role, at least in the near term, by accelerating how\r | |
464 | to catalog that information into the Research Library Information\r | |
465 | Network (RLIN) and then into OCLC, so that it would be accessible.\r | |
466 | This would reduce the possibility of multiple institutions digitizing\r | |
467 | the same work. \r | |
468 | \r | |
469 | \r | |
470 | CONCLUSION\r | |
471 | \r | |
472 | The Workshop was valuable because it brought together partisans from\r | |
473 | various groups and provided an occasion to compare goals and methods. \r | |
474 | The more committed partisans frequently communicate with others in their\r | |
475 | groups, but less often across group boundaries. The Workshop was also\r | |
476 | valuable to attendees--including those involved with American Memory--who\r | |
477 | came less committed to particular approaches or concepts. These\r | |
478 | attendees learned a great deal, and plan to select and employ elements of\r | |
479 | imaging, text-coding, and networked distribution that suit their\r | |
480 | respective projects and purposes.\r | |
481 | \r | |
482 | Still, reality rears its ugly head: no breakthrough has been achieved. \r | |
483 | On the imaging side, one confronts a proliferation of competing\r | |
484 | data-interchange standards and a lack of consensus on the role of digital\r | |
485 | facsimiles in preservation. In the realm of machine-readable texts, one\r | |
486 | encounters a reasonably mature standard but methodological difficulties\r | |
487 | and high costs. These latter problems, of course, represent a special\r | |
488 | impediment to the desire, as it is sometimes expressed in the popular\r | |
489 | press, "to put the [contents of the] Library of Congress on line." In\r | |
490 | the words of one participant, there was "no solution to the economic\r | |
491 | problems--the projects that are out there are surviving, but it is going\r | |
492 | to be a lot of work to transform the information industry, and so far the\r | |
493 | investment to do that is not forthcoming" (LESK, per litteras).\r | |
494 | \r | |
495 | \r | |
496 | *** *** *** ****** *** *** ***\r | |
497 | \r | |
498 | \r | |
499 | PROCEEDINGS\r | |
500 | \r | |
501 | \r | |
502 | WELCOME\r | |
503 | \r | |
504 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
505 | GIFFORD * Origin of Workshop in current Librarian's desire to make LC's\r | |
506 | collections more widely available * Desiderata arising from the prospect\r | |
507 | of greater interconnectedness *\r | |
508 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
509 | \r | |
510 | After welcoming participants on behalf of the Library of Congress,\r | |
511 | American Memory (AM), and the National Demonstration Lab, Prosser\r | |
512 | GIFFORD, director for scholarly programs, Library of Congress, located\r | |
513 | the origin of the Workshop on Electronic Texts in a conversation he had\r | |
514 | had considerably more than a year ago with Carl FLEISCHHAUER concerning\r | |
515 | some of the issues faced by AM. On the assumption that numerous other\r | |
516 | people were asking the same questions, the decision was made to bring\r | |
517 | together as many of these people as possible to ask the same questions\r | |
518 | together. In a deeper sense, GIFFORD said, the origin of the Workshop\r | |
519 | lay in the desire of the current Librarian of Congress, James H. \r | |
520 | Billington, to make the collections of the Library, especially those\r | |
521 | offering unique or unusual testimony on aspects of the American\r | |
522 | experience, available to a much wider circle of users than those few\r | |
523 | people who can come to Washington to use them. This meant that the\r | |
524 | emphasis of AM, from the outset, has been on archival collections of the\r | |
525 | basic material, and on making these collections themselves available,\r | |
526 | rather than selected or heavily edited products.\r | |
527 | \r | |
528 | From AM's emphasis followed the questions with which the Workshop began: \r | |
529 | who will use these materials, and in what form will they wish to use\r | |
530 | them. But an even larger issue deserving mention, in GIFFORD's view, was\r | |
531 | the phenomenal growth in Internet connectivity. He expressed the hope\r | |
532 | that the prospect of greater interconnectedness than ever before would\r | |
533 | lead to: 1) much more cooperative and mutually supportive endeavors; 2)\r | |
534 | development of systems of shared and distributed responsibilities to\r | |
535 | avoid duplication and to ensure accuracy and preservation of unique\r | |
536 | materials; and 3) agreement on the necessary standards and development of\r | |
537 | the appropriate directories and indices to make navigation\r | |
538 | straightforward among the varied resources that are, and increasingly\r | |
539 | will be, available. In this connection, GIFFORD requested that\r | |
540 | participants reflect from the outset upon the sorts of outcomes they\r | |
541 | thought the Workshop might have. Did those present constitute a group\r | |
542 | with sufficient common interests to propose a next step or next steps,\r | |
543 | and if so, what might those be? They would return to these questions the\r | |
544 | following afternoon.\r | |
545 | \r | |
546 | ******\r | |
547 | \r | |
548 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
549 | FLEISCHHAUER * Core of Workshop concerns preparation and production of\r | |
550 | materials * Special challenge in conversion of textual materials *\r | |
551 | Quality versus quantity * Do the several groups represented share common\r | |
552 | interests? *\r | |
553 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
554 | \r | |
555 | Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,\r | |
556 | emphasized that he would attempt to represent the people who perform some\r | |
557 | of the work of converting or preparing materials and that the core of\r | |
558 | the Workshop had to do with preparation and production. FLEISCHHAUER\r | |
559 | then drew a distinction between the long term, when many things would be\r | |
560 | available and connected in the ways that GIFFORD described, and the short\r | |
561 | term, in which AM not only has wrestled with the issue of what is the\r | |
562 | best course to pursue but also has faced a variety of technical\r | |
563 | challenges.\r | |
564 | \r | |
565 | FLEISCHHAUER remarked AM's endeavors to deal with a wide range of library\r | |
566 | formats, such as motion picture collections, sound-recording collections,\r | |
567 | and pictorial collections of various sorts, especially collections of\r | |
568 | photographs. In the course of these efforts, AM kept coming back to\r | |
569 | textual materials--manuscripts or rare printed matter, bound materials,\r | |
570 | etc. Text posed the greatest conversion challenge of all. Thus, the\r | |
571 | genesis of the Workshop, which reflects the problems faced by AM. These\r | |
572 | problems include physical problems. For example, those in the library\r | |
573 | and archive business deal with collections made up of fragile and rare\r | |
574 | manuscript items, bound materials, especially the notoriously brittle\r | |
575 | bound materials of the late nineteenth century. These are precious\r | |
576 | cultural artifacts, however, as well as interesting sources of\r | |
577 | information, and LC desires to retain and conserve them. AM needs to\r | |
578 | handle things without damaging them. Guillotining a book to run it\r | |
579 | through a sheet feeder must be avoided at all costs.\r | |
580 | \r | |
581 | Beyond physical problems, issues pertaining to quality arose. For\r | |
582 | example, the desire to provide users with a searchable text is affected\r | |
583 | by the question of an acceptable level of accuracy. One hundred percent\r | |
584 | accuracy is tremendously expensive. On the other hand, the output of\r | |
585 | optical character recognition (OCR) can be tremendously inaccurate. \r | |
586 | Although AM has attempted to find a middle ground, uncertainty persists\r | |
587 | as to whether or not it has discovered the right solution.\r | |
588 | \r | |
589 | Questions of quality arose concerning images as well. FLEISCHHAUER\r | |
590 | contrasted the extremely high level of quality of the digital images in\r | |
591 | the Cornell Xerox Project with AM's efforts to provide a browse-quality\r | |
592 | or access-quality image, as opposed to an archival or preservation image. \r | |
593 | FLEISCHHAUER therefore welcomed the opportunity to compare notes.\r | |
594 | \r | |
595 | FLEISCHHAUER observed in passing that conversations he had had about\r | |
596 | networks have begun to signal that for various forms of media a\r | |
597 | determination may be made that there is a browse-quality item, or a\r | |
598 | distribution-and-access-quality item that may coexist in some systems\r | |
599 | with a higher quality archival item that would be inconvenient to send\r | |
600 | through the network because of its size. FLEISCHHAUER referred, of\r | |
601 | course, to images more than to searchable text.\r | |
602 | \r | |
603 | As AM considered those questions, several conceptual issues arose: ought\r | |
604 | AM occasionally to reproduce materials entirely through an image set, at\r | |
605 | other times, entirely through a text set, and in some cases, a mix? \r | |
606 | There probably would be times when the historical authenticity of an\r | |
607 | artifact would require that its image be used. An image might be\r | |
608 | desirable as a recourse for users if one could not provide 100-percent\r | |
609 | accurate text. Again, AM wondered, as a practical matter, if a\r | |
610 | distinction could be drawn between unique items and rare printed matter\r | |
611 | held in multiple collections--that is, in ten or fifteen libraries. In such\r | |
612 | cases, the need for perfect reproduction would be less than for unique\r | |
613 | items. Implicit in his remarks, FLEISCHHAUER conceded, was the admission\r | |
614 | that AM has been tilting strongly towards quantity and drawing back a\r | |
615 | little from perfect quality. That is, it seemed to AM that society would\r | |
616 | be better served if more things were distributed by LC--even if they were\r | |
617 | not quite perfect--than if fewer things, perfectly represented, were\r | |
618 | distributed. This was stated as a proposition to be tested, with\r | |
619 | responses to be gathered from users.\r | |
620 | \r | |
621 | In thinking about issues related to reproduction of materials and seeing\r | |
622 | other people engaged in parallel activities, AM deemed it useful to\r | |
623 | convene a conference. Hence, the Workshop. FLEISCHHAUER thereupon\r | |
624 | surveyed the several groups represented: 1) the world of images (image\r | |
625 | users and image makers); 2) the world of text and scholarship and, within\r | |
626 | this group, those concerned with language--FLEISCHHAUER confessed to finding\r | |
627 | delightful irony in the fact that some of the most advanced thinkers on\r | |
628 | computerized texts are those dealing with ancient Greek and Roman materials;\r | |
629 | 3) the network world; and 4) the general world of library science, which\r | |
630 | includes people interested in preservation and cataloging.\r | |
631 | \r | |
632 | FLEISCHHAUER concluded his remarks with special thanks to the David and\r | |
633 | Lucile Packard Foundation for its support of the meeting, the American\r | |
634 | Memory group, the Office for Scholarly Programs, the National\r | |
635 | Demonstration Lab, and the Office of Special Events. He expressed the\r | |
636 | hope that David Woodley Packard might be able to attend, noting that\r | |
637 | Packard's work and the work of the foundation had sponsored a number of\r | |
638 | projects in the text area.\r | |
639 | \r | |
640 | ******\r | |
641 | \r | |
642 | SESSION I. CONTENT IN A NEW FORM: WHO WILL USE IT AND WHAT WILL THEY DO?\r | |
643 | \r | |
644 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
645 | DALY * Acknowledgements * A new Latin authors disk * Effects of the new\r | |
646 | technology on previous methods of research * \r | |
647 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
648 | \r | |
649 | Serving as moderator, James DALY acknowledged the generosity of all the\r | |
650 | presenters for giving of their time, counsel, and patience in planning\r | |
651 | the Workshop, as well as of members of the American Memory project and\r | |
652 | other Library of Congress staff, and the David and Lucile Packard\r | |
653 | Foundation and its executive director, Colburn S. Wilbur.\r | |
654 | \r | |
655 | DALY then recounted his visit in March to the Center for Electronic Texts\r | |
656 | in the Humanities (CETH) and the Department of Classics at Rutgers\r | |
657 | University, where an old friend, Lowell Edmunds, introduced him to the\r | |
658 | department's IBYCUS scholarly personal computer, and, in particular, the\r | |
659 | new Latin CD-ROM, containing, among other things, almost all classical\r | |
660 | Latin literary texts through A.D. 200. Packard Humanities Institute\r | |
661 | (PHI), Los Altos, California, released this disk late in 1991, with a\r | |
662 | nominal triennial licensing fee.\r | |
663 | \r | |
664 | Playing with the disk for an hour or so at Rutgers brought home to DALY\r | |
665 | at once the revolutionizing impact of the new technology on his previous\r | |
666 | methods of research. Had this disk been available two or three years\r | |
667 | earlier, DALY contended, when he was engaged in preparing a commentary on\r | |
668 | Book 10 of Virgil's Aeneid for Cambridge University Press, he would not\r | |
669 | have required a forty-eight-square-foot table on which to spread the\r | |
670 | numerous, most frequently consulted items, including some ten or twelve\r | |
671 | concordances to key Latin authors, an almost equal number of lexica to\r | |
672 | authors who lacked concordances, and where either lexica or concordances\r | |
673 | were lacking, numerous editions of authors antedating and postdating Virgil.\r | |
674 | \r | |
675 | Nor, when checking each of the average six to seven words contained in\r | |
676 | the Virgilian hexameter for its usage elsewhere in Virgil's works or\r | |
677 | other Latin authors, would DALY have had to maintain the laborious\r | |
678 | mechanical process of flipping through these concordances, lexica, and\r | |
679 | editions each time. Nor would he have had to frequent as often the\r | |
680 | Milton S. Eisenhower Library at the Johns Hopkins University to consult\r | |
681 | the Thesaurus Linguae Latinae. Instead of devoting countless hours, or\r | |
682 | the bulk of his research time, to gathering data concerning Virgil's use\r | |
683 | of words, DALY--now freed by PHI's Latin authors disk from the\r | |
684 | tyrannical, yet in some ways paradoxically happy scholarly drudgery--\r | |
685 | would have been able to devote that same bulk of time to analyzing and\r | |
686 | interpreting Virgilian verbal usage.\r | |
687 | \r | |
688 | Citing Theodore Brunner, Gregory Crane, Elli MYLONAS, and Avra MICHELSON,\r | |
689 | DALY argued that this reversal in his style of work, made possible by the\r | |
690 | new technology, would perhaps have resulted in better, more productive\r | |
691 | research. Indeed, even in the course of his browsing the Latin authors\r | |
692 | disk at Rutgers, its powerful search, retrieval, and highlighting\r | |
693 | capabilities suggested to him several new avenues of research into\r | |
694 | Virgil's use of sound effects. This anecdotal account, DALY maintained,\r | |
695 | may serve to illustrate in part the sudden and radical transformation\r | |
696 | being wrought in the ways scholars work.\r | |
697 | \r | |
698 | ******\r | |
699 | \r | |
700 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
701 | MICHELSON * Elements related to scholarship and technology * Electronic\r | |
702 | texts within the context of broader trends within information technology\r | |
703 | and scholarly communication * Evaluation of the prospects for the use of\r | |
704 | electronic texts * Relationship of electronic texts to processes of\r | |
705 | scholarly communication in humanities research * New exchange formats\r | |
706 | created by scholars * Projects initiated to increase scholarly access to\r | |
707 | converted text * Trend toward making electronic resources available\r | |
708 | through research and education networks * Changes taking place in\r | |
709 | scholarly communication among humanities scholars * Network-mediated\r | |
710 | scholarship transforming traditional scholarly practices * Key\r | |
711 | information technology trends affecting the conduct of scholarly\r | |
712 | communication over the next decade * The trend toward end-user computing\r | |
713 | * The trend toward greater connectivity * Effects of these trends * Key\r | |
714 | transformations taking place * Summary of principal arguments *\r | |
715 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
716 | \r | |
717 | Avra MICHELSON, Archival Research and Evaluation Staff, National Archives\r | |
718 | and Records Administration (NARA), argued that establishing who will use\r | |
719 | electronic texts and what they will use them for involves a consideration\r | |
720 | of both information technology and scholarship trends. This\r | |
721 | consideration includes several elements related to scholarship and\r | |
722 | technology: 1) the key trends in information technology that are most\r | |
723 | relevant to scholarship; 2) the key trends in the use of currently\r | |
724 | available technology by scholars in the nonscientific community; and 3)\r | |
725 | the relationship between these two very distinct but interrelated trends. \r | |
726 | The investment in understanding this relationship being made by\r | |
727 | information providers, technologists, and public policy developers, as\r | |
728 | well as by scholars themselves, seems to be pervasive and growing,\r | |
729 | MICHELSON contended. She drew on collaborative work with Jeff Rothenberg\r | |
730 | on the scholarly use of technology.\r | |
731 | \r | |
732 | MICHELSON sought to place the phenomenon of electronic texts within the\r | |
733 | context of broader trends within information technology and scholarly\r | |
734 | communication. She argued that electronic texts are of most use to\r | |
735 | researchers to the extent that the researchers' working context (i.e.,\r | |
736 | their relevant bibliographic sources, collegial feedback, analytic tools,\r | |
737 | notes, drafts, etc.), along with their field's primary and secondary\r | |
738 | sources, also is accessible in electronic form and can be integrated in\r | |
739 | ways that are unique to the on-line environment.\r | |
740 | \r | |
741 | Evaluation of the prospects for the use of electronic texts includes two\r | |
742 | elements: 1) an examination of the ways in which researchers currently\r | |
743 | are using electronic texts along with other electronic resources, and 2)\r | |
744 | an analysis of key information technology trends that are affecting the\r | |
745 | long-term conduct of scholarly communication. MICHELSON limited her\r | |
746 | discussion of the use of electronic texts to the practices of humanists\r | |
747 | and noted that the scientific community was outside the panel's overview.\r | |
748 | \r | |
749 | MICHELSON examined the nature of the current relationship of electronic\r | |
750 | texts in particular, and electronic resources in general, to what she\r | |
751 | maintained were, essentially, five processes of scholarly communication\r | |
752 | in humanities research. Researchers 1) identify sources, 2) communicate\r | |
753 | with their colleagues, 3) interpret and analyze data, 4) disseminate\r | |
754 | their research findings, and 5) prepare curricula to instruct the next\r | |
755 | generation of scholars and students. This examination would produce a\r | |
756 | clearer understanding of the synergy among these five processes that\r | |
757 | fuels the tendency of the use of electronic resources for one process to\r | |
758 | stimulate their use for other processes of scholarly communication.\r | |
759 | \r | |
760 | For the first process of scholarly communication, the identification of\r | |
761 | sources, MICHELSON remarked the opportunity scholars now enjoy to\r | |
762 | supplement traditional word-of-mouth searches for sources among their\r | |
763 | colleagues with new forms of electronic searching. So, for example,\r | |
764 | instead of having to visit the library, researchers are able to explore\r | |
765 | descriptions of holdings in their offices. Furthermore, if their own\r | |
766 | institutions' holdings prove insufficient, scholars can access more than\r | |
767 | 200 major American library catalogues over Internet, including the\r | |
768 | universities of California, Michigan, Pennsylvania, and Wisconsin. \r | |
769 | Direct access to the bibliographic databases offers intellectual\r | |
770 | empowerment to scholars by presenting a comprehensive means of browsing\r | |
771 | through libraries from their homes and offices at their convenience.\r | |
772 | \r | |
773 | The second process of communication involves communication among\r | |
774 | scholars. Beyond the most common methods of communication, scholars are\r | |
775 | using E-mail and a variety of new electronic communications formats\r | |
776 | derived from it for further academic interchange. E-mail exchanges are\r | |
777 | growing at an astonishing rate, reportedly 15 percent a month. They\r | |
778 | currently constitute approximately half the traffic on research and\r | |
779 | education networks. Moreover, the global spread of E-mail has been so\r | |
780 | rapid that it is now possible for American scholars to use it to\r | |
781 | communicate with colleagues in close to 140 other countries.\r | |
782 | \r | |
783 | Other new exchange formats created by scholars and operating on Internet\r | |
784 | include more than 700 conferences, with about 80 percent of these devoted\r | |
785 | to topics in the social sciences and humanities. The rate of growth of\r | |
786 | these scholarly electronic conferences also is astonishing. From 1990 to\r | |
787 | 1991, 200 new conferences were identified on Internet. From October 1991\r | |
788 | to June 1992, an additional 150 conferences in the social sciences and\r | |
789 | humanities were added to this directory of listings. Scholars have\r | |
790 | established conferences in virtually every field, within every different\r | |
791 | discipline. For example, there are currently close to 600 active social\r | |
792 | science and humanities conferences on topics such as art and\r | |
793 | architecture, ethnomusicology, folklore, Japanese culture, medical\r | |
794 | education, and gifted and talented education. The appeal to scholars of\r | |
795 | communicating through these conferences is that, unlike any other medium,\r | |
796 | electronic conferences today provide a forum for global communication\r | |
797 | with peers at the front end of the research process.\r | |
798 | \r | |
799 | Interpretation and analysis of sources constitutes the third process of\r | |
800 | scholarly communication that MICHELSON discussed in terms of texts and\r | |
801 | textual resources. The methods used to analyze sources fall somewhere on\r | |
802 | a continuum from quantitative analysis to qualitative analysis. \r | |
803 | Typically, evidence is culled and evaluated using methods drawn from both\r | |
804 | ends of this continuum. At one end, quantitative analysis involves the\r | |
805 | use of mathematical processes such as a count of frequencies and\r | |
806 | distributions of occurrences or, on a higher level, regression analysis. \r | |
807 | At the other end of the continuum, qualitative analysis typically\r | |
808 | involves nonmathematical processes oriented toward language\r | |
809 | interpretation or the building of theory. Aspects of this work involve\r | |
810 | the processing--either manual or computational--of large and sometimes\r | |
811 | massive amounts of textual sources, although the use of nontextual\r | |
812 | sources as evidence, such as photographs, sound recordings, film footage,\r | |
813 | and artifacts, is significant as well.\r | |
814 | \r | |
815 | Scholars have discovered that many of the methods of interpretation and\r | |
816 | analysis that are related to both quantitative and qualitative methods\r | |
817 | are processes that can be performed by computers. For example, computers\r | |
818 | can count. They can count brush strokes used in a Rembrandt painting or\r | |
819 | perform regression analysis for understanding cause and effect. By means\r | |
820 | of advanced technologies, computers can recognize patterns, analyze text,\r | |
821 | and model concepts. Furthermore, computers can complete these processes\r | |
822 | faster with more sources and with greater precision than scholars who\r | |
823 | must rely on manual interpretation of data. But if scholars are to use\r | |
824 | computers for these processes, source materials must be in a form\r | |
825 | amenable to computer-assisted analysis. For this reason many scholars,\r | |
826 | once they have identified the sources that are key to their research, are\r | |
827 | converting them to machine-readable form. Thus, a representative example\r | |
828 | of the numerous textual conversion projects organized by scholars around\r | |
829 | the world in recent years to support computational text analysis is the\r | |
830 | TLG, the Thesaurus Linguae Graecae. This project is devoted to\r | |
831 | converting the extant ancient texts of classical Greece. (Editor's note: \r | |
832 | according to the TLG Newsletter of May 1992, TLG was in use in thirty-two\r | |
833 | different countries. This figure updates MICHELSON's previous count by one.)\r | |
834 | \r | |
835 | The scholars performing these conversions have been asked to recognize\r | |
836 | that the electronic sources they are converting for one use possess value\r | |
837 | for other research purposes as well. As a result, during the past few\r | |
838 | years, humanities scholars have initiated a number of projects to\r | |
839 | increase scholarly access to converted text. So, for example, the Text\r | |
840 | Encoding Initiative (TEI), about which more is said later in the program,\r | |
841 | was established as an effort by scholars to determine standard elements\r | |
842 | and methods for encoding machine-readable text for electronic exchange. \r | |
843 | In a second effort to facilitate the sharing of converted text, scholars\r | |
844 | have created a new institution, the Center for Electronic Texts in the\r | |
845 | Humanities (CETH). The center estimates that there are 8,000 series of\r | |
846 | source texts in the humanities that have been converted to\r | |
847 | machine-readable form worldwide. CETH is undertaking an international\r | |
848 | search for converted text in the humanities, compiling it into an\r | |
849 | electronic library, and preparing bibliographic descriptions of the\r | |
850 | sources for the Research Libraries Information Network's (RLIN)\r | |
851 | machine-readable data file. The library profession has begun to initiate\r | |
852 | large conversion projects as well, such as American Memory.\r | |
853 | \r | |
854 | While scholars have been making converted text available to one another,\r | |
855 | typically on disk or on CD-ROM, the clear trend is toward making these\r | |
856 | resources available through research and education networks. Thus, the\r | |
857 | American and French Research on the Treasury of the French Language\r | |
858 | (ARTFL) and the Dante Project are already available on Internet. \r | |
859 | MICHELSON summarized this section on interpretation and analysis by\r | |
860 | noting that: 1) increasing numbers of humanities scholars in the library\r | |
861 | community are recognizing the importance to the advancement of\r | |
862 | scholarship of retrospective conversion of source materials in the arts\r | |
863 | and humanities; and 2) there is a growing realization that making the\r | |
864 | sources available on research and education networks maximizes their\r | |
865 | usefulness for the analysis performed by humanities scholars.\r | |
866 | \r | |
867 | The fourth process of scholarly communication is dissemination of\r | |
868 | research findings, that is, publication. Scholars are using existing\r | |
869 | research and education networks to engineer a new type of publication: \r | |
870 | scholarly-controlled journals that are electronically produced and\r | |
871 | disseminated. Although such journals are still emerging as a\r | |
872 | communication format, their number has grown, from approximately twelve\r | |
873 | to thirty-six during the past year (July 1991 to June 1992). Most of\r | |
874 | these electronic scholarly journals are devoted to topics in the\r | |
875 | humanities. As with network conferences, scholarly enthusiasm for these\r | |
876 | electronic journals stems from the medium's unique ability to advance\r | |
877 | scholarship in a way that no other medium can do by supporting global\r | |
878 | feedback and interchange, practically in real time, early in the research\r | |
879 | process. Beyond scholarly journals, MICHELSON remarked the delivery of\r | |
880 | commercial full-text products, such as articles in professional journals,\r | |
881 | newsletters, magazines, wire services, and reference sources. These are\r | |
882 | being delivered via on-line local library catalogues, especially through\r | |
883 | CD-ROMs. Furthermore, according to MICHELSON, there is general optimism\r | |
884 | that the copyright and fees issues impeding the delivery of full text on\r | |
885 | existing research and education networks soon will be resolved.\r | |
886 | \r | |
887 | The final process of scholarly communication is curriculum development\r | |
888 | and instruction, and this involves the use of computer information\r | |
889 | technologies in two areas. The first is the development of\r | |
890 | computer-oriented instructional tools, which includes simulations,\r | |
891 | multimedia applications, and computer tools that are used to assist in\r | |
892 | the analysis of sources in the classroom, etc. The Perseus Project, a\r | |
893 | database that provides a multimedia curriculum on classical Greek\r | |
894 | civilization, is a good example of the way in which entire curricula are\r | |
895 | being recast using information technologies. It is anticipated that the\r | |
896 | current difficulty in electronically exchanging computer-based\r | |
897 | instructional software, which in turn makes it difficult for one scholar\r | |
898 | to build upon the work of others, will be resolved before too long. \r | |
899 | Stand-alone curricular applications that involve electronic text will be\r | |
900 | sharable through networks, reinforcing their significance as intellectual\r | |
901 | products as well as instructional tools.\r | |
902 | \r | |
903 | The second aspect of electronic learning involves the use of research and\r | |
904 | education networks for distance education programs. Such programs\r | |
905 | interactively link teachers with students in geographically scattered\r | |
906 | locations and rely on the availability of electronic instructional\r | |
907 | resources. Distance education programs are gaining wide appeal among\r | |
908 | state departments of education because of their demonstrated capacity to\r | |
909 | bring advanced specialized course work and an array of experts to many\r | |
910 | classrooms. A recent report found that at least 32 states operated at\r | |
911 | least one statewide network for education in 1991, with networks under\r | |
912 | development in many of the remaining states.\r | |
913 | \r | |
914 | MICHELSON summarized this section by noting two striking changes taking\r | |
915 | place in scholarly communication among humanities scholars. First is the\r | |
916 | extent to which electronic text in particular, and electronic resources\r | |
917 | in general, are being infused into each of the five processes described\r | |
918 | above. As mentioned earlier, there is a certain synergy at work here. \r | |
919 | The use of electronic resources for one process tends to stimulate their\r | |
920 | use for other processes, because the chief course of movement is toward a\r | |
921 | comprehensive on-line working context for humanities scholars that\r | |
922 | includes on-line availability of key bibliographies, scholarly feedback,\r | |
923 | sources, analytical tools, and publications. MICHELSON noted further\r | |
924 | that the movement toward a comprehensive on-line working context for\r | |
925 | humanities scholars is not new. In fact, it has been underway for more\r | |
926 | than forty years in the humanities, since Father Roberto Busa began\r | |
927 | developing an electronic concordance of the works of Saint Thomas Aquinas\r | |
928 | in 1949. What we are witnessing today, MICHELSON contended, is not the\r | |
929 | beginning of this on-line transition but, for at least some humanities\r | |
930 | scholars, the turning point in the transition from a print to an\r | |
931 | electronic working context. Coinciding with the on-line transition, the\r | |
932 | second striking change is the extent to which research and education\r | |
933 | networks are becoming the new medium of scholarly communication. The\r | |
934 | existing Internet and the pending National Research and Education Network\r | |
935 | (NREN) represent the new meeting ground where scholars are going for\r | |
936 | bibliographic information, scholarly dialogue and feedback, the most\r | |
937 | current publications in their field, and high-level educational\r | |
938 | offerings. Traditional scholarly practices are undergoing tremendous\r | |
939 | transformations as a result of the emergence and growing prominence of\r | |
940 | what is called network-mediated scholarship.\r | |
941 | \r | |
942 | MICHELSON next turned to the second element of the framework she proposed\r | |
943 | at the outset of her talk for evaluating the prospects for electronic\r | |
944 | text, namely the key information technology trends affecting the conduct\r | |
945 | of scholarly communication over the next decade: 1) end-user computing\r | |
946 | and 2) connectivity.\r | |
947 | \r | |
948 | End-user computing means that the person touching the keyboard, or\r | |
949 | performing computations, is the same as the person who initiates or\r | |
950 | consumes the computation. The emergence of personal computers, along\r | |
951 | with a host of other forces, such as ubiquitous computing, advances in\r | |
952 | interface design, and the on-line transition, is prompting the consumers\r | |
953 | of computation to do their own computing, and is thus rendering obsolete\r | |
954 | the traditional distinction between end users and ultimate users.\r | |
955 | \r | |
956 | The trend toward end-user computing is significant to consideration of\r | |
957 | the prospects for electronic texts because it means that researchers are\r | |
958 | becoming more adept at doing their own computations and, thus, more\r | |
959 | competent in the use of electronic media. As they avoid programmer\r | |
960 | intermediaries, computation is becoming central to the researcher's\r | |
961 | thought process. This direct involvement in computing is changing the\r | |
962 | researcher's perspective on the nature of research itself, that is, the\r | |
963 | kinds of questions that can be posed, the analytical methodologies that\r | |
964 | can be used, the types and amount of sources that are appropriate for\r | |
965 | analyses, and the form in which findings are presented. The trend toward\r | |
966 | end-user computing means that, increasingly, electronic media and\r | |
967 | computation are being infused into all processes of humanities\r | |
968 | scholarship, inspiring remarkable transformations in scholarly\r | |
969 | communication.\r | |
970 | \r | |
971 | The trend toward greater connectivity suggests that researchers are using\r | |
972 | computation increasingly in network environments. Connectivity is\r | |
973 | important to scholarship because it erases the distance that separates\r | |
974 | students from teachers and scholars from their colleagues, while allowing\r | |
975 | users to access remote databases, share information in many different\r | |
976 | media, connect to their working context wherever they are, and\r | |
977 | collaborate in all phases of research.\r | |
978 | \r | |
979 | The combination of the trend toward end-user computing and the trend\r | |
980 | toward connectivity suggests that the scholarly use of electronic\r | |
981 | resources, already evident among some researchers, will soon become an\r | |
982 | established feature of scholarship. The effects of these trends, along\r | |
983 | with ongoing changes in scholarly practices, point to a future in which\r | |
984 | humanities researchers will use computation and electronic communication\r | |
985 | to help them formulate ideas, access sources, perform research,\r | |
986 | collaborate with colleagues, seek peer review, publish and disseminate\r | |
987 | results, and engage in many other professional and educational activities.\r | |
988 | \r | |
989 | In summary, MICHELSON emphasized four points: 1) A portion of humanities\r | |
990 | scholars already consider electronic texts the preferred format for\r | |
991 | analysis and dissemination. 2) Scholars are using these electronic\r | |
992 | texts, in conjunction with other electronic resources, in all the\r | |
993 | processes of scholarly communication. 3) The humanities scholars'\r | |
994 | working context is in the process of changing from print technology to\r | |
995 | electronic technology, in many ways mirroring transformations that have\r | |
996 | occurred or are occurring within the scientific community. 4) These\r | |
997 | changes are occurring in conjunction with the development of a new\r | |
998 | communication medium: research and education networks that are\r | |
999 | characterized by their capacity to advance scholarship in a wholly unique\r | |
1000 | way.\r | |
1001 | \r | |
1002 | MICHELSON also reiterated her three principal arguments: 1) Electronic\r | |
1003 | texts are best understood in terms of the relationship to other\r | |
1004 | electronic resources and the growing prominence of network-mediated\r | |
1005 | scholarship. 2) The prospects for electronic texts lie in their capacity\r | |
1006 | to be integrated into the on-line network of electronic resources that\r | |
1007 | constitute the new working context for scholars. 3) Retrospective conversion\r | |
1008 | of portions of the scholarly record should be a key strategy as information\r | |
1009 | providers respond to changes in scholarly communication practices.\r | |
1010 | \r | |
1011 | ******\r | |
1012 | \r | |
1013 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1014 | VECCIA * AM's evaluation project and public users of electronic resources\r | |
1015 | * AM and its design * Site selection and evaluating the Macintosh\r | |
1016 | implementation of AM * Characteristics of the six public libraries\r | |
1017 | selected * Characteristics of AM's users in these libraries * Principal\r | |
1018 | ways AM is being used *\r | |
1019 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1020 | \r | |
1021 | Susan VECCIA, team leader, and Joanne FREEMAN, associate coordinator,\r | |
1022 | American Memory, Library of Congress, gave a joint presentation. First,\r | |
1023 | by way of introduction, VECCIA explained her and FREEMAN's roles in\r | |
1024 | American Memory (AM). Serving principally as an observer, VECCIA has\r | |
1025 | assisted with the evaluation project of AM, placing AM collections in a\r | |
1026 | variety of different sites around the country and helping to organize and\r | |
1027 | implement that project. FREEMAN has been an associate coordinator of AM\r | |
1028 | and has been involved principally with the interpretative materials,\r | |
1029 | preparing some of the electronic exhibits and printed historical\r | |
1030 | information that accompanies AM and that is requested by users. VECCIA\r | |
1031 | and FREEMAN shared anecdotal observations concerning AM with public users\r | |
1032 | of electronic resources. Notwithstanding a fairly structured evaluation\r | |
1033 | in progress, both VECCIA and FREEMAN chose not to report on specifics in\r | |
1034 | terms of numbers, etc., because they felt it was too early in the\r | |
1035 | evaluation project to do so.\r | |
1036 | \r | |
1037 | AM is an electronic archive of primary source materials from the Library\r | |
1038 | of Congress, selected collections representing a variety of formats--\r | |
1039 | photographs, graphic arts, recorded sound, motion pictures, broadsides,\r | |
1040 | and soon, pamphlets and books. In terms of the design of this system,\r | |
1041 | the interpretative exhibits have been kept separate from the primary\r | |
1042 | resources, with good reason. Accompanying this collection are printed\r | |
1043 | documentation and user guides, as well as guides that FREEMAN prepared for\r | |
1044 | teachers so that they may begin using the content of the system at once.\r | |
1045 | \r | |
1046 | VECCIA described the evaluation project before talking about the public\r | |
1047 | users of AM, limiting her remarks to public libraries, because FREEMAN\r | |
1048 | would talk more specifically about schools from kindergarten to twelfth\r | |
1049 | grade (K-12). Having started in spring 1991, the evaluation currently\r | |
1050 | involves testing of the Macintosh implementation of AM. Since the\r | |
1051 | primary goal of this evaluation is to determine the most appropriate\r | |
1052 | audience or audiences for AM, very different sites were selected. This\r | |
1053 | makes evaluation difficult because of the varying degrees of technology\r | |
1054 | literacy among the sites. AM is situated in forty-four locations, of\r | |
1055 | which six are public libraries and sixteen are schools. Represented\r | |
1056 | among the schools are elementary, junior high, and high schools.\r | |
1057 | District offices also are involved in the evaluation, which will\r | |
1058 | conclude in summer 1993.\r | |
1059 | \r | |
1060 | VECCIA focused the remainder of her talk on the six public libraries, one\r | |
1061 | of which doubles as a state library. They represent a range of\r | |
1062 | geographic areas and a range of demographic characteristics. For\r | |
1063 | example, three are located in urban settings, two in rural settings, and\r | |
1064 | one in a suburban setting. A range of technical expertise is to be found\r | |
1065 | among these facilities as well. For example, one is an "Apple library of\r | |
1066 | the future," while two others are rural one-room libraries--in one, AM\r | |
1067 | sits at the front desk next to a tractor manual.\r | |
1068 | \r | |
1069 | All public libraries have been extremely enthusiastic, supportive, and\r | |
1070 | appreciative of the work that AM has been doing. VECCIA characterized\r | |
1071 | various users: Most users in public libraries describe themselves as\r | |
1072 | general readers; of the students who use AM in the public libraries,\r | |
1073 | those in fourth grade and above seem most interested. Public libraries\r | |
1074 | in rural sites tend to attract retired people, who have been highly\r | |
1075 | receptive to AM. Users tend to fall into two additional categories: \r | |
1076 | people interested in the content and historical connotations of these\r | |
1077 | primary resources, and those fascinated by the technology. The format\r | |
1078 | receiving the most comments has been motion pictures. The adult users in\r | |
1079 | public libraries are more comfortable with IBM computers, whereas young\r | |
1080 | people seem comfortable with either IBM or Macintosh, although most of\r | |
1081 | them seem to come from a Macintosh background. This same tendency is\r | |
1082 | found in the schools.\r | |
1083 | \r | |
1084 | What kinds of things do users do with AM? In a public library there are\r | |
1085 | two main goals or ways that AM is being used: as an individual learning\r | |
1086 | tool, and as a leisure activity. Adult learning was one area that VECCIA\r | |
1087 | would highlight as a possible application for a tool such as AM. She\r | |
1088 | described a patron of a rural public library who comes in every day on\r | |
1089 | his lunch hour and literally reads AM, methodically going through the\r | |
1090 | collection image by image. At the end of his hour he makes an electronic\r | |
1091 | bookmark, puts it in his pocket, and returns to work. The next day he\r | |
1092 | comes in and resumes where he left off. Interestingly, this man had\r | |
1093 | never been in the library before he used AM. In another small, rural\r | |
1094 | library, the coordinator reports that AM is a popular activity for some\r | |
1095 | of the older, retired people in the community, who ordinarily would not\r | |
1096 | use "those things,"--computers. Another example of adult learning in\r | |
1097 | public libraries is book groups, one of which, in particular, is using AM\r | |
1098 | as part of its reading on industrialization, integration, and urbanization\r | |
1099 | in the early 1900s.\r | |
1100 | \r | |
1101 | One library reports that a family is using AM to help educate their\r | |
1102 | children. In another instance, individuals from a local museum came in\r | |
1103 | to use AM to prepare an exhibit on toys of the past. These two examples\r | |
1104 | emphasize the mission of the public library as a cultural institution,\r | |
1105 | reaching out to people who do not have the same resources as those who\r | |
1106 | live in a metropolitan area or have access to a major library.\r | |
1107 | One rural library reports that junior high school students in large\r | |
1108 | numbers came in one afternoon to use AM for entertainment. A number of\r | |
1109 | public libraries reported great interest among postcard collectors in the\r | |
1110 | Detroit collection, which was essentially a collection of images used on\r | |
1111 | postcards around the turn of the century. Train buffs are similarly\r | |
1112 | interested because that was a time of great interest in railroading. \r | |
1113 | People, it was found, relate to things that they know of firsthand. For\r | |
1114 | example, in both rural public libraries where AM was made available,\r | |
1115 | observers reported that the older people with personal remembrances of\r | |
1116 | the turn of the century were gravitating to the Detroit collection. \r | |
1117 | These examples served to underscore MICHELSON's observation regarding the\r | |
1118 | integration of electronic tools and ideas--that people learn best when\r | |
1119 | the material relates to something they know.\r | |
1120 | \r | |
1121 | VECCIA made the final point that in many cases AM serves as a\r | |
1122 | public-relations tool for the public libraries that are testing it. In\r | |
1123 | one case, AM is being used as a vehicle to secure additional funding for\r | |
1124 | the library. In another case, AM has served as an inspiration to the\r | |
1125 | staff of a major local public library in the South to think about ways to\r | |
1126 | make its own collection of photographs more accessible to the public.\r | |
1127 | \r | |
1128 | ******\r | |
1129 | \r | |
1130 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1131 | FREEMAN * AM and archival electronic resources in a school environment *\r | |
1132 | Questions concerning context * Questions concerning the electronic format\r | |
1133 | itself * Computer anxiety * Access and availability of the system *\r | |
1134 | Hardware * Strengths gained through the use of archival resources in\r | |
1135 | schools *\r | |
1136 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1137 | \r | |
1138 | Reiterating an observation made by VECCIA, that AM is an archival\r | |
1139 | resource made up of primary materials with very little interpretation,\r | |
1140 | FREEMAN stated that the project has attempted to bridge the gap between\r | |
1141 | these bare primary materials and a school environment, and in that cause\r | |
1142 | has created guided introductions to AM collections. Loud demand from the\r | |
1143 | educational community, chiefly from teachers working with the upper\r | |
1144 | grades of elementary school through high school, greeted the announcement\r | |
1145 | that AM would be tested around the country.\r | |
1146 | \r | |
1147 | FREEMAN reported not only on what was learned about AM in a school\r | |
1148 | environment, but also on several universal questions that were raised\r | |
1149 | concerning archival electronic resources in schools. She discussed\r | |
1150 | several strengths of this type of material in a school environment as\r | |
1151 | opposed to a highly structured resource that offers a limited number of\r | |
1152 | paths to follow.\r | |
1153 | \r | |
1154 | FREEMAN first raised several questions about using AM in a school\r | |
1155 | environment. There is often some difficulty in developing a sense of\r | |
1156 | what the system contains. Many students sit down at a computer resource\r | |
1157 | and assume that, because AM comes from the Library of Congress, all of\r | |
1158 | American history is now at their fingertips. As a result of that sort of\r | |
1159 | mistaken judgment, some students are known to conclude that AM contains\r | |
1160 | nothing of use to them when they look for one or two things and do not\r | |
1161 | find them. It is difficult to discover that middle ground where one has\r | |
1162 | a sense of what the system contains. Some students grope toward the idea\r | |
1163 | of an archive, a new idea to them, since they have not previously\r | |
1164 | experienced what it means to have access to a vast body of somewhat\r | |
1165 | random information.\r | |
1166 | \r | |
1167 | Other questions raised by FREEMAN concerned the electronic format itself. \r | |
1168 | For instance, in a school environment it is often difficult both for\r | |
1169 | teachers and students to gain a sense of what it is they are viewing. \r | |
1170 | They understand that it is a visual image, but they do not necessarily\r | |
1171 | know that it is a postcard from the turn of the century, a panoramic\r | |
1172 | photograph, or even machine-readable text of an eighteenth-century\r | |
1173 | broadside, a twentieth-century printed book, or a nineteenth-century\r | |
1174 | diary. That distinction is often difficult for people in a school\r | |
1175 | environment to grasp. Because of that, it occasionally becomes difficult\r | |
1176 | to draw conclusions from what one is viewing.\r | |
1177 | \r | |
1178 | FREEMAN also noted the obvious fear of the computer, which constitutes a\r | |
1179 | difficulty in using an electronic resource. Though students in general\r | |
1180 | did not suffer from this anxiety, several older students feared that they\r | |
1181 | were computer-illiterate, an assumption that became self-fulfilling when\r | |
1182 | they searched for something but failed to find it. FREEMAN said she\r | |
1183 | believed that some teachers also fear computer resources, because they\r | |
1184 | believe they lack complete control. FREEMAN related the example of\r | |
1185 | teachers shooing away students because it was not their time to use the\r | |
1186 | system. This was a case in which the situation had to be extremely\r | |
1187 | structured so that the teachers would not feel that they had lost their\r | |
1188 | grasp on what the system contained.\r | |
1189 | \r | |
1190 | A final question raised by FREEMAN concerned access and availability of\r | |
1191 | the system. She noted the occasional existence of a gap in communication\r | |
1192 | between school librarians and teachers. Often AM sits in a school\r | |
1193 | library and the librarian is the person responsible for monitoring the\r | |
1194 | system. Teachers do not always take into their world new library\r | |
1195 | resources about which the librarian is excited. Indeed, at the sites\r | |
1196 | where AM had been used most effectively within a library, the librarian\r | |
1197 | was required to go to specific teachers and instruct them in its use. As\r | |
1198 | a result, several AM sites will have in-service sessions over a summer,\r | |
1199 | in the hope that perhaps, with a more individualized link, teachers will\r | |
1200 | be more likely to use the resource.\r | |
1201 | \r | |
1202 | A related issue in the school context concerned the number of\r | |
1203 | workstations available at any one location. Centralization of equipment\r | |
1204 | at the district level, with teachers invited to download things and walk\r | |
1205 | away with them, proved unsuccessful because the hours these offices were\r | |
1206 | open were also school hours.\r | |
1207 | \r | |
1208 | Another issue was hardware. As VECCIA observed, a range of sites exists,\r | |
1209 | some technologically advanced and others essentially acquiring their\r | |
1210 | first computer for the primary purpose of using it in conjunction with\r | |
1211 | AM's testing. Users at technologically sophisticated sites want even\r | |
1212 | more sophisticated hardware, so that they can perform even more\r | |
1213 | sophisticated tasks with the materials in AM. But once they acquire a\r | |
1214 | newer piece of hardware, they must learn how to use that also; at an\r | |
1215 | unsophisticated site it takes an extremely long time simply to become\r | |
1216 | accustomed to the computer, not to mention the program offered with the\r | |
1217 | computer. All of these small issues raise one large question, namely,\r | |
1218 | are systems like AM truly rewarding in a school environment, or do they\r | |
1219 | simply act as innovative toys that do little more than spark interest?\r | |
1220 | \r | |
1221 | FREEMAN contended that the evaluation project has revealed several strengths\r | |
1222 | that were gained through the use of archival resources in schools, including:\r | |
1223 | \r | |
1224 | * Psychic rewards from using AM as a vast, rich database, with\r | |
1225 | teachers assigning various projects to students--oral presentations,\r | |
1226 | written reports, a documentary, a turn-of-the-century newspaper--\r | |
1227 | projects that start with the materials in AM but are completed using\r | |
1228 | other resources; AM thus is used as a research tool in conjunction\r | |
1229 | with other electronic resources, as well as with books and items in\r | |
1230 | the library where the system is set up.\r | |
1231 | \r | |
1232 | * Students are acquiring computer literacy in a humanities context.\r | |
1233 | \r | |
1234 | * This sort of system is overcoming the isolation between disciplines\r | |
1235 | that often exists in schools. For example, many English teachers are\r | |
1236 | requiring their students to write papers on historical topics\r | |
1237 | represented in AM. Numerous teachers have reported that their\r | |
1238 | students are learning critical thinking skills using the system.\r | |
1239 | \r | |
1240 | * On a broader level, AM is introducing primary materials, not only\r | |
1241 | to students but also to teachers, in an environment where often\r | |
1242 | simply none exist--an exciting thing for the students because it\r | |
1243 | helps them learn to conduct research, to interpret, and to draw\r | |
1244 | their own conclusions. In learning to conduct research and what it\r | |
1245 | means, students are motivated to seek knowledge. That relates to\r | |
1246 | another positive outcome--a high level of personal involvement of\r | |
1247 | students with the materials in this system and greater motivation to\r | |
1248 | conduct their own research and draw their own conclusions.\r | |
1249 | \r | |
1250 | * Perhaps the most ironic strength of these kinds of archival\r | |
1251 | electronic resources is that many of the teachers AM interviewed\r | |
1252 | were desperate, it is no exaggeration to say, not only for primary\r | |
1253 | materials but for unstructured primary materials. These would, they\r | |
1254 | thought, foster personally motivated research, exploration, and\r | |
1255 | excitement in their students. Indeed, these materials have done\r | |
1256 | just that. Ironically, however, this lack of structure produces\r | |
1257 | some of the confusion to which the newness of these kinds of\r | |
1258 | resources may also contribute. The key to effective use of archival\r | |
1259 | products in a school environment is a clear, effective introduction\r | |
1260 | to the system and to what it contains. \r | |
1261 | \r | |
1262 | ******\r | |
1263 | \r | |
1264 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1265 | DISCUSSION * Nothing known, quantitatively, about the number of\r | |
1266 | humanities scholars who must see the original versus those who would\r | |
1267 | settle for an edited transcript, or about the ways in which humanities\r | |
1268 | scholars are using information technology * Firm conclusions concerning\r | |
1269 | the manner and extent of the use of supporting materials in print\r | |
1270 | provided by AM to await completion of evaluative study * A listener's\r | |
1271 | reflections on additional applications of electronic texts * Role of\r | |
1272 | electronic resources in teaching elementary research skills to students *\r | |
1273 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1274 | \r | |
1275 | During the discussion that followed the presentations by MICHELSON,\r | |
1276 | VECCIA, and FREEMAN, additional points emerged.\r | |
1277 | \r | |
1278 | LESK asked if MICHELSON could give any quantitative estimate of the\r | |
1279 | number of humanities scholars who must see or want to see the original,\r | |
1280 | or the best possible version of the material, versus those who typically\r | |
1281 | would settle for an edited transcript. While unable to provide a figure,\r | |
1282 | she offered her impressions as an archivist who has done some reference\r | |
1283 | work and has discussed this issue with other archivists who perform\r | |
1284 | reference, that those who use archives and those who use primary sources\r | |
1285 | for what would be considered very high-level scholarly research, as\r | |
1286 | opposed to, say, undergraduate papers, were few in number, especially\r | |
1287 | given the public interest in using primary sources to conduct\r | |
1288 | genealogical or avocational research and the kind of professional\r | |
1289 | research done by people in private industry or the federal government. \r | |
1290 | More important in MICHELSON's view was that, quantitatively, nothing is\r | |
1291 | known about the ways in which, for example, humanities scholars are using\r | |
1292 | information technology. No studies exist to offer guidance in creating\r | |
1293 | strategies. The most recent study was conducted in 1985 by the American\r | |
1294 | Council of Learned Societies (ACLS), and what it showed was that 50\r | |
1295 | percent of humanities scholars at that time were using computers. That\r | |
1296 | constitutes the extent of our knowledge.\r | |
1297 | \r | |
1298 | Concerning AM's strategy for orienting people toward the scope of\r | |
1299 | electronic resources, FREEMAN could offer no hard conclusions at this\r | |
1300 | point, because she and her colleagues were still waiting to see,\r | |
1301 | particularly in the schools, what has been made of their efforts. Within\r | |
1302 | the system, however, AM has provided what are called electronic exhibits--\r | |
1303 | such as introductions to time periods and materials--and these are\r | |
1304 | intended to offer a student user a sense of what a broadside is and what\r | |
1305 | it might tell her or him. But FREEMAN conceded that the project staff\r | |
1306 | would have to talk with students next year, after teachers have had a\r | |
1307 | summer to use the materials, and attempt to discover what the students\r | |
1308 | were learning from the materials. In addition, FREEMAN described\r | |
1309 | supporting materials in print provided by AM at the request of local\r | |
1310 | teachers during a meeting held at LC. These included time lines,\r | |
1311 | bibliographies, and other materials that could be reproduced on a\r | |
1312 | photocopier in a classroom. Teachers could walk away with and use these,\r | |
1313 | and in this way gain a better understanding of the contents. But again,\r | |
1314 | reaching firm conclusions concerning the manner and extent of their use\r | |
1315 | would have to wait until next year.\r | |
1316 | \r | |
1317 | As to the changes she saw occurring at the National Archives and Records\r | |
1318 | Administration (NARA) as a result of the increasing emphasis on\r | |
1319 | technology in scholarly research, MICHELSON stated that NARA at this\r | |
1320 | point was absorbing the report by her and Jeff Rothenberg addressing\r | |
1321 | strategies for the archival profession in general, although not for the\r | |
1322 | National Archives specifically. NARA is just beginning to establish its\r | |
1323 | role and what it can do. In terms of changes and initiatives that NARA\r | |
1324 | can take, no clear response could be given at this time.\r | |
1325 | \r | |
1326 | GREENFIELD remarked on two trends mentioned in the session. Reflecting on\r | |
1327 | DALY's opening comments on how he could have used a Latin collection of\r | |
1328 | text in an electronic form, he said that at first he thought most scholars\r | |
1329 | would be unwilling to do that. But as he thought of that in terms of the\r | |
1330 | original meaning of research--that is, having already mastered these texts,\r | |
1331 | researching them for critical and comparative purposes--for the first time,\r | |
1332 | the electronic format made a lot of sense. GREENFIELD could envision\r | |
1333 | growing numbers of scholars learning the new technologies for that very\r | |
1334 | aspect of their scholarship and for convenience's sake.\r | |
1335 | \r | |
1336 | Listening to VECCIA and FREEMAN, GREENFIELD thought of an additional\r | |
1337 | application of electronic texts. He realized that AM could be used as a\r | |
1338 | guide to lead someone to original sources. Students cannot be expected\r | |
1339 | to have mastered these sources, things they have never known about\r | |
1340 | before. Thus, AM is leading them, in theory, to a vast body of\r | |
1341 | information and giving them a superficial overview of it, enabling them\r | |
1342 | to select parts of it. GREENFIELD asked if any evidence exists that this\r | |
1343 | resource will indeed teach the new user, the K-12 students, how to do\r | |
1344 | research. Scholars already know how to do research and are applying\r | |
1345 | these new tools. But he wondered why students would go beyond picking\r | |
1346 | out things that were most exciting to them.\r | |
1347 | \r | |
1348 | FREEMAN conceded the correctness of GREENFIELD's observation as applied\r | |
1349 | to a school environment. The risk is that a student would sit down at a\r | |
1350 | system, play with it, find some things of interest, and then walk away. \r | |
1351 | But in the relatively controlled situation of a school library, much will\r | |
1352 | depend on the instructions a teacher or a librarian gives a student. She\r | |
1353 | viewed the situation not as one of fine-tuning research skills but of\r | |
1354 | involving students at a personal level in understanding and researching\r | |
1355 | things. Given the guidance one can receive at school, it then becomes\r | |
1356 | possible to teach elementary research skills to students, which in fact\r | |
1357 | one particular librarian said she was teaching her fifth graders. \r | |
1358 | FREEMAN concluded that introducing the idea of following one's own path\r | |
1359 | of inquiry, which is essentially what research entails, involves more\r | |
1360 | than teaching specific skills. To these comments VECCIA added the\r | |
1361 | observation that the individual teacher and the use of a creative\r | |
1362 | resource, rather than AM itself, seemed to make the key difference.\r | |
1363 | Some schools and some teachers are making excellent use of the nature\r | |
1364 | of critical thinking and teaching skills, she said.\r | |
1365 | \r | |
1366 | Concurring with these remarks, DALY closed the session with the thought that\r | |
1367 | the more that producers produced for teachers and for scholars to use with\r | |
1368 | their students, the more successful their electronic products would prove.\r | |
1369 | \r | |
1370 | ******\r | |
1371 | \r | |
1372 | SESSION II. SHOW AND TELL\r | |
1373 | \r | |
1374 | Jacqueline HESS, director, National Demonstration Laboratory, served as\r | |
1375 | moderator of the "show-and-tell" session. She noted that a\r | |
1376 | question-and-answer period would follow each presentation.\r | |
1377 | \r | |
1378 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1379 | MYLONAS * Overview and content of Perseus * Perseus' primary materials\r | |
1380 | exist in a system-independent, archival form * A concession * Textual\r | |
1381 | aspects of Perseus * Tools to use with the Greek text * Prepared indices\r | |
1382 | and full-text searches in Perseus * English-Greek word search leads to\r | |
1383 | close study of words and concepts * Navigating Perseus by tracing down\r | |
1384 | indices * Using the iconography to perform research *\r | |
1385 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1386 | \r | |
1387 | Elli MYLONAS, managing editor, Perseus Project, Harvard University, first\r | |
1388 | gave an overview of Perseus, a large, collaborative effort based at\r | |
1389 | Harvard University but with contributors and collaborators located at\r | |
1390 | numerous universities and colleges in the United States (e.g., Bowdoin,\r | |
1391 | Maryland, Pomona, Chicago, Virginia). Funded primarily by the\r | |
1392 | Annenberg/CPB Project, with additional funding from Apple, Harvard, and\r | |
1393 | the Packard Humanities Institute, among others, Perseus is a multimedia,\r | |
1394 | hypertextual database for teaching and research on classical Greek\r | |
1395 | civilization, which was released in February 1992 in version 1.0 and\r | |
1396 | distributed by Yale University Press.\r | |
1397 | \r | |
1398 | Consisting entirely of primary materials, Perseus includes ancient Greek\r | |
1399 | texts and translations of those texts; catalog entries--that is, museum\r | |
1400 | catalog entries, not library catalog entries--on vases, sites, coins,\r | |
1401 | sculpture, and archaeological objects; maps; and a dictionary, among\r | |
1402 | other sources. The objects for which catalog entries exist are\r | |
1403 | accompanied by thousands of color images, which\r | |
1404 | constitute a major feature of the database. Perseus contains\r | |
1405 | approximately 30 megabytes of text, an amount that will double in\r | |
1406 | subsequent versions. In addition to these primary materials, the Perseus\r | |
1407 | Project has been building tools for using them, making access and\r | |
1408 | navigation easier, the goal being to build part of the electronic\r | |
1409 | environment discussed earlier in the morning in which students or\r | |
1410 | scholars can work with their sources.\r | |
1411 | \r | |
1412 | The demonstration of Perseus will show only a fraction of the real work\r | |
1413 | that has gone into it, because the project had to face the dilemma of\r | |
1414 | what to enter when putting something into machine-readable form: should\r | |
1415 | one aim for very high quality or make concessions in order to get the\r | |
1416 | material in? Since Perseus decided to opt for very high quality, all of\r | |
1417 | its primary materials exist in a system-independent--insofar as it is\r | |
1418 | possible to be system-independent--archival form. Deciding what that\r | |
1419 | archival form would be and attaining it required much work and thought. \r | |
1420 | For example, all the texts are marked up in SGML, which will be made\r | |
1421 | compatible with the guidelines of the Text Encoding Initiative (TEI) when\r | |
1422 | they are issued.\r | |
1423 | \r | |
1424 | Drawings are PostScript files, not meeting international standards, but\r | |
1425 | at least designed to go across platforms. Images, or rather the real\r | |
1426 | archival forms, consist of the best available slides, which are being\r | |
1427 | digitized. Much of the catalog material exists in database form--a form\r | |
1428 | that the average user could use, manipulate, and display on a personal\r | |
1429 | computer, but only at great cost. Thus, this is where the concession\r | |
1430 | comes in: All of this rich, well-marked-up information is stripped of\r | |
1431 | much of its content; the images are converted into bit-maps and the text\r | |
1432 | into small formatted chunks. All this information can then be imported\r | |
1433 | into HyperCard and run on a mid-range Macintosh, which is what Perseus\r | |
1434 | users have. This fact has made it possible for Perseus to attain wide\r | |
1435 | use fairly rapidly. Without those archival forms the HyperCard version\r | |
1436 | being demonstrated could not be made easily, and the project could not\r | |
1437 | have the potential to move to other forms and machines and software as\r | |
1438 | they appear. None of this archival information is in Perseus on the CD.\r | |
1439 | \r | |
1440 | Of the numerous multimedia aspects of Perseus, MYLONAS focused on the\r | |
1441 | textual. Part of what makes Perseus such a pleasure to use, MYLONAS\r | |
1442 | said, is this effort at seamless integration and the ability to move\r | |
1443 | around both visual and textual material. Perseus also made the decision\r | |
1444 | not to attempt to interpret its material any more than one interprets by\r | |
1445 | selecting. But, MYLONAS emphasized, Perseus is not courseware: No\r | |
1446 | syllabus exists. There is no effort to define how one teaches a topic\r | |
1447 | using Perseus, although the project may eventually collect papers by\r | |
1448 | people who have used it to teach. Rather, Perseus aims to provide\r | |
1449 | primary material in a kind of electronic library, an electronic sandbox,\r | |
1450 | so to say, in which students and scholars who are working on this\r | |
1451 | material can explore by themselves. With that, MYLONAS demonstrated\r | |
1452 | Perseus, beginning with the Perseus gateway, the first thing one sees\r | |
1453 | upon opening Perseus--an effort in part to solve the contextualizing\r | |
1454 | problem--which tells the user what the system contains.\r | |
1455 | \r | |
1456 | MYLONAS demonstrated only a very small portion, beginning with primary\r | |
1457 | texts and running off the CD-ROM. Having selected Aeschylus' Prometheus\r | |
1458 | Bound, which was viewable in Greek and English together, in roughly the\r | |
1459 | same segments, MYLONAS demonstrated tools to use with the Greek text,\r | |
1460 | something not possible with a book: looking up the dictionary entry form\r | |
1461 | of an unfamiliar word in Greek after subjecting it to Perseus'\r | |
1462 | morphological analysis for all the texts. After finding out about a\r | |
1463 | word, a user may then decide to see if it is used anywhere else in Greek. \r | |
1464 | Because vast amounts of indexing support all of the primary material, one\r | |
1465 | can find out where else all forms of a particular Greek word appear--\r | |
1466 | often not a trivial matter because Greek is highly inflected. Further,\r | |
1467 | since the story of Prometheus has to do with the origins of sacrifice, a\r | |
1468 | user may wish to study and explore sacrifice in Greek literature; by\r | |
1469 | typing sacrifice into a small window, a user goes to the English-Greek\r | |
1470 | word list--something one cannot do without the computer (Perseus has\r | |
1471 | indexed the definitions of its dictionary). The string sacrifice appears\r | |
1472 | in the definitions of sixty-five words. One may then find out\r | |
1473 | where any of those words is used in the work(s) of a particular author. \r | |
1474 | The English definitions are not lemmatized.\r | |
1475 | \r | |
1476 | All of the indices driving this kind of usage were originally devised for\r | |
1477 | speed, MYLONAS observed; in other words, all that kind of information--\r | |
1478 | all forms of all words, where they exist, the dictionary form they belong\r | |
1479 | to--was collected into databases to expedite searching. Then\r | |
1480 | it was discovered that one can do things searching in these databases\r | |
1481 | that could not be done searching in the full texts. Thus, although there\r | |
1482 | are full-text searches in Perseus, much of the work is done behind the\r | |
1483 | scenes, using prepared indices. As for the indexing that is done behind the\r | |
1484 | scenes, MYLONAS pointed out that without the SGML forms of the text, it\r | |
1485 | could not be done effectively. Much of this indexing is based on the\r | |
1486 | structures that are made explicit by the SGML tagging.\r | |
1487 | \r | |
1488 | It was found that one of the things many of Perseus' non-Greek-reading\r | |
1489 | users do is start from the dictionary and then move into the close study\r | |
1490 | of words and concepts via this kind of English-Greek word search, by which\r | |
1491 | means they might select a concept. This exercise has been assigned to\r | |
1492 | students in core courses at Harvard--to study a concept by looking for the\r | |
1493 | English word in the dictionary, finding the Greek words, and then finding\r | |
1494 | the words in the Greek but, of course, reading across in the English.\r | |
1495 | That tells them a great deal about what a translation means as well.\r | |
1496 | \r | |
1497 | Should one also wish to see images that have to do with sacrifice, that\r | |
1498 | person would go to the object key word search, which allows one to\r | |
1499 | perform a similar kind of index retrieval on the database of\r | |
1500 | archaeological objects. Without words, pictures are useless; Perseus has\r | |
1501 | not reached the point where it can do much with images that are not\r | |
1502 | cataloged. Thus, although it is possible in Perseus with text and images\r | |
1503 | to navigate by knowing where one wants to end up--for example, a\r | |
1504 | red-figure vase from the Boston Museum of Fine Arts--one can also\r | |
1505 | navigate very easily by tracing down indices. MYLONAS\r | |
1506 | illustrated several generic scenes of sacrifice on vases. The features\r | |
1507 | demonstrated derived from Perseus 1.0; version 2.0 will implement even\r | |
1508 | better means of retrieval.\r | |
1509 | \r | |
1510 | MYLONAS closed by looking at one of the pictures and noting again that\r | |
1511 | one can do a great deal of research using the iconography as well as the\r | |
1512 | texts. For instance, students in a core course at Harvard this year were\r | |
1513 | highly interested in Greek concepts of foreigners and representations of\r | |
1514 | non-Greeks. So they performed a great deal of research, both with texts\r | |
1515 | (e.g., Herodotus) and with iconography on vases and coins, on how the\r | |
1516 | Greeks portrayed non-Greeks. At the same time, art historians who study\r | |
1517 | iconography were also interested, and were able to use this material.\r | |
1518 | \r | |
1519 | ******\r | |
1520 | \r | |
1521 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1522 | DISCUSSION * Indexing and searchability of all English words in Perseus *\r | |
1523 | Several features of Perseus 1.0 * Several levels of customization\r | |
1524 | possible * Perseus used for general education * Perseus' effects on\r | |
1525 | education * Contextual information in Perseus * Main challenge and\r | |
1526 | emphasis of Perseus *\r | |
1527 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1528 | \r | |
1529 | Several points emerged in the discussion that followed MYLONAS's presentation.\r | |
1530 | \r | |
1531 | Although MYLONAS had not demonstrated Perseus' ability to cross-search\r | |
1532 | documents, she confirmed that all English words in Perseus are indexed\r | |
1533 | and can be searched. So, for example, sacrifice could have been searched\r | |
1534 | in all texts, the historical essay, and all the catalogue entries with\r | |
1535 | their descriptions--in short, in all of Perseus.\r | |
1536 | \r | |
1537 | Boolean logic is not in Perseus 1.0 but will be added to the next\r | |
1538 | version, although an effort is being made not to restrict Perseus to a\r | |
1539 | database in which one just performs searching, Boolean or otherwise. It\r | |
1540 | is possible to move laterally through the documents by selecting a word\r | |
1541 | one is interested in and selecting an area of information one is\r | |
1542 | interested in and trying to look that word up in that area.\r | |
1543 | \r | |
1544 | Since Perseus was developed in HyperCard, several levels of customization\r | |
1545 | are possible. Simple authoring tools exist that allow one to create\r | |
1546 | annotated paths through the information, which are useful for note-taking\r | |
1547 | and for guided tours for teaching purposes and for expository writing. \r | |
1548 | With a little more ingenuity it is possible to begin to add or substitute\r | |
1549 | material in Perseus.\r | |
1550 | \r | |
1551 | Perseus has not been used so much for classics education as for general\r | |
1552 | education, where it seemed to have an impact on the students in the core\r | |
1553 | course at Harvard (a general required course that students must take in\r | |
1554 | certain areas). Students were able to use primary material much more.\r | |
1555 | \r | |
1556 | The Perseus Project has an evaluation team at the University of Maryland\r | |
1557 | that has been documenting Perseus' effects on education. Perseus is very\r | |
1558 | popular, and anecdotal evidence indicates that it is having an effect at\r | |
1559 | places other than Harvard, for example, test sites at Ball State\r | |
1560 | University, Drury College, and numerous small places where opportunities\r | |
1561 | to use vast amounts of primary data may not exist. One documented effect\r | |
1562 | is that archaeological, anthropological, and philological research is\r | |
1563 | being done by the same person instead of by three different people.\r | |
1564 | \r | |
1565 | The contextual information in Perseus includes an overview essay, a\r | |
1566 | fairly linear historical essay on the fifth century B.C. that provides\r | |
1567 | links into the primary material (e.g., Herodotus, Thucydides, and\r | |
1568 | Plutarch), via small gray underscoring (on the screen) of linked\r | |
1569 | passages. These are handmade links into other material.\r | |
1570 | \r | |
1571 | To different extents, most of the production work was done at Harvard,\r | |
1572 | where the people and the equipment are located. Much of the\r | |
1573 | collaborative activity involved data collection and structuring, because\r | |
1574 | the main challenge and the emphasis of Perseus is the gathering of\r | |
1575 | primary material, that is, building a useful environment for studying\r | |
1576 | classical Greece, collecting data, and making it useful. \r | |
1577 | Systems-building is definitely not the main concern. Thus, much of the\r | |
1578 | work has involved writing essays, collecting information, rewriting it,\r | |
1579 | and tagging it. That can be done off site. The creative link for the\r | |
1580 | overview essay as well as for both systems and data was collaborative,\r | |
1581 | and was forged via E-mail and paper mail with professors at Pomona and\r | |
1582 | Bowdoin.\r | |
1583 | \r | |
1584 | ******\r | |
1585 | \r | |
1586 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1587 | CALALUCA * PLD's principal focus and contribution to scholarship *\r | |
1588 | Various questions preparatory to beginning the project * Basis for\r | |
1589 | project * Basic rule in converting PLD * Concerning the images in PLD *\r | |
1590 | Running PLD under a variety of retrieval software * Encoding the\r | |
1591 | database a hard-fought issue * Various features demonstrated * Importance\r | |
1592 | of user documentation * Limitations of the CD-ROM version * \r | |
1593 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1594 | \r | |
1595 | Eric CALALUCA, vice president, Chadwyck-Healey, Inc., demonstrated a\r | |
1596 | software interpretation of the Patrologia Latina Database (PLD). PLD's\r | |
1597 | principal focus from the beginning of the project about three-and-a-half\r | |
1598 | years ago was on converting Migne's Latin series, and in the end,\r | |
1599 | CALALUCA suggested, conversion of the text will be the major contribution\r | |
1600 | to scholarship. CALALUCA stressed that, as possibly the only private\r | |
1601 | publishing organization at the Workshop, Chadwyck-Healey had sought no\r | |
1602 | federal funds or national foundation support before embarking upon the\r | |
1603 | project, but instead had relied upon a great deal of homework and\r | |
1604 | marketing to accomplish the task of conversion.\r | |
1605 | \r | |
1606 | Ever since the possibilities of computer-searching emerged, scholars\r | |
1607 | in the field of late ancient and early medieval studies (philosophers,\r | |
1608 | theologians, classicists, and those studying the history of natural law\r | |
1609 | and the history of the legal development of Western civilization) have\r | |
1610 | been longing for a fully searchable version of Western literature, for\r | |
1611 | example, all the texts of Augustine and Bernard of Clairvaux and\r | |
1612 | Boethius, not to mention all the secondary and tertiary authors.\r | |
1613 | \r | |
1614 | Various questions arose, CALALUCA said. Should one convert Migne? \r | |
1615 | Should the database be encoded? Is it necessary to do that? How should\r | |
1616 | it be delivered? What about CD-ROM? Since this is a transitional\r | |
1617 | medium, why even bother to create software to run on a CD-ROM? Since\r | |
1618 | everybody knows people will be networking information, why go to the\r | |
1619 | trouble--which is far greater with CD-ROM than with the production of\r | |
1620 | magnetic data? Finally, how does one make the data available? Can many\r | |
1621 | of the hurdles to using electronic information that some publishers have\r | |
1622 | imposed upon databases be eliminated?\r | |
1623 | \r | |
1624 | The PLD project was based on the principle that computer-searching of\r | |
1625 | texts is most effective when it is done with a large database. Because\r | |
1626 | PLD represented a collection that serves so many disciplines across so\r | |
1627 | many periods, it was irresistible.\r | |
1628 | \r | |
1629 | The basic rule in converting PLD was to do no harm, to avoid the sins of\r | |
1630 | intrusion in such a database: no introduction of newer editions, no\r | |
1631 | on-the-spot changes, no eradicating of all possible falsehoods from an\r | |
1632 | edition. Thus, PLD is not the final act in electronic publishing for\r | |
1633 | this discipline, but simply the beginning. The conversion of PLD has\r | |
1634 | evoked numerous unanticipated questions: How will information be used? \r | |
1635 | What about networking? Can the rights of a database be protected? \r | |
1636 | Should one protect the rights of a database? How can it be made\r | |
1637 | available?\r | |
1638 | \r | |
1639 | Those converting PLD also tried to avoid the sins of omission, that is,\r | |
1640 | excluding portions of the collections or whole sections. What about the\r | |
1641 | images? PLD is full of images; some are extremely pious\r | |
1642 | nineteenth-century representations of the Fathers, while others contain\r | |
1643 | highly interesting elements. The goal was to cover all the text of Migne\r | |
1644 | (including notes, in Greek and in Hebrew, the latter of which, in\r | |
1645 | particular, causes problems in creating a search structure), all the\r | |
1646 | indices, and even the images, which are being scanned in separately\r | |
1647 | searchable files.\r | |
1648 | \r | |
1649 | Several North American institutions that have placed acquisition requests\r | |
1650 | for the PLD database have requested it in magnetic form without software,\r | |
1651 | which means they are already running it without software, without\r | |
1652 | anything demonstrated at the Workshop.\r | |
1653 | \r | |
1654 | What cannot practically be done is to go back and reconvert and re-encode\r | |
1655 | data, a time-consuming and extremely costly enterprise. CALALUCA sees\r | |
1656 | PLD as a database that can, and should, be run under a variety of\r | |
1657 | retrieval software packages. This will permit the widest possible searches. \r | |
1658 | Consequently, the need to produce a CD-ROM of PLD, as well as to develop\r | |
1659 | software that could handle some 1.3 gigabytes of heavily encoded text,\r | |
1660 | developed out of conversations with collection development and reference\r | |
1661 | librarians who wanted software both compassionate enough for the\r | |
1662 | pedestrian and capable of incorporating the most detailed\r | |
1663 | lexicographical studies that a user desires to conduct. In the end, the\r | |
1664 | encoding and conversion of the data will prove the most enduring\r | |
1665 | testament to the value of the project.\r | |
1666 | \r | |
1667 | The encoding of the database was also a hard-fought issue: Did the\r | |
1668 | database need to be encoded? Were there normative structures for encoding\r | |
1669 | humanist texts? Should it be SGML? What about the TEI--will it last,\r | |
1670 | will it prove useful? CALALUCA expressed some minor doubts as to whether\r | |
1671 | a data bank can be fully TEI-conformant. Every effort can be made, but\r | |
1672 | in the end to be TEI-conformant means to accept the need to make some\r | |
1673 | firm encoding decisions that can, indeed, be disputed. The TEI points\r | |
1674 | the publisher in a proper direction but does not presume to make all the\r | |
1675 | decisions for him or her. Essentially, the goal of encoding was to\r | |
1676 | eliminate, as much as possible, the hindrances to information-networking,\r | |
1677 | so that if an institution acquires a database, everybody associated with\r | |
1678 | the institution can have access to it.\r | |
1679 | \r | |
1680 | CALALUCA demonstrated a portion of Volume 160, because it had the most\r | |
1681 | anomalies in it. The software was created by Electronic Book\r | |
1682 | Technologies of Providence, RI, and is called Dynatext. The software\r | |
1683 | works only with SGML-coded data.\r | |
1684 | \r | |
1685 | Viewing a table of contents on the screen, the audience saw how Dynatext\r | |
1686 | treats each element as a book and attempts to simplify movement through a\r | |
1687 | volume. Familiarity with the Patrologia in print (i.e., the text, its\r | |
1688 | source, and the editions) will make the machine-readable versions highly\r | |
1689 | useful. (Software with a Windows application was sought for PLD,\r | |
1690 | CALALUCA said, because this was the main trend for scholarly use.)\r | |
1691 | \r | |
1692 | CALALUCA also demonstrated how a user can perform a variety of searches\r | |
1693 | and quickly move to any part of a volume; the look-up screen provides\r | |
1694 | some basic, simple word-searching. \r | |
1695 | \r | |
1696 | CALALUCA argued that one of the major difficulties is not the software. \r | |
1697 | Rather, in creating a product that will be used by scholars representing\r | |
1698 | a broad spectrum of computer sophistication, user documentation proves\r | |
1699 | to be the most important service one can provide.\r | |
1700 | \r | |
1701 | CALALUCA next illustrated a truncated search under mysterium within ten\r | |
1702 | words of virtus and how one would be able to find its contents throughout\r | |
1703 | the entire database. He said that the exciting thing about PLD is that\r | |
1704 | many of the applications in the retrieval software being written for it\r | |
1705 | will exceed the capabilities of the software employed now for the CD-ROM\r | |
1706 | version. The CD-ROM faces genuine limitations, in terms of speed and\r | |
1707 | comprehensiveness, in the creation of retrieval software to run it. \r | |
1708 | CALALUCA said he hoped that individual scholars will download the data,\r | |
1709 | if they wish, to their personal computers, and have ready access to\r | |
1710 | important texts on a constant basis, which they will be able to use in\r | |
1711 | their research and from which they might even be able to publish.\r | |
1712 | \r | |
1713 | (CALALUCA explained that the blue numbers represented Migne's column numbers,\r | |
1714 | which are the standard scholarly references. Pulling up a note, he stated\r | |
1715 | that these texts were heavily edited and the image files would appear simply\r | |
1716 | as a note as well, so that one could quickly access an image.)\r | |
1717 | \r | |
1718 | ******\r | |
1719 | \r | |
1720 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1721 | FLEISCHHAUER/ERWAY * Several problems with which AM is still wrestling *\r | |
1722 | Various search and retrieval capabilities * Illustration of automatic\r | |
1723 | stemming and a truncated search * AM's attempt to find ways to connect\r | |
1724 | cataloging to the texts * AM's gravitation towards SGML * Striking a\r | |
1725 | balance between quantity and quality * How AM furnishes users recourse to\r | |
1726 | images * Conducting a search in a full-text environment * Macintosh and\r | |
1727 | IBM prototypes of AM * Multimedia aspects of AM *\r | |
1728 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1729 | \r | |
1730 | A demonstration of American Memory by its coordinator, Carl FLEISCHHAUER,\r | |
1731 | and Ricky ERWAY, associate coordinator, Library of Congress, concluded\r | |
1732 | the morning session. Beginning with a collection of broadsides from the\r | |
1733 | Continental Congress and the Constitutional Convention, the only text\r | |
1734 | collection in a presentable form at the time of the Workshop, FLEISCHHAUER\r | |
1735 | highlighted several of the problems with which AM is still wrestling.\r | |
1736 | (In its final form, the disk will contain two collections, not only the\r | |
1737 | broadsides but also the full text with illustrations of a set of\r | |
1738 | approximately 300 African-American pamphlets from the period 1870 to 1910.)\r | |
1739 | \r | |
1740 | As FREEMAN had explained earlier, AM has attempted to use a small amount\r | |
1741 | of interpretation to introduce collections. In the present case, the\r | |
1742 | contractor, a company named Quick Source, in Silver Spring, Md., used\r | |
1743 | software called Toolbook and put together a modestly interactive\r | |
1744 | introduction to the collection. Like the two preceding speakers,\r | |
1745 | FLEISCHHAUER argued that the real asset was the underlying collection.\r | |
1746 | \r | |
1747 | FLEISCHHAUER proceeded to describe various search and retrieval\r | |
1748 | capabilities while ERWAY worked the computer. In this particular package\r | |
1749 | the "go to" pull-down allowed the user in effect to jump out of Toolbook,\r | |
1750 | where the interactive program was located, and enter the third-party\r | |
1751 | software used by AM for this text collection, which is called Personal\r | |
1752 | Librarian. This was the Windows version of Personal Librarian, a\r | |
1753 | software application put together by a company in Rockville, Md.\r | |
1754 | \r | |
1755 | Since the broadsides came from the Revolutionary War period, a search was\r | |
1756 | conducted using the words British or war, with the default operator reset\r | |
1757 | as or. FLEISCHHAUER demonstrated both automatic stemming (which finds\r | |
1758 | other forms of the same root) and a truncated search. One of Personal\r | |
1759 | Librarian's strongest features, the relevance ranking, was represented by\r | |
1760 | a chart that indicated how often words being sought appeared in\r | |
1761 | documents, with the one receiving the most "hits" obtaining the highest\r | |
1762 | score. The "hit list" that is supplied takes the relevance ranking into\r | |
1763 | account, making the first hit, in effect, the one the software has\r | |
1764 | selected as the most relevant example.\r | |
1765 | \r | |
1766 | While in the text of one of the broadside documents, FLEISCHHAUER\r | |
1767 | remarked on AM's attempt to find ways to connect cataloging to the texts,\r | |
1768 | which it does in different ways in different manifestations. In the case\r | |
1769 | shown, the cataloging was pasted on: AM took MARC records that were\r | |
1770 | written as on-line records right into one of the Library's mainframe\r | |
1771 | retrieval programs, pulled them out, and handed them off to the contractor,\r | |
1772 | who massaged them somewhat to display them in the manner shown. One of\r | |
1773 | AM's questions is, Does the cataloguing normally performed in the mainframe\r | |
1774 | work in this context, or ought AM to think through adjustments?\r | |
1775 | \r | |
1776 | FLEISCHHAUER made the additional point that, as far as the text goes, AM\r | |
1777 | has gravitated towards SGML (he pointed to the boldface in the upper part\r | |
1778 | of the screen). Although extremely limited in its ability to translate\r | |
1779 | or interpret SGML, Personal Librarian will furnish both bold and italics\r | |
1780 | on screen--a fairly easy thing to do, but it is one of the ways in which\r | |
1781 | SGML is useful.\r | |
1782 | \r | |
1783 | Striking a balance between quantity and quality has been a major concern\r | |
1784 | of AM, with accuracy being one of the places where project staff have\r | |
1785 | felt that less than 100-percent accuracy was acceptable. \r | |
1786 | FLEISCHHAUER cited the example of the standard of the rekeying industry,\r | |
1787 | namely 99.95 percent; as one service bureau informed him, to go from\r | |
1788 | 99.95 to 100 percent would double the cost.\r | |
1789 | \r | |
1790 | FLEISCHHAUER next demonstrated how AM furnishes users recourse to images,\r | |
1791 | and at the same time recalled LESK's pointed question concerning the\r | |
1792 | number of people who would look at those images and the number who would\r | |
1793 | work only with the text. If the implication of LESK's question was\r | |
1794 | sound, FLEISCHHAUER said, it raised the stakes for text accuracy and\r | |
1795 | reduced the value of the strategy for images.\r | |
1796 | \r | |
1797 | Contending that preservation is always a bugaboo, FLEISCHHAUER\r | |
1798 | demonstrated several images derived from a scan of a preservation\r | |
1799 | microfilm that AM had made. He awarded a grade of C at best, perhaps a\r | |
1800 | C minus or a C plus, for how well it worked out. Indeed, the matter of\r | |
1801 | learning if other people had better ideas about scanning in general, and,\r | |
1802 | in particular, scanning from microfilm, was one of the factors that drove\r | |
1803 | AM to attempt to think through the agenda for the Workshop. Skew, for\r | |
1804 | example, was one of the issues that AM in its ignorance had not reckoned\r | |
1805 | would prove so difficult.\r | |
1806 | \r | |
1807 | Further, the handling of images of the sort shown, in a desktop computer\r | |
1808 | environment, involved a considerable amount of zooming and scrolling. \r | |
1809 | Ultimately, AM staff feel that perhaps the paper copy that is printed out\r | |
1810 | might be the most useful one, but they remain uncertain as to how much\r | |
1811 | on-screen reading users will do.\r | |
1812 | \r | |
1813 | Returning to the text, FLEISCHHAUER asked viewers to imagine a person who\r | |
1814 | might be conducting a search in a full-text environment. With this\r | |
1815 | scenario, he proceeded to illustrate other features of Personal Librarian\r | |
1816 | that he considered helpful; for example, it provides the ability to\r | |
1817 | notice words as one reads. Clicking the "include" button on the bottom\r | |
1818 | of the search window pops the words that have been highlighted into the\r | |
1819 | search. Thus, a user can refine the search as he or she reads,\r | |
1820 | re-executing the search and continuing to find things in the quest for\r | |
1821 | materials. This software not only contains relevance ranking, Boolean\r | |
1822 | operators, and truncation, it also permits one to perform word algebra,\r | |
1823 | so to say, where one puts two or three words in parentheses and links\r | |
1824 | them with one Boolean operator and then a couple of words in another set\r | |
1825 | of parentheses and asks for things within so many words of others.\r | |
1826 | \r | |
1827 | Until they became acquainted recently with some of the work being done in\r | |
1828 | classics, the AM staff had not realized that a large number of the\r | |
1829 | projects that involve electronic texts were being done by people with a\r | |
1830 | profound interest in language and linguistics. Their search strategies\r | |
1831 | and thinking are oriented to those fields, as is shown in particular by\r | |
1832 | the Perseus example. As amateur historians, the AM staff were thinking\r | |
1833 | more of searching for concepts and ideas than for particular words. \r | |
1834 | Obviously, FLEISCHHAUER conceded, searching for concepts and ideas and\r | |
1835 | searching for words may be two rather closely related things.\r | |
1836 | \r | |
1837 | While displaying several images, FLEISCHHAUER observed that the Macintosh\r | |
1838 | prototype built by AM contains a greater diversity of formats. Echoing a\r | |
1839 | previous speaker, he said that it was easier to stitch things together in\r | |
1840 | the Macintosh, though it tended to be a little more anemic in search and\r | |
1841 | retrieval. AM, therefore, increasingly has been investigating\r | |
1842 | sophisticated retrieval engines in the IBM format.\r | |
1843 | \r | |
1844 | FLEISCHHAUER demonstrated several additional examples of the prototype\r | |
1845 | interfaces: One was AM's metaphor for the network future, in which a\r | |
1846 | kind of reading-room graphic suggests how one would be able to go around\r | |
1847 | to different materials. AM contains a large number of photographs in\r | |
1848 | analog video form worked up from a videodisc, which enable users to make\r | |
1849 | copies to print or incorporate in digital documents. A frame-grabber is\r | |
1850 | built into the system, making it possible to bring an image into a window\r | |
1851 | and digitize or print it out.\r | |
1852 | \r | |
1853 | FLEISCHHAUER next demonstrated sound recording, which included texts. \r | |
1854 | Recycled from a previous project, the collection included sixty 78-rpm\r | |
1855 | phonograph records of political speeches that were made during and\r | |
1856 | immediately after World War I. These constituted approximately three\r | |
1857 | hours of audio, as AM has digitized it, which occupy 150 megabytes on a\r | |
1858 | CD. Thus, they are considerably compressed. From the catalogue card,\r | |
1859 | FLEISCHHAUER proceeded to a transcript of a speech with the audio\r | |
1860 | available and with highlighted text following it as it played.\r | |
1861 | A photograph has been added and a transcription made.\r | |
1862 | \r | |
1863 | Considerable value has been added beyond what the Library of Congress\r | |
1864 | normally would do in cataloguing a sound recording, which raises several\r | |
1865 | questions for AM concerning where to draw lines about how much value it can\r | |
1866 | afford to add and at what point, perhaps, this becomes more than AM could\r | |
1867 | reasonably do or reasonably wish to do. FLEISCHHAUER also demonstrated\r | |
1868 | a motion picture. As FREEMAN had reported earlier, the motion picture\r | |
1869 | materials have proved the most popular, not surprisingly. This says more\r | |
1870 | about the medium, he thought, than about AM's presentation of it.\r | |
1871 | \r | |
1872 | Because AM's goal was to bring together things that could be used by\r | |
1873 | historians or by people who were curious about history,\r | |
1874 | turn-of-the-century footage seemed to represent the most appropriate\r | |
1875 | collections from the Library of Congress in motion pictures. These were\r | |
1876 | the very first films made by Thomas Edison's company and some others at\r | |
1877 | that time. The particular example illustrated was a Biograph film,\r | |
1878 | brought in with a frame-grabber into a window. A single videodisc\r | |
1879 | contains about fifty titles and pieces of film from that period, all of\r | |
1880 | New York City. Taken together, AM believes, they provide an interesting\r | |
1881 | documentary resource.\r | |
1882 | \r | |
1883 | ******\r | |
1884 | \r | |
1885 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1886 | DISCUSSION * Using the frame-grabber in AM * Volume of material processed\r | |
1887 | and to be processed * Purpose of AM within LC * Cataloguing and the\r | |
1888 | nature of AM's material * SGML coding and the question of quality versus\r | |
1889 | quantity *\r | |
1890 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1891 | \r | |
1892 | During the question-and-answer period that followed FLEISCHHAUER's\r | |
1893 | presentation, several clarifications were made.\r | |
1894 | \r | |
1895 | AM is bringing in motion pictures from a videodisc. The frame-grabber\r | |
1896 | devices create a window on a computer screen, which permits users to\r | |
1897 | digitize a single frame of the movie or one of the photographs. It\r | |
1898 | produces a crude, rough-and-ready image that high school students can\r | |
1899 | incorporate into papers, and that has worked very nicely in this way.\r | |
1900 | \r | |
1901 | Commenting on FLEISCHHAUER's assertion that AM was looking more at\r | |
1902 | searching ideas than words, MYLONAS argued that without words an idea\r | |
1903 | does not exist. FLEISCHHAUER conceded that he ought to have articulated\r | |
1904 | his point more clearly. MYLONAS stated that they were in fact both\r | |
1905 | talking about the same thing. By searching for words and by forcing\r | |
1906 | people to focus on the word, the Perseus Project felt that they would get\r | |
1907 | them to the idea. The way one reviews results is tailored more to one\r | |
1908 | kind of user than another.\r | |
1909 | \r | |
1910 | Concerning the total volume of material that has been processed in this\r | |
1911 | way, AM at this point has in retrievable form seven or eight collections,\r | |
1912 | all of them photographic. In the Macintosh environment, for example,\r | |
1913 | there probably are 35,000-40,000 photographs. The sound recordings\r | |
1914 | number sixty items. The broadsides number about 300 items. There are\r | |
1915 | 500 political cartoons in the form of drawings. The motion pictures, as\r | |
1916 | individual items, number sixty to seventy.\r | |
1917 | \r | |
1918 | AM also has a manuscript collection, the life history portion of one of\r | |
1919 | the federal project series, which will contain 2,900 individual\r | |
1920 | documents, all first-person narratives. AM has in process about 350\r | |
1921 | African-American pamphlets, or about 12,000 printed pages for the period\r | |
1922 | 1870-1910. Also in the works are some 4,000 panoramic photographs. AM\r | |
1923 | has recycled a fair amount of the work done by LC's Prints and\r | |
1924 | Photographs Division during the Library's optical disk pilot project in\r | |
1925 | the 1980s. For example, a special division of LC has tooled up and\r | |
1926 | thought through all the ramifications of electronic presentation of\r | |
1927 | photographs. Indeed, they are wheeling them out in great barrel loads. \r | |
1928 | The purpose of AM within the Library, it is hoped, is to catalyze several\r | |
1929 | of the other special collection divisions which have no particular\r | |
1930 | experience with, and in some cases mixed feelings about, an activity such as\r | |
1931 | AM. Moreover, in many cases the divisions may be characterized as\r | |
1932 | lacking experience not only in "electronifying" things but also in automated\r | |
1933 | cataloguing. MARC cataloguing as practiced in the United States is\r | |
1934 | heavily weighted toward the description of monograph and serial\r | |
1935 | materials, but is much thinner when one enters the world of manuscripts\r | |
1936 | and things that are held in the Library's music collection and other\r | |
1937 | units. In response to a comment by LESK, that AM's material is very\r | |
1938 | heavily photographic, and is so primarily because individual records have\r | |
1939 | been made for each photograph, FLEISCHHAUER observed that an item-level\r | |
1940 | catalog record exists, for example, for each photograph in the Detroit\r | |
1941 | Publishing collection of 25,000 pictures. In the case of the Federal\r | |
1942 | Writers Project, for which nearly 3,000 documents exist, representing\r | |
1943 | information from twenty-six different states, AM with the assistance of\r | |
1944 | Karen STUART of the Manuscript Division will attempt to find some way not\r | |
1945 | only to have a collection-level record but perhaps a MARC record for each\r | |
1946 | state, which will then serve as an umbrella for the 100-200 documents\r | |
1947 | that come under it. But that drama remains to be enacted. The AM staff\r | |
1948 | is conservative and clings to cataloguing, though of course visitors tout\r | |
1949 | artificial intelligence and neural networks in a manner that suggests that\r | |
1950 | perhaps one need not have cataloguing or that much of it could be put aside.\r | |
1951 | \r | |
1952 | The matter of SGML coding, FLEISCHHAUER conceded, returned the discussion\r | |
1953 | to the earlier treated question of quality versus quantity in the Library\r | |
1954 | of Congress. Of course, text conversion can be done with 100-percent\r | |
1955 | accuracy, but it means that when one's holdings are as vast as LC's only\r | |
1956 | a tiny amount will be exposed, whereas permitting lower levels of\r | |
1957 | accuracy can lead to exposing or sharing larger amounts, but with the\r | |
1958 | quality correspondingly impaired.\r | |
1959 | \r | |
1960 | ******\r | |
1961 | \r | |
1962 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1963 | TWOHIG * A contrary experience concerning electronic options * Volume of\r | |
1964 | material in the Washington papers and a suggestion of David Packard *\r | |
1965 | Implications of Packard's suggestion * Transcribing the documents for the\r | |
1966 | CD-ROM * Accuracy of transcriptions * The CD-ROM edition of the Founding\r | |
1967 | Fathers documents *\r | |
1968 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
1969 | \r | |
1970 | Finding encouragement in a comment of MICHELSON's from the morning\r | |
1971 | session--that numerous people in the humanities were choosing electronic\r | |
1972 | options to do their work--Dorothy TWOHIG, editor, The Papers of George\r | |
1973 | Washington, opened her illustrated talk by noting that her experience\r | |
1974 | with literary scholars and numerous people in editing was contrary to\r | |
1975 | MICHELSON's. TWOHIG emphasized literary scholars' complete ignorance of\r | |
1976 | the technological options available to them or their reluctance or, in\r | |
1977 | some cases, their downright hostility toward these options.\r | |
1978 | \r | |
1979 | After providing an overview of the five Founding Fathers projects\r | |
1980 | (Jefferson at Princeton, Franklin at Yale, John Adams at the\r | |
1981 | Massachusetts Historical Society, and Madison down the hall from her at\r | |
1982 | the University of Virginia), TWOHIG observed that the Washington papers,\r | |
1983 | like all of the projects, include both sides of the Washington\r | |
1984 | correspondence and deal with some 135,000 documents to be published with\r | |
1985 | extensive annotation in eighty to eighty-five volumes, a project that\r | |
1986 | will not be completed until well into the next century. Thus, it was\r | |
1987 | with considerable enthusiasm several years ago that the Washington Papers\r | |
1988 | Project (WPP) greeted David Packard's suggestion that the papers of the\r | |
1989 | Founding Fathers could be published easily and inexpensively, and to the\r | |
1990 | great benefit of American scholarship, via CD-ROM.\r | |
1991 | \r | |
1992 | In pragmatic terms, funding from the Packard Foundation would expedite\r | |
1993 | the transcription of thousands of documents waiting to be put on disk in\r | |
1994 | the WPP offices. Further, since the costs of collecting, editing, and\r | |
1995 | converting the Founding Fathers documents into letterpress editions were\r | |
1996 | running into the millions of dollars, and the considerable staffs\r | |
1997 | involved in all of these projects were devoting their careers to\r | |
1998 | producing the work, the Packard Foundation's suggestion had a\r | |
1999 | revolutionary aspect: Transcriptions of the entire corpus of the\r | |
2000 | Founding Fathers papers would be available on CD-ROM to public and\r | |
2001 | college libraries, even high schools, at a fraction of the cost--\r | |
2002 | $100-$150 for the annual license fee--of producing a limited university\r | |
2003 | press run of 1,000 of each volume of the published papers at $45-$150 per\r | |
2004 | printed volume. Given the current budget crunch in educational systems\r | |
2005 | and the corresponding constraints on librarians in smaller institutions\r | |
2006 | who wish to add these volumes to their collections, producing the\r | |
2007 | documents on CD-ROM would likely open a greatly expanded audience for the\r | |
2008 | papers. TWOHIG stressed, however, that development of the Founding\r | |
2009 | Fathers CD-ROM is still in its infancy. Serious software problems remain\r | |
2010 | to be resolved before the material can be put into readable form. \r | |
2011 | \r | |
2012 | Funding from the Packard Foundation resulted in a major push to\r | |
2013 | transcribe the 75,000 or so documents of the Washington papers remaining\r | |
2014 | to be transcribed onto computer disks. Slides illustrated several of the\r | |
2015 | problems encountered, for example, the present inability of CD-ROM to\r | |
2016 | indicate the cross-outs (deleted material) in eighteenth-century\r | |
2017 | documents. TWOHIG next described documents from various periods in the\r | |
2018 | eighteenth century that have been transcribed in chronological order and\r | |
2019 | delivered to the Packard offices in California, where they are converted\r | |
2020 | to the CD-ROM, a process that is expected to take five years to\r | |
2021 | complete (that is, reckoning from David Packard's suggestion made several\r | |
2022 | years ago, until about July 1994). TWOHIG found an encouraging\r | |
2023 | indication of the project's benefits in the ongoing use made by scholars\r | |
2024 | of the search functions of the CD-ROM, particularly in reducing the time\r | |
2025 | spent in manually turning the pages of the Washington papers.\r | |
2026 | \r | |
2027 | TWOHIG next furnished details concerning the accuracy of transcriptions. \r | |
2028 | For instance, the insertion of thousands of documents on the CD-ROM\r | |
2029 | currently does not permit each document to be verified against the\r | |
2030 | original manuscript several times as in the case of documents that appear\r | |
2031 | in the published edition. However, the transcriptions receive a cursory\r | |
2032 | check for obvious typos, the misspellings of proper names, and other\r | |
2033 | errors from the WPP CD-ROM editor. Eventually, all documents that appear\r | |
2034 | in the electronic version will be checked by project editors. Although\r | |
2035 | this process has met with opposition from some of the editors on the\r | |
2036 | grounds that imperfect work may leave their offices, the advantages in\r | |
2037 | making this material available as a research tool outweigh fears about the\r | |
2038 | misspelling of proper names and other relatively minor editorial matters.\r | |
2039 | \r | |
2040 | Completion of all five Founding Fathers projects (i.e., retrievability\r | |
2041 | and searchability of all of the documents by proper names, alternate\r | |
2042 | spellings, or varieties of subjects) will provide one of the richest\r | |
2043 | sources of this size for the history of the United States in the latter\r | |
2044 | part of the eighteenth century. Further, publication on CD-ROM will\r | |
2045 | allow editors to include even minutiae, such as laundry lists, not\r | |
2046 | included in the printed volumes.\r | |
2047 | \r | |
2048 | It seems possible that the extensive annotation provided in the printed\r | |
2049 | volumes eventually will be added to the CD-ROM edition, pending\r | |
2050 | negotiations with the publishers of the papers. At the moment, the\r | |
2051 | Founding Fathers CD-ROM is accessible only on the IBYCUS, a computer\r | |
2052 | developed out of the Thesaurus Linguae Graecae project and designed for\r | |
2053 | the use of classical scholars. There are perhaps 400 IBYCUS computers in\r | |
2054 | the country, most of which are in university classics departments. \r | |
2055 | Ultimately, it is anticipated that the CD-ROM edition of the Founding\r | |
2056 | Fathers documents will run on any IBM-compatible or Macintosh computer\r | |
2057 | with a CD-ROM drive. Numerous changes in the software will also occur\r | |
2058 | before the project is completed. (Editor's note: an IBYCUS was\r | |
2059 | unavailable to demonstrate the CD-ROM.)\r | |
2060 | \r | |
2061 | ******\r | |
2062 | \r | |
2063 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2064 | DISCUSSION * Several additional features of WPP clarified *\r | |
2065 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2066 | \r | |
2067 | Discussion following TWOHIG's presentation served to clarify several\r | |
2068 | additional features, including (1) that the project's primary\r | |
2069 | intellectual product consists in the electronic transcription of the\r | |
2070 | material; (2) that the text transmitted to the CD-ROM people is not\r | |
2071 | marked up; (3) that cataloging and subject-indexing of the material\r | |
2072 | remain to be worked out (though at this point material can be retrieved\r | |
2073 | by name); and (4) that because all the searching is done in the hardware,\r | |
2074 | the IBYCUS is designed to read a CD-ROM which contains only sequential\r | |
2075 | text files. Technically, it then becomes very easy to read the material\r | |
2076 | off and put it on another device.\r | |
2077 | \r | |
2078 | ******\r | |
2079 | \r | |
2080 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2081 | LEBRON * Overview of the history of the joint project between AAAS and\r | |
2082 | OCLC * Several practices the on-line environment shares with traditional\r | |
2083 | publishing on hard copy * Several technical and behavioral barriers to\r | |
2084 | electronic publishing * How AAAS and OCLC arrived at the subject of\r | |
2085 | clinical trials * Advantages of the electronic format and other features\r | |
2086 | of OJCCT * An illustrated tour of the journal *\r | |
2087 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2088 | \r | |
2089 | Maria LEBRON, managing editor, The Online Journal of Current Clinical\r | |
2090 | Trials (OJCCT), presented an illustrated overview of the history of the\r | |
2091 | joint project between the American Association for the Advancement of\r | |
2092 | Science (AAAS) and the Online Computer Library Center, Inc. (OCLC). The\r | |
2093 | joint venture between AAAS and OCLC owes its beginning to a\r | |
2094 | reorganization launched by the new chief executive officer at OCLC about\r | |
2095 | three years ago and combines the strengths of these two disparate\r | |
2096 | organizations. In short, OJCCT represents the process of scholarly\r | |
2097 | publishing on line.\r | |
2098 | \r | |
2099 | LEBRON next discussed several practices the on-line environment shares\r | |
2100 | with traditional publishing on hard copy--for example, peer review of\r | |
2101 | manuscripts--that are highly important in the academic world. LEBRON\r | |
2102 | noted in particular the implications of citation counts for tenure\r | |
2103 | committees and grants committees. In the traditional hard-copy\r | |
2104 | environment, citation counts are readily demonstrable, whereas the\r | |
2105 | on-line environment represents an ethereal medium to most academics.\r | |
2106 | \r | |
2107 | LEBRON remarked on several technical and behavioral barriers to electronic\r | |
2108 | publishing, for instance, the problems in transmission created by special\r | |
2109 | characters or by complex graphics and halftones. In addition, she noted\r | |
2110 | economic limitations such as the storage costs of maintaining back issues\r | |
2111 | and market or audience education.\r | |
2112 | \r | |
2113 | Manuscripts cannot be uploaded to OJCCT, LEBRON explained, because it is\r | |
2114 | not a bulletin board or E-mail, forms of electronic transmission of\r | |
2115 | information that have created an ambience clouding people's understanding\r | |
2116 | of what the journal is attempting to do. OJCCT, which publishes\r | |
2117 | peer-reviewed medical articles dealing with the subject of clinical\r | |
2118 | trials, includes text, tabular material, and graphics, although at this\r | |
2119 | time it can transmit only line illustrations.\r | |
2120 | \r | |
2121 | Next, LEBRON described how AAAS and OCLC arrived at the subject of\r | |
2122 | clinical trials: It is 1) a highly statistical discipline that 2) does\r | |
2123 | not require halftones but can satisfy the needs of its audience with line\r | |
2124 | illustrations and graphic material, and 3) has a need for the speedy\r | |
2125 | dissemination of high-quality research results. Clinical trials are\r | |
2126 | research activities that involve the administration of a test treatment\r | |
2127 | to some experimental unit in order to test its usefulness before it is\r | |
2128 | made available to the general population. LEBRON proceeded to give\r | |
2129 | additional information on OJCCT concerning its editor-in-chief, editorial\r | |
2130 | board, editorial content, and the types of articles it publishes\r | |
2131 | (including peer-reviewed research reports and reviews), as well as\r | |
2132 | features shared by other traditional hard-copy journals.\r | |
2133 | \r | |
2134 | Among the advantages of the electronic format are faster dissemination of\r | |
2135 | information, including raw data, and the absence of space constraints\r | |
2136 | because pages do not exist. (This latter fact creates an interesting\r | |
2137 | situation when it comes to citations.) Nor are there any issues. AAAS's\r | |
2138 | capacity to download materials directly from the journal to a\r | |
2139 | subscriber's printer, hard drive, or floppy disk helps ensure highly\r | |
2140 | accurate transcription. Other features of OJCCT include on-screen alerts\r | |
2141 | that allow linkage of subsequently published documents to the original\r | |
2142 | documents; on-line searching by subject, author, title, etc.; indexing of\r | |
2143 | every single word that appears in an article; viewing access to an\r | |
2144 | article by component (abstract, full text, or graphs); numbered\r | |
2145 | paragraphs to replace page counts; publication in Science every thirty\r | |
2146 | days of indexing of all articles published in the journal;\r | |
2147 | typeset-quality screens; and Hypertext links that enable subscribers to\r | |
2148 | bring up Medline abstracts directly without leaving the journal.\r | |
2149 | \r | |
2150 | After detailing the two primary ways to gain access to the journal,\r | |
2151 | through the OCLC network and CompuServe if one desires graphics or through\r | |
2152 | the Internet if just an ASCII file is desired, LEBRON illustrated the\r | |
2153 | speedy editorial process and the coding of the document using SGML tags\r | |
2154 | after it has been accepted for publication. She also gave an illustrated\r | |
2155 | tour of the journal, its search-and-retrieval capabilities in particular,\r | |
2156 | but also including problems associated with scanning in illustrations,\r | |
2157 | and the importance of on-screen alerts to the medical profession re\r | |
2158 | retractions or corrections, or more frequently, editorials, letters to\r | |
2159 | the editors, or follow-up reports. She closed by inviting the audience\r | |
2160 | to join AAAS on 1 July, when OJCCT was scheduled to go on-line.\r | |
2161 | \r | |
2162 | ******\r | |
2163 | \r | |
2164 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2165 | DISCUSSION * Additional features of OJCCT *\r | |
2166 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2167 | \r | |
2168 | In the lengthy discussion that followed LEBRON's presentation, these\r | |
2169 | points emerged:\r | |
2170 | \r | |
2171 | * The SGML text can be tailored as users wish.\r | |
2172 | \r | |
2173 | * All these articles have a fairly simple document definition.\r | |
2174 | \r | |
2175 | * Document-type definitions (DTDs) were developed and given to OJCCT\r | |
2176 | for coding.\r | |
2177 | \r | |
2178 | * No articles will be removed from the journal. (Because there are\r | |
2179 | no back issues, there are no lost issues either. Once a subscriber\r | |
2180 | logs onto the journal he or she has access not only to the currently\r | |
2181 | published materials, but retrospectively to everything that has been\r | |
2182 | published in it. Thus the table of contents grows bigger. The date\r | |
2183 | of publication serves to distinguish between currently published\r | |
2184 | materials and older materials.)\r | |
2185 | \r | |
2186 | * The pricing system for the journal resembles that for most medical\r | |
2187 | journals: for 1992, $95 for a year, plus telecommunications charges\r | |
2188 | (there are no connect time charges); for 1993, $110 for the\r | |
2189 | entire year for single users, though the journal can be put on a\r | |
2190 | local area network (LAN). However, only one person can access the\r | |
2191 | journal at a time. Site licenses may come in the future.\r | |
2192 | \r | |
2193 | * AAAS is working closely with colleagues at OCLC to display\r | |
2194 | mathematical equations on screen.\r | |
2195 | \r | |
2196 | * Without compromising any steps in the editorial process, the\r | |
2197 | technology has reduced the time lag between the time a manuscript is\r | |
2198 | originally submitted and the time it is accepted; the review process\r | |
2199 | does not differ greatly from the standard six-to-eight weeks\r | |
2200 | employed by many of the hard-copy journals. The process still\r | |
2201 | depends on people.\r | |
2202 | \r | |
2203 | * As far as a preservation copy is concerned, articles will be\r | |
2204 | maintained on the computer permanently and subscribers, as part of\r | |
2205 | their subscription, will receive a microfiche-quality archival copy\r | |
2206 | of everything published during that year; in addition, reprints can\r | |
2207 | be purchased in much the same way as in a hard-copy environment. \r | |
2208 | Hard copies are prepared but are not the primary medium for the\r | |
2209 | dissemination of the information.\r | |
2210 | \r | |
2211 | * Because OJCCT is not yet on line, it is difficult to know how many\r | |
2212 | people would simply browse through the journal on the screen as\r | |
2213 | opposed to downloading the whole thing and printing it out; a mix of\r | |
2214 | both types of users likely will result.\r | |
2215 | \r | |
2216 | ******\r | |
2217 | \r | |
2218 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2219 | PERSONIUS * Developments in technology over the past decade * The CLASS\r | |
2220 | Project * Advantages for technology and for the CLASS Project *\r | |
2221 | Developing a network application an underlying assumption of the project\r | |
2222 | * Details of the scanning process * Print-on-demand copies of books *\r | |
2223 | Future plans include development of a browsing tool *\r | |
2224 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2225 | \r | |
2226 | Lynne PERSONIUS, assistant director, Cornell Information Technologies for\r | |
2227 | Scholarly Information Services, Cornell University, first commented on\r | |
2228 | the tremendous impact that developments in technology over the past ten\r | |
2229 | years--networking, in particular--have had on the way information is\r | |
2230 | handled, and how, in her own case, these developments have counterbalanced\r | |
2231 | Cornell's relative geographical isolation. Other significant technologies\r | |
2232 | include scanners, which are much more sophisticated than they were ten years\r | |
2233 | ago; mass storage and the dramatic savings that result from it in terms of\r | |
2234 | both space and money relative to twenty or thirty years ago; new and\r | |
2235 | improved printing technologies, which have greatly affected the distribution\r | |
2236 | of information; and, of course, digital technologies, whose applicability to\r | |
2237 | library preservation remains at issue.\r | |
2238 | \r | |
2239 | Given that context, PERSONIUS described the College Library Access and\r | |
2240 | Storage System (CLASS) Project, a library preservation project,\r | |
2241 | primarily, and what has been accomplished. Directly funded by the\r | |
2242 | Commission on Preservation and Access and by the Xerox Corporation, which\r | |
2243 | has provided a significant amount of hardware, the CLASS Project has been\r | |
2244 | working with a development team at Xerox to develop a software\r | |
2245 | application tailored to library preservation requirements. Within\r | |
2246 | Cornell, participants in the project have been working jointly with both\r | |
2247 | library and information technologies. The focus of the project has been\r | |
2248 | on reformatting and saving books that are in brittle condition. \r | |
2249 | PERSONIUS showed Workshop participants a brittle book, and described how\r | |
2250 | such books were the result of developments in papermaking around the\r | |
2251 | beginning of the Industrial Revolution. The papermaking process was\r | |
2252 | changed so that a significant amount of acid was introduced into the\r | |
2253 | actual paper itself, which deteriorates as it sits on library shelves.\r | |
2254 | \r | |
2255 | One of the advantages for technology and for the CLASS Project is that\r | |
2256 | the information in brittle books is mostly out of copyright and thus\r | |
2257 | offers an opportunity to work with material that requires library\r | |
2258 | preservation, and to create and work on an infrastructure to save the\r | |
2259 | material. Acknowledging the familiarity of those working in preservation\r | |
2260 | with this information, PERSONIUS noted that several things are being\r | |
2261 | done: the primary preservation technology used today is photocopying of\r | |
2262 | brittle material. Saving the intellectual content of the material is the\r | |
2263 | main goal. With microfilm copy, the intellectual content is preserved on\r | |
2264 | the assumption that in the future the image can be reformatted in any\r | |
2265 | other way that then exists.\r | |
2266 | \r | |
2267 | An underlying assumption of the CLASS Project from the beginning was\r | |
2268 | that it would develop a network application. Project staff scan books\r | |
2269 | at a workstation located in the library, near the brittle material.\r | |
2270 | An image-server filing system is located at a distance from that\r | |
2271 | workstation, and a printer is located in another building. All of the\r | |
2272 | materials digitized and stored on the image-filing system are cataloged\r | |
2273 | in the on-line catalogue. In fact, a record for each of these electronic\r | |
2274 | books is stored in the RLIN database so that a record exists of what is\r | |
2275 | in the digital library through standard catalogue procedures. In the\r | |
2276 | future, researchers working from their own workstations in their offices,\r | |
2277 | or their networks, will have access--wherever they might be--through a\r | |
2278 | request server being built into the new digital library. A second\r | |
2279 | assumption is that the preferred means of finding the material will be by\r | |
2280 | looking through a catalogue. PERSONIUS described the scanning process,\r | |
2281 | which uses a prototype scanner being developed by Xerox and which scans a\r | |
2282 | very high resolution image at great speed. Another significant feature,\r | |
2283 | because this is a preservation application, is the placing of the pages\r | |
2284 | that fall apart one by one on the platen. Ordinarily, a scanner could\r | |
2285 | be used with some sort of a document feeder, but because of this\r | |
2286 | application that is not feasible. Further, because CLASS is a\r | |
2287 | preservation application, after the paper replacement is made there, a\r | |
2288 | very careful quality control check is performed. An original book is\r | |
2289 | compared to the printed copy and verification is made, before proceeding,\r | |
2290 | that all of the image, all of the information, has been captured. Then,\r | |
2291 | a new library book is produced: The printed images are rebound by a\r | |
2292 | commercial binder and a new book is returned to the shelf. \r | |
2293 | Significantly, the books returned to the library shelves are beautiful\r | |
2294 | and useful replacements on acid-free paper that should last a long time,\r | |
2295 | in effect, the equivalent of preservation photocopies. Thus, the project\r | |
2296 | has a library of digital books. In essence, CLASS is scanning and\r | |
2297 | storing books as 600 dot-per-inch bit-mapped images, compressed using\r | |
2298 | Group 4 CCITT (i.e., the French acronym for International Consultative\r | |
2299 | Committee for Telegraph and Telephone) compression. They are stored as\r | |
2300 | TIFF files on an optical filing system that is composed of a database\r | |
2301 | used for searching and locating the books and an optical jukebox that\r | |
2302 | stores 64 twelve-inch platters. A very-high-resolution printed copy of\r | |
2303 | these books at 600 dots per inch is created, using a Xerox DocuTech\r | |
2304 | printer to make the paper replacements on acid-free paper.\r | |
2305 | \r | |
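The storage format described above implies sizeable images before
compression. A minimal sketch of the arithmetic follows; the 600
dot-per-inch, one-bit-per-pixel format is from the presentation, but the
6-by-9-inch page and the 300-page book are hypothetical values chosen
only for illustration, and no Group 4 compression ratio is assumed
because it varies with page content.

```python
# Uncompressed size of a bit-mapped (1 bit per pixel) page image.
# The page dimensions and page count below are hypothetical examples.

def uncompressed_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Return the size in bytes of a 1-bit-per-pixel scan of one page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each scan line is padded to a whole number of bytes.
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px

page_bytes = uncompressed_page_bytes(6, 9, 600)   # hypothetical 6 x 9 inch page
book_bytes = page_bytes * 300                     # hypothetical 300-page book

print(f"one page: {page_bytes / 1_000_000:.2f} MB uncompressed")
print(f"300 pages: {book_bytes / 1_000_000:.0f} MB uncompressed")
```

Figures of this order suggest why the page images are compressed before
being written to the optical platters.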
2306 | PERSONIUS maintained that the CLASS Project presents an opportunity to\r | |
2307 | introduce people to books as digital images by using a paper medium. \r | |
2308 | Books are returned to the shelves while people are also given the ability\r | |
2309 | to print on demand--to make their own copies of books. (PERSONIUS\r | |
2310 | distributed copies of an engineering journal published by engineering\r | |
2311 | students at Cornell around 1900 as an example of what a print-on-demand\r | |
2312 | copy of material might be like. This very cheap copy would be available\r | |
2313 | to people to use for their own research purposes and would bridge the gap\r | |
2314 | between an electronic work and the paper that readers like to have.) \r | |
2315 | PERSONIUS then attempted to illustrate a very early prototype of\r | |
2316 | networked access to this digital library. Xerox Corporation has\r | |
2317 | developed a prototype of a view station that can send images across the\r | |
2318 | network to be viewed.\r | |
2319 | \r | |
2320 | The particular library brought down for demonstration contained two\r | |
2321 | mathematics books. CLASS will spend the next year continuing to\r | |
2322 | develop an application that allows people at workstations to browse\r | |
2323 | the books. Thus, CLASS is developing a browsing tool, on the assumption\r | |
2324 | that users do not want to read an entire book from a workstation, but\r | |
2325 | would prefer to be able to look through and decide if they would like to\r | |
2326 | have a printed copy of it.\r | |
2327 | \r | |
2328 | ******\r | |
2329 | \r | |
2330 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2331 | DISCUSSION * Re retrieval software * "Digital file copyright" * Scanning\r | |
2332 | rate during production * Autosegmentation * Criteria employed in\r | |
2333 | selecting books for scanning * Compression and decompression of images *\r | |
2334 | OCR not precluded *\r | |
2335 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2336 | \r | |
2337 | During the question-and-answer period that followed her presentation,\r | |
2338 | PERSONIUS made these additional points:\r | |
2339 | \r | |
2340 | * Re retrieval software, Cornell is developing a Unix-based server\r | |
2341 | as well as clients for the server that support multiple platforms\r | |
2342 | (Macintosh, IBM and Sun workstations), in the hope that people from\r | |
2343 | any of those platforms will retrieve books; a further operating\r | |
2344 | assumption is that standard interfaces will be used as much as\r | |
2345 | possible, where standards can be put in place, because CLASS\r | |
2346 | considers this retrieval software a library application and would\r | |
2347 | like to be able to look at material not only at Cornell but at other\r | |
2348 | institutions.\r | |
2349 | \r | |
2350 | * The phrase "digital file copyright by Cornell University" was\r | |
2351 | added at the advice of Cornell's legal staff with the caveat that it\r | |
2352 | probably would not hold up in court. Cornell does not want people\r | |
2353 | to copy its books and sell them but would like to keep them\r | |
2354 | available for use in a library environment for library purposes.\r | |
2355 | \r | |
2356 | * In production the scanner can scan about 300 pages per hour,\r | |
2357 | capturing 600 dots per inch.\r | |
2358 | \r | |
2359 | * The Xerox software has filters to scan halftone material and avoid\r | |
2360 | the moire patterns that occur when halftone material is scanned. \r | |
2361 | Xerox has been working on hardware and software that would enable\r | |
2362 | the scanner itself to recognize this situation and deal with it\r | |
2363 | appropriately--a kind of autosegmentation that would enable the\r | |
2364 | scanner to handle halftone material as well as text on a single page.\r | |
2365 | \r | |
2366 | * The books subjected to the elaborate process described above were\r | |
2367 | selected because CLASS is a preservation project, with the first 500\r | |
2368 | books selected coming from Cornell's mathematics collection, because\r | |
2369 | they were still being heavily used and because, although they were\r | |
2370 | in need of preservation, the mathematics library and the mathematics\r | |
2371 | faculty were uncomfortable having them microfilmed. (They wanted a\r | |
2372 | printed copy.) Thus, these books became a logical choice for this\r | |
2373 | project. Other books were chosen by the project's selection committees\r | |
2374 | for experiments with the technology, as well as to meet a demand or need.\r | |
2375 | \r | |
2376 | * Images will be decompressed before they are sent over the line; at\r | |
2377 | this time they are compressed and sent to the image filing system\r | |
2378 | and then sent to the printer as compressed images; they are returned\r | |
2379 | to the workstation as compressed 600-dpi images and the workstation\r | |
2380 | decompresses and scales them for display--an inefficient way to\r | |
2381 | access the material though it works quite well for printing and\r | |
2382 | other purposes.\r | |
2383 | \r | |
2384 | * CLASS is also decompressing on Macintosh and IBM, a slow process\r | |
2385 | right now. Eventually, compression and decompression will take\r | |
2386 | place on an image conversion server. Trade-offs will be made, based\r | |
2387 | on future performance testing, concerning where the file is\r | |
2388 | compressed and what resolution image is sent.\r | |
2389 | \r | |
2390 | * OCR has not been precluded; images are being stored that have been\r | |
2391 | scanned at a high resolution, which presumably would make them well\r | |
2392 | suited to an OCR process. Because the material being scanned is about 100\r | |
2393 | years old and was printed with less-than-ideal technologies, very\r | |
2394 | early and preliminary tests have not produced good results. But the\r | |
2395 | project is capturing an image that is of sufficient resolution to be\r | |
2396 | subjected to OCR in the future. Moreover, the system architecture\r | |
2397 | and the system plan have a logical place to store an OCR image if it\r | |
2398 | has been captured. But that is not being done now.\r | |
2399 | \r | |
2400 | ******\r | |
2401 | \r | |
2402 | SESSION III. DISTRIBUTION, NETWORKS, AND NETWORKING: OPTIONS FOR\r | |
2403 | DISSEMINATION\r | |
2404 | \r | |
2405 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2406 | ZICH * Issues pertaining to CD-ROMs * Options for publishing in CD-ROM *\r | |
2407 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2408 | \r | |
2409 | Robert ZICH, special assistant to the associate librarian for special\r | |
2410 | projects, Library of Congress, and moderator of this session, first noted\r | |
2411 | the blessed but somewhat awkward circumstance of having four very\r | |
2412 | distinguished people representing networks and networking or at least\r | |
2413 | leaning in that direction, while lacking anyone to speak from the\r | |
2414 | strongest possible background in CD-ROMs. ZICH expressed the hope that\r | |
2415 | members of the audience would join the discussion. He stressed the\r | |
2416 | subtitle of this particular session, "Options for Dissemination," and,\r | |
2417 | concerning CD-ROMs, the importance of determining when it would be wise\r | |
2418 | to consider dissemination in CD-ROM versus networks. A shopping list of\r | |
2419 | issues pertaining to CD-ROMs included: the grounds for selecting\r | |
2420 | commercial publishers, and in-house publication where possible versus\r | |
2421 | nonprofit or government publication. A similar list for networks\r | |
2422 | included: determining when one should consider dissemination through a\r | |
2423 | network, identifying the mechanisms or entities that exist to place items\r | |
2424 | on networks, identifying the pool of existing networks, determining how a\r | |
2425 | producer would choose between networks, and identifying the elements of\r | |
2426 | a business arrangement in a network.\r | |
2427 | \r | |
2428 | Options for publishing in CD-ROM: an outside publisher versus\r | |
2429 | self-publication. If an outside publisher is used, it can be nonprofit,\r | |
2430 | such as the Government Printing Office (GPO) or the National Technical\r | |
2431 | Information Service (NTIS), in the case of government. The pros and cons\r | |
2432 | associated with employing an outside publisher are obvious. Among the\r | |
2433 | pros, there is no trouble getting accepted. One pays the bill and, in\r | |
2434 | effect, goes one's way. Among the cons, when one pays an outside\r | |
2435 | publisher to perform the work, that publisher will perform the work it is\r | |
2436 | obliged to do, but perhaps without the production expertise and skill in\r | |
2437 | marketing and dissemination that some would seek. There is the body of\r | |
2438 | commercial publishers that do possess that kind of expertise in\r | |
2439 | distribution and marketing but that obviously are selective. In\r | |
2440 | self-publication, one exercises full control, but then one must handle\r | |
2441 | matters such as distribution and marketing. Such are some of the options\r | |
2442 | for publishing in the case of CD-ROM.\r | |
2443 | \r | |
2444 | Technical and design issues, which are also important, involve a number\r | |
2445 | of matters about which many at the Workshop already knew a good deal:\r | |
2446 | retrieval system requirements and costs, what to do about\r | |
2447 | images, the various capabilities and platforms, the trade-offs between\r | |
2448 | cost and performance, concerns about local-area networkability,\r | |
2449 | interoperability, etc.\r | |
2450 | \r | |
2451 | ******\r | |
2452 | \r | |
2453 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2454 | LYNCH * Creating networked information is different from using networks\r | |
2455 | as an access or dissemination vehicle * Networked multimedia on a large\r | |
2456 | scale does not yet work * Typical CD-ROM publication model a two-edged\r | |
2457 | sword * Publishing information on a CD-ROM in the present world of\r | |
2458 | immature standards * Contrast between CD-ROM and network pricing *\r | |
2459 | Examples demonstrated earlier in the day as a set of insular information\r | |
2460 | gems * Paramount need to link databases * Layering to become increasingly\r | |
2461 | necessary * Project NEEDS and the issues of information reuse and active\r | |
2462 | versus passive use * X-Windows as a way of differentiating between\r | |
2463 | network access and networked information * Barriers to the distribution\r | |
2464 | of networked multimedia information * Need for good, real-time delivery\r | |
2465 | protocols * The question of presentation integrity in client-server\r | |
2466 | computing in the academic world * Recommendations for producing multimedia\r | |
2467 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2468 | \r | |
2469 | Clifford LYNCH, director, Library Automation, University of California,\r | |
2470 | opened his talk with the general observation that networked information\r | |
2471 | constituted a difficult and elusive topic because it is something just\r | |
2472 | starting to develop and not yet fully understood. LYNCH contended that\r | |
2473 | creating genuinely networked information was different from using\r | |
2474 | networks as an access or dissemination vehicle and was more sophisticated\r | |
2475 | and more subtle. He invited the members of the audience to extrapolate,\r | |
2476 | from what they heard about the preceding demonstration projects, to what\r | |
2477 | sort of a world of electronic information--scholarly, archival,\r | |
2478 | cultural, etc.--they wished to end up with ten or fifteen years from now. \r | |
2479 | LYNCH suggested that to extrapolate directly from these projects would\r | |
2480 | produce unpleasant results.\r | |
2481 | \r | |
2482 | Putting the issue of CD-ROM in perspective before getting into\r | |
2483 | generalities on networked information, LYNCH observed that those engaged\r | |
2484 | in multimedia today who wish to ship a product, so to say, probably do\r | |
2485 | not have much choice except to use CD-ROM: networked multimedia on a\r | |
2486 | large scale basically does not yet work because the technology does not\r | |
2487 | exist. For example, anybody who has tried moving images around over the\r | |
2488 | Internet knows that this is an exciting touch-and-go process, a\r | |
2489 | fascinating and fertile area for experimentation, research, and\r | |
2490 | development, but not something that one can become deeply enthusiastic\r | |
2491 | about committing to production systems at this time.\r | |
2492 | \r | |
2493 | This situation will change, LYNCH said. He differentiated CD-ROM from\r | |
2494 | the practices that have been followed up to now in distributing data on\r | |
2495 | CD-ROM. For LYNCH the problem with CD-ROM is not its portability or its\r | |
2496 | slowness but the two-edged sword of having the retrieval application and\r | |
2497 | the user interface inextricably bound up with the data, which is the\r | |
2498 | typical CD-ROM publication model. It is not a case of publishing data\r | |
2499 | but of distributing a typically stand-alone, typically closed system,\r | |
2500 | all--software, user interface, and data--on a little disk. Hence, all\r | |
2501 | the between-disk navigational issues as well as the impossibility in most\r | |
2502 | cases of integrating data on one disk with that on another. Most CD-ROM\r | |
2503 | retrieval software does not network very gracefully at present. However,\r | |
2504 | in the present world of immature standards and lack of understanding of\r | |
2505 | what network information is or what the ground rules are for creating or\r | |
2506 | using it, publishing information on a CD-ROM does add value in a very\r | |
2507 | real sense.\r | |
2508 | \r | |
2509 | LYNCH drew a contrast between CD-ROM and network pricing and in doing so\r | |
2510 | highlighted something bizarre in information pricing. A large\r | |
2511 | institution such as the University of California has vendors who will\r | |
2512 | offer to sell information on CD-ROM for a price per year in four digits,\r | |
2513 | but for the same data (e.g., an abstracting and indexing database) on\r | |
2514 | magnetic tape, regardless of how many people may use it concurrently,\r | |
2515 | will quote a price in six digits.\r | |
2516 | \r | |
2517 | What is packaged with the CD-ROM in one sense adds value--a complete\r | |
2518 | access system, not just raw, unrefined information--although it is not\r | |
2519 | generally perceived that way. This is because the access software,\r | |
2520 | although it adds value, is viewed by some people, particularly in the\r | |
2521 | university environment where there is a very heavy commitment to\r | |
2522 | networking, as being developed in the wrong direction.\r | |
2523 | \r | |
2524 | Given that context, LYNCH described the examples demonstrated as a set of\r | |
2525 | insular information gems--Perseus, for example, offers nicely linked\r | |
2526 | information, but would be very difficult to integrate with other\r | |
2527 | databases, that is, to link together seamlessly with source files\r | |
2528 | from other sources. It resembles an island, and in this respect is\r | |
2529 | similar to numerous stand-alone projects that are based on videodiscs,\r | |
2530 | that is, on the single-workstation concept.\r | |
2531 | \r | |
2532 | As scholarship evolves in a network environment, the paramount need will\r | |
2533 | be to link databases. We must link personal databases to public\r | |
2534 | databases, to group databases, in fairly seamless ways--which is\r | |
2535 | extremely difficult in the environments under discussion with copies of\r | |
2536 | databases proliferating all over the place.\r | |
2537 | \r | |
2538 | The notion of layering also struck LYNCH as lurking in several of the\r | |
2539 | projects demonstrated. Several databases in a sense constitute\r | |
2540 | information archives without a significant amount of navigation built in. \r | |
2541 | Educators, critics, and others will want a layered structure--one that\r | |
2542 | defines or links paths through the layers to allow users to reach\r | |
2543 | specific points. In LYNCH's view, layering will become increasingly\r | |
2544 | necessary, and not just within a single resource but across resources\r | |
2545 | (e.g., tracing mythology and cultural themes across several classics\r | |
2546 | databases as well as a database of Renaissance culture). This ability to\r | |
2547 | organize resources, to build things out of multiple other things on the\r | |
2548 | network or select pieces of it, represented for LYNCH one of the key\r | |
2549 | aspects of network information.\r | |
2550 | \r | |
2551 | Contending that information reuse constituted another significant issue,\r | |
2552 | LYNCH commended to the audience's attention Project NEEDS (i.e., National\r | |
2553 | Engineering Education Delivery System). This project's objective is to\r | |
2554 | produce a database of engineering courseware as well as the components\r | |
2555 | that can be used to develop new courseware. In a number of the existing\r | |
2556 | applications, LYNCH said, the issue of reuse (how much one can take apart\r | |
2557 | and reuse in other applications) was not being well considered. He also\r | |
2558 | raised the issue of active versus passive use, one aspect of which is\r | |
2559 | how much information will be manipulated locally by users. Most people,\r | |
2560 | he argued, may do a little browsing and then will wish to print. LYNCH\r | |
2561 | was uncertain how these resources would be used by the vast majority of\r | |
2562 | users in the network environment.\r | |
2563 | \r | |
2564 | LYNCH next said a few words about X-Windows as a way of differentiating\r | |
2565 | between network access and networked information. A number of the\r | |
2566 | applications demonstrated at the Workshop could be rewritten to use X\r | |
2567 | across the network, so that one could run them from any X-capable\r | |
2568 | device--a workstation, an X terminal--and transact with a database across the\r | |
2569 | network. Although this opens up access a little, assuming one has enough\r | |
2570 | network capacity to handle it, it does not provide an interface to develop a\r | |
2571 | program that conveniently integrates information from multiple databases. \r | |
2572 | X is a viewing technology that has limits. In a real sense, it is just a\r | |
2573 | graphical version of remote log-in across the network. X-type applications\r | |
2574 | represent only one step in the progression towards real access.\r | |
2575 | \r | |
2576 | LYNCH next discussed barriers to the distribution of networked multimedia\r | |
2577 | information. The heart of the problem is a lack of standards to provide\r | |
2578 | the ability for computers to talk to each other, retrieve information,\r | |
2579 | and shuffle it around fairly casually. At the moment, little progress is\r | |
2580 | being made on standards for networked information; for example, present\r | |
2581 | standards do not cover images, digital voice, and digital video. A\r | |
2582 | useful tool kit of exchange formats for basic texts is only now being\r | |
2583 | assembled. The synchronization of content streams (i.e., synchronizing a\r | |
2584 | voice track to a video track, establishing temporal relations between\r | |
2585 | different components in a multimedia object) constitutes another issue\r | |
2586 | for networked multimedia that is just beginning to receive attention.\r | |
2587 | \r | |
2588 | Underlying network protocols also need some work; good, real-time\r | |
2589 | delivery protocols on the Internet do not yet exist. In LYNCH's view,\r | |
2590 | highly important in this context is the notion of networked digital\r | |
2591 | object IDs, the ability of one object on the network to point to another\r | |
2592 | object (or component thereof) on the network. Serious bandwidth issues\r | |
2593 | also exist. LYNCH was uncertain if billion-bit-per-second networks would\r | |
2594 | prove sufficient if numerous people ran video in parallel.\r | |
2595 | \r | |
2596 | LYNCH concluded by offering an issue for database creators to consider,\r | |
2597 | as well as several comments about what might constitute good trial\r | |
2598 | multimedia experiments. In a networked information world the database\r | |
2599 | builder or service builder (publisher) does not exercise the same\r | |
2600 | extensive control over the integrity of the presentation; strange\r | |
2601 | programs "munge" one's data before the user sees it. Serious\r | |
2602 | thought must be given to what guarantees integrity of presentation. Part\r | |
2603 | of that is related to where one draws the boundaries around a networked\r | |
2604 | information service. This question of presentation integrity in\r | |
2605 | client-server computing has not been stressed enough in the academic\r | |
2606 | world, LYNCH argued, though commercial service providers deal with it\r | |
2607 | regularly.\r | |
2608 | \r | |
2609 | Concerning multimedia, LYNCH observed that good multimedia at the moment\r | |
2610 | is hideously expensive to produce. He recommended producing multimedia\r | |
2611 | with either very high sale value, or multimedia with a very long life\r | |
2612 | span, or multimedia that will have a very broad usage base and whose\r | |
2613 | costs therefore can be amortized among large numbers of users. In this\r | |
2614 | connection, historical and humanistically oriented material may be a good\r | |
2615 | place to start, because it tends to have a longer life span than much of\r | |
2616 | the scientific material, as well as a wider user base. LYNCH noted, for\r | |
2617 | example, that American Memory fits many of the criteria outlined. He\r | |
2618 | remarked on the extensive discussion about bringing the Internet or the\r | |
2619 | National Research and Education Network (NREN) into the K-12 environment\r | |
2620 | as a way of helping the American educational system.\r | |
2621 | \r | |
2622 | LYNCH closed by noting that the kinds of applications demonstrated struck\r | |
2623 | him as excellent justifications of broad-scale networking for K-12, but\r | |
2624 | that at this time no "killer" application exists to mobilize the K-12\r | |
2625 | community to obtain connectivity.\r | |
2626 | \r | |
2627 | ******\r | |
2628 | \r | |
2629 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2630 | DISCUSSION * Dearth of genuinely interesting applications on the network\r | |
2631 | a slow-changing situation * The issue of the integrity of presentation in\r | |
2632 | a networked environment * Several reasons why CD-ROM software does not\r | |
2633 | network *\r | |
2634 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2635 | \r | |
2636 | During the discussion period that followed LYNCH's presentation, several\r | |
2637 | additional points were made.\r | |
2638 | \r | |
2639 | LYNCH reiterated even more strongly his contention that, historically,\r | |
2640 | once one goes outside high-end science and the group of those who need\r | |
2641 | access to supercomputers, there is a great dearth of genuinely\r | |
2642 | interesting applications on the network. He saw this situation changing\r | |
2643 | slowly, with some of the scientific databases and scholarly discussion\r | |
2644 | groups and electronic journals coming on as well as with the availability\r | |
2645 | of Wide Area Information Servers (WAIS) and some of the databases that\r | |
2646 | are being mounted there. However, many of those things do not seem to\r | |
2647 | have piqued great popular interest. For instance, most high school\r | |
2648 | students of LYNCH's acquaintance would not qualify as devotees of serious\r | |
2649 | molecular biology.\r | |
2650 | \r | |
2651 | Concerning the issue of the integrity of presentation, LYNCH believed\r | |
2652 | that a couple of information providers have laid down the law at least on\r | |
2653 | certain things. For example, his recollection was that the National\r | |
2654 | Library of Medicine feels strongly that one needs to employ the\r | |
2655 | identifier field if he or she is to mount a database commercially. The\r | |
2656 | problem with a real networked environment is that one does not know who\r | |
2657 | is reformatting and reprocessing one's data when one enters a client\r | |
2658 | server mode. It becomes anybody's guess, for example, if the network\r | |
2659 | uses a Z39.50 server, or what clients are doing with one's data. A data\r | |
2660 | provider can say that his contract will only permit clients to have\r | |
2661 | access to his data after he vets them and their presentation and makes\r | |
2662 | certain it suits him. But LYNCH held out little expectation that the\r | |
2663 | network marketplace would evolve in that way, because it required too\r | |
2664 | much prior negotiation.\r | |
2665 | \r | |
2666 | CD-ROM software does not network for a variety of reasons, LYNCH said. \r | |
2667 | He speculated that CD-ROM publishers are not eager to have their products\r | |
2668 | really hook into wide area networks, because they fear it will make their\r | |
2669 | data suppliers nervous. Moreover, until relatively recently, one had to\r | |
2670 | be rather adroit to run a full TCP/IP stack plus applications on a\r | |
2671 | PC-size machine, whereas nowadays it is becoming easier as PCs grow\r | |
2672 | bigger and faster. LYNCH also speculated that software providers had not\r | |
2673 | heard from their customers until the last year or so, or had not heard\r | |
2674 | from enough of their customers.\r | |
2675 | \r | |
2676 | ******\r | |
2677 | \r | |
2678 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2679 | BESSER * Implications of disseminating images on the network; planning\r | |
2680 | the distribution of multimedia documents poses two critical\r | |
2681 | implementation problems * Layered approach represents the way to deal\r | |
2682 | with users' capabilities * Problems in platform design; file size and its\r | |
2683 | implications for networking * Transmission of megabyte size images\r | |
2684 | impractical * Compression and decompression at the user's end * Promising\r | |
2685 | trends for compression * A disadvantage of using X-Windows * A project at\r | |
2686 | the Smithsonian that mounts images on several networks * \r | |
2687 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2688 | \r | |
2689 | Howard BESSER, School of Library and Information Science, University of\r | |
2690 | Pittsburgh, spoke primarily about multimedia, focusing on images and the\r | |
2691 | broad implications of disseminating them on the network. He argued that\r | |
2692 | planning the distribution of multimedia documents posed two critical\r | |
2693 | implementation problems, which he framed in the form of two questions: \r | |
2694 | 1) What platform will one use and what hardware and software will users\r | |
2695 | have for viewing of the material? and 2) How can one deliver a\r | |
2696 | sufficiently robust set of information in an accessible format in a\r | |
2697 | reasonable amount of time? Depending on whether network or CD-ROM is the\r | |
2698 | medium used, this question raises different issues of storage,\r | |
2699 | compression, and transmission.\r | |
2700 | \r | |
2701 | Concerning the design of platforms (e.g., sound, gray scale, simple\r | |
2702 | color, etc.) and the various capabilities users may have, BESSER\r | |
2703 | maintained that a layered approach was the way to deal with users'\r | |
2704 | capabilities. A result would be that users with less powerful\r | |
2705 | workstations would simply have less functionality. He urged members of\r | |
2706 | the audience to advocate standards and accompanying software that handle\r | |
2707 | layered functionality across a wide variety of platforms.\r | |
2708 | \r | |
2709 | BESSER also addressed problems in platform design, namely, deciding how\r | |
2710 | large a machine to design for when the largest number of users have the\r | |
2711 | least capable machines yet one desires higher\r | |
2712 | functionality. BESSER then proceeded to the question of file size and\r | |
2713 | its implications for networking. He discussed still images in the main. \r | |
2714 | For example, a digital color image that fills the screen of a standard\r | |
2715 | mega-pel workstation (Sun or Next) will require one megabyte of storage\r | |
2716 | for an eight-bit image or three megabytes of storage for a true color or\r | |
2717 | twenty-four-bit image. Lossless compression algorithms (that is,\r | |
2718 | computational procedures in which no data is lost in the process of\r | |
2719 | compressing [and decompressing] an image--the exact bit-representation is\r | |
2720 | maintained) might bring storage down to a third of a megabyte per image,\r | |
2721 | but not much further than that. The question of size makes it difficult\r | |
2722 | to fit an appropriately sized set of these images on a single disk or to\r | |
2723 | transmit them quickly enough on a network.\r | |
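
The storage figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that a "mega-pel" screen means roughly one million pixels (the 1,024 x 1,024 geometry is an assumption, not a figure from the talk) and it treats the 3:1 lossless ratio as BESSER's rough estimate for the eight-bit case rather than as a property of any particular algorithm.

```python
# Rough storage arithmetic for one full-screen "mega-pel" image.
# Assumption: 1,024 x 1,024 pixels stands in for a one-megapixel screen.
WIDTH, HEIGHT = 1024, 1024
BYTES_PER_MB = 1024 * 1024


def raw_size_mb(bits_per_pixel: int) -> float:
    """Uncompressed size, in megabytes, of one full-screen image."""
    return WIDTH * HEIGHT * bits_per_pixel / 8 / BYTES_PER_MB


eight_bit_mb = raw_size_mb(8)    # eight-bit (palette or gray-scale) image
true_color_mb = raw_size_mb(24)  # twenty-four-bit "true color" image

# Assumption: lossless compression achieves about 3:1, the rough figure
# implied by "a third of a megabyte per image" in the text.
ASSUMED_LOSSLESS_RATIO = 3
lossless_mb = eight_bit_mb / ASSUMED_LOSSLESS_RATIO

print(f"8-bit image:    {eight_bit_mb:.2f} MB")
print(f"24-bit image:   {true_color_mb:.2f} MB")
print(f"8-bit lossless: {lossless_mb:.2f} MB (assumed 3:1)")
```

Under these assumptions the eight-bit and twenty-four-bit sizes come out to one and three megabytes, which matches the figures BESSER cited.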
2724 | \r | |
2725 | With these full-screen mega-pel images, at between a third of a megabyte\r | |
2726 | and one megabyte each, one gets 1,000-3,000 images on a one-gigabyte disk;\r | |
2727 | a standard CD-ROM represents approximately 60 percent of that. Storing\r | |
2728 | images the size of a PC screen (just 8-bit color) increases storage\r | |
2729 | capacity to 4,000-12,000 images per gigabyte; 60 percent of that gives\r | |
2730 | one the size of a CD-ROM, which in turn creates a major problem. One\r | |
2731 | cannot rely on lossless compression for full-screen, full-color images; one\r | |
2732 | must compress further or use a lower resolution. For megabyte-size images,\r | |
2733 | anything slower than a T-1 speed is impractical. For example, on a\r | |
2734 | fifty-six-kilobaud line, it takes three minutes to transfer a\r | |
2735 | one-megabyte file, if it is not compressed; and this speed assumes ideal\r | |
2736 | circumstances (no other user contending for network bandwidth). Thus,\r | |
2737 | questions of disk access, remote display, and current telephone\r | |
2738 | connection speed make transmission of megabyte-size images impractical.\r | |
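
The capacity and transmission figures in this paragraph can be reproduced approximately with the sketch below. It rests on several assumptions that are not in the talk: decimal units (one megabyte taken as 1,000,000 bytes), a "fifty-six-kilobaud line" read as 56,000 bits per second, the standard T-1 rate of 1.544 megabits per second, and no protocol overhead. The function names are hypothetical.

```python
# Back-of-the-envelope capacity and transfer-time estimates.
# Assumptions: decimal units, ideal links, no protocol overhead.
MB = 1_000_000                 # bytes in one (decimal) megabyte
GB = 1_000 * MB                # bytes in one (decimal) gigabyte
CD_ROM = GB * 60 // 100        # "approximately 60 percent" of a gigabyte

LINE_56K_BPS = 56_000          # assumed reading of "fifty-six-kilobaud"
T1_BPS = 1_544_000             # standard T-1 line rate


def images_per_disk(disk_bytes: int, image_bytes: int) -> int:
    """Whole images that fit on a disk of the given size."""
    return disk_bytes // image_bytes


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Ideal time to move a file over an otherwise idle line."""
    return size_bytes * 8 / bits_per_second


for label, disk in (("1 GB disk", GB), ("CD-ROM", CD_ROM)):
    low = images_per_disk(disk, MB)        # one megabyte per image
    high = images_per_disk(disk, MB // 3)  # a third of a megabyte per image
    print(f"{label}: {low}-{high} full-screen images")

slow = transfer_seconds(MB, LINE_56K_BPS)
fast = transfer_seconds(MB, T1_BPS)
print(f"1 MB at 56 kbit/s: {slow / 60:.1f} minutes")
print(f"1 MB at T-1:       {fast:.1f} seconds")
```

Under these idealized assumptions the one-megabyte transfer at 56,000 bits per second takes a little under two and a half minutes. The three minutes quoted in the talk is of the same order and presumably allows for rounding or line overhead.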
2739 | \r | |
2740 | BESSER then discussed ways to deal with these large images, for example,\r | |
2741 | compression and decompression at the user's end. In this connection, the\r | |
2742 | issues of how much one is willing to lose in the compression process and\r | |
2743 | what image quality one needs in the first place are unknown. But what is\r | |
2744 | known is that compression entails some loss of data. BESSER urged that\r | |
2745 | more studies be conducted on image quality in different situations, for\r | |
2746 | example, what kind of images are needed for what kind of disciplines, and\r | |
2747 | what kind of image quality is needed for a browsing tool, an intermediate\r | |
2748 | viewing tool, and archiving.\r | |
2749 | \r | |
2750 | BESSER remarked on two promising trends for compression: from a technical\r | |
2751 | perspective, algorithms that use what is called subjective redundancy\r | |
2752 | employ principles from visual psycho-physics to identify and remove\r | |
2753 | information from the image that the human eye cannot perceive; from an\r | |
2754 | interchange and interoperability perspective, the JPEG (i.e., Joint\r | |
2755 | Photographic Experts Group, an ISO standard) compression algorithms also\r | |
2756 | offer promise. These issues of compression and decompression, BESSER\r | |
2757 | argued, resembled those raised earlier concerning the design of different\r | |
2758 | platforms. Gauging the capabilities of potential users constitutes a\r | |
2759 | primary goal. BESSER advocated layering or separating the images from\r | |
2760 | the applications that retrieve and display them, to avoid tying them to\r | |
2761 | particular software.\r | |
2762 | \r | |
2763 | BESSER detailed several lessons learned from his work at Berkeley with\r | |
2764 | Imagequery, especially the advantages and disadvantages of using\r | |
2765 | X-Windows. In the latter category, for example, retrieval is tied\r | |
2766 | directly to one's data, an intolerable situation in the long run on a\r | |
2767 | networked system. Finally, BESSER described a project of Jim Wallace at\r | |
2768 | the Smithsonian Institution, who is mounting images in an extremely\r | |
2769 | rudimentary way on the CompuServe and GEnie networks and is preparing to\r | |
2770 | mount them on America Online. Although the average user takes over\r | |
2771 | thirty minutes to download these images (assuming a fairly fast modem),\r | |
2772 | nevertheless, images have been downloaded 25,000 times.\r | |
2773 | \r | |
2774 | BESSER concluded his talk with several comments on the business\r | |
2775 | arrangement between the Smithsonian and CompuServe. He contended that not\r | |
2776 | enough is known concerning the value of images.\r | |
2777 | \r | |
2778 | ******\r | |
2779 | \r | |
2780 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2781 | DISCUSSION * Creating digitized photographic collections nearly\r | |
2782 | impossible except with large organizations like museums * Need for study\r | |
2783 | to determine quality of images users will tolerate *\r | |
2784 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2785 | \r | |
2786 | During the brief exchange between LESK and BESSER that followed, several\r | |
2787 | clarifications emerged.\r | |
2788 | \r | |
2789 | LESK argued that the photographers were far ahead of BESSER: It is\r | |
2790 | almost impossible to create such digitized photographic collections\r | |
2791 | except with large organizations like museums, because all the\r | |
2792 | photographic agencies have been going crazy about this and will not sign\r | |
2793 | licensing agreements on any sort of reasonable terms. LESK had heard\r | |
2794 | that National Geographic, for example, had tried to buy the right to use\r | |
2795 | some image in some kind of educational production for $100 per image, but\r | |
2796 | the photographers will not touch it. They want accounting and payment\r | |
2797 | for each use, which cannot be accomplished within the system. BESSER\r | |
2798 | responded that a consortium of photographers, headed by a former National\r | |
2799 | Geographic photographer, had started assembling its own collection of\r | |
2800 | electronic reproductions of images, with the money going back to the\r | |
2801 | cooperative.\r | |
2802 | \r | |
2803 | LESK contended that BESSER was unnecessarily pessimistic about multimedia\r | |
2804 | images, because people are accustomed to low-quality images, particularly\r | |
2805 | from video. BESSER urged the launching of a study to determine what\r | |
2806 | users would tolerate, what they would feel comfortable with, and what\r | |
2807 | absolutely is the highest quality they would ever need. Conceding that\r | |
2808 | he had adopted a dire tone in order to arouse people about the issue,\r | |
2809 | BESSER closed on a sanguine note by saying that he would not be in this\r | |
2810 | business if he did not think that things could be accomplished.\r | |
2811 | \r | |
2812 | ******\r | |
2813 | \r | |
2814 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2815 | LARSEN * Issues of scalability and modularity * Geometric growth of the\r | |
2816 | Internet and the role played by layering * Basic functions sustaining\r | |
2817 | this growth * A library's roles and functions in a network environment *\r | |
2818 | Effects of implementation of the Z39.50 protocol for information\r | |
2819 | retrieval on the library system * The trade-off between volumes of data\r | |
2820 | and its potential usage * A snapshot of current trends *\r | |
2821 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2822 | \r | |
2823 | Ronald LARSEN, associate director for information technology, University\r | |
2824 | of Maryland at College Park, first addressed the issues of scalability\r | |
2825 | and modularity. He noted the difficulty of anticipating the effects of\r | |
2826 | orders-of-magnitude growth, reflecting on the twenty years of experience\r | |
2827 | with the Arpanet and Internet. Recalling the day's demonstrations of\r | |
2828 | CD-ROM and optical disk material, he went on to ask if the field has yet\r | |
2829 | learned how to scale new systems to enable delivery and dissemination\r | |
2830 | across large-scale networks.\r | |
2831 | \r | |
2832 | LARSEN focused on the geometric growth of the Internet from its inception\r | |
2833 | circa 1969 to the present, and the adjustments required to respond to\r | |
2834 | that rapid growth. To illustrate the issue of scalability, LARSEN\r | |
2835 | considered computer networks as including three generic components: \r | |
2836 | computers, network communication nodes, and communication media. Each\r | |
2837 | component scales (e.g., computers range from PCs to supercomputers;\r | |
2838 | network nodes scale from interface cards in a PC through sophisticated\r | |
2839 | routers and gateways; and communication media range from 2,400-baud\r | |
2840 | dial-up facilities through 4.5-Mbps backbone links, and eventually to\r | |
2841 | multigigabit-per-second communication lines), and architecturally, the\r | |
2842 | components are organized to scale hierarchically from local area networks\r | |
2843 | to international-scale networks. Such growth is made possible by\r | |
2844 | building layers of communication protocols, as BESSER pointed out.\r | |
2845 | By layering both physically and logically, a sense of scalability is\r | |
2846 | maintained from local area networks in offices, across campuses, through\r | |
2847 | bridges, routers, campus backbones, fiber-optic links, etc., up into\r | |
2848 | regional networks and ultimately into national and international\r | |
2849 | networks.\r | |
2850 | \r | |
2851 | LARSEN then illustrated the geometric growth over a two-year period--\r | |
2852 | through September 1991--of the number of networks that make up the\r | |
2853 | Internet. This growth has been sustained largely by the availability of\r | |
2854 | three basic functions: electronic mail, file transfer (ftp), and remote\r | |
2855 | log-on (telnet). LARSEN also reviewed the growth in the kind of traffic\r | |
2856 | that occurs on the network. Network traffic reflects the joint contributions\r | |
2857 | of a larger population of users and increasing use per user. Today one sees\r | |
2858 | serious applications involving moving images across the network--a rarity\r | |
2859 | ten years ago. LARSEN recalled and concurred with BESSER's main point\r | |
2860 | that the interesting problems occur at the application level.\r | |
2861 | \r | |
2862 | LARSEN then illustrated a model of a library's roles and functions in a\r | |
2863 | network environment. He noted, in particular, the placement of on-line\r | |
2864 | catalogues onto the network and patrons obtaining access to the library\r | |
2865 | increasingly through local networks, campus networks, and the Internet. \r | |
2866 | LARSEN supported LYNCH's earlier suggestion that we need to address\r | |
2867 | fundamental questions of networked information in order to build\r | |
2868 | environments that scale in the information sense as well as in the\r | |
2869 | physical sense.\r | |
2870 | \r | |
2871 | LARSEN supported the role of the library system as the access point into\r | |
2872 | the nation's electronic collections. Implementation of the Z39.50\r | |
2873 | protocol for information retrieval would make such access practical and\r | |
2874 | feasible. For example, this would enable patrons in Maryland to search\r | |
2875 | California libraries, or other libraries around the world that are\r | |
2876 | conformant with Z39.50 in a manner that is familiar to University of\r | |
2877 | Maryland patrons. This client-server model also supports moving beyond\r | |
2878 | secondary content into primary content. (The notion of how one links\r | |
2879 | from secondary content to primary content, LARSEN said, represents a\r | |
2880 | fundamental problem that requires rigorous thought.) After noting\r | |
2881 | numerous network experiments in accessing full-text materials, including\r | |
2882 | projects supporting the ordering of materials across the network, LARSEN\r | |
2883 | revisited the issue of transmitting high-density, high-resolution color\r | |
2884 | images across the network and the large amounts of bandwidth they\r | |
2885 | require. He went on to address the bandwidth and synchronization\r | |
2886 | problems inherent in sending full-motion video across the network.\r | |
2887 | \r | |
2888 | LARSEN illustrated the trade-off between volumes of data in bytes or\r | |
2889 | orders of magnitude and the potential usage of that data. He discussed\r | |
2890 | transmission rates (particularly, the time it takes to move various forms\r | |
2891 | of information), and what one could do with a network supporting\r | |
2892 | multigigabit-per-second transmission. At the moment, the network\r | |
2893 | environment includes a composite of data-transmission requirements,\r | |
2894 | volumes and forms, going from steady to bursty (high-volume) and from\r | |
2895 | very slow to very fast. This aggregate must be considered in the design,\r | |
2896 | construction, and operation of multigigabit networks.\r | |
2897 | \r | |
2898 | LARSEN's objective is to use the networks and library systems now being\r | |
2899 | constructed to increase access to resources wherever they exist, and\r | |
2900 | thus, to evolve toward an on-line electronic virtual library.\r | |
2901 | \r | |
2902 | LARSEN concluded by offering a snapshot of current trends: continuing\r | |
2903 | geometric growth in network capacity and number of users; slower\r | |
2904 | development of applications; and glacial development and adoption of\r | |
2905 | standards. The challenge is to design and develop each new application\r | |
2906 | system with network access and scalability in mind.\r | |
2907 | \r | |
2908 | ******\r | |
2909 | \r | |
2910 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2911 | BROWNRIGG * Access to the Internet cannot be taken for granted * Packet\r | |
2912 | radio and the development of MELVYL in 1980-81 in the Division of Library\r | |
2913 | Automation at the University of California * Design criteria for packet\r | |
2914 | radio * A demonstration project in San Diego and future plans * Spread\r | |
2915 | spectrum * Frequencies at which the radios will run and plans to\r | |
2916 | reimplement the WAIS server software in the public domain * Need for an\r | |
2917 | infrastructure of radios that do not move around * \r | |
2918 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
2919 | \r | |
2920 | Edwin BROWNRIGG, executive director, Memex Research Institute, first\r | |
2921 | polled the audience in order to seek out regular users of the Internet as\r | |
2922 | well as those planning to use it some time in the future. With nearly\r | |
2923 | everybody in the room falling into one category or the other, BROWNRIGG\r | |
2924 | made a point about access, namely that numerous individuals, especially those\r | |
2925 | who use the Internet every day, take for granted their access to it, the\r | |
2926 | speeds with which they are connected, and how well it all works. \r | |
2927 | However, as BROWNRIGG discovered between 1987 and 1989 in Australia,\r | |
2928 | if one wants access to the Internet but cannot afford it or has some\r | |
2929 | physical boundary that prevents her or him from gaining access, it can\r | |
2930 | be extremely frustrating. He suggested that because of economics and\r | |
2931 | physical barriers we were beginning to create a world of haves and have-nots\r | |
2932 | in the process of scholarly communication, even in the United States.\r | |
2933 | \r | |
2934 | BROWNRIGG detailed the development of MELVYL in academic year 1980-81 in\r | |
2935 | the Division of Library Automation at the University of California, in\r | |
2936 | order to underscore the issue of access to the system, which at the\r | |
2937 | outset was extremely limited. In short, the project needed to build a\r | |
2938 | network, which at that time entailed use of satellite technology, that is,\r | |
2939 | putting earth stations on campus and also acquiring some terrestrial links\r | |
2940 | from the State of California's microwave system. The installation of\r | |
2941 | satellite links, however, did not solve the problem (which actually\r | |
2942 | formed part of a larger problem involving politics and financial resources).\r | |
2943 | For while the project team could get a signal onto a campus, it had no means\r | |
2944 | of distributing the signal throughout the campus. The solution involved\r | |
2945 | adopting a recent development in wireless communication called packet radio,\r | |
2946 | which combined the basic notion of packet-switching with radio. The project\r | |
2947 | used this technology to get the signal from a point on campus where it\r | |
2948 | came down, an earth station for example, into the libraries, because it\r | |
2949 | found that wiring the libraries, especially the older marble buildings,\r | |
2950 | would cost $2,000-$5,000 per terminal.\r | |
2951 | \r | |
2952 | BROWNRIGG noted that, ten years ago, the project had neither the public\r | |
2953 | policy nor the technology that would have allowed it to use packet radio\r | |
2954 | in any meaningful way. Since then much had changed. He proceeded to\r | |
2955 | detail research and development of the technology, how it is being\r | |
2956 | deployed in California, and what direction he thought it would take.\r | |
2957 | The design criteria are to produce a high-speed, one-time, low-cost,\r | |
2958 | high-quality, secure, license-free device (packet radio) that one can\r | |
2959 | plug in and play today, forget about it, and have access to the Internet. \r | |
2960 | By high speed, BROWNRIGG meant 1 megabit and 1.5 megabits per second.\r | |
2961 | Those units have been built, he continued, and are in the process of being\r | |
2962 | type-certified by an independent underwriting laboratory so that they can\r | |
2963 | be type-licensed by the Federal Communications Commission. As is the\r | |
2964 | case with citizens band, one will be able to purchase a unit and not have\r | |
2965 | to worry about applying for a license.\r | |
2966 | \r | |
2967 | The basic idea, BROWNRIGG elaborated, is to take high-speed radio data\r | |
2968 | transmission and create a backbone network that at certain strategic\r | |
2969 | points in the network will "gateway" into a medium-speed packet radio\r | |
2970 | (i.e., one that runs at 38.4 kilobits per second), so that perhaps by 1994-1995\r | |
2971 | people like those in the audience could, for the price of a VCR, purchase\r | |
2972 | a medium-speed radio for the office or home, have full network connectivity\r | |
2973 | to the Internet, and partake of all its services, with no need for an FCC\r | |
2974 | license and no regular bill from the local common carrier. BROWNRIGG\r | |
2975 | presented several details of a demonstration project currently taking\r | |
2976 | place in San Diego and described plans, pending funding, to install a\r | |
2977 | full-bore network in the San Francisco area. This network will have 600\r | |
2978 | nodes running at backbone speeds, and 100 of these nodes will be libraries,\r | |
2979 | which in turn will be the gateway ports to the 38.4-kilobit radios that\r | |
2980 | will give coverage for the neighborhoods surrounding the libraries.\r | |
2981 | \r | |
2982 | BROWNRIGG next explained Part 15.247, a new rule within Title 47 of the\r | |
2983 | Code of Federal Regulations enacted by the FCC in 1985. This rule\r | |
2984 | challenged the industry, which has only now risen to the occasion, to\r | |
2985 | build a radio that would run at no more than one watt of output power and\r | |
2986 | use a fairly exotic method of modulating the radio wave called spread\r | |
2987 | spectrum. Spread spectrum in fact permits the building of networks so\r | |
2988 | that numerous data communications can occur simultaneously, without\r | |
2989 | interfering with each other, within the same wide radio channel.\r | |
2990 | \r | |
2991 | BROWNRIGG explained that the frequencies at which the radios would run\r | |
2992 | are very short wave signals. They are well above standard microwave and\r | |
2993 | radar. With a radio wave that small, one watt becomes a tremendous punch\r | |
2994 | per bit and thus makes transmission at reasonable speed possible. In\r | |
2995 | order to minimize the potential for congestion, the project is\r | |
2996 | undertaking to reimplement software which has been available in the\r | |
2997 | networking business and is taken for granted now, for example, TCP/IP,\r | |
2998 | routing algorithms, bridges, and gateways. In addition, the project\r | |
2999 | plans to take the WAIS server software in the public domain and\r | |
3000 | reimplement it so that one can have a WAIS server on a Mac instead of a\r | |
3001 | Unix machine. The Memex Research Institute believes that libraries, in\r | |
3002 | particular, will want to use the WAIS servers with packet radio. This\r | |
3003 | project, which has a team of about twelve people, will run through 1993\r | |
3004 | and will include the 100 libraries already mentioned as well as other\r | |
3005 | professionals such as those in the medical profession, engineering, and\r | |
3006 | law. Thus, the need is to create an infrastructure of radios that do not\r | |
3007 | move around, which, BROWNRIGG hopes, will solve a problem not only for\r | |
3008 | libraries but for individuals who, by and large today, do not have access\r | |
3009 | to the Internet from their homes and offices.\r | |
3010 | \r | |
3011 | ******\r | |
3012 | \r | |
3013 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3014 | DISCUSSION * Project operating frequencies *\r | |
3015 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3016 | \r | |
3017 | During a brief discussion period, which also concluded the day's\r | |
3018 | proceedings, BROWNRIGG stated that the project was operating in four\r | |
3019 | frequencies. The slow speed is operating at 435 megahertz, and it would\r | |
3020 | later go up to 920 megahertz. With the high-speed frequency, the\r | |
3021 | one-megabit radios will run at 2.4 gigahertz, and 1.5 will run at 5.7.\r | |
3022 | At 5.7, rain can be a factor, but it would have to be tropical rain,\r | |
3023 | unlike what falls in most parts of the United States.\r | |
3024 | \r | |
3025 | ******\r | |
3026 | \r | |
3027 | SESSION IV. IMAGE CAPTURE, TEXT CAPTURE, OVERVIEW OF TEXT AND\r | |
3028 | IMAGE STORAGE FORMATS\r | |
3029 | \r | |
3030 | William HOOTON, vice president of operations, I-NET, moderated this session.\r | |
3031 | \r | |
3032 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3033 | KENNEY * Factors influencing development of CXP * Advantages of using\r | |
3034 | digital technology versus photocopy and microfilm * A primary goal of\r | |
3035 | CXP; publishing challenges * Characteristics of copies printed * Quality\r | |
3036 | of samples achieved in image capture * Several factors to be considered\r | |
3037 | in choosing scanning * Emphasis of CXP on timely and cost-effective\r | |
3038 | production of black-and-white printed facsimiles * Results of producing\r | |
3039 | microfilm from digital files * Advantages of creating microfilm * Details\r | |
3040 | concerning production * Costs * Role of digital technology in library\r | |
3041 | preservation *\r | |
3042 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3043 | \r | |
3044 | Anne KENNEY, associate director, Department of Preservation and\r | |
3045 | Conservation, Cornell University, opened her talk by observing that the\r | |
3046 | Cornell Xerox Project (CXP) has been guided by the assumption that the\r | |
3047 | ability to produce printed facsimiles or to replace paper with paper\r | |
3048 | would be important, at least for the present generation of users and\r | |
3049 | equipment. She described three factors that influenced development of\r | |
3050 | the project: 1) Because the project has emphasized the preservation of\r | |
3051 | deteriorating brittle books, the quality of what was produced had to be\r | |
3052 | sufficiently high to return a paper replacement to the shelf. 2) CXP was\r | |
3053 | interested only in using a system that was cost-effective, which\r | |
3054 | meant that it had to be cost-competitive with the processes currently\r | |
3055 | available, principally photocopy and microfilm. 3) CXP was interested\r | |
3056 | only in using new or currently available product hardware and software.\r | |
3057 | \r | |
3058 | KENNEY described the advantages that using digital technology offers over\r | |
3059 | both photocopy and microfilm: 1) The potential exists to create a higher\r | |
3060 | quality reproduction of a deteriorating original than conventional\r | |
3061 | light-lens technology. 2) Because a digital image is an encoded\r | |
3062 | representation, it can be reproduced again and again with no resulting\r | |
3063 | loss of quality, as opposed to the situation with light-lens processes,\r | |
3064 | in which there is a discernible difference between a second and a\r | |
3065 | subsequent generation of an image. 3) A digital image can be manipulated\r | |
3066 | in a number of ways to improve image capture; for example, Xerox has\r | |
3067 | developed a windowing application that enables one to capture a page\r | |
3068 | containing both text and illustrations in a manner that optimizes the\r | |
3069 | reproduction of both. (With light-lens technology, one must choose which\r | |
3070 | to optimize, text or the illustration; in preservation microfilming, the\r | |
3071 | current practice is to shoot an illustrated page twice, once to highlight\r | |
3072 | the text and the second time to provide the best capture for the\r | |
3073 | illustration.) 4) A digital image can also be edited, with density levels\r | |
3074 | adjusted to remove underlining and stains and to increase the legibility of\r | |
3075 | faint documents. 5) On-screen inspection can take place at the time of\r | |
3076 | initial setup, and adjustments can be made prior to scanning, factors that\r | |
3077 | substantially reduce the number of retakes required in quality control.\r | |
3078 | \r | |
3079 | A primary goal of CXP has been to evaluate the paper output printed on\r | |
3080 | the Xerox DocuTech, a high-speed printer that produces 600-dpi pages from\r | |
3081 | scanned images at a rate of 135 pages a minute. KENNEY recounted several\r | |
3082 | publishing challenges in producing faithful and legible reproductions of\r | |
3083 | the originals, which the 600-dpi copy for the most part successfully\r | |
3084 | captured. For example, many of the deteriorating volumes in the project\r | |
3085 | were heavily illustrated with fine line drawings or halftones or came in\r | |
3086 | languages such as Japanese, in which the buildup of characters composed\r | |
3087 | of varying strokes is difficult to reproduce at lower resolutions; a\r | |
3088 | surprising number of them came with annotations and mathematical\r | |
3089 | formulas, which it was critical to be able to duplicate exactly.\r | |
3090 | \r | |
3091 | KENNEY noted that 1) the copies are being printed on paper that meets the\r | |
3092 | ANSI standards for performance, 2) the DocuTech printer meets the machine\r | |
3093 | and toner requirements for proper adhesion of print to page, as described\r | |
3094 | by the National Archives, and thus 3) the paper product is considered to be\r | |
3095 | the archival equivalent of preservation photocopy.\r | |
3096 | \r | |
3097 | KENNEY then discussed several samples of the quality achieved in the\r | |
3098 | project that had been distributed in a handout, for example, a copy of a\r | |
3099 | print-on-demand version of the 1911 Reed lecture on the steam turbine,\r | |
3100 | which contains halftones, line drawings, and illustrations embedded in\r | |
3101 | text; the first four loose pages in the volume compared the capture\r | |
3102 | capabilities of scanning to photocopy for a standard test target, the\r | |
3103 | IEEE standard 167A 1987 test chart. In all instances scanning proved\r | |
3104 | superior to photocopy, though only slightly more so in one.\r | |
3105 | \r | |
3106 | Conceding the simplistic nature of her comparison of the quality of scanning\r | |
3107 | to photocopy, KENNEY described it as one representation of the kinds of\r | |
3108 | settings that could be used with scanning capabilities on the equipment\r | |
3109 | CXP uses. KENNEY also pointed out that CXP investigated the quality\r | |
3110 | achieved with binary scanning only, and noted the great promise in gray\r | |
3111 | scale and color scanning, whose advantages and disadvantages need to be\r | |
3112 | examined. She argued further that scanning resolutions and file formats\r | |
3113 | can represent a complex trade-off among the time it takes to capture\r | |
3114 | material, file size, fidelity to the original, on-screen display,\r | |
3115 | printing, and equipment availability. All these factors must be taken\r | |
3116 | into consideration.\r | |
3117 | \r | |
3118 | CXP placed primary emphasis on the production in a timely and\r | |
3119 | cost-effective manner of printed facsimiles that consisted largely of\r | |
3120 | black-and-white text. With binary scanning, large files may be\r | |
3121 | compressed efficiently and in a lossless manner (i.e., no data is lost in\r | |
3122 | the process of compressing [and decompressing] an image--the exact\r | |
3123 | bit-representation is maintained) using Group 4 CCITT (i.e., the French\r | |
3124 | acronym for International Consultative Committee for Telegraph and\r | |
3125 | Telephone) compression. CXP was getting compression ratios of about\r | |
3126 | forty to one. Gray-scale compression, which primarily uses JPEG, is much\r | |
3127 | less economical and can represent a lossy compression (i.e., not\r | |
3128 | lossless), so that as one compresses and decompresses, the illustration\r | |
3129 | is subtly changed. While binary files produce a high-quality printed\r | |
3130 | version, it appears 1) that other combinations of spatial resolution with\r | |
3131 | gray and/or color hold great promise as well, and 2) that gray scale can\r | |
3132 | represent a tremendous advantage for on-screen viewing. The quality\r | |
3133 | associated with binary and gray scale also depends on the equipment used. \r | |
3134 | For instance, binary scanning produces a much better copy on a binary\r | |
3135 | printer.\r | |
3136 | \r | |
3137 | Among CXP's findings concerning the production of microfilm from digital\r | |
3138 | files, KENNEY reported that the digital files for the same Reed lecture\r | |
3139 | were used to produce sample film using an electron beam recorder. The\r | |
3140 | resulting film was faithful to the image capture of the digital files,\r | |
3141 | and while CXP felt that the text and image pages represented in the Reed\r | |
3142 | lecture were superior to those of the light-lens film, the resolution\r | |
3143 | readings for the 600 dpi were not as high as those for standard microfilming.\r | |
3144 | KENNEY argued that the standards defined for light-lens technology are\r | |
3145 | not totally transferable to a digital environment. Moreover, they are\r | |
3146 | based on a definition of quality for a preservation copy. Although making\r | |
3147 | this case will prove to be a long, uphill struggle, CXP plans to continue\r | |
3148 | to investigate the issue over the course of the next year.\r | |
3149 | \r | |
3150 | KENNEY concluded this portion of her talk with a discussion of the\r | |
3151 | advantages of creating film: it can serve as a primary backup and as a\r | |
3152 | preservation master to the digital file; it could then become the print\r | |
3153 | or production master and service copies could be paper, film, optical\r | |
3154 | disks, magnetic media, or on-screen display.\r | |
3155 | \r | |
3156 | Finally, KENNEY presented details regarding production:\r | |
3157 | \r | |
3158 | * Development and testing of a moderately high-resolution production\r | |
3159 | scanning workstation represented a third goal of CXP; to date, 1,000\r | |
3160 | volumes have been scanned, or about 300,000 images.\r | |
3161 | \r | |
3162 | * The resulting digital files are stored and used to produce\r | |
3163 | hard-copy replacements for the originals and additional prints on\r | |
3164 | demand; although the initial costs are high, scanning technology\r | |
3165 | offers an affordable means for reformatting brittle material.\r | |
3166 | \r | |
3167 | * A technician in production mode can scan 300 pages per hour when\r | |
3168 | performing single-sheet scanning, which is a necessity when working\r | |
3169 | with truly brittle paper; this figure is expected to increase\r | |
3170 | significantly with subsequent iterations of the software from Xerox;\r | |
3171 | a three-month time-and-cost study of scanning found that the average\r | |
3172 | 300-page book would take about an hour and forty minutes to scan\r | |
3173 | (this figure included the time for setup, which involves keying in\r | |
3174 | primary bibliographic data, going into quality control mode to\r | |
3175 | define page size, establishing front-to-back registration, and\r | |
3176 | scanning sample pages to identify a default range of settings for\r | |
3177 | the entire book--functions not dissimilar to those performed by\r | |
3178 | filmers or those preparing a book for photocopy).\r | |
3179 | \r | |
3180 | * The final step in the scanning process involved rescans, which\r | |
3181 | happily were few and far between, representing well under 1 percent\r | |
3182 | of the total pages scanned.\r | |
3183 | \r | |
3184 | In addition to technician time, CXP costed out equipment (amortized over\r | |
3185 | four years), the cost of storing and refreshing the digital files every\r | |
3186 | four years, and the cost of printing a paper reproduction and binding it\r | |
3187 | in book cloth. The total amounted to a little under $65 per single\r | |
3188 | 300-page volume, with 30 percent overhead included--a figure competitive\r | |
3189 | with the prices currently charged by photocopy vendors.\r | |
3190 | \r | |
3191 | Of course, with scanning, in addition to the paper facsimile, one is left\r | |
3192 | with a digital file from which subsequent copies of the book can be\r | |
3193 | produced for a fraction of the cost of photocopy, with readers afforded\r | |
3194 | choices in the form of these copies.\r | |
3195 | \r | |
3196 | KENNEY concluded that digital technology offers an electronic means for a\r | |
3197 | library preservation effort to pay for itself. If a brittle-book program\r | |
3198 | included the means of disseminating reprints of books that are in demand\r | |
3199 | by libraries and researchers alike, the initial investment in capture\r | |
3200 | could be recovered and used to preserve additional but less popular\r | |
3201 | books. She disclosed that an economic model for a self-sustaining\r | |
3202 | program could be developed for CXP's report to the Commission on\r | |
3203 | Preservation and Access (CPA).\r | |
3204 | \r | |
3205 | KENNEY stressed that the focus of CXP has been on obtaining high quality\r | |
3206 | in a production environment. The use of digital technology is viewed as\r | |
3207 | an affordable alternative to other reformatting options.\r | |
3208 | \r | |
3209 | ******\r | |
3210 | \r | |
3211 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3212 | ANDRE * Overview and history of NATDP * Various agricultural CD-ROM\r | |
3213 | products created inhouse and by service bureaus * Pilot project on\r | |
3214 | Internet transmission * Additional products in progress *\r | |
3215 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3216 | \r | |
3217 | Pamela ANDRE, associate director for automation, National Agricultural\r | |
3218 | Text Digitizing Program (NATDP), National Agricultural Library (NAL),\r | |
3219 | presented an overview of NATDP, which has been underway at NAL for the\r | |
3220 | last four years, before Judith ZIDAR discussed the technical details.\r | |
3221 | ANDRE defined agricultural information as a broad range of material, from\r | |
3222 | basic and applied research in the hard sciences to the one-page pamphlets\r | |
3223 | that are distributed by the cooperative state extension services on such\r | |
3224 | things as how to grow blueberries.\r | |
3225 | \r | |
3226 | NATDP began in late 1986 with a meeting of representatives from the\r | |
3227 | land-grant library community to deal with the issue of electronic\r | |
3228 | information. NAL and forty-five of these libraries banded together to\r | |
3229 | establish this project--to evaluate the technology for converting what\r | |
3230 | were then source documents in paper form into electronic form, to provide\r | |
3231 | access to that digital information, and then to distribute it. \r | |
3232 | Distributing that material to the community--the university community as\r | |
3233 | well as the extension service community, potentially down to the county\r | |
3234 | level--constituted the group's chief concern.\r | |
3235 | \r | |
3236 | Since January 1988 (when the microcomputer-based scanning system was\r | |
3237 | installed at NAL), NATDP has done a variety of things, concerning which\r | |
3238 | ZIDAR would provide further details. For example, the first technology\r | |
3239 | considered in the project's discussion phase was digital videodisc, which\r | |
3240 | indicates how long ago it was conceived.\r | |
3241 | \r | |
3242 | Over the four years of this project, four separate CD-ROM products on\r | |
3243 | four different agricultural topics were created, two at a\r | |
3244 | scanning-and-OCR station installed at NAL, and two by service bureaus. \r | |
3245 | Thus, NATDP has gained comparative information in terms of those relative\r | |
3246 | costs. Each of these products contained the full ASCII text as well as\r | |
3247 | page images of the material, or between 4,000 and 6,000 pages of material\r | |
3248 | on these disks. Topics included aquaculture; food, agriculture and\r | |
3249 | science (i.e., international agriculture and research); acid rain; and\r | |
3250 | Agent Orange, which was the final product distributed (approximately\r | |
3251 | eighteen months before the Workshop).\r | |
3252 | \r | |
3253 | The third phase of NATDP focused on delivery mechanisms other than\r | |
3254 | CD-ROM. At the suggestion of Clifford LYNCH, who was a technical\r | |
3255 | consultant to the project at this point, NATDP became involved with the\r | |
3256 | Internet and initiated a project with the help of North Carolina State\r | |
3257 | University, in which fourteen of the land-grant university libraries are\r | |
3258 | transmitting digital images over the Internet in response to interlibrary\r | |
3259 | loan requests--a topic for another meeting. At this point, the pilot\r | |
3260 | project had been completed for about a year and the final report would be\r | |
3261 | available shortly after the Workshop. In the meantime, the project's\r | |
3262 | success had led to its extension. (ANDRE noted that one of the first\r | |
3263 | things done under the program title was to select a retrieval package to\r | |
3264 | use with subsequent products; Windows Personal Librarian was the package\r | |
3265 | of choice after a lengthy evaluation.)\r | |
3266 | \r | |
3267 | Three additional products had been planned and were in progress:\r | |
3268 | \r | |
3269 | 1) An arrangement with the American Society of Agronomy--a\r | |
3270 | professional society that has published the Agronomy Journal since\r | |
3271 | about 1908--to scan and create bit-mapped images of its journal. \r | |
3272 | ASA granted permission first to put this material into electronic\r | |
3273 | form and then to distribute it, to hold it at NAL, and to use these\r | |
3274 | electronic images as a mechanism to deliver documents or print out\r | |
3275 | material for patrons, among other uses. Effectively, NAL has the\r | |
3276 | right to use this material in support of its program. \r | |
3277 | (Significantly, this arrangement offers a potential cooperative\r | |
3278 | model for working with other professional societies in agriculture\r | |
3279 | to try to do the same thing--put the journals of particular interest\r | |
3280 | to agriculture research into electronic form.)\r | |
3281 | \r | |
3282 | 2) An extension of the earlier product on aquaculture.\r | |
3283 | \r | |
3284 | 3) The George Washington Carver Papers--a joint project with\r | |
3285 | Tuskegee University to scan and convert from microfilm some 3,500\r | |
3286 | images of Carver's papers, letters, and drawings.\r | |
3287 | \r | |
3288 | It was anticipated that all of these products would appear no more than\r | |
3289 | six months after the Workshop.\r | |
3290 | \r | |
3291 | ******\r | |
3292 | \r | |
3293 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3294 | ZIDAR * (A separate arena for scanning) * Steps in creating a database *\r | |
3295 | Image capture, with and without performing OCR * Keying in tracking data\r | |
3296 | * Scanning, with electronic and manual tracking * Adjustments during\r | |
3297 | scanning process * Scanning resolutions * Compression * De-skewing and\r | |
3298 | filtering * Image capture from microform: the papers and letters of\r | |
3299 | George Washington Carver * Equipment used for a scanning system * \r | |
3300 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3301 | \r | |
3302 | Judith ZIDAR, coordinator, National Agricultural Text Digitizing Program\r | |
3303 | (NATDP), National Agricultural Library (NAL), illustrated the technical\r | |
3304 | details of NATDP, including her primary responsibility, scanning and\r | |
3305 | creating databases on a topic and putting them on CD-ROM.\r | |
3306 | \r | |
3307 | (ZIDAR remarked on an arena separate from the CD-ROM projects, though the\r | |
3308 | processing of the material is nearly identical: NATDP is also scanning\r | |
3309 | material and loading it on a NeXT microcomputer, which in turn is linked\r | |
3310 | to NAL's integrated library system. Thus, searches in NAL's\r | |
3311 | bibliographic database will enable people to pull up actual page images\r | |
3312 | and text for any documents that have been entered.)\r | |
3313 | \r | |
3314 | In accordance with the session's topic, ZIDAR focused her illustrated\r | |
3315 | talk on image capture, offering a primer on the three main steps in the\r | |
3316 | process: 1) assemble the printed publications; 2) design the database\r | |
3317 | (database design occurs in the process of preparing the material for\r | |
3318 | scanning; this step entails reviewing and organizing the material,\r | |
3319 | defining the contents--what will constitute a record, what kinds of\r | |
3320 | fields will be captured in terms of author, title, etc.); 3) perform a\r | |
3321 | certain amount of markup on the paper publications. NAL performs this\r | |
3322 | task record by record, preparing work sheets or some other sort of\r | |
3323 | tracking material and designing descriptors and other enhancements to be\r | |
3324 | added to the data that will not be captured from the printed publication. \r | |
3325 | Part of this process also involves determining NATDP's file and directory\r | |
3326 | structure: NATDP attempts to avoid putting more than approximately 100\r | |
3327 | images in a directory, because placing more than that on a CD-ROM would\r | |
3328 | reduce the access speed.\r | |
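The directory rule described above can be sketched as follows. This is a minimal illustration, not NATDP's actual software: the function name, the directory prefix, and the file names are all hypothetical, and the cap of 100 is taken from the "approximately 100 images" mentioned in the text.

```python
def assign_directories(image_names, max_per_dir=100, prefix="dir"):
    """Map image file names to directories, at most max_per_dir per directory.

    Returns a dict of directory name -> list of image names, in scan order.
    """
    if max_per_dir < 1:
        raise ValueError("max_per_dir must be at least 1")
    layout = {}
    for index, name in enumerate(image_names):
        # Integer division moves to a new directory every max_per_dir images.
        directory = f"{prefix}{index // max_per_dir:03d}"
        layout.setdefault(directory, []).append(name)
    return layout


# Hypothetical file names for a 250-page batch.
pages = [f"page{n:04d}.tif" for n in range(1, 251)]
layout = assign_directories(pages)
for directory, names in layout.items():
    print(directory, len(names), names[0], names[-1])
```

A real layout would more likely follow the logical records NATDP tracks, splitting only those records that exceed the cap; the sketch shows just the cap itself.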
3329 | \r | |
3330 | This up-front process takes approximately two weeks for a\r | |
3331 | 6,000-7,000-page database. The next step is to capture the page images. \r | |
3332 | How long this process takes is determined by the decision whether or not\r | |
3333 | to perform OCR. Not performing OCR speeds the process, whereas text\r | |
3334 | capture requires greater care because of the quality of the image: it\r | |
3335 | has to be straighter and allowance must be made for text on a page, not\r | |
3336 | just for the capture of photographs.\r | |
3337 | \r | |
3338 | NATDP keys in tracking data, that is, a standard bibliographic record\r | |
3339 | including the title of the book and the title of the chapter, which will\r | |
3340 | later either become the access information or will be attached to the\r | |
3341 | front of a full-text record so that it is searchable.\r | |
3342 | \r | |
3343 | Images are scanned from bound or unbound publications; in NATDP's case,\r | |
3344 | chiefly from bound publications, because often they are the only copies\r | |
3345 | and the publications are returned to the shelves. NATDP\r | |
3346 | usually scans one record at a time, because its database tracking system\r | |
3347 | tracks the document in that way and does not require further logical\r | |
3348 | separating of the images. After performing optical character\r | |
3349 | recognition, NATDP moves the images off the hard disk and maintains a\r | |
3350 | volume sheet. Though the system tracks electronically, all the\r | |
3351 | processing steps are also tracked manually with a log sheet.\r | |
3352 | \r | |
3353 | ZIDAR next illustrated the kinds of adjustments that one can make when\r | |
3354 | scanning from paper and microfilm, for example, redoing images that need\r | |
3355 | special handling, setting for dithering or gray scale, and adjusting\r | |
3356 | brightness for a single page or for the whole book at one time.\r | |
3357 | \r | |
3358 | NATDP is scanning at 300 dots per inch, a standard scanning resolution. \r | |
3359 | Though adequate for capturing text that is all of a standard size, 300\r | |
3360 | dpi is unsuitable for any kind of photographic material or for very small\r | |
3361 | text. Many scanners allow for different image formats, TIFF, of course,\r | |
3362 | being a de facto standard. But if one intends to exchange images with\r | |
3363 | other people, the ability to scan other image formats, even if they are\r | |
3364 | less common, becomes highly desirable.\r | |
3365 | \r | |
3366 | CCITT Group 4 is the standard compression for normal black-and-white\r | |
3367 | images, JPEG for gray scale or color. ZIDAR recommended 1) using the\r | |
3368 | standard compressions, particularly if one attempts to make material\r | |
3369 | available and to allow users to download images and reuse them from\r | |
3370 | CD-ROMs; and 2) maintaining the ability to output an uncompressed image,\r | |
3371 | because in image exchange uncompressed images are more likely to be able\r | |
3372 | to cross platforms.\r | |
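To give a sense of why compression matters at these resolutions, the size of an uncompressed black-and-white page image can be worked out directly. The sketch below is illustrative only: the 8.5-by-11-inch page is an assumption (the text gives no page size), each pixel is taken as one bit, and rows are padded to whole bytes, as is usual for uncompressed bilevel image data. The function name is hypothetical.

```python
def raw_bitonal_bytes(width_in, height_in, dpi):
    """Bytes in an uncompressed 1-bit-per-pixel image, rows padded to bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8   # round each row up to a whole byte
    return bytes_per_row * height_px


# Assumed page size: 8.5 x 11 inches. 600 dpi is shown for comparison only.
for dpi in (300, 600):
    size = raw_bitonal_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes uncompressed")
```

At 300 dpi such a page occupies roughly one megabyte before compression, and doubling the resolution roughly quadruples that figure. How much a given compression scheme saves depends on the page content and is not estimated here.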
3373 | \r | |
3374 | ZIDAR emphasized the importance of de-skewing and filtering as\r | |
3375 | requirements on NATDP's upgraded system. For instance, scanning bound\r | |
3376 | books, particularly books published by the federal government whose pages\r | |
3377 | are skewed, and trying to scan them straight if OCR is to be performed,\r | |
3378 | is extremely time-consuming. The same holds for filtering of\r | |
3379 | poor-quality or older materials.\r | |
3380 | \r | |
3381 | ZIDAR described image capture from microform, using as an example three\r | |
3382 | reels from a sixty-seven-reel set of the papers and letters of George\r | |
3383 | Washington Carver that had been produced by Tuskegee University. These\r | |
3384 | resulted in approximately 3,500 images, which NATDP had had scanned by\r | |
3385 | its service contractor, Science Applications International Corporation\r | |
3386 | (SAIC). NATDP also created bibliographic records for access. (NATDP did\r | |
3387 | not have such specialized equipment as a microfilm scanner.)\r | |
3388 | \r | |
3389 | Unfortunately, the process of scanning from microfilm was not an\r | |
3390 | unqualified success, ZIDAR reported: because microfilm frame sizes vary,\r | |
3391 | occasionally some frames were missed, which without spending much time\r | |
3392 | and money could not be recaptured.\r | |
3393 | \r | |
3394 | OCR could not be performed from the scanned images of the frames. Because\r | |
3395 | of bleeding in the text, OCR, when run, simply output text that could not\r | |
3396 | even be edited. NATDP tested for negative versus positive images,\r | |
3397 | landscape versus portrait orientation, and single- versus dual-page\r | |
3398 | microfilm, none of which seemed to affect the quality of the image; but\r | |
3399 | also on none of them could OCR be performed.\r | |
3400 | \r | |
3401 | In selecting the microfilm they would use, therefore, NATDP had other\r | |
3402 | factors in mind. ZIDAR noted two factors that influenced the quality of\r | |
3403 | the images: 1) the inherent quality of the original and 2) the amount of\r | |
3404 | size reduction on the pages.\r | |
3405 | \r | |
3406 | The Carver papers were selected because they are informative and visually\r | |
3407 | interesting, treat a single subject, and are valuable in their own right. \r | |
3408 | The images were scanned and divided into logical records by SAIC, then\r | |
3409 | delivered, and loaded onto NATDP's system, where bibliographic\r | |
3410 | information taken directly from the images was added. Scanning was\r | |
3411 | completed in summer 1991 and by the end of summer 1992 the disk was\r | |
3412 | scheduled to be published.\r | |
3413 | \r | |
3414 | Problems encountered during processing included the following: Because\r | |
3415 | the microfilm scanning had to be done in a batch, adjustment for\r | |
3416 | individual page variations was not possible. The frame size varied on\r | |
3417 | account of the nature of the material, and therefore some of the frames\r | |
3418 | were missed while others were just partial frames. The only way to go\r | |
3419 | back and capture this material was to print out the page with the\r | |
3420 | microfilm reader from the missing frame and then scan it in from the\r | |
3421 | page, which was extremely time-consuming. The quality of the images\r | |
3422 | scanned from the printout of the microfilm compared unfavorably with that\r | |
3423 | of the original images captured directly from the microfilm. The\r | |
3424 | inability to perform OCR also was a major disappointment. At the time,\r | |
3425 | computer output microfilm was unavailable to test.\r | |
3426 | \r | |
3427 | The equipment used for a scanning system was the last topic addressed by\r | |
3428 | ZIDAR. The type of equipment that one would purchase for a scanning\r | |
3429 | system included: a microcomputer, at least a 386, but preferably a 486;\r | |
3430 | a large hard disk, 380 megabytes at minimum; a multi-tasking operating\r | |
3431 | system that allows one to run some things in batch in the background\r | |
3432 | while scanning or doing text editing, for example, Unix or OS/2 and,\r | |
3433 | theoretically, Windows; a high-speed scanner and scanning software that\r | |
3434 | allows one to make the various adjustments mentioned earlier; a\r | |
3435 | high-resolution monitor (150 dpi); OCR software and hardware to perform\r | |
3436 | text recognition; an optical disk subsystem on which to archive all the\r | |
3437 | images as the processing is done; file management and tracking software.\r | |
3438 | \r | |
3439 | ZIDAR opined that the software one purchases was more important than the\r | |
3440 | hardware and might also cost more than the hardware, but it was likely to\r | |
3441 | prove critical to the success or failure of one's system. In addition to\r | |
3442 | a stand-alone scanning workstation for image capture, then, text capture\r | |
3443 | requires one or two editing stations networked to this scanning station\r | |
3444 | to perform editing. Editing the text takes two or three times as long as\r | |
3445 | capturing the images.\r | |
3446 | \r | |
3447 | Finally, ZIDAR stressed the importance of buying an open system that allows\r | |
3448 | for more than one vendor, complies with standards, and can be upgraded.\r | |
3449 | \r | |
3450 | ******\r | |
3451 | \r | |
3452 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3453 | WATERS * Yale University Library's master plan to convert microfilm to\r | |
3454 | digital imagery (POB) * The place of electronic tools in the library of\r | |
3455 | the future * The uses of images and an image library * Primary input from\r | |
3456 | preservation microfilm * Features distinguishing POB from CXP and key\r | |
3457 | hypotheses guiding POB * Use of vendor selection process to facilitate\r | |
3458 | organizational work * Criteria for selecting vendor * Finalists and\r | |
3459 | results of process for Yale * Key factor distinguishing vendors *\r | |
3460 | Components, design principles, and some estimated costs of POB * Role of\r | |
3461 | preservation materials in developing imaging market * Factors affecting\r | |
3462 | quality and cost * Factors affecting the usability of complex documents\r | |
3463 | in image form * \r | |
3464 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3465 | \r | |
3466 | Donald WATERS, head of the Systems Office, Yale University Library,\r | |
3467 | reported on the progress of a master plan for a project at Yale to\r | |
3468 | convert microfilm to digital imagery, Project Open Book (POB). Stating\r | |
3469 | that POB was in an advanced stage of planning, WATERS detailed, in\r | |
3470 | particular, the process of selecting a vendor partner and several key\r | |
3471 | issues under discussion as Yale prepares to move into the project itself. \r | |
3472 | He commented first on the vision that serves as the context of POB and\r | |
3473 | then described its purpose and scope.\r | |
3474 | \r | |
3475 | WATERS sees the library of the future not necessarily as an electronic\r | |
3476 | library but as a place that generates, preserves, and improves for its\r | |
3477 | clients ready access to both intellectual and physical recorded\r | |
3478 | knowledge. Electronic tools must find a place in the library in the\r | |
3479 | context of this vision. Several roles for electronic tools include\r | |
3480 | serving as: indirect sources of electronic knowledge or as "finding"\r | |
3481 | aids (the on-line catalogues, the article-level indices, registers for\r | |
3482 | documents and archives); direct sources of recorded knowledge; full-text\r | |
3483 | images; and various kinds of compound sources of recorded knowledge (the\r | |
3484 | so-called compound documents of Hypertext, mixed text and image,\r | |
3485 | mixed-text image format, and multimedia).\r | |
3486 | \r | |
3487 | POB is looking particularly at images and an image library, the uses to\r | |
3488 | which images will be put (e.g., storage, printing, browsing, and then use\r | |
3489 | as input for other processes), OCR as a subsequent process to image\r | |
3490 | capture, or creating an image library, and also possibly generating\r | |
3491 | microfilm.\r | |
3492 | \r | |
3493 | While input will come from a variety of sources, POB is considering\r | |
3494 | especially input from preservation microfilm. A possible outcome is that\r | |
3495 | the film and paper which provide the input for the image library\r | |
3496 | eventually may go off into remote storage, and that the image library may\r | |
3497 | be the primary access tool.\r | |
3498 | \r | |
3499 | The purpose and scope of POB focus on imaging. Though related to CXP,\r | |
3500 | POB has two features which distinguish it: 1) scale--conversion of\r | |
3501 | 10,000 volumes into digital image form; and 2) source--conversion from\r | |
3502 | microfilm. Given these features, several key working hypotheses guide\r | |
3503 | POB, including: 1) Since POB is using microfilm, it is not concerned with\r | |
3504 | the image library as a preservation medium. 2) Digital imagery can improve\r | |
3505 | access to recorded knowledge through printing and network distribution at\r | |
3506 | a modest incremental cost over microfilm. 3) Capturing and storing documents\r | |
3507 | in a digital image form is necessary to further improvements in access.\r | |
3508 | (POB distinguishes between the imaging, digitizing process and OCR,\r | |
3509 | which at this stage it does not plan to perform.)\r | |
3510 | \r | |
3511 | Currently in its first or organizational phase, POB found that it could\r | |
3512 | use a vendor selection process to facilitate a good deal of the\r | |
3513 | organizational work (e.g., creating a project team and advisory board,\r | |
3514 | confirming the validity of the plan, establishing the cost of the project\r | |
3515 | and a budget, selecting the materials to convert, and then raising the\r | |
3516 | necessary funds).\r | |
3517 | \r | |
3518 | POB developed numerous selection criteria, including: a firm committed\r | |
3519 | to image-document management, the ability to serve as systems integrator\r | |
3520 | in a large-scale project over several years, interest in developing the\r | |
3521 | requisite software as a standard rather than a custom product, and a\r | |
3522 | willingness to invest substantial resources in the project itself.\r | |
3523 | \r | |
3524 | Two vendors, DEC and Xerox, were selected as finalists in October 1991,\r | |
3525 | and with the support of the Commission on Preservation and Access, each\r | |
3526 | was commissioned to generate a detailed requirements analysis for the\r | |
3527 | project and then to submit a formal proposal for the completion of the\r | |
3528 | project, which included a budget and costs. The terms were that POB would\r | |
3529 | pay the loser. The results for Yale of involving a vendor included: \r | |
3530 | broad involvement of Yale staff across the board at a relatively low\r | |
3531 | cost, which may have long-term significance in carrying out the project\r | |
3532 | (twenty-five to thirty university people are engaged in POB); better\r | |
3533 | understanding of the factors that affect corporate response to markets\r | |
3534 | for imaging products; a competitive proposal; and a more sophisticated\r | |
3535 | view of the imaging markets.\r | |
3536 | \r | |
3537 | The most important factor that distinguished the vendors under\r | |
3538 | consideration was their identification with the customer. The size and\r | |
3539 | internal complexity of the company also was an important factor. POB was\r | |
3540 | looking at large companies that had substantial resources. In the end,\r | |
3541 | the process generated for Yale two competitive proposals, with Xerox's\r | |
3542 | the clear winner. WATERS then described the components of the proposal,\r | |
3543 | the design principles, and some of the costs estimated for the process.\r | |
3544 | \r | |
3545 | Components are essentially four: a conversion subsystem, a\r | |
3546 | network-accessible storage subsystem for 10,000 books (and POB expects\r | |
3547 | 200 to 600 dpi storage), browsing stations distributed on the campus\r | |
3548 | network, and network access to the image printers.\r | |
3549 | \r | |
3550 | Among the design principles, POB wanted conversion at the highest\r | |
3551 | possible resolution. Assuming TIFF files with Group 4 compression,\r | |
3552 | TCP/IP, and an Ethernet network on campus, POB wanted a\r | |
3553 | client-server approach with image documents distributed to the\r | |
3554 | workstations and made accessible through native workstation interfaces\r | |
3555 | such as Windows. POB also insisted on a phased approach to\r | |
3556 | implementation: 1) a stand-alone, single-user, low-cost entry into the\r | |
3557 | business with a workstation focused on conversion and allowing POB to\r | |
3558 | explore user access; 2) movement into a higher-volume conversion with\r | |
3559 | network-accessible storage and multiple access stations; and 3) a\r | |
3560 | high-volume conversion, full-capacity storage, and multiple browsing\r | |
3561 | stations distributed throughout the campus.\r | |
3562 | \r | |
3563 | The costs proposed for start-up assumed the existence of the Yale network\r | |
3564 | and its two DocuTech image printers. Other start-up costs are estimated\r | |
3565 | at $1 million over the three phases. At the end of the project, the annual\r | |
3566 | operating costs estimated primarily for the software and hardware proposed\r | |
3567 | come to about $60,000, but these exclude costs for labor needed in the\r | |
3568 | conversion process, network and printer usage, and facilities management.\r | |
3569 | \r | |
3570 | Finally, the selection process produced for Yale a more sophisticated\r | |
3571 | view of the imaging markets: the management of complex documents in\r | |
3572 | image form is not a preservation problem, not a library problem, but a\r | |
3573 | general problem in a broad, general industry. Preservation materials are\r | |
3574 | useful for developing that market because of the qualities of the\r | |
3575 | material. For example, much of it is out of copyright. The resolution\r | |
3576 | of key issues such as the quality of scanning and image browsing also\r | |
3577 | will affect development of that market.\r | |
3578 | \r | |
3579 | The technology is readily available but changing rapidly. In this\r | |
3580 | context of rapid change, several factors affect quality and cost, to\r | |
3581 | which POB intends to pay particular attention, for example, the various\r | |
3582 | levels of resolution that can be achieved. POB believes it can bring\r | |
3583 | resolution up to 600 dpi, but an interpolation process from 400 to 600 is\r | |
3584 | more likely. The variation in microfilm quality will prove to be a\r | |
3585 | highly important factor. POB may reexamine the standards used to film in\r | |
3586 | the first place by looking at this process as a follow-on to microfilming.\r | |
3587 | \r | |
3588 | Other important factors include: the techniques available to the\r | |
3589 | operator for handling material, the ways of integrating quality control\r | |
3590 | into the digitizing work flow, and a work flow that includes indexing and\r | |
3591 | storage. POB's requirement was to be able to deal with quality control\r | |
3592 | at the point of scanning. Thus, thanks to Xerox, POB anticipates having\r | |
3593 | a mechanism which will allow it not only to scan in batch form, but to\r | |
3594 | review the material as it goes through the scanner and control quality\r | |
3595 | from the outset.\r | |
3596 | \r | |
3597 | The standards for measuring quality and costs depend greatly on the uses\r | |
3598 | of the material, including subsequent OCR, storage, printing, and\r | |
3599 | browsing. But especially at issue for POB is the facility for browsing. \r | |
3600 | This facility, WATERS said, is perhaps the weakest aspect of imaging\r | |
3601 | technology and the most in need of development.\r | |
3602 | \r | |
3603 | A variety of factors affect the usability of complex documents in image\r | |
3604 | form, among them: 1) the ability of the system to handle the full range\r | |
3605 | of document types, not just monographs but serials, multi-part\r | |
3606 | monographs, and manuscripts; 2) the location of the database of record\r | |
3607 | for bibliographic information about the image document, which POB wants\r | |
3608 | to enter once and in the most useful place, the on-line catalog; 3) a\r | |
3609 | document identifier for referencing the bibliographic information in one\r | |
3610 | place and the images in another; 4) the technique for making the basic\r | |
3611 | internal structure of the document accessible to the reader; and finally,\r | |
3612 | 5) the physical presentation on the CRT of those documents. POB is ready\r | |
3613 | to complete this phase now. One last decision remains: which\r | |
3614 | material to scan.\r | |
3615 | \r | |
3616 | ******\r | |
3617 | \r | |
3618 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3619 | DISCUSSION * TIFF files constitute de facto standard * NARA's experience\r | |
3620 | with image conversion software and text conversion * RFC 1314 *\r | |
3621 | Considerable flux concerning available hardware and software solutions *\r | |
3622 | NAL through-put rate during scanning * Window management questions *\r | |
3623 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3624 | \r | |
3625 | In the question-and-answer period that followed WATERS's presentation,\r | |
3626 | the following points emerged:\r | |
3627 | \r | |
3628 | * ZIDAR's statement about using TIFF files as a standard meant de\r | |
3629 | facto standard. This is what most people use and typically exchange\r | |
3630 | with other groups, across platforms, or even occasionally across\r | |
3631 | display software.\r | |
3632 | \r | |
3633 | * HOLMES commented on the unsuccessful experience of NARA in\r | |
3634 | attempting to run image-conversion software or to exchange between\r | |
3635 | applications: What are supposedly TIFF files go into other software\r | |
3636 | that is supposed to be able to accept TIFF but cannot recognize the\r | |
3637 | format and cannot deal with it, and thus renders the exchange\r | |
3638 | useless. Re text conversion, he noted the different recognition\r | |
3639 | rates obtained by substituting the make and model of scanners in\r | |
3640 | NARA's recent test of an "intelligent" character-recognition product\r | |
3641 | for a new company. In the selection of hardware and software,\r | |
3642 | HOLMES argued, software no longer constitutes the overriding factor\r | |
3643 | it did until about a year ago; rather it is perhaps important to\r | |
3644 | look at both now.\r | |
3645 | \r | |
3646 | * Danny Cohen and Alan Katz of the University of Southern California\r | |
3647 | Information Sciences Institute began circulating as an Internet RFC\r | |
3648 | (RFC 1314) about a month ago a standard for a TIFF interchange\r | |
3649 | format for Internet distribution of monochrome bit-mapped images,\r | |
3650 | which LYNCH said he believed would be used as a de facto standard.\r | |
3651 | \r | |
3652 | * FLEISCHHAUER's impression from hearing these reports and thinking\r | |
3653 | about AM's experience was that there is considerable flux concerning\r | |
3654 | available hardware and software solutions. HOOTON agreed and\r | |
3655 | commented at the same time on ZIDAR's statement that the equipment\r | |
3656 | employed affects the results produced. One cannot conclude in general\r | |
3657 | that it is difficult or impossible to perform OCR from scanned\r | |
3658 | microfilm, for example, on the basis of one device, one set of\r | |
3659 | parameters, and one set of system requirements, because numerous other\r | |
3660 | people are accomplishing just that, perhaps using other components.\r | |
3661 | HOOTON opined that both the hardware and the software were highly\r | |
3662 | important. Most of the problems discussed today have been solved in\r | |
3663 | numerous different ways by other people. Though it is good to be\r | |
3664 | cognizant of various experiences, this is not to say that it will\r | |
3665 | always be thus.\r | |
3666 | \r | |
3667 | * At NAL, the through-put rate of the scanning process for paper,\r | |
3668 | page by page, performing OCR, ranges from 300 to 600 pages per day;\r | |
3669 | not performing OCR is considerably faster, although how much faster\r | |
3670 | is not known. This is for scanning from bound books, which is much\r | |
3671 | slower.\r | |
3672 | \r | |
3673 | * WATERS commented on window management questions: DEC proposed an\r | |
3674 | X-Windows solution which was problematical for two reasons. One was\r | |
3675 | POB's requirement to be able to manipulate images on the workstation\r | |
3676 | and bring them down to the workstation itself and the other was\r | |
3677 | network usage.\r | |
3678 | \r | |
3679 | ******\r | |
3680 | \r | |
3681 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3682 | THOMA * Illustration of deficiencies in scanning and storage process *\r | |
3683 | Image quality in this process * Different costs entailed by better image\r | |
3684 | quality * Techniques for overcoming various deficiencies: fixed\r | |
3685 | thresholding, dynamic thresholding, dithering, image merge * Page edge\r | |
3686 | effects * \r | |
3687 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3688 | \r | |
3689 | George THOMA, chief, Communications Engineering Branch, National Library\r | |
3690 | of Medicine (NLM), illustrated several of the deficiencies discussed by\r | |
3691 | the previous speakers. He introduced the topic of special problems by\r | |
3692 | noting the advantages of electronic imaging. For example, it is regenerable\r | |
3693 | because it is a coded file, and real-time quality control is possible with\r | |
3694 | electronic capture, whereas in photographic capture it is not.\r | |
3695 | \r | |
3696 | One of the difficulties discussed in the scanning and storage process was\r | |
3697 | image quality which, without belaboring the obvious, means different\r | |
3698 | things for maps, medical X-rays, or broadcast television. In the case of\r | |
3699 | documents, THOMA said, image quality boils down to legibility of the\r | |
3700 | textual parts, and fidelity in the case of gray or color photo print-type\r | |
3701 | material. Legibility boils down to scan density, the standard in most\r | |
3702 | cases being 300 dpi. Increasing the resolution with scanners that\r | |
3703 | perform 600 or 1200 dpi, however, comes at a cost.\r | |
3704 | \r | |
3705 | Better image quality entails at least four different kinds of costs: 1)\r | |
3706 | equipment costs, because the CCD (i.e., charge-coupled device) with\r | |
3707 | greater number of elements costs more; 2) time costs that translate to\r | |
3708 | the actual capture costs, because manual labor is involved (the time is\r | |
3709 | also dependent on the fact that more data has to be moved around in the\r | |
3710 | machine in the scanning or network devices that perform the scanning as\r | |
3711 | well as the storage); 3) media costs, because at high resolutions larger\r | |
3712 | files have to be stored; and 4) transmission costs, because there is just\r | |
3713 | more data to be transmitted.\r | |
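
As a rough illustration of the media and transmission costs, the sketch below computes the uncompressed size of a one-bit-per-pixel scan at the three densities mentioned. The 8.5" by 11" page size and the function name are assumptions made for the example, not figures given by THOMA; real files would normally be compressed.

```python
# Hypothetical illustration: uncompressed size of a bitonal page scan.
# Assumes an 8.5 x 11 inch page and 1 bit per pixel (no compression, no row padding).

def bitonal_page_bytes(dpi, width_in=8.5, height_in=11.0):
    """Return the uncompressed size in bytes of a 1-bit-per-pixel page scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px // 8

sizes = {dpi: bitonal_page_bytes(dpi) for dpi in (300, 600, 1200)}
for dpi, size in sizes.items():
    print(f"{dpi:>4} dpi: {size / 1_000_000:.2f} MB uncompressed")
```

Because pixel count grows with the square of the scan density, doubling the resolution quadruples the data to be stored and transmitted.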
3714 | \r | |
3715 | But while resolution takes care of the issue of legibility in image\r | |
3716 | quality, other deficiencies have to do with contrast and elements on the\r | |
3717 | page scanned or the image that need to be removed or clarified. Thus,\r | |
3718 | THOMA proceeded to illustrate various deficiencies, how they are\r | |
3719 | manifested, and several techniques to overcome them.\r | |
3720 | \r | |
3721 | Fixed thresholding was the first technique described, suitable for\r | |
3722 | black-and-white text, when the contrast does not vary over the page. One\r | |
3723 | can have many different threshold levels in scanning devices. Thus,\r | |
3724 | THOMA offered an example of extremely poor contrast, which resulted from\r | |
3725 | the fact that the stock was a heavy red. This is the sort of image that\r | |
3726 | when microfilmed fails to provide any legibility whatsoever. Fixed\r | |
3727 | thresholding is the way to change the black-to-red contrast to the\r | |
3728 | desired black-to-white contrast.\r | |
3729 | \r | |
3730 | Other examples included material that had been browned or yellowed by\r | |
3731 | age. This was also a case of contrast deficiency, and correction was\r | |
3732 | done by fixed thresholding. A final example boiled down to the same\r | |
3733 | thing: the contrast varied slightly but not significantly. Fixed thresholding\r | |
3734 | solves this problem as well. The microfilm equivalent is certainly legible,\r | |
3735 | but it comes with dark areas. Though THOMA did not have a slide of the\r | |
3736 | microfilm in this case, he did show the reproduced electronic image.\r | |
3737 | \r | |
3738 | When one has variable contrast over a page or the lighting over the page\r | |
3739 | area varies, especially in the case where a bound volume has light\r | |
3740 | shining on it, the image must be processed by a dynamic thresholding\r | |
3741 | scheme. One scheme, dynamic averaging, allows the threshold level not to\r | |
3742 | be fixed but to be recomputed for every pixel from the neighboring\r | |
3743 | characteristics. The neighbors of a pixel determine where the threshold\r | |
3744 | should be set for that pixel.\r | |
3745 | \r | |
3746 | THOMA showed an example of a page that had been made deficient by a\r | |
3747 | variety of techniques, including a burn mark, coffee stains, and a yellow\r | |
3748 | marker. Application of a fixed-thresholding scheme, THOMA argued, might\r | |
3749 | take care of several deficiencies on the page but not all of them. \r | |
3750 | Performing the calculation for a dynamic threshold setting, however,\r | |
3751 | removes most of the deficiencies so that at least the text is legible.\r | |
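
The difference between the two schemes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm used at NLM: the neighborhood is a small square window, the threshold is the local mean minus a hypothetical `offset`, and the sample page is an invented gray-level gradient (a shadow deepening toward the right) with two ink strokes.

```python
def fixed_threshold(image, level):
    """Binarize with one level for the whole page: 1 = white, 0 = black."""
    return [[1 if px > level else 0 for px in row] for row in image]

def dynamic_threshold(image, radius=1, offset=20):
    """Binarize each pixel against the mean of its square neighborhood."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            neighbors = [image[j][i] for j in ys for i in xs]
            local_mean = sum(neighbors) / len(neighbors)
            # The threshold is recomputed for every pixel from its neighbors.
            row.append(1 if image[y][x] > local_mean - offset else 0)
        out.append(row)
    return out

# Invented test page: background fades from 220 to 85; ink is 80 levels darker.
background = [220 - 15 * x for x in range(10)]
line = [b - 80 if x in (2, 7) else b for x, b in enumerate(background)]
page = [line[:] for _ in range(3)]

for row in dynamic_threshold(page):
    print(row)
```

On this page no single fixed level separates both strokes from the background, because the ink in the well-lit area is lighter than the paper in the shadowed area; the per-pixel threshold recovers both strokes.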
3752 | \r | |
3753 | Another problem is representing a gray level with black-and-white pixels\r | |
3754 | by a process known as dithering or electronic screening. But dithering\r | |
3755 | does not provide good image quality for pure black-and-white textual\r | |
3756 | material. THOMA illustrated this point with examples. Although its\r | |
3757 | suitability for photoprint is the reason for electronic screening or\r | |
3758 | dithering, it cannot be used for every compound image. In the document\r | |
3759 | that was distributed by CXP, THOMA noticed that the dithered image of the\r | |
3760 | IEEE test chart evinced some deterioration in the text. He presented an\r | |
3761 | extreme example of deterioration in the text in which compounded\r | |
3762 | documents had to be set right by other techniques. The technique\r | |
3763 | illustrated by the present example was an image merge in which the page\r | |
3764 | is scanned twice and the settings go from fixed threshold to the\r | |
3765 | dithering matrix; the resulting images are merged to give the best\r | |
3766 | results with each technique.\r | |
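
A minimal sketch of the image-merge idea follows. It assumes a 4 x 4 ordered-dither matrix (one common form of electronic screening) and a hypothetical `photo_mask` marking which pixels belong to the photographic region; the source does not say how the regions were selected or which dithering matrix was used.

```python
# Standard 4x4 Bayer index matrix, used here as the assumed dithering matrix.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def fixed_threshold(image, level):
    """Binarize with one level for the whole page: 1 = white, 0 = black."""
    return [[1 if px > level else 0 for px in row] for row in image]

def ordered_dither(image):
    """Render 0-255 gray levels as 0/1 pixels with a tiled threshold matrix."""
    return [
        [1 if px > (BAYER_4X4[y % 4][x % 4] + 0.5) * 16 else 0
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]

def merge_scans(text_pass, photo_pass, photo_mask):
    """Take dithered pixels where the mask is set, thresholded pixels elsewhere."""
    return [
        [p if m else t for t, p, m in zip(t_row, p_row, m_row)]
        for t_row, p_row, m_row in zip(text_pass, photo_pass, photo_mask)
    ]

# Invented compound page: ink and paper on the left, mid-gray photo on the right.
page = [[30, 30, 230, 230, 128, 128, 128, 128] for _ in range(4)]
photo_mask = [[x >= 4 for x in range(8)] for _ in range(4)]

text_pass = fixed_threshold(page, 128)   # first scan setting
photo_pass = ordered_dither(page)        # second scan setting
merged = merge_scans(text_pass, photo_pass, photo_mask)
for row in merged:
    print(row)
```

Dithering alone speckles the text area, while thresholding alone flattens the gray area to solid black; the merged result keeps clean text and a half-white pattern for the mid-gray region.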
3767 | \r | |
3768 | THOMA illustrated how dithering is also used in nonphotographic or\r | |
3769 | nonprint materials with an example of a grayish page from a medical text,\r | |
3770 | which was reproduced to show all of the gray that appeared in the\r | |
3771 | original. Dithering provided a reproduction of all the gray in the\r | |
3772 | original of another example from the same text.\r | |
3773 | \r | |
3774 | THOMA finally illustrated the problem of bordering, or page-edge,\r | |
3775 | effects. Books and bound volumes that are placed on a photocopy machine\r | |
3776 | or a scanner produce page-edge effects that are undesirable for two\r | |
3777 | reasons: 1) the aesthetics of the image; after all, if the image is to\r | |
3778 | be preserved, one does not necessarily want to keep all of its\r | |
3779 | deficiencies; 2) compression (with the bordering problem THOMA\r | |
3780 | illustrated, the compression ratio deteriorated tremendously). One way\r | |
3781 | to eliminate this more serious problem is to have the operator at the\r | |
3782 | point of scanning window the part of the image that is desirable and\r | |
3783 | automatically turn all of the pixels outside that window to white.\r | |
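
The windowing step and its effect on compression can be sketched as follows. The window coordinates, the tiny sample scan, and the use of a run count as a stand-in for compressed size are all assumptions made for illustration; run-length-based schemes generally do better when there are fewer, longer runs.

```python
def window_to_white(image, top, left, bottom, right):
    """Keep pixels inside rows top..bottom-1 and columns left..right-1; whiten the rest."""
    return [
        [px if top <= y < bottom and left <= x < right else 1
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]

def count_runs(image):
    """Count runs of identical pixels per row, a rough proxy for run-length coding cost."""
    return sum(1 + sum(1 for a, b in zip(row, row[1:]) if a != b) for row in image)

# Invented scan: 1 = white, 0 = black; the three left columns are page-edge noise.
scan = [
    [0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1],
]
cleaned = window_to_white(scan, top=0, left=3, bottom=4, right=8)
print(count_runs(scan), "runs before windowing,", count_runs(cleaned), "after")
```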
3784 | \r | |
3785 | ******\r | |
3786 | \r | |
3787 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3788 | FLEISCHHAUER * AM's experience with scanning bound materials * Dithering\r | |
3789 | *\r | |
3790 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3791 | \r | |
3792 | Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,\r | |
3793 | reported AM's experience with scanning bound materials, which he likened\r | |
3794 | to the problems involved in using photocopying machines. Very few\r | |
3795 | devices in the industry offer book-edge scanning, let alone book cradles. \r | |
3796 | The problem may be unsolvable, FLEISCHHAUER said, because a large enough\r | |
3797 | market does not exist for a preservation-quality scanner. AM is using a\r | |
3798 | Kurzweil scanner, which is a book-edge scanner now sold by Xerox.\r | |
3799 | \r | |
3800 | Devoting the remainder of his brief presentation to dithering,\r | |
3801 | FLEISCHHAUER related AM's experience with a contractor who was using\r | |
3802 | unsophisticated equipment and software to reduce moire patterns from\r | |
3803 | printed halftones. AM took the same image and used the dithering\r | |
3804 | algorithm that forms part of the same Kurzweil Xerox scanner; it\r | |
3805 | disguised moire patterns much more effectively.\r | |
3806 | \r | |
3807 | FLEISCHHAUER also observed that dithering produces a binary file which is\r | |
3808 | useful for numerous purposes, for example, printing it on a laser printer\r | |
3809 | without having to "re-halftone" it. But it tends to defeat efficient\r | |
3810 | compression, because the very thing that dithers to reduce moire patterns\r | |
3811 | also tends to work against compression schemes. AM thought the\r | |
3812 | difference in image quality was worth it.\r | |
3813 | \r | |
3814 | ******\r | |
3815 | \r | |
3816 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3817 | DISCUSSION * Relative use as a criterion for POB's selection of books to\r | |
3818 | be converted into digital form *\r | |
3819 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3820 | \r | |
3821 | During the discussion period, WATERS noted that one of the criteria for\r | |
3822 | selecting books among the 10,000 to be converted into digital image form\r | |
3823 | would be how much relative use they would receive--a subject still\r | |
3824 | requiring evaluation. The challenge will be to understand whether\r | |
3825 | coherent bodies of material will increase usage or whether POB should\r | |
3826 | seek material that is being used, scan that, and make it more accessible. \r | |
3827 | POB might decide to digitize materials that are already heavily used, in\r | |
3828 | order to make them more accessible and decrease wear on them. Another\r | |
3829 | approach would be to provide a large body of intellectually coherent\r | |
3830 | material that may be used more in digital form than it is currently used\r | |
3831 | in microfilm. POB would seek material that was out of copyright.\r | |
3832 | \r | |
3833 | ******\r | |
3834 | \r | |
3835 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3836 | BARONAS * Origin and scope of AIIM * Types of documents produced in\r | |
3837 | AIIM's standards program * Domain of AIIM's standardization work * AIIM's\r | |
3838 | structure * TC 171 and MS23 * Electronic image management standards *\r | |
3839 | Categories of EIM standardization where AIIM standards are being\r | |
3840 | developed * \r | |
3841 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3842 | \r | |
3843 | Jean BARONAS, senior manager, Department of Standards and Technology,\r | |
3844 | Association for Information and Image Management (AIIM), described the\r | |
3845 | not-for-profit association and the national and international programs\r | |
3846 | for standardization in which AIIM is active.\r | |
3847 | \r | |
3848 | Accredited for twenty-five years as the nation's standards development\r | |
3849 | organization for document image management, AIIM began life in a library\r | |
3850 | community developing microfilm standards. Today the association\r | |
3851 | maintains both its library and business-image management standardization\r | |
3852 | activities--and has moved into electronic image-management\r | |
3853 | standardization (EIM).\r | |
3854 | \r | |
3855 | BARONAS defined the program's scope. AIIM deals with: 1) the\r | |
3856 | terminology of standards and of the technology it uses; 2) methods of\r | |
3857 | measurement for the systems, as well as quality; 3) methodologies for\r | |
3858 | users to evaluate and measure quality; 4) the features of apparatus used\r | |
3859 | to manage and edit images; and 5) the procedures used to manage images.\r | |
3860 | \r | |
3861 | BARONAS noted that three types of documents are produced in the AIIM\r | |
3862 | standards program: the first two, accredited by the American National\r | |
3863 | Standards Institute (ANSI), are standards and standard recommended\r | |
3864 | practices. Recommended practices differ from standards in that they\r | |
3865 | contain more tutorial information. A technical report is not an ANSI\r | |
3866 | standard. Because AIIM's policies and procedures for developing\r | |
3867 | standards are approved by ANSI, its standards are labeled ANSI/AIIM,\r | |
3868 | followed by the number and title of the standard.\r | |
3869 | \r | |
3870 | BARONAS then illustrated the domain of AIIM's standardization work. For\r | |
3871 | example, AIIM is the administrator of the U.S. Technical Advisory Group\r | |
3872 | (TAG) to the International Organization for Standardization's (ISO) technical\r | |
3873 | committee, TC 171 Micrographics and Optical Memories for Document and\r | |
3874 | Image Recording, Storage, and Use. AIIM officially works through ANSI in\r | |
3875 | the international standardization process.\r | |
3876 | \r | |
3877 | BARONAS described AIIM's structure, including its board of directors, its\r | |
3878 | standards board of twelve individuals active in the image-management\r | |
3879 | industry, its strategic planning and legal admissibility task forces, and\r | |
3880 | its National Standards Council, which is composed of the members of a\r | |
3881 | number of organizations who vote on every AIIM standard before it is\r | |
3882 | published. BARONAS pointed out that AIIM's liaisons deal with numerous\r | |
3883 | other standards developers, including the optical disk community, office\r | |
3884 | and publishing systems, image-codes-and-character set committees, and the\r | |
3885 | National Information Standards Organization (NISO).\r | |
3886 | \r | |
3887 | BARONAS illustrated the procedures of TC 171, which covers all aspects of\r | |
3888 | image management. When AIIM's national program has conceptualized a new\r | |
3889 | project, it is usually submitted to the international level, so that the\r | |
3890 | member countries of TC 171 can simultaneously work on the development of\r | |
3891 | the standard or the technical report. BARONAS also illustrated a classic\r | |
3892 | microfilm standard, MS23, which deals with numerous imaging concepts that\r | |
3893 | apply to electronic imaging. Originally developed in the 1970s, revised\r | |
3894 | in the 1980s, and revised again in 1991, this standard is scheduled for\r | |
3895 | another revision. MS23 is an active standard whereby users may propose\r | |
3896 | new density ranges and new methods of evaluating film images in the\r | |
3897 | standard's revision.\r | |
3898 | \r | |
3899 | BARONAS detailed several electronic image-management standards, for\r | |
3900 | instance, ANSI/AIIM MS44, a quality-control guideline for scanning 8.5"\r | |
3901 | by 11" black-and-white office documents. This standard is used with the\r | |
3902 | IEEE fax image--a continuous tone photographic image with gray scales,\r | |
3903 | text, and several continuous tone pictures--and AIIM test target number\r | |
3904 | 2, a representative document used in office document management.\r | |
3905 | \r | |
3906 | BARONAS next outlined the four categories of EIM standardization in which\r | |
3907 | AIIM standards are being developed: transfer and retrieval, evaluation,\r | |
3908 | optical disk and document scanning applications, and design and\r | |
3909 | conversion of documents. She detailed several of the main projects of\r | |
3910 | each: 1) in the category of image transfer and retrieval, a bi-level\r | |
3911 | image transfer format, ANSI/AIIM MS53, which is a proposed standard that\r | |
3912 | describes a file header for image transfer between unlike systems when\r | |
3913 | the images are compressed using G3 and G4 compression; 2) the category of\r | |
3914 | image evaluation, which includes the AIIM-proposed TR26 tutorial on image\r | |
3915 | resolution (this technical report will treat the differences and\r | |
3916 | similarities between classical or photographic and electronic imaging);\r | |
3917 | 3) design and conversion, which includes a proposed technical report\r | |
3918 | called "Forms Design Optimization for EIM" (this report considers how\r | |
3919 | general-purpose business forms can be best designed so that scanning is\r | |
3920 | optimized; reprographic characteristics such as type, rules, background,\r | |
3921 | tint, and color will likewise be treated in the technical report); 4)\r | |
3922 | disk and document scanning applications, which includes projects a) on planning\r | |
3923 | platters and disk management, b) on generating an application profile for\r | |
3924 | EIM when images are stored and distributed on CD-ROM, and c) on\r | |
3925 | evaluating SCSI2, and how a common command set can be generated for SCSI2\r | |
3926 | so that document scanners are more easily integrated. (ANSI/AIIM MS53\r | |
3927 | will also apply to compressed images.)\r | |
3928 | \r | |
3929 | ******\r | |
3930 | \r | |
3931 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3932 | BATTIN * The implications of standards for preservation * A major\r | |
3933 | obstacle to successful cooperation * A hindrance to access in the digital\r | |
3934 | environment * Standards a double-edged sword for those concerned with the\r | |
3935 | preservation of the human record * Near-term prognosis for reliable\r | |
3936 | archival standards * Preservation concerns for electronic media * Need\r | |
3937 | for reconceptualizing our preservation principles * Standards in the real\r | |
3938 | world and the politics of reproduction * Need to redefine the concept of\r | |
3939 | archival and to begin to think in terms of life cycles * Cooperation and\r | |
3940 | the La Guardia Eight * Concerns generated by discussions on the problems\r | |
3941 | of preserving text and image * General principles to be adopted in a\r | |
3942 | world without standards *\r | |
3943 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
3944 | \r | |
3945 | Patricia BATTIN, president, the Commission on Preservation and Access\r | |
3946 | (CPA), addressed the implications of standards for preservation. She\r | |
3947 | listed several areas where the library profession and the analog world of\r | |
3948 | the printed book had made enormous contributions over the past hundred\r | |
3949 | years--for example, in bibliographic formats, binding standards, and, most\r | |
3950 | important, in determining what constitutes longevity or archival quality.\r | |
3951 | \r | |
3952 | Although standards have lightened the preservation burden through the\r | |
3953 | development of national and international collaborative programs,\r | |
3954 | nevertheless, a pervasive mistrust of other people's standards remains a\r | |
3955 | major obstacle to successful cooperation, BATTIN said.\r | |
3956 | \r | |
3957 | The zeal to achieve perfection, regardless of the cost, has hindered\r | |
3958 | rather than facilitated access in some instances, and in the digital\r | |
3959 | environment, where no real standards exist, has brought an ironically\r | |
3960 | just reward.\r | |
3961 | \r | |
3962 | BATTIN argued that standards are a double-edged sword for those concerned\r | |
3963 | with the preservation of the human record, that is, the provision of\r | |
3964 | access to recorded knowledge in a multitude of media as far into the\r | |
3965 | future as possible. Standards are essential to facilitate\r | |
3966 | interconnectivity and access, but, BATTIN said, as LYNCH pointed out\r | |
3967 | yesterday, if set too soon they can hinder creativity, expansion of\r | |
3968 | capability, and the broadening of access. The characteristics of\r | |
3969 | standards for digital imagery differ radically from those for analog\r | |
3970 | imagery. And the nature of digital technology implies continuing\r | |
3971 | volatility and change. To reiterate, precipitous standard-setting can\r | |
3972 | inhibit creativity, but delayed standard-setting results in chaos.\r | |
3973 | \r | |
3974 | Since in BATTIN's opinion the near-term prognosis for reliable archival\r | |
3975 | standards, as defined by librarians in the analog world, is poor, two\r | |
3976 | alternatives remain: standing pat with the old technology, or\r | |
3977 | reconceptualizing.\r | |
3978 | \r | |
3979 | Preservation concerns for electronic media fall into two general domains. \r | |
3980 | One is the continuing assurance of access to knowledge originally\r | |
3981 | generated, stored, disseminated, and used in electronic form. This\r | |
3982 | domain contains several subdivisions, including 1) the closed,\r | |
3983 | proprietary systems discussed the previous day, bundled information such\r | |
3984 | as electronic journals and government agency records, and electronically\r | |
3985 | produced or captured raw data; and 2) the application of digital\r | |
3986 | technologies to the reformatting of materials originally published on a\r | |
3987 | deteriorating analog medium such as acid paper or videotape.\r | |
3988 | \r | |
3989 | The preservation of electronic media requires a reconceptualizing of our\r | |
3990 | preservation principles during a volatile, standardless transition which\r | |
3991 | may last far longer than any of us envision today. BATTIN urged the\r | |
3992 | necessity of shifting focus from assessing, measuring, and setting\r | |
3993 | standards for the permanence of the medium to the concept of managing\r | |
3994 | continuing access to information stored on a variety of media and\r | |
3995 | requiring a variety of ever-changing hardware and software for access--a\r | |
3996 | fundamental shift for the library profession.\r | |
3997 | \r | |
3998 | BATTIN offered a primer on how to move forward with reasonable confidence\r | |
3999 | in a world without standards. Her comments fell roughly into two sections:\r | |
4000 | 1) standards in the real world and 2) the politics of reproduction.\r | |
4001 | \r | |
4002 | In regard to real-world standards, BATTIN argued the need to redefine the\r | |
4003 | concept of archive and to begin to think in terms of life cycles. In\r | |
4004 | the past, the naive assumption that paper would last forever produced a\r | |
4005 | cavalier attitude toward life cycles. The transient nature of the\r | |
4006 | electronic media has compelled people to recognize and accept upfront the\r | |
4007 | concept of life cycles in place of permanency.\r | |
4008 | \r | |
4009 | Digital standards have to be developed and set in a cooperative context\r | |
4010 | to ensure efficient exchange of information. Moreover, during this\r | |
4011 | transition period, greater flexibility concerning how concepts such as\r | |
4012 | backup copies and archival copies in the CXP are defined is necessary,\r | |
4013 | or the opportunity to move forward will be lost.\r | |
4014 | \r | |
4015 | In terms of cooperation, particularly in the university setting, BATTIN\r | |
4016 | also argued the need to avoid going off in a hundred different\r | |
4017 | directions. The CPA has catalyzed a small group of universities called\r | |
4018 | the La Guardia Eight--because La Guardia Airport is where meetings take\r | |
4019 | place--Harvard, Yale, Cornell, Princeton, Penn State, Tennessee,\r | |
4020 | Stanford, and USC, to develop a digital preservation consortium to look\r | |
4021 | at all these issues and develop de facto standards as we move along,\r | |
4022 | instead of waiting for something that is officially blessed. Continuing\r | |
4023 | to apply analog values and definitions of standards to the digital\r | |
4024 | environment, BATTIN said, will effectively lead to forfeiture of the\r | |
4025 | benefits of digital technology to research and scholarship.\r | |
4026 | \r | |
4027 | Under the second rubric, the politics of reproduction, BATTIN reiterated\r | |
4028 | an oft-made argument concerning the electronic library, namely, that it\r | |
4029 | is more difficult to transform than to create, and nowhere is that belief\r | |
4030 | expressed more dramatically than in the conversion of brittle books to\r | |
4031 | new media. Preserving information published in electronic media involves\r | |
4032 | making sure the information remains accessible and that digital\r | |
4033 | information is not lost through reproduction. In the analog world of\r | |
4034 | photocopies and microfilm, the issue of fidelity to the original becomes\r | |
4035 | paramount, as do issues of "Whose fidelity?" and "Whose original?"\r | |
4036 | \r | |
4037 | BATTIN elaborated these arguments with a few examples from a recent study\r | |
4038 | conducted by the CPA on the problems of preserving text and image. \r | |
4039 | Discussions with scholars, librarians, and curators in a variety of\r | |
4040 | disciplines dependent on text and image generated a variety of concerns,\r | |
4041 | for example: 1) Copy what is, not what the technology is capable of. \r | |
4042 | This is very important for the history of ideas. Scholars wish to know\r | |
4043 | what the author saw and worked from. And make available at the\r | |
4044 | workstation the opportunity to erase all the defects and enhance the\r | |
4045 | presentation. 2) The fidelity of reproduction--what is good enough, what\r | |
4046 | can we afford, and the difference it makes--issues of subjective versus\r | |
4047 | objective resolution. 3) The differences between primary and secondary\r | |
4048 | users. Restricting the definition of primary user to the one in whose\r | |
4049 | discipline the material has been published runs one headlong into the\r | |
4050 | reality that these printed books have had a host of other users from a\r | |
4051 | host of other disciplines, who not only were looking for very different\r | |
4052 | things, but who also shared values very different from those of the\r | |
4053 | primary user. 4) The relationship of the standard of reproduction to new\r | |
4054 | capabilities of scholarship--the browsing standard versus an archival\r | |
4055 | standard. How good must the archival standard be? Can a distinction be\r | |
4056 | drawn between potential users in setting standards for reproduction? \r | |
4057 | Archival storage, use copies, browsing copies--ought an attempt to set\r | |
4058 | standards even be made? 5) Finally, costs. How much are we prepared to\r | |
4059 | pay to capture absolute fidelity? What are the trade-offs between vastly\r | |
4060 | enhanced access, degrees of fidelity, and costs?\r | |
4061 | \r | |
4062 | These standards, BATTIN concluded, serve to complicate further the\r | |
4063 | reproduction process, and add to the long list of technical standards\r | |
4064 | that are necessary to ensure widespread access. Ways to articulate and\r | |
4065 | analyze the costs that are attached to the different levels of standards\r | |
4066 | must be found.\r | |
4067 | \r | |
4068 | Given the chaos concerning standards, which promises to linger for the\r | |
4069 | foreseeable future, BATTIN urged adoption of the following general\r | |
4070 | principles:\r | |
4071 | \r | |
4072 | * Strive to understand the changing information requirements of\r | |
4073 | scholarly disciplines as more and more technology is integrated into\r | |
4074 | the process of research and scholarly communication in order to meet\r | |
4075 | future scholarly needs, not to build for the past. Capture\r | |
4076 | deteriorating information at the highest affordable resolution, even\r | |
4077 | though the dissemination and display technologies will lag.\r | |
4078 | \r | |
4079 | * Develop cooperative mechanisms to foster agreement on protocols\r | |
4080 | for document structure and other interchange mechanisms necessary\r | |
4081 | for widespread dissemination and use before official standards are\r | |
4082 | set.\r | |
4083 | \r | |
4084 | * Accept that, in a transition period, de facto standards will have\r | |
4085 | to be developed.\r | |
4086 | \r | |
4087 | * Capture information in a way that keeps all options open and\r | |
4088 | provides for total convertibility: OCR, scanning of microfilm,\r | |
4089 | producing microfilm from scanned documents, etc.\r | |
4090 | \r | |
4091 | * Work closely with the generators of information and the builders\r | |
4092 | of networks and databases to ensure that continuing accessibility is\r | |
4093 | a primary concern from the beginning.\r | |
4094 | \r | |
4095 | * Piggyback on standards under development for the broad market, and\r | |
4096 | avoid library-specific standards; work with the vendors, in order to\r | |
4097 | take advantage of that which is being standardized for the rest of\r | |
4098 | the world.\r | |
4099 | \r | |
4100 | * Concentrate efforts on managing permanence in the digital world,\r | |
4101 | rather than perfecting the longevity of a particular medium.\r | |
4102 | \r | |
4103 | ******\r | |
4104 | \r | |
4105 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4106 | DISCUSSION * Additional comments on TIFF *\r | |
4107 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4108 | \r | |
4109 | During the brief discussion period that followed BATTIN's presentation,\r | |
4110 | BARONAS explained that TIFF was not developed in collaboration with or\r | |
4111 | under the auspices of AIIM. TIFF is a company product, not a standard,\r | |
4112 | is owned by two corporations, and is always changing. BARONAS also\r | |
4113 | observed that ANSI/AIIM MS53, a bi-level image file transfer format that\r | |
4114 | allows unlike systems to exchange images, is compatible with TIFF as well\r | |
4115 | as with DEC's architecture and IBM's MODCA/IOCA.\r | |
4116 | \r | |
4117 | ******\r | |
4118 | \r | |
4119 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4120 | HOOTON * Several questions to be considered in discussing text conversion\r | |
4121 | *\r | |
4122 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4123 | \r | |
4124 | HOOTON introduced the final topic, text conversion, by noting that it is\r | |
4125 | becoming an increasingly important part of the imaging business. Many\r | |
4126 | people now realize that it enhances their system to be able to have more\r | |
4127 | and more character data as part of their imaging system. Re the issue of\r | |
4128 | OCR versus rekeying, HOOTON posed several questions: How does one get\r | |
4129 | text into computer-readable form? Does one use automated processes? \r | |
4130 | Does one attempt to eliminate the use of operators where possible? \r | |
4131 | Standards for accuracy, he said, are extremely important: it makes a\r | |
4132 | major difference in cost and time whether one sets as a standard 98.5\r | |
4133 | percent acceptance or 99.5 percent. He mentioned outsourcing as a\r | |
4134 | possibility for converting text. Finally, what one does with the image\r | |
4135 | to prepare it for the recognition process is also important, he said,\r | |
4136 | because such preparation changes how recognition is viewed, as well as\r | |
4137 | facilitates recognition itself.\r | |
4138 | \r | |
4139 | ******\r | |
4140 | \r | |
4141 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4142 | LESK * Roles of participants in CORE * Data flow * The scanning process *\r | |
4143 | The image interface * Results of experiments involving the use of\r | |
4144 | electronic resources and traditional paper copies * Testing the issue of\r | |
4145 | serendipity * Conclusions *\r | |
4146 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4147 | \r | |
4148 | Michael LESK, executive director, Computer Science Research, Bell\r | |
4149 | Communications Research, Inc. (Bellcore), discussed the Chemical Online\r | |
4150 | Retrieval Experiment (CORE), a cooperative project involving Cornell\r | |
4151 | University, OCLC, Bellcore, and the American Chemical Society (ACS).\r | |
4152 | \r | |
4153 | LESK spoke on 1) how the scanning was performed, including the unusual\r | |
4154 | feature of page segmentation, and 2) the use made of the text and the\r | |
4155 | image in experiments.\r | |
4156 | \r | |
4157 | Working with the chemistry journals (because ACS has been saving its\r | |
4158 | typesetting tapes since the mid-1970s and thus has a significant back-run\r | |
4159 | of the most important chemistry journals in the United States), CORE is\r | |
4160 | attempting to create an automated chemical library. Approximately a\r | |
4161 | quarter of the pages by square inch are made up of images of\r | |
4162 | quasi-pictorial material; dealing with the graphic components of the\r | |
4163 | pages is extremely important. LESK described the roles of participants\r | |
4164 | in CORE: 1) ACS provides copyright permission, journals on paper,\r | |
4165 | journals on microfilm, and some of the definitions of the files; 2) at\r | |
4166 | Bellcore, LESK chiefly performs the data preparation, while Dennis Egan\r | |
4167 | performs experiments on the users of chemical abstracts, and supplies the\r | |
4168 | indexing and numerous magnetic tapes; 3) Cornell provides the site of the\r | |
4169 | experiment; 4) OCLC develops retrieval software and other user interfaces.\r | |
4170 | Various manufacturers and publishers have furnished other help.\r | |
4171 | \r | |
4172 | Concerning data flow, Bellcore receives microfilm and paper from ACS; the\r | |
4173 | microfilm is scanned by outside vendors, while the paper is scanned\r | |
4174 | inhouse on an Improvision scanner, twenty pages per minute at 300 dpi,\r | |
4175 | which provides sufficient quality for all practical uses. LESK would\r | |
4176 | prefer to have more gray level, because one of the ACS journals prints on\r | |
4177 | some colored pages, which creates a problem.\r | |
4178 | \r | |
4179 | Bellcore performs all this scanning, creates a page-image file, and also\r | |
4180 | selects from the pages the graphics, to mix with the text file (which is\r | |
4181 | discussed later in the Workshop). The user is always searching the ASCII\r | |
4182 | file, but she or he may see a display based on the ASCII or a display\r | |
4183 | based on the images.\r | |
4184 | \r | |
4185 | LESK illustrated how the program performs page analysis, and the image\r | |
4186 | interface. (The user types several words, is presented with a list--\r | |
4187 | usually of the titles of articles contained in an issue--that derives\r | |
4188 | from the ASCII, clicks on an icon and receives an image that mirrors an\r | |
4189 | ACS page.) LESK also illustrated an alternative interface, based on the\r | |
4190 | ASCII text, the so-called SuperBook interface from Bellcore.\r | |
4191 | \r | |
4192 | LESK next presented the results of an experiment conducted by Dennis Egan\r | |
4193 | and involving thirty-six students at Cornell, one third of them\r | |
4194 | undergraduate chemistry majors, one third senior undergraduate chemistry\r | |
4195 | majors, and one third graduate chemistry students. A third of them\r | |
4196 | received the paper journals, the traditional paper copies and chemical\r | |
4197 | abstracts on paper. A third received image displays of the pictures of\r | |
4198 | the pages, and a third received the text display with pop-up graphics.\r | |
4199 | \r | |
4200 | The students were given several questions made up by some chemistry\r | |
4201 | professors. The questions fell into five classes, ranging from very easy\r | |
4202 | to very difficult, and included questions designed to simulate browsing\r | |
4203 | as well as a traditional information retrieval-type task.\r | |
4204 | \r | |
4205 | LESK furnished the following results. In the straightforward question\r | |
4206 | search--the question being, what is the phosphorus-oxygen bond distance\r | |
4207 | in hydroxy phosphate?--the students were told that they could take\r | |
4208 | fifteen minutes and, then, if they wished, give up. The students with\r | |
4209 | paper took more than fifteen minutes on average, and yet most of them\r | |
4210 | gave up. The students with either electronic format, text or image,\r | |
4211 | received good scores in reasonable time, hardly ever had to give up, and\r | |
4212 | usually found the right answer.\r | |
4213 | \r | |
4214 | In the browsing study, the students were given a list of eight topics,\r | |
4215 | told to imagine that an issue of the Journal of the American Chemical\r | |
4216 | Society had just appeared on their desks, and were also told to flip\r | |
4217 | through it and to find topics mentioned in the issue. The average scores\r | |
4218 | were about the same. (The students were told to answer yes or no about\r | |
4219 | whether or not particular topics appeared.) The errors, however, were\r | |
4220 | quite different. The students with paper rarely said that something\r | |
4221 | appeared when it had not. But they often failed to find something\r | |
4222 | actually mentioned in the issue. The computer people found numerous\r | |
4223 | things, but they also frequently said that a topic was mentioned when it\r | |
4224 | was not. (The reason, of course, was that they were performing word\r | |
4225 | searches. They were finding that words were mentioned and they were\r | |
4226 | concluding that they had accomplished their task.)\r | |
4227 | \r | |
4228 | This question also contained a trick to test the issue of serendipity. \r | |
4229 | The students were given another list of eight topics and instructed,\r | |
4230 | without taking a second look at the journal, to recall how many of this\r | |
4231 | new list of eight topics were in this particular issue. This was an\r | |
4232 | attempt to see if they performed better at remembering what they were not\r | |
4233 | looking for. They all performed about the same, paper or electronics,\r | |
4234 | about 62 percent accurate. In short, LESK said, people were not very\r | |
4235 | good when it came to serendipity, but they were no worse at it with\r | |
4236 | computers than they were with paper.\r | |
4237 | \r | |
4238 | (LESK gave a parenthetical illustration of the learning curve of students\r | |
4239 | who used SuperBook.)\r | |
4240 | \r | |
4241 | The students using the electronic systems started off worse than the ones\r | |
4242 | using print, but by the third of the three sessions in the series had\r | |
4243 | caught up to print. As one might expect, electronics provide a much\r | |
4244 | better means of finding what one wants to read; reading speeds, once the\r | |
4245 | object of the search has been found, are about the same.\r | |
4246 | \r | |
4247 | Almost none of the students could perform the hard task--the analogous\r | |
4248 | transformation. (It would require the expertise of organic chemists to\r | |
4249 | complete.) But an interesting result was that the students using the text\r | |
4250 | search performed terribly, while those using the image system did best.\r | |
4251 | That the text search system is driven by text offers the explanation.\r | |
4252 | Everything is focused on the text; to see the pictures, one must press\r | |
4253 | on an icon. Many students found the right article containing the answer\r | |
4254 | to the question, but they did not click on the icon to bring up the right\r | |
4255 | figure and see it. They did not know that they had found the right place,\r | |
4256 | and thus got it wrong.\r | |
4257 | \r | |
4258 | The short answer demonstrated by this experiment was that in the event\r | |
4259 | one does not know what to read, one needs the electronic systems; the\r | |
4260 | electronic systems hold no advantage at the moment if one knows what to\r | |
4261 | read, but neither do they impose a penalty.\r | |
4262 | \r | |
4263 | LESK concluded by commenting that, on one hand, the image system was easy\r | |
4264 | to use. On the other hand, the text display system, which represented\r | |
4265 | twenty man-years of work in programming and polishing, was not winning,\r | |
4266 | because the text was not being read, just searched. The much easier\r | |
4267 | system is highly competitive as well as remarkably effective for the\r | |
4268 | actual chemists.\r | |
4269 | \r | |
4270 | ******\r | |
4271 | \r | |
4272 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4273 | ERWAY * Most challenging aspect of working on AM * Assumptions guiding\r | |
4274 | AM's approach * Testing different types of service bureaus * AM's\r | |
4275 | requirement for 99.95 percent accuracy * Requirements for text-coding *\r | |
4276 | Additional factors influencing AM's approach to coding * Results of AM's\r | |
4277 | experience with rekeying * Other problems in dealing with service bureaus\r | |
4278 | * Quality control the most time-consuming aspect of contracting out\r | |
4279 | conversion * Long-term outlook uncertain *\r | |
4280 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4281 | \r | |
4282 | To Ricky ERWAY, associate coordinator, American Memory, Library of\r | |
4283 | Congress, the constant variety of conversion projects taking place\r | |
4284 | simultaneously represented perhaps the most challenging aspect of working\r | |
4285 | on AM. Thus, the challenge was not to find a solution for text\r | |
4286 | conversion but a tool kit of solutions to apply to LC's varied\r | |
4287 | collections that need to be converted. ERWAY limited her remarks to the\r | |
4288 | process of converting text to machine-readable form, and the variety of\r | |
4289 | LC's text collections, for example, bound volumes, microfilm, and\r | |
4290 | handwritten manuscripts.\r | |
4291 | \r | |
4292 | Two assumptions have guided AM's approach, ERWAY said: 1) A desire not\r | |
4293 | to perform the conversion inhouse. Because of the variety of formats and\r | |
4294 | types of texts, to capitalize the equipment and have the talents and\r | |
4295 | skills to operate them at LC would be extremely expensive. Further, the\r | |
4296 | natural inclination to upgrade to newer and better equipment each year\r | |
4297 | made it reasonable for AM to focus on what it did best and seek external\r | |
4298 | conversion services. Using service bureaus also allowed AM to have\r | |
4299 | several types of operations take place at the same time. 2) AM was not a\r | |
4300 | technology project, but an effort to improve access to library\r | |
4301 | collections. Hence, whether text was converted using OCR or rekeying\r | |
4302 | mattered little to AM. What mattered were cost and accuracy of results.\r | |
4303 | \r | |
4304 | AM considered different types of service bureaus and selected three to\r | |
4305 | perform several small tests in order to acquire a sense of the field. \r | |
4306 | The sample collections with which they worked included handwritten\r | |
4307 | correspondence, typewritten manuscripts from the 1940s, and\r | |
4308 | eighteenth-century printed broadsides on microfilm. On none of these\r | |
4309 | samples was OCR performed; they were all rekeyed. AM had several special\r | |
4310 | requirements for the three service bureaus it had engaged. For instance,\r | |
4311 | any errors in the original text were to be retained. Working from bound\r | |
4312 | volumes or anything that could not be sheet-fed also constituted a factor\r | |
4313 | eliminating companies that would have performed OCR.\r | |
4314 | \r | |
4315 | AM requires 99.95 percent accuracy, which, though it sounds high, often\r | |
4316 | means one or two errors per page. The initial batch of test samples\r | |
4317 | contained several handwritten materials for which AM did not require\r | |
4318 | text-coding. The results, ERWAY reported, were in all cases fairly\r | |
4319 | comparable: for the most part, all three service bureaus achieved 99.95\r | |
4320 | percent accuracy. AM was satisfied with the work but surprised at the cost.\r | |
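
The claim that 99.95 percent accuracy still means one or two errors per page can be checked with a rough calculation. The sketch below assumes that accuracy is counted per character and uses the 2,700-character average page that ERWAY cites later in the discussion; the function name is illustrative only.

```python
# Sketch: expected keying errors per page at a given character accuracy.
# Assumptions: accuracy is measured per character, and a page holds 2,700
# characters (the average page size ERWAY cites in the discussion).
CHARS_PER_PAGE = 2700


def expected_errors_per_page(accuracy_percent, chars_per_page=CHARS_PER_PAGE):
    """Return the expected number of wrong characters on one page."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * chars_per_page


# Compare the accuracy levels mentioned by HOOTON and ERWAY.
for accuracy in (98.5, 99.5, 99.95, 99.99):
    errors = expected_errors_per_page(accuracy)
    print(f"{accuracy}% accuracy: about {errors:.2f} errors per page")
```

Under these assumptions, 99.95 percent works out to about 1.35 errors on an average page, while 98.5 percent would leave roughly forty.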
4321 | \r | |
4322 | As AM began converting whole collections, it retained the requirement for\r | |
4323 | 99.95 percent accuracy and added requirements for text-coding. AM needed\r | |
4324 | to begin performing work more than three years ago before LC requirements\r | |
4325 | for SGML applications had been established. Since AM's goal was simply\r | |
4326 | to retain any of the intellectual content represented by the formatting\r | |
4327 | of the document (which would be lost if one performed a straight ASCII\r | |
4328 | conversion), AM used "SGML-like" codes. These codes resembled SGML tags\r | |
4329 | but were used without the benefit of document-type definitions. AM found\r | |
4330 | that many service bureaus were not yet SGML-proficient.\r | |
4331 | \r | |
4332 | Additional factors influencing the approach AM took with respect to\r | |
4333 | coding included: 1) the inability of any known microcomputer-based\r | |
4334 | user-retrieval software to take advantage of SGML coding; and 2) the\r | |
4335 | multiple inconsistencies in format of the older documents, which\r | |
4336 | confirmed AM in its desire not to attempt to force the different formats\r | |
4337 | to conform to a single document-type definition (DTD) and thus create the\r | |
4338 | need for a separate DTD for each document. \r | |
4339 | \r | |
4340 | The five text collections that AM has converted or is in the process of\r | |
4341 | converting include a collection of eighteenth-century broadsides, a\r | |
4342 | collection of pamphlets, two typescript document collections, and a\r | |
4343 | collection of 150 books.\r | |
4344 | \r | |
4345 | ERWAY next reviewed the results of AM's experience with rekeying, noting\r | |
4346 | again that because the bulk of AM's materials are historical, the quality\r | |
4347 | of the text often does not lend itself to OCR. While non-English\r | |
4348 | speakers are less likely to guess or elaborate or correct typos in the\r | |
4349 | original text, they are also less able to infer what we would; they also\r | |
4350 | are nearly incapable of converting handwritten text. Another\r | |
4351 | disadvantage of working with overseas keyers is that they are much less\r | |
4352 | likely to telephone with questions, especially on the coding, with the\r | |
4353 | result that they develop their own rules as they encounter new\r | |
4354 | situations.\r | |
4355 | \r | |
4356 | Government contracting procedures and time frames posed a major challenge\r | |
4357 | to performing the conversion. Many service bureaus are not accustomed to\r | |
4358 | retaining the image, even if they perform OCR. Thus, questions of image\r | |
4359 | format and storage media were somewhat novel to many of them. ERWAY also\r | |
4360 | remarked on other problems in dealing with service bureaus, for example,\r | |
4361 | their inability to perform text conversion from the kind of microfilm\r | |
4362 | that LC uses for preservation purposes.\r | |
4363 | \r | |
4364 | But quality control, in ERWAY's experience, was the most time-consuming\r | |
4365 | aspect of contracting out conversion. AM has been attempting to perform\r | |
4366 | a 10-percent quality review, looking at either every tenth document or\r | |
4367 | every tenth page to make certain that the service bureaus are maintaining\r | |
4368 | 99.95 percent accuracy. But even if they are complying with the\r | |
4369 | requirement for accuracy, finding errors produces a desire to correct\r | |
4370 | them and, in turn, to clean up the whole collection, which defeats the\r | |
4371 | purpose to some extent. Even a double entry requires a\r | |
4372 | character-by-character comparison to the original to meet the accuracy\r | |
4373 | requirement. LC is not accustomed to publishing imperfect texts, which\r | |
4374 | makes attempting to deal with the industry standard an emotionally\r | |
4375 | fraught issue for AM. As was mentioned in the previous day's discussion,\r | |
4376 | going from 99.95 to 99.99 percent accuracy usually doubles costs and\r | |
4377 | means a third keying or another complete run-through of the text.\r | |
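
A 10-percent review of this kind might be sketched as follows. This is not AM's actual procedure: the sketch assumes that pages are available as plain strings, that every tenth page is checked, and that accuracy is defined as one minus the edit distance divided by the length of the original. All function names are hypothetical.

```python
# Sketch: 10-percent quality review by character-level comparison.
# Assumptions: pages are plain strings, every tenth page is sampled, and
# accuracy = 1 - (edit distance / length of original), pooled over the sample.


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sample_accuracy(originals, keyed, step=10):
    """Check every `step`-th page; return pooled character accuracy in percent."""
    errors = 0
    total = 0
    for index in range(0, len(originals), step):
        errors += edit_distance(originals[index], keyed[index])
        total += len(originals[index])
    return 100.0 * (1.0 - errors / total) if total else 100.0


def meets_requirement(originals, keyed, required=99.95, step=10):
    """True if the sampled pages reach the required accuracy."""
    return sample_accuracy(originals, keyed, step) >= required
```

A sample of this size only estimates the accuracy of the whole batch; a single bad page in a small sample can swing the result either way.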
4378 | \r | |
4379 | Although AM has learned much from its experiences with various collections\r | |
4380 | and various service bureaus, ERWAY concluded pessimistically that no\r | |
4381 | breakthrough has been achieved. Incremental improvements have occurred\r | |
4382 | in some of the OCR technology, some of the processes, and some of the\r | |
4383 | standards acceptances, which, though they may lead to somewhat lower costs,\r | |
4384 | do not offer much encouragement to many people who are anxiously awaiting\r | |
4385 | the day that the entire contents of LC are available on-line.\r | |
4386 | \r | |
4387 | ******\r | |
4388 | \r | |
4389 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4390 | ZIDAR * Several answers to why one attempts to perform full-text\r | |
4391 | conversion * Per page cost of performing OCR * Typical problems\r | |
4392 | encountered during editing * Editing poor copy OCR vs. rekeying *\r | |
4393 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4394 | \r | |
4395 | Judith ZIDAR, coordinator, National Agricultural Text Digitizing Program\r | |
4396 | (NATDP), National Agricultural Library (NAL), offered several answers to\r | |
4397 | the question of why one attempts to perform full-text conversion: 1)\r | |
4398 | Text in an image can be read by a human but not by a computer, so of\r | |
4399 | course it is not searchable and there is not much one can do with it. 2)\r | |
4400 | Some material simply requires word-level access. For instance, the legal\r | |
4401 | profession insists on full-text access to its material; with taxonomic or\r | |
4402 | geographic material, which entails numerous names, one virtually requires\r | |
4403 | word-level access. 3) Full text permits rapid browsing and searching,\r | |
4404 | something that cannot be achieved in an image with today's technology. \r | |
4405 | 4) Text stored as ASCII and delivered in ASCII is standardized and highly\r | |
4406 | portable. 5) People just want full-text searching, even those who do not\r | |
4407 | know how to do it. NAL, for the most part, is performing OCR at an\r | |
4408 | actual cost per average-size page of approximately $7. NAL scans the\r | |
4409 | page to create the electronic image and passes it through the OCR device.\r | |
4410 | \r | |
4411 | ZIDAR next rehearsed several typical problems encountered during editing. \r | |
4412 | Praising the celerity of her student workers, ZIDAR observed that editing\r | |
4413 | requires approximately five to ten minutes per page, assuming that there\r | |
4414 | are no large tables to audit. Confusion among the three characters I, 1, \r | |
4415 | and l, constitutes perhaps the most common problem encountered. Zeroes\r | |
4416 | and O's also are frequently confused. Double M's create a particular\r | |
4417 | problem, even on clean pages. They are so wide in most fonts that they\r | |
4418 | touch, and the system simply cannot tell where one letter ends and the\r | |
4419 | other begins. Complex page formats occasionally fail to columnate\r | |
4420 | properly, which entails rescanning as though one were working with a\r | |
4421 | single column, entering the ASCII, and decolumnating for better\r | |
4422 | searching. With proportionally spaced text, OCR can have difficulty\r | |
4423 | discerning which gaps are merely spaces between letters and which are\r | |
4424 | spaces between words, and therefore will merge text or break\r | |
4425 | up words where it should not.\r | |
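
Some of the character confusions ZIDAR describes can be flagged mechanically before a human editor sees the page. The following is a minimal sketch, not NAL's editing procedure. It assumes that a token in which a letter sits next to the digit 0 or 1, or a digit sits next to O, I, or l, deserves review; the patterns and names are illustrative.

```python
import re

# Sketch: flag OCR tokens in which letters and look-alike digits are adjacent.
# Assumption: such tokens (e.g. a zero inside a word) are worth human review.
# Legitimate codes such as part numbers will be flagged too.
LETTER_NEXT_TO_0_OR_1 = re.compile(r"[A-Za-z][01]|[01][A-Za-z]")
DIGIT_NEXT_TO_O_I_L = re.compile(r"\d[OIl]|[OIl]\d")


def suspicious_tokens(text):
    """Return whitespace-separated tokens that match either pattern."""
    flagged = []
    for token in text.split():
        if LETTER_NEXT_TO_0_OR_1.search(token) or DIGIT_NEXT_TO_O_I_L.search(token):
            flagged.append(token)
    return flagged


print(suspicious_tokens("The s0il sample weighed 1O grams in 1992"))
```

A check of this kind cannot catch touching letters such as the double M, nor misplaced word spaces; those still require comparison with the page image.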
4426 | \r | |
4427 | ZIDAR said that it can often take longer to edit a poor-copy OCR than to\r | |
4428 | key it from scratch. NAL has also experimented with partial editing of\r | |
4429 | text, whereby project workers go in and clean up the format, removing\r | |
4430 | stray characters but not running a spell-check. NAL corrects typos in\r | |
4431 | the title and authors' names, which provides a foothold for searching and\r | |
4432 | browsing. Even extremely poor-quality OCR (e.g., 60-percent accuracy)\r | |
4433 | can still be searched, because numerous words are correct, while the\r | |
4434 | important words are probably repeated often enough that they are likely\r | |
4435 | to be found correct somewhere. Librarians, however, cannot tolerate this\r | |
4436 | situation, though end users seem more willing to use this text for\r | |
4437 | searching, provided that NAL indicates that it is unedited. ZIDAR\r | |
4438 | concluded that rekeying of text may be the best route to take, in spite\r | |
4439 | of numerous problems with quality control and cost.\r | |
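
The argument that poor OCR remains searchable because important words recur can be made concrete. The sketch below rests on two assumptions not stated in the talk: that the 60-percent figure is read as word-level accuracy, and that each occurrence of a word is recognized independently of the others.

```python
# Sketch: chance that a search term is recognized correctly at least once.
# Assumptions: the accuracy figure is word-level, and occurrences of the
# same word are recognized independently of one another.


def chance_found(word_accuracy, occurrences):
    """Probability that at least one of `occurrences` copies is correct."""
    return 1.0 - (1.0 - word_accuracy) ** occurrences


for count in (1, 3, 5):
    print(count, "occurrence(s):", round(chance_found(0.60, count), 3))
```

Under these assumptions, a word that appears three times in a document is found about 94 percent of the time. If the 60 percent were character-level accuracy instead, long words would fare far worse.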
4440 | \r | |
4441 | ******\r | |
4442 | \r | |
4443 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4444 | DISCUSSION * Modifying an image before performing OCR * NAL's costs per\r | |
4445 | page * AM's costs per page and experience with Federal Prison Industries *\r | |
4446 | Elements comprising NATDP's costs per page * OCR and structured markup *\r | |
4447 | Distinction between the structure of a document and its representation\r | |
4448 | when put on the screen or printed *\r | |
4449 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4450 | \r | |
4451 | HOOTON prefaced the lengthy discussion that followed with several\r | |
4452 | comments about modifying an image before one reaches the point of\r | |
4453 | performing OCR. For example, in regard to an application containing a\r | |
4454 | significant amount of redundant data, such as form-type data, numerous\r | |
4455 | companies today are working on various kinds of form removal, prior to\r | |
4456 | going through a recognition process, by using dropout colors. Thus,\r | |
4457 | acquiring access to form design or using electronic means are worth\r | |
4458 | considering. HOOTON also noted that conversion usually makes or breaks\r | |
4459 | one's imaging system. It is extremely important, extremely costly in\r | |
4460 | terms of either capital investment or service, and determines the quality\r | |
4461 | of the remainder of one's system, because it determines the character of\r | |
4462 | the raw material used by the system.\r | |
4463 | \r | |
4464 | Concerning the four projects undertaken by NAL, two inside and two\r | |
4465 | performed by outside contractors, ZIDAR revealed that an in-house service\r | |
4466 | bureau executed the first at a cost between $8 and $10 per page for\r | |
4467 | everything, including building of the database. The project undertaken\r | |
4468 | by the Consultative Group on International Agricultural Research (CGIAR)\r | |
4469 | cost approximately $10 per page for the conversion, plus some expenses\r | |
4470 | for the software and building of the database. The Acid Rain Project--a\r | |
4471 | two-disk set produced by the University of Vermont, consisting of\r | |
4472 | Canadian publications on acid rain--cost $6.70 per page for everything,\r | |
4473 | including keying of the text, which was double keyed, scanning of the\r | |
4474 | images, and building of the database. The in-house project offered\r | |
4475 | considerable convenience and greater control of the process. On\r | |
4476 | the other hand, the service bureaus know their job and perform it\r | |
4477 | expeditiously, because they have more people.\r | |
4478 | \r | |
4479 | As a useful comparison, ERWAY revealed AM's costs as follows: $0.75\r | |
4480 | to $0.85 per thousand characters, with an average page\r | |
4481 | containing 2,700 characters. Requirements for coding and imaging\r | |
4482 | increase the costs. Thus, conversion of the text, including the coding,\r | |
4483 | costs approximately $3 per page. (This figure does not include the\r | |
4484 | imaging and database-building included in the NAL costs.) AM also\r | |
4485 | enjoyed a happy experience with Federal Prison Industries, which\r | |
4486 | precluded the necessity of going through the request-for-proposal process\r | |
4487 | to award a contract, because it is another government agency. The\r | |
4488 | prisoners performed AM's rekeying just as well as other service bureaus\r | |
4489 | and proved handy as well. AM shipped them the books, which they would\r | |
4490 | photocopy on a book-edge scanner. They would perform the markup on\r | |
4491 | photocopies, return the books as soon as they were done with them,\r | |
4492 | perform the keying, and return the material to AM on WORM disks.\r | |
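
The per-page keying cost implied by ERWAY's figures can be worked out directly. This sketch reads the rates as dollars per thousand characters, an interpretation that fits the $3-per-page total once coding is added; the names are illustrative.

```python
# Sketch: per-page keying cost from AM's per-thousand-character rates.
# Assumption: the rates are in dollars per thousand characters.
CHARS_PER_PAGE = 2700               # average page size cited by ERWAY
RATES_PER_THOUSAND = (0.75, 0.85)   # low and high rates


def cost_per_page(rate_per_thousand, chars_per_page=CHARS_PER_PAGE):
    """Return the keying cost in dollars for one average page."""
    return rate_per_thousand * chars_per_page / 1000.0


low, high = (cost_per_page(rate) for rate in RATES_PER_THOUSAND)
print(f"Keying alone: {low:.3f} to {high:.3f} dollars per page")
```

Keying alone thus comes to a little over $2.00 and up to about $2.30 per page, which leaves somewhat under a dollar of the $3 figure for coding.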
4493 | \r | |
4494 | ZIDAR detailed the elements that constitute the previously noted cost of\r | |
4495 | approximately $7 per page. Most significant are the editing, correction\r | |
4496 | of errors, and spell-checking, which, though they may sound easy to\r | |
4497 | perform, require in fact a great deal of time. Reformatting text also\r | |
4498 | takes a while, but a significant share of NAL's expenses is for equipment,\r | |
4499 | which was extremely expensive when purchased because it was one of the few\r | |
4500 | systems on the market. The costs of equipment are being amortized over\r | |
4501 | five years but are still quite high, nearly $2,000 per month.\r | |
4502 | \r | |
4503 | HOCKEY raised a general question concerning OCR and the amount of editing\r | |
4504 | required (substantial in her experience) to generate the kind of\r | |
4505 | structured markup necessary for manipulating the text on the computer or\r | |
4506 | loading it into any retrieval system. She wondered if the speakers could\r | |
4507 | extend the previous question about the cost-benefit of adding or inserting\r | |
4508 | structured markup. ERWAY noted that several OCR systems retain italics,\r | |
4509 | bolding, and other spatial formatting. While the material may not be in\r | |
4510 | the format desired, these systems possess the ability to remove the\r | |
4511 | original materials quickly from the hands of the people performing the\r | |
4512 | conversion, as well as to retain that information so that users can work\r | |
4513 | with it. HOCKEY rejoined that the current thinking on markup is that one\r | |
4514 | should not say that something is italic or bold so much as why it is that\r | |
4515 | way. To be sure, one needs to know that something was italicized, but\r | |
4516 | how can one get from one to the other? One can map from the structure to\r | |
4517 | the typographic representation.\r | |
4518 | \r | |
4519 | FLEISCHHAUER suggested that, given the 100 million items the Library\r | |
4520 | holds, it may not be possible for LC to do more than report that a thing\r | |
4521 | was in italics as opposed to why it was in italics, although that may be\r | |
4522 | desirable in some contexts. Promising to talk a bit during the afternoon\r | |
4523 | session about several experiments OCLC performed on automatic recognition\r | |
4524 | of document elements, and which they hoped to extend, WEIBEL said that in\r | |
4525 | fact one can recognize the major elements of a document with a fairly\r | |
4526 | high degree of reliability, at least as good as OCR. STEVENS drew a\r | |
4527 | useful distinction between standard, generalized markup (i.e., defining\r | |
4528 | for a document-type definition the structure of the document), and what\r | |
4529 | he termed a style sheet, which had to do with italics, bolding, and other\r | |
4530 | forms of emphasis. Thus, two different components are at work, one being\r | |
4531 | the structure of the document itself (its logic), and the other being its\r | |
4532 | representation when it is put on the screen or printed.\r | |
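
The distinction STEVENS draws, and HOCKEY's point that one can map from structure to typography but not back, can be illustrated with a small sketch. The element names and both style sheets are hypothetical; asterisks and underscores stand in for bold and italic type.

```python
# Sketch: keep a document's logical structure separate from its presentation.
# Assumptions: the element names and both style sheets are hypothetical;
# asterisks and underscores stand in for real bold and italic rendering.

document = [
    ("title", "Workshop on Electronic Texts"),
    ("emphasis", "extremely"),
    ("foreign", "desideratum"),
    ("paragraph", "plain running text"),
]

# Two style sheets for the same structure: one for screen, one for print.
SCREEN_STYLE = {"title": "bold", "emphasis": "italic", "foreign": "italic"}
PRINT_STYLE = {"title": "bold", "emphasis": "bold", "foreign": "italic"}

RENDERERS = {
    "bold": lambda text: f"*{text}*",
    "italic": lambda text: f"_{text}_",
    "plain": lambda text: text,
}


def render(doc, style_sheet):
    """Map each structural element to its typographic form."""
    return [RENDERERS[style_sheet.get(element, "plain")](text)
            for element, text in doc]


print(render(document, SCREEN_STYLE))
print(render(document, PRINT_STYLE))
```

On screen, both emphasis and foreign words come out in italics, so the rendered form alone no longer says why a word was italicized. Recording the structure keeps that information, and the typography can always be derived from it.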
4533 | \r | |
4534 | ******\r | |
4535 | \r | |
4536 | SESSION V. APPROACHES TO PREPARING ELECTRONIC TEXTS\r | |
4537 | \r | |
4538 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4539 | HOCKEY * Text in ASCII and the representation of electronic text versus\r | |
4540 | an image * The need to look at ways of using markup to assist retrieval *\r | |
4541 | The need for an encoding format that will be reusable and multifunctional\r | |
4542 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4543 | \r | |
4544 | Susan HOCKEY, director, Center for Electronic Texts in the Humanities\r | |
4545 | (CETH), Rutgers and Princeton Universities, announced that one talk\r | |
4546 | (WEIBEL's) was moved into this session from the morning and that David\r | |
4547 | Packard was unable to attend. The session would attempt to focus more on\r | |
4548 | what one can do with a text in ASCII and the representation of electronic\r | |
4549 | text rather than just an image, what one can do with a computer that\r | |
4550 | cannot be done with a book or an image. It would be argued that one can\r | |
4551 | do much more than just read a text, and from that starting point one can\r | |
4552 | use markup and methods of preparing the text to take full advantage of\r | |
4553 | the capability of the computer. That would lead to a discussion of what\r | |
4554 | the European Community calls REUSABILITY, what may better be termed\r | |
4555 | DURABILITY, that is, how to prepare or make a text that will last a long\r | |
4556 | time and that can be used for as many applications as possible, which\r | |
4557 | would lead to issues of improving intellectual access.\r | |
4558 | \r | |
4559 | HOCKEY urged the need to look at ways of using markup to facilitate retrieval,\r | |
4560 | not just for referencing or locating a retrieved item, but also by tagging a\r | |
4561 | text to help retrieve the thing sought, either with linguistic tagging or\r | |
4562 | interpretation. HOCKEY also argued that little advancement had occurred in\r | |
4563 | the software tools currently available for retrieving and searching text.\r | |
4564 | She pressed the desideratum of going beyond Boolean searches and performing\r | |
4565 | more sophisticated searching, which the insertion of more markup in the text\r | |
4566 | would facilitate. Thinking about electronic texts as opposed to images means\r | |
4567 | considering material that will never appear in print form, or for which print\r | |
4568 | is not the primary form, that is, material which appears only in electronic form.\r | |
4569 | HOCKEY alluded to the history and the need for markup and tagging and\r | |
4570 | electronic text, which was developed through the use of computers in the\r | |
4571 | humanities; as MICHELSON had observed, Father Busa had started in 1949\r | |
4572 | to prepare the first-ever text on the computer.\r | |
4573 | \r | |
4574 | HOCKEY remarked on several large projects, particularly in Europe, for the\r | |
4575 | compilation of dictionaries, language studies, and language analysis, in\r | |
4576 | which people have built up archives of text and have begun to recognize\r | |
4577 | the need for an encoding format that will be reusable and multifunctional,\r | |
4578 | that can be used not just to print the text, which may be assumed to be a\r | |
4579 | byproduct of what one wants to do, but to structure it inside the computer\r | |
4580 | so that it can be searched, built into a Hypertext system, etc.\r | |
4581 | \r | |
4582 | ******\r | |
4583 | \r | |
4584 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4585 | WEIBEL * OCLC's approach to preparing electronic text: retroconversion,\r | |
4586 | keying of texts, more automated ways of developing data * Project ADAPT\r | |
4587 | and the CORE Project * Intelligent character recognition does not exist *\r | |
4588 | Advantages of SGML * Data should be free of procedural markup;\r | |
4589 | descriptive markup strongly advocated * OCLC's interface illustrated *\r | |
4590 | Storage requirements and costs for putting a lot of information on line *\r | |
4591 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4592 | \r | |
4593 | Stuart WEIBEL, senior research scientist, Online Computer Library Center,\r | |
4594 | Inc. (OCLC), described OCLC's approach to preparing electronic text. He\r | |
4595 | argued that the electronic world into which we are moving must\r | |
4596 | accommodate not only the future but the past as well, and to some degree\r | |
4597 | even the present. Thus, starting out at one end with retroconversion and\r | |
4598 | keying of texts, one would like to move toward much more automated ways\r | |
4599 | of developing data.\r | |
4600 | \r | |
4601 | For example, Project ADAPT had to do with automatically converting\r | |
4602 | document images into a structured document database with OCR text as\r | |
4603 | indexing and also a little bit of automatic formatting and tagging of\r | |
4604 | that text. The CORE Project, hosted by Cornell University, Bellcore,\r | |
4605 | OCLC, the American Chemical Society, and Chemical Abstracts, constitutes\r | |
4606 | WEIBEL's principal concern at the moment. This project is an example of\r | |
4607 | converting text for which one already has a machine-readable version into\r | |
4608 | a format more suitable for electronic delivery and database searching. \r | |
4609 | (Since Michael LESK had previously described CORE, WEIBEL would say\r | |
4610 | little concerning it.) Borrowing a chemical phrase, de novo synthesis,\r | |
4611 | WEIBEL cited the Online Journal of Current Clinical Trials as an example\r | |
4612 | of de novo electronic publishing, that is, publishing in which the primary\r | |
4613 | form of the information is electronic.\r | |
4614 | \r | |
4615 | Project ADAPT, then, which OCLC completed a couple of years ago and in\r | |
4616 | fact is about to resume, is a model in which one takes page images either\r | |
4617 | on paper or microfilm and converts them automatically to a searchable\r | |
4618 | electronic database, either on-line or local. The operating assumption\r | |
4619 | is that accepting some blemishes in the data, especially for\r | |
4620 | retroconversion of materials, will make it possible to accomplish more. \r | |
4621 | Not enough money is available to support perfect conversion.\r | |
4622 | \r | |
4623 | WEIBEL related several steps taken to perform image preprocessing\r | |
4624 | (processing on the image before performing optical character\r | |
4625 | recognition), as well as image postprocessing. He denied the existence\r | |
4626 | of intelligent character recognition and asserted that what is wanted is\r | |
4627 | page recognition, which is a long way off. OCLC has experimented with\r | |
4628 | merging of multiple optical character recognition systems that will\r | |
4629 | reduce errors from an unacceptable rate of 5 characters out of every\r | |
4630 | 1,000 to an unacceptable rate of 2 characters out of every 1,000, but it\r | |
4631 | is not good enough. It will never be perfect.\r | |
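
The merging of multiple recognition systems mentioned above can be pictured as a character-by-character vote. The following is only a minimal sketch, not OCLC's method: it assumes the outputs of the engines are already aligned and of equal length (real outputs would first need alignment), and the function names and sample strings are hypothetical.

```python
from collections import Counter


def merge_ocr_outputs(outputs):
    """Character-level majority vote over pre-aligned, equal-length OCR outputs."""
    if not outputs:
        raise ValueError("need at least one OCR output")
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("this sketch assumes aligned, equal-length outputs")
    merged = []
    for chars in zip(*outputs):
        # Keep the character most engines agree on; ties go to the first seen.
        merged.append(Counter(chars).most_common(1)[0][0])
    return "".join(merged)


def errors_per_thousand(text, truth):
    """Character errors per 1,000 characters, measured against a known-good text."""
    wrong = sum(1 for a, b in zip(text, truth) if a != b)
    return 1000 * wrong / len(truth)


# Hypothetical sample: each engine makes one error, each in a different place.
truth = "electronic text"
engine_outputs = ["e1ectronic text", "electronic tcxt", "electronlc text"]
merged = merge_ocr_outputs(engine_outputs)
print(merged, errors_per_thousand(merged, truth))
```

Voting helps only where the engines err in different places; where most of them make the same mistake, the error survives, which is consistent with the remark that the result will never be perfect.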
4632 | \r | |
4633 | Concerning the CORE Project, WEIBEL observed that Bellcore is taking the\r | |
4634 | typography files, extracting the page images, and converting those\r | |
4635 | typography files to SGML markup. LESK hands that data off to OCLC, which\r | |
4636 | builds that data into a Newton database, the same system that underlies\r | |
4637 | the on-line system in virtually all of the reference products at OCLC. \r | |
4638 | The long-term goal is to make the systems interoperable so that not just\r | |
4639 | Bellcore's system and OCLC's system can access this data, but other\r | |
4640 | systems can as well, and the key to that is the Z39.50 common command\r | |
4641 | language and the full-text extension. Z39.50 is fine for MARC records,\r | |
4642 | but is not sufficient for full text (that is, for making full texts\r | |
4643 | interoperable).\r | |
4644 | \r | |
4645 | WEIBEL next outlined the critical role of SGML for a variety of purposes,\r | |
4646 | for example, as noted by HOCKEY, in the world of extremely large\r | |
4647 | databases, using highly structured data to perform field searches. \r | |
4648 | WEIBEL argued that by building the structure of the data in (i.e., the\r | |
4649 | structure of the data originally on a printed page), it becomes easy to\r | |
4650 | look at a journal article even if one cannot read the characters and know\r | |
4651 | where the title or author is, or what the sections of that document would be.\r | |
4652 | OCLC wants to make that structure explicit in the database, because it will\r | |
4653 | be important for retrieval purposes.\r | |
4654 | \r | |
4655 | The second big advantage of SGML is that it gives one the ability to\r | |
4656 | build structure into the database that can be used for display purposes\r | |
4657 | without contaminating the data with instructions about how to format\r | |
4658 | things. The distinction lies between procedural markup, which tells one\r | |
4659 | where to put dots on the page, and descriptive markup, which describes\r | |
4660 | the elements of a document.\r | |
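
The distinction between descriptive and procedural markup can be illustrated with a small sketch. It is not drawn from the CORE data: the element names, the sample document, and the two style tables are all hypothetical. The data record only what each element is, and each display device supplies its own formatting rules.

```python
# Descriptive data: each entry names what the element is, not how it looks.
document = [
    ("title", "A Sample Article"),
    ("author", "A. N. Author"),
    ("para", "Descriptive markup describes the elements of a document."),
]

# Formatting decisions live outside the data, one table per display device.
SCREEN_STYLE = {
    "title": lambda s: s.upper(),
    "author": lambda s: "by " + s,
    "para": lambda s: s,
}
PRINT_STYLE = {
    "title": lambda s: "\\textbf{" + s + "}",
    "author": lambda s: "\\textit{" + s + "}",
    "para": lambda s: s,
}


def render(doc, style):
    """Apply a device-specific style table to the same descriptive data."""
    return "\n".join(style[name](text) for name, text in doc)


print(render(document, SCREEN_STYLE))
print(render(document, PRINT_STYLE))
```

Because the data carry no formatting instructions, a third style table could be added later without touching the document itself.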
4661 | \r | |
4662 | WEIBEL believes that there should be no procedural markup in the data at\r | |
4663 | all, that the data should be completely unsullied by information about\r | |
4664 | italics or boldness. That should be left up to the display device,\r | |
4665 | whether that display device is a page printer or a screen display device. \r | |
4666 | By keeping one's database free of that kind of contamination, one can\r | |
4667 | make decisions down the road, for example, reorganize the data in ways\r | |
4668 | that are not cramped by built-in notions of what should be italic and\r | |
4669 | what should be bold. WEIBEL strongly advocated descriptive markup. As\r | |
4670 | an example, he illustrated the index structure in the CORE data. With\r | |
4671 | subsequent illustrated examples of markup, WEIBEL acknowledged the common\r | |
4672 | complaint that SGML is hard to read in its native form, although markup\r | |
4673 | decreases considerably once one gets into the body. Without the markup,\r | |
4674 | however, one would not have the structure in the data. One can pass\r | |
4675 | markup through a LaTeX processor and convert it relatively easily to a\r | |
4676 | printed version of the document.\r | |
4677 | \r | |
4678 | WEIBEL next illustrated an extremely cluttered screen dump of OCLC's\r | |
4679 | system, in order to show as much as possible the inherent capability on\r | |
4680 | the screen. (He noted parenthetically that he had become a supporter of\r | |
4681 | X-Windows as a result of the progress of the CORE Project.) WEIBEL also\r | |
4682 | illustrated the two major parts of the interface: 1) a control box that\r | |
4683 | allows one to generate lists of items, which resembles a small table of\r | |
4684 | contents based on key words one wishes to search, and 2) a document\r | |
4685 | viewer, which is a separate process in and of itself. He demonstrated\r | |
4686 | how to follow links through the electronic database simply by selecting\r | |
4687 | the appropriate button and bringing them up. He also noted problems that\r | |
4688 | remain to be accommodated in the interface (e.g., as pointed out by LESK,\r | |
4689 | what happens when users do not click on the icon for the figure).\r | |
4690 | \r | |
4691 | Given the constraints of time, WEIBEL omitted a large number of ancillary\r | |
4692 | items in order to say a few words concerning storage requirements and\r | |
4693 | what will be required to put a lot of things on line. Since it is\r | |
4694 | extremely expensive to reconvert all of this data, especially if it is\r | |
4695 | just in paper form (and even if it is in electronic form in typesetting\r | |
4696 | tapes), he advocated building journals electronically from the start. In\r | |
4697 | that case, if one only has text, graphics, and indexing (which is all that\r | |
4698 | one needs with de novo electronic publishing, because there is no need to\r | |
4699 | go back and look at bit-maps of pages), one can get 10,000 journals of\r | |
4700 | full text, or almost 6 million pages per year. These pages can be put in\r | |
4701 | approximately 135 gigabytes of storage, which is not all that much,\r | |
4702 | WEIBEL said. For twenty years, something less than three terabytes would\r | |
4703 | be required. WEIBEL calculated the costs of storing this information as\r | |
4704 | follows: If a gigabyte costs approximately $1,000, then a terabyte costs\r | |
4705 | approximately $1 million to buy in terms of hardware. One also needs a\r | |
4706 | building to put it in and a staff like OCLC's to handle that information. \r | |
4707 | So, to support a terabyte, multiply by five, which gives $5 million per\r | |
4708 | year for a supported terabyte of data.\r | |
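
The storage and cost figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the round numbers reported from the talk and assumes decimal units (1 gigabyte = 10^9 bytes, 1 terabyte = 1,000 gigabytes); the variable names are illustrative.

```python
# Figures as reported in the talk (all approximate).
PAGES_PER_YEAR = 6_000_000      # "almost 6 million pages per year"
GB_PER_YEAR = 135               # storage for one year of 10,000 journals
YEARS = 20
COST_PER_GB = 1_000             # dollars of hardware per gigabyte
SUPPORT_MULTIPLIER = 5          # building and staff on top of hardware

# Implied average size of one page of text, graphics, and indexing.
bytes_per_page = GB_PER_YEAR * 10**9 / PAGES_PER_YEAR

# Twenty years of journals, in terabytes.
total_tb = GB_PER_YEAR * YEARS / 1000

# Hardware cost of a terabyte, and cost once support is included.
hardware_cost_per_tb = COST_PER_GB * 1000
supported_cost_per_tb = hardware_cost_per_tb * SUPPORT_MULTIPLIER

print(f"{bytes_per_page:,.0f} bytes per page")
print(f"{total_tb} terabytes for {YEARS} years")
print(f"${hardware_cost_per_tb:,} hardware, ${supported_cost_per_tb:,} supported, per terabyte")
```

The figures are consistent with one another: 135 gigabytes a year for twenty years comes to 2.7 terabytes, which is the "something less than three terabytes" cited.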
4709 | \r | |
4710 | ******\r | |
4711 | \r | |
4712 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4713 | DISCUSSION * Tapes saved by ACS are the typography files originally\r | |
4714 | supporting publication of the journal * Cost of building tagged text into\r | |
4715 | the database *\r | |
4716 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4717 | \r | |
4718 | During the question-and-answer period that followed WEIBEL's\r | |
4719 | presentation, these clarifications emerged. The tapes saved by the\r | |
4720 | American Chemical Society are the typography files that originally\r | |
4721 | supported the publication of the journal. Although they are not tagged\r | |
4722 | in SGML, they are tagged in very fine detail. Every single sentence is\r | |
4723 | marked, all the registry numbers, all the publication issues, dates, and\r | |
4724 | volumes. No cost figures on tagging material on a per-megabyte basis\r | |
4725 | were available. Because ACS's typesetting system runs from tagged text,\r | |
4726 | there is no extra cost per article. It was unknown what it costs ACS to\r | |
4727 | keyboard the tagged text rather than just keyboard the text in the\r | |
4728 | cheapest process. In other words, since one intends to publish things\r | |
4729 | and will need to build tagged text into a typography system in any case,\r | |
4730 | if one does that in such a way that it can drive not only typography but\r | |
4731 | an electronic system (which is what ACS intends to do--move to SGML\r | |
4732 | publishing), the marginal cost is zero. The marginal cost represents the\r | |
4733 | cost of building tagged text into the database, which is small.\r | |
4734 | \r | |
4735 | ******\r | |
4736 | \r | |
4737 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4738 | SPERBERG-McQUEEN * Distinction between texts and computers * Implications\r | |
4739 | of recognizing that all representation is encoding * Dealing with\r | |
4740 | complicated representations of text entails the need for a grammar of\r | |
4741 | documents * Variety of forms of formal grammars * Text as a bit-mapped\r | |
4742 | image does not represent a serious attempt to represent text in\r | |
4743 | electronic form * SGML, the TEI, document-type declarations, and the\r | |
4744 | reusability and longevity of data * TEI conformance explicitly allows\r | |
4745 | extension or modification of the TEI tag set * Administrative background\r | |
4746 | of the TEI * Several design goals for the TEI tag set * An absolutely\r | |
4747 | fixed requirement of the TEI Guidelines * Challenges the TEI has\r | |
4748 | attempted to face * Good texts not beyond economic feasibility * The\r | |
4749 | issue of reproducibility or processability * The issue of images as\r | |
4750 | simulacra for the text redux * One's model of text determines what one's\r | |
4751 | software can do with a text and has economic consequences *\r | |
4752 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4753 | \r | |
4754 | Prior to speaking about SGML and markup, Michael SPERBERG-McQUEEN, editor,\r | |
4755 | Text Encoding Initiative (TEI), University of Illinois-Chicago, first drew\r | |
4756 | a distinction between texts and computers: Texts are abstract cultural\r | |
4757 | and linguistic objects while computers are complicated physical devices,\r | |
4758 | he said. Abstract objects cannot be placed inside physical devices; with\r | |
4759 | computers one can only represent text and act upon those representations.\r | |
4760 | \r | |
4761 | The recognition that all representation is encoding, SPERBERG-McQUEEN\r | |
4762 | argued, leads to the recognition of two things: 1) The topic description\r | |
4763 | for this session is slightly misleading, because there can be no discussion\r | |
4764 | of pros and cons of text-coding unless what one means is pros and cons of\r | |
4765 | working with text with computers. 2) No text can be represented in a\r | |
4766 | computer without some sort of encoding; images are one way of encoding text,\r | |
4767 | ASCII is another, SGML yet another. There is no encoding without some\r | |
4768 | information loss, that is, there is no perfect reproduction of a text that\r | |
4769 | allows one to do away with the original. Thus, the question becomes,\r | |
4770 | What is the most useful representation of text for serious work?\r | |
4771 | This depends on what kind of serious work one is talking about.\r | |
4772 | \r | |
4773 | The projects demonstrated the previous day all involved highly complex\r | |
4774 | information and fairly complex manipulation of the textual material.\r | |
4775 | In order to use that complicated information, one has to calculate it\r | |
4776 | slowly or manually and store the result. It needs to be stored, therefore,\r | |
4777 | as part of one's representation of the text. Thus, one needs to store the\r | |
4778 | structure in the text. To deal with complicated representations of text,\r | |
4779 | one needs somehow to control the complexity of the representation of a text;\r | |
4780 | that means one needs a way of finding out whether a document or an\r | |
4781 | electronic representation of a document is legal or not; and that\r | |
4782 | means one needs a grammar of documents.\r | |
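
What it means to test whether a document is "legal" against a grammar can be shown with a deliberately small sketch. The grammar and element names here are hypothetical and far simpler than an SGML document-type declaration: the check only asks which child elements each element may contain, and ignores order and number of occurrences, which a real DTD also constrains.

```python
# Hypothetical grammar: the set of child elements each element may contain.
GRAMMAR = {
    "article": {"title", "author", "section"},
    "section": {"heading", "para"},
    "title": set(),
    "author": set(),
    "heading": set(),
    "para": set(),
}


def is_legal(node, grammar=GRAMMAR):
    """Return True if the element and all its descendants obey the grammar.

    A node is a (name, children) pair, where children is a list of nodes.
    """
    name, children = node
    if name not in grammar:
        return False
    return all(
        child[0] in grammar[name] and is_legal(child, grammar)
        for child in children
    )


good = ("article", [("title", []), ("section", [("heading", []), ("para", [])])])
bad = ("article", [("para", [])])  # a paragraph directly inside an article

print(is_legal(good), is_legal(bad))
```

A document that passes such a check is known to have a predictable structure, which is what allows software to act on it without inspecting each text by hand.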
4783 | \r | |
4784 | SPERBERG-McQUEEN discussed the variety of forms of formal grammars,\r | |
4785 | implicit and explicit, as applied to text, and their capabilities. He\r | |
4786 | argued that these grammars correspond to different models of text that\r | |
4787 | different developers have. For example, one implicit model of the text\r | |
4788 | is that there is no internal structure, but just one thing after another,\r | |
4789 | a few characters and then perhaps a start-title command, and then a few\r | |
4790 | more characters and an end-title command. SPERBERG-McQUEEN also\r | |
4791 | distinguished several kinds of text that have a sort of hierarchical\r | |
4792 | structure that is not very well defined, which, typically, corresponds\r | |
4793 | to grammars that are not very well defined, as well as hierarchies that\r | |
4794 | are very well defined (e.g., the Thesaurus Linguae Graecae) and extremely\r | |
4795 | complicated things such as SGML, which handle strictly hierarchical data\r | |
4796 | very nicely.\r | |
4797 | \r | |
4798 | SPERBERG-McQUEEN conceded that one other model not illustrated on his two\r | |
4799 | displays was the model of text as a bit-mapped image, an image of a page,\r | |
4800 | and confessed to having been converted to a limited extent by the\r | |
4801 | Workshop to the view that electronic images constitute a promising,\r | |
4802 | probably superior alternative to microfilming. But he was not convinced\r | |
4803 | that electronic images represent a serious attempt to represent text in\r | |
4804 | electronic form. Many of their problems stem from the fact that they are\r | |
4805 | not direct attempts to represent the text but attempts to represent the\r | |
4806 | page, thus making them representations of representations.\r | |
4807 | \r | |
4808 | In this situation of increasingly complicated textual information and the\r | |
4809 | need to control that complexity in a useful way (which raises the question\r | |
4810 | of the need for good textual grammars), one has the introduction of SGML. \r | |
4811 | With SGML, one can develop specific document-type declarations\r | |
4812 | for specific text types or, as with the TEI, attempt to generate\r | |
4813 | general document-type declarations that can handle all sorts of text.\r | |
4814 | The TEI is an attempt to develop formats for text representation that\r | |
4815 | will ensure the kind of reusability and longevity of data discussed earlier.\r | |
4816 | It offers a way to stay alive in the state of permanent technological\r | |
4817 | revolution.\r | |
4818 | \r | |
4819 | It has been a continuing challenge in the TEI to create document grammars\r | |
4820 | that do some work in controlling the complexity of the textual object but\r | |
4821 | also allow one to represent the real text that one will find. \r | |
4822 | Fundamental to the notion of the TEI is that TEI conformance allows one\r | |
4823 | the ability to extend or modify the TEI tag set so that it fits the text\r | |
4824 | that one is attempting to represent.\r | |
4825 | \r | |
4826 | SPERBERG-McQUEEN next outlined the administrative background of the TEI. \r | |
4827 | The TEI is an international project to develop and disseminate guidelines\r | |
4828 | for the encoding and interchange of machine-readable text. It is\r | |
4829 | sponsored by the Association for Computers in the Humanities, the\r | |
4830 | Association for Computational Linguistics, and the Association for\r | |
4831 | Literary and Linguistic Computing. Representatives of numerous other\r | |
4832 | professional societies sit on its advisory board. The TEI has a number\r | |
4833 | of affiliated projects that have provided assistance by testing drafts of\r | |
4834 | the guidelines.\r | |
4835 | \r | |
4836 | Among the design goals for the TEI tag set, the scheme first of all must\r | |
4837 | meet the needs of research, because the TEI came out of the research\r | |
4838 | community, which did not feel adequately served by existing tag sets. \r | |
4839 | The tag set must be extensive as well as compatible with existing and\r | |
4840 | emerging standards. In 1990, version 1.0 of the Guidelines was released\r | |
4841 | (SPERBERG-McQUEEN illustrated their contents).\r | |
4842 | \r | |
4843 | SPERBERG-McQUEEN noted that one problem besetting electronic text has\r | |
4844 | been the lack of adequate internal or external documentation for many\r | |
4845 | existing electronic texts. The TEI guidelines as currently formulated\r | |
4846 | contain few fixed requirements, but one of them is this: There must\r | |
4847 | always be a document header, an in-file SGML tag that provides\r | |
4848 | 1) a bibliographic description of the electronic object one is talking\r | |
4849 | about (that is, who included it, when, what for, and under which title);\r | |
4850 | and 2) the copy text from which it was derived, if any. If there was\r | |
4851 | no copy text or if the copy text is unknown, then one states as much.\r | |
4852 | Version 2.0 of the Guidelines was scheduled to be completed in fall 1992\r | |
4853 | and a revised third version is to be presented to the TEI advisory board\r | |
4854 | for its endorsement this coming winter. The TEI itself exists to provide\r | |
4855 | a markup language, not a marked-up text.\r | |
4856 | \r | |
4857 | Among the challenges the TEI has attempted to face is the need for a\r | |
4858 | markup language that will work for existing projects, that is, handle the\r | |
4859 | level of markup that people are using now to tag only chapter, section,\r | |
4860 | and paragraph divisions and not much else. At the same time, such a\r | |
4861 | language also will be able to scale up gracefully to handle the highly\r | |
4862 | detailed markup which many people foresee as the future destination of\r | |
4863 | much electronic text, and which is not the future destination but the\r | |
4864 | present home of numerous electronic texts in specialized areas.\r | |
4865 | \r | |
4866 | SPERBERG-McQUEEN dismissed the lowest-common-denominator approach as\r | |
4867 | unable to support the kind of applications that draw people who have\r | |
4868 | never been in the public library regularly before, and make them come\r | |
4869 | back. He advocated more interesting text and more intelligent text. \r | |
4870 | Asserting that it is not beyond economic feasibility to have good texts,\r | |
4871 | SPERBERG-McQUEEN noted that the TEI Guidelines listing 200-odd tags\r | |
4872 | contains tags that one is expected to enter every time the relevant\r | |
4873 | textual feature occurs. It contains all the tags that people need now,\r | |
4874 | and it is not expected that everyone will tag things in the same way.\r | |
4875 | \r | |
4876 | The question of how people will tag the text is in large part a function\r | |
4877 | of their reaction to what SPERBERG-McQUEEN termed the issue of\r | |
4878 | reproducibility. What one needs to be able to reproduce are the things\r | |
4879 | one wants to work with. Perhaps a more useful concept than that of\r | |
4880 | reproducibility or recoverability is that of processability, that is,\r | |
4881 | what can one get from an electronic text without reading it again\r | |
4882 | in the original. He illustrated this contention with a page from\r | |
4883 | Jan Comenius's bilingual Introduction to Latin.\r | |
4884 | \r | |
4885 | SPERBERG-McQUEEN returned at length to the issue of images as simulacra\r | |
4886 | for the text, in order to reiterate his belief that in the long run more\r | |
4887 | than images of pages of particular editions of the text are needed,\r | |
4888 | because just as second-generation photocopies and second-generation\r | |
4889 | microfilm degenerate, so second-generation representations tend to\r | |
4890 | degenerate, and one tends to overstress some relatively trivial aspects\r | |
4891 | of the text such as its layout on the page, which is not always\r | |
4892 | significant, despite what the text critics might say, and slight other\r | |
4893 | pieces of information such as the very important lexical ties between the\r | |
4894 | English and Latin versions of Comenius's bilingual text, for example. \r | |
4895 | Moreover, in many crucial respects it is easy to fool oneself concerning\r | |
4896 | what a scanned image of the text will accomplish. For example, in order\r | |
4897 | to study the transmission of texts, information concerning the text\r | |
4898 | carrier is necessary, which scanned images simply do not always handle. \r | |
4899 | Further, even the high-quality materials being produced at Cornell lose\r | |
4900 | much of the information that one would need if studying those books as\r | |
4901 | physical objects. It is a choice that has been made. It is an arguably\r | |
4902 | justifiable choice, but one does not know what color those pen strokes in\r | |
4903 | the margin are or whether there was a stain on the page, because it has\r | |
4904 | been filtered out. One does not know whether there were rips in the page\r | |
4905 | because they do not show up, and on a couple of the marginal marks one\r | |
4906 | loses half of the mark because the pen is very light and the scanner\r | |
4907 | failed to pick it up, and so what is clearly a checkmark in the margin of\r | |
4908 | the original becomes a little scoop in the margin of the facsimile. \r | |
4909 | These are standard problems for facsimile editions, not new to electronics\r | |
4910 | but also true of light-lens photography, and are remarked here because it\r | |
4911 | is important that we not fool ourselves: even if we produce a very nice\r | |
4912 | image of this page with good contrast, we are not replacing the\r | |
4913 | manuscript any more than microfilm has replaced the manuscript.\r | |
4914 | \r | |
4915 | The TEI comes from the research community, where its first allegiance\r | |
4916 | lies, but it is not just an academic exercise. It has relevance far\r | |
4917 | beyond those who spend all of their time studying text, because one's\r | |
4918 | model of text determines what one's software can do with a text. Good\r | |
4919 | models lead to good software. Bad models lead to bad software. That has\r | |
4920 | economic consequences, and it is these economic consequences that have\r | |
4921 | led the European Community to help support the TEI, and that will lead,\r | |
4922 | SPERBERG-McQUEEN hoped, some software vendors to realize that if they\r | |
4923 | provide software with a better model of the text they can make a killing.\r | |
4924 | \r | |
4925 | ******\r | |
4926 | \r | |
4927 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4928 | DISCUSSION * Implications of different DTDs and tag sets * ODA versus SGML *\r | |
4929 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4930 | \r | |
4931 | During the discussion that followed, several additional points were made. \r | |
4932 | Neither AAP (i.e., Association of American Publishers) nor CALS (i.e.,\r | |
4933 | Computer-aided Acquisition and Logistics Support) has a document-type\r | |
4934 | definition for ancient Greek drama, although the TEI will be able to\r | |
4935 | handle that. Given this state of affairs and assuming that the\r | |
4936 | technical-journal producers and the commercial vendors decide to use the\r | |
4937 | other two types, then an institution like the Library of Congress, which\r | |
4938 | might receive all of their publications, would have to be able to handle\r | |
4939 | three different types of document definitions and tag sets and be able to\r | |
4940 | distinguish among them.\r | |
4941 | \r | |
4942 | Office Document Architecture (ODA) has some advantages that flow from its\r | |
4943 | tight focus on office documents and clear directions for implementation. \r | |
4944 | Much of the ODA standard is easier to read and clearer at first reading\r | |
4945 | than the SGML standard, which is extremely general. What that means is\r | |
4946 | that if one wants to use graphics in TIFF and ODA, one is stuck, because\r | |
4947 | ODA defines its own graphics formats, not including TIFF, whereas SGML says the\r | |
4948 | world is not waiting for this work group to create another graphics format.\r | |
4949 | What is needed is an ability to use whatever graphics format one wants.\r | |
4950 | \r | |
4951 | The TEI provides a socket that allows one to connect the SGML document to\r | |
4952 | the graphics. The notation that the graphics are in is clearly a choice\r | |
4953 | that one needs to make based on her or his environment, and that is one\r | |
4954 | advantage. SGML is less megalomaniacal in attempting to define formats\r | |
4955 | for all kinds of information, though more megalomaniacal in attempting to\r | |
4956 | cover all sorts of documents. The other advantage is that the model of\r | |
4957 | text represented by SGML is simply an order of magnitude richer and more\r | |
4958 | flexible than the model of text offered by ODA. Both offer hierarchical\r | |
4959 | structures, but SGML recognizes that the hierarchical model of the text\r | |
4960 | that one is looking at may not have been in the minds of the designers,\r | |
4961 | whereas ODA does not.\r | |
4962 | \r | |
4963 | ODA is not really aiming for the kind of document that the TEI wants to\r | |
4964 | encompass. The TEI can handle the kind of material ODA has, as well as a\r | |
4965 | significantly broader range of material. ODA seems to be very much\r | |
4966 | focused on office documents, which is what it started out being called--\r | |
4967 | office document architecture.\r | |
4968 | \r | |
4969 | ******\r | |
4970 | \r | |
4971 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4972 | CALALUCA * Text-encoding from a publisher's perspective *\r | |
4973 | Responsibilities of a publisher * Reproduction of Migne's Latin series\r | |
4974 | whole and complete with SGML tags based on perceived need and expected\r | |
4975 | use * Particular decisions arising from the general decision to produce\r | |
4976 | and publish PLD *\r | |
4977 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
4978 | \r | |
4979 | The final speaker in this session, Eric CALALUCA, vice president,\r | |
4980 | Chadwyck-Healey, Inc., spoke from the perspective of a publisher on\r | |
4981 | text-encoding, rather than as one qualified to discuss methods of\r | |
4982 | encoding data, and observed that the presenters sitting in the room,\r | |
4983 | whether they had chosen to or not, were acting as publishers: making\r | |
4984 | choices, gathering data, gathering information, and making assessments. \r | |
4985 | CALALUCA offered the hard-won conviction that in publishing very large\r | |
4986 | text files (such as PLD), one cannot avoid making personal judgments of\r | |
4987 | appropriateness and structure.\r | |
4988 | \r | |
4989 | In CALALUCA's view, encoding decisions stem from prior judgments. Two\r | |
4990 | notions have become axioms for him in the consideration of future sources\r | |
4991 | for electronic publication: 1) electronic text publishing is as personal\r | |
4992 | as any other kind of publishing, and questions of whether and how to encode\r | |
4993 | the data are simply a consequence of that prior decision; 2) all\r | |
4994 | personal decisions are open to criticism, which is unavoidable.\r | |
4995 | \r | |
4996 | CALALUCA rehearsed his role as a publisher or, better, as an intermediary\r | |
4997 | between what is viewed as a sound idea and the people who would make use\r | |
4998 | of it. Finding the specialist to advise in this process is the core of\r | |
4999 | that function. The publisher must monitor and hug the fine line between\r | |
5000 | giving users what they want and suggesting what they might need. One\r | |
5001 | responsibility of a publisher is to represent the desires of scholars and\r | |
5002 | research librarians as opposed to bullheadedly forcing them into areas\r | |
5003 | they would not choose to enter.\r | |
5004 | \r | |
5005 | CALALUCA likened the questions being raised today about data structure\r | |
5006 | and standards to the decisions faced by the Abbe Migne himself during\r | |
5007 | production of the Patrologia series in the mid-nineteenth century. \r | |
5008 | Chadwyck-Healey's decision to reproduce Migne's Latin series whole and\r | |
5009 | complete with SGML tags was also based upon a perceived need and an\r | |
5010 | expected use. In the same way that Migne's work came to be far more than\r | |
5011 | a simple handbook for clerics, PLD is already far more than a database\r | |
5012 | for theologians. It is a bedrock source for the study of Western\r | |
5013 | civilization, CALALUCA asserted.\r | |
5014 | \r | |
5015 | In regard to the decision to produce and publish PLD, the editorial board\r | |
5016 | offered direct judgments on the question of appropriateness of these\r | |
5017 | texts for conversion, their encoding and their distribution, and\r | |
5018 | concluded that the best possible project was one that avoided overt\r | |
5019 | intrusions or exclusions in so important a resource. Thus, the general\r | |
5020 | decision to transmit the original collection as clearly as possible with\r | |
5021 | the widest possible avenues for use led to other decisions: 1) To encode\r | |
5022 | the data or not, SGML or not, TEI or not. Again, the expected user\r | |
5023 | community asserted the need for normative tagging structures of important\r | |
5024 | humanities texts, and the TEI seemed the most appropriate structure for\r | |
5025 | that purpose. Research librarians, who are trained to view the larger\r | |
5026 | impact of electronic text sources on 80 or 90 or 100 doctoral\r | |
5027 | disciplines, loudly approved the decision to include tagging. They see\r | |
5028 | what is coming better than the specialist who is completely focused on\r | |
5029 | one edition of Ambrose's De Anima, and they also understand that the\r | |
5030 | potential uses exceed present expectations. 2) What will be tagged and\r | |
5031 | what will not. Once again, the board realized that one must tag the\r | |
5032 | obvious. But in no way should one attempt to identify through encoding\r | |
5033 | schemes every single discrete area of a text that might someday be\r | |
5034 | searched. That was another decision. Searching by a column number, an\r | |
5035 | author, a word, a volume, permitting combination searches, and tagging\r | |
5036 | notations seemed logical choices as core elements. 3) How does one make\r | |
5037 | the data available? Tying it to a CD-ROM edition creates limitations,\r | |
5038 | but a magnetic tape file that is very large, is accompanied by the\r | |
5039 | encoding specifications, and that allows one to make local modifications\r | |
5040 | also allows one to incorporate any changes one may desire within the\r | |
5041 | bounds of private research, though exporting tag files from a CD-ROM\r | |
5042 | could serve just as well. Since no one on the board could possibly\r | |
5043 | anticipate each and every way in which a scholar might choose to mine\r | |
5044 | this data bank, it was decided to satisfy the basics and make some\r | |
5045 | provisions for what might come. 4) Not to encode the database would rob\r | |
5046 | it of the interchangeability and portability these important texts should\r | |
5047 | accommodate. For CALALUCA, the extensive options presented by full-text\r | |
5048 | searching require care in text selection and strongly support encoding of\r | |
5049 | data to facilitate the widest possible search strategies. Better\r | |
5050 | software can always be created, but summoning the resources, the people,\r | |
5051 | and the energy to reconvert the text is another matter.\r | |
5052 | \r | |
5053 | PLD is being encoded, captured, and distributed, because to\r | |
5054 | Chadwyck-Healey and the board it offers the widest possible array of\r | |
5055 | future research applications that can be seen today. CALALUCA concluded\r | |
5056 | by urging the encoding of all important text sources in whatever way\r | |
5057 | seems most appropriate and durable at the time, without blanching at the\r | |
5058 | thought that one's work may require emendation in the future. (Thus,\r | |
5059 | Chadwyck-Healey produced a very large humanities text database before the\r | |
5060 | final release of the TEI Guidelines.)\r | |
5061 | \r | |
5062 | ******\r | |
5063 | \r | |
5064 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5065 | DISCUSSION * Creating texts with markup advocated * Trends in encoding *\r | |
5066 | The TEI and the issue of interchangeability of standards * A\r | |
5067 | misconception concerning the TEI * Implications for an institution like\r | |
5068 | LC in the event that a multiplicity of DTDs develops * Producing images\r | |
5069 | as a first step towards possible conversion to full text through\r | |
5070 | character recognition * The AAP tag sets as a common starting point and\r | |
5071 | the need for caution *\r | |
5072 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5073 | \r | |
5074 | HOCKEY prefaced the discussion that followed with several comments in\r | |
5075 | favor of creating texts with markup and on trends in encoding. In the\r | |
5076 | future, when many more texts are available for on-line searching, real\r | |
5077 | problems in finding what is wanted will develop, if one is faced with\r | |
5078 | millions of words of data. It therefore becomes important to consider\r | |
5079 | putting markup in texts to help searchers home in on the actual things\r | |
5080 | they wish to retrieve. Various approaches to refining retrieval methods\r | |
5081 | toward this end include building on a computer version of a dictionary\r | |
5082 | and letting the computer look up words in it to obtain more information\r | |
5083 | about the semantic structure or semantic field of a word, its grammatical\r | |
5084 | structure, and syntactic structure.\r | |
5085 | \r | |
5086 | HOCKEY commented on the present keen interest in the encoding world\r | |
5087 | in creating: 1) machine-readable versions of dictionaries that can be\r | |
5088 | initially tagged in SGML, which gives a structure to the dictionary entry;\r | |
5089 | these entries can then be converted into a more rigid or otherwise\r | |
5090 | different database structure inside the computer, which can be treated as\r | |
5091 | a dynamic tool for searching mechanisms; 2) large bodies of text to study\r | |
5092 | the language. In order to incorporate more sophisticated mechanisms,\r | |
5093 | more about how words behave needs to be known, which can be learned in\r | |
5094 | part from information in dictionaries. However, the last ten years have\r | |
5095 | seen much interest in studying the structure of printed dictionaries\r | |
5096 | converted into computer-readable form. The information one derives about\r | |
5097 | many words from those is only partial, one or two definitions of the\r | |
5098 | common or the usual meaning of a word, and then numerous definitions of\r | |
5099 | unusual usages. If the computer is using a dictionary to help retrieve\r | |
5100 | words in a text, it needs much more information about the common usages,\r | |
5101 | because those are the ones that occur over and over again. Hence the\r | |
5102 | current interest in developing large bodies of text in computer-readable\r | |
5103 | form in order to study the language. Several projects are engaged in\r | |
5104 | compiling, for example, 100 million words. HOCKEY described one with\r | |
5105 | which she was associated briefly at Oxford University involving\r | |
5106 | compilation of 100 million words of British English: about 10 percent of\r | |
5107 | that will contain detailed linguistic tagging encoded in SGML; it will\r | |
5108 | have word class taggings, with words identified as nouns, verbs,\r | |
5109 | adjectives, or other parts of speech. This tagging can then be used by\r | |
5110 | programs which will begin to learn a bit more about the structure of the\r | |
5111 | language and can then go on to tag more text.\r | |
5112 | \r | |
5113 | HOCKEY said that the more that is tagged accurately, the more one can\r | |
5114 | refine the tagging process and thus the bigger body of text one can build\r | |
5115 | up with linguistic tagging incorporated into it. Hence, the more tagging\r | |
5116 | or annotation there is in the text, the more one may begin to learn about\r | |
5117 | language and the more it will help accomplish more intelligent OCR. She\r | |
5118 | recommended the development of software tools that will help one begin to\r | |
5119 | understand more about a text, which can then be applied to scanning\r | |
5120 | images of that text in that format and to using more intelligence to help\r | |
5121 | one interpret or understand the text.\r | |
5122 | \r | |
5123 | HOCKEY posited the need to think about common methods of text-encoding\r | |
5124 | for a long time to come, because building these large bodies of text is\r | |
5125 | extremely expensive and will only be done once.\r | |
5126 | \r | |
5127 | In the more general discussion on approaches to encoding that followed,\r | |
5128 | these points were made:\r | |
5129 | \r | |
5130 | BESSER identified the underlying problem with standards that all have to\r | |
5131 | struggle with in adopting a standard, namely, the tension between a very\r | |
5132 | highly defined standard that is very interchangeable but does not work\r | |
5133 | for everyone because something is lacking, and a standard that is less\r | |
5134 | defined, more open, more adaptable, but less interchangeable. Contending\r | |
5135 | that the way in which people use SGML is not sufficiently defined, BESSER\r | |
5136 | wondered 1) if people resist the TEI because they think it is too defined\r | |
5137 | in certain things they do not fit into, and 2) how progress with\r | |
5138 | interchangeability can be made without frightening people away.\r | |
5139 | \r | |
5140 | SPERBERG-McQUEEN replied that the published drafts of the TEI had met\r | |
5141 | with surprisingly little objection on the grounds that they do not allow\r | |
5142 | one to handle X or Y or Z. Particular concerns of the affiliated\r | |
5143 | projects have led, in practice, to discussions of how extensions are to\r | |
5144 | be made; the primary concern of any project has to be how it can be\r | |
5145 | represented locally, thus making interchange secondary. The TEI has\r | |
5146 | received much criticism based on the notion that everything in it is\r | |
5147 | required or even recommended, which, as it happens, is a misconception\r | |
5148 | from the beginning, because none of it is required and very little is\r | |
5149 | actually actively recommended for all cases, except that one document\r | |
5150 | one's source.\r | |
5151 | \r | |
5152 | SPERBERG-McQUEEN agreed with BESSER about this trade-off: all the\r | |
5153 | projects in a set of twenty TEI-conformant projects will not necessarily\r | |
5154 | tag the material in the same way. One result of the TEI will be that the\r | |
5155 | easiest problems will be solved--those dealing with the external form of\r | |
5156 | the information; but the problem that is hardest in interchange is that\r | |
5157 | one is not encoding what another wants, and vice versa. Thus, after\r | |
5158 | the adoption of a common notation, the differences in the underlying\r | |
5159 | conceptions of what is interesting about texts become more visible.\r | |
5160 | The success of a standard like the TEI will lie in the ability of\r | |
5161 | the recipient of interchanged texts to use some of what it contains\r | |
5162 | and to add the information that was not encoded that one wants, in a\r | |
5163 | layered way, so that texts can be gradually enriched and one does not\r | |
5164 | have to put in everything all at once. Hence, having a well-behaved\r | |
5165 | markup scheme is important.\r | |
5166 | \r | |
5167 | STEVENS followed up on the paradoxical analogy that BESSER alluded to in\r | |
5168 | the example of the MARC records, namely, the formats that are the same\r | |
5169 | except that they are different. STEVENS drew a parallel between\r | |
5170 | document-type definitions and MARC records for books and serials and maps,\r | |
5171 | where one has a tagging structure and there is a text-interchange. \r | |
5172 | STEVENS opined that the producers of the information will set the terms\r | |
5173 | for the standard (i.e., develop document-type definitions for the users\r | |
5174 | of their products), creating a situation that will be problematical for\r | |
5175 | an institution like the Library of Congress, which will have to deal with\r | |
5176 | the DTDs in the event that a multiplicity of them develops. Thus,\r | |
5177 | numerous people are seeking a standard but cannot find the tag set that\r | |
5178 | will be acceptable to them and their clients. SPERBERG-McQUEEN agreed\r | |
5179 | with this view, and said that the situation was in a way worse: attempting\r | |
5180 | to unify arbitrary DTDs resembled attempting to unify a MARC record with a\r | |
5181 | bibliographic record done according to the Prussian instructions. \r | |
5182 | According to STEVENS, this situation occurred very early in the process.\r | |
5183 | \r | |
5184 | WATERS recalled from early discussions on Project Open Book the concern\r | |
5185 | of many people that merely by producing images, POB was not really\r | |
5186 | enhancing intellectual access to the material. Nevertheless, not wishing\r | |
5187 | to overemphasize the opposition between imaging and full text, WATERS\r | |
5188 | stated that POB views getting the images as a first step toward possibly\r | |
5189 | converting to full text through character recognition, if the technology\r | |
5190 | is appropriate. WATERS also emphasized that encoding is involved even\r | |
5191 | with a set of images.\r | |
5192 | \r | |
5193 | SPERBERG-McQUEEN agreed with WATERS that one can create an SGML document\r | |
5194 | consisting wholly of images. At first sight, organizing graphic images\r | |
5195 | with an SGML document may not seem to offer great advantages, but the\r | |
5196 | advantages of the scheme WATERS described would be precisely that\r | |
5197 | ability to move into something that is more of a multimedia document:\r | |
5198 | a combination of transcribed text and page images. WEIBEL concurred in\r | |
5199 | this judgment, offering evidence from Project ADAPT, where a page is\r | |
5200 | divided into text elements and graphic elements, and in fact the text\r | |
5201 | elements are organized by columns and lines. These lines may be used as\r | |
5202 | the basis for distributing documents in a network environment. As one\r | |
5203 | develops software intelligent enough to recognize what those elements\r | |
5204 | are, it makes sense to apply SGML to an image initially, which may, in\r | |
5205 | fact, ultimately become more and more text, either through OCR or edited\r | |
5206 | OCR or even just through keying. For WATERS, the labor of composing the\r | |
5207 | document and saying this set of documents or this set of images belongs\r | |
5208 | to this document constitutes a significant investment.\r | |
5209 | \r | |
5210 | WEIBEL also made the point that the AAP tag sets, while not excessively\r | |
5211 | prescriptive, offer a common starting point; they do not define the\r | |
5212 | structure of the documents, though. They have some recommendations about\r | |
5213 | DTDs one could use as examples, but they do just suggest tag sets. For\r | |
5214 | example, the CORE project attempts to use the AAP markup as much as\r | |
5215 | possible, but there are clearly areas where structure must be added. \r | |
5216 | That in no way contradicts the use of AAP tag sets.\r | |
5217 | \r | |
5218 | SPERBERG-McQUEEN noted that the TEI prepared a long working paper early\r | |
5219 | on about the AAP tag set and what it lacked that the TEI thought it\r | |
5220 | needed, and a fairly long critique of the naming conventions, which has\r | |
5221 | led to a very different style of naming in the TEI. He stressed the\r | |
5222 | importance of the opposition between prescriptive markup, the kind that a\r | |
5223 | publisher or anybody can do when producing documents de novo, and\r | |
5224 | descriptive markup, in which one has to take what the text carrier\r | |
5225 | provides. In these particular tag sets it is easy to overemphasize this\r | |
5226 | opposition, because the AAP tag set is extremely flexible. Even if one\r | |
5227 | just used the DTDs, they allow almost anything to appear almost anywhere.\r | |
5228 | \r | |
5229 | ******\r | |
5230 | \r | |
5231 | SESSION VI. COPYRIGHT ISSUES\r | |
5232 | \r | |
5233 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5234 | PETERS * Several cautions concerning copyright in an electronic\r | |
5235 | environment * Review of copyright law in the United States * The notion\r | |
5236 | of the public good and the desirability of incentives to promote it *\r | |
5237 | What copyright protects * Works not protected by copyright * The rights\r | |
5238 | of copyright holders * Publishers' concerns in today's electronic\r | |
5239 | environment * Compulsory licenses * The price of copyright in a digital\r | |
5240 | medium and the need for cooperation * Additional clarifications * Rough\r | |
5241 | justice oftentimes the outcome in numerous copyright matters * Copyright\r | |
5242 | in an electronic society * Copyright law always only sets up the\r | |
5243 | boundaries; anything can be changed by contract *\r | |
5244 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5245 | \r | |
5246 | Marybeth PETERS, policy planning adviser to the Register of Copyrights,\r | |
5247 | Library of Congress, made several general comments and then opened the\r | |
5248 | floor to discussion of subjects of interest to the audience.\r | |
5249 | \r | |
5250 | Having attended several sessions in an effort to gain a sense of what\r | |
5251 | people did and where copyright would affect their lives, PETERS expressed\r | |
5252 | the following cautions:\r | |
5253 | \r | |
5254 | * If one takes and converts materials and puts them in new forms,\r | |
5255 | then, from a copyright point of view, one is creating something and\r | |
5256 | will receive some rights.\r | |
5257 | \r | |
5258 | * However, if what one is converting already exists, a question\r | |
5259 | immediately arises about the status of the materials in question.\r | |
5260 | \r | |
5261 | * Putting something in the public domain in the United States offers\r | |
5262 | some freedom from anxiety, but distributing it throughout the world\r | |
5263 | on a network is another matter, even if one has put it in the public\r | |
5264 | domain in the United States. Re foreign laws, very frequently a\r | |
5265 | work can be in the public domain in the United States but protected\r | |
5266 | in other countries. Thus, one must consider all of the places a\r | |
5267 | work may reach, lest one unwittingly face a suit for copyright\r | |
5268 | infringement, or at least a letter demanding discussion of what\r | |
5269 | one is doing.\r | |
5270 | \r | |
5271 | PETERS reviewed copyright law in the United States. The U.S.\r | |
5272 | Constitution effectively states that Congress has the power to enact\r | |
5273 | copyright laws for two purposes: 1) to encourage the creation and\r | |
5274 | dissemination of intellectual works for the good of society as a whole;\r | |
5275 | and, significantly, 2) to give creators and those who package and\r | |
5276 | disseminate materials the economic rewards that are due them.\r | |
5277 | \r | |
5278 | Congress strives to strike a balance, which at times can become an\r | |
5279 | emotional issue. The United States has never accepted the notion of the\r | |
5280 | natural right of an author so much as it has accepted the notion of the\r | |
5281 | public good and the desirability of incentives to promote it. This state\r | |
5282 | of affairs, however, has created strains on the international level and\r | |
5283 | is the reason for several of the differences in the laws that we have. \r | |
5284 | Today the United States protects almost every kind of work that can be\r | |
5285 | called an expression of an author. The standard for gaining copyright\r | |
5286 | protection is simply originality. This is a low standard and means that\r | |
5287 | a work is not copied from something else and that it shows a certain\r | |
5288 | minimal amount of authorship. One can also acquire copyright protection\r | |
5289 | for making a new version of preexisting material, provided it manifests\r | |
5290 | some spark of creativity.\r | |
5291 | \r | |
5292 | However, copyright does not protect ideas, methods, systems--only the way\r | |
5293 | that one expresses those things. Nor does copyright protect anything\r | |
5294 | that is mechanical, anything that does not involve choice, or criteria\r | |
5295 | concerning whether or not one should do a thing. For example, the\r | |
5296 | results of a process called declicking, in which one mechanically removes\r | |
5297 | impure sounds from old recordings, are not copyrightable. On the other\r | |
5298 | hand, the choice to record a song digitally and to increase the sound of\r | |
5299 | violins or to bring up the tympani constitutes the results of conversion\r | |
5300 | that are copyrightable. Moreover, if a work is protected by copyright in\r | |
5301 | the United States, one generally needs the permission of the copyright\r | |
5302 | owner to convert it. Normally, who will own the new--that is,\r | |
5303 | converted--material is a matter of contract. In the absence of a contract, the\r | |
5304 | person who creates the new material is the author and owner. But people\r | |
5305 | do not generally think about the copyright implications until after the\r | |
5306 | fact. PETERS stressed the need when dealing with copyrighted works to\r | |
5307 | think about copyright in advance. One's bargaining power is much greater\r | |
5308 | up front than it is down the road.\r | |
5309 | \r | |
5310 | PETERS next discussed works not protected by copyright. For example, any\r | |
5311 | work done by a federal employee as part of his or her official duties is\r | |
5312 | in the public domain in the United States. The issue is not wholly free\r | |
5313 | of doubt concerning whether or not the work is in the public domain\r | |
5314 | outside the United States. Other materials in the public domain include: \r | |
5315 | any works published more than seventy-five years ago, and any work\r | |
5316 | published in the United States more than twenty-eight years ago, whose\r | |
5317 | copyright was not renewed. In talking about the new technology and\r | |
5318 | putting material in a digital form to send all over the world, PETERS\r | |
5319 | cautioned, one must keep in mind that while the rights may not be an\r | |
5320 | issue in the United States, they may be in different parts of the world,\r | |
5321 | where most countries previously employed a copyright term of the life of\r | |
5322 | the author plus fifty years.\r | |
5323 | \r | |
5324 | PETERS next reviewed the economics of copyright holding. Simply,\r | |
5325 | economic rights are the rights to control the reproduction of a work in\r | |
5326 | any form. They belong to the author, or in the case of a work made for\r | |
5327 | hire, the employer. The second right, which is critical to conversion,\r | |
5328 | is the right to change a work. The right to make new versions is perhaps\r | |
5329 | one of the most significant rights of authors, particularly in an\r | |
5330 | electronic world. The third right is the right to publish the work and\r | |
5331 | the right to disseminate it, something that everyone who deals in an\r | |
5332 | electronic medium needs to know. The basic rule is if a copy is sold,\r | |
5333 | all rights of distribution are extinguished with the sale of that copy. \r | |
5334 | The key is that it must be sold. A number of companies overcome this\r | |
5335 | obstacle by leasing or renting their product. These companies argue that\r | |
5336 | if the material is rented or leased and not sold, they control the uses\r | |
5337 | of a work. The fourth right, and one very important in a digital world,\r | |
5338 | is a right of public performance, which means the right to show the work\r | |
5339 | sequentially. For example, copyright owners control the showing of a\r | |
5340 | CD-ROM product in a public place such as a public library. The reverse\r | |
5341 | side of public performance is something called the right of public\r | |
5342 | display. Moral rights also exist, which at the federal level apply only\r | |
5343 | to very limited visual works of art, but in theory may apply under\r | |
5344 | contract and other principles. Moral rights may include the right of an\r | |
5345 | author to have his or her name on a work, the right of attribution, and\r | |
5346 | the right to object to distortion or mutilation--the right of integrity.\r | |
5347 | \r | |
5348 | The way copyright law is worded gives much latitude to activities such as\r | |
5349 | preservation; to use of material for scholarly and research purposes when\r | |
5350 | the user does not make multiple copies; and to the generation of\r | |
5351 | facsimile copies of unpublished works by libraries for themselves and\r | |
5352 | other libraries. But the law does not allow anyone to become the\r | |
5353 | distributor of the product for the entire world. In today's electronic\r | |
5354 | environment, publishers are extremely concerned that the entire world is\r | |
5355 | networked and can obtain the information desired from a single copy in a\r | |
5356 | single library. Hence, if there is to be only one sale, which publishers\r | |
5357 | may choose to live with, they will obtain their money in other ways, for\r | |
5358 | example, from access and use. Hence, the development of site licenses\r | |
5359 | and other kinds of agreements to cover what publishers believe they\r | |
5360 | should be compensated for. Any solution that the United States takes\r | |
5361 | today has to consider the international arena.\r | |
5362 | \r | |
5363 | Noting that the United States is a member of the Berne Convention and\r | |
5364 | subscribes to its provisions, PETERS described the permissions process. \r | |
5365 | She also defined compulsory licenses. A compulsory license, of which the\r | |
5366 | United States has had a few, builds into the law the right to use a work\r | |
5367 | subject to certain terms and conditions. In the international arena,\r | |
5368 | however, the ability to use compulsory licenses is extremely limited. \r | |
5369 | Thus, clearinghouses and other collectives comprise one option that has\r | |
5370 | succeeded in providing for use of a work. Often overlooked when one\r | |
5371 | begins to use copyrighted material and put products together is how\r | |
5372 | expensive the permissions process and its management are. According to\r | |
5373 | PETERS, the price of copyright in a digital medium, whatever solution is\r | |
5374 | worked out, will include managing and assembling the database. She\r | |
5375 | strongly recommended that publishers and librarians or people with\r | |
5376 | various backgrounds cooperate to work out administratively feasible\r | |
5377 | systems, in order to produce better results.\r | |
5378 | \r | |
5379 | In the lengthy question-and-answer period that followed PETERS's\r | |
5380 | presentation, the following points emerged:\r | |
5381 | \r | |
5382 | * The Copyright Office maintains that anything mechanical and\r | |
5383 | totally exhaustive probably is not protected. In the event that\r | |
5384 | what an individual did in developing potentially copyrightable\r | |
5385 | material is not understood, the Copyright Office will ask about the\r | |
5386 | creative choices the applicant chose to make or not to make. As a\r | |
5387 | practical matter, if one believes she or he has made enough of those\r | |
5388 | choices, that person has a right to assert a copyright and someone\r | |
5389 | else must assert that the work is not copyrightable. The more\r | |
5390 | mechanical, the more automatic, a thing is, the less likely it is to\r | |
5391 | be copyrightable.\r | |
5392 | \r | |
5393 | * Nearly all photographs are deemed to be copyrightable, but no one\r | |
5394 | worries about them much, because everyone is free to take the same\r | |
5395 | image. Thus, a photographic copyright represents what is called a\r | |
5396 | "thin" copyright. The photograph itself must be duplicated, in\r | |
5397 | order for copyright to be violated.\r | |
5398 | \r | |
5399 | * The Copyright Office takes the position that X-rays are not\r | |
5400 | copyrightable because they are mechanical. It can be argued\r | |
5401 | whether or not image enhancement in scanning can be protected. One\r | |
5402 | must exercise care with material created with public funds and\r | |
5403 | generally in the public domain. An article written by a federal\r | |
5404 | employee, if written as part of official duties, is not\r | |
5405 | copyrightable. However, control over a scientific article written\r | |
5406 | by a National Institutes of Health grantee (i.e., someone who\r | |
5407 | receives money from the U.S. government), depends on NIH policy. If\r | |
5408 | the government agency has no policy (and that policy can be\r | |
5409 | contained in its regulations, the contract, or the grant), the\r | |
5410 | author retains copyright. If a provision of the contract, grant, or\r | |
5411 | regulation states that there will be no copyright, then it does not\r | |
5412 | exist. When a work is created, copyright automatically comes into\r | |
5413 | existence unless something exists that says it does not.\r | |
5414 | \r | |
5415 | * An enhanced electronic copy of a print copy of an older reference\r | |
5416 | work in the public domain that does not contain copyrightable new\r | |
5417 | material is a purely mechanical rendition of the original work, and\r | |
5418 | is not copyrightable.\r | |
5419 | \r | |
5420 | * Usually, when a work enters the public domain, nothing can remove\r | |
5421 | it. For example, Congress recently passed into law the concept of\r | |
5422 | automatic renewal, which means that copyright on any work published\r | |
5423 | between 1964 and 1978 does not have to be renewed in order to\r | |
5424 | receive a seventy-five-year term. But any work not renewed before\r | |
5425 | 1964 is in the public domain.\r | |
5426 | \r | |
5427 | * Concerning whether or not the United States keeps track of when\r | |
5428 | authors die, nothing was ever done, nor is anything being done at\r | |
5429 | the moment by the Copyright Office.\r | |
5430 | \r | |
5431 | * Software that drives a mechanical process is itself copyrightable. \r | |
5432 | If one changes platforms, the software itself has a copyright. The\r | |
5433 | World Intellectual Property Organization will hold a symposium 28\r | |
5434 | March through 2 April 1993, at Harvard University, on digital\r | |
5435 | technology, and will study this entire issue. If one purchases a\r | |
5436 | computer software package, such as MacPaint, and creates something\r | |
5437 | new, one receives protection only for that which has been added.\r | |
5438 | \r | |
5439 | PETERS added that often in copyright matters, rough justice is the\r | |
5440 | outcome, for example, in collective licensing, ASCAP (i.e., American\r | |
5441 | Society of Composers, Authors, and Publishers), and BMI (i.e., Broadcast\r | |
5442 | Music, Inc.), where it may seem that the big guys receive more than their\r | |
5443 | due. Of course, people ought not to copy a creative product without\r | |
5444 | paying for it; there should be some compensation. But the truth of the\r | |
5445 | world, and it is not a great truth, is that the big guy gets played on\r | |
5446 | the radio more frequently than the little guy, who has to do much more\r | |
5447 | until he becomes a big guy. That is true of every author, every\r | |
5448 | composer, everyone, and, unfortunately, is part of life.\r | |
5449 | \r | |
5450 | Copyright always originates with the author, except in cases of works\r | |
5451 | made for hire. (Most software falls into this category.) When an author\r | |
5452 | sends his article to a journal, he has not relinquished copyright, though\r | |
5453 | he retains the right to relinquish it. The author receives absolutely\r | |
5454 | everything. The less prominent the author, the more leverage the\r | |
5455 | publisher will have in contract negotiations. In order to transfer the\r | |
5456 | rights, the author must sign an agreement giving them away.\r | |
5457 | \r | |
5458 | In an electronic society, it is important to be able to license a writer\r | |
5459 | and work out deals. With regard to use of a work, it usually is much\r | |
5460 | easier when a publisher holds the rights. In an electronic era, a real\r | |
5461 | problem arises when one is digitizing and making information available. \r | |
5462 | PETERS referred again to electronic licensing clearinghouses. Copyright\r | |
5463 | ought to remain with the author, but as one moves forward globally in the\r | |
5464 | electronic arena, a middleman who can handle the various rights becomes\r | |
5465 | increasingly necessary.\r | |
5466 | \r | |
5467 | The notion of copyright law is that it resides with the individual, but\r | |
5468 | in an on-line environment, where a work can be adapted and tinkered with\r | |
5469 | by many individuals, there is concern. If changes are authorized and\r | |
5470 | there is no agreement to the contrary, the person who changes a work owns\r | |
5471 | the changes. To put it another way, the person who acquires permission\r | |
5472 | to change a work technically will become the author and the owner, unless\r | |
5473 | some agreement to the contrary has been made. It is typical for the\r | |
5474 | original publisher to try to control all of the versions and all of the\r | |
5475 | uses. Copyright law only ever sets up the boundaries. Anything can be\r | |
5476 | changed by contract.\r | |
5477 | \r | |
5478 | ******\r | |
5479 | \r | |
5480 | SESSION VII. CONCLUSION\r | |
5481 | \r | |
5482 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5483 | GENERAL DISCUSSION * Two questions for discussion * Different emphases in\r | |
5484 | the Workshop * Bringing the text and image partisans together *\r | |
5485 | Desiderata in planning the long-term development of something * Questions\r | |
5486 | surrounding the issue of electronic deposit * Discussion of electronic\r | |
5487 | deposit as an allusion to the issue of standards * Need for a directory\r | |
5488 | of preservation projects in digital form and for access to their\r | |
5489 | digitized files * CETH's catalogue of machine-readable texts in the\r | |
5490 | humanities * What constitutes a publication in the electronic world? *\r | |
5491 | Need for LC to deal with the concept of on-line publishing * LC's Network\r | |
5492 | Development Office exploring the limits of MARC as a standard in terms\r | |
5493 | of handling electronic information * Magnitude of the problem and the\r | |
5494 | need for distributed responsibility in order to maintain and store\r | |
5495 | electronic information * Workshop participants to be viewed as a starting\r | |
5496 | point * Development of a network version of AM urged * A step toward AM's\r | |
5497 | construction of some sort of apparatus for network access * A delicate\r | |
5498 | and agonizing policy question for LC * Re the issue of electronic\r | |
5499 | deposit, LC urged to initiate a catalytic process in terms of distributed\r | |
5500 | responsibility * Suggestions for cooperative ventures * Commercial\r | |
5501 | publishers' fears * Strategic questions for getting the image and text\r | |
5502 | people to think through long-term cooperation * Clarification of the\r | |
5503 | driving force behind both the Perseus and the Cornell Xerox projects *\r | |
5504 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\r | |
5505 | \r | |
5506 | In his role as moderator of the concluding session, GIFFORD raised two\r | |
5507 | questions he believed would benefit from discussion: 1) Are there enough\r | |
5508 | commonalities among those of us that have been here for two days so that\r | |
5509 | we can see courses of action that should be taken in the future? And, if\r | |
5510 | so, what are they and who might take them? 2) Partly derivative from\r | |
5511 | that, but obviously very dangerous to LC as host, do you see a role for\r | |
5512 | the Library of Congress in all this? Of course, the Library of Congress\r | |
5513 | holds a rather special status in a number of these matters, because it is\r | |
5514 | not perceived as a player with an economic stake in them, but are there\r | |
5515 | roles that LC can play that can help advance us toward where we are heading?\r | |
5516 | \r | |
5517 | Describing himself as an uninformed observer of the technicalities of the\r | |
5518 | last two days, GIFFORD detected three different emphases in the Workshop: \r | |
5519 | 1) people who are very deeply committed to text; 2) people who are almost\r | |
5520 | passionate about images; and 3) a few people who are very committed to\r | |
5521 | what happens to the networks, in other words, to the new networking\r | |
5522 | dimension: the accessibility, the processability, and the portability of\r | |
5523 | all this across the networks. How do we pull those three together?\r | |
5524 | \r | |
5525 | Adding a question that reflected HOCKEY's comment that this was the\r | |
5526 | fourth workshop she had attended in the previous thirty days, FLEISCHHAUER\r | |
5527 | wondered to what extent this meeting had reinvented the wheel, or if it\r | |
5528 | had contributed anything in the way of bringing together a different group\r | |
5529 | of people from those who normally appear on the workshop circuit.\r | |
5530 | \r | |
5531 | HOCKEY confessed to being struck at this meeting and the one the\r | |
5532 | Electronic Peirce Consortium organized the previous week that this was a\r | |
5533 | coming together of people working on texts and not images. Attempting to\r | |
5534 | bring the two together is something we ought to be thinking about for the\r | |
5535 | future: How one can think about working with image material to begin\r | |
5536 | with, but structuring it and digitizing it in such a way that at a later\r | |
5537 | stage it can be interpreted into text, and find a common way of building\r | |
5538 | text and images together so that they can be used jointly in the future,\r | |
5539 | with the network support to begin there because that is how people will\r | |
5540 | want to access it.\r | |
5541 | \r | |
5542 | In planning the long-term development of something, which is what is\r | |
5543 | being done in electronic text, HOCKEY stressed the importance not only\r | |
5544 | of discussing the technical aspects of how one does it but particularly\r | |
5545 | of thinking about what the people who use the stuff will want to do.\r | |
5546 | But conversely, there are numerous things that people start to do with\r | |
5547 | electronic text or material that nobody ever thought of in the beginning.\r | |
5548 | \r | |
5549 | LESK, in response to the question concerning the role of the Library of\r | |
5550 | Congress, remarked on the often-suggested desideratum of having electronic\r | |
5551 | deposit: Since everything is now computer-typeset, an entire decade of\r | |
5552 | material that was machine-readable exists, but the publishers frequently\r | |
5553 | did not save it; has LC taken any action to have its copyright deposit\r | |
5554 | operation start collecting these machine-readable versions? In the\r | |
5555 | absence of PETERS, GIFFORD replied that the question was being\r | |
5556 | actively considered but that that was only one dimension of the problem.\r | |
5557 | Another dimension is the whole question of the integrity of the original\r | |
5558 | electronic document. It becomes highly important in science to prove\r | |
5559 | authorship. How will that be done?\r | |
5560 | \r | |
5561 | ERWAY explained that, under the old policy, to make a claim for a\r | |
5562 | copyright for works that were published in electronic form, including\r | |
5563 | software, one had to submit a paper copy of the first and last twenty\r | |
5564 | pages of code--something that represented the work but did not include\r | |
5565 | the entire work itself and had little value to anyone. As a temporary\r | |
5566 | measure, LC has claimed the right to demand electronic versions of\r | |
5567 | electronic publications. This measure entails a proactive role for the\r | |
5568 | Library to say that it wants a particular electronic version. Publishers\r | |
5569 | then have perhaps a year to submit it. But the real problem for LC is\r | |
5570 | what to do with all this material in all these different formats. Will\r | |
5571 | the Library mount it? How will it give people access to it? How does LC\r | |
5572 | keep track of the appropriate computers, software, and media? The situation\r | |
5573 | is so hard to control, ERWAY said, that it makes sense for each publishing\r | |
5574 | house to maintain its own archive. But LC cannot enforce that either.\r | |
5575 | \r | |
5576 | GIFFORD acknowledged LESK's suggestion that establishing a priority\r | |
5577 | offered the solution, albeit a fairly complicated one. But who maintains\r | |
5578 | that register? he asked. GRABER noted that LC does attempt to collect a\r | |
5579 | Macintosh version and the IBM-compatible version of software. It does\r | |
5580 | not collect other versions. But while true for software, BYRUM observed,\r | |
5581 | this reply does not speak to materials, that is, all the materials that\r | |
5582 | were published that were on somebody's microcomputer or driver tapes\r | |
5583 | at a publishing office across the country. LC does well to acquire\r | |
5584 | specific machine-readable products selectively that were intended to be\r | |
5585 | machine-readable. Materials that were in machine-readable form at one time,\r | |
5586 | BYRUM said, would be beyond LC's capability at the moment, insofar as\r | |
5587 | attempting to acquire, organize, and preserve them is concerned--and\r | |
5588 | preservation would be the most important consideration. In this\r | |
5589 | connection, GIFFORD reiterated the need to work out some sense of\r | |
5590 | distributive responsibility for a number of these issues, which\r | |
5591 | inevitably will require significant cooperation and discussion.\r | |
5592 | Nobody can do it all.\r | |
5593 | \r | |
5594 | LESK suggested that some publishers may look with favor on LC beginning\r | |
5595 | to serve as a depository of tapes in an electronic manuscript standard. \r | |
5596 | Publishers may view this as a service that they did not have to perform\r | |
5597 | and they might send in tapes. However, SPERBERG-McQUEEN countered,\r | |
5598 | although publishers have had equivalent services available to them for a\r | |
5599 | long time, the electronic text archive has never turned tapes away or been\r | |
5600 | flooded with them, and is forever sending feedback to the depositor. \r | |
5601 | Some publishers do send in tapes.\r | |
5602 | \r | |
5603 | ANDRE viewed this discussion as an allusion to the issue of standards. \r | |
5604 | She recommended that the AAP standard and the TEI, which has already been\r | |
5605 | somewhat harmonized internationally and which also shares several\r | |
5606 | compatibilities with the AAP, be harmonized to ensure sufficient\r | |
5607 | compatibility in the software. She drew the line at saying LC ought to\r | |
5608 | be the locus or forum for such harmonization.\r | |
5609 | \r | |
5610 | Taking the group in a slightly different direction, but one where at\r | |
5611 | least in the near term LC might play a helpful role, LYNCH remarked on the\r | |
5612 | plans of a number of projects to carry out preservation by creating\r | |
5613 | digital images that will end up in on-line or near-line storage at some\r | |
5614 | institution. Presumably, LC will link this material somehow to its\r | |
5615 | on-line catalog in most cases. Thus, it is in a digital form. LYNCH had\r | |
5616 | the impression that many of these institutions would be willing to make\r | |
5617 | those files accessible to other people outside the institution, provided\r | |
5618 | that there is no copyright problem. This desideratum will require\r | |
5619 | propagating the knowledge that those digitized files exist, so that they\r | |
5620 | can end up in other on-line catalogs. Although uncertain about the\r | |
5621 | mechanism for achieving this result, LYNCH said that it warranted\r | |
5622 | scrutiny because it seemed to be connected to some of the basic issues of\r | |
5623 | cataloging and distribution of records. It would be foolish, given the\r | |
5624 | amount of work that all of us have to do and our meager resources, to\r | |
5625 | discover multiple institutions digitizing the same work. Re microforms,\r | |
5626 | LYNCH said, we are in pretty good shape.\r | |
5627 | \r | |
5628 | BATTIN called this a big problem and noted that the Cornell people (who\r | |
5629 | had already departed) were working on it. At issue from the beginning\r | |
5630 | was to learn how to catalog that information into RLIN and then into\r | |
5631 | OCLC, so that it would be accessible. That issue remains to be resolved. \r | |
5632 | LYNCH rejoined that putting it into OCLC or RLIN was helpful insofar as\r | |
5633 | somebody who is thinking of performing preservation activity on that work\r | |
5634 | could learn about it. It is not necessarily helpful for institutions to\r | |
5635 | make that available. BATTIN opined that the idea was that it not only be\r | |
5636 | for preservation purposes but for the convenience of people looking for\r | |
5637 | this material. She endorsed LYNCH's dictum that duplication of this\r | |
5638 | effort was to be avoided by every means.\r | |
5639 | \r | |
5640 | HOCKEY informed the Workshop about one major current activity of CETH,\r | |
5641 | namely a catalogue of machine-readable texts in the humanities. Held on\r | |
5642 | RLIN at present, the catalogue has been concentrated on ASCII as opposed\r | |
5643 | to digitized images of text. She is exploring ways to improve the\r | |
5644 | catalogue and make it more widely available, and welcomed suggestions\r | |
5645 | about these concerns. CETH owns the records, which are not just\r | |
5646 | restricted to RLIN, and can distribute them however it wishes.\r | |
5647 | \r | |
5648 | Taking up LESK's earlier question, BATTIN inquired whether LC, since it\r | |
5649 | is accepting electronic files and designing a mechanism for dealing with\r | |
5650 | that rather than putting books on shelves, would become responsible for\r | |
5651 | the National Copyright Depository of Electronic Materials. Of course\r | |
5652 | that could not be accomplished overnight, but it would be something LC\r | |
5653 | could plan for. GIFFORD acknowledged that much thought was being devoted\r | |
5654 | to that set of problems and returned the discussion to the issue raised\r | |
5655 | by LYNCH--whether or not putting the kind of records that both BATTIN and\r | |
5656 | HOCKEY have been talking about in RLIN is not a satisfactory solution. \r | |
5657 | It seemed to him that RLIN answered LYNCH's original point concerning\r | |
5658 | some kind of directory for these kinds of materials. In a situation\r | |
5659 | where somebody is attempting to decide whether or not to scan this or\r | |
5660 | film that or to learn whether or not someone has already done so, LYNCH\r | |
5661 | suggested, RLIN is helpful, but it is not helpful in the case of a local,\r | |
5662 | on-line catalogue. Further, one would like to have her or his system be\r | |
5663 | aware that that exists in digital form, so that one can present it to a\r | |
5664 | patron, even though one did not digitize it, if it is out of copyright. \r | |
5665 | The only way to make those linkages would be to perform a tremendous\r | |
5666 | amount of real-time look-up, which would be awkward at best, or\r | |
5667 | periodically to yank the whole file from RLIN and match it against one's\r | |
5668 | own stuff, which is a nuisance.\r | |
5669 | \r | |
5670 | But where, ERWAY inquired, does one stop including things that are\r | |
5671 | available with Internet, for instance, in one's local catalogue?\r | |
5672 | It almost seems that that is LC's means to acquire access to them.\r | |
5673 | That represents LC's new form of library loan. Perhaps LC's new on-line\r | |
5674 | catalogue is an amalgamation of all these catalogues on line. LYNCH\r | |
5675 | conceded that perhaps that was true in the very long term, but was not\r | |
5676 | applicable to scanning in the short term. In his view, the totals cited\r | |
5677 | by Yale, 10,000 books over perhaps a four-year period, and 1,000-1,500\r | |
5678 | books from Cornell, were not big numbers, while searching all over\r | |
5679 | creation for relatively rare occurrences will prove to be less efficient. \r | |
5680 | As GIFFORD wondered if this would not be a separable file on RLIN and\r | |
5681 | could be requested from them, BATTIN interjected that it was easily\r | |
5682 | accessible to an institution. SEVERTSON pointed out that that file, cum\r | |
5683 | enhancements, was available with reference information on CD-ROM, which\r | |
5684 | makes it a little more available.\r | |
5685 | \r | |
5686 | In HOCKEY's view, the real question facing the Workshop is what to put in\r | |
5687 | this catalogue, because that raises the question of what constitutes a\r | |
5688 | publication in the electronic world. (WEIBEL interjected that Eric Joule\r | |
5689 | in OCLC's Office of Research is also wrestling with this particular\r | |
5690 | problem, while GIFFORD thought it sounded fairly generic.) HOCKEY\r | |
5691 | contended that a majority of texts in the humanities are in the hands\r | |
5692 | of either a small number of large research institutions or individuals\r | |
5693 | and are not generally available for anyone else to access at all.\r | |
5694 | She wondered if these texts ought to be catalogued.\r | |
5695 | \r | |
5696 | After argument proceeded back and forth for several minutes over why\r | |
5697 | cataloguing might be a necessary service, LEBRON suggested that this\r | |
5698 | issue involved the responsibility of a publisher. The fact that someone\r | |
5699 | has created something electronically and keeps it under his or her\r | |
5700 | control does not constitute publication. Publication implies\r | |
5701 | dissemination. While it would be important for a scholar to let other\r | |
5702 | people know that this creation exists, in many respects this is no\r | |
5703 | different from an unpublished manuscript. That is what is being accessed\r | |
5704 | in there, except that now one is not looking at it in the hard-copy but\r | |
5705 | in the electronic environment.\r | |
5706 | \r | |
5707 | LEBRON expressed puzzlement at the variety of ways electronic publishing\r | |
5708 | has been viewed. Much of what has been discussed throughout these two\r | |
5709 | days has concerned CD-ROM publishing, whereas in the on-line environment\r | |
5710 | that she confronts, the constraints and challenges are very different. \r | |
5711 | Sooner or later LC will have to deal with the concept of on-line\r | |
5712 | publishing. Taking up the comment ERWAY made earlier about storing\r | |
5713 | copies, LEBRON gave her own journal as an example. How would she deposit\r | |
5714 | OJCCT for copyright? she asked, because the journal will exist in the\r | |
5715 | mainframe at OCLC and people will be able to access it. Here the\r | |
5716 | situation is different, ownership versus access, and is something that\r | |
5717 | arises with publication in the on-line environment, faster than is\r | |
5718 | sometimes realized. Lacking clear answers to all of these questions\r | |
5719 | herself, LEBRON did not anticipate that LC would be able to take a role\r | |
5720 | in helping to define some of them for quite a while.\r | |
5721 | \r | |
5722 | GREENFIELD observed that LC's Network Development Office is attempting,\r | |
5723 | among other things, to explore the limits of MARC as a standard in terms\r | |
5724 | of handling electronic information. GREENFIELD also noted that Rebecca\r | |
5725 | GUENTHER from that office gave a paper to the American Society for\r | |
5726 | Information Science (ASIS) summarizing several of the discussion papers\r | |
5727 | that were coming out of the Network Development Office. GREENFIELD said\r | |
5728 | he understood that that office had a list-server soliciting just the kind\r | |
5729 | of feedback received today concerning the difficulties of identifying and\r | |
5730 | cataloguing electronic information. GREENFIELD hoped that everybody\r | |
5731 | would be aware of that and somehow contribute to that conversation.\r | |
5732 | \r | |
5733 | Noting two of LC's roles, first, to act as a repository of record for\r | |
5734 | material that is copyrighted in this country, and second, to make\r | |
5735 | materials it holds available in some limited form to a clientele that\r | |
5736 | goes beyond Congress, BESSER suggested that it was incumbent on LC to\r | |
5737 | extend those responsibilities to all the things being published in\r | |
5738 | electronic form. This would mean eventually accepting electronic\r | |
5739 | formats. LC could require that at some point they be in a certain\r | |
5740 | limited set of formats, and then develop mechanisms for allowing people\r | |
5741 | to access those in the same way that other things are accessed. This\r | |
5742 | does not imply that they are on the network and available to everyone. \r | |
5743 | LC does that with most of its bibliographic records, BESSER said, which\r | |
5744 | end up migrating to the utility (e.g., OCLC) or somewhere else. But just\r | |
5745 | as most of LC's books are available in some form through interlibrary\r | |
5746 | loan or some other mechanism, so in the same way electronic formats ought\r | |
5747 | to be available to others in some format, though with some copyright\r | |
5748 | considerations. BESSER was not suggesting that these mechanisms be\r | |
5749 | established tomorrow, only that they seemed to fall within LC's purview,\r | |
5750 | and that there should be long-range plans to establish them.\r | |
5751 | \r | |
5752 | Acknowledging that those from LC in the room agreed with BESSER\r | |
5753 | concerning the need to confront difficult questions, GIFFORD underscored\r | |
5754 | the magnitude of the problem of what to keep and what to select. GIFFORD\r | |
5755 | noted that LC currently receives some 31,000 items per day, not counting\r | |
5756 | electronic materials, and argued for much more distributed responsibility\r | |
5757 | in order to maintain and store electronic information.\r | |
5758 | \r | |
5759 | BESSER responded that the assembled group could be viewed as a starting\r | |
5760 | point, whose initial operating premise could be helping to move in this\r | |
5761 | direction and defining how LC could do so, for example, in areas of\r | |
5762 | standardization or distribution of responsibility.\r | |
5763 | \r | |
5764 | FLEISCHHAUER added that AM was fully engaged, wrestling with some of the\r | |
5765 | questions that pertain to the conversion of older historical materials,\r | |
5766 | which would be one thing that the Library of Congress might do. Several\r | |
5767 | points mentioned by BESSER and several others on this question have a\r | |
5768 | much greater impact on those who are concerned with cataloguing and the\r | |
5769 | networking of bibliographic information, as well as preservation itself.\r | |
5770 | \r | |
5771 | Speaking directly to AM, which he considered a largely uncopyrighted\r | |
5772 | database, LYNCH urged development of a network version of AM, or\r | |
5773 | consideration of making the data in it available to people interested in\r | |
5774 | doing network multimedia. On account of the current great shortage of\r | |
5775 | digital data that is both appealing and unencumbered by complex rights\r | |
5776 | problems, this course of action could have a significant effect on making\r | |
5777 | network multimedia a reality.\r | |
5778 | \r | |
5779 | In this connection, FLEISCHHAUER reported on a fragmentary prototype in\r | |
5780 | LC's Office of Information Technology Services that attempts to associate\r | |
5781 | digital images of photographs with cataloguing information in ways that\r | |
5782 | work within a local area network--a step, so to say, toward AM's\r | |
5783 | construction of some sort of apparatus for access. Further, AM has\r | |
5784 | attempted to use standard data forms in order to help make that\r | |
5785 | distinction between the access tools and the underlying data, and thus\r | |
5786 | believes that the database is networkable.\r | |
5787 | \r | |
5788 | A delicate and agonizing policy question for LC, however, which comes\r | |
5789 | back to resources and unfortunately has an impact on this, is to find\r | |
5790 | some appropriate, honorable, and legal cost-recovery possibilities. A\r | |
5791 | certain skittishness concerning cost-recovery has made people unsure\r | |
5792 | exactly what to do. AM would be highly receptive to discussing further\r | |
5793 | LYNCH's offer to test or demonstrate its database in a network\r | |
5794 | environment, FLEISCHHAUER said.\r | |
5795 | \r | |
5796 | Returning the discussion to what she viewed as the vital issue of\r | |
5797 | electronic deposit, BATTIN recommended that LC initiate a catalytic\r | |
5798 | process in terms of distributed responsibility, that is, bring together\r | |
5799 | the distributed organizations and set up a study group to look at all\r | |
5800 | these issues and see where we as a nation should move. The broader\r | |
5801 | issues of how we deal with the management of electronic information will\r | |
5802 | not disappear, but only grow worse.\r | |
5803 | \r | |
5804 | LESK took up this theme and suggested that LC attempt to persuade one\r | |
5805 | major library in each state to deal with its state equivalent publisher,\r | |
5806 | which might produce a cooperative project that would be equitably\r | |
5807 | distributed around the country, and one in which LC would be dealing with\r | |
5808 | a minimal number of publishers and minimal copyright problems.\r | |
5809 | \r | |
5810 | GRABER remarked on the recent development in the scientific community of a\r | |
5811 | willingness to use SGML and either deposit or interchange on a fairly\r | |
5812 | standardized format. He wondered if a similar movement was taking place\r | |
5813 | in the humanities. Although the National Library of Medicine found only\r | |
5814 | a few publishers to cooperate in a like venture two or three years ago, a\r | |
5815 | new effort might generate a much larger number willing to cooperate.\r | |
5816 | \r | |
5817 | KIMBALL recounted his unit's (Machine-Readable Collections Reading Room)\r | |
5818 | troubles with the commercial publishers of electronic media in acquiring\r | |
5819 | materials for LC's collections, in particular the publishers' fear that\r | |
5820 | they would not be able to cover their costs and would lose control of\r | |
5821 | their products, that LC would give them away or sell them and make\r | |
5822 | profits from them. He doubted that the publishing industry was prepared\r | |
5823 | to move into this area at the moment, given its resistance to allowing LC\r | |
5824 | to use its machine-readable materials as the Library would like.\r | |
5825 | \r | |
5826 | The copyright law now addresses compact disk as a medium, and LC can\r | |
5827 | request one copy of that, or two copies if it is the only version, and\r | |
5828 | can request copies of software, but that fails to address magazines or\r | |
5829 | books or anything like that which is in machine-readable form.\r | |
5830 | \r | |
5831 | GIFFORD acknowledged the thorny nature of this issue, which he illustrated\r | |
5832 | with the example of the cumbersome process involved in putting a copy of a\r | |
5833 | scientific database on a LAN in LC's science reading room. He also\r | |
5834 | acknowledged that LC needs help and could enlist the energies and talents\r | |
5835 | of Workshop participants in thinking through a number of these problems.\r | |
5836 | \r | |
5837 | GIFFORD returned the discussion to getting the image and text people to\r | |
5838 | think through together where they want to go in the long term. MYLONAS\r | |
5839 | conceded that her experience at the Peirce Symposium the previous week at\r | |
5840 | Georgetown University and this week at LC had forced her to reevaluate\r | |
5841 | her perspective on the usefulness of text as images. MYLONAS framed the\r | |
5842 | issues in a series of questions: How do we acquire machine-readable\r | |
5843 | text? Do we take pictures of it and perform OCR on it later? Is it\r | |
5844 | important to obtain very high-quality images and text, etc.? \r | |
5845 | FLEISCHHAUER agreed with MYLONAS's framing of strategic questions, adding\r | |
5846 | that a large institution such as LC probably has to do all of those\r | |
5847 | things at different times. Thus, the trick is to exercise judgment. The\r | |
5848 | Workshop had added to his and AM's considerations in making those\r | |
5849 | judgments. Concerning future meetings or discussions, MYLONAS suggested\r | |
5850 | that screening priorities would be helpful.\r | |
5851 | \r | |
5852 | WEIBEL opined that the diversity reflected in this group was a sign both\r | |
5853 | of the health and of the immaturity of the field, and more time would\r | |
5854 | have to pass before we convince one another concerning standards.\r | |
5855 | \r | |
5856 | An exchange between MYLONAS and BATTIN clarified the point that the\r | |
5857 | driving force behind both the Perseus and the Cornell Xerox projects was\r | |
5858 | the preservation of knowledge for the future, not simply for particular\r | |
5859 | research use. In the case of Perseus, MYLONAS said, the assumption was\r | |
5860 | that the texts would not be entered again into electronically readable\r | |
5861 | form. SPERBERG-McQUEEN added that a scanned image would not serve as an\r | |
5862 | archival copy for purposes of preservation in the case of, say, the Bill\r | |
5863 | of Rights, in the sense that the scanned images are effectively the\r | |
5864 | archival copies for the Cornell mathematics books.\r | |
5865 | \r | |
5866 | \r | |
5867 | *** *** *** ****** *** *** ***\r | |
5868 | \r | |
5869 | \r | |
5870 | Appendix I: PROGRAM\r | |
5871 | \r | |
5872 | \r | |
5873 | \r | |
5874 | WORKSHOP\r | |
5875 | ON\r | |
5876 | ELECTRONIC\r | |
5877 | TEXTS\r | |
5878 | \r | |
5879 | \r | |
5880 | \r | |
5881 | 9-10 June 1992\r | |
5882 | \r | |
5883 | Library of Congress\r | |
5884 | Washington, D.C.\r | |
5885 | \r | |
5886 | \r | |
5887 | \r | |
5888 | Supported by a Grant from the David and Lucile Packard Foundation\r | |
5889 | \r | |
5890 | \r | |
5891 | Tuesday, 9 June 1992\r | |
5892 | \r | |
5893 | NATIONAL DEMONSTRATION LAB, ATRIUM, LIBRARY MADISON\r | |
5894 | \r | |
5895 | 8:30 AM Coffee and Danish, registration\r | |
5896 | \r | |
5897 | 9:00 AM Welcome\r | |
5898 | \r | |
5899 | Prosser Gifford, Director for Scholarly Programs, and Carl\r | |
5900 | Fleischhauer, Coordinator, American Memory, Library of\r | |
5901 | Congress\r | |
5902 | \r | |
5903 | 9:15 AM Session I. Content in a New Form: Who Will Use It and What\r | |
5904 | Will They Do?\r | |
5905 | \r | |
5906 | Broad description of the range of electronic information. \r | |
5907 | Characterization of who uses it and how it is or may be used. \r | |
5908 | In addition to a look at scholarly uses, this session will\r | |
5909 | include a presentation on use by students (K-12 and college)\r | |
5910 | and the general public.\r | |
5911 | \r | |
5912 | Moderator: James Daly\r | |
5913 | Avra Michelson, Archival Research and Evaluation Staff,\r | |
5914 | National Archives and Records Administration (Overview)\r | |
5915 | Susan H. Veccia, Team Leader, American Memory, User Evaluation,\r | |
5916 | and\r | |
5917 | Joanne Freeman, Associate Coordinator, American Memory, Library\r | |
5918 | of Congress (Beyond the scholar)\r | |
5919 | \r | |
5920 | 10:30-\r | |
5921 | 11:00 AM Break\r | |
5922 | \r | |
5923 | 11:00 AM Session II. Show and Tell.\r | |
5924 | \r | |
5925 | Each presentation to consist of a fifteen-minute\r | |
5926 | statement/show; group discussion will follow lunch.\r | |
5927 | \r | |
5928 | Moderator: Jacqueline Hess, Director, National Demonstration\r | |
5929 | Lab\r | |
5930 | \r | |
5931 | 1. A classics project, stressing texts and text retrieval\r | |
5932 | more than multimedia: Perseus Project, Harvard\r | |
5933 | University\r | |
5934 | Elli Mylonas, Managing Editor\r | |
5935 | \r | |
5936 | 2. Other humanities projects employing the emerging norms of\r | |
5937 | the Text Encoding Initiative (TEI): Chadwyck-Healey's\r | |
5938 | The English Poetry Full Text Database and/or Patrologia\r | |
5939 | Latina Database\r | |
5940 | Eric M. Calaluca, Vice President, Chadwyck-Healey, Inc.\r | |
5941 | \r | |
5942 | 3. American Memory\r | |
5943 | Carl Fleischhauer, Coordinator, and\r | |
5944 | Ricky Erway, Associate Coordinator, Library of Congress\r | |
5945 | \r | |
5946 | 4. Founding Fathers example from Packard Humanities\r | |
5947 | Institute: The Papers of George Washington, University\r | |
5948 | of Virginia\r | |
5949 | Dorothy Twohig, Managing Editor, and/or\r | |
5950 | David Woodley Packard\r | |
5951 | \r | |
5952 | 5. An electronic medical journal offering graphics and\r | |
5953 | full-text searchability: The Online Journal of Current\r | |
5954 | Clinical Trials, American Association for the Advancement\r | |
5955 | of Science\r | |
5956 | Maria L. Lebron, Managing Editor\r | |
5957 | \r | |
5958 | 6. A project that offers facsimile images of pages but omits\r | |
5959 | searchable text: Cornell math books\r | |
5960 | Lynne K. Personius, Assistant Director, Cornell\r | |
5961 | Information Technologies for Scholarly Information\r | |
5962 | Sources, Cornell University\r | |
5963 | \r | |
5964 | 12:30 PM Lunch (Dining Room A, Library Madison 620. Exhibits\r | |
5965 | available.)\r | |
5966 | \r | |
5967 | 1:30 PM Session II. Show and Tell (Cont'd.).\r | |
5968 | \r | |
5969 | 3:00-\r | |
5970 | 3:30 PM Break\r | |
5971 | \r | |
5972 | 3:30-\r | |
5973 | 5:30 PM Session III. Distribution, Networks, and Networking: Options\r | |
5974 | for Dissemination.\r | |
5975 | \r | |
5976 | Published disks: University presses and public-sector\r | |
5977 | publishers, private-sector publishers\r | |
5978 | Computer networks\r | |
5979 | \r | |
5980 | Moderator: Robert G. Zich, Special Assistant to the Associate\r | |
5981 | Librarian for Special Projects, Library of Congress\r | |
5982 | Clifford A. Lynch, Director, Library Automation, University of\r | |
5983 | California\r | |
5984 | Howard Besser, School of Library and Information Science,\r | |
5985 | University of Pittsburgh\r | |
5986 | Ronald L. Larsen, Associate Director of Libraries for\r | |
5987 | Information Technology, University of Maryland at College\r | |
5988 | Park\r | |
5989 | Edwin B. Brownrigg, Executive Director, Memex Research\r | |
5990 | Institute\r | |
5991 | \r | |
5992 | 6:30 PM Reception (Montpelier Room, Library Madison 619.)\r | |
5993 | \r | |
5994 | ******\r | |
5995 | \r | |
5996 | Wednesday, 10 June 1992\r | |
5997 | \r | |
5998 | DINING ROOM A, LIBRARY MADISON 620\r | |
5999 | \r | |
6000 | 8:30 AM Coffee and Danish\r | |
6001 | \r | |
6002 | 9:00 AM Session IV. Image Capture, Text Capture, Overview of Text and\r | |
6003 | Image Storage Formats.\r | |
6004 | \r | |
6005 | Moderator: William L. Hooton, Vice President of Operations,\r | |
6006 | I-NET\r | |
6007 | \r | |
6008 | A) Principal Methods for Image Capture of Text:\r | |
6009 | Direct scanning\r | |
6010 | Use of microform\r | |
6011 | \r | |
6012 | Anne R. Kenney, Assistant Director, Department of Preservation\r | |
6013 | and Conservation, Cornell University\r | |
6014 | Pamela Q.J. Andre, Associate Director, Automation, and\r | |
6015 | Judith A. Zidar, Coordinator, National Agricultural Text\r | |
6016 | Digitizing Program (NATDP), National Agricultural Library\r | |
6017 | (NAL)\r | |
6018 | Donald J. Waters, Head, Systems Office, Yale University Library\r | |
6019 | \r | |
6020 | B) Special Problems:\r | |
6021 | Bound volumes\r | |
6022 | Conservation\r | |
6023 | Reproducing printed halftones\r | |
6024 | \r | |
6025 | Carl Fleischhauer, Coordinator, American Memory, Library of\r | |
6026 | Congress\r | |
6027 | George Thoma, Chief, Communications Engineering Branch,\r | |
6028 | National Library of Medicine (NLM)\r | |
6029 | \r | |
6030 | 10:30-\r | |
6031 | 11:00 AM Break\r | |
6032 | \r | |
6033 | 11:00 AM Session IV. Image Capture, Text Capture, Overview of Text and\r | |
6034 | Image Storage Formats (Cont'd.).\r | |
6035 | \r | |
6036 | C) Image Standards and Implications for Preservation\r | |
6037 | \r | |
6038 | Jean Baronas, Senior Manager, Department of Standards and\r | |
6039 | Technology, Association for Information and Image Management\r | |
6040 | (AIIM)\r | |
6041 | Patricia Battin, President, The Commission on Preservation and\r | |
6042 | Access (CPA)\r | |
6043 | \r | |
6044 | D) Text Conversion:\r | |
6045 | OCR vs. rekeying\r | |
6046 | Standards of accuracy and use of imperfect texts\r | |
6047 | Service bureaus\r | |
6048 | \r | |
6049 | Stuart Weibel, Senior Research Specialist, Online Computer\r | |
6050 | Library Center, Inc. (OCLC)\r | |
6051 | Michael Lesk, Executive Director, Computer Science Research,\r | |
6052 | Bellcore\r | |
6053 | Ricky Erway, Associate Coordinator, American Memory, Library of\r | |
6054 | Congress\r | |
6055 | Pamela Q.J. Andre, Associate Director, Automation, and\r | |
6056 | Judith A. Zidar, Coordinator, National Agricultural Text\r | |
6057 | Digitizing Program (NATDP), National Agricultural Library\r | |
6058 | (NAL)\r | |
6059 | \r | |
6060 | 12:30-\r | |
6061 | 1:30 PM Lunch\r | |
6062 | \r | |
6063 | 1:30 PM Session V. Approaches to Preparing Electronic Texts.\r | |
6064 | \r | |
6065 | Discussion of approaches to structuring text for the computer;\r | |
6066 | pros and cons of text coding, description of methods in\r | |
6067 | practice, and comparison of text-coding methods.\r | |
6068 | \r | |
6069 | Moderator: Susan Hockey, Director, Center for Electronic Texts\r | |
6070 | in the Humanities (CETH), Rutgers and Princeton Universities\r | |
6071 | David Woodley Packard\r | |
6072 | C.M. Sperberg-McQueen, Editor, Text Encoding Initiative (TEI),\r | |
6073 | University of Illinois-Chicago\r | |
6074 | Eric M. Calaluca, Vice President, Chadwyck-Healey, Inc.\r | |
6075 | \r | |
6076 | 3:30-\r | |
6077 | 4:00 PM Break\r | |
6078 | \r | |
6079 | 4:00 PM Session VI. Copyright Issues.\r | |
6080 | \r | |
6081 | Marybeth Peters, Policy Planning Adviser to the Register of\r | |
6082 | Copyrights, Library of Congress\r | |
6083 | \r | |
6084 | 5:00 PM Session VII. Conclusion.\r | |
6085 | \r | |
6086 | General discussion.\r | |
6087 | What topics were omitted or given short shrift that anyone\r | |
6088 | would like to talk about now?\r | |
6089 | Is there a "group" here? What should the group do next, if\r | |
6090 | anything? What should the Library of Congress do next, if\r | |
6091 | anything?\r | |
6092 | Moderator: Prosser Gifford, Director for Scholarly Programs,\r | |
6093 | Library of Congress\r | |
6094 | \r | |
6095 | 6:00 PM Adjourn\r | |
6096 | \r | |
6097 | \r | |
6098 | *** *** *** ****** *** *** ***\r | |
6099 | \r | |
6100 | \r | |
6101 | Appendix II: ABSTRACTS\r | |
6102 | \r | |
6103 | \r | |
6104 | SESSION I\r | |
6105 | \r | |
6106 | Avra MICHELSON Forecasting the Use of Electronic Texts by\r | |
6107 | Social Sciences and Humanities Scholars\r | |
6108 | \r | |
6109 | This presentation explores the ways in which electronic texts are likely\r | |
6110 | to be used by the non-scientific scholarly community. Many of the\r | |
6111 | remarks are drawn from a report the speaker coauthored with Jeff\r | |
6112 | Rothenberg, a computer scientist at The RAND Corporation.\r | |
6113 | \r | |
6114 | The speaker assesses 1) current scholarly use of information technology\r | |
6115 | and 2) the key trends in information technology most relevant to the\r | |
6116 | research process, in order to predict how social sciences and humanities\r | |
6117 | scholars are apt to use electronic texts. In introducing the topic,\r | |
6118 | current use of electronic texts is explored broadly within the context of\r | |
6119 | scholarly communication. From the perspective of scholarly\r | |
6120 | communication, the work of humanities and social sciences scholars\r | |
6121 | involves five processes: 1) identification of sources, 2) communication\r | |
6122 | with colleagues, 3) interpretation and analysis of data, 4) dissemination\r | |
6123 | of research findings, and 5) curriculum development and instruction. The\r | |
6124 | extent to which computation currently permeates aspects of scholarly\r | |
6125 | communication represents a viable indicator of the prospects for\r | |
6126 | electronic texts.\r | |
6127 | \r | |
6128 | The discussion of current practice is balanced by an analysis of key\r | |
6129 | trends in the scholarly use of information technology. These include the\r | |
6130 | trends toward end-user computing and connectivity, which provide a\r | |
6131 | framework for forecasting the use of electronic texts through this\r | |
6132 | millennium. The presentation concludes with a summary of the ways in\r | |
6133 | which the nonscientific scholarly community can be expected to use\r | |
6134 | electronic texts, and the implications of that use for information\r | |
6135 | providers.\r | |
6136 | \r | |
6137 | Susan VECCIA and Joanne FREEMAN Electronic Archives for the Public: \r | |
6138 | Use of American Memory in Public and\r | |
6139 | School Libraries\r | |
6140 | \r | |
6141 | This joint discussion focuses on nonscholarly applications of electronic\r | |
6142 | library materials, specifically addressing use of the Library of Congress\r | |
6143 | American Memory (AM) program in a small number of public and school\r | |
6144 | libraries throughout the United States. AM consists of selected Library\r | |
6145 | of Congress primary archival materials, stored on optical media\r | |
6146 | (CD-ROM/videodisc), and presented with little or no editing. Many\r | |
6147 | collections are accompanied by electronic introductions and user's guides\r | |
6148 | offering background information and historical context. Collections\r | |
6149 | represent a variety of formats including photographs, graphic arts,\r | |
6150 | motion pictures, recorded sound, music, broadsides and manuscripts,\r | |
6151 | books, and pamphlets.\r | |
6152 | \r | |
6153 | In 1991, the Library of Congress began a nationwide evaluation of AM in\r | |
6154 | different types of institutions. Test sites include public libraries,\r | |
6155 | elementary and secondary school libraries, college and university\r | |
6156 | libraries, state libraries, and special libraries. Susan VECCIA and\r | |
6157 | Joanne FREEMAN will discuss their observations on the use of AM by the\r | |
6158 | nonscholarly community, using evidence gleaned from this ongoing\r | |
6159 | evaluation effort.\r | |
6160 | \r | |
6161 | VECCIA will comment on the overall goals of the evaluation project, and\r | |
6162 | the types of public and school libraries included in this study. Her\r | |
6163 | comments on nonscholarly use of AM will focus on the public library as a\r | |
6164 | cultural and community institution, often bridging the gap between formal\r | |
6165 | and informal education. FREEMAN will discuss the use of AM in school\r | |
6166 | libraries. Use by students and teachers has revealed some broad\r | |
6167 | questions about the use of electronic resources, as well as definite\r | |
6168 | benefits gained by the "nonscholar." Topics will include the problem of\r | |
6169 | grasping content and context in an electronic environment, the stumbling\r | |
6170 | blocks created by "new" technologies, and the unique skills and interests\r | |
6171 | awakened through use of electronic resources.\r | |
6172 | \r | |
6173 | SESSION II\r | |
6174 | \r | |
6175 | Elli MYLONAS The Perseus Project: Interactive Sources and\r | |
6176 | Studies in Classical Greece\r | |
6177 | \r | |
6178 | The Perseus Project (5) has just released Perseus 1.0, the first publicly\r | |
6179 | available version of its hypertextual database of multimedia materials on\r | |
6180 | classical Greece. Perseus is designed to be used by a wide audience,\r | |
6181 | composed of readers at the student and scholar levels. As such, it must\r | |
6182 | be able to locate information using different strategies, and it must\r | |
6183 | contain enough detail to serve the different needs of its users. In\r | |
6184 | addition, it must be delivered so that it is affordable to its target\r | |
6185 | audience. [These problems and the solutions we chose are described in\r | |
6186 | Mylonas, "An Interface to Classical Greek Civilization," JASIS 43:2,\r | |
6187 | March 1992.]\r | |
6188 | \r | |
6189 | In order to achieve its objective, the project staff decided to make a\r | |
6190 | conscious separation between selecting and converting textual, database,\r | |
6191 | and image data on the one hand, and putting it into a delivery system on\r | |
6192 | the other. That way, it is possible to create the electronic data\r | |
6193 | without thinking about the restrictions of the delivery system. We have\r | |
6194 | made a great effort to choose system-independent formats for our data,\r | |
6195 | and to put as much thought and work as possible into structuring it so\r | |
6196 | that the translation from paper to electronic form will enhance the value\r | |
6197 | of the data. [A discussion of these solutions as of two years ago is in\r | |
6198 | Elli Mylonas, Gregory Crane, Kenneth Morrell, and D. Neel Smith, "The\r | |
6199 | Perseus Project: Data in the Electronic Age," in Accessing Antiquity: \r | |
6200 | The Computerization of Classical Databases, J. Solomon and T. Worthen\r | |
6201 | (eds.), University of Arizona Press, in press.]\r | |
6202 | \r | |
6203 | Much of the work on Perseus is focused on collecting and converting the\r | |
6204 | data on which the project is based. At the same time, it is necessary to\r | |
6205 | provide means of access to the information, in order to make it usable,\r | |
6206 | and then to investigate how it is used. As we learn more about what\r | |
6207 | students and scholars from different backgrounds do with Perseus, we can\r | |
6208 | adjust our data collection, and also modify the system to accommodate\r | |
6209 | them. In creating a delivery system for general use, we have tried to\r | |
6210 | avoid favoring any one type of use by allowing multiple forms of access\r | |
6211 | to and navigation through the system.\r | |
6212 | \r | |
6213 | The way text is handled exemplifies some of these principles. All text\r | |
6214 | in Perseus is tagged using SGML, following the guidelines of the Text\r | |
6215 | Encoding Initiative (TEI). This markup is used to index the text, and\r | |
6216 | process it so that it can be imported into HyperCard. No SGML markup\r | |
6217 | remains in the text that reaches the user, because currently it would be\r | |
6218 | too expensive to create a system that acts on SGML in real time. \r | |
6219 | However, the regularity provided by SGML is essential for verifying the\r | |
6220 | content of the texts, and greatly speeds all the processing performed on\r | |
6221 | them. The fact that the texts exist in SGML ensures that they will be\r | |
6222 | relatively easy to port to different hardware and software, and so will\r | |
6223 | outlast the current delivery platform. Finally, the SGML markup\r | |
6224 | incorporates existing canonical reference systems (chapter, verse, line,\r | |
6225 | etc.); indexing and navigation are based on these features. This ensures\r | |
6226 | that the same canonical reference will always resolve to the same point\r | |
6227 | within a text, and that all versions of our texts, regardless of delivery\r | |
6228 | platform (even paper printouts) will function the same way.\r | |
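
As an illustration only, the idea of stripping the markup for delivery
while keeping canonical references resolvable can be sketched in a few
lines of Python. The tag names (div, l), the n attribute, and the sample
lines are hypothetical and far simpler than real TEI-conformant SGML;
this is not the project's actual processing code.

```python
import re

# Hypothetical, simplified markup: <div n="1"> marks a book and
# <l n="1"> marks a line. These names are illustrative assumptions,
# not taken from the Perseus source files.
SAMPLE = (
    '<div n="1">'
    '<l n="1">Sing, goddess, the anger</l>'
    '<l n="2">of Peleus\' son Achilles</l>'
    '</div>'
    '<div n="2">'
    '<l n="1">Now the other gods</l>'
    '</div>'
)

# Matches either a tag (optionally closing, optionally with n="...")
# or a run of plain text between tags.
TOKEN = re.compile(r'<(/?)(\w+)(?:\s+n="([^"]*)")?>|([^<]+)')


def index_by_reference(marked_up):
    """Return (plain_text, index); index maps 'book.line' to an offset."""
    plain_parts = []
    index = {}
    book = None
    offset = 0
    for closing, tag, n, text in TOKEN.findall(marked_up):
        if text:
            # Plain text goes to the delivered copy unchanged.
            plain_parts.append(text)
            offset += len(text)
        elif closing:
            # A closing </l> ends a line in the delivered text.
            if tag == "l":
                plain_parts.append("\n")
                offset += 1
        elif tag == "div":
            book = n
        elif tag == "l":
            # Record where this canonical reference starts.
            index[f"{book}.{n}"] = offset
    return "".join(plain_parts), index


plain, index = index_by_reference(SAMPLE)
print(plain[index["2.1"]:].splitlines()[0])  # prints "Now the other gods"
```

The delivered text contains no markup, yet a reference such as "2.1"
still resolves to the same passage, because the index was derived from
the markup before it was removed.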
6229 | \r | |
6230 | In order to provide tools for users, the text is processed by a\r | |
6231 | morphological analyzer, and the results are stored in a database. \r | |
6232 | Together with the index, the Greek-English Lexicon, and the index of all\r | |
6233 | the English words in the definitions of the lexicon, the morphological\r | |
6234 | analyses comprise a set of linguistic tools that allow users of all\r | |
6235 | levels to work with the textual information, and to accomplish different\r | |
6236 | tasks. For example, students who read no Greek may explore a concept as\r | |
6237 | it appears in Greek texts by using the English-Greek index, and then\r | |
6238 | looking up words in the texts and translations, or scholars may do\r | |
6239 | detailed morphological studies of word use by using the morphological\r | |
6240 | analyses of the texts. Because these tools were not designed for any one\r | |
6241 | use, the same tools and the same data can be used by both students and\r | |
6242 | scholars.\r | |
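
A minimal sketch of how an index of the English words in lexicon
definitions might be built, assuming a toy three-entry lexicon with
transliterated headwords. The entries and the function name are
hypothetical and are not drawn from the Perseus data.

```python
from collections import defaultdict

# Hypothetical three-entry lexicon (transliterated headwords mapped to
# short English definitions); not actual Perseus data.
LEXICON = {
    "menis": "wrath, anger",
    "orge": "temper, anger, passion",
    "thalassa": "sea",
}


def build_english_index(lexicon):
    """Invert the lexicon: map each English word to its Greek headwords."""
    index = defaultdict(set)
    for headword, definition in lexicon.items():
        # Split the definition into lowercase words, ignoring commas.
        for word in definition.replace(",", " ").lower().split():
            index[word].add(headword)
    return index


english_index = build_english_index(LEXICON)
print(sorted(english_index["anger"]))  # prints ['menis', 'orge']
```

A reader with no Greek can start from an English concept, obtain the
Greek headwords, and then search the texts for those headwords; a
scholar can bypass this index and query the morphological analyses
directly.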
6243 | \r | |
6244 | NOTES:\r | |
6245 | (5) Perseus is based at Harvard University, with collaborators at\r | |
6246 | several other universities. The project has been funded primarily\r | |
6247 | by the Annenberg/CPB Project, as well as by Harvard University,\r | |
6248 | Apple Computer, and others. It is published by Yale University\r | |
6249 | Press. Perseus runs on Macintosh computers, under the HyperCard\r | |
6250 | program.\r | |
6251 | \r | |
6252 | Eric CALALUCA\r | |
6253 | \r | |
6254 | Chadwyck-Healey embarked last year on two distinct yet related full-text\r | |
6255 | humanities database projects.\r | |
6256 | \r | |
6257 | The English Poetry Full-Text Database and the Patrologia Latina Database\r | |
6258 | represent new approaches to linguistic research resources. The size and\r | |
6259 | complexity of the projects present problems for electronic publishers,\r | |
6260 | but surmountable ones if they remain abreast of the latest possibilities\r | |
6261 | in data capture and retrieval software techniques.\r | |
6262 | \r | |
6263 | The issues that had to be addressed before the projects commenced were\r | |
6264 | legion:\r | |
6265 | \r | |
6266 | 1. Editorial selection (or exclusion) of materials in each\r | |
6267 | database\r | |
6268 | \r | |
6269 | 2. Should a normative encoding structure be incorporated into the\r | |
6270 | databases?\r | |
6271 | A. If one is selected, should it be SGML?\r | |
6272 | B. If SGML, then the TEI?\r | |
6273 | \r | |
6274 | 3. Deliver as CD-ROM, magnetic tape, or both?\r | |
6275 | \r | |
6276 | 4. Can one produce retrieval software advanced enough for the\r | |
6277 | postdoctoral linguist, yet accessible enough for unattended\r | |
6278 | general use? Should one try?\r | |
6279 | \r | |
6280 | 5. Re fair and liberal networking policies, what are the risks to\r | |
6281 | an electronic publisher?\r | |
6282 | \r | |
6283 | 6. How does the emergence of national and international education\r | |
6284 | networks affect the use and viability of research projects\r | |
6285 | requiring high investment? Do the new European Community\r | |
6286 | directives concerning database protection necessitate two\r | |
6287 | distinct publishing projects, one for North America and one for\r | |
6288 | overseas?\r | |
6289 | \r | |
6290 | From new notions of "scholarly fair use" to the future of optical media,\r | |
6291 | virtually every issue related to electronic publishing was aired. The\r | |
6292 | result is two projects which have been constructed to provide the quality\r | |
6293 | research resources with the fewest encumbrances to use by teachers and\r | |
6294 | private scholars.\r | |
6295 | \r | |
6296 | Dorothy TWOHIG\r | |
6297 | \r | |
6298 | In spring 1988 the editors of the papers of George Washington, John\r | |
6299 | Adams, Thomas Jefferson, James Madison, and Benjamin Franklin were\r | |
6300 | approached by classics scholar David Packard on behalf of the Packard\r | |
6301 | Humanities Institute with a proposal to produce a CD-ROM edition of the\r | |
6302 | complete papers of each of the Founding Fathers. This electronic edition\r | |
6303 | will supplement the published volumes, making the documents widely\r | |
6304 | available to students and researchers at reasonable cost. We estimate\r | |
6305 | that our CD-ROM edition of Washington's Papers will be substantially\r | |
6306 | completed within the next two years and ready for publication. Within\r | |
6307 | the next ten years or so, similar CD-ROM editions of the Franklin, Adams,\r | |
6308 | Jefferson, and Madison papers also will be available. At the Library of\r | |
6309 | Congress's session on technology, I would like to discuss not only the\r | |
6310 | experience of the Washington Papers in producing the CD-ROM edition, but\r | |
6311 | the impact technology has had on these major editorial projects. \r | |
6312 | Already, we are editing our volumes with an eye to the material that will\r | |
6313 | be readily available in the CD-ROM edition. The completed electronic\r | |
6314 | edition will provide immense possibilities for the searching of documents\r | |
6315 | for information in a way never possible before. The kind of technical\r | |
6316 | innovations that are currently available and on the drawing board will\r | |
6317 | soon revolutionize historical research and the production of historical\r | |
6318 | documents. Unfortunately, much of this new technology is not being used\r | |
6319 | in the planning stages of historical projects, simply because many\r | |
6320 | historians are aware only in the vaguest way of its existence. At least\r | |
6321 | two major new historical editing projects are considering microfilm\r | |
6322 | editions, simply because they are not aware of the possibilities of\r | |
6323 | electronic alternatives and the advantages of the new technology in terms\r | |
6324 | of flexibility and research potential compared to microfilm. In fact,\r | |
6325 | too many of us in history and literature are still at the stage of\r | |
6326 | struggling with our PCs. There are many historical editorial projects in\r | |
6327 | progress presently, and an equal number of literary projects. While the\r | |
6328 | two fields have somewhat different approaches to textual editing, there\r | |
6329 | are ways in which electronic technology can be of service to both.\r | |
6330 | \r | |
6331 | Since few of the editors involved in the Founding Fathers CD-ROM editions\r | |
6332 | are technical experts in any sense, I hope to point out in my discussion\r | |
6333 | of our experience how many of these electronic innovations can be used\r | |
6334 | successfully by scholars who are novices in the world of new technology. \r | |
6335 | One of the major concerns of the sponsors of the multitude of new\r | |
6336 | scholarly editions is the limited audience reached by the published\r | |
6337 | volumes. Most of these editions are being published in small quantities\r | |
6338 | and the publishers' price for them puts them out of the reach not only of\r | |
6339 | individual scholars but of most public libraries and all but the largest\r | |
6340 | educational institutions. However, little attention is being given to\r | |
6341 | ways in which technology can bypass conventional publication to make\r | |
6342 | historical and literary documents more widely available.\r | |
6343 | \r | |
6344 | What attracted us most to the CD-ROM edition of The Papers of George\r | |
6345 | Washington was the fact that David Packard's aim was to make a complete\r | |
6346 | edition of all of the 135,000 documents we have collected available in an\r | |
6347 | inexpensive format that would be placed in public libraries, small\r | |
6348 | colleges, and even high schools. This would provide an audience far\r | |
6349 | beyond our present 1,000-copy, $45 published edition. Since the CD-ROM\r | |
6350 | edition will carry none of the explanatory annotation that appears in the\r | |
6351 | published volumes, we also feel that the use of the CD-ROM will lead many\r | |
6352 | researchers to seek out the published volumes.\r | |
6353 | \r | |
6354 | In addition to ignorance of new technical advances, I have found that too\r | |
6355 | many editors--and historians and literary scholars--are resistant and\r | |
6356 | even hostile to suggestions that electronic technology may enhance their\r | |
6357 | work. I intend to discuss some of the arguments traditionalists are\r | |
6358 | advancing to resist technology, ranging from distrust of the speed with\r | |
6359 | which it changes (we are already wondering what is out there that is\r | |
6360 | better than CD-ROM) to suspicion of the technical language used to\r | |
6361 | describe electronic developments.\r | |
6362 | \r | |
6363 | Maria LEBRON\r | |
6364 | \r | |
6365 | The Online Journal of Current Clinical Trials, a joint venture of the\r | |
6366 | American Association for the Advancement of Science (AAAS) and the Online\r | |
6367 | Computer Library Center, Inc. (OCLC), is the first peer-reviewed journal\r | |
6368 | to provide full text, tabular material, and line illustrations on line. \r | |
6369 | This presentation will discuss the genesis and start-up period of the\r | |
6370 | journal. Topics of discussion will include historical overview,\r | |
6371 | day-to-day management of the editorial peer review, and manuscript\r | |
6372 | tagging and publication. A demonstration of the journal and its features\r | |
6373 | will accompany the presentation.\r | |
6374 | \r | |
6375 | Lynne PERSONIUS\r | |
6376 | \r | |
6377 | Cornell University Library, Cornell Information Technologies, and Xerox\r | |
6378 | Corporation, with the support of the Commission on Preservation and\r | |
6379 | Access, and Sun Microsystems, Inc., have been collaborating in a project\r | |
6380 | to test a prototype system for recording brittle books as digital images\r | |
6381 | and producing, on demand, high-quality archival paper replacements. The\r | |
6382 | project goes beyond that, however, to investigate some of the issues\r | |
6383 | surrounding scanning, storing, retrieving, and providing access to\r | |
6384 | digital images in a network environment.\r | |
6385 | \r | |
6386 | The Joint Study in Digital Preservation began in January 1990. Xerox\r | |
6387 | provided the College Library Access and Storage System (CLASS) software,\r | |
6388 | a prototype 600-dots-per-inch (dpi) scanner, and the hardware necessary\r | |
6389 | to support network printing on the DocuTech printer housed in Cornell's\r | |
6390 | Computing and Communications Center (CCC).\r | |
6391 | \r | |
6392 | The Cornell staff using the hardware and software became an integral part\r | |
6393 | of the development and testing process for enhancements to the CLASS\r | |
6394 | software system. The collaborative nature of this relationship is\r | |
6395 | resulting in a system that is specifically tailored to the preservation\r | |
6396 | application.\r | |
6397 | \r | |
6398 | A digital library of 1,000 volumes (or approximately 300,000 images) has\r | |
6399 | been created and is stored on an optical jukebox that resides in CCC. \r | |
6400 | The library includes a collection of select mathematics monographs that\r | |
6401 | provides mathematics faculty with an opportunity to use the electronic\r | |
6402 | library. The remaining volumes were chosen for the library to test the\r | |
6403 | various capabilities of the scanning system.\r | |
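
For a rough sense of scale, the figures above (1,000 volumes, about
300,000 images, a 600-dpi scanner) can be turned into a back-of-envelope
storage estimate. The page size and bit depth below are assumptions
made only for this sketch; the abstract does not state them, and any
compression would reduce the totals considerably.

```python
# Figures stated in the abstract.
VOLUMES = 1_000
IMAGES = 300_000
DPI = 600

# Assumptions for illustration only (not given in the abstract):
PAGE_WIDTH_IN = 6    # assumed page width in inches
PAGE_HEIGHT_IN = 9   # assumed page height in inches
BITS_PER_PIXEL = 1   # assumed black-and-white (bitonal) capture

images_per_volume = IMAGES // VOLUMES
pixels_per_page = (PAGE_WIDTH_IN * DPI) * (PAGE_HEIGHT_IN * DPI)
bytes_per_page = pixels_per_page * BITS_PER_PIXEL // 8
total_bytes = bytes_per_page * IMAGES

print(images_per_volume)    # 300 images per volume on average
print(bytes_per_page)       # 2430000 bytes per uncompressed page
print(total_bytes / 1e9)    # 729.0 gigabytes uncompressed
```

Under these assumptions the uncompressed collection would occupy
several hundred gigabytes, which suggests why jukebox storage and
network delivery figure so prominently in the project.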
6404 | \r | |
6405 | One project objective is to provide users of the Cornell library and the\r | |
6406 | library staff with the ability to request facsimiles of digitized images\r | |
6407 | or to retrieve the actual electronic image for browsing. A prototype\r | |
6408 | viewing workstation has been created by Xerox, with input into the design\r | |
6409 | by a committee of Cornell librarians and computer professionals. This\r | |
6410 | will allow us to experiment with patron access to the images that make up\r | |
6411 | the digital library. The viewing station provides search, retrieval, and\r | |
6412 | (ultimately) printing functions with enhancements to facilitate\r | |
6413 | navigation through multiple documents.\r | |
6414 | \r | |
6415 | Cornell currently is working to extend access to the digital library to\r | |
6416 | readers using workstations from their offices. This year is devoted to\r | |
6417 | the development of a network-resident image conversion and delivery\r | |
6418 | server, and client software that will support readers who use Apple\r | |
6419 | Macintosh computers, IBM Windows platforms, and Sun workstations. \r | |
6420 | Equipment for this development was provided by Sun Microsystems with\r | |
6421 | support from the Commission on Preservation and Access.\r | |
6422 | \r | |
6423 | During the show-and-tell session of the Workshop on Electronic Texts, a\r | |
6424 | prototype view station will be demonstrated. In addition, a display of\r | |
6425 | original library books that have been digitized will be available for\r | |
6426 | review with associated printed copies for comparison. The fifteen-minute\r | |
6427 | overview of the project will include a slide presentation that\r | |
6428 | constitutes a "tour" of the preservation digitizing process.\r | |
6429 | \r | |
6430 | The final network-connected version of the viewing station will provide\r | |
6431 | library users with another mechanism for accessing the digital library,\r | |
6432 | and will also provide the capability of viewing images directly. This\r | |
6433 | will not require special software, although a powerful computer with good\r | |
6434 | graphics will be needed.\r | |
6435 | \r | |
6436 | The Joint Study in Digital Preservation has generated a great deal of\r | |
6437 | interest in the library community. Unfortunately, or perhaps\r | |
6438 | fortunately, this project serves to raise a vast number of other issues\r | |
6439 | surrounding the use of digital technology for the preservation and use of\r | |
6440 | deteriorating library materials, which subsequent projects will need to\r | |
6441 | examine. Much work remains.\r | |
6442 | \r | |
6443 | SESSION III\r | |
6444 | \r | |
6445 | Howard BESSER Networking Multimedia Databases\r | |
6446 | \r | |
6447 | What do we have to consider in building and distributing databases of\r | |
6448 | visual materials in a multi-user environment? This presentation examines\r | |
6449 | a variety of concerns that need to be addressed before a multimedia\r | |
6450 | database can be set up in a networked environment.\r | |
6451 | \r | |
6452 | In the past it has not been feasible to implement databases of visual\r | |
6453 | materials in shared-user environments because of technological barriers. \r | |
6454 | Each of the two basic models for multi-user multimedia databases has\r | |
6455 | posed its own problem. The analog multimedia storage model (represented\r | |
6456 | by Project Athena's parallel analog and digital networks) has required an\r | |
6457 | incredibly complex (and expensive) infrastructure. The economies of\r | |
6458 | scale that make multi-user setups cheaper per user served do not operate\r | |
6459 | in an environment that requires a computer workstation, videodisc player,\r | |
6460 | and two display devices for each user.\r | |
6461 | \r | |
6462 | The digital multimedia storage model has required vast amounts of storage\r | |
6463 | space (as much as one gigabyte per thirty still images). In the past the\r | |
6464 | cost of such a large amount of storage space made this model a\r | |
6465 | prohibitive choice as well. But plunging storage costs are finally\r | |
6466 | making this second alternative viable.\r | |
6467 | \r | |
6468 | If storage no longer poses such an impediment, what do we need to\r | |
6469 | consider in building digitally stored multi-user databases of visual\r | |
6470 | materials? This presentation will examine the networking and\r | |
6471 | telecommunication constraints that must be overcome before such databases\r | |
6472 | can become commonplace and useful to a large number of people.\r | |
6473 | \r | |
6474 | The key problem is the vast size of multimedia documents, and how this\r | |
6475 | affects not only storage but telecommunications transmission time. \r | |
6476 | Anything slower than T-1 speed is impractical for files of 1 megabyte or\r | |
6477 | larger (which is likely to be small for a multimedia document). For\r | |
6478 | instance, even on a 56 Kb line it would take three minutes to transfer a\r | |
6479 | 1-megabyte file. And these figures assume ideal circumstances, and do\r | |
6480 | not take into consideration other users contending for network bandwidth,\r | |
6481 | disk access time, or the time needed for remote display. Current common\r | |
6482 | telephone transmission rates would be completely impractical; few users\r | |
6483 | would be willing to wait the hour necessary to transmit a single image at\r | |
6484 | 2400 baud.\r | |
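
The arithmetic behind these figures can be sketched as follows. This is a rough illustration rather than part of the original presentation: it assumes 1 megabyte = 2**20 bytes, treats one baud as one bit per second, takes T-1 as 1.544 Mbps (a rate not stated in the text), and ignores protocol overhead and contention, so the results are best-case lower bounds.

```python
# Idealized transfer times for a 1-megabyte file.
# Assumptions: no protocol overhead, no contention for bandwidth,
# and 1 baud treated as 1 bit per second.

MEGABYTE_BITS = 8 * 2**20  # 1 megabyte taken as 2**20 bytes

# Line rates in bits per second. The T-1 figure (1.544 Mbps) is the
# standard North American rate, not a number given in the text.
LINE_RATES_BPS = {
    "2400 baud modem": 2_400,
    "56 Kb line": 56_000,
    "T-1": 1_544_000,
}


def transfer_seconds(size_bits: int, rate_bps: int) -> float:
    """Return the ideal time in seconds to move size_bits at rate_bps."""
    return size_bits / rate_bps


transfer_times = {
    name: transfer_seconds(MEGABYTE_BITS, rate)
    for name, rate in LINE_RATES_BPS.items()
}

for name, seconds in transfer_times.items():
    print(f"{name:>16}: {seconds:8.1f} s ({seconds / 60:5.1f} min)")
```

Under these assumptions the 56 Kb line needs about two and a half minutes and the 2400-baud line just under an hour, in line with the rounded figures quoted above; a T-1 line needs only a few seconds.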
6485 | \r | |
6486 | This necessitates compression, which itself raises a number of other\r | |
6487 | issues. In order to decrease file sizes significantly, we must employ\r | |
6488 | lossy compression algorithms. But how much quality can we afford to\r | |
6489 | lose? To date there has been only one significant study done of\r | |
6490 | image-quality needs for a particular user group, and this study did not\r | |
6491 | look at loss resulting from compression. Only after identifying\r | |
6492 | image-quality needs can we begin to address storage and network bandwidth\r | |
6493 | needs.\r | |
6494 | \r | |
6495 | Experience with X-Windows-based applications (such as Imagequery, the\r | |
6496 | University of California at Berkeley image database) demonstrates the\r | |
6497 | utility of a client-server topology, but also points to the limitation of\r | |
6498 | current software for a distributed environment. For example,\r | |
6499 | applications like Imagequery can incorporate compression, but current X\r | |
6500 | implementations do not permit decompression at the end user's\r | |
6501 | workstation. Such decompression at the host computer alleviates storage\r | |
6502 | capacity problems while doing nothing to address problems of\r | |
6503 | telecommunications bandwidth.\r | |
6504 | \r | |
6505 | We need to examine the effects on network through-put of moving\r | |
6506 | multimedia documents around on a network. We need to examine various\r | |
6507 | topologies that will help us avoid bottlenecks around servers and\r | |
6508 | gateways. Experience with applications such as these raises still broader\r | |
6509 | questions. How closely is the multimedia document tied to the software\r | |
6510 | for viewing it? Can it be accessed and viewed from other applications? \r | |
6511 | Experience with the MARC format (and more recently with the Z39.50\r | |
6512 | protocols) shows how useful it can be to store documents in a form in\r | |
6513 | which they can be accessed by a variety of application software.\r | |
6514 | \r | |
6515 | Finally, from an intellectual-access standpoint, we need to address the\r | |
6516 | issue of providing access to these multimedia documents in\r | |
6517 | interdisciplinary environments. We need to examine terminology and\r | |
6518 | indexing strategies that will allow us to provide access to this material\r | |
6519 | in a cross-disciplinary way.\r | |
6520 | \r | |
6521 | Ronald LARSEN Directions in High-Performance Networking for\r | |
6522 | Libraries\r | |
6523 | \r | |
6524 | The pace at which computing technology has advanced over the past forty\r | |
6525 | years shows no sign of abating. Roughly speaking, each five-year period\r | |
6526 | has yielded an order-of-magnitude improvement in price and performance of\r | |
6527 | computing equipment. No fundamental hurdles are likely to prevent this\r | |
6528 | pace from continuing for at least the next decade. It is only in the\r | |
6529 | past five years, though, that computing has become ubiquitous in\r | |
6530 | libraries, affecting all staff and patrons, directly or indirectly.\r | |
6531 | \r | |
6532 | During these same five years, communications rates on the Internet, the\r | |
6533 | principal academic computing network, have grown from 56 kbps to 1.5\r | |
6534 | Mbps, and the NSFNet backbone is now running 45 Mbps. Over the next five\r | |
6535 | years, communication rates on the backbone are expected to exceed 1 Gbps. \r | |
6536 | Both the population of network users and the volume of network\r | |
6537 | traffic have continued to grow geometrically, at rates approaching 15\r | |
6538 | percent per month. This flood of capacity and use, likened by some to\r | |
6539 | "drinking from a firehose," creates immense opportunities and challenges\r | |
6540 | for libraries. Libraries must anticipate the future implications of this\r | |
6541 | technology, participate in its development, and deploy it to ensure\r | |
6542 | access to the world's information resources.\r | |
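
The growth figures quoted in the two paragraphs above can be put on a common yearly footing with a short calculation. This is an illustrative sketch only: it treats "approaching 15 percent per month" as a constant 15 percent compound rate and "an order-of-magnitude improvement" as exactly a factor of ten per five years, both of which are simplifying assumptions.

```python
import math

# Figures quoted in the text; everything derived from them is illustrative.
MONTHLY_GROWTH = 0.15       # "approaching 15 percent per month", used as an upper bound
FIVE_YEAR_IMPROVEMENT = 10  # "order-of-magnitude improvement" per five-year period


def compound_factor(rate_per_period: float, periods: int) -> float:
    """Total growth factor after compounding rate_per_period for periods."""
    return (1.0 + rate_per_period) ** periods


def doubling_periods(rate_per_period: float) -> float:
    """Number of periods needed to double at a constant compound rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_period)


annual_traffic_factor = compound_factor(MONTHLY_GROWTH, 12)
months_to_double = doubling_periods(MONTHLY_GROWTH)
annual_price_performance_factor = FIVE_YEAR_IMPROVEMENT ** (1 / 5)

print(f"Traffic growth per year:       x{annual_traffic_factor:.2f}")
print(f"Months for traffic to double:  {months_to_double:.1f}")
print(f"Price/performance per year:    x{annual_price_performance_factor:.2f}")
```

At a sustained 15 percent per month, traffic would grow a little more than fivefold in a year and double roughly every five months, while an order of magnitude every five years corresponds to about 58 percent per year.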
6543 | \r | |
6544 | The infrastructure for the information age is being put in place. \r | |
6545 | Libraries face strategic decisions about their role in the development,\r | |
6546 | deployment, and use of this infrastructure. The emerging infrastructure\r | |
6547 | is much more than computers and communication lines. It is more than the\r | |
6548 | ability to compute at a remote site, send electronic mail to a peer\r | |
6549 | across the country, or move a file from one library to another. The next\r | |
6550 | five years will witness substantial development of the information\r | |
6551 | infrastructure of the network.\r | |
6552 | \r | |
6553 | In order to provide appropriate leadership, library professionals must\r | |
6554 | have a fundamental understanding of and appreciation for computer\r | |
6555 | networking, from local area networks to the National Research and\r | |
6556 | Education Network (NREN). This presentation addresses these\r | |
6557 | fundamentals, and how they relate to libraries today and in the near\r | |
6558 | future.\r | |
6559 | \r | |
6560 | Edwin BROWNRIGG Electronic Library Visions and Realities\r | |
6561 | \r | |
6562 | The electronic library has been a vision desired by many--and rejected by\r | |
6563 | some--since Vannevar Bush coined the term memex to describe an automated,\r | |
6564 | intelligent, personal information system. Variations on this vision have\r | |
6565 | included Ted Nelson's Xanadu, Alan Kay's Dynabook, and Lancaster's\r | |
6566 | "paperless library," with the most recent incarnation being the\r | |
6567 | "Knowledge Navigator" described by John Sculley of Apple. But the reality\r | |
6568 | of library service has been less visionary and the leap to the electronic\r | |
6569 | library has eluded universities, publishers, and information technology\r | |
6570 | firms.\r | |
6571 | \r | |
6572 | The Memex Research Institute (MemRI), an independent, nonprofit research\r | |
6573 | and development organization, has created an Electronic Library Program\r | |
6574 | of shared research and development in order to make the collective vision\r | |
6575 | more concrete. The program is working toward the creation of large,\r | |
6576 | indexed, publicly available electronic image collections of published\r | |
6577 | documents in academic, special, and public libraries. This strategic\r | |
6578 | plan is the result of the first stage of the program, which has been an\r | |
6579 | investigation of the information technologies available to support such\r | |
6580 | an effort, the economic parameters of electronic service compared to\r | |
6581 | traditional library operations, and the business and political factors\r | |
6582 | affecting the shift from print distribution to electronic networked\r | |
6583 | access.\r | |
6584 | \r | |
6585 | The strategic plan envisions a combination of publicly searchable access\r | |
6586 | databases, image (and text) document collections stored on network "file\r | |
6587 | servers," local and remote network access, and an intellectual property\r | |
6588 | management-control system. This combination of technology and\r | |
6589 | information content is defined in this plan as an E-library or E-library\r | |
6590 | collection. Some participating sponsors are already developing projects\r | |
6591 | based on MemRI's recommended directions.\r | |
6592 | \r | |
6593 | The E-library strategy projected in this plan is a visionary one that can\r | |
6594 | enable major changes and improvements in academic, public, and special\r | |
6595 | library service. This vision is, though, one that can be realized with\r | |
6596 | today's technology. At the same time, it will challenge the political\r | |
6597 | and social structure within which libraries operate: in academic\r | |
6598 | libraries, the traditional emphasis on local collections, extending to\r | |
6599 | accreditation issues; in public libraries, the potential of electronic\r | |
6600 | branch and central libraries fully available to the public; and for\r | |
6601 | special libraries, new opportunities for shared collections and networks.\r | |
6602 | \r | |
6603 | The environment in which this strategic plan has been developed is, at\r | |
6604 | the moment, dominated by a sense of library limits. The continued\r | |
6605 | expansion and rapid growth of local academic library collections is now\r | |
6606 | clearly at an end. Corporate libraries, and even law libraries, are\r | |
6607 | faced with operating within a difficult economic climate, as well as with\r | |
6608 | very active competition from commercial information sources. For\r | |
6609 | example, public libraries may be seen as a desirable but not critical\r | |
6610 | municipal service in a time when the budgets of safety and health\r | |
6611 | agencies are being cut back.\r | |
6612 | \r | |
6613 | Further, libraries in general have a very high labor-to-cost ratio in\r | |
6614 | their budgets, and labor costs are still increasing, notwithstanding\r | |
6615 | automation investments. It is difficult for libraries to obtain capital,\r | |
6616 | startup, or seed funding for innovative activities, and those\r | |
6617 | technology-intensive initiatives that offer the potential of decreased\r | |
6618 | labor costs can provoke the opposition of library staff.\r | |
6619 | \r | |
6620 | However, libraries have achieved some considerable successes in the past\r | |
6621 | two decades by improving both their service and their credibility within\r | |
6622 | their organizations--and these positive changes have been accomplished\r | |
6623 | mostly with judicious use of information technologies. The advances in\r | |
6624 | computing and information technology have been well-chronicled: the\r | |
6625 | continuing precipitous drop in computing costs, the growth of the\r | |
6626 | Internet and private networks, and the explosive increase in publicly\r | |
6627 | available information databases.\r | |
6628 | \r | |
6629 | For example, OCLC has become one of the largest computer network\r | |
6630 | organizations in the world by creating a cooperative cataloging network\r | |
6631 | of more than 6,000 libraries worldwide. On-line public access catalogs\r | |
6632 | now serve millions of users on more than 50,000 dedicated terminals in\r | |
6633 | the United States alone. The University of California MELVYL on-line\r | |
6634 | catalog system has now expanded into an index database reference service\r | |
6635 | and supports more than six million searches a year. And libraries have\r | |
6636 | become the largest group of customers of CD-ROM publishing technology;\r | |
6637 | more than 30,000 optical media publications such as those offered by\r | |
6638 | InfoTrac and Silver Platter are subscribed to by U.S. libraries.\r | |
6639 | \r | |
6640 | This march of technology continues and in the next decade will result in\r | |
6641 | further innovations that are extremely difficult to predict. What is\r | |
6642 | clear is that libraries can now go beyond automation of their order files\r | |
6643 | and catalogs to automation of their collections themselves--and it is\r | |
6644 | possible to circumvent the fiscal limitations that appear to obtain\r | |
6645 | today.\r | |
6646 | \r | |
6647 | This Electronic Library Strategic Plan recommends a paradigm shift in\r | |
6648 | library service, and demonstrates the steps necessary to provide improved\r | |
6649 | library services with limited capacities and operating investments.\r | |
6650 | \r | |
6651 | SESSION IV-A\r | |
6652 | \r | |
6653 | Anne KENNEY\r | |
6654 | \r | |
6655 | The Cornell/Xerox Joint Study in Digital Preservation resulted in the\r | |
6656 | recording of 1,000 brittle books as 600-dpi digital images and the\r | |
6657 | production, on demand, of high-quality and archivally sound paper\r | |
6658 | replacements. The project, which was supported by the Commission on\r | |
6659 | Preservation and Access, also investigated some of the issues surrounding\r | |
6660 | scanning, storing, retrieving, and providing access to digital images in\r | |
6661 | a network environment.\r | |
6662 | \r | |
6663 | Anne Kenney will focus on some of the issues surrounding direct scanning\r | |
6664 | as identified in the Cornell Xerox Project. Among those to be discussed\r | |
6665 | are: image versus text capture; indexing and access; image-capture\r | |
6666 | capabilities; a comparison to photocopy and microfilm; production and\r | |
6667 | cost analysis; storage formats, protocols, and standards; and the use of\r | |
6668 | this scanning technology for preservation purposes.\r | |
6669 | \r | |
6670 | The 600-dpi digital images produced in the Cornell Xerox Project proved\r | |
6671 | highly acceptable for creating paper replacements of deteriorating\r | |
6672 | originals. The 1,000 scanned volumes provided an array of image-capture\r | |
6673 | challenges that are common to nineteenth-century printing techniques and\r | |
6674 | embrittled material, and that defy the use of text-conversion processes. \r | |
6675 | These challenges include diminished contrast between text and background,\r | |
6676 | fragile and deteriorated pages, uneven printing, elaborate type faces,\r | |
6677 | faint and bold text adjacency, handwritten text and annotations, non-Roman\r | |
6678 | languages, and a proliferation of illustrated material embedded in text. \r | |
6679 | The latter category included high-frequency and low-frequency halftones,\r | |
6680 | continuous tone photographs, intricate mathematical drawings, maps,\r | |
6681 | etchings, reverse-polarity drawings, and engravings.\r | |
6682 | \r | |
6683 | The Xerox prototype scanning system provided a number of important\r | |
6684 | features for capturing this diverse material. Technicians used multiple\r | |
6685 | threshold settings, filters, line art and halftone definitions,\r | |
6686 | autosegmentation, windowing, and software-editing programs to optimize\r | |
6687 | image capture. At the same time, this project focused on production. \r | |
6688 | The goal was to make scanning as affordable and acceptable as\r | |
6689 | photocopying and microfilming for preservation reformatting. A\r | |
6690 | time-and-cost study conducted during the last three months of this\r | |
6691 | project confirmed the economic viability of digital scanning, and these\r | |
6692 | findings will be discussed here.\r | |
6693 | \r | |
6694 | From the outset, the Cornell Xerox Project was predicated on the use of\r | |
6695 | nonproprietary standards and the use of common protocols when standards\r | |
6696 | did not exist. Digital files were created as TIFF images which were\r | |
6697 | compressed prior to storage using Group 4 CCITT compression. The Xerox\r | |
6698 | software is MS-DOS-based and utilizes off-the-shelf programs such as\r | |
6699 | Microsoft Windows and Wang Image Wizard. The digital library is designed\r | |
6700 | to be hardware-independent and to provide interchangeability with other\r | |
6701 | institutions through network connections. Access to the digital files\r | |
6702 | themselves is two-tiered: Bibliographic records for the computer files\r | |
6703 | are created in RLIN and Cornell's local system and access into the actual\r | |
6704 | digital images comprising a book is provided through a document control\r | |
6705 | structure and a networked image file-server, both of which will be\r | |
6706 | described.\r | |
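
To give a sense of the file sizes involved before compression, the raw size of a bitonal (1 bit per pixel) page scan can be estimated as below. Group 4 compression applies to such bitonal images. The 6 x 9 inch page size is purely a hypothetical example, not a figure from the project, and no compression ratio is assumed because the text does not report one.

```python
def bitonal_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Raw (uncompressed) size in bytes of a 1-bit-per-pixel page scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # each scan line padded to a whole byte
    return row_bytes * height_px


# Hypothetical page dimensions in inches (an assumption for illustration).
PAGE_WIDTH_IN = 6.0
PAGE_HEIGHT_IN = 9.0

raw_600 = bitonal_page_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, 600)
raw_300 = bitonal_page_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, 300)

print(f"600 dpi: {raw_600:,} bytes uncompressed")
print(f"300 dpi: {raw_300:,} bytes uncompressed")
print(f"600 dpi is {raw_600 / raw_300:.0f}x the size of 300 dpi")
```

Because pixel count scales with the square of the resolution, doubling the scan density quadruples the raw size, which is one reason compression before storage matters at 600 dpi.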
6707 | \r | |
6708 | The presentation will conclude with a discussion of some of the issues\r | |
6709 | surrounding the use of this technology as a preservation tool (storage,\r | |
6710 | refreshing, backup).\r | |
6711 | \r | |
6712 | Pamela ANDRE and Judith ZIDAR\r | |
6713 | \r | |
6714 | The National Agricultural Library (NAL) has had extensive experience with\r | |
6715 | raster scanning of printed materials. Since 1987, the Library has\r | |
6716 | participated in the National Agricultural Text Digitizing Project (NATDP),\r | |
6717 | a cooperative effort between NAL and forty-five land grant university\r | |
6718 | libraries. An overview of the project will be presented, giving its\r | |
6719 | history and NAL's strategy for the future.\r | |
6720 | \r | |
6721 | An in-depth discussion of NATDP will follow, including a description of\r | |
6722 | the scanning process, from the gathering of the printed materials to the\r | |
6723 | archiving of the electronic pages. The type of equipment required for a\r | |
6724 | stand-alone scanning workstation and the importance of file management\r | |
6725 | software will be discussed. Issues concerning the images themselves will\r | |
6726 | be addressed briefly, such as image format; black and white versus color;\r | |
6727 | gray scale versus dithering; and resolution.\r | |
6728 | \r | |
6729 | Also described will be a study currently in progress by NAL to evaluate\r | |
6730 | the usefulness of converting microfilm to electronic images in order to\r | |
6731 | improve access. With the cooperation of Tuskegee University, NAL has\r | |
6732 | selected three reels of microfilm from a collection of sixty-seven reels\r | |
6733 | containing the papers, letters, and drawings of George Washington Carver. \r | |
6734 | The three reels were converted into 3,500 electronic images using a\r | |
6735 | specialized microfilm scanner. The selection, filming, and indexing of\r | |
6736 | this material will be discussed.\r | |
6737 | \r | |
6738 | Donald WATERS\r | |
6739 | \r | |
6740 | Project Open Book, the Yale University Library's effort to convert\r | |
6741 | 10,000 books from microfilm to digital imagery, is currently in an advanced\r | |
6742 | state of planning and organization. The Yale Library has selected a\r | |
6743 | major vendor to serve as a partner in the project and as systems\r | |
6744 | integrator. In its proposal, the successful vendor helped isolate areas\r | |
6745 | of risk and uncertainty as well as key issues to be addressed during the\r | |
6746 | life of the project. The Yale Library is now poised to decide what\r | |
6747 | material it will convert to digital image form and to seek funding,\r | |
6748 | initially for the first phase and then for the entire project.\r | |
6749 | \r | |
6750 | The proposal that Yale accepted for the implementation of Project Open\r | |
6751 | Book will provide at the end of three phases a conversion subsystem,\r | |
6752 | browsing stations distributed on the campus network within the Yale\r | |
6753 | Library, a subsystem for storing 10,000 books at 200 and 600 dots per\r | |
6754 | inch, and network access to the image printers. Pricing for the system\r | |
6755 | implementation assumes the existence of Yale's campus ethernet network\r | |
6756 | and its high-speed image printers, and includes other requisite hardware\r | |
6757 | and software, as well as system integration services. Proposed operating\r | |
6758 | costs include hardware and software maintenance, but do not include\r | |
6759 | estimates for the facilities management of the storage devices and image\r | |
6760 | servers.\r | |
6761 | \r | |
6762 | Yale selected its vendor partner in a formal process, partly funded by\r | |
6763 | the Commission on Preservation and Access. Following a request for\r | |
6764 | proposal, the Yale Library selected two vendors as finalists to work with\r | |
6765 | Yale staff to generate a detailed analysis of requirements for Project\r | |
6766 | Open Book. Each vendor used the results of the requirements analysis to\r | |
6767 | generate and submit a formal proposal for the entire project. This\r | |
6768 | competitive process not only enabled the Yale Library to select its\r | |
6769 | primary vendor partner but also revealed much about the state of the\r | |
6770 | imaging industry, about the varying corporate commitments to the markets\r | |
6771 | for imaging technology, and about the varying organizational dynamics\r | |
6772 | through which major companies are responding to and seeking to develop\r | |
6773 | these markets.\r | |
6774 | \r | |
6775 | Project Open Book is focused specifically on the conversion of images\r | |
6776 | from microfilm to digital form. The technology for scanning microfilm is\r | |
6777 | readily available but is changing rapidly. In its project requirements,\r | |
6778 | the Yale Library emphasized features of the technology that affect the\r | |
6779 | technical quality of digital image production and the costs of creating\r | |
6780 | and storing the image library: What levels of digital resolution can be\r | |
6781 | achieved by scanning microfilm? How does variation in the quality of\r | |
6782 | microfilm, particularly in film produced to preservation standards,\r | |
6783 | affect the quality of the digital images? What technologies can an\r | |
6784 | operator effectively and economically apply when scanning film to\r | |
6785 | separate two-up images and to control for and correct image\r | |
6786 | imperfections? How can quality control best be integrated into\r | |
6787 | digitizing work flow that includes document indexing and storage?\r | |
6788 | \r | |
6789 | The actual and expected uses of digital images--storage, browsing,\r | |
6790 | printing, and OCR--help determine the standards for measuring their\r | |
6791 | quality. Browsing is especially important, but the facilities available\r | |
6792 | for readers to browse image documents are perhaps the weakest aspect of\r | |
6793 | imaging technology and most in need of development. As it defined its\r | |
6794 | requirements, the Yale Library concentrated on some fundamental aspects\r | |
6795 | of usability for image documents: Does the system have sufficient\r | |
6796 | flexibility to handle the full range of document types, including\r | |
6797 | monographs, multi-part and multivolume sets, and serials, as well as\r | |
6798 | manuscript collections? What conventions are necessary to identify a\r | |
6799 | document uniquely for storage and retrieval? Where is the database of\r | |
6800 | record for storing bibliographic information about the image document? \r | |
6801 | How are basic internal structures of documents, such as pagination, made\r | |
6802 | accessible to the reader? How are the image documents physically\r | |
6803 | presented on the screen to the reader?\r | |
6804 | \r | |
6805 | The Yale Library designed Project Open Book on the assumption that\r | |
6806 | microfilm is more than adequate as a medium for preserving the content of\r | |
6807 | deteriorated library materials. As planning in the project has advanced,\r | |
6808 | it is increasingly clear that the challenge of digital image technology\r | |
6809 | and the key to the success of efforts like Project Open Book is to\r | |
6810 | provide a means of both preserving and improving access to those\r | |
6811 | deteriorated materials.\r | |
6812 | \r | |
6813 | SESSION IV-B\r | |
6814 | \r | |
6815 | George THOMA\r | |
6816 | \r | |
6817 | In the use of electronic imaging for document preservation, there are\r | |
6818 | several issues to consider, such as: ensuring adequate image quality,\r | |
6819 | maintaining substantial conversion rates (through-put), providing unique\r | |
6820 | identification for automated access and retrieval, and accommodating\r | |
6821 | bound volumes and fragile material.\r | |
6822 | \r | |
6823 | To maintain high image quality, image processing functions are required\r | |
6824 | to correct the deficiencies in the scanned image. Some commercially\r | |
6825 | available systems include these functions, while some do not. The\r | |
6826 | scanned raw image must be processed to correct contrast deficiencies--\r | |
6827 | both poor overall contrast resulting from light print and/or dark\r | |
6828 | background, and variable contrast resulting from stains and\r | |
6829 | bleed-through. Furthermore, the scan density must be adequate to allow\r | |
6830 | legibility of print and sufficient fidelity in the pseudo-halftoned gray\r | |
6831 | material. Borders or page-edge effects must be removed for both\r | |
6832 | compactibility and aesthetics. Page skew must be corrected for aesthetic\r | |
6833 | reasons and to enable accurate character recognition if desired. \r | |
6834 | Compound images consisting of both two-toned text and gray-scale\r | |
6835 | illustrations must be processed appropriately to retain the quality of\r | |
6836 | each.\r | |
6837 | \r | |
6838 | SESSION IV-C\r | |
6839 | \r | |
6840 | Jean BARONAS\r | |
6841 | \r | |
6842 | Standards publications being developed by scientists, engineers, and\r | |
6843 | business managers in Association for Information and Image Management\r | |
6844 | (AIIM) standards committees can be applied to electronic image management\r | |
6845 | (EIM) processes including: document (image) transfer, retrieval and\r | |
6846 | evaluation; optical disk and document scanning; and document design and\r | |
6847 | conversion. When combined with EIM system planning and operations,\r | |
6848 | standards can assist in generating image databases that are\r | |
6849 | interchangeable among a variety of systems. The applications of\r | |
6850 | different approaches for image-tagging, indexing, compression, and\r | |
6851 | transfer often cause uncertainty concerning EIM system compatibility,\r | |
6852 | calibration, performance, and upward compatibility, until standard\r | |
6853 | implementation parameters are established. The AIIM standards that are\r | |
6854 | being developed for these applications can be used to decrease the\r | |
6855 | uncertainty, successfully integrate imaging processes, and promote "open\r | |
6856 | systems." AIIM is an accredited American National Standards Institute\r | |
6857 | (ANSI) standards developer with more than twenty committees comprised of\r | |
6858 | 300 volunteers representing users, vendors, and manufacturers. The\r | |
6859 | standards publications that are developed in these committees have\r | |
6860 | national acceptance and provide the basis for international harmonization\r | |
6861 | in the development of new International Organization for Standardization\r | |
6862 | (ISO) standards.\r | |
6863 | \r | |
6864 | This presentation describes the development of AIIM's EIM standards and a\r | |
6865 | new effort at AIIM, a database on standards projects in a wide framework\r | |
6866 | of imaging industries including capture, recording, processing,\r | |
6867 | duplication, distribution, display, evaluation, and preservation. The\r | |
6868 | AIIM Imagery Database will cover imaging standards being developed by\r | |
6869 | many organizations in many different countries. It will contain\r | |
6870 | standards publications' dates, origins, related national and\r | |
6871 | international projects, status, key words, and abstracts. The ANSI Image\r | |
6872 | Technology Standards Board requested that such a database be established,\r | |
6873 | as did the ISO/International Electrotechnical Commission Joint Task Force\r | |
6874 | on Imagery. AIIM will take on the leadership role for the database and\r | |
6875 | coordinate its development with several standards developers.\r | |
6876 | \r | |
6877 | Patricia BATTIN\r | |
6878 | \r | |
6879 | Characteristics of standards for digital imagery:\r | |
6880 | \r | |
6881 | * Nature of digital technology implies continuing volatility.\r | |
6882 | \r | |
6883 | * Precipitous standard-setting not possible and probably not\r | |
6884 | desirable.\r | |
6885 | \r | |
6886 | * Standards are a complex issue involving the medium, the\r | |
6887 | hardware, the software, and the technical capacity for\r | |
6888 | reproductive fidelity and clarity.\r | |
6889 | \r | |
6890 | * The prognosis for reliable archival standards (as defined by\r | |
6891 | librarians) in the foreseeable future is poor.\r | |
6892 | \r | |
6893 | Significant potential and attractiveness of digital technology as a\r | |
6894 | preservation medium and access mechanism.\r | |
6895 | \r | |
6896 | Productive use of digital imagery for preservation requires a\r | |
6897 | reconceptualizing of preservation principles in a volatile,\r | |
6898 | standardless world.\r | |
6899 | \r | |
6900 | Concept of managing continuing access in the digital environment\r | |
6901 | rather than focusing on the permanence of the medium and long-term\r | |
6902 | archival standards developed for the analog world.\r | |
6903 | \r | |
6904 | Transition period: How long and what to do?\r | |
6905 | \r | |
6906 | * Redefine "archival."\r | |
6907 | \r | |
6908 | * Remove the burden of "archival copy" from paper artifacts.\r | |
6909 | \r | |
6910 | * Use digital technology for storage, develop management\r | |
6911 | strategies for refreshing medium, hardware and software.\r | |
6912 | \r | |
6913 | * Create acid-free paper copies for transition period backup\r | |
6914 | until we develop reliable procedures for ensuring continuing\r | |
6915 | access to digital files.\r | |
6916 | \r | |
6917 | SESSION IV-D\r | |
6918 | \r | |
6919 | Stuart WEIBEL The Role of SGML Markup in the CORE Project (6)\r | |
6920 | \r | |
6921 | The emergence of high-speed telecommunications networks as a basic\r | |
6922 | feature of the scholarly workplace is driving the demand for electronic\r | |
6923 | document delivery. Three distinct categories of electronic\r | |
6924 | publishing/republishing are necessary to support access demands in this\r | |
6925 | emerging environment:\r | |
6926 | \r | |
6927 | 1.) Conversion of paper or microfilm archives to electronic format\r | |
6928 | 2.) Conversion of electronic files to formats tailored to\r | |
6929 | electronic retrieval and display\r | |
6930 | 3.) Primary electronic publishing (materials for which the\r | |
6931 | electronic version is the primary format)\r | |
6932 | \r | |
6933 | OCLC has experimental or product development activities in each of these\r | |
6934 | areas. Among the challenges that lie ahead is the integration of these\r | |
6935 | three types of information stores in coherent distributed systems.\r | |
6936 | \r | |
6937 | The CORE (Chemistry Online Retrieval Experiment) Project is a model for\r | |
6938 | the conversion of large text and graphics collections for which\r | |
6939 | electronic typesetting files are available (category 2). The American\r | |
6940 | Chemical Society has made available computer typography files dating from\r | |
6941 | 1980 for its twenty journals. This collection of some 250 journal-years\r | |
6942 | is being converted to an electronic format that will be accessible\r | |
6943 | through several end-user applications.\r | |
6944 | \r | |
6945 | The use of Standard Generalized Markup Language (SGML) offers the means\r | |
6946 | to capture the structural richness of the original articles in a way that\r | |
6947 | will support a variety of retrieval, navigation, and display options\r | |
6948 | necessary to navigate effectively in very large text databases.\r | |
6949 | \r | |
6950 | An SGML document consists of text that is marked up with descriptive tags\r | |
6951 | that specify the function of a given element within the document. As a\r | |
6952 | formal language construct, an SGML document can be parsed against a\r | |
6953 | document-type definition (DTD) that unambiguously defines what elements\r | |
6954 | are allowed and where in the document they can (or must) occur. This\r | |
6955 | formalized map of article structure allows the user interface design to\r | |
6956 | be uncoupled from the underlying database system, an important step\r | |
6957 | toward interoperability. Demonstration of this separability is a part of\r | |
6958 | the CORE project, wherein user interface designs born of very different\r | |
6959 | philosophies will access the same database.\r | |
6960 | \r | |
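The idea of parsing a document against a DTD can be illustrated with a
small sketch. It is not the CORE project's DTD or software. It assumes
XML (a later, restricted profile of SGML) as a stand-in so that Python's
standard library can read the markup, and it assumes a toy content model
written as regular expressions over child element names. The element
names (article, title, author, abstract, section, heading, para) and the
helpers validate and check are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, standing in for DTD element declarations.
# Each pattern is matched against the child element names of a parent,
# written as "name " repeated once per child, in document order.
CONTENT_MODELS = {
    "article": r"title (author )+abstract (section )+",
    "section": r"heading (para )+",
}


def validate(element):
    """Return a list of structure errors found at or below `element`."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(
                f"<{element.tag}>: children [{children.strip()}] "
                f"do not match the model '{model}'"
            )
    for child in element:
        errors.extend(validate(child))
    return errors


def check(markup):
    """Parse the markup and validate it against CONTENT_MODELS."""
    return validate(ET.fromstring(markup))


GOOD = (
    "<article><title>T</title><author>A</author><abstract>S</abstract>"
    "<section><heading>H</heading><para>P</para></section></article>"
)
# Title and author are swapped, and the section lacks its heading.
BAD = (
    "<article><author>A</author><title>T</title><abstract>S</abstract>"
    "<section><para>P</para></section></article>"
)

if __name__ == "__main__":
    print(check(GOOD))
    for problem in check(BAD):
        print(problem)
```

Because the structural rules live in one table rather than in any
display code, different interfaces could in principle rely on the same
validated structure, which is the separation the abstract describes.
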
6961 | NOTES:\r | |
6962 | (6) The CORE project is a collaboration among Cornell University's\r | |
6963 | Mann Library, Bell Communications Research (Bellcore), the American\r | |
6964 | Chemical Society (ACS), the Chemical Abstracts Service (CAS), and\r | |
6965 | OCLC.\r | |
6966 | \r | |
6967 | Michael LESK The CORE Electronic Chemistry Library\r | |
6968 | \r | |
6969 | A major on-line file of chemical journal literature complete with\r | |
6970 | graphics is being developed to test the usability of fully electronic\r | |
6971 | access to documents, as a joint project of Cornell University, the\r | |
6972 | American Chemical Society, the Chemical Abstracts Service, OCLC, and\r | |
6973 | Bellcore (with additional support from Sun Microsystems, Springer-Verlag,\r | |
6974 | Digital Equipment Corporation, Sony Corporation of America, and Apple\r | |
6975 | Computers). Our file contains the American Chemical Society's on-line\r | |
6976 | journals, supplemented with the graphics from the paper publication and\r | |
6977 | the indexing of the articles from Chemical Abstracts. Documents are available\r | |
6978 | in both image and text format, and several different interfaces can be\r | |
6979 | used. Our goals are (1) to assess the effectiveness and acceptability of\r | |
6980 | electronic access to primary journals as compared with paper, and (2) to\r | |
6981 | identify the most desirable functions of the user interface to an\r | |
6982 | electronic system of journals, including in particular a comparison of\r | |
6983 | page-image display with ASCII display interfaces. Early experiments with\r | |
6984 | chemistry students on a variety of tasks suggest that searching tasks are\r | |
6985 | completed much faster with any electronic system than with paper, but\r | |
6986 | that for reading, all versions of the articles are roughly equivalent.\r | |
6987 | \r | |
6988 | Pamela ANDRE and Judith ZIDAR\r | |
6989 | \r | |
6990 | Text conversion is far more expensive and time-consuming than image\r | |
6991 | capture alone. NAL's experience with optical character recognition (OCR)\r | |
6992 | will be related and compared with the experience of having text rekeyed. \r | |
6993 | What factors affect OCR accuracy? How accurate does full text have to be\r | |
6994 | in order to be useful? How do different users react to imperfect text? \r | |
6995 | These are questions that will be explored. For many, a service bureau\r | |
6996 | may be a better solution than performing the work in-house; this will also\r | |
6997 | be discussed.\r | |
6998 | \r | |
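One common way to put a number on OCR accuracy is to compare the OCR
output with a trusted transcription and count character-level edits.
The sketch below is an illustration only and is not NAL's method. It
assumes that a hand-checked reference text exists, and the function
names edit_distance and character_accuracy are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of reference characters reproduced correctly,
    clamped at zero for very noisy output."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    reference = "optical character recognition"
    ocr_output = "optica1 character recogniti0n"
    print(f"{character_accuracy(reference, ocr_output):.3f}")
```

A figure like this does not by itself say whether text is useful. A
small share of wrong characters can still affect a much larger share of
words, which is why the question of how different users react to
imperfect text remains separate from the measurement.
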
6999 | SESSION VI\r | |
7000 | \r | |
7001 | Marybeth PETERS\r | |
7002 | \r | |
7003 | Copyright law protects creative works. Protection granted by the law to\r | |
7004 | authors and disseminators of works includes the right to do or authorize\r | |
7005 | the following: reproduce the work, prepare derivative works, distribute\r | |
7006 | the work to the public, and publicly perform or display the work. In\r | |
7007 | addition, copyright owners of sound recordings and computer programs have\r | |
7008 | the right to control rental of their works. These rights are not\r | |
7009 | unlimited; there are a number of exceptions and limitations.\r | |
7010 | \r | |
7011 | An electronic environment places strains on the copyright system. \r | |
7012 | Copyright owners want to control uses of their work and be paid for any\r | |
7013 | use; the public wants quick and easy access at little or no cost. The\r | |
7014 | marketplace is working in this area. Contracts, guidelines on electronic\r | |
7015 | use, and collective licensing are in use and being refined.\r | |
7016 | \r | |
7017 | Issues concerning the ability to change works without detection are more\r | |
7018 | difficult to deal with. Questions concerning the integrity of the work\r | |
7019 | and the status of the changed version under the copyright law are to be\r | |
7020 | addressed. These are public policy issues which require informed\r | |
7021 | dialogue.\r | |
7022 | \r | |
7023 | \r | |
7024 | *** *** *** ****** *** *** ***\r | |
7025 | \r | |
7026 | \r | |
7027 | Appendix III: DIRECTORY OF PARTICIPANTS\r | |
7028 | \r | |
7029 | \r | |
7030 | PRESENTERS:\r | |
7031 | \r | |
7032 | Pamela Q.J. Andre\r | |
7033 | Associate Director, Automation\r | |
7034 | National Agricultural Library\r | |
7035 | 10301 Baltimore Boulevard\r | |
7036 | Beltsville, MD 20705-2351\r | |
7037 | Phone: (301) 504-6813\r | |
7038 | Fax: (301) 504-7473\r | |
7039 | E-mail: INTERNET: PANDRE@ASRR.ARSUSDA.GOV\r | |
7040 | \r | |
7041 | Jean Baronas, Senior Manager\r | |
7042 | Department of Standards and Technology\r | |
7043 | Association for Information and Image Management (AIIM)\r | |
7044 | 1100 Wayne Avenue, Suite 1100\r | |
7045 | Silver Spring, MD 20910\r | |
7046 | Phone: (301) 587-8202\r | |
7047 | Fax: (301) 587-2711\r | |
7048 | \r | |
7049 | Patricia Battin, President\r | |
7050 | The Commission on Preservation and Access\r | |
7051 | 1400 16th Street, N.W.\r | |
7052 | Suite 740\r | |
7053 | Washington, DC 20036-2217\r | |
7054 | Phone: (202) 939-3400\r | |
7055 | Fax: (202) 939-3407\r | |
7056 | E-mail: CPA@GWUVM.BITNET\r | |
7057 | \r | |
7058 | Howard Besser\r | |
7059 | Centre Canadien d'Architecture\r | |
7060 | (Canadian Center for Architecture)\r | |
7061 | 1920, rue Baile\r | |
7062 | Montreal, Quebec H3H 2S6\r | |
7063 | CANADA\r | |
7064 | Phone: (514) 939-7001\r | |
7065 | Fax: (514) 939-7020\r | |
7066 | E-mail: howard@lis.pitt.edu\r | |
7067 | \r | |
7068 | Edwin B. Brownrigg, Executive Director\r | |
7069 | Memex Research Institute\r | |
7070 | 422 Bonita Avenue\r | |
7071 | Roseville, CA 95678\r | |
7072 | Phone: (916) 784-2298\r | |
7073 | Fax: (916) 786-7559\r | |
7074 | E-mail: BITNET: MEMEX@CALSTATE.2\r | |
7075 | \r | |
7076 | Eric M. Calaluca, Vice President\r | |
7077 | Chadwyck-Healey, Inc.\r | |
7078 | 1101 King Street\r | |
7079 | Alexandria, VA 22314\r | |
7080 | Phone: (800) 752-0515\r | |
7081 | Fax: (703) 683-7589\r | |
7082 | \r | |
7083 | James Daly\r | |
7084 | 4015 Deepwood Road\r | |
7085 | Baltimore, MD 21218-1404\r | |
7086 | Phone: (410) 235-0763\r | |
7087 | \r | |
7088 | Ricky Erway, Associate Coordinator\r | |
7089 | American Memory\r | |
7090 | Library of Congress\r | |
7091 | Phone: (202) 707-6233\r | |
7092 | Fax: (202) 707-3764\r | |
7093 | \r | |
7094 | Carl Fleischhauer, Coordinator\r | |
7095 | American Memory\r | |
7096 | Library of Congress\r | |
7097 | Phone: (202) 707-6233\r | |
7098 | Fax: (202) 707-3764\r | |
7099 | \r | |
7100 | Joanne Freeman\r | |
7101 | 2000 Jefferson Park Avenue, No. 7\r | |
7102 | Charlottesville, VA 22903\r | |
7103 | \r | |
7104 | Prosser Gifford\r | |
7105 | Director for Scholarly Programs\r | |
7106 | Library of Congress\r | |
7107 | Phone: (202) 707-1517\r | |
7108 | Fax: (202) 707-9898\r | |
7109 | E-mail: pgif@seq1.loc.gov\r | |
7110 | \r | |
7111 | Jacqueline Hess, Director\r | |
7112 | National Demonstration Laboratory\r | |
7113 | for Interactive Information Technologies\r | |
7114 | Library of Congress\r | |
7115 | Phone: (202) 707-4157\r | |
7116 | Fax: (202) 707-2829\r | |
7117 | \r | |
7118 | Susan Hockey, Director\r | |
7119 | Center for Electronic Texts in the Humanities (CETH)\r | |
7120 | Alexander Library\r | |
7121 | Rutgers University\r | |
7122 | 169 College Avenue\r | |
7123 | New Brunswick, NJ 08903\r | |
7124 | Phone: (908) 932-1384\r | |
7125 | Fax: (908) 932-1386\r | |
7126 | E-mail: hockey@zodiac.rutgers.edu\r | |
7127 | \r | |
7128 | William L. Hooton, Vice President\r | |
7129 | Business & Technical Development\r | |
7130 | Imaging & Information Systems Group\r | |
7131 | I-NET\r | |
7132 | 6430 Rockledge Drive, Suite 400\r | |
7133 | Bethesda, MD 20817\r | |
7134 | Phone: (301) 564-6750\r | |
7135 | Fax: (513) 564-6867\r | |
7136 | \r | |
7137 | Anne R. Kenney, Associate Director\r | |
7138 | Department of Preservation and Conservation\r | |
7139 | 701 Olin Library\r | |
7140 | Cornell University\r | |
7141 | Ithaca, NY 14853\r | |
7142 | Phone: (607) 255-6875\r | |
7143 | Fax: (607) 255-9346\r | |
7144 | E-mail: LYDY@CORNELLA.BITNET\r | |
7145 | \r | |
7146 | Ronald L. Larsen\r | |
7147 | Associate Director for Information Technology\r | |
7148 | University of Maryland at College Park\r | |
7149 | Room B0224, McKeldin Library\r | |
7150 | College Park, MD 20742-7011\r | |
7151 | Phone: (301) 405-9194\r | |
7152 | Fax: (301) 314-9865\r | |
7153 | E-mail: rlarsen@libr.umd.edu\r | |
7154 | \r | |
7155 | Maria L. Lebron, Managing Editor\r | |
7156 | The Online Journal of Current Clinical Trials\r | |
7157 | 1333 H Street, N.W.\r | |
7158 | Washington, DC 20005\r | |
7159 | Phone: (202) 326-6735\r | |
7160 | Fax: (202) 842-2868\r | |
7161 | E-mail: PUBSAAAS@GWUVM.BITNET\r | |
7162 | \r | |
7163 | Michael Lesk, Executive Director\r | |
7164 | Computer Science Research\r | |
7165 | Bell Communications Research, Inc.\r | |
7166 | Rm 2A-385\r | |
7167 | 445 South Street\r | |
7168 | Morristown, NJ 07960-1910\r | |
7169 | Phone: (201) 829-4070\r | |
7170 | Fax: (201) 829-5981\r | |
7171 | E-mail: lesk@bellcore.com (Internet) or bellcore!lesk (uucp)\r | |
7172 | \r | |
7173 | Clifford A. Lynch\r | |
7174 | Director, Library Automation\r | |
7175 | University of California,\r | |
7176 | Office of the President\r | |
7177 | 300 Lakeside Drive, 8th Floor\r | |
7178 | Oakland, CA 94612-3350\r | |
7179 | Phone: (510) 987-0522\r | |
7180 | Fax: (510) 839-3573\r | |
7181 | E-mail: calur@uccmvsa\r | |
7182 | \r | |
7183 | Avra Michelson\r | |
7184 | National Archives and Records Administration\r | |
7185 | NSZ Rm. 14N\r | |
7186 | 7th & Pennsylvania, N.W.\r | |
7187 | Washington, D.C. 20408\r | |
7188 | Phone: (202) 501-5544\r | |
7189 | Fax: (202) 501-5533\r | |
7190 | E-mail: tmi@cu.nih.gov\r | |
7191 | \r | |
7192 | Elli Mylonas, Managing Editor\r | |
7193 | Perseus Project\r | |
7194 | Department of the Classics\r | |
7195 | Harvard University\r | |
7196 | 319 Boylston Hall\r | |
7197 | Cambridge, MA 02138\r | |
7198 | Phone: (617) 495-9025, (617) 495-0456 (direct)\r | |
7199 | Fax: (617) 496-8886\r | |
7200 | E-mail: Elli@IKAROS.Harvard.EDU or elli@wjh12.harvard.edu\r | |
7201 | \r | |
7202 | David Woodley Packard\r | |
7203 | Packard Humanities Institute\r | |
7204 | 300 Second Street, Suite 201\r | |
7205 | Los Altos, CA 94002\r | |
7206 | Phone: (415) 948-0150 (PHI)\r | |
7207 | Fax: (415) 948-5793\r | |
7208 | \r | |
7209 | Lynne K. Personius, Assistant Director\r | |
7210 | Cornell Information Technologies for\r | |
7211 | Scholarly Information Sources\r | |
7212 | 502 Olin Library\r | |
7213 | Cornell University\r | |
7214 | Ithaca, NY 14853\r | |
7215 | Phone: (607) 255-3393\r | |
7216 | Fax: (607) 255-9346\r | |
7217 | E-mail: JRN@CORNELLC.BITNET\r | |
7218 | \r | |
7219 | Marybeth Peters\r | |
7220 | Policy Planning Adviser to the\r | |
7221 | Register of Copyrights\r | |
7222 | Library of Congress\r | |
7223 | Office LM 403\r | |
7224 | Phone: (202) 707-8350\r | |
7225 | Fax: (202) 707-8366\r | |
7226 | \r | |
7227 | C. Michael Sperberg-McQueen\r | |
7228 | Editor, Text Encoding Initiative\r | |
7229 | Computer Center (M/C 135)\r | |
7230 | University of Illinois at Chicago\r | |
7231 | Box 6998\r | |
7232 | Chicago, IL 60680\r | |
7233 | Phone: (312) 413-0317\r | |
7234 | Fax: (312) 996-6834\r | |
7235 | E-mail: u35395@uicvm.cc.uic.edu or u35395@uicvm.bitnet\r | |
7236 | \r | |
7237 | George R. Thoma, Chief\r | |
7238 | Communications Engineering Branch\r | |
7239 | National Library of Medicine\r | |
7240 | 8600 Rockville Pike\r | |
7241 | Bethesda, MD 20894\r | |
7242 | Phone: (301) 496-4496\r | |
7243 | Fax: (301) 402-0341\r | |
7244 | E-mail: thoma@lhc.nlm.nih.gov\r | |
7245 | \r | |
7246 | Dorothy Twohig, Editor\r | |
7247 | The Papers of George Washington\r | |
7248 | 504 Alderman Library\r | |
7249 | University of Virginia\r | |
7250 | Charlottesville, VA 22903-2498\r | |
7251 | Phone: (804) 924-0523\r | |
7252 | Fax: (804) 924-4337\r | |
7253 | \r | |
7254 | Susan H. Veccia, Team leader\r | |
7255 | American Memory, User Evaluation\r | |
7256 | Library of Congress\r | |
7257 | American Memory Evaluation Project\r | |
7258 | Phone: (202) 707-9104\r | |
7259 | Fax: (202) 707-3764\r | |
7260 | E-mail: svec@seq1.loc.gov\r | |
7261 | \r | |
7262 | Donald J. Waters, Head\r | |
7263 | Systems Office\r | |
7264 | Yale University Library\r | |
7265 | New Haven, CT 06520\r | |
7266 | Phone: (203) 432-4889\r | |
7267 | Fax: (203) 432-7231\r | |
7268 | E-mail: DWATERS@YALEVM.BITNET or DWATERS@YALEVM.YCC.YALE.EDU\r | |
7269 | \r | |
7270 | Stuart Weibel, Senior Research Scientist\r | |
7271 | OCLC\r | |
7272 | 6565 Frantz Road\r | |
7273 | Dublin, OH 43017\r | |
7274 | Phone: (614) 764-6081\r | |
7275 | Fax: (614) 764-2344\r | |
7276 | E-mail: INTERNET: Stu@rsch.oclc.org\r | |
7277 | \r | |
7278 | Robert G. Zich\r | |
7279 | Special Assistant to the Associate Librarian\r | |
7280 | for Special Projects\r | |
7281 | Library of Congress\r | |
7282 | Phone: (202) 707-6233\r | |
7283 | Fax: (202) 707-3764\r | |
7284 | E-mail: rzic@seq1.loc.gov\r | |
7285 | \r | |
7286 | Judith A. Zidar, Coordinator\r | |
7287 | National Agricultural Text Digitizing Program\r | |
7288 | Information Systems Division\r | |
7289 | National Agricultural Library\r | |
7290 | 10301 Baltimore Boulevard\r | |
7291 | Beltsville, MD 20705-2351\r | |
7292 | Phone: (301) 504-6813 or 504-5853\r | |
7293 | Fax: (301) 504-7473\r | |
7294 | E-mail: INTERNET: JZIDAR@ASRR.ARSUSDA.GOV\r | |
7295 | \r | |
7296 | \r | |
7297 | OBSERVERS:\r | |
7298 | \r | |
7299 | Helen Aguera, Program Officer\r | |
7300 | Division of Research\r | |
7301 | Room 318\r | |
7302 | National Endowment for the Humanities\r | |
7303 | 1100 Pennsylvania Avenue, N.W.\r | |
7304 | Washington, D.C. 20506\r | |
7305 | Phone: (202) 786-0358\r | |
7306 | Fax: (202) 786-0243\r | |
7307 | \r | |
7308 | M. Ellyn Blanton, Deputy Director\r | |
7309 | National Demonstration Laboratory\r | |
7310 | for Interactive Information Technologies\r | |
7311 | Library of Congress\r | |
7312 | Phone: (202) 707-4157\r | |
7313 | Fax: (202) 707-2829\r | |
7314 | \r | |
7315 | Charles M. Dollar\r | |
7316 | National Archives and Records Administration\r | |
7317 | NSZ Rm. 14N\r | |
7318 | 7th & Pennsylvania, N.W.\r | |
7319 | Washington, DC 20408\r | |
7320 | Phone: (202) 501-5532\r | |
7321 | Fax: (202) 501-5512\r | |
7322 | \r | |
7323 | Jeffrey Field, Deputy to the Director\r | |
7324 | Division of Preservation and Access\r | |
7325 | Room 802\r | |
7326 | National Endowment for the Humanities\r | |
7327 | 1100 Pennsylvania Avenue, N.W.\r | |
7328 | Washington, DC 20506\r | |
7329 | Phone: (202) 786-0570\r | |
7330 | Fax: (202) 786-0243\r | |
7331 | \r | |
7332 | Lorrin Garson\r | |
7333 | American Chemical Society\r | |
7334 | Research and Development Department\r | |
7335 | 1155 16th Street, N.W.\r | |
7336 | Washington, D.C. 20036\r | |
7337 | Phone: (202) 872-4541\r | |
7338 | E-mail: INTERNET: LRG96@ACS.ORG\r | |
7339 | \r | |
7340 | William M. Holmes, Jr.\r | |
7341 | National Archives and Records Administration\r | |
7342 | NSZ Rm. 14N\r | |
7343 | 7th & Pennsylvania, N.W.\r | |
7344 | Washington, DC 20408\r | |
7345 | Phone: (202) 501-5540\r | |
7346 | Fax: (202) 501-5512\r | |
7347 | E-mail: WHOLMES@AMERICAN.EDU\r | |
7348 | \r | |
7349 | Sperling Martin\r | |
7350 | Information Resource Management\r | |
7351 | 20030 Doolittle Street\r | |
7352 | Gaithersburg, MD 20879\r | |
7353 | Phone: (301) 924-1803\r | |
7354 | \r | |
7355 | Michael Neuman, Director\r | |
7356 | The Center for Text and Technology\r | |
7357 | Academic Computing Center\r | |
7358 | 238 Reiss Science Building\r | |
7359 | Georgetown University\r | |
7360 | Washington, DC 20057\r | |
7361 | Phone: (202) 687-6096\r | |
7362 | Fax: (202) 687-6003\r | |
7363 | E-mail: neuman@guvax.bitnet, neuman@guvax.georgetown.edu\r | |
7364 | \r | |
7365 | Barbara Paulson, Program Officer\r | |
7366 | Division of Preservation and Access\r | |
7367 | Room 802\r | |
7368 | National Endowment for the Humanities\r | |
7369 | 1100 Pennsylvania Avenue, N.W.\r | |
7370 | Washington, DC 20506\r | |
7371 | Phone: (202) 786-0577\r | |
7372 | Fax: (202) 786-0243\r | |
7373 | \r | |
7374 | Allen H. Renear\r | |
7375 | Senior Academic Planning Analyst\r | |
7376 | Brown University Computing and Information Services\r | |
7377 | 115 Waterman Street\r | |
7378 | Campus Box 1885\r | |
7379 | Providence, R.I. 02912\r | |
7380 | Phone: (401) 863-7312\r | |
7381 | Fax: (401) 863-7329\r | |
7382 | E-mail: BITNET: Allen@BROWNVM or \r | |
7383 | INTERNET: Allen@brownvm.brown.edu\r | |
7384 | \r | |
7385 | Susan M. Severtson, President\r | |
7386 | Chadwyck-Healey, Inc.\r | |
7387 | 1101 King Street\r | |
7388 | Alexandria, VA 22314\r | |
7389 | Phone: (800) 752-0515\r | |
7390 | Fax: (703) 683-7589 \r | |
7391 | \r | |
7392 | Frank Withrow\r | |
7393 | U.S. Department of Education\r | |
7394 | 555 New Jersey Avenue, N.W.\r | |
7395 | Washington, DC 20208-5644\r | |
7396 | Phone: (202) 219-2200\r | |
7397 | Fax: (202) 219-2106\r | |
7398 | \r | |
7399 | \r | |
7400 | (LC STAFF)\r | |
7401 | \r | |
7402 | Linda L. Arret\r | |
7403 | Machine-Readable Collections Reading Room LJ 132\r | |
7404 | (202) 707-1490\r | |
7405 | \r | |
7406 | John D. Byrum, Jr.\r | |
7407 | Descriptive Cataloging Division LM 540\r | |
7408 | (202) 707-5194\r | |
7409 | \r | |
7410 | Mary Jane Cavallo\r | |
7411 | Science and Technology Division LA 5210\r | |
7412 | (202) 707-1219\r | |
7413 | \r | |
7414 | Susan Thea David\r | |
7415 | Congressional Research Service LM 226\r | |
7416 | (202) 707-7169\r | |
7417 | \r | |
7418 | Robert Dierker\r | |
7419 | Senior Adviser for Multimedia Activities LM 608\r | |
7420 | (202) 707-6151\r | |
7421 | \r | |
7422 | William W. Ellis\r | |
7423 | Associate Librarian for Science and Technology LM 611\r | |
7424 | (202) 707-6928\r | |
7425 | \r | |
7426 | Ronald Gephart\r | |
7427 | Manuscript Division LM 102\r | |
7428 | (202) 707-5097\r | |
7429 | \r | |
7430 | James Graber\r | |
7431 | Information Technology Services LM G51\r | |
7432 | (202) 707-9628\r | |
7433 | \r | |
7434 | Rich Greenfield\r | |
7435 | American Memory LM 603\r | |
7436 | (202) 707-6233\r | |
7437 | \r | |
7438 | Rebecca Guenther\r | |
7439 | Network Development LM 639\r | |
7440 | (202) 707-5092\r | |
7441 | \r | |
7442 | Kenneth E. Harris\r | |
7443 | Preservation LM G21\r | |
7444 | (202) 707-5213\r | |
7445 | \r | |
7446 | Staley Hitchcock\r | |
7447 | Manuscript Division LM 102\r | |
7448 | (202) 707-5383\r | |
7449 | \r | |
7450 | Bohdan Kantor\r | |
7451 | Office of Special Projects LM 612\r | |
7452 | (202) 707-0180\r | |
7453 | \r | |
7454 | John W. Kimball, Jr.\r | |
7455 | Machine-Readable Collections Reading Room LJ 132\r | |
7456 | (202) 707-6560\r | |
7457 | \r | |
7458 | Basil Manns\r | |
7459 | Information Technology Services LM G51\r | |
7460 | (202) 707-8345\r | |
7461 | \r | |
7462 | Sally Hart McCallum\r | |
7463 | Network Development LM 639\r | |
7464 | (202) 707-6237\r | |
7465 | \r | |
7466 | Dana J. Pratt\r | |
7467 | Publishing Office LM 602\r | |
7468 | (202) 707-6027\r | |
7469 | \r | |
7470 | Jane Riefenhauser\r | |
7471 | American Memory LM 603\r | |
7472 | (202) 707-6233\r | |
7473 | \r | |
7474 | William Z. Schenck\r | |
7475 | Collections Development LM 650\r | |
7476 | (202) 707-7706\r | |
7477 | \r | |
7478 | Chandru J. Shahani\r | |
7479 | Preservation Research and Testing Office (R&T) LM G38\r | |
7480 | (202) 707-5607\r | |
7481 | \r | |
7482 | William J. Sittig\r | |
7483 | Collections Development LM 650\r | |
7484 | (202) 707-7050\r | |
7485 | \r | |
7486 | Paul Smith\r | |
7487 | Manuscript Division LM 102\r | |
7488 | (202) 707-5097\r | |
7489 | \r | |
7490 | James L. Stevens\r | |
7491 | Information Technology Services LM G51\r | |
7492 | (202) 707-9688\r | |
7493 | \r | |
7494 | Karen Stuart\r | |
7495 | Manuscript Division LM 130\r | |
7496 | (202) 707-5389\r | |
7497 | \r | |
7498 | Tamara Swora\r | |
7499 | Preservation Microfilming Office LM G05\r | |
7500 | (202) 707-6293\r | |
7501 | \r | |
7502 | Sarah Thomas\r | |
7503 | Collections Cataloging LM 642\r | |
7504 | (202) 707-5333\r | |
7505 | \r | |
7506 | \r | |
7507 | END\r | |
7508 | *************************************************************\r | |
7509 | \r | |
7510 | Note: This file has been edited for use on computer networks. This\r | |
7511 | editing required the removal of diacritics, underlining, and fonts such\r | |
7512 | as italics and bold. \r | |
7513 | \r | |
7514 | kde 11/92\r | |
7515 | \r | |
7516 | [A few of the italics (when used for emphasis) were replaced by CAPS mh]\r | |
7517 | \r | |
7518 | *End of The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS\r | |
7519 | \r |