September 17-22, 2006. Alicante, Spain
"Towards the European Digital Library"

Tutorial 3

Introduction to (Teaching / Learning about)
Digital Libraries

Presenters:

Prof. Edward A. Fox, Dept. of Computer Science, Virginia Tech, Blacksburg.


This tutorial will provide a thorough and deep introduction to the DL field. It introduces and builds upon a firm theoretical foundation (starting with "5S": Streams, Structures, Spaces, Scenarios, Societies), gives careful definitions and explanations of all the key parts of a "minimal digital library", and expands from that basis to cover key DL issues, illustrated with a well-chosen set of case studies. Attendees will receive a partial draft copy of a new book under development, with the tentative title "Foundations for Information Systems: Digital Libraries and the 5S Framework".

Goals are to:

  • aid those with CS, library, or information science backgrounds in entering the DL field
  • clarify key terms and concepts to provide a basis for understanding JCDL
  • explain how DL services fit into a simple taxonomic framework
  • enhance concern for quality in DLs by providing a contextual setting in the Information Life Cycle, and precisely specifying popular indicators
  • show those teaching a DL course how to use the forthcoming book
  • personalize the tutorial based on a list of top-priority goals from each attendee, taking advantage of having two presenters who can alternate or handle different groups


  1. Introduction

    Motivation: Why do we need DLs (Goals, objectives)? What are DLs? How do DLs work? Why do we need this book? Why 5S? History: Memex (Bush), Licklider, Web... Related Areas: LIS (Bibliometrics), probability/statistics (distribution, e.g., Zipf), linguistics, AI, databases. Knowledge management, content management, ...; Context. Running examples: Institutional repositories, Archaeological info worldwide. "Other people's" Definitions.

  2. Streams (using an Object-Oriented approach)

    Text: Character strings and coding (Unicode); Morphology -> Stemming; Syntax, semantics -> stop words; Stemming, stopping; Multilingual issues. Images: Processing and Analysis. Audio; Video. Integrating streams: Synchronization, Rendering, ...

  3. Structures
    • Digital Objects and Metadata
      DOs: Documents. Digitization, packaging, interchange, standards, format conversion, METS. Genre: Plays, encyclopedia, dictionaries, educational resources,... Structural Organization: Books, chapters, sections, excerpts; TEI.
      Metadata: Standards: MARC, DC, MPEG-7. Markup: LaTeX, SGML, XML.
    • Knowledge Structures and Representations
      Databases: ER Diagrams, Relational Schema. Object-Relational DBs. Multimedia, Temporal, Hypermedia, Link Databases, ...
      Ontology, Thesauri: RDF, OWL, Networks (Semantic, Concept Maps), Semantic Web. Examples from ETANA.
      Dictionary/Lexicon/Authority Files: WordNet, Coder. Examples from NDLTD.
      Indexes: Inverted Files. Signature Files (HyperCard). R-Trees, Quad Trees, etc.: GIS examples.
      Clusters/Classification schemes: PhysNet, ACM Categories.
    • Conversion between structures: Tradeoffs of performance between structures; Examples and exercises; XSLT
  4. Spaces

    Retrieval Models: General issues: natural vs. query languages. Boolean: Extended Boolean. Vector: LSI. Probabilistic: Classical; Belief Network, inference network; Language Models.
    User interfaces and Visualization: Taxonomy of UI components - by layout, location, shape; CitiViz

  5. Scenarios

    Information Needs/Access: Searching/Discovery (Ad-hoc, Filtering), Browsing (HT, InfoViz, Organizational scheme), Feedback, (Thin/thick client), Workflow. Scenario-Based Design. Usability: Environments for Workflow: DLITE; Tasks, claims, goals. Logging (to capture behavior/identifying sessions by transactions)

  6. Societies

    User Communities: Authors, editors, teachers. Readers, students, researchers. Accessibility, universal access, handicap
    Librarians: Reference, acquisition, operations.
    Research Community: Associations, conferences. Publications. Laboratories and projects.
    Social issues: Cooperation, collaboration: Acceptance, adoption (personal, organizational). Sharing info (annotation, ratings). Social networks. Digital divide. Cultural heritage and preservation: Museums. Internationalization.
    Economic issues: Security: Authorization, Authentication, Watermarks. Legal issues - terms and conditions: Patents, trademarks, Copyright, Intellectual Property Rights, Digital Rights Management. Publishers, Eprints, Self-Archiving, Cataloguing costs, Open Collections. Sustainability. Open source, commercial, hybrid solutions. E-commerce.

  7. Collections (also called "databases")

    Sets, Groups. Terminology. Packages, Granularity: METS.
    Collection Development policies: Coverage, breadth, Acquisition, Removal and retiring policies: Traditional vs. DLs.
    Large and Distributed Collections: Efficiency/Effectiveness. Scale: Large Objects (granularity, stream splitting, replication, compression); Intelligence/processing granularity: object, cluster, collection, repository. Parallelism and Distribution: Federation vs. Harvesting.

  8. Catalogs

    Cataloguing (as a process): Costs, Sharing, AACR2. Manual vs. (Semi-)Automatic. Distributed vs. centralized. OPACs: Worldcat (OCLC), ... Coverage, breadth. Specificity, depth. Management: versioning, works, multiple representations. Storage: Bucket model.

  9. Repositories/Archives

    Naming, Identifiers. Types: Institutional, personal, genre-specific, aggregate, ...
    Architectures, Interoperability: Federating: Selecting sites, parallel search (fall-back), fusion/merging of results; Z39.50 (CIMI), SRU/SRW, Dienst. Harvesting: Harvest (the system), OAI.
    Preservation, Archive: Replication (LOCKSS), emulation, migration, hybrid schemes: UVC (Lorie). Institutions: DLF, Library of Congress, National Archives. People: Besser, Gladney. Standards: OAIS.
    Scalability, Storage. OpenURLs (ExLibris). Institutional Repositories (in depth).

  10. Services

    Taxonomy of Services: Ontology, Composition, reuse. Creational: Crawling. Preservational. Value-Added: Indexing; Logging; Clustering; Classifying. Info Satisfaction Services: Recommending, Social networks, Portals.

  11. Systems

    Architectures: Internet middleware; P2P, Grid, Service-Oriented, Client-server, Agents, clusters (simulation - Paul's). System descriptions and comparison (Greenstone, Fedora, EPrints, DSpace, Kepler, Phronesis, DLI spin-offs, VITAL, IBM Content Management). VT: ODL and DL-in-a-Box, MARIAN, 5S Suite.

  12. DL Case Studies: NDLTD, NCSTRL, CSTC, NSDL-CITIDEL, AmericanSouth, ETANA, OCKHAM, BDBComp, Brazilian ETDs, ...
  13. Quality
  14. Integration
  15. Research Challenges
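
As a small illustration of two topics named in the outline (stopping under Streams and inverted files under Structures), the following is a minimal Python sketch, not material taken from the tutorial or the book. The stop-word list, the toy documents, and the function names `tokenize`, `build_inverted_index`, and `search_and` are all assumptions made for this example; stemming is deliberately left out to keep it short.

```python
from collections import defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_and(index, query):
    """Boolean AND: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy collection.
DOCS = {
    1: "The history of digital libraries",
    2: "Metadata standards in digital libraries",
    3: "A history of the Memex",
}
INDEX = build_inverted_index(DOCS)
```

With this toy collection, `search_and(INDEX, "digital libraries")` intersects the posting lists for the two terms. A realistic system would add stemming, ranking, and a compressed on-disk index.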
