
New Media and Transformations in Knowledge
1. Introduction
2. Processes
3. Views and Levels of Abstraction
4. Scale
5. Kinds of Reality
6. Complex Systems
7. Individuals and Particulars
8. Now and Eternity
9. Meta-data
10. Global Efforts
11. Emerging Scenarios
12. Conclusions
As media change so also do our concepts of what constitutes knowledge. This, in a sentence, is a fundamental insight that has emerged from research over the past sixty years.1 In the field of classics, Eric Havelock2 showed that introducing a written alphabet, and thus shifting from an oral towards a written tradition, was much more than bringing in a new medium for recording knowledge. When claims are oral they vary from person to person. Once claims are written down, a single version of a claim can be shared by a community and is then potentially open to public scrutiny and verification.3 The introduction of a written alphabet thus transformed the Greek concept of truth (episteme) and their concepts of knowledge itself. In the field of English Literature, Marshall McLuhan,4 influenced also by historians of technology such as Harold Innis,5 went much further to show that this applied to all major shifts in media. He drew attention, for example, to the ways in which the shift from handwritten manuscripts to printed books at the time of Gutenberg had both positive and negative consequences for our world-view.6 In addition, he explored how the introduction of radio and television further changed our definitions of knowledge. These insights he distilled in his now famous phrase: "the medium is the message."
Pioneers in technology, such as Vannevar Bush7 and Douglas Engelbart,8 and visionaries such as Ted Nelson,9 have claimed from the outset that new media such as computers and networks also have implications for our approaches to knowledge. Scholars have become increasingly interested in such claims, leading to a spectrum of conclusions. At one extreme, individuals such as Derrick de Kerckhove10 follow the technologists in assuming that the overall results will invariably be positive. This group emphasizes the potentials of collective intelligence. This view is sometimes shared by thinkers such as Pierre Lévy,11 who also warn of the dangers of a second flood, whereby we shall be overwhelmed by the new materials made available by the web.
Meanwhile, others have explored more nuanced assessments. Michael Giesecke,12 for instance, in his standard history of printing (focussed on Germany), has examined in considerable detail the epistemological implications of printing in the fifteenth and sixteenth centuries and outlined why the advent of computers invites comparison with Gutenberg's revolution in printing. Armand Mattelart,13 in his fundamental studies, has pointed out that the rise of networked computers needs to be seen as another step towards global communications. He has also shown masterfully that earlier steps in this process, such as the introduction of the telegraph, telephone, radio and television, were each accompanied by more global approaches to knowledge, particularly in the realm of the social sciences.
The present author has explored some implications of computers for museums,14 libraries,15 education16 and knowledge in general.17 In the context of museums seven elements were outlined: scale, context, variants, parallels, history, theory and practice; abstract and concrete; static and dynamic. Two basic aspects of these problems were also considered. First, computers entail much more than the introduction of yet another medium. In the past, each new innovation sought to replace former solutions: papyrus was a replacement for cuneiform tablets; manuscripts set out to replace papyrus and printing set out to replace manuscripts. Each new output form required its own new input method. Computers introduce fundamentally new dimensions in this evolution by introducing methods of translating any input method into any output method. Hence, an input in the form of an oral voice command can be output as a voice command (as in a tape recording), but can equally readily be printed, could also be rendered in manuscript form or potentially even in cuneiform. Evolution is embracing not replacing. Second, networked computers introduce a new cumulative dimension to knowledge. In the past, collections of cuneiform tablets, papyri, manuscripts and books were stored in libraries, but the amount of accessible knowledge was effectively limited to the size of the largest library. Hence knowledge was collected in many parts but remained limited by the size of its largest part. In a world of networked computing the amount of accessible knowledge is potentially equal to the sum of all its distributed parts.
In deference to the mediaeval tradition, we shall begin by expressing some doubts (dubitationes), concerning the effectiveness of present day computers. In a fascinating recent article, Classen assessed some major trends in new media18. He claimed that while technology was expanding exponentially, the usefulness19 of that technology was expanding logarithmically and that these different curves tended to balance each other out to produce a linear increase of usefulness with time. He concluded i) that society was keeping up with this exponential growth in technology, ii) that in order to have substantial improvements especially in education "fortunes have to be spent on R&D to get there," and finally iii) that "we in industrial electronics research can still continue in our work, while society eagerly adopts all our results."20
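Classen's argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that technological capability grows as a pure exponential and that usefulness is the natural logarithm of capability; the function names and the growth rate of 0.5 are hypothetical and are not taken from Classen's article.

```python
import math

def technology(t, rate=0.5):
    """Hypothetical capability of the technology at time t (exponential growth)."""
    return math.exp(rate * t)

def usefulness(capability):
    """Hypothetical usefulness of a given capability (logarithmic growth)."""
    return math.log(capability)

# Usefulness as experienced over six successive time steps.
values = [usefulness(technology(t)) for t in range(6)]

# Differences between successive steps: constant differences mean linear growth.
steps = [later - earlier for earlier, later in zip(values, values[1:])]
```

Under these assumptions every entry in `steps` equals the growth rate, which is the linear increase of usefulness with time that Classen describes. The sketch illustrates the claim only; it says nothing about whether a logarithmic law of usefulness actually holds.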
Dr. Classen's review of technological progress and trends is brilliant, and we would fully accept his second and third conclusions. In terms of his first conclusion, however, we would offer an alternative explanation. He claims that the (useful) applications of computers have not kept up with the exponential expansion of technology due to inherent limits imposed by a law of usefulness. We would suggest a simpler reason: because the technology has not yet been applied. In technical terms, engineers and scientists have focussed on layers 1 to 6 of the ISO model21 and have effectively ignored layer 7: applications.
Some examples will serve to make this point. Technologists have produced storage devices which can deal with exabytes at a time (figure 1). Yet all that is available to ordinary users is a few gigabytes at a time. If I were only interested in word processing this would be more than sufficient. As a scholar, however, I have a modest collection of 15,000 slides, 150 microfilms, a few thousand books and seven meters of photocopies. For the purposes of this discussion we shall focus only on the slides. If I wished to make my 15,000 slides available on line, even at a minimal level of 1 MB per slide, that would be 15 gigabytes. Following the standard used at the National Gallery in Washington of 30 megabytes per image, that figure would rise to 450 gigabytes. Accordingly, a colleague in Rome, who has a collection of 100,000 slides, would need either 100 gigabytes for a low resolution version or 3 terabytes for a more detailed version.
| 1000 bytes | = 1 kilobyte |
| 1000 kilobytes | = 1 megabyte |
| 1000 megabytes | = 1 gigabyte |
| 1000 gigabytes | = 1 terabyte |
| 1000 terabytes | = 1 petabyte |
| 1000 petabytes | = 1 exabyte |
Figure 1. Basic terms of size in electronic storage.
In Europe museums tend to scan at 50 MB per image, which would raise that figure to 5 terabytes, while research institutions such as the Uffizi are scanning images at 1.4 gigabytes per square meter. At this resolution 100,000 images of one square meter each would make 140 terabytes or 0.14 petabytes. There are no machines available at a reasonable price in the range of 450 gigabytes to 140 terabytes. The net result of this math exercise is thus very simple. As a user I cannot even begin to use the technology, so it might as well not exist. There is no mysterious law of usefulness holding me back, simply lack of access to the technology. If users had access to exabytes of material, then the usefulness of these storage devices would probably go up much more than logarithmically. It might well go up exponentially and open up undreamed of markets for technology.
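The storage arithmetic in this passage can be checked with a short sketch. It uses the decimal units of figure 1 and the per-image sizes quoted in the text; the helper names are hypothetical, and the Uffizi case assumes, for illustration only, that each image covers one square meter.

```python
# Decimal units, as in figure 1.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
UNITS = [("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB)]

def collection_bytes(n_images, bytes_per_image):
    """Total storage needed for a collection of equally sized images."""
    return n_images * bytes_per_image

def human(n_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    for name, size in UNITS:
        if n_bytes >= size:
            return f"{n_bytes / size:g} {name}"
    return f"{n_bytes} bytes"

# (collection size, bytes per image) for the cases discussed in the text.
cases = {
    "15,000 slides at 1 MB": (15_000, 1 * MB),
    "15,000 slides at 30 MB": (15_000, 30 * MB),
    "100,000 slides at 1 MB": (100_000, 1 * MB),
    "100,000 slides at 30 MB": (100_000, 30 * MB),
    "100,000 slides at 50 MB": (100_000, 50 * MB),
    # Assumption: one square meter per image at 1.4 GB per square meter.
    "100,000 images at 1.4 GB": (100_000, 1400 * MB),
}
totals = {label: human(collection_bytes(n, size)) for label, (n, size) in cases.items()}
```

Printing `totals` gives 15 GB, 450 GB, 100 GB, 3 TB, 5 TB and 140 TB respectively, which makes the gap between a few gigabytes on the desktop and a working scholarly image collection easy to see.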
Two more considerations will suffice for this brief excursus on usefulness. Faced with the limitations of storage space at present, I am forced as a user to employ a number of technologies: microfilm readers, slide projectors, video players (sometimes in NTSC, sometimes in PAL), televisions, telephones, and the usual new technologies of fax machines, computers and the Internet. All the equipment exists. It is almost impossible to find all of it working together in the same place, and even where one does, it is well nigh impossible to translate materials available in one medium into another. We are told of course that many committees around the world are busily working on the standards (e.g. JPEG, JHEG, MPEG) to make such translations among media simple and nearly automatic. In the meantime, however, all the hype in the world about interoperability does not help me one iota in my everyday efforts as a scholar and teacher. The net result again is that many of these fancy devices are almost completely useless, because they do not address my needs. The non-compatibility of an American, a European and a Japanese device may serve someone's notion of positioning their country's technology, but it does not help users at all. Hence most of us end up not buying the latest device. And once again, if we knew that these devices met our needs, their usefulness and their use might well rise exponentially.
Finally, it is worthwhile to consider the example of bandwidth. Technologists have recently demonstrated the first transmission at a rate of a terabit per second. A few weeks ago at the Internet Summit a very senior figure working with the U.S. military reported that they are presently working with 20-40 gigabits a second, and that they are confident they can reach terabit speeds for daily operations within two years. Meanwhile, attempts by G7 pilot project five to develop demonstration centres to make the best products of cultural heritage accessible on an ATM network (a mere 622 Mbit/second) have been unsuccessful. A small number of persons in large cities now have access to ADSL (1.5 Mbit/second) while others have access to cable modems (0.5 Mbit/second). Even optimistic salesmen specializing in hype are not talking about having access to ATM speeds directly into the home anywhere in the foreseeable future. Hence, most persons are limited to connectivity rates of 0.028 or 0.056 Mbit/second (in theory, while the throughput is usually much lower still), which is a very long way from the 1,000,000 Mbit/second (i.e. a terabit per second) that is technically possible today.
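To give these rates a concrete meaning, the following sketch estimates how long a single 30 megabyte image would take to transfer over each kind of connection. It reads the quoted rates as megabits per second, which is an assumption, and it ignores protocol overhead and real-world throughput losses; the names used are hypothetical.

```python
# Link rates in megabits per second (assumed reading of the figures in the text).
LINK_RATES_MBIT = {
    "modem 28.8k": 0.028,
    "modem 56k": 0.056,
    "cable modem": 0.5,
    "ADSL": 1.5,
    "ATM": 622.0,
    "terabit link": 1_000_000.0,
}

def transfer_seconds(size_bytes, rate_mbit_per_s):
    """Ideal transfer time: bits to send divided by bits per second."""
    bits = size_bytes * 8
    return bits / (rate_mbit_per_s * 1_000_000)

IMAGE_BYTES = 30 * 10**6  # one image at 30 megabytes

times = {name: transfer_seconds(IMAGE_BYTES, rate) for name, rate in LINK_RATES_MBIT.items()}
```

On these idealized figures the image needs well over an hour on a 56k modem, 160 seconds on ADSL and under half a second on ATM, which is the practical distance between what is technically possible and what users actually have.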
With bandwidth as with so many other aspects of technology, the simple reality is that use in real applications by actual users has not nearly kept pace with developments in technology. If no one has access to and chances to use the technology, if there are no examples to demonstrate what the technology can do, then it is hardly surprising that the so-called usefulness of the technology lags behind. We would conclude therefore that there is no need to assert logarithmic laws of usefulness. If technology is truly made available, its use will explode. The Internet is a superb example. The basic technology was there in the 1960s. It was used for over two decades by a very select group. Since the advent of the World Wide Web, when it was made available to users in general, it has expanded much more each year than it did in the first twenty years of its existence.
So what would happen if all the technological advances in storage capacity, processing power, bandwidth were available for use with complete interoperability? What would change? There would be major developments in over thirty application fields (Appendix 1). Rather than attempt to examine these systematically, however, this paper will focus instead on some of the larger trends implicit in these changes. I shall assert that computers are much more than containers for recording knowledge, which can then be searched more systematically. They introduce at least seven innovations, which are transforming our concepts of knowledge. First, they offer new methods for looking at processes, how things are done, which also helps in understanding why they are done in such ways. Second, and more fundamentally, they offer tools for creating numerous views of the same facts, methods for studying knowledge at different levels of abstraction. Connected with this is a third innovation: they allow us to examine the same object or process in terms of different kinds of reality. Fourth, computers introduce more systematic means of dealing with scale, which some would associate with the field of complex systems. Fifth, they imply a fundamental shift in our methods for dealing with age-old problems of relating universals and particulars. Sixth, they transform our potential access to data through the use of meta-data. Seventh and finally, computers introduce new methods for mediated learning and knowledge through agents. This paper explores both the positive consequences of these innovations and examines some of the myriad challenges and dangers posed thereby.
2. Processes
Media also affect the kinds of questions one asks and the kinds of answers one gives to them. The oral culture of the Greeks favoured the use of What? and Why? questions. The advent of printing in the Renaissance saw the rise of How? questions. As storage devices, computers are most obviously suited to answering questions concerning biography (Who?), subjects (What?), places (Where?) and chronology (When?). But they are also transforming our understanding of processes (How?) and hence our comprehension of relations between theory and practice. In the past decades there has, for instance, been a great rise in workflow software, which attempts to break down all the tasks into clearly defined steps and thus to rationalize the steps required for the completion of a task. This atomization of tasks was time consuming, expensive and not infrequently very artificial in that it often presented isolated steps without due regard to context.
Companies such as Boeing have introduced augmented reality techniques to help understand repair processes. A worker fixing a jet engine sees, superimposed on a section of the engine, the steps required to repair it. Companies such as Lockheed are going further: reconstructing an entire workspace of a ship's deck and using avatars to explain the operating procedures. This contextualization in virtual space allows users to follow all the steps in the work process.22
More recently companies such as Xerox23 have very consciously developed related strategies whereby they study what is done in a firm in order to understand what can be done. In the case of Her Majesty's Stationery Office, for example, they used VRML models to reconstruct all the workspaces and trace the activities on the work-floor. As a result one can examine a given activity from a variety of different viewpoints: a manager, a regular employee or an apprentice. One can also relate activities at one site with those at a number of other sites in order to reach a more global view of a firm's activities. Simulation of precisely how things are done provides insights into why they are done in that way.
In the eighteenth century Diderot and D'Alembert attempted to record all the professions in their vast encyclopaedia. This monumental effort was mainly limited to lists of what was used with very brief descriptions of the processes. The new computer technologies introduce the possibility of a new kind of encyclopaedia, which would not only record how things were done, but could also show how different cultures perform the same tasks in slightly or even quite different ways. Hence, one could show, for instance, how a Japanese engineer's approach is different from that of a German or an American engineer. Instead of just speaking about quality one could actually demonstrate how it is carried out.
Computers were initially static objects in isolation. The rise of networks connected these isolated terminals into a World Wide Web. More recently there have been trends towards mobile or nomadic computing. The old notion of computers as large, bulky objects dominating our desks is being replaced by a whole range of new devices: laptop computers, palmtop and even wearable computers.24 This is leading to a new vision called ubiquitous computing, whereby any object can effectively be linked to the network. In the past each computer required its own Internet Protocol (IP) address. In future, we are told, this could be extended to all the devices that surround us: persons, offices, cars, trains, planes, telephones, refrigerators and even light bulbs.
Assuming that a person wishes to be reached, the network will be able to determine whether they are at home, in their office, or elsewhere and route the call accordingly. If the person is in a meeting the system will be able to adjust its signal from an obtrusive ring to a simple written message on one's portable screen, with an option to have a flashing light in urgent cases. More elaborate scenarios will automatically adjust room temperatures, lighting and other features of the environment to the personal preferences of the individual. Taken to its logical conclusions this has considerable social consequences,25 for it means that traditionally passive environments will be reactive to users' needs and tastes, removing numerous menial tasks from everyday life and thus leaving individuals with more time and energy for intellectual pursuits or pure diversion.
At the international level one of the working groups of the International Standards Organization (ISO/IEC JTC1/WG4) is devoted to Document Description and Processing Languages, SGML Standards Committee. At the level of G8, a consortium spearheaded by Siemens is working on a Global Engineering Network (GEN).26 Autodesk is leading a consortium of companies to produce Industry Foundation Classes, which will effectively integrate standards for building parts such as doors and windows. In future, when someone wishes to add a window into a design for a skyscraper, the system will "know" what kind of window is required.
The Solution Exchange Standard Consortium (SEL) consists of 60 hardware, software, and commercial companies, which are working to create an industry specific SGML markup language for technical support information among vendors, system integration and corporate helpdesks. Meanwhile, the Pinnacles Group, a consortium which includes Intel, National Semiconductor, Philips, Texas Instruments and Hitachi, is creating an industry specific SGML markup language for semiconductors. In the United States, as part of the National Information Infrastructure (NII)27 for Industry with Special Attention to Manufacturing, there is a Multidisciplinary Analysis and Design Industrial Consortium (MADIC), which includes NASA, Georgia Tech, Rice and NPAC and is working on an Affordable Systems Optimization Process (ASOP). Meanwhile, companies such as General Electric are developing a Manufacturing Technology Library, with a Computer Aids to Manufacturing Network (ARPA/CAMnet).28 ESI Technologies is developing Enterprise Management Information Systems (EMIS).29 In the automotive industry the recent merger of Daimler-Benz and Chrysler points to a new globalization. A new Automotive Network eXchange (ANX)30 means that even competitors are sharing ideas, a process which will, no doubt, be speeded by the newly announced automotive consortium at MIT.31
3. Views and Levels of Abstraction
One of the fundamental changes brought about by computers is increasingly to separate our basic knowledge from views of that knowledge. In the case of earlier media such as cuneiform, manuscripts and books, content was irrevocably linked with a given form. Changing the form or layout required producing a new edition of the content. In electronic media this link between form and content no longer holds. Databases, for instance, separate the content of fields from views of that content. Once the content has been input, it can be queried and displayed in many ways without altering the content each time. This same principle applies to Markup Languages for use on the Internet. Hence, in the case of Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML), the rules for content and rules for display are separate. Similarly in the case of programming, the use of meta-object protocols is leading to a new kind of open implementation whereby software defined aspects are separated from user defined aspects (figure 2). An emerging vision of network computers, foresees a day when all software will be available on line, and users will need only to state their goals to find themselves with the personally adapted tools. There are new trends towards reusable code.32
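The separation of content from its views can be illustrated with a minimal sketch. The record below is hypothetical, as are the two rendering functions; it uses a small XML fragment to stand for the content and shows that two different displays can be produced without the content itself being altered.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: the record says what the data are, not how to display them.
RECORD = """<book>
  <author>Hypothetical Author</author>
  <title>Hypothetical Title</title>
  <year>1998</year>
</book>"""

def as_citation(book):
    """View 1: a one-line citation."""
    return f"{book.findtext('author')}, {book.findtext('title')} ({book.findtext('year')})"

def as_catalogue_card(book):
    """View 2: a labelled card with one field per line."""
    return "\n".join(f"{child.tag.upper()}: {child.text}" for child in book)

book = ET.fromstring(RECORD)
citation = as_citation(book)
card = as_catalogue_card(book)
```

Here `citation` reads "Hypothetical Author, Hypothetical Title (1998)", while `card` lists the same three fields on separate labelled lines. Adding a third view would require a new function but no new edition of the content.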
| Software Defined | User Defined |
| Base Program | Meta Program |
| Base Interface | Meta Interface |
Figure 2. Separation of basic software from user defined modalities through meta-object protocols in programming.
Related to the development of these different views of reality, is the advent of spreadsheets and data-mining techniques, whereby one can look at the basic facts in a database from a series of views at different levels of abstraction. Once a bibliography exists as a database, it is easy to produce graphs relating publications to time, by subject, by city, country or by continent. In the past any one of these tasks would have comprised a separate study. Now they are merely a different "view."
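As a sketch of what such views amount to in practice, the following example counts the entries of a small bibliography by year, subject, city, country and continent. The four records are invented placeholders, and the field names and the `view` helper are hypothetical.

```python
from collections import Counter

# Hypothetical bibliography records; a real database would hold thousands.
BIBLIOGRAPHY = [
    {"title": "Work A", "year": 1991, "subject": "perspective",
     "city": "Toronto", "country": "Canada", "continent": "North America"},
    {"title": "Work B", "year": 1994, "subject": "museums",
     "city": "Toronto", "country": "Canada", "continent": "North America"},
    {"title": "Work C", "year": 1994, "subject": "perspective",
     "city": "Rome", "country": "Italy", "continent": "Europe"},
    {"title": "Work D", "year": 1997, "subject": "libraries",
     "city": "Florence", "country": "Italy", "continent": "Europe"},
]

def view(records, field):
    """Count the records by one field: each field gives a different view."""
    return Counter(record[field] for record in records)

by_year = view(BIBLIOGRAPHY, "year")
by_subject = view(BIBLIOGRAPHY, "subject")
# The same facts at three levels of geographical abstraction.
by_city = view(BIBLIOGRAPHY, "city")
by_country = view(BIBLIOGRAPHY, "country")
by_continent = view(BIBLIOGRAPHY, "continent")
```

Each result is one line of code applied to the same unchanged records, which is the sense in which a task that once comprised a separate study becomes merely a different view.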
4. Scale
These developments in views and different levels of abstraction are also transforming notions of scale. Traditionally every scale required a separate study and even a generation ago posed serious methodological problems.33 The introduction of pyramidal tiling34 means that one can now move seamlessly from a satellite image of the earth (at a scale of 1:10,000,000) to a life-size view of an object and then through a spectrum of microscopic ranges. These innovations are as interesting for the reconstruction of real environments such as shopping malls and tourist sites as they are for the creation of virtual spaces such as Alpha-World.35 Conceptually it means that many more things can be related. Systematic scales are a powerful tool for contextualization of objects.
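A minimal sketch of the idea behind pyramidal tiling follows. It assumes a common arrangement in which each level of the pyramid halves the resolution of the level below it and every level is cut into square tiles; the tile size of 256 pixels and the function names are illustrative assumptions, not details of any particular system cited here.

```python
import math

def pyramid_levels(width, height, tile=256):
    """List (width, height, tiles_across, tiles_down) from full resolution
    down to the first level that fits inside a single tile."""
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        # Each coarser level halves the resolution in both directions.
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return levels

def doublings_between_scales(coarse_denominator, fine_denominator=1):
    """Number of factor-of-two zoom steps needed to go from a scale of
    1:coarse_denominator to a scale of 1:fine_denominator."""
    return math.ceil(math.log2(coarse_denominator / fine_denominator))

levels = pyramid_levels(4096, 2048)
total_tiles = sum(across * down for _, _, across, down in levels)
zoom_steps = doublings_between_scales(10_000_000)
```

For a 4096 by 2048 pixel image this yields five levels and 171 tiles in all, only a few of which need to be fetched for any one view. On the same assumption, moving from a scale of 1:10,000,000 to life size takes 24 doublings.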
These innovations in co-ordinating different scales are particularly evident in fields such as medicine. In Antiquity, Galen's description of medicine was limited mainly to words. These verbal descriptions of organs were in general terms such that there was no clear distinction between a generic description of a heart and the unique characteristics of an individual heart. Indeed the approach was so generic that the organs of other animals such as a cow were believed to be interchangeable with those of an individual.
During the Renaissance, Leonardo added illustrations as part of his descriptive method. Adding visual images to the repertoire of description meant that one could show the same organ from a number of different viewpoints and potentially show the difference between a typical sample and an individual one. However, the limitations of printing at the time made infeasible any attempts to record all the complexities of individual differences.
Today, medicine is evolving on at least five different levels. The GALEN project is analysing the basic anatomical parts (heart, lung, liver etc.) and systematically studying their functions and inter-relationships at a conceptual level. The Visible Human project is photographing the entire human body in terms of thin slices, which are being used to create Computer Aided Design (CAD) drawings at new levels of realism. In Germany, the Medically Augmented Immersive Environment (MAIE), developed by the Gesellschaft für Mathematik und Datenverarbeitung (GMD) and three Berlin hospitals, dedicated to radiology (Virchow), pathology (Charité) and surgery (RRK) respectively, is developing models for showing structural relations among body parts in real time. This system includes haptic simulation based on reconstructed tomographic scans. Other projects are examining the human body at the molecular and atomic level (figure 3). At present these projects are evolving in tandem without explicit attempts to co-relate them. A next step will lie in integrating all this material such that one can move at will from a macroscopic view of the body to a study of any microscopic part at any desired scale.
| Conceptual | GALEN36 |
| Physical | Visible Human37 |
| Structural | OP 2000 Medically Augmented Immersive Environment (MAIE)38 |
| Molecular | Bio-Chemical |
| Atomic | Human Genome39 |
Figure 3. Different levels of scale in the study of contemporary medicine.
In the past, anatomical textbooks typically provided doctors with a general model of the body and idealized views of the various organs. The Visible Human project is providing very detailed information concerning individuals (three to date), which can then serve as the basis for a new level of realism in making models. These models can then be confronted with x-rays, ultra-sound and other medical imaging techniques, which record the particular characteristics of individual patients. Potentially this could lead to a systematic linking of our general knowledge about universals and our specialized knowledge about particulars (see section 7 below).
A somewhat different approach is being taken in the case of the human genome project. Individual examples are studied and on the basis of these a "typical model" is produced, which is then used as a set of reference points in examining other individual examples. Those deviating from this typical model by a considerable amount are deemed defective or aberrant, requiring modification and improvement. A danger in this approach is that if the parameters of the normal are too narrowly defined, it could lead to a new version of eugenics, seriously decreasing the bio-diversity of the human race.40 If we are not careful we shall succumb to believing that complexity can be resolved through the regularities of universal generalizations rather than in the enormously varying details of individuals. Needed is a more inductive approach, whereby our models are built up from the evidence of all the variations.
5. Kinds of Reality
Another important way in which computers are changing our approach to knowledge relates to new combinations of reality. In the 1960s the earliest attempts at virtual reality created a) digital copies of physical spaces, b) simplified digital subsets of a more complex physical world or c) digital visualizations of imaginary spaces. These alternatives tended to compete with one another. In the latter 90s there has been a new trend to integrate different versions of reality to produce both augmented reality and augmented virtuality. As a result one can, for instance, begin with the walls of a room, superimposed on which are the positions of electrical wires, pipes and other fixtures.
Such combinations have enormous implications for training and repair work of all kinds. Recently, for instance, a Harvard medical doctor superimposed an image of an internal tumour onto the head of a patient and used this as an orientation method for the operation. (This method is strikingly similar to the supposedly science fiction operation on the protagonist's daughter in the movie Lost in Space.) As noted elsewhere, this basic method of superimposition can also be very fruitful in dealing with alternative reconstructions of an ancient ruin or different interpretations of a painting's spatial layout. Other alternatives include augmented virtuality, in which a virtual image is augmented, and double augmented reality, in which a real object such as a refrigerator has superimposed on it a virtual list which is then imbued with further functions41 (cf. figure 4).
| Reality | Nature, Man Made World |
| Virtual Reality | Sutherland, Furness |
| Augmented Reality | Feiner, Stricker |
| Augmented Virtuality | Gelernter, Ishii |
| Double Augmented Reality42 | Mankoff43 |
Figure 4. Basic classes of simulated reality and their proponents.
Other techniques are also contributing to this increasing interplay between reality and various constructed forms thereof. In the past, for instance, Computer Aided Design (CAD) and video were fundamentally separate media. Recently Bell Labs have introduced the principle of pop-up video, which permits one to move seamlessly between a three-dimensional CAD version of a scene and the two-dimensional video recording of an identical or at least equivalent scene.44 Meanwhile, films such as Forrest Gump integrate segments of "real" historical video seamlessly within a purely fictional story. This has led some sceptics to speak of the death of photographic veracity,45 which may well prove to be an overreaction. Major bodies such as the Joint Photographic Experts Group (JPEG) are working on a whole new framework for deciding the veracity of images, which will help to resolve many of these fears.
On the positive side, these developments in interplay among different kinds of reality introduce immense possibilities for the re-contextualization of knowledge. As noted earlier, while viewing images of a museum one will be able to move seamlessly to CAD reconstructions of the rooms and to videos explaining particular details. One will be able to move from a digital photograph of a painting, through images of various layers of the painting to CAD reconstructions of the painted space as well as x-rays and electron-microscope images of its micro-structures. One will be able to study parallels, and many aspects of the history of the painting. A new integration of static and dynamic records will emerge.
6. Complex Systems
The systematic mastery of scale in the past decades has lent enormous power to the zoom metaphor, to such an extent that one could speak of Hollywoodization in a new sense. Reality is seen as a film. The amount of detail, the granularity, depends on one's scale. As one moves further away one sees larger patterns; as one comes closer one notices new details. Proponents of complex systems such as Yaneer Bar-Yam46 believe that this zoom metaphor can serve as a tool for explaining nearly all problems as one moves from atomic to molecular, cellular, human and societal levels. Precisely how one moves from physical to conceptual levels is, however, not explained in this approach.
Complex systems entail an interdisciplinary approach to knowledge, which builds on work in artificial neural networks to explain why the whole is more than the sum of its parts. The director of the New England Complex Systems Institute (NECSI) believes that this approach can explain human civilization:
One system particularly important for the field of complex systems is human civilization, the history of social and economic structures and the emergence of an interconnected global civilization. Applying principles of complex systems to enable us to gain an understanding of its past and future course is ultimately an important objective of this field. We can anticipate a time when the implications of economic and social developments for human beings and civilization will become an important application of the study of complex systems.47
Underlying this approach is an assumption that the history of civilization can effectively be reduced to a history of different control systems, all of them hierarchically structured. This may well provide a key to understanding the history of many military, political and business structures, but can hardly account for the most important cultural expressions. If anything the reverse could well be argued. Greece was more creative than other cultures at the time because it imposed less hierarchical structures on artists. Totalitarian regimes, by contrast, typically tolerate considerably less creativity, because most of these expressions are invariably seen as beyond the parameters of their narrow norms. Hence, complex systems, with their intriguing concepts of emergence, may well offer new insights into the history of governments, corporations, and other bureaucracies. They do not address a fundamental aspect of creativity, which has to do with the emergence of new individuals and particulars, non-controlled elements of freedom, rather than products of a rule-based system.
7. Individuals and Particulars
As was already suggested above, one of the central questions is how we define knowledge. Does knowledge lie in the details of particulars or in the universals based on those details? The debate is as old as knowledge itself. In Antiquity, Plato argued for universals; Aristotle insisted on particulars. In the Middle Ages, the debate continued, mainly in the context of logic and philosophy. While this debate often seemed as if it were a question of either/or, the rise of early modern science made it clear that the situation is more complex. One needs particular facts. But in isolation these are merely raw data. Lists of information are one step better. Yet scientific knowledge is concerned with laws, which are effectively summaries of those facts. So one needs both: the particulars as a starting point, and the more generalized universals derived from them, which can then explain the particulars in question.
Each change in media has affected this changing relationship between particulars and universals. In pre-literate societies, the central memory unit was limited to the brain of an individual and oral communication was limited to the speed with which one individual could speak to another. The introduction of various written media such as cuneiform, parchment, and manuscripts meant that lists of observations were increasingly accessible. Printing helped to standardize this process and introduced the possibility of much more systematic lists. The number of particular observations on which universal claims and laws could be established thus grew accordingly. While there were clearly other factors such as the increased accuracy of instruments, printing made Tycho Brahe's observations more accessible than those made at the court of Alphonse the Wise and played its role in making Kepler's new planetary laws more inclusive and universal.
The existence of regular printed tables greatly increased the scope of materials that could readily be consulted. It still depended entirely on the memory and integrating power of the individual human brain to recognize patterns in the data and to reach new levels of synthesis. Once these tables are available on networked computers, the memory capacities are expanded to the size of the computer. The computer can also be programmed to search both for consistencies and anomalies. So a number of the pattern discoveries that once depended solely on human perception can now be automated, and the human dimension can be focussed on discerning particularly subtle patterns and raising further questions.
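By way of illustration only, the kind of automated search for anomalies described here might be sketched as follows. The function name, the threshold, and the sample values are assumptions introduced for this example; they are not drawn from any historical table, and the median-based test is merely one of many possible techniques.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices of values that deviate strongly from the rest.

    Uses the modified z-score (based on the median absolute deviation),
    a common robust outlier test. The threshold of 3.5 is a conventional
    default, assumed here for illustration.
    """
    centre = median(values)
    deviations = [abs(v - centre) for v in values]
    mad = median(deviations)
    if mad == 0:
        # Most values are identical: anything that differs is anomalous.
        return [i for i, d in enumerate(deviations) if d > 0]
    return [i for i, d in enumerate(deviations)
            if 0.6745 * d / mad > threshold]


# Hypothetical series of repeated measurements (invented values).
observations = [10.1, 10.3, 9.9, 10.2, 14.8, 10.0, 10.1]
suspect_indices = find_anomalies(observations)
```

In this sketch the machine only points to the entries that break the pattern; deciding whether such an entry is a copying error or a genuine discovery remains a question for the scholar.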
In the context of universities, the arts and sciences have traditionally been part of a single faculty. This has led quite naturally to many comparisons between the arts and the sciences, and even references to the art of science or the science of art in order to emphasize their interdependence. It is important to remember, however, that art and science differ fundamentally in terms of their approach to universals and particulars. Scientists gather and study particulars in order to discern some underlying universal and eternal pattern. Artists gather and study examples in order to create a particular object, which is unique, although it may be universal in its appeal. Scientists are forever revising their model of the universe. Each new discovery leads them to discard some piece or even large sections of their previous attempt. Notwithstanding Newton's phrase that he was standing on the shoulders of giants, science is ultimately not cumulative in the sense of keeping everything of value from an earlier age. Computers, which are only concerned with showing us the latest version of our text or programme, are a direct reflection of this scientific tradition.48
In this sense, art and culture are fundamentally different in their premises. Precisely because they emphasize the uniqueness of each object, each new discovery poses no threat to the value of what came before. Most would agree, for instance, that the Greeks introduced elements not present in Egyptian sculpture, just as Bernini introduced elements not present in Michelangelo, and he, in turn, introduced elements not present in the work of Donatello. Yet it would be simplistic to deduce from this that Bernini is better than Michelangelo or he in turn better than Donatello. If later were always better it would be sufficient to know the latest artist's work in the way that scientists feel they only need to know the latest findings of science. The person who knows about the Egyptians, Greeks, Donatello, Michelangelo and Bernini is much richer than one who knows only the latest phase. Art and culture are cumulative. The greatest scientist succeeds in reducing the enormity of particular instances to the fewest number of laws which to the best of their knowledge are unchanging. The most cultured individual succeeds in bringing to light the greatest number of unique examples of expression as proof of the creative richness of the human condition. These differing goals of art and science pose their own challenges for our changing understanding of knowledge.49
Before the advent of printing, an enterprising traveller might have recorded their impressions of a painting, sculpture or other work of art which they encountered in the form of a verbal description or at best with a fleeting sketch. In very rare cases they might have attempted a copy. The first centuries after Gutenberg saw no fundamental changes to this procedure. In the nineteenth century, lithographs of art gradually became popular. In the late nineteenth century, black and white photographs made their debut.50 In the latter part of the twentieth century colour images gradually became popular.
Even so it is striking to what extent the horizons of those writing on the history of their subject remained limited to the city where they happened to be living. It has often been noted, for example, that Vasari's Lives of the Artists focussed much more on Florence than on other Italian cities such as Rome, Bologna, Milan or Urbino. At the turn of the century, art historians writing in Vienna tended to cite examples found in the Kunsthistorisches Museum, just as others since living in Paris, London or New York have tended to focus on the great museum that was nearest to home. The limitations of printing images meant that they could give only a few key masterpieces by way of example. From all this emerged a number of fascinating glimpses into the history of art, which were effectively summaries of the dominant taste in the main halls of the great galleries. They did not reflect the up to 95% of collections that is typically in storage. Nor did they provide a serious glimpse of art outside the major centres.
A generation ago scholars such as Chastel51 pointed to the importance of studying the smaller cities and towns in the periphery of such great cities: to look not only at Milan but also at Pavia, Crema, Cremona, Brescia and Bergamo. Even so, in the case of Italy, for instance, our picture is still influenced by Vasari's emphases from over four centuries ago. Everyone knows Florence and Rome. But who is aware of the frescoes at Bominaco or Subiaco, of the monasteries at Grottaferrata and Padula, or the architecture of Gerace, Urbania or Asolo? The art in these smaller centres does not replace, nor does it even pretend to compete with, the greatest masterpieces, which have usually made their way to the world's chief galleries. What it does, however, is to provide us with a much richer and more complex picture of the variations in expression on a given theme. In the case of Piero della Francesca, for example, who was active for much of his life in San Sepolcro, Arezzo and Urbino, we discover that his masterpieces actually originated in smaller centres although they are now associated with great cities (London, Paris, Florence). In other cases we discover that the smaller centres do not simply copy the great masterpieces. They adapt familiar themes and subjects to their own tastes. The narrative sequences at San Gimignano, Montefalco and Atri add dimensions not found even in Florence or Rome.
To be sure, some of this richness has been conveyed by the medium of printing, through local guidebooks and tourist brochures. However, in these the works of art are typically shown in isolation without any reference to more famous parallels in the centres. Computers will fundamentally change our approach to this tradition. First they will make all these disparate materials accessible. Hence a search for themes such as Virgin and Child will not only bring up the usual examples by Botticelli or Raphael but also those in museums such as L'Aquila, Padua, and Volterra (each of which was a centre in a previous era). Databases will allow us to study developments in terms of chronology as well as by region and by individual artist. Filtering techniques will allow us to study the interplay of centre and periphery in new ways.
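A minimal sketch of such a search, under stated assumptions, might look as follows. The record structure, the field names, and above all the artists and dates are hypothetical placeholders invented for illustration; only the theme and the place names are taken from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    theme: str
    artist: str            # placeholder names, not real attributions
    place: str
    year: int              # invented dates, for ordering only
    in_major_centre: bool


# Hypothetical catalogue entries (all attributions and years are made up).
CATALOGUE = [
    Work("Virgin and Child", "Artist A", "Florence", 1480, True),
    Work("Virgin and Child", "Artist B", "Volterra", 1465, False),
    Work("Virgin and Child", "Artist C", "L'Aquila", 1490, False),
    Work("Annunciation", "Artist A", "Florence", 1475, True),
]


def search(catalogue, theme, periphery_only=False):
    """Return works on a theme in chronological order.

    With periphery_only=True, works held in major centres are filtered
    out, so that the smaller centres can be studied on their own.
    """
    hits = [w for w in catalogue if w.theme == theme]
    if periphery_only:
        hits = [w for w in hits if not w.in_major_centre]
    return sorted(hits, key=lambda w: w.year)


all_places = [w.place for w in search(CATALOGUE, "Virgin and Child")]
periphery = [w.place for w in search(CATALOGUE, "Virgin and Child",
                                     periphery_only=True)]
```

The same records could equally be grouped by region or by artist; the point of the sketch is only that centre and periphery appear side by side in a single result rather than in separate guidebooks.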
More importantly, we shall be able to trace much more fully the cumulative dimensions of culture, retaining the uniqueness of each particular object. In the past, each of the earlier media precluded serious reproductions of the original objects. As noted above, colour printing has only been introduced gradually over the past half-century. Even then, a single colour image of a temple or church can hardly do justice to all its complexities. The advent of virtual and augmented reality, and the possibility of stereo-lithographic printing, mean that a whole new set of tools for understanding culture is emerging. They will not replace the value and sometimes the absolute necessity of studying some of the originals in situ, but if we always had to visit everything that we wished to study in its original place, the scope of our study would be very limited indeed.
Earlier media typically meant that one emphasized one example, often forgetting that it represented a much larger phenomenon. The Coliseum in Rome is an excellent case in point. History books typically focus on this amphitheatre and tell us nothing of the great number of amphitheatres spread throughout the Roman empire. Networked computers can make us aware of all known examples from Arles and Nimes in France to El-Djem in Tunisia and Pula in Croatia. This new encyclopaedic approach means that we shall have a much better understanding of how a given structure spreads throughout a culture to form a significant element in our cultural heritage such as the Greek temple, the Romanesque and Gothic Church, and the Renaissance villa. It means that we shall also have a new repertoire of examples for showing that, even as these styles spread, each new execution of the principle introduces local uniqueness. Hence the cathedrals at St. Denis, Chartres, Notre Dame, Cologne, Magdeburg, Bamberg, Ulm and Burgos are all Gothic, and yet none is a simple copy of the other.
A generation ago when Marshall McLuhan coined the phrase "the global village", some assumed that the new technologies would invariably take us in the direction of a world where every place was more or less the same: where Hiltons and McDonalds would spread throughout an increasingly homogenized planet. This danger is very real. But as critical observers such as Barber have noted,52 the new technologies have been accompanied by a parallel trend in the direction of regionalism and new local awareness. The same technologies that are posing the possibility of global corporations are introducing tremendous new efforts in the realms of citizen participation groups and of local democracy. Networked computers may link us together with persons all over the world as if we were in a global village but this does not necessarily mean that every village has to look the same. Indeed, the more the mass-media try to convince us that we are all inhabitants of a single interdependent ecosystem, the more individuals are likely to articulate how and even why their particular village is different from others. In this context, the new access to individuals and particulars introduced by networked computers becomes much more than an interesting technological advance. It provides a key to maintaining the cultural equivalent of bio-diversity, which is essential for our well-being and development in the long run.
In themselves the particulars are, of course, only lists and as such merely represent data or, at best, information. Hence they should be seen as starting points rather than as results per se. Their vital importance lies in vastly increasing the sample, the available sources upon which we attempt to draw conclusions. The person who has access to only one book in art history will necessarily have a much narrower view than someone who is able to work with the resources of a Vatican or a British Library. In the past, scholars have often spent much more time searching for a document than actually reading it. In future, computers will greatly lighten the burden of finding. Hence, scholarship will focus increasingly on determining the veracity of sources, weighing their significance, interpreting and contextualizing sources, and learning to abstract some larger patterns of understanding from the myriad details which they offer. Access to new amounts of particulars will lead to a whole new series of universal abstractions.
Implicit in the above discussion are larger issues of knowledge organization that go far beyond the scope of this paper. We noted that while the arts and science typically share the same faculty and are in many ways interdependent, there are two fundamental ways in which they differ. First, the sciences examine individual facts and particulars in order to arrive at new universal summaries of knowledge. The arts, by contrast, are concerned with creating particulars, which are unique in themselves. They may be influenced by or even inspired by other particular works, but they are not necessarily universal abstractions in the way that the sciences are. Second, and partly as a result thereof, the sciences are not cumulative in the same way that the arts and culture are. In the sciences only the latest law, theory, postulate etc. is what counts. In the arts, by contrast, the advent of Picasso does not make Rubens or Leonardo obsolete, any more than they made Giotto or Phidias obsolete. The arts and culture are defined by the cumulative sum of our collective heritage, all the particulars collected together, whereas the sciences are concerned only with the universals abstracted from the myriad particulars they examine.53
It follows that, while both the arts and sciences have a history, these histories ultimately need to be told in very different ways. In the arts, that history is about how we learned to collect and remember more and more of our past. Some scholars have claimed, for instance, that we know a lot more about the Greeks than Aristotle himself did. In the sciences, by contrast, that history is at once about how scientists developed ever better instruments with which to make measurable that which is not apparent to the naked eye, and about how they used the results of their observations to construct ever more generalized, universal, and at the same time, testable theories. To put it simply, we need very different kinds of histories to reflect these two fundamentally different approaches to universals and particulars, which underlie fundamental differences between the arts and sciences. With the advent of networked computers the whole of history needs to be rewritten: at least twice.
8. Now and Eternity
Not unrelated to the debates concerning particulars and universals are those connected with the (static) fine arts such as painting and sculpture versus the (dynamic) performance arts such as dance, theatre, and music. Earlier media such as manuscripts or print were at best limited to static media. They could not hope to reproduce the complexities of dynamic performance arts. Even the introduction of video offered only a partial solution to this challenge, inasmuch as it reduced the three-dimensional field to a particular point of view on a two-dimensional surface. Hence if a video captured a frontal view of actors or dancers their backs were necessarily occluded. These limitations of recording media have led perforce to a greater emphasis on the history of fine arts such as painting than on the history of performance in dance and theatre.54
These limitations have had both an interesting and distorting effect on our histories of culture. It has meant for instance that we traditionally knew a lot more about the history of static art than dynamic art: a lot more about painting than about dance, theatre or music. It has meant that certain cultures such as the Hebrew tradition, which emphasize the now of dynamic dance and music over the eternal static forms of sculpture and painting, were under-represented in traditional histories of culture. Conversely, it has meant that the recent addition of film, television, video and computers has focussed new attention on the dynamic arts, to the extent of undermining our appreciation of the enduring forms. Our visions of eternal art are being replaced by a new focus on the now.
From a more global context these limitations have also had a more general, subtle impact on our views of world culture. Those strands that focussed on the static fine arts were considered the cornerstones of world cultural development. Since this was more so in the West (Europe, the Mediterranean and more recently North America), sections of Asia Minor (Iran, Iraq, Turkey), and certain parts of the Far East (China, Japan and India),55 these dominated our histories of art. Countries with strong traditions of dance, theatre and other types of performance (including puppet theatre, shadow theatre and mime) such as Malaysia, Java and Indonesia were typically dismissed as being uncultured. The reality of course was quite different. What typically occurred is that these cultures took narratives from static art forms such as literature and translated them into dynamic forms. Hence, the stories of an Indian epic, the Ramayana, made their way through Southeast Asia in the form of theatre, shadow puppet plays, dances and the like. Scholars such as Mair56 have rightly drawn attention to the importance of these performance arts (figure 5).
| Etoki | Japan |
| Par | India |
| Parda Da | Iran |
| Pien Wen | China |
| Wayang Beber | Malaysia |
Figure 5. Examples of narrative-based performance art in various countries.
Ultimately, however, the challenge goes far beyond simple dichotomies of taste, namely, whether one prefers the static, eternal arts of painting to the dynamic, now, arts of dance and music. A more fundamental challenge will lie in re-writing the whole of our history of art and culture to reflect how these seeming oppositions have in fact been complementary to one another. In the West, for instance, we know that much Renaissance and Baroque art was based on ancient mythology, either directly via books such as Ovid's Metamorphoses, or indirectly via mediaeval commentaries on these myths. We need a new kind of hyper-linking to connect all these sources with the products which they inspired. Such hyperlinks will be even more useful in the East, where the same mythical story may well be translated into half a dozen art forms ranging from static (sculpture and painting) to dynamic (dance, mime, shadow theatre, puppet theatre, theatre). From all this there could emerge new criteria for what constitutes a seminal work: for it will become clear that a few texts have inspired works over the whole gamut of cultural expression. The true key to eternal works lies in those which affect everything from now to eternity.
9. Meta-Data
How is the enormity of this challenge to be dealt with in practice? It is generally assumed that meta-data offers a solution. The meta concept is not new. It played a central role in the meta-physics of Aristotle. In the past years with the rise of networked computing, meta has increasingly become a buzzword. There is much discussion of meta-data, meta-databases, and meta-data dictionaries. There is a Metadata Coalition,57 a Metadata Council58 and even a Metadata Review.59 Some now speak of meta-meta data in ways reminiscent of those who spoke of the meaning of meaning a generation ago.
The shift in attention from data to meta-data60 and meta-meta-data is part of a more fundamental shift in the locus of learning in our society. In Antiquity, academies were the centres of learning and repositories of human knowledge. In the Latin West, monasteries became the new centres of learning and remained so until the twelfth century, when this locus began to shift towards universities. From the mid-sixteenth to the mid-nineteenth centuries universities believed they had a near monopoly on learning and knowledge. Then came changes. First, there was a gradual shift of technical subjects to polytechnics. New links between professional schools (e.g. law, business) and universities introduced more short-term training goals while also giving universities a new lease on life.
The twentieth century brought corporate universities of which there are now over 1200. It also brought national research centres (NRC, CNR, GMD), military research laboratories (Lawrence Livermore, Los Alamos, Argonne, Rome), specialized institutes (such as Max Planck and Fraunhofer in Germany) and research institutes funded by large corporations (AT&T, General Motors, IBM, Hitachi, Nortel). Initially the universities saw themselves as doing basic research. They defined and identified the problems, the practical consequences of which would then be pursued by business and industry. In the past decades all this has changed. The research staffs of the largest corporations far outnumber those of the greatest universities. AT&T's Lucent Technologies has 24,000 in its Bell Laboratories alone and some 137,000 in all its branches. Hitachi has over 34,000, i.e. more researchers than the number of students at many universities. Nortel has over 17,000 researchers.
The cumulative information produced by all these new institutions means that traditional attempts to gather (a copy of) all known knowledge and information in a single location are no longer feasible. On the other hand a completely distributed framework is not feasible either. A new framework is needed and meta-data seems to be a new holy grail. To gain some understanding of this topic and the scope of the international efforts already underway will require a detour that entails near lists of information. Those too impatient with details are invited to skip the next twelve pages at which point we shall return to the larger framework and questions.
Basic Description
| Internet and Computer Software | |
| Generic Top Level Domain Names | (GTLD)61 |
| Hypertext Transfer Protocol | (http) |
| Multipurpose Internet Mail Extensions | (MIME) |
| Uniform Resource Name | (URN) |
| Uniform Resource Locator | (URL)62 |
| International Standards Organization | (ISO) |
| International Standard Book Numbering, ISO 2108:1992 | (ISBN)63 |
| International Standard Music Number, ISO 10957:1993 | (ISMN)64 |
| International Standard Technical Report Number | (ISRN)65 |
| Formal Public Identifiers | (FPI)66 |
| National Information Standards Organization | (NISO) |
| Serial Item and Contribution Identifier | (SICI) |
| International Standard Serial Number | (ISSN)67 |
| Publishers | |
| Confédération Internationale des Sociétés d'Auteurs et Compositeurs | (CISAC)68 |
| Common Information System | (CIS) |
| International Standard Works Code | (ISWC) |
| Works Information Database | (WID) |
| Global and Interested Parties Database | (GIPD) |
| International Standard Audiovisual Number | (ISAN)69 |
| International Federation of the Phonographic Industry | (IFPI) |
| International Standard Recording Code | (ISRC)70 |
| Cf. Other Standard Identifier | (OSI)71 |
| Universal Product Code | (UPC) |
| International Standard Music Number | (ISMN) |
| International Article Number | (IAN) |
| Serial Item and Contribution Identifier | (SICI) |
| Elsevier | |
| Publisher Item Identifier | (PII)72 |
| Corporation for National Research Initiatives and International DOI Foundation | |
| Digital Object Identifier | (DOI)73 |
| Libraries | |
| Persistent Uniform Resource Locator | (PURL)74 |
| Handles | |
| Universities | |
| Uniform Object Identifier | (UOI)75 |
Figure 6. Major trends in meta-data with respect to basic identification.
Summary Description
| Internet | |
| W3 Consortium | |
| Hyper Text Markup Language 3 Header | (HTML) |
| META Tag76 | |
| Hyper Text Markup Language Appendage | (HTML) |
| Resource Description Framework | (RDF) |
| Extensible Markup Language | (XML) |
| Platform for Internet Content Selection | (PICS) |
| Uniform Resource Identifier | (URI) |
| Uniform Resource Characteristics | (URC) |
| Universally Unique Identifiers77 | (UUID) |
| Globally Unique Identifiers | (GUID) |
| Whois++ Templates | |
| Internet Anonymous FTP Archives Templates | (IAFA)78 |
| Linux Software Map Templates | (LSM) |
| Harvest Information Discovery and Access System | |
| Summary Object Interchange Format | (SOIF)79 |
| Netscape | |
| Meta Content Framework | (MCF)80 |
| Microsoft | |
| Web Collections81 | |
| Libraries | |
| International Federation of Library Associations | (IFLA)82 |
| International Standard Bibliographic Description | (ISBD)83 |
| Electronic Records ISBD | (ER) |
| Dublin Core | |
| Resource Organization and Discovery in Subject Based Services | (ROADS) |
| Social Science Information Gateway | (SOSIG) |
| Medical Information Gateway | (OMNI)84 |
| Art, Design, Architecture, Media | (ADAM) |
| Full (Library Catalogue Record) Description | |
| Libraries | |
| Z39.50 | |
| Machine-Readable Cataloging Record85 with many national variants | (MARC)86 |
| Other Catalogue formats summarized in Eversberg87 | (e.g. PICA, MAB) |
| Full Text | |
| Libraries and Museums | |
| Standard Generalized Markup Language | (SGML)88 |
| Library of Congress Encoded Archival Description | (LC EAD)89 |
| Text Encoding Initiative | (TEI) |
| Consortium for the Computer Interchange of Museum Information | (CIMI) |
Figure 7. Major trends in meta-data with respect to more complete description.
It is generally accepted that meta-data is data about data,90 or key information about larger bodies of information. Even so, discussions of meta-data are frequently confusing for several reasons. First, they often do not define the scope of information being considered. In Internet circles, for instance, many authors assume that meta-data refers strictly to Internet documents, while others use it more generally to include the efforts of publishers and librarians. Secondly, distinctions need to be made concerning the level of detail entailed by the meta-data. Internet users, for instance, are often concerned only with the most basic information about a given site. In extreme cases they believe that this can be covered through Generic Top Level Domain Names (GTLD), while publishers are convinced that some kind of unique identifying number will be sufficient for these purposes (see figure 6). Present day search engines such as AltaVista and Lycos also use a minimal approach to these problems, relying only on a title and a simple tag with a few keywords serving as the metadata.
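To make this minimal approach concrete, the following sketch reads only the title and a keywords tag from a page and ignores everything else. It is an illustration under assumptions, not a description of how any particular search engine works: the class name, the function name, and the sample page are invented for the example, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class MinimalMetadataParser(HTMLParser):
    """Collects the <title> text and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_minimal_metadata(html):
    """Return the title and keyword list; the body text is never consulted."""
    parser = MinimalMetadataParser()
    parser.feed(html)
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return {"title": parser.title.strip(), "keywords": keywords}


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>Frescoes of a Small Centre</title>
<meta name="keywords" content="fresco, narrative cycle, periphery">
</head><body><p>The body is ignored in this minimal approach.</p></body></html>"""

record = extract_minimal_metadata(SAMPLE_PAGE)
```

The brevity of the resulting record shows both the appeal and the limits of the approach: it costs the author of a page almost nothing, but it says nothing about provenance, date, or relation to other works.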
Others, particularly those in libraries, feel that summary descriptions, full library catalogue descriptions or methods for full text descriptions are required. Meanwhile some are convinced that while full text analysis or at least proper cataloguing methods are very much desirable, it is not feasible that the enormity of materials available on the web can be subjected to rigorous methods requiring considerable professional training. For these the Dublin Core is seen as a pragmatic compromise (figure 7). As can be inferred from the lists above, there are a great number of initiatives with common goals, often working in isolation, sometimes even ignorant of the others' existence. Nonetheless, a number of organizations are working at integrated solutions for meta-data. We shall begin by examining four crucial players. While they are presented separately, it is important to recognize that there are increasing synergies among these players and their solutions, which are to a certain extent competing with one another.
i) Internet Engineering Task Force (IETF)
The IETF, which is directly linked with the Internet Society, is active on a great number of fronts. At present, sites on the World Wide Web typically have a Uniform Resource Locator (URL). These suffer from at least two basic problems: i) they often change location and ii) there may be several mirror sites for the same material. The IETF has been working on a more comprehensive approach:
Resources are named by a URN (Uniform Resource Name), and are retrieved by means of a URL (Uniform Resource Locator). Describing the resource for purposes of discovery, as well as making the binding between a resource's name and its location(s) is the role of the URC (Uniform Resource Characteristic). The purpose or function of a URC is to provide a vehicle or structure for the representation of URIs [Uniform Resource Identifiers] and their associated meta-information.91
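The division of labour described in this passage can be sketched, in a deliberately simplified form, as an in-memory registry. This is not the IETF's mechanism: the class names, method names, the "urn:example:" name, and the example hosts are all hypothetical, introduced only to show how a persistent name might survive a change of location and cover several mirror sites.

```python
from dataclasses import dataclass, field


@dataclass
class Characteristics:
    """A URC-like record: descriptive meta-information plus known locations."""
    description: dict
    locations: list = field(default_factory=list)


class UrnResolver:
    """Toy registry binding persistent names (URNs) to changeable URLs."""

    def __init__(self):
        self._records = {}

    def register(self, urn, description, locations):
        self._records[urn] = Characteristics(dict(description), list(locations))

    def relocate(self, urn, old_url, new_url):
        """Record that one copy has moved; the name itself is unchanged."""
        locations = self._records[urn].locations
        locations[locations.index(old_url)] = new_url

    def resolve(self, urn):
        """Return every currently known location, mirrors included."""
        return list(self._records[urn].locations)

    def discover(self, key, value):
        """Find names whose description has the given value for a field."""
        return [urn for urn, record in self._records.items()
                if record.description.get(key) == value]


# Hypothetical name and hosts, invented for this example.
resolver = UrnResolver()
resolver.register(
    "urn:example:essay-1",
    {"title": "An Example Essay", "creator": "A. Author"},
    ["http://host-a.example/essay.html", "http://mirror-b.example/essay.html"],
)
resolver.relocate(
    "urn:example:essay-1",
    "http://host-a.example/essay.html",
    "http://host-c.example/archive/essay.html",
)
current_locations = resolver.resolve("urn:example:essay-1")
```

A citation that uses the name rather than either address continues to work after the move, which is precisely the advantage claimed for names over locators; the hard part in practice is deciding who maintains the registry.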
The precise meaning of these terms is not as clear as one might wish. Weider,92 for instance, calls Uniform Resource Names (URNs)93 the equivalent of an ISBN for electronic resources, whereas Iannella calls them a naming method. As for Uniform Resource Characteristics (URC), Iannella calls them metadata, whereas Ron Daniels94 takes quite a different view of them. Similarly, the exact nature and function of the Uniform Resource Identifiers (URI) has been the subject of considerable debate and at a meeting in Stockholm (September 1997), the IETF URI committee was officially disbanded. Subsequently, the W3 Consortium has taken up the problem (see below). Meanwhile, URNs still need to be mapped back to a series of disparate URLs. To this end the IETF is exploring at least four methods of URN to URL Mapping (Resource Discovery) and URC95 using http:
i) Domain Name Server (dns)96
ii) x-Domain Name Server 2 (x-dns-2) with trivial URC syntax97
iii) SGML designed to interoperate with the trivial URC scenario98
iv) Path, same as ii above except that it is hierarchically arranged.99
A fifth method, Handle, is being explored by ARPA. Ultimately the technical details of these competing schemes are less important than the result they promise: a framework which will allow various sources to interoperate. It is noteworthy that institutions around the world are working on these challenges. The Distributed Systems Technology Centre (DSTC) in Brisbane has a Basic URN Service for the Internet (BURNS) project,100 and The URN Interoperability Project101 (TURNIP), while Earth Observation at the Joint Research Centre (JRC) has a URN Resolver Experiment102 as part of its European Wide Service Exchange (EWSE) initiative. Meanwhile the IETF is exploring Uniform Resource Agents (URAs)103
as a means of specifying composite net-access tasks. Tasks are described as "composite" if they require the construction and instantiation of one or more Uniform Resource Locators (URL's) or Uniform Resource Names (URN's), and/or if they require transformation of information returned from instantiating URL's/URN's.104
Precisely how all these initiatives should be integrated is still a matter of conjecture. For example, the Internet Anonymous File Transfer Protocol Archives Working Group (IAFA)105 initially worked on Templates for Internet data. This became a new group called the Integration of Internet Information Resources Working Group (IIIR).106
This group also worked toward a Query Routing Protocol (QRP), which they abandoned in favour of working on a Structured Text Interchange Format (STIF).107 More significantly, they also set out to integrate WAIS, ARCHIE, and Prospero into a Virtually Integrated Information Service (VUIS). To this end they introduced four Requests for Comments.108 Of these, the Integrated Internet Information Service (IIIS) foresees the integration of some of the major types of information used on the Internet (figure 8):
| Gopher | WAIS | WWW | Archie | Others |
| Resource discovery system, perhaps based on Whois++ |
| Uniform Resource Name to Uniform Resource Locator mapping system, perhaps based on Whois++ or X.500 |
| Transponder | Transponder | Transponder |
| Resource | Resource | Resource |
Figure 8. Basic Scheme from RFC 1727 showing how various protocols would be integrated using Whois++ and X.500.
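The URN-to-URL mapping layer of figure 8 can be pictured as a lookup from a persistent name to one or more current locations. The following is a minimal sketch only: the table contents, the `urn:example:` names, the URLs and the function name `resolve_urn` are all hypothetical, and an in-memory dictionary stands in for the directory service (perhaps Whois++ or X.500) that the scheme envisages.

```python
# Hypothetical stand-in for a URN-to-URL mapping service. A dictionary
# replaces the directory lookup (e.g. Whois++ or X.500) of the real scheme.
URN_TABLE = {
    "urn:example:report-42": [
        "http://mirror-a.example.org/reports/42.html",
        "gopher://mirror-b.example.org/0/reports/42",
    ],
}


def resolve_urn(urn, table=URN_TABLE):
    """Return a list of known locations (URLs) for a persistent name (URN)."""
    # An unknown name yields an empty list rather than an error, so that a
    # caller can fall back to another resolver.
    return list(table.get(urn, []))


if __name__ == "__main__":
    for url in resolve_urn("urn:example:report-42"):
        print(url)
```

The point of the indirection is that a document may move or be mirrored without its name changing; only the table needs to be updated.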
Another attempt by the IETF at creating an integrated strategy for meta-data on the Internet is their Common Indexing Protocol109 (CIP),110 which foresees a combination of four elements: a client, a protocol for the front-end, an indexing object and a database backend or query protocol (figure 9, cf. Appendix 2, which provides a glossary of some of the key technical terms). While undoubtedly essential, such attempts are focussed mainly on information available on the Internet and do not yet address the more complex challenges of other knowledge available in museums and libraries.
| Client | Protocol Front End | Indexing Object | Database Backend or Query Protocol |
| Whois++ | Whois++ | | SQL or |
| PH | PH | Indexer API | Z39.50 or |
| LDAP | LDAP | | GNU DBM (GDBM) |
Figure 9. Basic Scheme concerning Common Indexing Protocol (CIP).
Work is also progressing on an Application Configuration Access Protocol (ACAP, RFC 2244).111 Meanwhile other groups within the IETF are addressing more wide-ranging solutions. One group, for instance, is working on World Wide Web Distributed Authoring and Versioning112 (WebDAV), which will deal with metadata, name space management, overwrite prevention and version management, and has become part of the W3's Resource Description Framework (RDF, see below).
ii) World Wide Web Consortium (W3)103
If the IETF is the chief body concerned with developing a pipeline for the Internet, the W3 Consortium is the main body devoted to integrating meta-data with respect to content on the Internet. It is, for instance, developing a convention for embedding metadata in HTML.114 When an IETF committee working on Universal Resource Identifiers (URI) was disbanded for want of agreement, the problem was taken up by the W3C, which is tackling all the existing addressing schemes.115 The result of these efforts will be to replace the stopgap measures outlined above in figure 8 with a universal solution.
One of the key activities of the W3 Consortium has been in the context of markup languages. As was noted earlier, languages such as Standard Generalized Markup Language (SGML),116 helped the aims of meta-data by separating form from content: separating different views or presentation methods from the underlying information. The advent of Hyper Text Markup Language (HTML)117 as an interim pragmatic solution temporarily obscured this distinction. Since then the consortium has been working on a subset of SGML, which is adequate for dealing with simpler documents and re-establishes the distinctions between form and content. This Extensible Markup Language (XML)118 is also being submitted to the ISO (10179:1996).
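The separation of form from content can be illustrated with a small sketch in which one marked-up record is rendered in two different views. The element names (`book`, `title`, `author`), the sample values and the two function names are hypothetical and chosen only for illustration; Python's standard `xml.etree.ElementTree` module does the parsing.

```python
import xml.etree.ElementTree as ET

# Content only: the markup says what each part is, not how it should look.
RECORD = "<book><title>Example Title</title><author>A. Author</author></book>"


def as_citation(xml_text):
    """Present the record as a one-line citation."""
    book = ET.fromstring(xml_text)
    return f"{book.findtext('author')}, {book.findtext('title')}"


def as_html(xml_text):
    """Present the same record as a fragment of HTML."""
    book = ET.fromstring(xml_text)
    return f"<h1>{book.findtext('title')}</h1><p>{book.findtext('author')}</p>"


if __name__ == "__main__":
    print(as_citation(RECORD))
    print(as_html(RECORD))
```

Both views draw on the same underlying information, so a change of presentation requires no change to the record itself.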
It is foreseen that XML will form a basis to which one will add Cascading Style Sheets (CSS)119 as part of a Document Style Semantics and Specification Language (DSSSL).120 Similarly one can then add specialized markup languages, description languages and formats (figure 10).
| Markup Languages | |
| Chemical Markup Language | (CML) |
| Handheld Device Markup Language | (HDML) |
| Mathematical Markup Language | (MML) |
| Precision Graphics Markup Language121 | (PGML) |
| Description Languages | |
| Hardware Description Language | (HDL) |
| Web Interface Description Language | (WIDL) |
| Formats | |
| Channel Definition Format | (CDF) |
| Resource Description Format | (RDF) |
Figure 10. Special markup and description languages and formats linked with XML.
XML will serve as the underlying structure for a comprehensive scheme,122 which includes Protocol for Internet Content Selection (PICS), Digital Signatures (Dsig), Privacy Information (P3P) within a Resource Description Framework (RDF). PICS initially began as a means of restricting access for children to pornographic and other dangerous contents. PICS is evolving into a common platform for labelling online resources and a system for describing content using a restricted vocabulary. The PICS labels (metadata) for Internet resources123 have five aims:
PICS entails three kinds of metadata:124 i) embedded in content; ii) along with, but separate from content and iii) provided by an independent provider (label bureau). In a next phase PICS will become part of a larger Resource Description Framework125 (RDF), which aims at machine understandable assertions of web resources in order to achieve:
RDF will have at least three vocabularies, namely a Protocol for Internet Content Selection (PICS) rating architecture; the Dublin Core (DC) elements for digital libraries and Digital Signatures (Dsig) for authentication. RDF uses a Document Object Model126 (DOM), and a Resource Description Messaging Format127 (RDMF). Implicit in this approach is the possibility of mapping a subject in the Dublin Core Framework, with subjects in one of the main classification schemes (e.g. Library of Congress, Dewey, Göttingen) and a version in everyday language.
XML will thus serve as an underlying structure for simple web documents, while SGML continues to be used for complex information such as the repair manuals for aircraft carriers or large jets.128 It is important to recognize that the W3's approach to metadata is constantly evolving and is likely to change considerably in the course of the next few years.129 For instance, the director of the W3 Consortium, Tim Berners-Lee, in a keynote to WWW7 (Brisbane, April 1998), recently outlined his vision of a global reasoning web, whereby every site would also be classed in terms of its veracity or truth value.
iii) Z39.50130
Complementing these efforts of the Internet community are those of the library world, which have focussed almost exclusively on interoperability among libraries and have left aside the more complex elements of Internet information. Chief among these is Z39.50. This is the American National Standard (ANSI) for Information Retrieval. It is based on two ANSI-NISO documents (1992131 and 1995132), which led to a network protocol133 that is session oriented and stateful, in contrast to HTTP and Gopher, which are stateless. An early version ran on WAIS. The new version runs over TCP/IP. It uses an Object Identifier (OID). Z39.50 has the following six attribute sets:
| Bibliographic 1 | (Bib-1)134 |
| Explain | (Exp-1) |
| Extended Services | (Ext-1) |
| Common Command Language | (CCL-1) |
| Government Information Locator Service | (GILS) |
| Scientific and Technical Attribute Set (Superset of Bib-1) | (STAS) |
In addition it offers six record syntaxes, namely:
| Explain | |
| Extended Services | |
| Machine Readable Card including national variants | (MARC) |
| Generic Record Syntax | (GRS-1) |
| Online Public Access Catalogue | (OPAC) |
| Simple Unstructured Text Record Syntax | (SUTRS) |
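The contrast drawn above between a stateful, session-oriented protocol and a stateless one can be made concrete with a toy sketch. It is not an implementation of Z39.50: the class `ToySession`, the function `stateless_fetch`, the substring matching and the sample catalogue are all assumptions made for illustration. The only idea borrowed from the protocol is that a search leaves a named result set on the server from which later requests can retrieve records.

```python
def _matches(catalogue, term):
    """Naive case-insensitive substring search (illustrative only)."""
    return [record for record in catalogue if term.lower() in record.lower()]


class ToySession:
    """Stateful: result sets are remembered between requests."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.result_sets = {}

    def search(self, set_name, term):
        """Run the query once, keep the hits, and report how many there are."""
        self.result_sets[set_name] = _matches(self.catalogue, term)
        return len(self.result_sets[set_name])

    def present(self, set_name, start, count):
        """Return records from a stored result set (positions are 1-based)."""
        return self.result_sets[set_name][start - 1:start - 1 + count]


def stateless_fetch(catalogue, term, start, count):
    """Stateless: every request repeats the query, since nothing is kept."""
    return _matches(catalogue, term)[start - 1:start - 1 + count]


if __name__ == "__main__":
    catalogue = ["Plato: Republic", "Aristotle: Poetics", "Plato: Laws"]
    session = ToySession(catalogue)
    print(session.search("default", "plato"))
    print(session.present("default", 2, 1))
    print(stateless_fetch(catalogue, "plato", 2, 1))
```

In the stateful version a large result set is computed once and then paged through; in the stateless version each page costs a fresh query.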
The Library of Congress has become the central library site for Z39.50 developments. The solution is being used in the European Commission's OPAC Network in Europe (ONE), a project which includes the British Library (BL), the Danish National Library (DB), the Dutch Electronic libraries project (PICA), an Austrian initiative (Joanneum Research) and the Swedish National Library. It is also being used in the Gateway to European National Libraries (GABRIEL).
Meanwhile the Z39.50 protocol has been accepted as a basic ingredient by the Consortium for the Interchange of Museum Information (CIMI), which in turn has been supported as a part of the European Commission's Memorandum of Understanding for Access to Europe's Cultural Heritage. Hence, while some technologists may lament that the solution lacks elegance, it has the enormous advantage of having been accepted by virtually all the leading players in the international library and museum scene and thus needs to be considered as one of the elements in any near future solution.
iv) Dublin Core
Major libraries and museums typically have highly professional staff and therefore assume that records will be in a MARC format, or possibly with more complex methods such as SGML or EAD, or the variations provided by TEI and CIMI. Smaller libraries cannot always count on access to such resources. To this end, the Online Computer Library Center (OCLC), based in Dublin, Ohio, in conjunction with the National Center for Supercomputing Applications (NCSA), sponsored an initial Metadata Workshop (1-3 March 1995),135 at which 17 elements of the Dublin Core (DC),136 also known as Monticello Core Elements (Mcore), were proposed (see figure 11 below), as well as three types of qualifiers,137 namely, language, scheme and type. Since this was the first of a series of meetings it is frequently referred to as Dublin Core 1.
A second meeting (Dublin Core 2), which took place in Warwick, England, produced the Warwick Framework.138 This provided containers for aggregating packages of typed meta-data and general principles of information hiding. A third meeting (Dublin Core 3) held in Dublin, Ohio focussed on images.139 A fourth meeting (Dublin Core 4) took place in Canberra140 and a fifth (Dublin Core 5) in Helsinki.141
| Title | Format |
| Creator | Identifier |
| Subject | Source |
| Description | Language |
| Publisher | Relation |
| Contributors | Coverage |
| Date | Rights |
| Type | |
Figure 11. List of the fifteen Dublin Core (DC) or Monticello Core (Mcore) elements, seen as a basic subset of more complex records such as MARC, SGML, TEI etc.
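To make the element list concrete, the sketch below holds a record using a few of the elements in figure 11 and writes it out as HTML META elements, one of the formats with which the Dublin Core is being linked. The record values, the function name `to_meta_tags` and the `DC.` name prefix are assumptions for illustration; the prefix follows a commonly seen convention and is not taken from the workshop documents.

```python
from html import escape

# A hypothetical record using four of the elements listed in figure 11.
RECORD = {
    "Title": "Example Title",
    "Creator": "A. Author",
    "Date": "1998-04-01",
    "Language": "en",
}


def to_meta_tags(record):
    """Render each element/value pair as one HTML META element."""
    lines = []
    for element, value in record.items():
        # Escape the value so that quotes or angle brackets cannot break
        # the surrounding attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element.lower()}" content="{content}">')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_meta_tags(RECORD))
```

Because the record is a flat set of element/value pairs, someone without training in MARC or SGML can produce it, which is the practical case for such a core set.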
The Dublin Core has nine working groups: rights management, sub-elements, data model, DC Data, DC and Z39.50, relation type, DC in multiple languages, coverage, and format and resource types. The Dublin Core is being applied to the Nordisk Web Index and the European Web Index (NWI/EWI). One reason why it is so significant is that it is being linked with a number of other meta-data formats, namely, HTML 2.0/3.2 META Elements, Whois++ Document Templates, US MARC, SGML and possibly MCF.142 These meta-data records may be bibliographic, but can also relate to administration, terms/conditions as well as ratings (figure 12).
| MD Bibliographic | MD Administration | MD Terms/Conditions | MD Ratings |
| MD Dublin Core | MD MARC |
Figure 12. Basic scheme showing how meta-data (MD) pertaining to bibliographic records can be linked with administration, terms/conditions and ratings.
The true power of this approach is that it can readily be expanded into a more general method for handling, interchange and ultimately marketing of information and/or knowledge packages, which helps to explain why firms such as IBM have become very seriously interested in and supportive of this approach. It offers a new entry point for their e-business vision of the world (figure 13).
[Figure 13 diagram. Labels: Digital Object, Handle, Metadata Container, Metadata Package, Content Container, Content Element, Content Package.]
Figure 13. A more generalized scheme showing relations of meta-data sets to their various parts.143
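The idea of a container that aggregates packages of typed meta-data, as in the Warwick Framework, can be sketched with two small data classes. This is an illustration under stated assumptions, not the framework's own data model: the class names, the field names, the package kinds and the handle value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataPackage:
    """One typed package, e.g. a Dublin Core record or a ratings label."""
    kind: str
    fields: dict


@dataclass
class Container:
    """Aggregates packages of different kinds describing one object."""
    handle: str
    packages: list = field(default_factory=list)

    def add(self, package):
        self.packages.append(package)

    def of_kind(self, kind):
        """Return only the packages of the requested kind."""
        return [p for p in self.packages if p.kind == kind]


if __name__ == "__main__":
    container = Container(handle="hdl:example/1")
    container.add(MetadataPackage("dublin-core", {"Title": "Example Title"}))
    container.add(MetadataPackage("ratings", {"label": "general"}))
    print([p.kind for p in container.packages])
    print(container.of_kind("dublin-core")[0].fields)
```

A client that understands only one kind of package can ask for that kind and ignore the rest, which is how simple and elaborate descriptions of the same object can coexist.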
As the above figures reveal, it is foreseen that the Dublin Core elements from personal sites and smaller institutions will interact with the more elaborate formats of major institutions (MARC etc.). Hence while the Dublin Core may, at first glance, appear to be merely a quick and dirty solution to a problem, it actually offers an important way of bridging materials in highly professional repositories with those in less developed ones. Moreover, while the Dublin Core in its narrow form is primarily a method for exchanging records about books and other documents, within this more generalized, expanded context, it offers a method for accessing distributed contents.
How will the extraordinary potentials of the technologies outlined above be developed? Any attempt at a comprehensive answer would be out of date before it was finished. For the purposes of this paper it will suffice to draw attention to a few key examples. One of the earliest efforts to apply these new tools is the Harvest Information Discovery and Access System.144 The Harvest method uses the Summary Object Interchange Format (SOIF),145 which employs the Resource Description Message Format (RDMF), in turn a combination of IAFA templates and BibTeX,146 which is part of the Development of a European Service for Information on Research and Education (DESIRE)147 project linked with the European Commission's Telematics for Research Programme. It has been applied to Harvest, Netscape, and the Nordisk Web Index (NWI). This includes a series of attributes,148 a series of template types149 and other features.150 While this method is limited to Internet resources, it represents an early working model.
The challenge remains as to how these tremendously varied resources can be integrated within a single network, in order that one can access both new web sites and classic institutions such as the British Library151 or the Bibliothèque de la France. One possible solution is being explored by Carl Lagoze152 in the Cornell Digital Library project. Cornell is also working with the University of Michigan on the concept of an Internet Public Library.153 Another solution is being explored by Renato Iannella154 at the Distributed Systems Technology Centre (DSTC). This centre in Brisbane, which was one of the hosts of the WWW7 conference in 1998, includes a Resource Discovery Unit. In addition to its Basic URN Service for the Internet (BURNS) and The URN Interoperability Project (TURNIP), mentioned earlier, it has an Open Information Locator Project Framework (OIL). This relies heavily on Uniform Resource Characteristics (including Data,156 Type, Create Time, Modify Time, Owner). In the Uniform Resource Name (URN), this method distinguishes between a Namespace Identifier (NID) and Namespace Specific String (NSS). This approach is conceptually significant because it foresees an integration of information sources which have traditionally been distinct if not completely separate, namely, the library world, internet sources and telecoms (figure 14).
| urn:isbn: | publishing | ISBN no. |
| inet:dstc.edu.au . | internet servers | listname |
| telecom: | telecom | telephone no. |
Figure 14. Different kinds of information available using the Open Information Locator Project Framework (OIL).
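The distinction between a Namespace Identifier (NID) and a Namespace Specific String (NSS) can be shown by splitting a name of the form `urn:<NID>:<NSS>`. The sketch below is a minimal illustration: the function name `split_urn` and the sample ISBN are hypothetical, and the NID is treated here as case-insensitive.

```python
def split_urn(urn):
    """Split 'urn:<NID>:<NSS>' into its namespace identifier and string."""
    # Split on the first two colons only, since the NSS may itself
    # contain further colons.
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN of the form urn:<NID>:<NSS>: {urn!r}")
    nid, nss = parts[1].lower(), parts[2]
    return nid, nss


if __name__ == "__main__":
    print(split_urn("urn:isbn:0-123-45678-9"))
```

The NID tells a resolver which naming authority to consult (publishing, internet servers, telecoms), while the NSS is interpreted only within that namespace.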
Yet another initiative is being headed by the Object Management Group (OMG).157 This consortium of 660 corporations has been developing a Common Object Request Broker Architecture (CORBA),158 which links with an Interoperable Object Reference (IOR). One of its advantages is that it can sidestep some of the problems of interaction between Hypertext Transfer Protocol (HTTP) and Transmission Control Protocol (TCP). It does so by relying on Internet Inter Object Request Broker Protocol (IIOP). It also uses an Interface Repository (IR) and Interface Definition Language (IDL, ISO 14750).159 CORBA has been adopted as part of the Telecommunications Information Networking Architecture (TINA).160
One sign of a growing convergence is the rise of interchange formats designed to share information across systems. The (Defense) Advanced Research Projects Agency's (ARPA's) Knowledge Interchange Format (KIF) and Harvest's Summary Object Interchange Format (SOIF) have already been mentioned. NASA has a Directory Interchange Format (DIF). The Metadata Coalition has a Metadata Interchange Specification161 (MDIS).
At the university level, Stanford University has a series of Ontology Projects.162 The California Institute of Technology (Caltech) has a project called Infospheres concerned with Distributed Active Objects.163 Rensselaer Polytechnic has a Metadatabase which includes an Enterprise Integration and Modeling Metadatabase,164 a Visual Information Universe Model,165 a Two Stage Entity Relationship Metaworld (TSER) and an Information Base Modelling System (IBMS).166
Meanwhile, companies such as Xerox have produced Metaobject Protocols167 and Meta Data Dictionaries to Support Heterogeneous Data.168 Companies such as Data Fusion (San Francisco), the Giga Information Group (Cambridge, Mass.), Infoseek (Sunnyvale, California),169 Intellidex170 Systems LLC, Pine Cone Systems171 and NEXOR172 are all producing new software and tools relevant to metadata.173
Vendors of library services are also beginning to play a role in this convergence. In the past each firm created its own electronic catalogues with little attention to their compatibility with other systems. In Canada, thanks to recent initiatives of the Ontario Library Association (OLA),174 there is a move towards a province wide licensing scheme to make such systems available to libraries, a central premise being their compatibility and interoperability.
10. Global Efforts
Technologists engaged in these developments of meta-data on the Internet are frequently unaware that a number of international organizations have been working on meta-data for traditional sources for the past century. These include the Office International de Bibliographie, Mundaneum,175 the International Federation for Information and Documentation (FID176), the Union of International Associations (UIA177), branches of the International Standards Organization (e.g. ISO TC 37, along with Infoterm), as well as the joint efforts of UNESCO and the International Council of Scientific Unions (ICSU) to create a World Science Information System (UNISIST). Indeed, in 1971, the UNISIST committee concluded that:
"a world wide network of scientific information services working in voluntary association was feasible based on the evidence submitted to it that an increased level of cooperation is an economic necessity".178
In 1977, UNISIST and NATIS, UNESCO's concept of integrated national information concerned with documentation, libraries and archives, were merged into a new Intergovernmental Council for the General Information Programme (PGI).179 This body continues to work on metadata.
Some efforts have been at an abstract level. For instance, the ISO has a subcommittee on Open systems interconnection, data management and open distributed processing (ISO/IEC JTC1/SC21). The Data Documentation Initiative (DDI) has been working on a Standard Generalized Markup Language (SGML) Document Type Definition (DTD) for Data Documentation.180 However, most work has been with respect to individual disciplines and subjects including art, biology, data, education, electronics, engineering, industry, geospatial and Geographical Information Systems (GIS), government, health and medicine, libraries, physics and science. Our purpose here is not to furnish a comprehensive list of all projects, but rather to indicate priorities thus far, to name some of the major players and to convey some sense of the enormity of the projects already underway. More details concerning these initiatives are listed alphabetically by subject in Appendix 3.
The most active area for metadata has been in the field of geospatial and Geographical Information Systems (GIS).181 At the ISO level there is a Specification for a data descriptive file for geographic interchange (ISO 8211),182 which is the basis for the International Hydrographic Organization's transfer standard for digital hydrographic data (IHO DX-90).183 The ISO also has standards for Geographic Information (ISO 15046)184 and for Standard representation of latitude, longitude and altitude (ISO 6709),185 as well as a technical committee on Geographic Information and Geomatics186 (ISO/IEC/TC 211), with five working groups.187 At the international level the Fédération Internationale des Géomètres (FIG) has a Commission 3.7 devoted to Spatial Data Infrastructure. The International Astronomical Union (IAU) and the International Union of Geodesy and Geophysics (IUGG) have developed an International Terrestrial Reference Frame (ITRF).188
At the European level geographical information is being pursued by two technical committees, European Norms for Geographical Information (CEN/TC 287)189 and European Standardisation Organization for Road Transport and Traffic Telematics (CEN/TC 278),190 notably working group 7, Geographic Data File (GDF).191 At the national level there are initiatives in countries such as Canada, Germany, and Russia. The United States has a standard for Digital Spatial Metadata,192 a Spatial Data Transfer Standard (SDTS)193 and a Content Standard for Digital Geospatial Metadata194 (CSDGM).195 Meanwhile, major companies are developing their own solutions, notably Lucent Technologies,196 IBM (Almaden),197 which is developing spatial data elements198 as an addition to the Z39.50 standard, Arc/Info, Autodesk and the Environmental Systems Research Institute (ESRI).
Related to these enormous efforts in geospatial and geographical information has been a series of initiatives to develop metadata for the environment. At the world level, the United Nations Environment Programme (UNEP) has been developing Metadata Contributors.199 In the G8 pilot project dedicated to environment, there is a Metainformation Topic Working Group200 (MITWG) and Eliot Christian has developed a Global Information Locator Service (GILS).201 There is a World Conservation Monitoring Centre202 and a Central European Environmental Data Request Facility (CEDAR). Australia and New Zealand have a Land Information Council Metadata203 (ANZIC). In the United States, the Environmental Protection Agency (EPA) has an Environmental Data Registry.204
In the field of science, the same Environmental Protection Agency (EPA) has a Scientific Metadata Standards Project.205 The Institute of Electrical and Electronics Engineers (IEEE)206 has a committee on (Scientific) Metadata and Data Management. In the fields of physics and scientific visualisation, the United States has a National Metacenter for Computational Science and Engineering207 with the Khoros208 project. In biology there are initiatives to produce biological meta-data209 and the IEEE has introduced a Biological Metadata Content Standard. In the United States there is a National Biological Information Infrastructure210 (NBII) and there are efforts at Herbarium Information Standards. In industry, the Basic Semantic Repository211 (BSR) has recently been replaced by BEACON,212 an open standards infrastructure for business and industrial applications. In engineering, there is a Global Engineering Network (GEN) and, as was noted above, there are a number of consortia aiming at complete interoperability of methods. In the United States, which seems to have some meta-association for almost every field, there is a National Metacenter for Computational Science and Engineering.213 In the case of electronics, the Electronic Industries Association has produced a CASE Data Interchange Format (CDIF).
In the field of government, Eliot Christian's work in terms of the G7 pilot project on environment has inspired a Government Information Locator Service214 (GILS). In health, the HL7 group has developed an HL7 Health Core Markup Language (HCML). In education, there is a Learning Object Metadata Group,215 a Committee on Technical Standards for Computer Based Learning (IEEE P1484) and Educom has a Metadata Tool as part of its Instructional Management Systems Project. In art, the Visual Resources Association (VRA) has produced Core Categories Metadata.216
Not surprisingly, the library world has been quite active in the field of metadata. At the world level, the International Federation of Library Associations (IFLA) has been involved, as has the Text Encoding Initiative (TEI), the Network of Literary Archives (NOLA), and the Oxford Text Archive (OTA). At the level of G8, it is a concern of pilot project 4, Biblioteca Universalis.217 At the European level there is a list of Library Information Interchange Standards (LIIS).218 In Germany there is a Metadata Registry concerned with metadata and interoperability in digital library related fields.219 In the United States, there is an ALCTS Taskforce on Metadata and a Digital Library Metadata Group (DLMG).
In the United Kingdom the Arts and Humanities Data Service (AHDS) and the United Kingdom Office for Library and Information Networking (UKOLN)220 have a Proposal to Identify Shared Metadata Requirements,221 a section on Metadata222 and one on Mapping between Metadata Formats.223 They are concerned with Linking Publishers and National Bibliographic Services (BIBLINK) and have been working specifically on Resource Organization and Discovery in Subject Based Services (ROADS),224 which has thus far produced gateways to Social Science Information (SOSIG), Medical Information (OMNI)225 and Art, Design, Architecture, Media (ADAM). They have also been active in adopting basic Dublin Core elements. A significant recent contribution by Rust has offered a vision provided by an EC project, Interoperability of Data in E-Commerce Systems (INDECS), which proposes an integrated model for Descriptive and Rights Metadata in E-Commerce.226
Figure 15. Interrelationship of different kinds of meta-data as foreseen by Rust.
This concludes the detour announced twelve pages ago. Standing back from this forest of facts and projects, we can see that there are literally hundreds of projects around the world all moving towards a framework that is immensely larger than anything available in even the greatest physical libraries of the world. Tedious though they may seem, these are the stepping stones for reaching new planes of information, which will enable some of the new scenarios in knowledge explored earlier. They are also proof that the danger of a second flood in terms of information, as foreseen by authors such as Pierre Levy, is not being met only with fatalistic, passive resignation.
Steps have been taken. Most of the projects thus far have focussed on the pipeline side of the problem. How do we make a database in library A compatible with that of library B such that we can check references in either one, and then, more importantly, compare references found in various libraries joined over a single network? Here the Z39.50 protocol has been crucial. As a result, networks are linking the titles of works in a number of libraries spread across a country, across continents and potentially around the world. Examples include the Online Computer Library Center (OCLC) and the Research Libraries Information Network (RLIN), both based in the United States, and PICA, based in the Netherlands. The ONE227 project, in turn, links the PICA records with other collections such as Joanneum Research and the Steiermärkische Landesbibliothek (Graz, Austria), the Library of the Danish National Museum, Helsinki University Library (Finland), the National Library of Norway, LIBRIS (Stockholm, Sweden), Die Deutsche Bibliothek (Frankfurt, Germany), and the British Library. Some of these institutions are also being linked through the Gateway to European National Libraries Project (GABRIEL).228 The German libraries are also working on a union catalogue of their collections. In the museum world there are similar efforts towards combining resources through the Museums Over States in Virtual Culture (MOSAIC)229 project and the MEDICI framework of the European Commission. In addition, there are projects such as the Allgemeine Künstler Lexikon (AKL) of Thieme-Becker, and those of the Getty Research Institute: e.g. Union List of Artist Names (ULAN) and the Thesaurus of Geographic Names (TGN).230
What are the next steps? The Maastricht McLuhan Institute, a new Centre for Research in European Digital Culture, will focus on two. First, it will make these existing distributed projects accessible through a common interface using a System for Universal Media Searching (SUMS). The common interface will serve at a European level for the MOSAIC project and at a global level as part of G8 pilot project five: Multimedia access to world cultural heritage. A second step will be to use these resources as the basis for a new level of authority lists for names, places and dates. In so doing it will integrate existing efforts at multilingual access to names as under development by G8 pilot project 4, Biblioteca Universalis, and earlier efforts of UNEP, to gain new access to variant names. In the case of terms, it will make use of standard classifications (e.g. Library of Congress, Dewey, Göttingen and Ranganathan231), as well as specialized classification systems for art such as Iconcl