Document
A document (noun) is a bounded physical representation of a body of information, designed with the capacity (and usually the intent) to communicate. A document may manifest symbolic, diagrammatic or sensory-representational information. To document (verb) is to produce a document artifact by collecting and representing information. In prototypical usage, a document is understood as a paper artifact containing information in the form of ink marks. Increasingly, documents are also understood as digital artifacts.
Colloquial usage is revealed by the connotations and denotations that appear in a Web search for document. From these usages, one can infer the following typical connotations:
- Writing that provides information (especially information of an official nature)
- Anything serving as a representation of a person's thinking by means of symbolic marks
- A written account of ownership or obligation
- To record in detail; "The parents documented every step of their child's development"
- A digital file in a particular format
- To support or supply with references; "Can you document your claims?"
- An artifact that meets a legal notion of document for purposes of discovery in litigation
The variety of usage reveals that the notion of document has rich social and cultural aspects besides the physical, functional and operational aspects.
- 1 Conceptualization in analytical philosophy
- 2 Empirical characterization
- 3 Social aspects of documents
- 4 Functional characteristics
- 5 Classical roles and workflows in document production
- 6 Document production technology
- 7 Document life cycle management technology
- 8 The document economy
- 9 Future of documents
- 10 See also
- 11 References
Conceptualization in analytical philosophy
The notion of document admits both an empirical (in terms of a fuzzy set of real-world instances) and analytical characterization. The analytical characterization hinges on the semantic character of the word document, as well as the use of a primitive notion of document in accounts of larger communication constructs such as discourses, or related constructs such as language games.
The nominal 'document', like other nominals, exhibits familiar patterns of polysemy (a kind of ambiguity). For example, 'document' might be used on one occasion to denote a certain body of information independently of how that information is physically rendered (as in 'The Bible is my favorite document.' or 'Have you finished reading all the documents for Monday's class yet?'), or it might be used to denote a particular physical instantiation of a body of information (as in 'That document is worn and needs to be re-bound.' or 'Return the documents you borrowed to the reference desk.'). This kind of polysemy bears some similarity to what Nunberg (1979) termed "container/contents polysemy" (as in 'Mary broke the bottle' versus 'The baby finished the bottle'). These patterns of polysemy matter because a document qua body of information (e.g. the Bible, not a particular bound copy thereof) has different properties from a document qua physical rendering of a body of information (e.g. a particular bound copy of the Bible). Importantly, the latter is a static, physically bounded thing. The former can evolve over time, is susceptible to certain changes in information content, and can support multiple physical instantiations that have allowable differences in information content. This distinction is relevant to the discussion of aspects and history of documents below.
Empirical characterization
In light of the polysemy of the core concept of document, it is useful to note a number of examples ranging from instances commonly understood as prototypical documents, to instances that are understood as documents only in specialized or rare situations.
- Prototypical Documents: Letters, memos, legal forms, instruction manuals
- Documents of Record: Newspapers and magazines
- Books: Textbooks, novels, recipe books, encyclopedias, comic books
- Canonical Documents: The Bible, Iliad and Odyssey, Vedas, Ramayana, Mahabharata, Quran, Code of Hammurabi, Tao Te Ching
- Transactional Documents: Cheques, contracts, prescriptions, receipts, forms, postage stamps
- Functional Documents: PDF files, PostScript files, XML files, email
- Non-Prototypical Documents: Post-it notes, fortune cookie strips, maps, paintings, milk cartons, cereal boxes
- Non-Classical Digital Documents: Web pages, weblogs, wikis
- Boundary Examples: The plaque on the Pioneer 11 spacecraft, designed by astronomer Carl Sagan and using information assumed to be universal, is an extreme example of a document that is intended to communicate with aliens. Conversely, the recorded and printed signals of the SETI project would constitute documents if they were discovered to contain alien communication.
Social aspects of documents
Documents play a key role in the construction of social reality (Searle, 1997) and therefore play a part in accounts of every important aspect of human society and culture. A seminal example is the account of the role of print in political evolution in Imagined Communities (Anderson, B., 2006). More direct examples include the works of Marshall McLuhan (McLuhan, 1964 and 1969). Many key social aspects of documents arise from their historically unchanging character. This leads to a definition of a document as a talking thing (Levy, D., 2003), whose strengths and weaknesses both arise from its relative (historical) immutability compared with oral forms of communication. The relative immutability of documents has thus historically been important for establishing a record of transient events, and for preserving information whose precise linguistic form is of ritual or practical importance (such as religious texts or legal documents). Note, though, that historically many societies have regarded disciplined oral traditions as more reliable than parallel written ones and accorded them greater authority. With this caveat in mind, the following social aspects of documents may be noted.
- Social Value: The information in documents as well as documents themselves are often valuable; the information because of the influence represented, and the document itself when it is believed to be a rare or unique and authentic representation of the information it contains.
- Manifestation of authority: Documents are often produced to provide a record that will be considered authoritative in the future, particularly with respect to government. Consider receipts, titles, and deeds as examples of proof of ownership, and passports or driver's licenses as proof of identity.
- Conventional: Documents inherit a key feature of language-based communication in general: they are denoted as documents by convention (Lewis, 2002). Virtually any medium can constitute a document provided the people involved can agree on the meaning represented. Hence cave drawings, hieroglyphics, scrolls of sheepskin, sheets of papyrus, ink on paper, magnetic tape and electronic files are all documents under certain accounts.
- Manifestation of economic labor: Historically, the effort required to produce a document has been significant, so only the most important documents were created. The illuminated manuscript of the pre-Gutenberg era demonstrates the cost (and associated imputed value) of documents. Over time, the cost of producing documents has declined, while their functional characteristics ("affordances" in the sense of Sellen and Harper, 2001) have become richer.
- Manifestation of business processes: Documents play many roles in the internal management of a business as well as in the interfaces between businesses and their suppliers, employees, and customers. Current trends toward longer value chains and increased regulation increase the number of documents that must be generated and processed.
- Instruments of Governance and Law: The unchanging aspect of documents is crucial to the consistent communication of policy and administration of law to citizens. Documents that play such roles include constitutions, corporate annual reports and religious texts.
- Analytical philosophical character: The notion of document plays a role in political philosophy (for example, the notion of the social contract as a primitive construct), as well as in the philosophy of law.
- Role in Religion: Documents play a key role in religion, and constitute canonical content. Document-related terms such as dogma and doctrine have today acquired pejorative connotations primarily due to historical events associated with religious documents.
- Cultural Significance: Documents play a central role in art of all varieties. In the movie Office Space, for instance, central plot elements are frustration with a bureaucratic process involving the fictional "TPS reports" and a malfunctioning printer.
- Metaphoric Significance: Metaphors based on documents permeate our thinking, ranging from the obvious ("let's start with a clean sheet for this design", "this is a new chapter in my life" and "she wrote the book on that") to the highly allegorical ("All mankind is of one author, and is one volume; when one man dies, one chapter is not torn out of the book, but translated into a better language; and every chapter must be so translated" — John Donne).
Functional characteristics
Documents also manifest several more localized characteristics that determine how we use them in everyday life:
- Manifest nature: Information is physical, i.e. it always must exist in a tangible form, even when digital. IBM computer scientist Rolf Landauer is credited with this observation and working out its implications. By virtue of being realizations of chunks of information, documents are necessarily physical in all their forms.
- Contextuality and Situatedness: All communication takes place in a context, which includes at least the shared understanding of the parties communicating (Lewis, 2002). Explicit and implicit references to the context can convey a large amount of meaning by building on the shared understanding, but that meaning is lost to another party that does not share that context. For example, Shakespeare in the original can be difficult for modern readers because of the evolution of language and spelling since the seventeenth century, and modern readers (other than Shakespeare scholars) normally read modernized versions. Similarly, hypertext documents exist in a context that is lost when they are printed, leading to a different offline reading context.
- Evolvability: When we think of a document as a definitive source containing the best known information about a topic, there is a need to change that information as more is learned. This is frequently done by revising the document into a new version or edition. Typically, older versions are archived to make it easier to understand how the document has changed. In modern contexts, such as wikis or software source code, this evolvability can require very sophisticated version control technologies.
- Renderability: Every abstract entity that is understood to be a document in some context can be rendered, often in more than one way. A rendition of a document is a particular physical or electronic representation of the information from the document. For example, a Portable Document Format (PDF) representation and a web page may contain the same information but have substantially different properties and appearances. We think of them as different renditions (or renderings) of the same document. We might similarly consider different translations of a document to be the same document, although differences in language context and structure may make it impossible to express precisely the same meaning in both languages.
- Affordances: Documents in digital and physical forms manifest various "affordances" (Sellen and Harper, 2001; Gladwell, 2002). The affordances of a particular rendition of a document determine its uses. For example, paper has the affordances of allowing flipping and easy tactile manipulation, while digital forms are easier to edit.
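The evolvability and renderability characteristics above can be illustrated with a small sketch. The Python below is only one possible way to model the idea: the `Document` class, its `revise` method and the two `render_*` methods are hypothetical names invented for this example, and a real version control or publishing system would be far more elaborate.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Document:
    """An abstract document: a body of information that can evolve over time."""
    title: str
    versions: list[str] = field(default_factory=list)  # archived texts, oldest first

    def revise(self, new_text: str) -> int:
        """Store new_text as the latest version and return its version number."""
        self.versions.append(new_text)
        return len(self.versions)

    def current(self) -> str:
        """Return the text of the most recent version."""
        return self.versions[-1]

    def render_plain(self) -> str:
        """One rendition: plain text with an underlined title."""
        return f"{self.title}\n{'=' * len(self.title)}\n{self.current()}\n"

    def render_html(self) -> str:
        """Another rendition of the same information, as an HTML fragment."""
        return f"<h1>{escape(self.title)}</h1>\n<p>{escape(self.current())}</p>\n"


memo = Document("Office memo")
memo.revise("The meeting is on Monday.")
memo.revise("The meeting is on Tuesday.")

print(memo.render_plain())  # latest version, plain-text rendition
print(memo.render_html())   # same information, HTML rendition
print(memo.versions[0])     # the archived first version is still available
```

Both renditions carry the same information but have different affordances, and the older version remains in the archive so that a reader can see how the document changed.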
Classical roles and workflows in document production
People are involved in the creation and distribution of traditional paper documents in a number of roles (Romano, 1989). Not every document passes through every role, and each role may be performed by an individual or a group. Books are a well-known example of documents that require an extensive publication process, but many other documents undergo at least some of the same steps. Each role is considered to improve or add value to a document. The roles are generally understood as clustered into phases in the production of a classical document, including authorship, editing and prepress. Roles and workflows in the production of modern digital documents are more variable and are discussed in the section on the future of documents.
- An author selects the content to be communicated and performs the initial organization and recording of the content. A document in this state is often called a manuscript.
- A reviewer reads the content and evaluates it with respect to the intended audience. Reviewers often recommend only the best documents to be published. Documented reviews are frequently published as guidelines for document consumers as well.
- An editor helps to organize and express the content so that the meaning is clear and understandable, and follows the conventions of the symbolic representation such as spelling and grammar.
- A publisher orchestrates the process of producing a document, often decides whether a document is worth the effort of publishing (usually an economic decision), and collects and disseminates the profits from sales of a produced document.
- A printer formats the document into a convenient form such as a bound book. Printing can be a very complex and elaborate process, involving the following functions:
  - Pagination: performed by an individual who organizes text, fonts, images, headings, footnotes, chapters and sections to fit the physical constraints of a printed page in an aesthetically pleasing way.
  - Prepress: performed by print shops in preparing paper documents for production.
  - Imposition: arranging the desired pages on a larger sheet so that, when it is folded and trimmed, the pages are upright and in order.
  - Printing: marking paper with ink or toner.
  - Folding: folding pages into sections.
  - Binding: binding pages together and covering them.
- A distributor manages inventory and physical distribution of printed documents to retailers.
- A retailer manages a local inventory and sales to consumers, and often is familiar with the content and can make appropriate recommendations.
- A librarian organizes, tracks borrowing of, and archives documents.
A publication process enables a consumer to purchase or borrow, read and learn from documents. Consumers are often the intended audience of the publication process.
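The imposition step in the printer's workflow is essentially a small algorithm, and a sketch may make it clearer. The Python below assumes the simplest case, a saddle-stitched booklet in which each sheet carries two pages per side and is folded once down the middle. The function name `booklet_imposition` is hypothetical, and real imposition schemes depend on the press sheet size and the folding pattern.

```python
def booklet_imposition(page_count: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return, for each sheet, the (left, right) page numbers on its front and back.

    Assumes a single-fold booklet: sheets are nested inside one another,
    so the outermost sheet carries the first and last pages.
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page_count must be a positive multiple of 4")

    sheets = []
    for i in range(page_count // 4):
        # Front of sheet i: a late page on the left, an early page on the right.
        front = (page_count - 2 * i, 1 + 2 * i)
        # Back of sheet i: the next early page on the left, the previous late page on the right.
        back = (2 + 2 * i, page_count - 1 - 2 * i)
        sheets.append((front, back))
    return sheets


for number, (front, back) in enumerate(booklet_imposition(8), start=1):
    print(f"sheet {number}: front {front}, back {back}")
# sheet 1: front (8, 1), back (2, 7)
# sheet 2: front (6, 3), back (4, 5)
```

When the two sheets are stacked, folded together and stapled along the fold, the pages read 1 through 8 in order, which is the outcome imposition is meant to guarantee.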
Document production technology
Document production technology has evolved significantly through history. While a great deal can be said about ancient production technologies, including papyrus, palm leaves, stone tablets and marking devices ranging from quills to chisels, the modern form of the document has evolved largely under the influence of printing technologies. The illuminated manuscript of Europe is a useful prototypical instance of the document at the end of its evolution before the widespread use of printing. The associated technology was largely a human one. Other cultures at this stage used other forms of pre-print era documents. The history of printing can be traced as follows:
Bronze Age civilizations made extensive use of seals for commercial and transactional purposes. The signet ring was a particularly important case, and personal seals are still used in place of signatures in East Asian countries such as Korea, where it is common for individuals to carry a seal.
Chinese Woodblock printing was the first widespread technology that automated important parts of the document production process.
The Gutenberg printing press (McLuhan, M., 1969) enabled the mass production of faithful copies of documents, and hence the widespread dissemination of information. This widespread access to information enabled (and necessitated) fundamental changes to society in religion, government, law, business, and entertainment. Prior to the press, the huge effort required to copy documents faithfully by hand severely limited the number of documents available, and hence access to the information they contained. The effort to set type and prepare a document for reproduction was still high, but many high-fidelity copies could be produced.
The development of lithography constituted the next great advance in document production technology. Lithography continues to dominate the economic landscape of document production, an economic sector estimated to be of the order of $1 trillion. It brought economies of scale, extremely high quality and low cost to documents.
The typewriter improved the accessibility of document production technologies and enabled it to enter mainstream workplaces. Carbon paper enabled a modest number of copies to be produced concurrently with the original. A brief era of photography-based technologies flourished (including the photostat and cyclostyle processes) in parallel with the age of typewriters.
The Xerox copier became a major milestone in document production by eliminating the typesetting effort required by a printing press. Xerographic ("dry writing") technology (also referred to as electrophotography) could produce durable and economical copies of a paper document easily and quickly. Modern digital printers from Xerox and other companies such as HP, Canon and Ricoh can produce more than 240 black-and-white or 170 color copies of a page each minute, and work with up to 6 colors and with dry and wet inks. This technology supports a $100 billion market in digital printing, particularly in domains where lithography has clear limitations.
Computers enabled information to be stored electronically in databases and electronic files on magnetic tapes, drums, and disks. This led to a radical disruption of all document production technologies. Initially most of this information was printed onto paper by teletypes (automated typewriters), but computer printers rapidly became faster and more sophisticated. Computers, by controlling lasers in xerography, micro-nozzles in inkjet systems, and tiny solenoids in mechanical systems, became capable of being serially embedded in the document production process. Computers are also critical to modern lithography.
Today, epaper is viewed as one potential future evolutionary physical form of the prototypical document.
Document life cycle management technology
Technology to manage documents has evolved in parallel with documents themselves. Of particular importance are practices concerning the preservation, archival, destruction and management of documents. These constitute what is known as the "document life cycle".
- Physical preservation: Documents in both traditional physical forms and in digital physical forms such as magnetic media must be physically preserved. This aspect of document management deals with such issues as the aging of paper (the innovation of acid-free paper is an advance in preservation) and obsolescence of magnetic media.
- Storage: This aspect includes management of scarce resources such as shelf space and disk space, and associated technologies such as optimal space utilization. Modern libraries such as those at the University of Nevada and the University of Michigan often use complex space-saving technologies such as robotic retrieval systems for stacks and moving bookshelves. In the digital realm, the entire discipline of compression technologies can be viewed as concerned with the storage of documents.
- Cultural Preservation: This function, traditionally ascribed to librarians, involves the selection, arrangement and storage of documents in safe places. The importance of this part of document life cycle management can be seen in the impact of historical events such as the destruction of books in ancient China and the burning of the library at Alexandria. Today, library and information science has evolved into an important academic discipline.
- Bibliometrics: This aspect of document management involves functions of indexing, generating statistics and taxonomies, and improving the usability of large collections of documents. The modern history of this management technology dates back to Melvil Dewey and the Dewey Decimal System. Today, the science of bibliometrics is largely concerned with managing the impact of electronic technologies. This aspect must also deal with ISBN numbers, Library of Congress data and other standards.
- Digital Content Management: The explosion of digital content has resulted in technologies to manage large collections of digital information generated by organizations. Such systems must manage access control and privileges, handle multiple electronic formats, interface with printing infrastructures and enable collaborative workflows around documents.
- Digital-Physical Interaction Management: As long as both paper and digital documents continue to have value, technologies to manage their interaction will continue to be needed. Key to this is the management of large-scale, systematic scanning of physical documents (such as the Google book scanning project).
- Destruction: With the increased cost of identity theft, corporate scandals and privacy concerns, the destruction of both paper and electronic documents has become increasingly important to manage. Technologies such as shredders play a role, as do verifiable processes of destruction of electronic documents to ensure compliance with privacy laws.
- Security: Shannon's information theory has led to an entire discipline that concerns itself with the security of documents, and associated technologies such as encryption, as well as more physical security features such as watermarks and making currency documents safe from counterfeiting.
- Transportation: The entire postal system, as well as modern courier systems, is largely built on the need to move documents physically from one location to the other.
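The storage aspect above describes compression as a technology concerned with the storage of documents. As a minimal sketch of that idea, the Python below uses the standard library's `zlib` module to shrink a text document and restore it exactly. The function names `compress_document` and `restore_document` are hypothetical, and the sample text is invented for illustration.

```python
import zlib


def compress_document(text: str) -> bytes:
    """Encode the text as UTF-8 and compress it at the highest zlib level."""
    return zlib.compress(text.encode("utf-8"), 9)


def restore_document(blob: bytes) -> str:
    """Decompress the bytes and decode them back into the original text."""
    return zlib.decompress(blob).decode("utf-8")


# Repetitive content, as often found in forms and reports, compresses well.
report = "Status: nothing to report this week.\n" * 200
stored = compress_document(report)

print(len(report.encode("utf-8")), "bytes before compression")
print(len(stored), "bytes after compression")
print(restore_document(stored) == report)  # lossless round trip
```

The compression is lossless, so the stored form is a different physical rendering of exactly the same information, which is what an archive needs.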
The document economy
The economics of the production and management of documents indirectly impacts every economic sector. While the total economic value of the document economy is hard to estimate, the economic sectors with business models directly dependent on documents include:
- Document Authoring Technology: This sector supports a huge variety of digital and physical production technologies, ranging from Microsoft Word to LaTeX to advanced layout software.
- Education: The production and processing of documents is so critical that entire educational disciplines have evolved around writing, editing, layout and design of documents. The information sciences are also part of the document economy.
- Electronic Document Management: Managing documents within organizations and in public and personal contexts supports a huge industry in content management systems, ranging from free public infrastructure such as Wikipedia to proprietary enterprise applications such as DocuShare and Documentum.
- Physical Document Management: Large manufacturing sectors producing everything from 3-ring binders to filing cabinets and office desks exist largely due to the need to process documents.
- Media: The paper industry exists to support the document economy.
- Print equipment: From lithography and xerography to pencils and crayons, an extraordinarily diverse set of equipment industries depend on documents.
- Document Services: In large organizations, the life of documents in the workflows and processes of daily activity represents an enormous locus of value addition and cost reduction. This has led to a burgeoning industry in managed document services, ranging from specialized niches (such as payroll management by Paychex, Inc.) to managed office printing.
- Retail Production: From large chains such as Kinko's in the United States to small copy shops and offset print shops, documents support a large production sector for the end user.
- Publishing: All publishing, ranging from offset-based newspaper and magazine printing, to highly customized modern publishing using publish-on-demand digital print technology, is part of the document economy. The publishing industry includes major sub-areas such as the writer's market, small, medium and large publishing houses, small and large distributors and a vast network of independent and chain bookstores, online retailers, a large used-documents market and subscription-based markets.
- Document Transportation: The international postal system, as well as the commercial package transportation systems represented by companies such as DHL and UPS, has an economic model based largely on the demand for document transportation.
Future of documents
Since the advent of the digital era, documents have been evolving radically, requiring fundamental reconceptualization (Wesch, 2006). Efforts at reconceptualization range from Vannevar Bush's initial conceptualization of hypertext (Bush, V., 1945) to modern treatments of hypertext. The impact of digital technology can be understood in terms of several key aspects:
- Blurring of the notion of document boundary: hypertext and Web content make it hard to determine what is being denoted by the term document. While the early days of the Web resulted in documents that mimicked their physical ancestors, Web content rapidly took on new characteristics. Reconceptualization of the notion of "boundary" is a key intellectual challenge (Sweet, 2002).
- Increasing structure and openness: The document is changing from an opaque container of information into a much more open, structured one. XML underlies many document formats today (such as OpenDocument and Office Open XML). In the future, documents will become even more queryable, with their individual elements tagged, e.g. using HR-XML.
- Dynamic nature: Web analogs of traditional paper documents, such as newspaper columns, have taken on a dynamic character because technology enables readers to add comments. The document will increasingly become "virtual", bringing up-to-date information from various sources into one container (in the manner of a "mash-up"), and so will be kept evergreen.
- Paper and electronic are reconciling: Paper has traditionally been a gap in document processing workflows. Technologies such as OCR, OMR and 2D barcodes are helping to bring its content back into the electronic world. In the future, not only will that transition be seamless, but it will also be possible to track paper while it is in the "physical" world through RFID or MemorySpot.
- Hybrid automated/human authorship: authorship workflows for digital documents have evolved to include the computer in a key role. Dynamic Web pages may be viewed as the joint output of a human author (who produces a template) and a software system (that fills in the template). Sophisticated examples of this phenomenon can be found in recent evolutions in paper documents as well. Variable data technology, for instance, allows creators of direct mail marketing documents to vary the content of every piece in a print run using technologies such as XMPie.
- Prosumer workflows: Content repositories such as Wikipedia radically alter traditional document production workflows by blurring roles such as author and editor.
- Customizability: Digital technology allows users to actively participate in the construction of documents they see, realizing the postmodern notion of construction of meaning in an unexpectedly literal way.
- Long Tail Economics: Technologies such as blogs have allowed document production economics to operate with such radically cheap cost structures that single individuals can derive an income from a global audience with low capital expenses. This has led to an explosion of niche content.
- Blurring of Documents and Interfaces: Technologies such as Ajax or Apollo blur the distinction between documents and user interfaces to "intelligent" technologies, leading to a whole class of smart documents that can go beyond the passive nature of traditional documents.
- Fluidity and Dynamic Microstructure: Distinct from the impact of hypertext on the notion of document is the fluid potential of modern documents at the microlevel, which allows an enormous variety of dynamic phenomena at the level of words and sentences (Kelly, K., 2006).
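The hybrid authorship aspect above, in which a human writes a template and software fills it in for each recipient, can be sketched in a few lines. The Python below uses the standard library's `string.Template`. The `personalize` function, the letter text and the recipient records are all invented for illustration, and the sketch says nothing about how commercial variable data products such as XMPie actually work.

```python
from string import Template


def personalize(template_text: str, recipients: list[dict[str, str]]) -> list[str]:
    """Fill the human-authored template once for each recipient record."""
    template = Template(template_text)
    # substitute() raises KeyError if a record lacks a field the template needs.
    return [template.substitute(record) for record in recipients]


# The human author's contribution: fixed wording with $placeholders.
LETTER = "Dear $name, your $product order has shipped to $city."

# The software's contribution: per-piece data (hypothetical records).
recipients = [
    {"name": "Ada", "product": "notebook", "city": "London"},
    {"name": "Grace", "product": "printer", "city": "New York"},
]

for piece in personalize(LETTER, recipients):
    print(piece)
# Dear Ada, your notebook order has shipped to London.
# Dear Grace, your printer order has shipped to New York.
```

Every piece in the run is a distinct document, yet none was written individually by a person, which is the sense in which authorship becomes a joint human and machine activity.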
See also
- Document in Wiktionary, the free dictionary
References
- Sellen, A. J. and Harper, R. H. R., 2001, The Myth of the Paperless Office
- McLuhan, M., 1969, The Gutenberg Galaxy
- McLuhan, M., 1964, Understanding Media: The Extensions of Man
- Landow, G. P., 2006, Hypertext 3.0: Critical Theory and New Media in an Era of Globalization
- Bush, V., 1945, As We May Think, Atlantic Monthly, http://www.theatlantic.com/doc/194507/bush
- Kelly, K., 2006, Scan This Book!, New York Times Magazine, http://www.kk.org/writings/scan_this_book.php
- Owen, D., 2004, Copies in Seconds: How a Lone Inventor and an Unknown Company Created the Biggest Communication Breakthrough Since Gutenberg — Chester Carlson and the Birth of the Xerox Machine
- Searle, J. R., 1997, The Construction of Social Reality
- Anderson, B., 2006, Imagined Communities: Reflections on the Origin and Spread of Nationalism, New Edition
- Levy, D., 2003, Scrolling Forward: Making Sense of Documents in the Digital Age
- Gladwell, M., 2002, The Social Life of Paper, New Yorker Magazine, http://www.gladwell.com/2002/2002_03_25_a_paper.htm
- Lewis, D. K., 2002, Convention: A Philosophical Study (Revised edition)
- Pedauque, R. T., Document: Form, Sign and Medium, as Reformulated for Electronic Documents
- Romano, F., 1989, Pocket Guide to Digital Prepress
- Sweet, J., 2003, Document Boundaries, Master's thesis, Rochester Institute of Technology
- Wesch, M., 2006, The Machine is Us/ing Us, video short documentary, http://www.youtube.com/watch?v=6gmP4nk0EOE