As
described above, the Internet Archive (IA) is a comprehensive
web-based digital library that continually crawls and
archives all public internet pages, which are then publicly
accessible through the Wayback Machine search service.
Some of the common principles of digital archives
are also defining motivations for IA, while in other respects
IA differs from them in important ways. The project team recognized
a need to identify, understand and satisfy the requirements
of the Internet Archive's growing user base. Improvements
to the archive's usefulness to its users could
then be identified to maximize the collective benefit.
A qualitative data-gathering process was implemented,
and it produced a list of recommendations
that provide a platform for future work related to this
information service. We feel that addressing these concerns
will enhance the user's experience with the Wayback Machine.
The research team for this project consisted of three graduate
students - Pallavi Aravind, Vanessa Arce, and Peter Roessler
- of the School of Information Management and Systems at
the University of California, Berkeley. This research was
carried out under the direction of Professor Peter Lyman,
whose focus is on the ethnographic study of communication
and social formation in digital and networked environments.
Introduction to Digital Archives
The role of archives has traditionally been the preservation
of cultural heritage through artifacts, as in libraries (focused
on books and allied media) and museums (focused on carefully
selected textual documents, graphical objects like paintings,
and structures like sculpture). A binding characteristic
of these artifacts is that they "move through life
cycles. They are created, edited, described and indexed,
disseminated, acquired, used, annotated, revised, re-created,
modified and retained for future use or destroyed by a complex,
interwoven community of creators and other owners, disseminators,
value-added services, and institutional and individual users."
Without some form of static repository, these and other
factors clearly limit what can be understood, at least from
an anthropological perspective, about the cultural context
of the archive in question. It is clear that society then
"has a vital interest in preserving materials that
document issues, concerns, ideas, discourse and events"
within such contexts.
A digital
archive, then, also preserves cultural and historical information
(artifacts in digital formats) and similarly unites "communit[ies]
of actors in their various information-based activities
[and] their common purpose." Today, for example, we rely
on digital archives to "track our genealogies, to understand
what science has discovered, to appreciate the stories people
told a hundred years ago, and to know how we educated our
children during the Depression." The value added
by digital archives generally reflects their "individual
purpose, tailored to the necessities of different
user groups." In addition to aggregating resources
for a specific purpose, digital archiving also helps
alleviate the common problem of accessibility,
that is, locating relevant items in large collections. Digital
archives therefore serve the same humanistic functions as
traditional archives, while their technical characteristics
provide a novel way for a user to access the information
contained within them.
To corroborate these traits, we felt it was necessary to
conduct a broad survey of existing digital archives. The
survey primarily looked for commonalities in the
motivation for aggregating their components.
We reasoned that the patterns might not be obvious,
but that careful thought on a representative sampling should
suffice for our purposes. We performed some simple searches
on popular Web search engines to assemble our sample. At first glance,
every major digital archive we explored maintained, without
exception, some content specificity. Representative examples
were as follows:
· NASA's digital image collection3
· The Digital National Security Archive4 - the most
comprehensive digital collection of declassified primary
documents defining U.S. government policy
· USGenWeb Project5 - offers transcriptions of public
domain records for genealogical research
· Swaziland Digital Archive6 - focuses on the country's
historical photographs
· Japanese American Relocation Digital Archives
(JARDA)7 - a "thematic collection" documenting
the experience of Japanese Americans in World War II internment
camps
· UCSF Tobacco Control Archives8 - provides papers,
unpublished documents and electronic resources relevant
to tobacco control
· The Pandora Archive of the National Library of
Australia9
It seemed
each archive was created purposefully to support specific
tasks and, in many cases, to provide topic-focused content to
its audience, "rubrics for coordinating a user's
group of common activities". Each had specific users
in mind, whose needs implied closely related usage scenarios.
This has been called an 'actor-network'
scenario, "linking people and things in the environment"10.
There was an implicit principle of historic preservation
illustrated in each example. All were deliberate collections
of specialized digital artifacts created to ensure their
availability.
Introduction to Internet Archive
It seems
most appropriate to begin describing the motivation for
the Internet Archive, the Wayback Machine service, and ultimately
the subsequent research described here, by presenting the
statement offered up by the service itself.
"The
Internet Archive Wayback Machine is a service that allows
people to visit archived versions of stored Web sites. Visitors
to the Wayback Machine can type in a URL, select a date
range, and then begin surfing on an archived version of
the Web. Imagine surfing circa 1999 and looking at all the
Y2K hype, or revisiting an older copy of your favorite Web
site. The Internet Archive Wayback Machine can make all
of this possible."11
As straightforward
as this statement might seem, it leaves open many questions
about what the Archive actually is. The Web, we know, is
"the largest document ever written" and "ninety-five
percent of Web pages are publicly-accessible."12
This makes the Internet Archive unique in content
and scope. One might wonder how a comprehensive archive
such as this is possible or, more importantly, what the
motivations behind such an endeavor might be in the first
place, in order to better understand its utility. We learned
that this "Internet equivalent of the Library of Congress
has been capturing and archiving every public Web page since
1996" and that there are two clearly stated
motivations: its use for documenting the provenance
of the Internet, as "a historical record of cyberspace
[and] as part of an innovative search tool that lets users
call up 'out-of-print' Web pages." This is coupled
with grander plans to then "make [the Archive] part
of the infrastructure of the Internet."13 Given this high-level
context and purpose, there naturally seems to be more to
the story of the Archive under the surface. A collection
and service of this magnitude couldn't possibly be summed
up in a few sentences. The motivations
described here are certainly plausible, even noble.
Yet one is left to wonder what the Archive is actually
making possible. One now knows that one can go back and
look at an old version of a Web site, but one might wonder,
as we at the School of Information Management and Systems
did, if that is realistically the Wayback Machine's sole
use, or if there are also unexplored, undocumented, or unrealized
uses beyond what is touted.
The
sheer scope of this digital collection raises many new
questions. What exactly has been archived in the public
domain? Is everything that was ever out there really available
to view? What is the ultimate purpose for collecting all
of this information? Who is using this on a regular basis?
Why? What for? Before we could determine the Archive's relevance
and usefulness to anyone (or at least better define them)
and focus the scope of our research pursuits, we felt it
was imperative to critique the Archive in terms of what
we learned about other digital collections. It seemed an
ambitious and misdirected task to assess any of the described
motivations without taking a look at similar digital archives.
Given
that archives serve as historical artifacts within the context
of a specific topic, we recognized that despite the unprecedented
scale in collection size, the Internet Archive had no specific
topic of focus to speak of. One could say that the Archive
uniquely attempts to capture all possible topics at once.
Should the focus for this unique archive be, then, simply
to continue to preserve valuable social and cultural artifacts,
to provide a variety of topic-specific content for academic,
research or other purposes like other digital archives,
or is it intended to be all-encompassing? There is also
the possibility of future benefits, not yet known, that
would result from its usefulness in conjunction with technological
innovation or some future social context.14
Initial Evaluation by Research Team
We therefore
observed, to a large extent, examples of digital archives
that collectively encompassed many varied topics and that
were each content-centric. The implicit motivations behind
many of them were quite similar to what has been stated
about the Archive. Yet all were also largely defined by
their specialized contents, a characteristic missing from
the Internet Archive. What we were seeing was a major divergence
from the common threads of most other digital archives.
Here is a vast collection mirroring the Internet itself
across time, something wholly unique. If any user of the
other archives is, by default, pursuing some
specific interest or need in using them,
then what of the users of this 'Wayback Machine'?
Perhaps there are some useful general trends in
its use that are simply less obvious than they are for
other archives.
Perhaps
less obvious was how we might juxtapose the idea of 'users
with a common purpose' from the survey with any user of
the Internet Archive. The literature makes the point that,
in archiving, "the intellectual integrity of
[the artifacts of the archives] is maintained and [the]
individual [artifacts] are always contextualized."10
It was obvious that this assumed contextualization was
missing from the Internet Archive altogether, as it encompassed
any number of possible categories for its terabytes of Web
pages and associated metadata. We then wondered what the
most common contexts might be for current users of the Archive
and in what ways their needs might be overlooked.
Given the breadth of the collection itself, we thought
some sort of user study would have to be carried out to
define any 'users with a common purpose'. Work needed to
be done to collect the user information that is
normally obvious for other digital collections.
We wanted
to map out a process for identifying these user communities
in order to examine whether the tool supports
those communities and their usage patterns. Gilliland-Swetland
commented on this approach, stating that "it is important
to understand the societal roles of archives because it
is in the fulfillment of these roles that archivists provide
the necessary skills and knowledge to contribute to the
[current] paradigm."15 Might we identify and define
ways, beyond what is described here at the surface, in which
the Archive could be better utilized? Are there demands
for potential use that are not currently satisfied? Would
we improve the current and future usefulness of such
a unique information service if we addressed its pitfalls
and potential alike with a user-centered approach?
We therefore made some initial decisions about the necessity
of qualitative data collection. This was grounded not only
in our recognition of the inherent qualities of other
existing digital archives but also in our resulting suspicion
that some common threads worth documenting must exist
among the total population of users. It seemed logical to
carry out some foundational research here for future related
projects and for larger-scale user population sampling and
data collection in order to support our findings.