What Is Library Research?
In preparing to carry out library research in support of the MSEV Capstone Project, it is helpful first to classify research.
Primary research is a unique investigation carried out by a researcher. The investigation must conform to a valid research design. Examples of primary research include case studies, experiments, statistically valid surveys, and other types of investigations. At MSOE, certain types of primary research must have prior approval from the Institutional Review Board (IRB).
Secondary research consists of documents and sources that contain data, information, and results generated from primary research. Secondary research generally exists in the form of reports, studies, and company documents, in addition to books and journal articles. As such, library research often uncovers secondary research. Secondary research publishes or in some way makes available the results obtained in primary research investigations.
Tertiary research consists of documents and sources that summarize or report on information contained in secondary research. Tertiary research typically is found in many kinds of documents, including books, articles, and newspapers. Library research is the most effective way to locate and obtain tertiary research.
No one type of research is necessarily superior to the other types. Rather, all research must be critically evaluated. For details on evaluating research, see below. As a general rule of thumb, if asked about the use of a resource in a report, research paper, or project, the MSEV student should be able to defend the resource on one or more of the following grounds:
- The resource is objective, accurate, or reliable
- The resource contributes to an understanding of the Capstone topic, or is associated with a publication that enjoys the general reputation of contributing to the practice and understanding of the Capstone topic
- The resource in some way is significantly relevant to the purposes of the Capstone Project.
Steps To Follow Before You Begin Library Research
Effective library research begins before an actual search is undertaken. Consider the following activities.
- Write up a list of concepts associated with your topic.
Think broadly. Work outward to the "big picture."
- Write a description of your topic with
professional jargon whose intended target is
knowledgeable professionals. Use this description to
generate a list of search terms.
- Write a description of your topic for a layperson with little
or no knowledge of your topic. Use the description to generate a
list of search terms.
- Use the Ulrich's Database to identify any journals that may publish articles on your topic or related to it (or use other resources described elsewhere in this tutorial). Find out where those journals are INDEXED by also looking in the Ulrich's Database. Focus on those databases where the relevant journals are indexed.
- Identify any professional associations, organizations, groups, etc. that may be related to your topic. Look them up in the Encyclopedia of Associations. It is also not unusual to discover associations and groups when you start searching the literature. Try to verify if the group has a library and if it sponsors research.
- Establish a plan or a system for tracking bibliographic information. Be prepared to provide copies of all work cited in your project.
Tips For The Library Database Searching Process
Consider the following tips when using library databases to carry out library research:
- It is fundamentally important to realize that for
many reports, research papers, and projects, a student
should not rely exclusively on database searching
to locate useful information. For a brief paper or
project, a database search in an excellent business and
management database may be sufficient. However,
certainly for more ambitious projects -- including
thesis and capstone work -- the student should
not neglect print resources. Many good resources
are available only in print at MSOE. A database search is not
necessarily a comprehensive search; it may even be the wrong
type of search, depending on the research topic.
- It is additionally important to understand that
library databases and web search engines are two very
different things. Library databases are true
databases that often feature indexing and other features.
Search engines are essentially collections of web pages
that can be searched, sometimes with sophisticated
weighting algorithms. Library databases tend to
provide access to published literature (i.e.,
literature published in books and journals); search engines
provide access to web pages, which sometimes feature
published literature, but usually not. See below for
further details about using the Web for research.
- Database searching is a process. Work in
stages. Use the literature itself and your
results to modify searches, to suggest other search
terms, concepts, and ideas, and to suggest ways of
systematically adding and subtracting individual
search components. In other words, stop searching,
and look at what you've got. Look for clues. Use these
clues, other search terms, and additional ideas to
rework search statements.
- When doing library research in a database, try a
quick, "ideal" search first. If no results -- or
no relevant results -- are retrieved, you may need to
eliminate search terms. This broadens the search. You may
need to work through results and gradually achieve
focus. You may also have to use controlled vocabulary
(see below).
- Study all available help documentation for databases.
To search databases effectively, it is often necessary
to employ the native search syntax. Learn what to expect
from the database, and what it can and can't do. Does it
support Boolean searching, exact match searching, and
other features? What journals are actually indexed?
Does full-text mean that all articles in a journal appear
in the database, or only selected articles?
- Keep a log of your search statements. It may seem
tedious, but it will save time in the long run. Moreover,
try variations of the search statements that appear in
your log.
- Block out the time for database searching. A wealth
of information is available. You can literally search
hundreds of databases. The sheer number available,
however, and the fact that they are all different means
that database searching is tedious. It is time-consuming.
Bibliographic software may help to smooth the process a
bit, but it cannot replace the cognitive activities--the
intellectual evaluation and critical assessment of
material--that take place in database searching. Only
you can do that. Accordingly, you need to plan and make
the time to do the searching.
- Be systematic and persistent. For large projects,
including thesis and capstone work, consider searching
in a large majority of databases available to you here
at MSOE. Eliminate obvious databases that are probably
not useful for your purposes,
but ultimately do not neglect at least evaluating the merits
of searching every database available to you. Take the time
to look in databases that may not seem to be good candidates
at first glance.
For short papers and projects, it may not be necessary to search all databases, but be sure that you search the relevant ones in the correct manner.
However, for both large projects and smaller projects, always let your results guide you.
In practice, it is often the case that useful information may be retrieved from a database that initially may not appear to be a likely candidate for supplying helpful information.
- Study the index fields and their designations in
your selected database. If you wish to limit your search
statement in a database, one effective way to do it is by
means of index fields.
- Employ Boolean search techniques in databases. If you
are not familiar with simple Boolean searching, study the
Help documentation.
- Use the Advanced Features of database services. If a service offers "Natural Language Searching," try it.
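The tips above on working in stages, using Boolean searching, and keeping a log of search statements can be combined in a small script. The following is a minimal sketch only. The concept lists, the function names (`build_statement`, `log_search`), and the generic AND/OR syntax are illustrative assumptions; each database has its own native search syntax, so check its help documentation before reusing a statement.

```python
import csv
import io
from datetime import date


def build_statement(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


def log_search(writer, database, statement, hits=None, note=""):
    """Append one row to the search log (hits is filled in by hand)."""
    writer.writerow([date.today().isoformat(), database, statement, hits, note])


# Hypothetical concept groups for a topic; replace with your own terms.
concepts = [["crisis management", "emergency planning"], ["manager", "executive"]]
statement = build_statement(concepts)

# The log is kept in memory here; a real log would be written to a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["date", "database", "statement", "hits", "note"])
log_search(writer, "ABI/Inform", statement, note="quick 'ideal' search")
# Broaden the search by eliminating a concept, as suggested in the tips.
log_search(writer, "ABI/Inform", build_statement(concepts[:1]), note="broader retry")
print(buffer.getvalue())
```

Keeping the statement builder separate from the log makes it easy to try variations of a logged statement later, which is the point of the log.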
Important Library Research Considerations
Two concepts are particularly useful to understand when carrying out library research: controlled vocabulary versus keyword searching, and use of the citation network to locate relevant literature.
Controlled Vocabulary versus Keyword Searching
This concept is particularly important in database searching.
The Growth in Information
As a result of many factors, including increased rates of literacy, the spread of education to the middle classes, the decline in state censorship, and the development of better printing technology, a great increase in reading matter occurred in the world beginning at about 1650 (see "Publishing" in Encyclopedia Britannica, 2002, The New Encyclopedia Britannica (Chicago: Encyclopedia Britannica, Inc.), pp. 415-449). Book publishing in Britain, for example, produced roughly 100 new titles each year up to 1750, "rising to 600 by 1825, and to 6,000 by the end of the century" (see "Publishing" in New Encyclopedia Britannica, p. 425). Wang, citing a book by Ziman (J. Ziman, 1976, The Force of Knowledge: The Scientific Dimension of Society (Cambridge: Cambridge University Press)), reports that
| from about 1750 to about 1950, the growth of professional journals followed the mathematical relationship: [equation not reproduced in this text] where Y is the number of journals and X is the year. Thus, by the year 2000, approximately 1 million journals ... have been issued (see Shirley Wang, "Publishing in Professional Journals," in Richard H. McCuen (ed.), 1996, The Elements of Academic Research (New York, New York: American Society of Civil Engineers), p. 254). |
Worldwide, a total of 968,735 books were published in 1999; book publishing grows at a rate of 2% annually, which means that over a million books are now published each year (see "Executive Summary" in Peter Lyman and Hal Varian, 2000, "How Much Information" [Internet, WWW], ADDRESS: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html [Accessed: 5 December 2002]).
According to R.S. Wurman, "a weekday edition of The New York Times contains more information than the average person was likely to come across in a lifetime in seventeenth-century England" (see Richard Saul Wurman, 1989, Information Anxiety (New York: Doubleday), p. 32).
Currently, approximately 120,000 journals and magazines are published worldwide (see "Executive Summary" in Peter Lyman and Hal Varian, 2000, "How Much Information" [Internet, WWW], ADDRESS: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html [Accessed: 5 December 2002]).
Altogether, the "world produces between 1 and 2 exabytes of unique information per year, which is roughly 250 megabytes for every man, woman, and child on earth. An exabyte is a billion gigabytes, or 10^18 bytes" (see "Executive Summary" in Peter Lyman and Hal Varian, 2000, "How Much Information" [Internet, WWW], ADDRESS: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html [Accessed: 5 December 2002]).
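The per-person figure in the quotation can be checked with simple arithmetic. The sketch below is illustrative only: the world population of roughly 6 billion (about the year 2000) is an assumption that is not given in the source, and decimal megabytes are assumed.

```python
EXABYTE = 10**18   # bytes, as defined in the quotation
MEGABYTE = 10**6   # bytes (decimal megabyte, an assumption)

# Assumption: a world population of roughly 6 billion around the year 2000.
# This figure does not appear in the source.
WORLD_POPULATION = 6_000_000_000


def per_capita_megabytes(exabytes):
    """Convert a yearly world total in exabytes to megabytes per person."""
    return exabytes * EXABYTE / WORLD_POPULATION / MEGABYTE


low = per_capita_megabytes(1)         # lower bound of the quoted range
high = per_capita_megabytes(2)        # upper bound of the quoted range
midpoint = per_capita_megabytes(1.5)  # midpoint of the quoted range

print(f"{low:.0f} to {high:.0f} MB per person; midpoint {midpoint:.0f} MB")
```

Under these assumptions the midpoint of the 1 to 2 exabyte range works out to about 250 megabytes per person, which is consistent with the quoted figure.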
Over time, as the amount of information grew in the world, methods were developed -- and continue to be refined and developed -- to enable people to learn what information was -- and is -- available. This is crucial for information retrieval.
Early methods included simple inventories of books.
Subject Indexing
Eventually, it was recognized that alphabetized and systematic subject lists would also be useful. A person using a subject list could search for information in terms of a subject area. The list -- consisting of words called "subject terms," or "subject headings," or simply "subjects" -- was employed to organize, label and categorize information -- and thus facilitate the retrieval of information.
"Subject indexing" is the process of assigning "subject terms" or "subject headings" to documents (such as books, articles, etc.). Subject indexing is carried out by an indexer: "terms assigned by an indexer serve as access points through which (an) ... item can be located and retrieved in a subject search in a published index or ... database" (see F.W. Lancaster, 1991, Indexing and Abstracting in Theory and Practice (Champaign, IL: University of Illinois), p. 5).
Subject indexing entails "conceptual analysis," or the process of "deciding what a document is about -- that is, what it covers" (see F.W. Lancaster, 1991, Indexing and Abstracting in Theory and Practice (Champaign, IL: University of Illinois), p. 8).
A multitude of subject lists currently exist. A variety of organizations produce subject lists, including the producers of print indexes that index journal articles and the producers of databases that index documents. In the case of database producers, these lists are often referred to as "thesauri." One of the earliest subject heading lists was the Subject Headings Used in the Dictionary Catalogues of the Library of Congress, which first appeared in 1914 (see Lois Mai Chan, 1995, Library of Congress Subject Headings: Principles and Application, 3rd ed. (Englewood, CO: Libraries Unlimited, Inc.), p. 5). Library of Congress Subject Headings (LCSHs) are assigned to books by Library of Congress subject indexers.
Controlled Vocabulary
It may be helpful to conceive of a subject list as a standard. The list is essentially an authorized list of words that can be assigned to documents in an effort to describe what each document is about. As such, the words in the subject list -- that is, the subject headings -- are often described as controlled vocabulary.
Controlled vocabulary
| is more than a mere list. It will generally incorporate some form of semantic structure. In particular, this structure is designed to: |
An important advantage of controlled vocabulary is that it facilitates the grouping of information in one place.
For example, a database producer may index all documents on "computer chips" under the subject heading of "integrated circuits." A searcher simply has to perform a subject search on "integrated circuits" to find all relevant documents. If the database producer has done a good job of indexing and assigning subject headings, all documents retrieved should be concerned with the subject in a significant manner. In other words, searching with controlled vocabulary can be efficient, effective, and precise.
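The grouping effect described above can be sketched in a few lines of code. This is a toy illustration, not how any particular database is implemented: the thesaurus entries, the sample records, and the function name `subject_search` are all hypothetical.

```python
# Hypothetical thesaurus: entry term -> authorized subject heading.
THESAURUS = {
    "computer chips": "integrated circuits",
    "microchips": "integrated circuits",
    "integrated circuits": "integrated circuits",
}

# Hypothetical records, each indexed with authorized subject headings only.
RECORDS = [
    {"title": "Designing with microchips",
     "subjects": {"integrated circuits"}},
    {"title": "Chip fabrication basics",
     "subjects": {"integrated circuits", "semiconductors"}},
    {"title": "Potato chips and snack foods",
     "subjects": {"snack foods"}},
]


def subject_search(term):
    """Map a term to its authorized heading, then return matching titles."""
    heading = THESAURUS.get(term.lower())
    if heading is None:
        # The term is not in the controlled vocabulary at all.
        return None, []
    hits = [r["title"] for r in RECORDS if heading in r["subjects"]]
    return heading, hits


print(subject_search("Computer chips"))
```

Because every record on the topic carries the same authorized heading, a single subject search gathers them in one place, and the record about snack foods is never retrieved even though its title contains the word "chips."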
To see this in action, consider a subject search in the MSOE Library Catalog on the term "computer chips" which produces the following result:
| SUBJECT | TITLES |
| Computer chips | 0 |
| | 69 |
A second advantage of controlled vocabulary is that it helps to suggest relationships and other search terms.
A subject search on the term integrated circuits produces additional search terms:
| SUBJECT | TITLES |
| Integrated circuits | 69 |
| | 124 |
| | 3 |
| | 24 |
| | 1 |
| | 1 |
| | 1 |
| | 1 |
| | 1 |
| | 8 |
A final advantage of controlled vocabulary searching is that it eliminates confusion between identical words with different meanings.
Consider the following subject search on "aids" in the MSOE Library catalog:
| SUBJECT | TITLES |
| Aids | 0 |
| | 7 |
| | 6 |
| Aids, Audio-Visual | 0 |
| | 12 |
| Aids, First | 0 |
| | 2 |
| Aids to navigation | 0 |
| | 2 |
Virtually each database service, and certainly each indexing service, features its own controlled vocabulary list.
How do you know which words are controlled vocabulary words in a particular database service? You must consult a subject list or thesaurus for the database, or you can carry out a keyword search and then check the results for controlled vocabulary. Many database thesauri are now available online.
To see controlled vocabulary assigned to documents in a database, carefully check the database record for the document, and look for field designations such as descriptors or subjects.
For example, following is the database record of a journal article. The database record is from the EI Compendex engineering database. In the record, the controlled vocabulary is labeled Controlled Terms.
| "Equipment replacement decisions and lean manufacturing" Sullivan, William G. (Grado Dept. of Indust./Systems Eng., Virginia Polytechnic Institute, Virginia State University, Blacksburg, VA, United States); McDonald, Thomas N.; Van Aken, Eileen M. Source: Robotics and Computer-Integrated Manufacturing , v 18, n 3-4, June/August , 2002, 11th International Conference on Flexible Manufacturing, Dublin, p 255-265. Publisher: Elsevier Science Ltd, 2002 ISSN: 0736-5845 CODEN: RCIMEB In English |
| Controlled Terms: |
| Computer integrated manufacturing | Cellular manufacturing | Product design | Costs | Investments | Decision theory | Problem solving |
Keyword Searching
A typical database record consists of fields. We have already learned that one such field is the controlled vocabulary field. Other fields usually include:
- author
- title
- uncontrolled vocabulary
- classification codes
and other fields.
A keyword search in a database searches all of these fields; it may also search the full text of documents if they are available in the database. As a result, keyword searching is not always precise. For example, the following basic search in the ABI/Inform business and management database --
crisis management
-- retrieves a large number of full-text articles, including:
The article is a discussion of the foreign policy of the European Union. In passing, it refers to the "Rapid Reaction Force," "a military arm of the EU," which is in place for "peacekeeping and crisis management."
For a student writing a report on crisis management and the manager's responsibility, this document is clearly not relevant. Instead, further searching reveals that the correct controlled vocabulary term in the database is management of crises. Searching on this controlled term retrieves only relevant articles.
Today, many database services are becoming "smarter." For example, some databases respond to the search above on crisis management by automatically searching for the two words as a phrase. Other databases do not. In other words, records can be returned in which "crisis" may appear in one field and "management" may appear in the full text.
One way to force searching as a phrase in this case is to employ an exact match search. An exact match search features quotation marks around a phrase that is intended to be searched as a phrase:
"crisis management"
Effective keyword searching most likely will entail the use of Boolean searching. Consult The Library of Congress tutorial on Boolean Searching for details. Keyword Boolean searching combined with field designations (other than controlled vocabulary) in databases can be very powerful. For example, the following search in the ABI/Inform business and management database retrieves articles that feature the exact match phrase of "knowledge management" in the title of the document:
Keyword searching can be extremely useful for the following reasons:
- To search the full text of documents and to pull
relevant content from within documents
- To search for new terms and concepts which
have not been assigned controlled vocabulary
- To search for specific topics, including names and organizations.
Good library research should employ both controlled vocabulary and keyword searching.
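The difference between a plain keyword search, an exact match (phrase) search, and a search limited to a single field can be illustrated with a toy example. The sketch below is not the search logic of ABI/Inform or of any real database; the three sample records and the function names are hypothetical, and matching is done by simple lowercase substring comparison.

```python
FIELDS = ("title", "abstract", "full_text")

# Hypothetical records; the second mirrors the European Union article above.
RECORDS = [
    {"title": "Crisis management and the manager's responsibility",
     "abstract": "Duties of managers before, during, and after a crisis.",
     "full_text": "Planning and communication are central duties."},
    {"title": "The foreign policy of the European Union",
     "abstract": "A discussion of EU foreign policy.",
     "full_text": "The Rapid Reaction Force is in place for peacekeeping "
                  "and crisis management."},
    {"title": "Debt crisis in emerging markets",
     "abstract": "Sovereign debt and currency risk.",
     "full_text": "Fund management firms reduced their exposure."},
]


def keyword_search(words, fields=FIELDS):
    """Titles of records where every word appears somewhere in the fields."""
    hits = []
    for record in RECORDS:
        text = " ".join(record[f] for f in fields).lower()
        if all(word.lower() in text for word in words):
            hits.append(record["title"])
    return hits


def phrase_search(phrase, fields=FIELDS):
    """Titles of records where the exact phrase appears within one field."""
    phrase = phrase.lower()
    return [r["title"] for r in RECORDS
            if any(phrase in r[f].lower() for f in fields)]


print(keyword_search(["crisis", "management"]))            # all three records
print(phrase_search("crisis management"))                  # first two records
print(phrase_search("crisis management", fields=("title",)))  # first record only
```

In this toy data the keyword search retrieves the article on emerging markets because "crisis" and "management" appear in different fields; the phrase search drops it but still retrieves the European Union article; limiting the phrase to the title field leaves only the record that is actually about the topic.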
The Citation Network
The essential idea of the citation network is simple:
| Virtually all scholarly documents (journal and magazine articles, books and chapters in books, letters to the editor, reports, etc.) include references to related work, in the form of footnotes or lists of references. These references, or citations, are usually bibliographic descriptions of other documents in which related material may be found ... . These lists of citations are of obvious value in referring the reader to related, generally older work: following up references is perhaps the most widely used technique of information retrieval (see David Bawden, "Citation Indexing" in C.J. Armstrong and J.A. Large (eds.), 1987, Manual of Online Search Strategies (Boston, Massachusetts: G.K. Hall and Co.), pp. 44-45). |
Using the citation network entails precisely the "following up (of) references." In today's networked environment, it also entails (a) seeing what else a cited author has written and (b) determining if an article is cited as a reference in another document. The idea of seeing where else an article is cited is based on the notion that any document citing an article may be concerned with the same topic. In this manner, a bibliography can be quickly constructed.
Specialized citation indexing databases exist to see how many times and where a specific document has been cited. These databases are not available for general use at MSOE.
However, many databases available to MSEV students from the library's website allow students -- to a limited degree -- to see quickly and easily what else an author has published and where else a specific document might be cited.
In most databases today, the names of authors are hot-linked; it is therefore quick and easy to search for additional documents by the same author.
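The three ways of using the citation network described above (following up references, seeing what else an author has written, and seeing where a document is cited) can be sketched as simple lookups. The data below are entirely hypothetical, and the function names are illustrative; a real search would rely on the linking features of the database itself.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["A", "C"],
}

# Hypothetical first author of each document.
AUTHORS = {"A": "Smith", "B": "Jones", "C": "Smith", "D": "Lee"}


def references_of(doc):
    """Follow up references: the (generally older) work that doc cites."""
    return sorted(CITES.get(doc, []))


def cited_by(doc):
    """Find the (generally newer) documents that cite doc."""
    return sorted(d for d, refs in CITES.items() if doc in refs)


def same_author(doc):
    """Find other documents by the same author."""
    return sorted(d for d, a in AUTHORS.items()
                  if a == AUTHORS[doc] and d != doc)


def build_bibliography(start):
    """Combine all three techniques into one candidate reading list."""
    return sorted(set(references_of(start))
                  | set(cited_by(start))
                  | set(same_author(start)))


print(build_bibliography("A"))
```

Starting from a single relevant article, the three lookups together yield a candidate bibliography; each candidate still has to be evaluated for relevance and quality.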
Evaluating Research Results
In preparing to carry out library research in support of the MSEV Capstone Project, the student must approach the task with a critical eye. Whereas at the undergraduate level it may have been sufficient to complete a report or project by simply locating and using a stipulated number of resources, at the graduate level, students must at all times go beyond simple quantity measures and seek out quality information resources. In other words, the graduate student must critically evaluate the results obtained in all library research. Not all information obtained in library research is reliable or valid:
| The regular user of ... information often develops a healthy skepticism about information provided by others. There are many ways that data may be misleading if they are not evaluated carefully (David W. Stewart and Michael A. Kamins, 1993, Secondary Research: Information Sources and Methods 2nd ed. (Newbury Park: Sage Publications), p. 17). |
In the excellent chapter entitled "Evaluating Secondary Sources" in their book Secondary Research, Stewart and Kamins illustrate the need to "question information collected and reported by others" (p. 18) by considering the case of Tambrands vs. the Warner-Lambert Company (see David W. Stewart and Michael A. Kamins, 1993, Secondary Research: Information Sources and Methods 2nd ed. (Newbury Park: Sage Publications), pp. 17-32 for Chapter 2; see pp. 17-18 for the Tambrands summary).
Warner-Lambert claimed that their home-pregnancy test EPT Plus provided results in "as soon as 10 minutes" (Stewart and Kamins, p. 17). Warner-Lambert claimed that these results were based on a research study. Questioning the validity of the claim, Tambrands (a competitor) took Warner-Lambert to court.
In court, Warner-Lambert revealed that the research study employed to substantiate the claim actually involved testing of only 19 pregnant women (Stewart and Kamins, p. 18). A total of 10 (52.6%) of these women obtained results in 10 minutes, and on this basis, Warner-Lambert made their advertising claim (Stewart and Kamins, p. 18).
In fact, the 19 women who were tested "were actually enrolled at a Cincinnati fertility clinic" (Stewart and Kamins, p. 18); more importantly, the finding that 52.6% of the women obtained results in 10 minutes was not statistically significant. In this case, a statistically significant sampling would have entailed the testing of "approximately 1,400 women." Warner-Lambert's claim was meaningless (Stewart and Kamins, p. 18).
It follows that part of the preparation for library research entails clarifying questions that should be asked in order to evaluate results to determine their reliability and validity. In evaluating information that is retrieved in library research, the student should begin by seeking answers to the following questions:
How accurate is the information?
- Does the information appear to be reliable and error-free?
- Is there an indication that someone (an editor, a review board) checks and verifies the information?
- For information that reports primary research: Who was responsible for collecting data? What data were actually collected? How were the data collected? How were the data measured? Were the methods employed to collect data standard for the research area?
How authoritative is the information?
- Who is the author? For websites, try searching a registry of websites, such as InterNIC's Whois. A registry can tell you who has registered a website.
- Is the author an expert? Is the author qualified to write on the topic?
- For information that reports primary research: Did a particular group or organization sponsor the research? Did the group or organization have a purpose in mind for carrying out the primary research?
Is the information objective?
- Is the information presented in a manner that seeks to minimize bias?
- Is the information an example of "advocacy research"? That is, is its purpose from the outset to provide support for a particular conclusion? Or, does it begin with a "clean slate" and no preconceived notions about conclusions?
- For information that reports primary research: What are the findings? For other types of research, what conclusions are presented?
- Do the authors place limitations on their topic? Are there limitations on findings and conclusions?
- For information that purports to be objective, is it supported by scholarly apparatus, such as footnotes, references, a bibliography?
- For information that purports to be objective, is the tone strident? Does it exhort or attempt to persuade in a reasonable manner?
How current is the information?
- Is the information dated?
- For information that reports the results of primary research: When were the data collected?
Does the information offer adequate coverage of the topic?
- Is the information consistent with information available from other sources?
- Is a literature review included? If so, is it comprehensive?
- For information that reports primary research: what suggestions are made for future studies?
The evaluation of information and research can entail a great deal of work. In some cases, an in-depth evaluation of resources may not be necessary, but in all cases the MSEV student should approach the research process with both an understanding of, and a willingness to perform, the critical evaluation of resources.
Using the Web: Why Not Just Do A Google Search To Locate Research for My MSEV Capstone Project?
Before plunging into library research, a fair question to ask is: Isn't a Google search enough to locate resources for the MSEV Capstone Project? (Or, one can employ any other publicly available search engine.) It is possible to answer this question in a number of ways.
Consider first the following compilations of references published in selected issues of three outstanding MSEV-related professional and scholarly journals: Water Environment Research, Journal of the Air & Waste Management Association, and Waste Management Research.
2005 Issues of Professional and Academic MSEV-Related Journals
| Water Environment Research [July/August 2005] | |
| Journal of the Air & Waste Management Association [August 2005] | |
| Waste Management Research [August 2005] | |
2006 Issues of Professional and Academic MSEV-Related Journals
| Water Environment Research [July/August 2006] | |
| Journal of the Air & Waste Management Association [August 2006] | |
| Waste Management Research [August 2006] | |
2007 Issues of Professional and Academic MSEV-Related Journals
| Water Environment Research [November 2007] | |
| Journal of the Air & Waste Management Association [November 2007] | |
| Waste Management Research [April 2007] | |
A total of 120 articles were published in the three years of issues. The articles feature 3,197 citations. Only 110 (3.4%) of the references are to web sites.
Taking a closer look at the references in these articles, one observes that most of the citations refer to books, journal articles, conference articles, articles from proceedings, Ph.D. dissertations, master's theses, working papers, standards, personal communications with other scholars, technical reports, government reports (mainly from the EPA), research reports, laws and regulations, and statistical reports.
In fact, a detailed study of the references in 16 articles in the August 2006 issue of the Journal of the Air & Waste Management Association reveals the following citation pattern.
| Article | Total references | (a) Number of cited technical or research reports | (b) Number of cited books / dissertations / theses | (c) Number of cited journal / conference / proceedings articles | (d) Number of cited standards / legal references | Percent of references that cite (a), (b), (c), or (d) |
| 1 | 51 | 1 | 6 | 43 | 1 | 100% |
| 2 | 61 | 0 | 14 | 46 | 1 | 100% |
| 3 | 31 | 0 | 0 | 31 | 0 | 100% |
| 4 | 16 | 0 | 4 | 11 | 1 | 94% |
| 5 | 45 | 6 | 4 | 33 | 0 | 96% |
| 6 | 14 | 3 | 0 | 11 | 0 | 100% |
| 7 | 122 | 24 | 14 | 80 | 4 | 100% |
| 8 | 23 | 2 | 5 | 15 | 1 | 100% |
| 9 | 15 | 0 | 1 | 14 | 0 | 100% |
| 10 | 9 | 1 | 1 | 7 | 0 | 100% |
| 11 | 26 | 2 | 6 | 18 | 0 | 100% |
| 12 | 19 | 4 | 0 | 14 | 0 | 95% |
| 13 | 10 | 1 | 1 | 4 | 4 | 100% |
| 14 | 38 | 4 | 0 | 34 | 0 | 100% |
| 15 | 38 | 6 | 3 | 29 | 0 | 100% |
| 16 | 40 | 17 | 1 | 15 | 6 | 98% |
| TOTALS | 558 (Average number of references per article = 35) | 71 (13%) | 60 (11%) | 405 (73%) | 19 (3%) | 555 (99%) |
Examining seven randomly selected MSEV Capstone Project Reports, we can perform a similar analysis.
First, we can examine the usage of websites in the reports.
Next, we can more closely evaluate the types of references cited.
| Capstone report | Total references | (a) Number of cited technical or research reports | (b) Number of cited books / dissertations / theses | (c) Number of cited journal / conference / proceedings articles | (d) Number of cited standards / legal references | Percent of references that cite (a), (b), (c), or (d) |
| 1 | 52 | 3 | 2 | 13 | 1 | 37% |
| 2 | 28 | 2 | 1 | 6 | 1 | 36% |
| 3 | 22 | - | 4 | 4 | - | 36% |
| 4 | 21 | - | - | - | - | 0% |
| 5 | 25 | 6 | 1 | 5 | - | 48% |
| 6 | 40 | 3 | 4 | 13 | 8 | 70% |
| 7 | 26 | - | 11 | - | 1 | 46% |
| TOTALS | 214 (Average number of references per report = 31) | 14 (7%) | 23 (11%) | 41 (19%) | 11 (5%) | 89 (42%) |
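The percentages in the table above can be reproduced with a short calculation. The sketch below simply re-enters the rows of the capstone report table (a dash in the table is treated as zero) and recomputes the last column and the totals; the variable and function names are illustrative.

```python
# (report, total references, (a), (b), (c), (d)); "-" in the table is 0.
ROWS = [
    (1, 52, 3, 2, 13, 1),
    (2, 28, 2, 1, 6, 1),
    (3, 22, 0, 4, 4, 0),
    (4, 21, 0, 0, 0, 0),
    (5, 25, 6, 1, 5, 0),
    (6, 40, 3, 4, 13, 8),
    (7, 26, 0, 11, 0, 1),
]


def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


# Last column: share of each report's references in categories (a) to (d).
row_percents = [percent(a + b + c + d, total) for _, total, a, b, c, d in ROWS]

# TOTALS row: sum of all references and of all category (a) to (d) citations.
total_refs = sum(row[1] for row in ROWS)
total_categorized = sum(sum(row[2:]) for row in ROWS)
overall_percent = percent(total_categorized, total_refs)

print(row_percents)
print(total_refs, total_categorized, overall_percent)
```

The same calculation can be applied to the reference list of a draft capstone report to see how its citation pattern compares with the tables above.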
It is important to note that we are employing citation analysis simply to describe some characteristics of two different types of documents (professional journal articles and graduate capstone reports). Comparisons are not justified because the purposes of the documents are different. But this analysis does enable us to discuss some attributes of professional literature in the Environmental Engineering field, and to consider possible lessons about the features that should characterize what is often the first serious and sustained professional writing for the MSEV student -- the MSEV Capstone Design Report.
This analysis is not an attempt to persuade you not to use the publicly available web for your work in the MSEV program.
Rather, one purpose is to point out that the professional and scholarly literature makes judicious use of the web. The web is not avoided in research, but a careful study of web sources cited reveals that scholars in MSEV-related disciplines use reliable and valid web sites. As graduate students, you should practice a similar discretion in the use of web resources for the MSEV Capstone Project.
For example, one of the web citations in the professional literature is to an Environmental Protection Agency (EPA) document entitled, "Clearing The Air: The Facts About Capping and Trading Emissions." Another citation is for the "Central California Air Quality Studies" by the Air Resources Board of the California Environmental Protection Agency.
Of course, many students have good reasons for using the publicly available web to locate resources for reports, research papers, and other projects.
These reasons include the following:
- It's fast and easy and it permits access from any location;
- You can usually find something helpful;
- Many times, there are things easily available on the web that are not easily available elsewhere (for example, data sheets, product specifications, prices, and patents);
- There are excellent resources on the web, including some scholarly publications. The United States Government also publishes a great deal of material on the web. There are also many trade journals and magazines available freely on the web (advertising pays the bill for this);
- Information is delivered electronically so it can be reused and manipulated easily;
- It minimizes dependence on libraries (see John Lubans, Jr., Fall 2002, "Act or React? Leadership and the Internet," Library Administration & Management Vol. 16(4), pp. 208-210).
However, the reports completed in the MSEV program should rely on credible resources that are well documented. These resources may be obtained from the publicly available web, but in many cases, such resources can only be located through library research. A good research approach is to employ both the publicly available web and library research, in addition to other resources as required by a class. Under no circumstances should an MSEV student limit research activities exclusively to publicly available web resources.
The web can be very useful in research, but it is necessary to understand how to use it; it's also necessary to understand what is on the publicly available web and -- more importantly -- what isn't on the publicly available web. In order to do this, you need to understand a few basic things about the web.
The "Internet Fallacy"
The "Internet Fallacy" is essentially the false belief that "everything's on the web." The reality is much different:
- FACT: Not everything is on the web.
- FACT: Information not available for free on the web includes virtually all copyright-protected documents (e.g., journal articles, conference literature, most books).
- FACT: Information not on the web includes the majority of everything ever published. One significant reason for this state of affairs is that large-scale print-to-electronic conversion is labor-intensive and expensive.
Things To Be Aware of When Using the Web For Research
Copyright Law
- Copyright law protects the intellectual property and
products of information producers, such as book publishers,
magazine and journal publishers, and others. It is hardly
in the interest of information producers to give this
information away for free on the web when they are in
business to make a profit from it. Generally speaking,
then, they don't give it away. On the web, you occasionally
find protected material, but in reality, most protected
material is not on the web. Moreover, spurred by fears
of how easy it is to disseminate materials over the web,
many publishers have successfully pushed for stricter
amendments to the copyright law.
- To realize how important copyright law is in terms of the availability of materials on the web, it may be useful to think of an analogous situation: Napster.
Web Noise
- There is no doubt that the web contains a tremendous
amount of information. Lower bound estimates suggest the
web features more than 320 million indexable pages containing
over 15 billion words (and this does not include non-indexable
sites).
- Some of these sites are very good and very useful. There
are many sites which feature scholarly documents, research
results, government publications, etc.
- Many academics -- in the interest of scholarly communication
and cooperation -- often publish documents on the web (assuming
that they own the copyright for those documents).
- Many more sites on the web, however, are devoted to
advertising and personal interests. One researcher, for
example, examined 1,160 different cited web pages which were
retrieved in answer to 60 questions. Of these pages, 33%
were either dead links or duplicate pages; only 14% provided
complete and correct answers; 10% provided correct -- but
incomplete -- answers; 8% gave incorrect information; and
an astonishing 56% provided no information to answer
the question at all (James H. Sweetland, Spring 2000,
"Reviewing the World Wide Web -- Theory Versus Reality"
Library Trends Vol. 48(4), pp. 748-768).
- The recommendations and details covered above
concerning the critical evaluation of resources
are equally important when the information is made
available on a web site. When evaluating
information on the web, it is important to ask: "Who created
this web page?", "Who placed this information on the web?",
"Is the information biased?" In other words, it is important
to critically assess information found on the web.
- Misinformation is a serious problem on the web, as
evidenced by a growing body of investigative literature
(see Heinke Kunst et al., 9 March 2002, "Accuracy of Information
on Apparently Credible Websites: Survey of Five Common Health
Topics," British Medical Journal Vol. 324(7337), pp. 581+;
Carol Ebbinghouse, October 2000, "Medical and Legal Misinformation
on the Internet," Searcher Vol. 8(9), pp. 18+; Paul S.
Piper, September 2000, "Better Read That Again: Web Hoaxes and
Misinformation," Searcher Vol. 8(8), pp. 40+).
- A significant problem associated with "web noise" is the
lack of controlled vocabulary (see above). New methods
are being developed to provide web pages with controlled indexing
(such as XML -- Extensible Markup Language), but for all
practical purposes, the vast majority of web pages do not
feature controlled indexing; they are keyword-searchable only.
As a result, web search engine searching is generally neither
precise nor efficient for in-depth searching.
- Yet another problem associated with "web noise" is the issue of dead links in search engines. Before a search engine's index can be refreshed, the web pages it refers to may have disappeared. A web page's "half-life" is estimated to be less than two years [see Jaroslav Pokorny, July/August 2004, "Web Searching and Information Retrieval," IEEE Computing in Science & Engineering, pp. 43-48].
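To make the half-life estimate concrete, the following minimal sketch shows how quickly a set of web citations could go stale. It assumes that pages disappear at a constant exponential rate (a modeling assumption added here, not a claim from the cited article) and uses two years as an upper bound on the half-life. The function name `fraction_surviving` is hypothetical.

```python
def fraction_surviving(years, half_life_years=2.0):
    """Fraction of pages still reachable after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life_years)


# Estimated share of web citations still working at several points in time.
for years in (1, 2, 4, 8):
    share = fraction_surviving(years)
    print(f"after {years} year(s): about {share:.1%} of links still work")
```

Under these assumptions, roughly half of the web citations in a report would no longer resolve two years after it was written, which is one reason to record an access date with every web citation.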
Inadequate and Biased Information Retrieval on the Web
- No single web search engine covers more than 34% of the
web, with an average coverage of only 18%. By combining
engines, coverage can be increased, but does not exceed
50% [Steve Lawrence and C. Lee Giles, 3 April 1998, "Searching
the World Wide Web" Science Vol. 280(5360), pp.
98-100; Steve Lawrence and C. Lee Giles, January 1999,
"Searching the Web: General and Scientific Information
Access" IEEE Communications Vol. 37(1), pp. 116-122;
Kurt D. Bollacker, Steve Lawrence and C. Lee Giles,
March/April 2000, "Discovering Relevant Scientific
Literature on the Web" IEEE Intelligent Systems &
Their Applications Vol. 15(2), pp. 42-47].
- Search engines do not index some content on the web,
including web-accessed databases. In fact,
according to the OCLC Online
Computer Library Center's Office of Research, only
35% of web sites are publicly available and searchable via
search engines; the other sites are private, provisional,
or not accessible to search engines.
- Search engines provide access to web sites by means of
programs called "spiders" and "crawlers," which index the
static web pages on the web.
These search engines employ traditional information retrieval (IR) algorithms and techniques to rank the importance of pages that are retrieved. For example, an algorithm may determine the importance of a page by calculating "the ranks of all the pages pointing to it, with each rank divided by the number of out-links those pages have" [see Jaroslav Pokorny, July/August 2004, "Web Searching and Information Retrieval" IEEE Computing in Science & Engineering, p. 45]. Current search engine algorithms -- such as the PageRank algorithm and Kleinberg's algorithm -- are actually based on old ideas that were developed to deal with "smaller, more coherent collections than what the Web has become" [see Pokorny, p. 44].
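The ranking rule quoted above can be sketched as a short iterative computation. This is a minimal illustration under stated assumptions, not the algorithm that any particular search engine runs: the four-page link graph, the function name `simple_pagerank`, the damping factor of 0.85, and the iteration count are all added for the example. Every page in the toy graph has at least one out-link, which keeps the sketch short.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a dict mapping each page to the pages it links to.

    Assumes every page appears as a key and has at least one out-link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Sum the rank of each page pointing here, divided by that
            # page's number of out-links (the rule quoted above).
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            # The damping term is an assumption; it gives every page a
            # small baseline rank.
            new_rank[page] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical link graph: pages A, B, and D all link to page C.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = simple_pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C ranks highest because three pages point to it, while page D ranks lowest because nothing links to it. The ranking reflects link popularity only and says nothing about the accuracy of a page's content.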
Sites that cannot be indexed by today's search engines include: HTML pages that require user input; dynamically generated web pages; web pages containing nonindexable elements (such as Flash and various image files); sites offering dynamic, real-time data; and sites with .pdf files or web-accessed databases (see Chris Sherman and Gary Price, 2001, The Invisible Web: Uncovering Information Sources Search Engines Can't See (Medford, New Jersey: Information Today, Inc.), pp. 64-68). As a result, a so-called "Invisible Web" exists that cannot be searched by search engines. The "Invisible Web" is also referred to as the "Deep Web" or the "hidden web." To reach at least some (but not all) of these "invisible" sites, the diligent researcher must use specialized search engines and databases. Fortunately, some of these specialized search engines and databases are publicly available. A good starting point includes:
- Efforts are being made to improve search engine
technology. For example, intelligent search engines are being
developed which deploy web semantics and machine learning. Web
semantics is a concept that includes techniques to tag web pages
with mark-up that "captures the meaning of the content." Other
efforts are underway to develop search engines that "identify,
retrieve, and classify Deep Web content" [see Pokorny,
July/August 2004, "Web Searching and Information Retrieval" and
Thanaa M. Ghanem and Walid G. Aref, January 2004, "Databases
Deepen the Web" IEEE Computer, pp. 116-117]. However,
much of this work is proprietary and commercial, and will likely
result in searching costs for users. Current commercial
products include BrightPlanet, Quigo Technologies'
Intellisonar, and Deep Web Technologies' Explorit.
- Academic libraries and other types of libraries are working
with search engines such as Google and Yahoo to make valuable
research materials available in search engine searches. In
particular, libraries are trying to make available digital
archives and digital libraries that they produce. An international
library organization -- the Online Computer Library Center (OCLC) -- produces
a number of unique and powerful databases, including the world's
largest bibliographic database, which is a collection of more than
12,000 library catalogs from around the world. OCLC is working with
both Google and Yahoo to make a limited number of records from this database available
in search engine searches.
Database vendors are also working to "Googlize" their content -- or rather, to make their content accessible via Google to their paying customers. Access, however, remains restricted to paying customers.
Librarians, therefore, currently are making strong efforts to work with search engine companies to make materials available in search engine searches, but some materials will most likely never be available in a digital format. Most books published before 1995 fall into this category, as well as many older journal articles, newspaper articles, historical maps, archives, letters, diaries, older census statistics, and genealogical materials.
- A growing concern is also the bias of web search engines.
Researchers have confirmed that "many leading search engines
give prominence to popular, wealthy, and powerful sites at
the expense of others" and that "the rich and powerful clearly
can influence search engine tendencies; their dollars can,
and in some ways already do, play a decisive role in what sites
a given search retrieves" (for an excellent article on this topic,
see Lucas Introna and Helen Nissenbaum, January 2000, "Defining
the Web: The Politics of Search Engines," Computer
(IEEE) Vol. 33(1), pp. 54-62).
- Following the terrorist attacks on September 11, 2001, another issue to consider with respect to government-sponsored information and research is the fact that some of it "is disappearing from government web sites, much of it in the name of national security." For an article about this topic, see Marylaine Block, 6 December 2002, "Disappearing Data," Ex Libris, Issue Number 161, [Internet, WWW], ADDRESS: http://marylaine.com/exlibris/xlib161.html [Accessed: 20 December 2002].
Final Thoughts on the Web
This discussion of things to be aware of when using the web for research is not intended to stop you from using the web. You shouldn't hesitate at all to try carrying out research on the web for your work in the MSEV program. To save time, however, and to ensure that you use credible resources, use the web in conjunction with a good guide to credible web resources, and keep in mind the problems associated with web research.
In the future, it will be even more imperative to understand the
strengths and weaknesses of web search engines. It is likely that
two types of search engines will be available. First-generation
search engines will continue to be refined with "intelligent" features
that allow them to infer simple queries. These search engines
will index only the "surface web" (and not the Deep Web), although
no effort will be made to index the entire surface web. Ranking of results of
searches in these search engines increasingly will be based on
payments and popularity. Full-text, specialized search engines, on the
other hand, will feature powerful search tools (such as complete
Boolean searching). These search engines will specialize in making
available information in specific subject areas. They will be capable
of searching the Deep Web, and they will likely be commercial
endeavors.
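For readers unfamiliar with Boolean searching, the following minimal sketch shows the idea on a tiny collection. The four document titles are invented for illustration, and the names `documents`, `index`, and `docs_with` are hypothetical. Real search tools add features such as phrase searching, truncation, and field limits that are not shown here.

```python
# Hypothetical mini-collection; the titles are invented for illustration.
documents = {
    1: "groundwater remediation at industrial sites",
    2: "air quality modeling for urban emissions",
    3: "emissions trading and air quality policy",
    4: "stormwater management and groundwater recharge",
}

# Build an inverted index: each word maps to the set of documents containing it.
index = {}
for doc_id, text in documents.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)


def docs_with(word):
    """Return the set of document ids containing `word` (empty set if none)."""
    return index.get(word, set())


# Boolean operators map directly onto set operations.
air_and_emissions = docs_with("air") & docs_with("emissions")           # AND
either_water_term = docs_with("groundwater") | docs_with("stormwater")  # OR
air_not_trading = docs_with("air") - docs_with("trading")               # NOT

print(sorted(air_and_emissions))
print(sorted(either_water_term))
print(sorted(air_not_trading))
```

AND narrows a search, OR broadens it, and NOT excludes unwanted material. Combining the three gives far more control over results than a single keyword query.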
