
Why Use The Library?

A Professional Senior Design Project

 

The engineering carried out by MSOE EE students in Senior Design Projects is not surpassed by the students of any other university or college.


However, consider the following list of references from a previous Senior Design Project. Exactly 100% of the references listed are web sites.


(An additional problem with the above list of references is that the format of the references is not helpful to the reader. When citing references in a bibliography, use the format requirements in a good Style Guide. For example, the Milwaukee School of Engineering publishes two Style Guides -- one for the Rader School of Business and one for several graduate program degrees.)


Now consider the following tally of the references published in the three most recent issues of the professional electrical engineering journals IEEE Transactions on Speech and Audio Processing, IEEE Transactions on Consumer Electronics, and IEEE Transactions on Communications:

 

2004 Issues

Journal                                           | Number of Articles | Total References | References that Cite Web Sites
IEEE Transactions on Consumer Electronics         | 61                 | 809              | 56
IEEE Transactions on Communications               | 25                 | 488              | 2
IEEE Transactions on Speech and Audio Processing  | 10                 | 277              | 0
Totals                                            | 96                 | 1,574            | 58

 

A total of 96 articles were published in the three issues. The articles feature 1,574 references. Only 58 (3.7%) of these references are to web sites. Most of the citations refer to books, journal articles, Ph.D. dissertations, and various technical documents. The number of web citations in these three publications has grown slightly in the past few years, as previous citation checks show.
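The percentages can be checked directly from the table above. The short Python sketch below recomputes the totals and the share of web citations for each journal. The variable and function names are illustrative only; the counts are those given in the table.

```python
# Counts from the 2004 table: (articles, total references, web references)
journals = {
    "IEEE Transactions on Consumer Electronics": (61, 809, 56),
    "IEEE Transactions on Communications": (25, 488, 2),
    "IEEE Transactions on Speech and Audio Processing": (10, 277, 0),
}


def web_share(total_refs, web_refs):
    """Return the percentage of references that cite web sites."""
    return 100.0 * web_refs / total_refs


# Column totals across the three journals
total_articles = sum(a for a, _, _ in journals.values())
total_refs = sum(r for _, r, _ in journals.values())
total_web = sum(w for _, _, w in journals.values())

for name, (_, refs, web) in journals.items():
    print(f"{name}: {web_share(refs, web):.1f}% web citations")

print(f"Overall: {total_articles} articles, {total_refs} references, "
      f"{web_share(total_refs, total_web):.1f}% web citations")
```

The overall figure agrees with the 3.7% quoted above, and the per-journal breakdown shows that 56 of the 58 web citations come from a single journal.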


However, as in previous years, the web citations in this year's journals make judicious use of the web. The web is not avoided in research, but a careful study of the citations reveals that professionals in EE fields tend to cite reliable and valid web sites.


For example, some of the web citations in the literature include data sheets, company websites, and technical papers:

The citation review is not an attempt to persuade you not to use the web for your Senior Design research. For one thing, the purpose of a Senior Design Project differs from the purpose of a research article in an IEEE publication.


There are also good reasons for using the web, including:

  • It's fast and easy and it permits access from any location;
  • You can usually find something helpful;
  • Many times, there are things easily available on the web that are not easily available elsewhere (for example, data sheets, product specifications, prices, and patents);
  • There are excellent resources on the web, including some scholarly publications. The United States Government also publishes a great deal of material on the Web. There are also many trade journals and magazines available freely on the Web (advertising pays the bill for this);
  • Information is delivered electronically so it can be reused and manipulated easily;
  • It minimizes dependence on libraries (see John Lubans, Jr., Fall 2002, "Act or React? Leadership and the Internet," Library Administration & Management Vol. 16(4), pp. 208-210).

The purpose of this brief investigation into references is not to discourage you from using the web. Rather, it is to suggest that you supplement web research with library research during the Senior Design Project process and that you exercise discretion in choosing websites. Combining web and library research should help you produce a professional, well-documented project report.


In addition to the examples listed above on how library research might contribute to your Senior Design Project, consider the following reasons for not limiting your research exclusively to the web.


The Web clearly can be very useful in research, but it is necessary to understand how to use it; it's also necessary to understand what is on the Web and -- more importantly -- what isn't on the Web. In order to do this, you need to understand a few basic things about the Web.


The "Internet Fallacy"

The "Internet Fallacy" is essentially the false belief that "everything's on the web". The reality is much different:

  • FACT: Not everything is on the web.
  • FACT: Information not available for free on the web includes virtually all copyright-protected documents (e.g., journal articles, conference literature, most books).
  • FACT: Information not on the web includes the majority of everything ever published. One significant reason for this state of affairs is simply that large-scale print-to-electronic conversion is labor-intensive and expensive.

Things To Be Aware of When Using the Web For Research

Copyright Law

  • Copyright law protects the intellectual property and products of information producers, such as book publishers, magazine and journal publishers, and others. It is hardly in the interest of information producers to give this information away for free on the web when they are in business to make a profit from it. Generally speaking, then, they don't give it away. You occasionally find protected material on the web, but most protected material is not there. Moreover, spurred by fears of how easily materials can be disseminated over the web, many publishers have successfully pushed for stricter amendments to the copyright law.
  • To realize how important copyright law is in terms of the availability of materials on the web, it may be useful to think of an analogous situation: Napster.


Web Noise

  • There is no doubt that the web contains a tremendous amount of information. Lower bound estimates suggest the web features more than 320 million indexable pages containing over 15 billion words (and this does not include non-indexable sites).
  • Some of these sites are very good and very useful. There are many sites which feature scholarly documents, research results, government publications, etc.
  • Many academics -- in the interest of scholarly communication and cooperation -- often publish documents on the web (assuming that they own the copyright for those documents).
  • A lot more sites on the web, however, are devoted to advertising and personal interests. One researcher, for example, examined 1,160 different cited web pages which were retrieved in answer to 60 questions. Of these pages, 33% were either dead links or duplicate pages; only 14% provided complete and correct answers; 10% provided correct -- but incomplete -- answers; 8% gave incorrect information; and an astonishing 56% provided no information to answer the question at all (James H. Sweetland, Spring 2000, "Reviewing the World Wide Web -- Theory Versus Reality" Library Trends Vol. 48(4), pp. 748-768).
  • The recommendations and details covered above concerning the critical evaluation of resources are equally important when the information is made available on a web site. When evaluating information on the web, it is important to ask: "Who created this web page?", "Who placed this information on the web?", "Is the information biased?" In other words, it is important to critically assess information found on the web.
  • Misinformation is a serious problem on the web, as evidenced by a growing body of investigative literature (see Heinke Kunst et al., 9 March 2002, "Accuracy of Information on Apparently Credible Websites: Survey of Five Common Health Topics," British Medical Journal Vol. 324(7337), pp. 581+; Carol Ebbinghouse, October 2000, "Medical and Legal Misinformation on the Internet," Searcher Vol. 8(9), pp. 18+; Paul S. Piper, September 2000, "Better Read That Again: Web Hoaxes and Misinformation," Searcher Vol. 8(8), pp. 40+).
  • A significant problem associated with "web noise" is the lack of controlled vocabulary. New methods are being developed to provide web pages with controlled indexing (such as XML -- Extensible Markup Language), but for all practical purposes, the vast majority of web pages do not feature controlled indexing; they are keyword-searchable only. As a result, web search engine searching is generally neither precise nor efficient for in-depth searching.
  • Yet another problem associated with "web noise" is the issue of dead links in search engines. Before a search engine's index can be refreshed, it is possible that the web pages it refers to have disappeared. A web page's "half-life" is estimated to be less than two years [see Jaroslav Pokorny, July/August 2004, "Web Searching and Information Retrieval," IEEE Computing in Science & Engineering, pp. 43-48].
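To get a feel for what a half-life of under two years implies for dead links, the sketch below assumes simple exponential decay, which is the model the term "half-life" suggests. The cited article is not claimed to use exactly this model, the two-year default is taken as an upper bound from the estimate above, and the function name is hypothetical.

```python
def surviving_fraction(years, half_life_years=2.0):
    """Fraction of web pages still reachable after `years`.

    Assumes exponential decay with the given half-life; this is an
    illustrative assumption, not a result from the cited article.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


for years in (1, 2, 4):
    print(f"after {years} years: {surviving_fraction(years):.0%} of pages remain")
```

Under this assumption, at most about a quarter of the web pages cited in a report would still be reachable four years later, which is one reason to record an access date with every web citation.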


Inadequate and Biased Information Retrieval on the Web

  • No single web search engine covers more than 34% of the web, with an average coverage of only 18%. By combining engines, coverage can be increased, but does not exceed 50% [Steve Lawrence and C. Lee Giles, 3 April 1998, "Searching the World Wide Web" Science Vol. 280(5360), pp. 98-100; Steve Lawrence and C. Lee Giles, January 1999, "Searching the Web: General and Scientific Information Access" IEEE Communications Vol. 37(1), pp. 116-122; Kurt D. Bollacker, Steve Lawrence and C. Lee Giles, March/April 2000, "Discovering Relevant Scientific Literature on the Web" IEEE Intelligent Systems & Their Applications Vol. 15(2), pp. 42-47].
  • Search engines do not index some content on the web, including .pdf files and web-accessed databases. In fact, according to the OCLC Online Computer Library Center's Office of Research, only 35% of web sites are publicly available and searchable via search engines; the other sites are private, provisional, or not accessible to search engines.
  • Search engines provide access to web sites by means of programs called "spiders" and "crawlers," which index the web. Sites that cannot be indexed include: HTML pages that require user input; dynamically generated web pages; web pages containing nonindexable elements (such as Flash and various image files); sites offering dynamic, real-time data; and sites with .pdf files or web-accessed databases (see Chris Sherman and Gary Price, 2001, The Invisible Web: Uncovering Information Sources Search Engines Can't See (Medford, New Jersey: Information Today, Inc.), pp. 64-68). As a result of this state of affairs, a so-called "Invisible Web" exists that cannot be searched by search engines. The "Invisible Web" is also referred to as the "Deep Web," or the "hidden web." To reach at least some (but not all) of these "invisible" sites, the diligent researcher must use specialized search engines and databases. Fortunately, some of these specialized search engines and databases are publicly available. Good starting points include:
  • Efforts are being made to improve search engine technology. For example, intelligent search engines are being developed which deploy web semantics and machine learning. Web semantics is a concept that includes techniques to tag web pages with mark-up that "captures the meaning of the content." Other efforts are underway to develop search engines that "identify, retrieve, and classify Deep Web content" [see Pokorny, July/August 2004, "Web Searching and Information Retrieval" and Thanaa M. Ghanem and Walid G. Aref, January 2004, "Databases Deepen the Web" IEEE Computer, pp. 116-117]. However, much of this work is proprietary and commercial, and will likely result in searching costs for users. Current commercial products include BrightPlanet, Quigo Technologies' Intellisonar, and Deep Web Technologies' Explorit.
  • Academic libraries and other types of libraries are working with search engines such as Google and Yahoo to make valuable research materials available in search engine searches. In particular, libraries are trying to make available digital archives and digital libraries that they produce. An international library organization -- the Online Computer Library Center (OCLC) -- produces a number of unique and powerful databases, including the world's largest bibliographic database, which is a collection of more than 12,000 library catalogs from around the world. OCLC is working with both Google and Yahoo to make a limited number of records from this database available in search engine searches.

    Database vendors are also working to "Googlize" their content -- or rather, to make their content accessible via Google to their paying customers. However, access is still restricted only to paying customers.


    Librarians, therefore, currently are making strong efforts to work with search engine companies to make materials available in search engine searches, but some materials will most likely never be available in a digital format. Most books published before 1995 fall into this category, as well as many older journal articles, newspaper articles, historical maps, archives, letters, diaries, older census statistics, and genealogical materials.

  • A growing concern is also the bias of web search engines. Researchers have confirmed that "many leading search engines give prominence to popular, wealthy, and powerful sites at the expense of others" and that "the rich and powerful clearly can influence search engine tendencies; their dollars can, and in some ways already do, play a decisive role in what sites a given search retrieves" (for an excellent article on this topic, see Lucas Introna and Helen Nissenbaum, January 2000, "Defining the Web: The Politics of Search Engines," Computer (IEEE) Vol. 33(1), pp. 54-62).
  • Following the terrorist attacks on September 11, 2001, another issue to consider with respect to government-sponsored information and research is the fact that some of it "is disappearing from government web sites, much of it in the name of national security." For an article about this topic, see Marylaine Block, 6 December 2002, "Disappearing Data," Ex Libris, Issue Number 161, [Internet, WWW], ADDRESS: http://marylaine.com/exlibris/xlib161.html [Accessed: 20 December 2002].

 

Final Thoughts on the Web

There is good information on the Web. Don't avoid it. But recognize both its strengths and its limitations.

When deciding whether or not to use information that you find on the Web -- or anywhere, for that matter -- consider the answers to the following questions.


How accurate is the information?

  • Does the information appear to be reliable and error-free?
  • Is there an indication that someone (an editor, a review board) checks and verifies the information?
  • For information that reports primary research: Who was responsible for collecting data? What data were actually collected? How were the data collected? How were the data measured? Were the methods employed to collect data standard for the research area?

How authoritative is the information?

  • Who is the author?
  • Is the author an expert? Is the author qualified to write on the topic?
  • For information that reports primary research: Did a particular group or organization sponsor the research? Did the group or organization have a purpose in mind for carrying out the primary research?

Is the information objective?

  • Is the information presented in a manner that seeks to minimize bias?
  • Is the information an example of "advocacy research"? That is, is its purpose from the outset to provide support for a particular conclusion? Or, does it begin with a "clean slate" and no preconceived notions about conclusions?
  • For information that reports primary research: What are the findings? For other types of research, what conclusions are presented?
  • Do the authors place limitations on their topic? Are there limitations on findings and conclusions?
  • For information that purports to be objective, is it supported by scholarly apparatus, such as footnotes, references, a bibliography?
  • For information that purports to be objective, is the tone strident? Does it exhort or attempt to persuade in a reasonable manner?

How current is the information?

  • Is the information dated?
  • For information that reports the results of primary research: When were the data collected?

Does the information offer adequate coverage of the topic?

  • Is the information consistent with information available from other sources?
  • Is a literature review included? If so, is it comprehensive?
  • For information that reports primary research: what suggestions are made for future studies?

For example, here is a newspaper article by Cay Dickson, available from the 12 February 2001 edition of the HoustonChronicle.com website. Dickson examines several HDTV websites and notes some biases in some of the sites.


In the future, it will be even more imperative to understand the strengths and weaknesses of web search engines. It is likely that two types of search engines will be available. First-generation search engines will continue to be refined with "intelligent" features that allow them to infer simple queries. These search engines will index only the "surface web" (and not the Deep Web), although no effort will be made to index the entire surface web. Ranking of results of searches in these search engines increasingly will be based on payments and popularity. Full-text, specialized search engines, on the other hand, will feature powerful search tools (such as complete Boolean searching). These search engines will specialize in making available information in specific subject areas. They will be capable of searching the Deep Web, and they will likely be commercial endeavors.
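As an illustration of what "complete Boolean searching" means in practice, here is a minimal sketch of AND, OR, and NOT queries over a tiny inverted index. The sample documents, the function names, and the whitespace tokenizer are all hypothetical; real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical mini-collection; the texts are made up for illustration.
docs = {
    1: "hdtv tuner data sheet",
    2: "hdtv antenna review",
    3: "speech codec data sheet",
}


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


index = build_index(docs)


def term(word):
    """Return the ids of documents containing `word` (empty set if none)."""
    return index.get(word.lower(), set())


# Boolean operators map directly onto set operations.
both = term("hdtv") & term("sheet")      # hdtv AND sheet
either = term("hdtv") | term("codec")    # hdtv OR codec
without = term("sheet") - term("hdtv")   # sheet NOT hdtv

print(sorted(both), sorted(either), sorted(without))
```

Because each operator is an exact set operation, a Boolean query returns precisely the documents that satisfy it, in contrast to the ranked, popularity-influenced results described above for general-purpose search engines.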