Searching the internet - things to be aware of:
A. Search Engines:
1) Each search engine looks in its catalog for the keywords or search strings that you enter. No search engine looks inside the whole internet. That is why each engine you use will often contain different “hits” of information.
2) A search engine’s catalog may be passive or active. An active catalogue collects information by itself by using a ‘spider’ (robot, wanderer, crawler) program. The spider may run continuously or run on a weekly or monthly basis. A passive engine requires its operators to enter information or else allows interested parties to ‘register’ web pages on it. Most search engines now combine the two approaches.
3) Meta-search engines forward queries to other search engines and then present the results to users. Some are dedicated meta-search engines; others, like Yahoo, may do a meta-search only if they fail to find information in their own catalogue. Help sections of some search engines are found at http://www.erin.utoronto.ca/~w3car/searchhelp.html
Active search engines:
- Have large catalogues that are frequently updated
- Give many “hits” that are often irrelevant
- May catalogue every page of a site, not just the main page
- They are not supervised by anyone and so may not catalogue very well
Passive search engines:
- Potentially they are better organized since they are supervised to a greater extent
- Have smaller catalogues
- Quality depends on how the administrator sets up the catalogue
- May not be updated as quickly as the active ones
Meta-search engines:
- Search many sites
- Take longer
- May present only the first few matches
- May or may not sort and prioritize the hits
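The meta-search behaviour described above (forward the query, keep only the first few matches from each engine, merge the results) can be sketched in a few lines of Python. This is only an illustration: the engine names, the catalogue contents, and the function meta_search are all made up for the example, and a real meta-search engine would contact other engines over the network rather than look in a dictionary.

```python
# Hypothetical catalogues standing in for two independent search engines.
ENGINES = {
    "engine_a": {
        "air pollution": [
            "site1.example/smog",
            "site2.example/acid-rain",
            "site3.example/ozone",
        ],
    },
    "engine_b": {
        "air pollution": [
            "site2.example/acid-rain",
            "site4.example/indoor-air",
        ],
    },
}


def meta_search(query, engines, per_engine_limit=2):
    """Forward the query to every engine and merge the first few hits."""
    merged = []
    seen = set()
    for name, catalogue in engines.items():
        # Only the first few matches from each engine are kept.
        hits = catalogue.get(query, [])[:per_engine_limit]
        for url in hits:
            if url not in seen:  # drop duplicates found by several engines
                seen.add(url)
                merged.append((url, name))
    return merged


for url, source in meta_search("air pollution", ENGINES):
    print(source, url)
```

Note that site3.example/ozone never appears in the merged list because it falls outside the per-engine limit, which is one way a meta-search can miss material that a direct search would find.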
B. Query Strategies
1) A simple query is often the best way to start. Type in a keyword and see what it gets you. Often you will get too many hits, and you will have to modify your approach to cut down on the number of hits. Instead of general terms, try more specific keywords. For example, the term “pollution” will generate too many hits to look at - “air pollution” will reduce the number considerably. Entering the phrase “indoor air pollution” will reduce the number even more.
2) Changing your query will change the results you get. Try related terms and phrases. In the example above, asking for the phrase “sick building syndrome”, “smog”, “acid rain”, or “air quality” may get you much different information than the broad term “air pollution”.
Beware that different search engines use different terms to search for the same concept. You may want to try alternate word forms - first try to look for the singular form and, if that fails, then try the plural form (e.g. dog & dogs, ox & oxen). Try the noun and the adjective form (e.g. poison & poisonous). Words may have alternate spellings (grey & gray, color & colour). Use the term “OR” (called an operator) to prevent the search from missing something. For synonyms also use the “OR” operator or search for the terms separately (e.g. car, automobile, and motor vehicle). Be aware that a search engine may or may not correct for misspellings.
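As a small illustration of combining alternate forms and synonyms with “OR”, the Python sketch below builds one query string from a list of variants. The helper name or_query is made up for this example, and the exact operator syntax (upper-case OR, quotation marks around phrases) is an assumption; check the help file of the engine you are using.

```python
def or_query(*variants):
    """Join alternate word forms with OR, quoting multi-word phrases."""
    parts = []
    for term in variants:
        term = term.strip()
        if not term:
            continue  # ignore empty entries
        # Assumed convention: phrases go inside quotation marks.
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


print(or_query("car", "automobile", "motor vehicle"))
print(or_query("grey", "gray"))
```

The first call produces car OR automobile OR "motor vehicle", so a single search covers all three synonyms.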
3) No simple queries exist! What you thought was a simple query may not be one. Different search engines may give you different results because they are set up differently. For example, if you type the phrase ‘space shuttle’, you may get any of these:
- pages containing the complete term/phrase ‘space shuttle’
- pages that have the words ‘space’ and ‘shuttle’ located near each other in the text
- pages that have the words ‘space’ and ‘shuttle’ located anywhere on the page
- pages that contain either the word ‘space’ or the word ‘shuttle’
- pages related in some way to the term/phrase ‘space shuttle’
Depending on your intention, only some of the above might be helpful. The fifth type would probably lead to many irrelevant hits. How the search is carried out depends on the default operator settings. You can often find out how the engine operates by examining these default settings.
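The first four interpretations can be made concrete with a short Python sketch that reports the strictest way a page matches a query. Everything here is an assumption made for illustration: the function names, the sample pages, and the choice of three words as the meaning of “near”. The fifth interpretation (“related in some way”) depends on each engine's own relevance methods and is not modelled.

```python
import re
from itertools import product


def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(text, terms):
    """True if the terms appear next to each other, in order."""
    w = words(text)
    n = len(terms)
    return any(w[i:i + n] == terms for i in range(len(w) - n + 1))


def match_near(text, terms, window=3):
    """True if every term occurs and some occurrences lie within `window` words."""
    w = words(text)
    positions = [[i for i, x in enumerate(w) if x == t] for t in terms]
    if not all(positions):  # at least one term is missing
        return False
    return any(max(c) - min(c) <= window for c in product(*positions))


def match_all(text, terms):
    """True if every term occurs anywhere on the page."""
    w = set(words(text))
    return all(t in w for t in terms)


def match_any(text, terms):
    """True if at least one term occurs on the page."""
    w = set(words(text))
    return any(t in w for t in terms)


def classify(text, query):
    """Return the strictest interpretation under which the page matches."""
    terms = words(query)
    if match_phrase(text, terms):
        return "phrase"
    if match_near(text, terms):
        return "near"
    if match_all(text, terms):
        return "all words"
    if match_any(text, terms):
        return "any word"
    return "no match"


PAGES = [
    "The space shuttle launched today.",
    "The shuttle reached space in minutes.",
    "Office space is limited. Parking is far away, so a free shuttle runs hourly.",
    "Rent office space downtown.",
]

for page in PAGES:
    print(classify(page, "space shuttle"), "-", page)
```

The four sample pages are classified as phrase, near, all words, and any word respectively. An engine whose default is “any word” would return all four pages, while one whose default is “phrase” would return only the first.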
4) Most search engines have ‘help files’, ‘options menus’, and ‘search tips’ that can help you customize your search. These aids also give you an idea how individual search engines look for material and what the default settings are.
C. Some Advanced Strategies
1) The operator “OR” tells the engine to match at least one of the terms.
2) The operator “AND” tells the engine to match all of the terms.
3) The operator “NOT” tells the engine to match only pages on which the keyword is not present.
4) The operator “SOME” means that only some of a string of terms need to be present.
5) The operator “NEAR” means that the terms are matched if close to each other.
6) If you want adjacent terms in a certain order, put them in “quotation marks”.
7) If you want some terms grouped, put them in (parentheses), and they will be matched before any other terms.
8) You can often set (see point #4 Query Strategies) the engine to allow misspellings and partial words. You may also be able to specify where on the page or document the search is done (URL, title, headings (1st or all), summary, or everything on the page). Be aware that some settings, such as those that allow for partial word substrings, will increase the number of irrelevant ‘hits’.
9) It may be best to try a general search (i.e. broad categories) and then, based on that information, narrow down the list by doing a specific search.
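To see how “AND”, “OR”, “NOT”, quotation marks, and parentheses narrow or widen a search, the Python sketch below runs them against a tiny made-up catalogue of four pages. The page texts and the helper names (term, AND, OR, NOT, phrase) are assumptions for illustration only; real engines differ in syntax and in how they index pages.

```python
# A tiny hypothetical catalogue: page number -> page text.
PAGES = {
    1: "indoor air pollution and sick building syndrome",
    2: "acid rain and air pollution in cities",
    3: "water pollution in rivers",
    4: "air quality reports",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index


INDEX = build_index(PAGES)
ALL_IDS = set(PAGES)


def term(word):
    """Pages containing a single keyword."""
    return INDEX.get(word.lower(), set())


def AND(*sets):
    """Match all of the terms."""
    return set.intersection(*sets)


def OR(*sets):
    """Match at least one of the terms."""
    return set.union(*sets)


def NOT(pages):
    """Match pages on which the keyword is not present."""
    return ALL_IDS - pages


def phrase(text):
    """Match adjacent terms in the given order (whole words only)."""
    wanted = f" {text.lower()} "
    return {pid for pid, body in PAGES.items() if wanted in f" {body.lower()} "}


print(sorted(AND(term("air"), term("pollution"))))        # both words
print(sorted(OR(term("rain"), term("quality"))))          # either word
print(sorted(AND(term("pollution"), NOT(term("air")))))   # pollution, not air
print(sorted(phrase("air pollution")))                    # exact phrase
# Grouping: the inner OR plays the role of parentheses and is matched first.
print(sorted(AND(OR(term("rain"), term("rivers")), term("pollution"))))
```

With this catalogue the five queries return pages [1, 2], [2, 4], [3], [1, 2], and [2, 3]. Reversing the phrase to “pollution air” returns nothing, which shows why word order matters inside quotation marks.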
D. Common search problems (Arhhhhhhhhhhhh!!!!!!)
1) You have looked for a long time and cannot find anything. This could mean that what you want is not on the internet - despite all the hype, lots of “stuff” is not yet available over the net. It is also possible you are not looking in the right place or asking in the best way - modify your search strategy.
2) You find so much information you are overwhelmed and cannot possibly weed through it all (i.e. 100,000 hits) - modify your search strategy to reduce the number of hits.
3) The network is slow, the network is down, the site is ‘under construction’, the site you want has moved (change on the net is rapid) - smell the roses and try again later, and modify your search strategy if need be.
4) Remember that finding “useful” information on the net is a combination of experience, insight, perseverance, and luck.
Daniel J. Barrett (Net Research: Finding Information Online, 1997, p. 5) gave the following Internet Searcher’s Rules for the Road [and these are still valid today]:
- Carefully choose a starting place. As the old saying goes, “Sometimes you can’t get there from here.” Different starting points may lead to different results.
- Don’t assume failure too quickly. Try variations on your search.
- Don’t assume success too quickly. Even when you locate what you want, there may be other sources of information “out there” that are valuable. Don’t rely on just one site or search technique. Keep an open mind and experiment.
- Think about your route. Even if you reached your goal, there might be faster ways to get there. Pay attention when a search strategy provides quick results; the same strategy might be usable in other situations.
- Know your tools. Read the online manual and online help. Try all of the commands and options.
- Intuition is your best search tool. The internet changes rapidly, and so does the software we use to access it. Knowledge, on the other hand, accumulates. As you learn from experience, you’ll get progressively better at tackling new situations.
5) Look for sites that specialize in the type of information or data that you require. These sites may provide useful links to other sites, including those that are hard to locate.
6) If possible, find an FAQ (frequently asked questions) list for your topic. These lists can shorten your search time and point you in the right direction.
E. The net is a great resource - however, accuracy is not guaranteed online
How can you tell if the online resource is believable? The URL (address) can provide some assurance. If the web page resides on a computer belonging to an official organization, that is a good sign. Web sites generally have the same credibility as their ‘regular’ counterparts. Newspaper sites are comparable to their in-print counterparts since the publishers try to keep both accurate. The advertising in both will most likely contain plenty of ‘hype.’
Daniel J. Barrett (Net Research: Finding Information Online, 1997, pp. 6-8) listed some of the major issues of online accuracy that are as relevant now as they were in 1997:
- Mistakes. Typographical errors, factual errors, accidental omissions, incorrect URLs, careless statements, and ambiguity can occur in any material online.
- Outdated information. Material online is not always kept up to date. Web page creators may be slow, and forget or neglect to update their pages. Updated information may become available online in a different location from the original, with no indication that this is so. Articles written long ago may float around the web with the date of authorship missing, so it’s not easy to tell that the material is old. Web pages may become obsolete as locations change.
- Opinions stated as facts. Some kinds of online information are obviously opinions, such as product reviews and political beliefs. Other opinions can be ‘dressed up’ to look like facts. Caveat emptor (buyer beware) applies - anybody can say just about anything they want online, and it’s up to you to judge the validity of the source.
- Bias and conflict of interest. When considering information online, consider the source. Paid sponsorship may be an important factor in judging material found online.
- Fraud. Although most people on the internet behave honestly, there are always some troublemakers in every crowd. Be aware that phony businesses, illegal money-making schemes, deceptive advertisements, and hoaxes are sometimes found online. Be skeptical of claims that are too good to be true.