News

Insights from practitioners in Information Management

Issue 43 – Advanced internet searching

Welcome to the March edition of Information Overload for Registrants and other interested readers. This month we look at the Internet and how to find quality information. As we have mentioned in previous editions of Overload, we advise job seekers to research the organisation they are hoping to work for in order to find useful information to drop into the Q&A session. Not only does this show that you are interested in joining the organisation, but it also gives you plenty of reference points to which you can relate your own skills and experience. But researching an organisation should never start and stop with a quick visit to the organisation’s own website. After all, they are going to say nice things about themselves, aren’t they? How do you get beyond the hype to find information that really matters?

=======================================
In this issue we will look at:
• Can you find what you are looking for on “the net”?
• Why can’t I find what I’m looking for?
• Saving yourself time when conducting Internet research
• A Thought to ponder

Can you find what you are looking for on “the net”?
If I asked you to do a search on the Internet for information relating to a particular organisation, where would you go? Chances are you said “GOOGLE” and you are not alone. It’s easy to use and is the number one search engine of choice for most people. Simply type in a few words in the search box and hope the information you wanted comes back within the first one or two pages.

I’m sure you will have noticed that the first results that appear may not be the ones you are looking for. It is interesting to note that most (if not all) of the first hits will have something in common – they tend to be paid listings. Someone has paid the search engine to reach that number one slot. The catch, of course, is that someone with even deeper pockets can soon knock them off their perch, and they may end up paying more just to stay ahead. Alternatively, the organisation has paid someone to optimise its website so that it appears higher up the rankings than a similar organisation that hasn’t got the time, money or expertise to do the same thing.

Once you have trawled your way through the paid listings (sometimes listed as Sponsored Links), the next group of sites that normally appears will be governmental and educational establishments (.gov and .edu). These are classed as sites of quality and tend to appear further up the rankings than companies, organisations and associations.

Of course you could be very lucky and have the URL of the web site that you are looking for, in which case all you need to do is open up your browser of choice and type the string of characters into the address box at the top and press “GO” and wait for the site to appear.

But what happens if the site you are looking for didn’t come back in the first one or two pages? And how do you find information relating to an organisation that you are hoping to work for? What do you do next?

Why can’t I find what I am looking for?
There are many reasons why you may not be able to find information on the Internet. The trouble is that simply typing a couple of keywords into a major search engine such as Google will usually give you back either hundreds, if not thousands, of potentially good web sites, or absolutely nothing of use at all. The question is why?

• The keywords that you have chosen to search for may not be the most appropriate. Remember that different people use different words to mean the same thing. For example HR, Human Resources, Personnel. You might also like to consider that some organisations are known by acronyms rather than their full titles – NATO = North Atlantic Treaty Organisation.
• The person who created the web site didn’t name each page – if you open up your web browser, and have a look at the very top blue bar – this “name” should be unique for each page that is created. When creating a “hit list” the search engines use this information to begin their ranking process. If each page says the same thing, the automated trawlers will assume the site has little or nothing of value and may not index the additional pages.
• The page properties have not been set. Again, the trawlers use this information to rank the sites. If you are planning on building a web site for yourself or your organisation please ensure that you add the page properties.
• If the web site is frames based, the trawlers may not see the additional pages, as all the pages look alike to them. If you go to IEA’s current web site (http://www.iea.com.au) you will notice that the main URL is followed by /webframe.htm
• The meta data has not been added to the page information, or the meta data does not adequately reflect what is on the web page. Too many keywords or key phrases in the meta tags may indicate to the trawlers that the person creating the site was trying too hard, and they may rank the site as SPAM. Too few words and the trawlers will not stick around for long. And with frames-based web sites, where all the pages look the same, there is little or no point adding extra words to the meta headings, as the trawlers may deem the site to be SPAMMING because not all the words relate directly to the content on the first page.
• The site has been created using JavaScript, is hidden behind a firewall or is password protected. Any site that needs you to input a question before it can generate the answer may not be visible to the major search engines, which is why you will rarely (if ever) find name, telephone and address information in a “hit list”. Yellowpages, Whitepages and Amazon are all good examples of these kinds of sites. The information is contained within a database and the results are only generated once you have asked the right kind of question(s).
• You haven’t spelt the words correctly. Whilst a search with letters missing may still return a few results, consider too that we spell things differently to other countries – in particular the UK and the US. For example DEFENCE and DEFENSE. In order to find all the information relating to a particular area you will need to ensure that your search string contains all the variations, and separate each “like” term with an OR, eg defense OR defence. The OR should always be in capital letters, otherwise the search engines will assume you are looking for sites that contain defence, defense and the word “or”.
• And talking of spelling, remember that different words can mean the same thing, depending on which country you live in. For example Trunk and Boot; Elevator and Lift; Sidewalk and Pavement etc.
• You haven’t used the correct mix of keywords in your search strategy. If you are looking for information on a subject and the keywords are usually found next to one another, for example World Health Organisation, it is important to search for all the variations relating to that organisational name.
o World Health Organisation
o World Health Organization
o WHO

However, when typing in the key words remember that unless you tell the search engine that these words should appear next to one another, you will receive hits with the words World and Health and Organisation in them. In order to force the search engine to search for them together add quotation marks around the phrase – “World Health Organisation”. In order to search for all the variations within the same search string you simply add the word OR between the phrases or words – “World Health Organisation” OR “World Health Organization” OR WHO
• Remember to utilise the Advanced Search options – usually hidden as a tiny link next to the main search box. You can tell the search engine that you are only interested in searching for a particular domain eg government information, or if you only want information in a certain language. This can dramatically reduce the number of hits.
• You don’t really know what you are searching for. Fishing expeditions don’t usually produce quality results. Once you have spent a little time working out the terms you need to search for – use the advanced search option and narrow your search. 
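
The quoted-phrase and OR techniques described in the list above can be put together mechanically. The following is only a minimal sketch in Python, not a tool that any search engine provides: the function names build_query and search_url are made up for illustration, and the assumption that the engine accepts the query in a "q" parameter at https://www.google.com/search should be checked against the engine you actually use.

```python
from urllib.parse import urlencode


def build_query(variants):
    """Join spelling or name variants with OR, quoting multi-word phrases."""
    terms = []
    for variant in variants:
        variant = variant.strip()
        if not variant:
            continue  # skip empty entries
        # Quotation marks ask the engine to match the words as one phrase.
        terms.append(f'"{variant}"' if " " in variant else variant)
    # OR is written in capitals so it is read as an operator, not a keyword.
    return " OR ".join(terms)


def search_url(query, base="https://www.google.com/search"):
    """Build a search URL. Assumption: the engine reads the query from 'q'."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    names = ["World Health Organisation", "World Health Organization", "WHO"]
    print(build_query(names))
    print(search_url(build_query(["defence", "defense"])))
```

Keeping a list of variants like this for each organisation you research means you only have to work out the acronyms and alternative spellings once, and can then paste the finished search string into any engine.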

Saving yourself time when conducting Internet research
• Open a blank Word document – when you find information of interest and use, simply highlight the information that you want and copy it directly into the document. However, you should also make sure you copy the entire URL (Uniform Resource Locator) and paste it into the document along with the date that you accessed the page. You should also make a note of the author, the author’s organisation and the title of the piece. If you plan on using any of the information when you go to write your report or assignment, you will need all this information, especially if you don’t want to be penalised for plagiarism. Oh, and don’t forget to save the document.
• If you are using Internet Explorer as your browser of choice, right click on the hypertext link and open the new site in a new window. That way you do not lose your train of research thought and you don’t have to keep going backwards using the “back” button, as it is easy to get confused as to where you have been and where you still need to go.

Once the site is opening, carry on down your “hit list” opening up new windows for each site that has potential. Then spend time browsing for useful information within the sites.

Create a new favourite category for the area of research that you are doing, and save each new site as soon as you open it. That way you don’t forget and you can go back to the site should you need to research it further. 

NB With the Mozilla Firefox browser each site opens in a new tab within the one window, rather than a new window for each site visited. You can then see at a glance where you have been. If you use IE and you have more than a dozen programs or windows open, you can’t usually see what they are unless you re-open them, because there is no room to display their titles. You might also like to note that Firefox can be downloaded free of charge – yes, from the Internet!
• Keep your favourites list tidy. Use folders and sub-folders to index your sites, that way you don’t have to spend time trawling through your own personal hit list to find the information you need. If you find that one of your sites/links has disappeared it may mean that the information is no longer available, or that it has been moved. Trim the URL back to the main address (eg http://www.iea.com.au/) and try searching for the page/information you needed from within the site’s own search engine (assuming it has one). Alternatively, make use of the site map – it’s sometimes quicker. News articles, for example, are one of the first things to disappear – once a news item isn’t newsworthy anymore, the item will be taken down and replaced with something that is. However, you may find that the item hasn’t disappeared completely, but has been moved to the archive, for instance. Delete the old link and replace it with the new one, that way your favourites list is a valuable tool, rather than a dumping ground for old and outdated information.
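
Trimming a dead link back towards the main address, as suggested above, can also be done step by step. The sketch below uses Python’s standard urllib.parse module; the function names site_root and parent_urls, and the example page path, are made up for illustration and are not real pages on the IEA site.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(url):
    """Return just the scheme and host, dropping path, query and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def parent_urls(url):
    """List shorter and shorter URLs, from the page's folder up to the root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    urls = []
    # Drop one trailing path segment at a time.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls


if __name__ == "__main__":
    # Hypothetical broken link, used only to show the trimming.
    broken = "http://www.iea.com.au/news/2006/item.htm"
    for candidate in parent_urls(broken):
        print(candidate)
    print(site_root("http://www.iea.com.au/webframe.htm"))
```

Trying each shorter address in turn often lands you on a folder or archive page that still links to the item you were after, before you have to fall back on the site’s own search engine.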

Clearinghouses, Subject Hubs and the Deep Web
As we have discussed, web sites that contain information within databases (eg Amazon) cannot usually be accessed from outside the web site – and so it is with most of the clearinghouses of information, subject hubs and ways into the Deep Web. These are simply gateways that lead to doors of individual sites.

Perhaps one of the best-known sites is the Librarians’ Index to the Internet (http://lii.org). Whilst American in focus, this web site contains vetted information: the links have been checked for accuracy by a team of professional librarians, and abstracted for ease of use.

Oh, if only it were all that easy.

For the most part, the Internet is unstructured, which is why it can be difficult to generate a results list that contains the perfect answers every time. As with all things Internet based – please remember that anyone can create a web site or directory and probably has.

In the case of most (if not all) of the Deep Web Clearinghouses and Subject Hubs, the sites have been selected and categorised by the compiling organisation or person. However, a word of caution – these selections are all subject to someone’s own personal opinion and bias. Information may be deliberately left out simply because it does not fit with someone’s own personal belief system. For example, a Catholic funded clearinghouse (and I don’t know whether there is one) may not contain information relating to other religions, or medical sites that contain information on abortion.

The following is a sample of a few subject hubs and clearinghouses – happy surfing!

BUBL – http://www.bubl.ac.uk/ – uses the Dewey Decimal Classification scheme for its entry points.
Chemistry links – http://www.liv.ac.uk/Chemistry/Links/govreslabs.html
Internet guide to Engineering, Mathematics and Computing – http://www.eevl.ac.uk/
Invisible Web Net – http://www.invisible-web.net/ – this is a hand compiled listing of entry points to the invisible web and was created by the site’s authors as examples for their book “The Invisible Web”
Profusion – formerly known as “invisible web” – http://www.profusion.com/nav – A search engine style collection of over 10,000 databases
Complete Planet – http://aip.completeplanet.com – Developed by the Deep Web experts Bright Planet, this site gives access to over 70,000 searchable databases and speciality search engines. It also has two white papers that explain in depth what the deep (invisible) web is.
These were all last accessed on the 7th March 2006 and were available on this date.

=======================================
A Thought to Ponder
“I’m all in favour of keeping dangerous weapons out of the hands of fools. Let’s start with typewriters”
Frank Lloyd Wright
1867 – 1959
=======================================