FindinSite-MS: Search engine for an ASP.NET website

findinsite-ms word highlighting


findinsite-ms can highlight search words in result web pages. Highlighting is on by default, but can be turned off in the Control Panel Look and Feel screen. See below for details of how to change the highlighting HTML. It is recommended that you only use highlighting if you are displaying hits on your own web site.

Word highlighting is very useful because it lets the user see their search words on the page straight away, making the search process much more friendly. findinsite-ms does not use cached web pages when highlighting - it uses the live page.

The Search API supports word highlighting by returning a HighlightURL field for each hit.

Example:

If you search for brown car, the hits are listed as normal. If you click on a hit that is an HTML web page, findinsite-ms displays the page with all search words and their variants highlighted, and with contiguous search words run together into one highlight. The page is scrolled to show the first highlight.

Josie jumped out of the car and landed in the brown mud.
Brown cars came past and splashed her.
"Brown car, go away!" shouted Josie.

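The run-together behaviour in the example above can be sketched in a few lines of Python. This is only an illustration, not the findinsite-ms implementation: the function name highlight_text is hypothetical, "variants" are approximated by an optional trailing "s" (the real variant rules are not described on this page), and the sketch works on plain text rather than on HTML with tags. Square brackets stand in for the highlighting HTML.

```python
import re

def highlight_text(text, words, start="[", end="]"):
    """Wrap every run of adjacent search words in start/end markers."""
    if not words:
        return text
    # Assumption: a "variant" is approximated as an optional trailing "s".
    word = "(?:" + "|".join(re.escape(w) + "s?" for w in words) + ")"
    # A run is one search word plus any further search words separated from
    # it only by whitespace, so "Brown cars" becomes a single highlight.
    run = re.compile(r"\b%s\b(?:\s+%s\b)*" % (word, word), re.IGNORECASE)
    return run.sub(lambda m: start + m.group(0) + end, text)

sample = "Brown cars came past and splashed her."
print(highlight_text(sample, ["brown", "car"]))
# prints: [Brown cars] came past and splashed her.
```

Passing the highlightStart and highlightEnd HTML as start and end would produce real highlighting markup instead of brackets.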
By default, a header is added to the page (just after the <body> tag) to tell the user that it has been highlighted by findinsite-ms. This can be switched off in the Control Panel, if desired. Example header:

Page http://www.phdcc.com/findinsite/highlite.htm highlighted by findinsite-ms

In the results list, clicking on the hit link will show the page with highlighting. If you right-click and choose any option (eg Open in New Window) then the hit page is shown without highlighting.

Cross-domain highlighting

The highlighting process will work across domain boundaries, so findinsite-ms on the phdcc web site http://www.phdcc.com/findinsite/ can highlight pages on your domain, eg www.example.org.

  • Warning:  While the highlighting method (see below) should resolve links correctly, it is possible that mischievous code could interfere at some point with the findinsite-ms web site. It is therefore recommended that you only highlight on your own web site.

  • In a very small number of cases, highlighted pages are not shown correctly if the findinsite-ms domain is different from the site domain. If there is a problem, either switch off highlighting, or run findinsite-ms at the site domain.

    The one case that we have found is an unusual frameset, where a frameset is embedded in a larger page and created dynamically after the main page has been loaded. The problem is that the FRAME SRC is loaded by the browser from the findinsite-ms site, not the searched site. The Cached page shown by a major search engine also suffers from this problem.


Word Highlighting Technical Details

Overview:  findinsite-ms highlights words in a result web page by:
1.  Reading the result web page
2.  Adding in highlighting HTML
3.  Returning the amended web page to the user

When the user clicks on a (highlighting) link in the result list, the findinsite-ms page show.aspx is called. The link URL includes parameters to tell show.aspx which page to highlight and what words to highlight.

show.aspx retrieves the requested page. It then adds in the highlighting HTML and returns it to the browser. Normally this process would not work because all the page links would go wrong - findinsite-ms gets round this problem by adding a <base> tag at the top of the page. For example, for this page online, it would add in the following, together with the explanatory header:

<base href='http://www.phdcc.com/findinsite/highlite.htm' />
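The effect of the <base> tag can be sketched in Python as follows. This is a hedged illustration, not the show.aspx code: the function name add_base_tag, the choice to insert directly after the opening <head> tag, and the relative link install.htm are all assumptions made for the example. The urljoin call shows how a browser would resolve a relative link against the base URL.

```python
import re
from html import escape
from urllib.parse import urljoin

def add_base_tag(page_html, page_url):
    """Insert a <base> tag so relative links resolve against page_url."""
    base_tag = "<base href='%s' />" % escape(page_url, quote=True)
    head = re.search(r"<head\b[^>]*>", page_html, re.IGNORECASE)
    if head is None:
        # No <head> found: fall back to putting the tag at the very top.
        return base_tag + page_html
    return page_html[:head.end()] + base_tag + page_html[head.end():]

page_url = "http://www.phdcc.com/findinsite/highlite.htm"
# Hypothetical page content with one relative link.
page = "<html><head><title>t</title></head><body><a href='install.htm'>x</a></body></html>"
print(add_base_tag(page, page_url))
# With the base in place, the relative link resolves against the original site:
print(urljoin(page_url, "install.htm"))
# prints: http://www.phdcc.com/findinsite/install.htm
```

Without the <base> tag, the same relative link would be resolved against the URL of show.aspx, which is why the page links would otherwise go wrong.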

findinsite-ms passes all HTTP request headers to the requested page, and returns all received HTTP response headers in its response. This ensures that session state using cookies is maintained.

If there are any problems in the above process, then findinsite-ms aborts highlighting and redirects the browser to show the page without highlighting.
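The fallback rule can be pictured with a small Python sketch. Everything here is an assumption for illustration: the function show_page, its fetch and highlight parameters, and the use of status codes 200 and 302 are hypothetical stand-ins, not the actual show.aspx interface.

```python
def show_page(url, words, fetch, highlight):
    """Return (status, payload): the highlighted page, or a redirect to url."""
    try:
        page_html = fetch(url)
        return 200, highlight(page_html, words)
    except Exception:
        # Any failure abandons highlighting; the browser is sent to the live page.
        return 302, url

def failing_fetch(url):
    # Stand-in for a page that cannot be retrieved.
    raise IOError("page could not be read")

status, payload = show_page("http://www.example.org/page.htm", ["brown", "car"],
                            failing_fetch, lambda html, words: html)
print(status, payload)
# prints: 302 http://www.example.org/page.htm
```

The point of the design is that a highlighting failure never leaves the user without a page: the worst case is the ordinary, unhighlighted hit.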

Note that the above process works with any sort of URL that produces HTML output, even dynamically generated pages produced by ASP, ASPX or PHP pages. Also note that findinsite-ms always requests the live page - it does not use a cached copy of the page, which could be out of date.


Changing the Highlighting HTML

findinsite-ms highlights found search words by inserting HTML before and after the found words. The default highlighting uses a yellow background and bold red text. This is achieved using the following HTML:

highlightStart <SPAN style='background: yellow;'><FONT COLOR=red><B>
highlightEnd </B></FONT></SPAN>

You cannot change the highlighting definitions in the Control Panel because HTML entry is disallowed for safety reasons. To change the highlighting HTML, you must therefore edit the findinsite-ms settings file findinsite.xml in the work directory.

findinsite.xml stores various settings in XML format. The <highlightStart> and <highlightEnd> tags store the HTML that starts and ends highlighting. In the XML file, < and > must be encoded as &lt; and &gt;.

The default values should be stored as follows:

<highlightStart>&lt;SPAN style='background: yellow;'&gt;&lt;FONT COLOR=red&gt;&lt;B&gt;</highlightStart>
<highlightEnd>&lt;/B&gt;&lt;/FONT&gt;&lt;/SPAN&gt;</highlightEnd>
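To avoid encoding mistakes when preparing your own values, the encoded lines can be generated rather than typed by hand. The following Python sketch uses the standard library function xml.sax.saxutils.escape; the helper name settings_element is hypothetical and the script is not part of findinsite-ms.

```python
from xml.sax.saxutils import escape

def settings_element(tag, html):
    """Build one findinsite.xml element with its HTML content entity-encoded."""
    # escape() encodes &, < and > as &amp;, &lt; and &gt;.
    return "<%s>%s</%s>" % (tag, escape(html), tag)

# The default highlighting HTML, encoded for the settings file.
print(settings_element("highlightStart",
                       "<SPAN style='background: yellow;'><FONT COLOR=red><B>"))
print(settings_element("highlightEnd", "</B></FONT></SPAN>"))
```

Run with the default HTML, this prints the two lines shown above. Substitute your own start and end HTML to produce the lines to paste into findinsite.xml. Note that any & in your HTML is also encoded, as &amp;, which XML requires.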

Edit findinsite.xml directly and with care: for example, download a copy of the file using FTP software, edit it in Notepad or similar, then upload it using FTP. findinsite-ms only reads its settings file when it restarts. Either wait for a restart, or force one by updating the application Web.Config file. Once restarted, check in the Look and Feel screen that the highlighting has been set as you wish.

All site Copyright © 1996-2011 PHD Computer Consultants Ltd, PHDCC

Last modified: 3 October 2008.