DHTML Text Marker - An Experiment
What is a text marker?
Anyone who uses Google's newsgroup search will encounter a text marker, a.k.a. a highlighter. For instance, search http://groups.google.com for a recipe to make minestrone and open any of the search results: the word 'minestrone' will appear highlighted (see the screenshot below).
This feature makes a website's search engine much more user friendly. The user no longer has to scan the page visually for the part of the text that contains the minestrone recipe, because it is obvious at a glance which part of the content has it.
Google achieves this 'marking' of the text by wrapping the keyword in a tag like this:
<span style="background-color:#color">
We will look at one particular way to implement this feature.
Screenshot (right): a Google Groups page from my search for a recipe to make 'Minestrone' (a tasty Italian creation); this was one of the search result pages I opened.
Various Implementation methods
There are a lot of "out of the box" search engine tools that can be used with a website. A few of them come with some kind of search-result text highlighting, but what about the rest? (I am aware of a couple: Lotus Domino search and Perlfect Search.) Some people also end up rolling their own site-specific search engines.
A good way to implement this feature would be on the server side (as Google Groups search does). This involves script-based processing of the content, with the search keyword (in our example, minestrone) as the parameter, and then sending the tagged/highlighted content to the browser. The downside of this approach is that a different implementation might be needed for each server-side platform. Like many kinds of server-side programs, it could also turn out to be resource and processor intensive.
I opted for the quick 'n' dirty client-side JavaScript route, which I thought I could then reuse on any server-side platform or even for static HTML pages (though, as you will learn, I wasn't entirely successful).
Compatibility Issues
There are a few compatibility issues with the JavaScript approach. The original (and so far the only) guinea-pig implementation of the code is a restricted intranet scenario (my workplace). Almost everybody there is required to use IE 6, so I basically wrote it without giving a damn about other browsers (now, is that evil or what?). But the good news is that I got it to work with minimum fuss on Mozilla (yeah! I am a decent guy now). I have tested it successfully on the following browsers:
- IE 6 (Windows versions)
- Mozilla 1 RC1
I tested the script on IE 5.0 and IE crapped out completely: when the browser reached a particular regular expression function in the code, CPU utilization hit 100% and IE came to a grinding halt. I guess the regular expression object in IE 5 isn't anywhere near as sturdy as the ones in IE 6 and Mozilla.
I don't have IE 5.5, IE 5.01 or Netscape 6.x installed, nor do I have a Mac anywhere close by, so I have no idea how the code would behave on those browsers. And if the code doesn't work on IE 5, your guess about IE 4.x is as good as mine.
Theoretically it should be possible to make it work on NN4 or any browser that
supports the innerHTML property (provided its regular expression
object can handle the stress). I did try to make it work on Opera 6, but it doesn't
support innerHTML :-(.
That is why I have called this article an experiment! But at least the article is futuristic, with its upward compatibility ;).
JavaScripting our implementation
Here is a summarized list of steps needed to implement this:
- Enclose the main content portion of the page within a single named <div> tag.
- Add an onLoad event to the <body> tag.
- Call the highlighting function from the onLoad event if a particular query string has been passed to the page.
Step 1 : <div>ing your content
To begin with, I designed all my content pages so that the whole content
section of each page was wrapped inside a named <div> tag, something like
this:
<div id="contentdiv"><!--begin content section-->
  <p> skajfl sa fjla safkljasl lkfasj ja akjflka .....<br>
    <a href="dudu.com">my link</a> <br>
  </p>
  <ul><li>....</ul>
  <table> ......</table>
  <p>......</p>
</div><!--end content section-->
Note: this is the content area of the page, similar to the article content area of a page on evolt.org. It doesn't include things like the evolt sidebar or the top menu.
Why did I have to use a <div> tag?
Mostly for convenience: to access the whole HTML content area of a page through JavaScript, all I need to do now is:
var elemObj = document.getElementById('contentdiv');
var strInHtml = elemObj.innerHTML;
strInHtml is now a string containing all the HTML inside
<div id='contentdiv'>. I use the innerHTML property of the <div>
tag to access the raw HTML. The good thing about the innerHTML
property is that it is also settable.
I can do something like:
elemObj.innerHTML = '<h2>My new stuff</h2>';
which will overwrite all the HTML within the <div> with my
new stuff. Our implementation will be using this powerful little DOM property.
Note: there is a big ongoing
debate about the pros and cons of using innerHTML. Read
all about it! The fact remains that innerHTML is very convenient
compared with the more complicated DOM methods, which seem to be the more politically
correct approach.
Step 2 : the onLoad event
When a keyword on the page needs to be highlighted, the keyword is passed to the page using a query string. If the page is normally invoked like this:
http://server/page/index.asp
then, with the highlight query string, it will be invoked like this:
http://server/page/index.asp?hilite=minestrone
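In the page itself the keyword has to be pulled out of document.location.href. A minimal sketch of that step follows; the function name getHiliteKeyword is hypothetical, and it assumes the parameter is always called hilite, as in the URL above.

```javascript
// Hypothetical helper: returns the decoded value of the "hilite" query-string
// parameter in a URL such as document.location.href, or null if it is absent.
function getHiliteKeyword(href) {
  // Ignore any "#fragment" part of the URL.
  var url = href.split('#')[0];
  var queryStart = url.indexOf('?');
  if (queryStart === -1) {
    return null; // no query string at all
  }
  var pairs = url.substring(queryStart + 1).split('&');
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    var name = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    if (name === 'hilite') {
      var value = eq === -1 ? '' : pairs[i].substring(eq + 1);
      // Form-encoded query strings use "+" for spaces, so convert before decoding.
      return decodeURIComponent(value.replace(/\+/g, ' '));
    }
  }
  return null;
}
```

For example, getHiliteKeyword('http://server/page/index.asp?hilite=minestrone') gives 'minestrone', while the plain page URL gives null, which the onLoad code can treat as "nothing to highlight".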
Add an onLoad event handler to the <body> tag, something like:
<body>
Step 3 : Calling the highlighting function from onLoad
Here I read the innerHTML property of the <div>
as raw HTML into a string variable. Then I do a search and replace of every
instance of the keyword with the highlighted version of the keyword. Then finally
I write the replaced version of the string back into the <div>.
The highlighting is achieved using a simple <span> tag with style="background-color:yellow;" (the code below also makes the keyword red and bold).
There is one precaution to take when inserting the tags. Consider some content like this:
<p> ............. <!--content-->............
  <a href="minestrone.com" title="link to minestrone home page">minestrone home page</a>.
  ............. <!--more content-->............
</p>
The replace function should not place highlight tags around the word minestrone
found within the "title" attribute of the <a> tag,
as that would break the HTML. It should only replace text between the <a> and </a>
tags.
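To see why this matters, here is a deliberately naive sketch (naiveHighlight is a hypothetical name, not part of the article's code) that replaces the keyword everywhere in the raw HTML string, tags included, applied to the link from the example above:

```javascript
// Deliberately naive highlighter: replaces every occurrence of the keyword
// in the raw HTML string, including occurrences inside tag attributes.
function naiveHighlight(html, keyword) {
  var span = '<span style="background-color:yellow;">' + keyword + '</span>';
  return html.split(keyword).join(span);
}

var html = '<a href="minestrone.com" title="link to minestrone home page">' +
           'minestrone home page</a>';
var broken = naiveHighlight(html, 'minestrone');
// Both the href and the title attribute now contain a <span> tag with its
// own double quotes, which ends each attribute value early and breaks the tag.
```

The link text is highlighted correctly, but the markup of the <a> tag itself is corrupted, which is exactly what the tag-aware approach has to avoid.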
I used the JavaScript RegExp object to filter out illegal matches of this kind.
Just a couple of points before we dive into the actual code.
- Instead of trying to match only text between an opening <*> and a closing </*> tag, I match everything to the right of any <*>. This is because not all tags come in opening/closing pairs, and people invariably forget to close the good old <p> tag.
- The code assumes there are no <script> tags within the content; I didn't build in any checks for them.
There is some preliminary work the onLoad handler does that I won't explain
here, as it has been covered in earlier articles by other
people on this website:
- Extracting the keyword to be highlighted from the query string using a JavaScript DOM property (document.location.href).
- Extracting the innerHTML from the <div>, again using DOM properties.
- Writing the highlighted text back into the <div>'s innerHTML.
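Put together, those three steps might look roughly like the sketch below. It is not the article's sample code: hiliteContent is a hypothetical name, the document and URL are passed in as parameters so the function can be exercised outside a browser, and it reads the query string with the standard URL API for brevity, which browsers of the IE 6 generation do not provide.

```javascript
// Sketch of an onLoad handler (hypothetical name hiliteContent).
// doc:    the page's document object (or any object with getElementById)
// href:   the page URL, normally document.location.href
// markFn: the highlighter, called as markFn(keyword, html), e.g. markText
function hiliteContent(doc, href, markFn) {
  // Read ?hilite=... with the standard URL API (not available in IE 6).
  var keyword = new URL(href).searchParams.get('hilite');
  if (!keyword) {
    return false; // no keyword passed: leave the page alone
  }
  var contentDiv = doc.getElementById('contentdiv');
  if (!contentDiv) {
    return false; // page has no named content <div>
  }
  // Read the raw HTML, mark it, and write it back in one step.
  contentDiv.innerHTML = markFn(keyword, contentDiv.innerHTML);
  return true;
}

// Hypothetical wiring in the page:
// <body onload="hiliteContent(document, document.location.href, markText);">
```

Returning true or false is only a convenience for testing; the browser ignores the return value of a load handler here.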
The actual function that applies the <span> highlighting tags to the
innerHTML is quite small , lets examine it part by part:
function markText(txtKeyword, inputHtml)
{
var re; /*regex object*/
var varMatches; /*matches array*/
var outHtml; /*output html*/
var replaceText;/*build the span tag with the keyword in advance*/
replaceText = '<span style="background-color:yellow;color:red;font-weight:bold;">'+txtKeyword+ '</span>';
The function takes two parameters: the keyword to be highlighted (txtKeyword)
and the raw HTML content string extracted using the innerHTML property (inputHtml).
All the necessary string and regular expression variables are declared.
The highlighted keyword string is built in advance, in the replaceText assignment, by wrapping
the keyword in a <span style=....> tag.
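re = /(<[^>][^<]*>)([^<]*)/g; /*match a tag plus any text after it, up to the next tag*/
/*text before the first tag is never matched by re, so mark it up front*/
var firstTag = inputHtml.indexOf('<');
outHtml = replaceMe(firstTag === -1 ? inputHtml : inputHtml.substring(0, firstTag), txtKeyword, replaceText);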
A new RegExp object is created. It matches every tag, from its opening
< to its closing >, together with any
non-tag text to the right of the closing >. The "g" (global) flag
makes each call to exec() continue from where the previous match ended,
so the whole string is scanned match by match.
I had to slip in the extra [^<]
in the first part of the expression because the match sometimes used to bomb on encountering
a non-visible character; the extra expression seemed to fix that.
while ((varMatches = re.exec(inputHtml)) != null)/*exec sequentially to apply span tags*/
{
outHtml+=varMatches[1]; /*html tag part*/
outHtml+=replaceMe(varMatches[2], txtKeyword, replaceText); /*call the search & replace function*/
}
return outHtml;
}
The innerHTML string is now evaluated against the regular expression
object. The exec() method searches the string using the regular
expression and returns an array (varMatches) containing the results
of the search, or null once there are no more matches. Element 0 holds the whole match, element 1 (varMatches[1]) contains
the matched HTML tag and element 2 (varMatches[2]) contains the
non-tagged text to the right of the matched tag.
For example, if the following is one of the matches:
<p class="xclass">hello there
varMatches[1] would contain <p class="xclass">
and
varMatches[2] would contain the string: "hello there"
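This behaviour can be checked by running the same regular expression over a small fragment. The wrapper splitTagsAndText is a hypothetical name; it simply collects varMatches[1] and varMatches[2] from each exec() call, the same way the loop in markText() does.

```javascript
// Runs the tag/text regular expression over an HTML string and collects
// [tag, followingText] pairs, i.e. varMatches[1] and varMatches[2].
function splitTagsAndText(html) {
  var re = /(<[^>][^<]*>)([^<]*)/g;
  var pairs = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    pairs.push([m[1], m[2]]);
  }
  return pairs;
}

var pairs = splitTagsAndText('<p class="xclass">hello there<br>minestrone</p>');
// pairs[0] is ['<p class="xclass">', 'hello there']
// pairs[1] is ['<br>', 'minestrone']
// pairs[2] is ['</p>', '']
```

Only the second element of each pair ever reaches the keyword replacement, so attribute values such as class="xclass" are never touched.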
The string in varMatches[2] is now searched for the keyword to
be highlighted and every instance of it is replaced with the <span>
tagged keyword (using the replaceMe() function).
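The article does not list replaceMe() itself. A minimal sketch consistent with the call replaceMe(varMatches[2], txtKeyword, replaceText), assuming a case-sensitive match of the keyword, could be:

```javascript
// Hypothetical replaceMe(): replaces every case-sensitive occurrence of
// txtKeyword in a text-only fragment (no tags) with the prebuilt span.
function replaceMe(textPart, txtKeyword, replaceText) {
  if (!txtKeyword) {
    return textPart; // an empty keyword would "match" between every character
  }
  // split/join replaces all occurrences without having to escape
  // regular-expression metacharacters that may appear in the keyword.
  return textPart.split(txtKeyword).join(replaceText);
}
```

A case-insensitive version would need a RegExp built from the escaped keyword with the "gi" flags, and it would have to wrap the matched text rather than reuse replaceText, since replaceText always contains the keyword exactly as typed.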
Finally, the highlighted output string from the markText()
function is written back to the <div> tag by setting its
innerHTML property, something like:
contentDivObj.innerHTML = strOutputFromMarkText;
The sample code should be self-explanatory, and it is commented. I learnt most of the
layer-writing methods, like reading and setting innerHTML, from
ppk's website.
That's about it.
There is a working example available: DHTML marker sample
Some possible improvements / optimizations:
- Portable code for other minor browsers.
- Right now the code treats multiple keywords as a phrase; changing it to handle each word in the phrase individually shouldn't be hard.
- I don't do character code conversions. For example, if someone searched for a word like bonnie&clyde, I don't convert it to bonnie&amp;clyde, which is how the ampersand appears in the innerHTML string. So maybe this could be added.
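Both improvements could start from small helpers like the ones below. The names toHtmlText and toKeywords are hypothetical, and the encoding covers only &, < and > (browsers also serialize non-breaking spaces as &nbsp;, which this sketch ignores).

```javascript
// Encode a keyword the way it appears inside an innerHTML string,
// e.g. "bonnie&clyde" becomes "bonnie&amp;clyde".
function toHtmlText(keyword) {
  return keyword
    .replace(/&/g, '&amp;') // must run first so later entities are not re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Split a search phrase into individual, HTML-encoded keywords,
// dropping the empty strings produced by leading/trailing whitespace.
function toKeywords(phrase) {
  return phrase.split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(toHtmlText);
}
```

Each resulting keyword could then be passed to markText() in turn; because markText() leaves everything between < and > alone, later passes should not disturb the style attributes of spans added by earlier ones.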



