Web Document Search, Organisation and Exploration Using Self-Organising Neural Networks
By Richard T. Freeman
Abstract
The amount of content stored electronically is rapidly and dramatically increasing with the advent of the Internet and
corporate Intranets, leading to an information overload. There is a requirement to make content more accessible through content and
knowledge management, allowing it to be efficiently searched and explored. A major limitation of existing hierarchical document
clustering methods used in information retrieval is that they typically generate a dendrogram representation, which is
unsuitable for browsing. Methods based on the self-organising map are better suited to observing large numbers of clusters, but are
not as natural as tree structures such as those used in libraries, file explorers, web directories or enterprise information portals.
In this thesis, a method is proposed to generate such a tree using an algorithm called Adaptive Topological Tree Structure
(ATTS), which uses a set of hierarchically organised self-organising growing chains. Each chain fully adapts to a specific topic, where
its number of subtopics is determined using the proposed entropy-based validation and cluster tendency schemes. This makes the
algorithm novel in that the tree is not a dendrogram or fixed-size n-way tree, but rather adapts to the natural underlying
structure at each level in the hierarchy. The chains' topology also places similar topics close together and dissimilar ones further apart.
The obtained topological tree can be defined as a hybrid graph-tree and taxonomy, with the unique property of representing
both hierarchical relations and the strongest links between clusters. This topology can be exploited to considerably
reduce the time needed for a top-down search as well as improve browsing and user comprehension.
Experimental results show that the ATTS method outperforms other hierarchical divisive clustering algorithms, as well as
methods based on self-organising maps, in document retrieval, and that the generated topological tree makes browsing more
intuitive. The topology provides a unique feature that can be used for finding related topics and extending the search space.
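As a rough illustration of how such a tree might support a top-down search, the following minimal Python sketch descends from the root by following the best-matching subtopic at each level, then reports the neighbouring topics on the final chain as related topics. It is not the ATTS algorithm from the thesis: the `Node` class, the `top_down_search` function, the use of cosine similarity, and the toy topic vectors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """One topic in a hypothetical topological tree (illustrative only)."""
    label: str
    centroid: list
    children: list = field(default_factory=list)  # ordered as a chain


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_down_search(root, query):
    """Follow the best-matching child at each level, starting from the root.

    Returns the path of topic labels and the labels of the final topic's
    chain neighbours (its adjacent siblings), used here as related topics.
    """
    path, related, node = [root.label], [], root
    while node.children:
        chain = node.children
        best = max(range(len(chain)), key=lambda i: cosine(chain[i].centroid, query))
        related = [chain[i].label for i in (best - 1, best + 1) if 0 <= i < len(chain)]
        node = chain[best]
        path.append(node.label)
    return path, related


# Toy two-level tree; the vector dimensions stand for made-up term weights.
health = Node("health", [0.3, 1.0, 0.0], [
    Node("fitness", [0.6, 0.8, 0.0]),
    Node("medicine", [0.1, 1.0, 0.0]),
    Node("insurance", [0.0, 0.7, 0.7]),
])
root = Node("all", [0.0, 0.0, 0.0], [
    Node("sport", [1.0, 0.1, 0.0]),
    health,
    Node("finance", [0.0, 0.1, 1.0]),
])

path, related = top_down_search(root, [0.0, 1.0, 0.1])
print(path, related)
```

In this toy example the query descends through "health" to "medicine", and the adjacent topics on that chain, "fitness" and "insurance", are returned as candidates for extending the search. Only one comparison per child is needed at each level, which is why a top-down search of this kind can be much cheaper than comparing the query against every cluster.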
Keywords
Information retrieval; document clustering; search engine; self-organising maps; topological tree; information access; faceted classification; guided navigation; taxonomy generation; neural networks; post-retrieval clustering; enterprise portals; enterprise content management; enterprise search.
Bibliographic Details
@phdthesis{freemanPhD04,
Author = {Richard T. Freeman},
Title = {Web Document Search, Organisation and Exploration Using
Self-Organising Neural Networks},
School = {University of Manchester},
Address = {School of Electrical and Electronic Engineering,
Faculty of Engineering and Physical Sciences},
Year = {2004}
}