Research Title: Combining Multiple Keyphrase Extraction
Algorithms to Achieve Better Quality Keyphrase Sets.
(I carried out this research as a part of my MSc degree.)
Research Overview:
As the volume of digital documents continues to grow year
after year, a question that motivates this research is: how
can we give users the ability to find only relevant information,
and to do so effectively?
Keyphrases provide a brief, meaningful summary of the contents
of a document. Because they epitomise the information within
the document, they have proved beneficial to users in
information retrieval: searching with keyphrases becomes a
process of matching the user's information need to the most
relevant pool of information that those keyphrases represent.
This research developed a new algorithm that combines two
existing keyphrase extraction algorithms and applies a set of
criteria to select keyphrases that are intended to be better
than those the constituent algorithms would produce individually.
Although no enhancements were achieved, very promising
results were observed.
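The idea of combining the output of two extractors can be
illustrated with a minimal sketch. This is not the HyPhEn
algorithm itself, whose selection criteria are described in the
report rather than on this page. The function name
combine_keyphrases, the reciprocal-rank scoring, and the
agreement_bonus parameter are all hypothetical, chosen only to
show one plausible way of merging two ranked keyphrase lists.

```python
from typing import Dict, List


def normalise(phrase: str) -> str:
    """Lower-case and collapse whitespace so both lists compare equal."""
    return " ".join(phrase.lower().split())


def combine_keyphrases(list_a: List[str], list_b: List[str],
                       top_n: int = 5,
                       agreement_bonus: float = 1.0) -> List[str]:
    """Merge two ranked keyphrase lists (best phrase first).

    Assumed criteria (illustrative only, not taken from the report):
      * each phrase earns 1 / (rank + 1) from every list it appears in;
      * a phrase proposed by both extractors earns an extra bonus.
    """
    scores: Dict[str, float] = {}
    votes: Dict[str, int] = {}
    for ranked in (list_a, list_b):
        seen = set()
        for rank, phrase in enumerate(ranked):
            key = normalise(phrase)
            if not key or key in seen:
                continue  # skip blanks and repeats within one list
            seen.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / (rank + 1)
            votes[key] = votes.get(key, 0) + 1
    for key, count in votes.items():
        if count > 1:
            scores[key] += agreement_bonus
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return ordered[:top_n]


if __name__ == "__main__":
    # Hypothetical outputs standing in for the two constituent extractors.
    first = ["keyphrase extraction", "machine learning", "genetic algorithm"]
    second = ["Machine Learning", "information retrieval",
              "keyphrase extraction"]
    print(combine_keyphrases(first, second, top_n=3))
```

In this sketch, phrases that both extractors agree on rise to
the top, which is one simple way of expressing the hope that a
hybrid set can be better than either individual set.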
As a part of this research, I implemented (a) KEA (Keyphrase
Extraction Algorithm), a machine learning algorithm, and
(b) Extractor, and I developed (c) my own algorithm called
HyPhEn (Hybrid Phrasing Engine). A significant result of this
research was that my implementation of the Extractor algorithm
produced similar statistical results for keyphrase extraction
without having to be trained using the Genitor genetic
algorithm. More details can be found in the report linked below.
The research report spans 131 pages, whereas the complete
work, including appendices (e.g. Software Requirements
Specifications, Software Design, Testing documents, etc.), spans
around 750 pages.
Research Report:
By downloading this report, you acknowledge that it is
copyrighted material and that I, as the author, reserve the
copyright to this original piece of work. No part or whole of
this report may be published or distributed using print,
digital or any other media without my prior permission.
Distributing or printing this report without agreeing to these
conditions would be an infringement of copyright law.
Hybrid Phrasing Engine (Research Report) (opens in a new window).