Research Experience
Research Title: Combining Multiple Keyphrase Extraction Algorithms to Achieve Better Quality Keyphrase Sets.
(I carried out this research as a part of my MSc degree.)
Research Overview:
As the volume of digital documents has continued to grow over the years, the question that motivates this research is: how can we give users the ability to find only relevant information, and to do so effectively?
Keyphrases provide a brief, meaningful summary of a document's contents. Because they capture the essence of the information within a document, they have proved beneficial to users in information retrieval tasks: searching with keyphrases becomes a process of matching the user's information need with the most relevant pool of information that those keyphrases represent.
This research developed a new algorithm that combines two existing keyphrase extraction algorithms and applies a set of criteria to try to select keyphrases that are better than those the constituent algorithms would produce individually.
Although no enhancements were achieved, very promising results were observed.
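As a rough illustration of the general idea of combining the output of two extractors, the following Python sketch merges two ranked keyphrase lists. It is not the HyPhEn algorithm itself: the function name `combine_keyphrases`, the reciprocal-rank scoring rule and the sample phrases are all assumptions made for this example, and the actual selection criteria are described in the research report.

```python
from collections import defaultdict


def combine_keyphrases(ranked_a, ranked_b, top_n=5):
    """Merge two ranked keyphrase lists into one (hypothetical criteria).

    Each input is a list of phrases ordered best-first. A phrase earns
    1 / rank from every list it appears in, so phrases proposed by both
    extractors tend to rise to the top.
    """
    scores = defaultdict(float)
    for ranked in (ranked_a, ranked_b):
        seen = set()
        for rank, phrase in enumerate(ranked, start=1):
            key = " ".join(phrase.lower().split())  # normalise case and spacing
            if key in seen:
                continue  # count each phrase only once per list
            seen.add(key)
            scores[key] += 1.0 / rank
    # Sort by descending score, breaking ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ordered[:top_n]]


if __name__ == "__main__":
    # Made-up outputs standing in for two different extractors.
    first_output = ["information retrieval", "keyphrase extraction", "machine learning"]
    second_output = ["Keyphrase Extraction", "genetic algorithm", "information retrieval"]
    print(combine_keyphrases(first_output, second_output, top_n=3))
```

With the sample lists above, phrases that both extractors agree on are ranked ahead of phrases proposed by only one of them, which is one simple way a combined set might be filtered.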
As a part of this research, I (a) implemented KEA (Keyphrase Extraction Algorithm), a machine learning algorithm, (b) implemented Extractor, and (c) developed my own algorithm called HyPhEn (Hybrid Phrasing Engine). A significant result of this research was that my implementation of the Extractor algorithm produced similar statistical results for keyphrase extraction without having to be trained using the Genitor genetic algorithm. More details can be found in the document that I have attached below.
The research report spanned 131 pages, whereas the complete work, including appendices (e.g. Software Requirements Specifications, Software Design, Testing documents, etc.), spanned around 750 pages.
Research Report:
This report is copyrighted material, and I, as the author, reserve the copyright to this original piece of work. No part or whole of this report may be published or distributed using print, digital or any other media without my prior permission. Distributing or printing this report without agreeing to these conditions would constitute an infringement of copyright law.
Hybrid Phrasing Engine (Research Report).