Sunday, February 23, 2014

Week 1/13/14

This week we are moving to a new approach. We read three papers on short text similarity. These papers mainly compared the similarity of single sentences. Because a sentence contains very few words, many other features must be considered to determine whether two sentences are related. For example, WordNet links an enormous number of words together in a hierarchy, which can be used to judge whether two words are used in similar contexts. These and many of the other methods used in the papers look like a good fit for our longer texts. The papers used TakeLab and DKPro to analyze all of their data. Both have the short text similarity features we are interested in built in, and we are seriously considering running all of our experiments with the help of this software. Before we can do that, though, we need to adapt our datasets slightly. We intend to generate pairs of duplicates and non-duplicates to both calibrate and test our model. The papers we read used datasets composed of 20% duplicates and 80% non-duplicates. We are currently working on randomly generating these pairs.
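A minimal sketch of how this random pair generation could look, assuming our documents are already grouped so that every document in a group is a duplicate of the others. The `groups` layout, the `sample_pairs` name, and its parameters are hypothetical and only for illustration. This is not the procedure from the papers, and it assumes each document id appears in exactly one group.

```python
import random
from itertools import combinations


def sample_pairs(groups, n_pairs, dup_fraction=0.2, seed=0):
    """Return a shuffled list of (doc_a, doc_b, label) with label 1 for duplicates.

    groups: dict mapping a group label to a list of document ids
            (hypothetical layout; ids are assumed unique across groups).
    """
    rng = random.Random(seed)

    # Every pair inside one group is a candidate duplicate pair.
    dup_pool = [
        pair
        for docs in groups.values()
        for pair in combinations(sorted(docs), 2)
    ]
    n_dup = round(n_pairs * dup_fraction)
    n_non = n_pairs - n_dup
    if n_dup > len(dup_pool):
        raise ValueError("not enough duplicate pairs available")

    # Pairs that cross group boundaries are the candidate non-duplicates.
    n_docs = sum(len(docs) for docs in groups.values())
    n_cross = n_docs * (n_docs - 1) // 2 - len(dup_pool)
    if n_non > n_cross:
        raise ValueError("not enough non-duplicate pairs available")

    dups = rng.sample(dup_pool, n_dup)

    # Draw non-duplicates by picking two different groups, then one doc from each.
    labels = sorted(groups)
    non_dups = set()
    while len(non_dups) < n_non:
        g1, g2 = rng.sample(labels, 2)
        pair = tuple(sorted((rng.choice(groups[g1]), rng.choice(groups[g2]))))
        non_dups.add(pair)

    pairs = [(a, b, 1) for a, b in dups]
    pairs += [(a, b, 0) for a, b in sorted(non_dups)]
    rng.shuffle(pairs)
    return pairs
```

Calling `sample_pairs(groups, 1000)` would then give 200 duplicate pairs and 800 non-duplicate pairs, provided the groups contain enough of each. Fixing the seed keeps the calibration and test sets reproducible between runs.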
