We have finished all of the experiments that we intend to include in our MSR conference paper, and we have submitted our abstract to the conference. We calculate similarity scores for 25 different features. Eighteen of these were adapted from the short text similarity papers and were calculated using Takelab. The other 7 are binary features that are 1 if the feature considered is the same for both bugs and 0 otherwise. These were created under the assumption that two duplicate bugs will be from the same piece of software and share other qualities.
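To make the binary features concrete, here is a minimal sketch of how a pair of bug reports could be turned into 0/1 match features. The field names (`product`, `component`, `version`) and the dictionary layout are assumptions for illustration only; they are not the actual 7 fields we used.

```python
from typing import Dict, List, Optional

# Hypothetical categorical fields, chosen only for illustration.
CATEGORICAL_FIELDS = ["product", "component", "version"]


def binary_features(
    bug_a: Dict[str, Optional[str]],
    bug_b: Dict[str, Optional[str]],
    fields: List[str] = CATEGORICAL_FIELDS,
) -> List[int]:
    """Return one 0/1 feature per field: 1 if both bugs share the value, else 0."""
    features = []
    for field in fields:
        value_a = bug_a.get(field)
        value_b = bug_b.get(field)
        # Assumption: a missing value never counts as a match.
        match = value_a is not None and value_a == value_b
        features.append(1 if match else 0)
    return features


bug_a = {"product": "Platform", "component": "UI", "version": "3.1"}
bug_b = {"product": "Platform", "component": "SWT", "version": "3.1"}
print(binary_features(bug_a, bug_b))
```

In this sketch the two reports agree on product and version but not on component, so only the first and third features are set.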
We then use support vector regression to calibrate our binary classification model (BCM). BCM essentially means we classify each bug as either a duplicate or a non-duplicate. This is more manageable than the ranking problem, which produces a top-K list of the K most similar bugs. We tested three datasets: Eclipse, Open Office, and Mozilla. Our results indicate that this method performs better than previously reported results.
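The difference between the binary classification view and the top-K ranking view can be sketched as follows. This is only an illustration: the scores stand in for the output of a trained regressor, and the candidate IDs, score values, and the 0.5 threshold are all made up for the example.

```python
from typing import Dict, List


def classify(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Binary view: label each candidate as duplicate (True) or not (False)."""
    return {bug_id: score >= threshold for bug_id, score in scores.items()}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Ranking view: return the IDs of the k highest-scoring candidates."""
    ranked = sorted(scores, key=lambda bug_id: scores[bug_id], reverse=True)
    return ranked[:k]


# Hypothetical similarity scores between one new bug and three existing bugs.
scores = {"bug-101": 0.9, "bug-102": 0.2, "bug-103": 0.6}

labels = classify(scores)
ranking = top_k(scores, k=2)
print(labels)
print(ranking)
```

The binary view gives a yes/no answer per candidate, while the ranking view always returns K candidates whether or not any of them is a real duplicate.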
Another difference between our model and previous work is that we do not keep bugs that are still open. This allows us to avoid training our model on mislabeled data, since we cannot confirm whether these bugs are duplicates or not.
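As a rough sketch of this filtering step, the snippet below keeps only bugs whose status shows they have been resolved. The status names follow the usual Bugzilla convention, and the `status` field and record layout are assumptions for illustration, not a description of our actual pipeline.

```python
from typing import Dict, List

# Assumed Bugzilla-style statuses that indicate a bug is no longer open.
RESOLVED_STATUSES = {"RESOLVED", "VERIFIED", "CLOSED"}


def drop_open_bugs(bugs: List[Dict]) -> List[Dict]:
    """Keep only bugs with a resolved status; open or unknown statuses are dropped."""
    kept = []
    for bug in bugs:
        status = str(bug.get("status", "")).upper()
        if status in RESOLVED_STATUSES:
            kept.append(bug)
    return kept


bugs = [
    {"id": 1, "status": "RESOLVED"},
    {"id": 2, "status": "NEW"},
    {"id": 3, "status": "closed"},
    {"id": 4},  # no status recorded, so it is treated as unconfirmed
]
closed_bugs = drop_open_bugs(bugs)
print([bug["id"] for bug in closed_bugs])
```

Using a whitelist of resolved statuses, rather than a blacklist of open ones, means a bug with a missing or unexpected status is excluded instead of silently entering the training data.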
Our full paper is due on 2/7/14, so we are busy writing up the final draft. I am writing the section on how our data was collected and the properties of the datasets.