Sunday, February 23, 2014

Week 2/17/14

This has been such a busy time. We submitted the abstract of our data paper to the MSR conference! The full paper is due on 2/21/14. 

We are looking further into the implementation of LDA (Latent Dirichlet Allocation). I am currently figuring out a program called Vowpal Wabbit that should make the implementation much easier.
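As a first experiment, here is a minimal sketch of how Vowpal Wabbit's online LDA could be driven from Python; the topic count, file names, and toy corpus are placeholders, not settings we have committed to.

```python
import subprocess

# Each line of a VW input file for LDA is "| token:count token:count ...";
# no labels are needed, only bag-of-words features. Real bug text would
# also need ':' and '|' stripped from tokens before writing.
def write_vw_corpus(docs, path):
    with open(path, "w") as f:
        for doc in docs:
            counts = {}
            for tok in doc.lower().split():
                counts[tok] = counts.get(tok, 0) + 1
            f.write("| " + " ".join(f"{t}:{c}" for t, c in counts.items()) + "\n")

# Toy corpus standing in for bug report texts.
bug_texts = ["crash on startup when profile is missing",
             "browser crashes at startup with a corrupt profile"]
write_vw_corpus(bug_texts, "bugs.vw")

# Assumed invocation: 20 topics, 2 passes; file names are placeholders.
subprocess.run(["vw", "bugs.vw",
                "--lda", "20",                   # number of topics
                "--lda_D", str(len(bug_texts)),  # corpus size estimate
                "--passes", "2", "-k",
                "--cache_file", "bugs.cache",
                "-p", "topics.txt"],             # per-document topic weights
               check=True)
```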


I also prepared a 15-minute Beamer presentation on the math behind our research and the progress we have made so far. I am going to present it at the Pi Mu Epsilon regional conference on Saturday 2/22/14. This is the first time that I am presenting our research.

Week 2/10/14

We submitted our paper to the short research paper session at MSR. Our results showed a 3-6% increase in accuracy compared to state-of-the-art published results. We are very excited to get some feedback on our work and hope for the best!

We are already pushing ahead to make our model better. We would like to add some additional features, and we are considering implementing the LDA algorithm. Currently the largest dataset we have experimented on has been Mozilla, which initially contained 78,236 bugs. We would like to run our algorithm on some much larger datasets (at least an order of magnitude larger)!


We would also like to submit a data paper to the MSR conference. This paper will describe specifically how we collected and processed our data. The conference has a dedicated track for this type of paper.


Week 1/27/14 - 2/3/14

We have finished all of the experiments that we intend to include in our MSR conference paper, and we have submitted our abstract. We calculated similarity scores for 25 different features. Eighteen of these were adapted from the short text similarity papers and were computed using Takelab. The other seven are binary features that are 1 if the attribute considered is the same for both bugs and 0 otherwise. These were created under the assumption that two duplicate bugs will come from the same piece of software and share other qualities.
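To illustrate the binary features, here is a small sketch; the field names are hypothetical examples of the kind of attributes compared, not the exact seven from our paper.

```python
# Hypothetical attribute names; the real features compare whichever
# categorical fields both bug reports carry.
FIELDS = ["product", "component", "platform", "os",
          "priority", "severity", "version"]

def binary_features(bug_a, bug_b):
    """Return 1 per field if it matches across the pair, else 0."""
    feats = []
    for f in FIELDS:
        va, vb = bug_a.get(f), bug_b.get(f)
        feats.append(1 if va is not None and va == vb else 0)
    return feats

a = {"product": "Firefox", "component": "Bookmarks", "os": "Linux"}
b = {"product": "Firefox", "component": "Tabs", "os": "Linux"}
print(binary_features(a, b))  # [1, 0, 0, 1, 0, 0, 0]
```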

We then use support vector regression to calibrate our Binary Classification Model (BCM). BCM essentially means we classify each pair of bugs as either a duplicate or a non-duplicate. This is more manageable than the ranking problem, which produces a top-K list of the K most similar bugs. We tested three datasets: Eclipse, Open Office, and Mozilla. Our results indicate that this method is better than previously reported results.
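Here is a minimal sketch of the calibration step, using scikit-learn's SVR as a stand-in implementation on toy data; the 0.5 cut-off is an assumed threshold, not a tuned value.

```python
import numpy as np
from sklearn.svm import SVR

# Toy data: one row of 25 similarity scores per bug pair,
# labeled 1 for duplicates and 0 for non-duplicates.
rng = np.random.default_rng(0)
X_train = rng.random((200, 25))
y_train = rng.integers(0, 2, size=200).astype(float)

# Regress a duplicate score, then threshold it to get the
# binary (BCM) decision.
model = SVR(kernel="rbf")
model.fit(X_train, y_train)

scores = model.predict(rng.random((10, 25)))
labels = (scores >= 0.5).astype(int)  # assumed cut-off
print(labels)
```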

Another difference between our model and previous work is that we do not keep bugs that are still open. This allows us to avoid training our model on mislabeled data, since we cannot confirm whether these bugs are duplicates or not.
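A one-function sketch of that filtering step, assuming Bugzilla-style status values in our dumps:

```python
# Bugzilla-style resolved statuses (an assumption about our dumps);
# open bugs are dropped so unconfirmed duplicates never reach training.
CLOSED_STATUSES = {"RESOLVED", "VERIFIED", "CLOSED"}

def keep_closed(bugs):
    return [b for b in bugs if b.get("status") in CLOSED_STATUSES]
```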


Our full paper is due on 2/7/14, so we are busy writing up the final draft. I am writing the section on how our data was collected and on the properties of the datasets.

Week 1/21/14

We have decided to use Takelab for all of our experiments. Because Takelab was designed to run on short sentences, we must combine the summary and description of each bug report into one block of text. We intend to compare all possible pairs of bug reports. Although other researchers did not compare all possible pairs, doing so is statistically sounder because it tests our model on a representative sample of the data; if you artificially change the percentage of duplicates in a dataset, you run into other problems.
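Here is a sketch of the exhaustive pair generation, assuming each bug is a dict with an "id" and a "dup_of" field naming the bug it duplicates (field names hypothetical):

```python
from itertools import combinations

def all_pairs(bugs):
    """Yield every pair of bug reports with a duplicate label.

    Assumes each bug has an "id" and a "dup_of" field pointing at the
    bug it duplicates (None otherwise) -- hypothetical field names.
    """
    for a, b in combinations(bugs, 2):
        is_dup = a.get("dup_of") == b["id"] or b.get("dup_of") == a["id"]
        yield a, b, int(is_dup)
```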

Another issue we are considering deals with stack traces. Oftentimes, people submit a stack trace in the bug report's description field, and the periods inside fully qualified class and method names can be mistaken for sentence boundaries. We have two options: we can either leave the periods in or remove them. We intend to test both methods to see which produces better results, or to see if it matters at all.
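A sketch of the second variant; the regex for spotting stack-trace-like lines is a rough heuristic for Java-style frames, not our final rule.

```python
import re

# Rough heuristic for Java-style frames such as
# "at org.example.Foo.bar(Foo.java:42)" -- an assumption, not our rule.
FRAME_RE = re.compile(r"^\s*at\s+[\w.$]+\(.*\)\s*$")

def strip_trace_periods(description):
    """Variant 2: blank out periods inside stack-trace lines so they
    are not mistaken for sentence boundaries; variant 1 leaves them."""
    lines = []
    for line in description.splitlines():
        lines.append(line.replace(".", " ") if FRAME_RE.match(line) else line)
    return "\n".join(lines)
```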

I have also made a summary of the features from each of the papers that we read last week. We can use this to analyze which features were the most commonly used and which features worked the best.

Since we must submit our paper soon, I have also updated the bibliography and converted it into a BibTeX file for LaTeX. I have also moved the paper into the MSR LaTeX template and added in the references.


Week 1/13/14

This week we are moving to a new approach. We read three papers that looked into short text similarity. These papers mainly compared the similarity of single sentences. Because a sentence contains very few words, many other features must be considered to determine whether two sentences are related. For example, WordNet links an enormous number of words together in a hierarchy that indicates whether two words are used in similar contexts. These and many of the other methods used in these papers would be well suited to our longer texts.

The papers used Takelab and DKPro to analyze all of their data. These tools have the short text similarity features we are interested in built in, so we are seriously considering running all of our experiments with their help. Before we can do this, though, we need to adapt our datasets slightly. We intend to generate pairs of duplicates and non-duplicates to both calibrate and test our model. The papers we read used datasets comprised of 20% duplicates and 80% non-duplicates, and we are currently working on randomly generating these pairs.
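A sketch of that random generation, assuming the duplicate and non-duplicate pairs have already been separated into two lists (names hypothetical):

```python
import random

def sample_20_80(dup_pairs, nondup_pairs, n_total, seed=0):
    """Draw a dataset that is 20% duplicates and 80% non-duplicates,
    mirroring the class balance used in the short-text papers."""
    rng = random.Random(seed)
    n_dup = n_total // 5  # 20% of the sample
    return (rng.sample(dup_pairs, n_dup)
            + rng.sample(nondup_pairs, n_total - n_dup))
```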