Wednesday, January 15, 2014

Week 1/6/14

Happy New Year!

This week, I downloaded pymongo, MongoDB, and all their dependencies, and I was able to connect to the MongoDB server. This is needed to restore our small Eclipse and Eclipse 2008 databases into Mongo databases. Although the small Eclipse database can be accessed directly using simple techniques, Eclipse 2008 has over 45,000 bugs. Therefore, I had to adapt the programs that I wrote last week to read from the Mongo databases through pymongo. I should now be able to create a dictionary of all duplicate bugs and calculate the number of unique duplicate pairs, even on our larger datasets. We were actually able to use our results from this program to match results given in another paper.
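For anyone curious, here is a rough sketch of how a duplicate dictionary and a unique-pair count could be built from bug documents. It is not the actual program: the field names `bug_id` and `dup_id`, the helper `load_bugs`, and the reading of "unique pair" as an unordered (master, duplicate) pair are all assumptions made for illustration. The counting helpers accept any iterable of dict-like documents, so they work the same on a pymongo cursor or on a plain list.

```python
from collections import defaultdict


def build_duplicate_map(bugs):
    """Map each master bug id to the set of bug ids marked as its duplicates.

    `bugs` is any iterable of dict-like documents, e.g. a pymongo cursor.
    The field names `bug_id` and `dup_id` are hypothetical.
    """
    duplicates = defaultdict(set)
    for bug in bugs:
        dup_of = bug.get("dup_id")
        if dup_of:  # skip bugs that are not marked as duplicates
            duplicates[dup_of].add(bug["bug_id"])
    return dict(duplicates)


def count_unique_pairs(duplicates):
    """Count unordered (master, duplicate) pairs, ignoring order and repeats."""
    pairs = set()
    for master, dups in duplicates.items():
        for dup in dups:
            if dup != master:
                pairs.add(frozenset((master, dup)))
    return len(pairs)


def load_bugs(db_name, collection_name, host="localhost", port=27017):
    """Stream documents from a MongoDB server (database and collection names are hypothetical)."""
    from pymongo import MongoClient  # imported lazily so the helpers above run without pymongo
    client = MongoClient(host, port)
    return client[db_name][collection_name].find()


# Tiny in-memory example standing in for a real collection.
sample = [
    {"bug_id": 1, "dup_id": None},
    {"bug_id": 2, "dup_id": 1},
    {"bug_id": 3, "dup_id": 1},
    {"bug_id": 4, "dup_id": 2},
]
dup_map = build_duplicate_map(sample)
print(count_unique_pairs(dup_map))  # 3
```

Because the cursor is consumed one document at a time, only the duplicate map needs to fit in memory, which matters once a dataset has tens of thousands of bugs.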

We are also looking into similarity measures other than the very common cosine measure. I think it might be cool to use a norm instead; the Euclidean norm, a p-norm, or the infinity norm might be appropriate. My advisors suggested that we calculate the Dice and Jaccard similarities first, to have a baseline to compare with previously published research. Then we would be able to tell how effective these other norms are. We are still brainstorming other avenues to go down.
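To make the measures concrete, here is a minimal sketch of each one on tokenized bug text. It assumes a bug report is represented as a list of word tokens, with Jaccard and Dice computed on token sets and cosine and the norms computed on raw term counts; the actual representation and weighting may well differ. Note that the norm-based functions are distances, so smaller values mean more similar reports, while the other three are similarities.

```python
import math
from collections import Counter


def jaccard(a, b):
    """|A & B| / |A | B| on token sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A & B| / (|A| + |B|) on token sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine of the angle between raw term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def p_norm_distance(a, b, p=2):
    """p-norm of the difference of term-count vectors.

    p=2 gives the Euclidean norm; p=math.inf gives the infinity norm.
    """
    ca, cb = Counter(a), Counter(b)
    diffs = [abs(ca[t] - cb[t]) for t in ca.keys() | cb.keys()]
    if not diffs:
        return 0.0
    if p == math.inf:
        return float(max(diffs))
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Two made-up bug summaries, split on whitespace.
x = "crash on startup".split()
y = "crash on exit".split()
print(jaccard(x, y), dice(x, y), cosine(x, y))
print(p_norm_distance(x, y, 2), p_norm_distance(x, y, math.inf))
```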

The deadline for our MSR abstract is quickly approaching. We have been working on getting as much of the paper written as we can.
