Monday, April 21, 2014

Week 4/14/14

This week I completed the first version of the poster for the Grace Hopper Celebration of Women in Computing. I had to do a lot of tinkering with the poster to fit all of the important information on it. It was very hard to narrow down what should go on it. I think it looks pretty nifty now.

I also submitted an application for a scholarship to attend this celebration. I really hope that I get it. Since I am graduating, I am not sure whether I can still get funding from the CREU program to attend. I need to look into this, but the scholarship would certainly help cut down the cost of attendance.

I set a date for presenting this research to the department for my senior project. I get to present on May 28th at 11:00 am. It should be a jolly good time!

I also wrote the first draft of the final report for this project. I can't believe the year is almost over! This has been such a great experience for me. I will really miss the weekly meetings with Dr. Bonita and Dr. Sharif!

Week 3/30/14-4/7/14

This week, I presented our paper at the YSU QUEST forum for student research. It was very well received, and the audience seemed very interested in the topic. I really enjoyed having the opportunity to present and got some good feedback. Some interesting questions were posed at the end of the presentation. Since we had very good results, we did not look too closely at the very few negative cases (non-duplicates) that were mislabeled as duplicates. One audience member asked what happened in this small number of cases. I intend to look into this more closely to find the source of the problem and improve our system. Another audience member, who often writes bug reports himself, asked whether we had any issues with profanity in the reports we studied (since he is not often happy when he is writing his reports). This seems like an interesting thing to look into, although we did not encounter any problems from it in our tests.

I also presented our project from a more mathematical standpoint at the Ohio MAA sectional meeting. I talked about the different metric-preserving functions that were essential to making some of our distance-based machine learning algorithms (such as k-nearest neighbors) a logical choice. The audience here was curious about the datasets that we used. One audience member suggested that we also try out our system on some proprietary software, to see if our excellent results still hold.
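As an illustration of the idea (a sketch under my own assumptions, not the exact functions from the talk), the Python below takes ordinary Euclidean distance and composes it with the hypothetical choice f(x) = x/(1 + x). This f is increasing, subadditive, and zero only at zero, which is a known sufficient condition for a function to be metric preserving, so f applied to a metric is still a metric. Because f is also strictly increasing, a k-nearest-neighbors search ranks neighbors in the same order under either distance. The random point set, the query, and k = 5 are made up for the demo.

```python
import math
import random


def euclidean(p, q):
    """Ordinary Euclidean distance: a metric."""
    return math.dist(p, q)


def f(x):
    """Hypothetical metric-preserving function: increasing, subadditive, f(0) = 0."""
    return x / (1.0 + x)


def transformed(p, q):
    """Euclidean distance passed through f; still a metric because f is metric preserving."""
    return f(euclidean(p, q))


def k_nearest(query, points, k, dist):
    """Indices of the k points closest to the query under the given distance."""
    return sorted(range(len(points)), key=lambda i: dist(query, points[i]))[:k]


def triangle_holds(dist, points, trials=1000, seed=1):
    """Spot-check d(a, c) <= d(a, b) + d(b, c) on random triples of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = rng.sample(points, 3)
        if dist(a, c) > dist(a, b) + dist(b, c) + 1e-12:
            return False
    return True


# Made-up demo data: 50 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(50)]
query = (0.5, 0.5)

# Same neighbors, same order, because f is strictly increasing.
print(k_nearest(query, points, 5, euclidean) == k_nearest(query, points, 5, transformed))
# The transformed distance still passes the triangle-inequality spot check.
print(triangle_holds(transformed, points))
```

A side effect of this particular f is that every transformed distance lies in [0, 1), which can be handy when distances from different features need to be put on a common scale.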

Sunday, March 30, 2014

Week 3/17/14 - 3/24/14

This week, I submitted a title and abstract to YSU's QUEST forum for student scholarship. My talk will be this Tuesday, 4/1/14. I am very excited to present our research at this venue and have been working hard to create the presentation and practice it.
     
I am also attending the Ohio MAA sectional conference this Friday, where I will present all of the mathematics that I have learned to accompany this project. I am going to talk about the difference between similarity measures that are metrics and ones that are non-metrics. A metric is a distance function d that is non-negative, is zero exactly when its two arguments are equal, is symmetric, and satisfies the triangle inequality d(x, z) ≤ d(x, y) + d(y, z). There is a lot of debate currently going on about whether or not it is important to use metrics. This topic is truly interdisciplinary, since it spans ecology, biology, chemistry, computer science, and mathematics. In fact, the book I am currently reading on the subject is all about interpreting ecological data. I also intend to analyze the techniques used in our model to determine whether the measures they rely on are metrics. If time permits, I will also talk about subadditive and superadditive functions and the role they play in this conundrum.
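To make the metric versus non-metric distinction concrete, here is a small Python check (a sketch; the three points are arbitrary and not from our data). It tests the triangle inequality on every ordered triple of points for two measures: Euclidean distance, which is a metric, and cosine distance (1 minus cosine similarity), which is not. Cosine distance is used here only as a familiar example of a non-metric, not as a claim about which measures our model uses.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """1 - cosine similarity; a common similarity-based measure that is not a metric."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def euclidean(u, v):
    """Ordinary Euclidean distance; a metric."""
    return math.dist(u, v)


def triangle_violations(points, dist, tol=1e-12):
    """Return every ordered triple (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z)."""
    bad = []
    for x, y, z in permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            bad.append((x, y, z))
    return bad


pts = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(triangle_violations(pts, euclidean)))        # 0
print(len(triangle_violations(pts, cosine_distance)))  # 2
```

With these points, cosine distance gives d(a, c) = 1 between (1, 0) and (0, 1), but the detour through (1, 1) costs only 2(1 − 1/√2) = 2 − √2 ≈ 0.586. The two violations are that triple and its reverse.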
The first very rough draft of our Grace Hopper Celebration poster has been created. Much work is still needed to get it ready for the conference, but we have plenty of time. I chose to create the poster in LaTeX, because I really enjoy typesetting!


Monday, March 17, 2014

Week 3/10/14

Our two papers have been accepted to the 2014 MSR Conference! One is a data paper that explains in detail how our data was collected and preprocessed. The other is a short research paper that explains the progress we have made thus far. We received a list of referee comments for both papers, and we will be busy over the next week or so making these changes and getting our papers camera ready. Once these revisions are made, we will resubmit the papers for final approval. We will then travel to Hyderabad, India, to present this research!

I also wrote our extended abstract for the GHC poster session, and we have already submitted it. I need to get working on our poster now. I am torn between using LaTeX or an office suite to create it. I really like LaTeX and the equations will look sharp, but some of the formatting can be really finicky.

I also submitted an abstract to the Ohio MAA sectional math fest. In this presentation, I intend to talk more about the underlying math of the project and how it all comes together.

My list of research supplies funded by the USR grant has been approved and ordered. I am getting three books that should be very helpful for understanding all of the different aspects of the project. I am also getting a two-terabyte portable hard drive to store the immense amounts of data needed for our study and a headset for our weekly online meetings.

Monday, March 10, 2014

Week 2/24/14 - 3/3/14

      This week, I presented our research at the Regional Pi Mu Epsilon Conference. I gave a 15-minute presentation in which I talked about the goals of this research, why it is important for software companies to look into these problems, the basic techniques involved, new features that we considered, and the results obtained. The talk went well. The audience seemed really interested and had a bunch of questions. I really enjoyed attending the conference and got to see some interesting talks given by fellow students. It was a really great experience.

      I also wrote some pseudocode for the programs I previously wrote. We may include it in future papers or just keep it on file to look back on. It's really weird converting actual code to pseudocode; I generally go in the reverse direction.
  
      I also read a very advanced thesis on LDA (latent Dirichlet allocation). I have to admit this was a hard read. It used a lot of probability theory that I have never seen before, so I am going to have to do a lot of background reading to get a better sense of what was done.
    
      Since we received the USR grant from Youngstown State University, I am also getting together a list of possible books, equipment, or accessories that will help us in our research.

Sunday, February 23, 2014

Week 2/17/14

This has been such a busy time. We submitted the abstract of our data paper to the MSR conference! The full paper is due on 2/21/14. 

We are looking further into implementing LDA. I am currently learning to use a program called Vowpal Wabbit, which should make the implementation much easier.


I also prepared a 15-minute presentation in Beamer on the math behind our research and the progress we have made so far. I am going to present it at the Pi Mu Epsilon regional conference on Saturday, 2/22/14. This is the first time that I am presenting our research.

Week 2/10/14

We submitted our paper to the short research paper track of MSR. Our results showed a 3-6% increase in accuracy compared to state-of-the-art published results. We are very excited to get some feedback on our work and hope for the best!

We are already pushing ahead to make our model better. We would like to add some additional features, and we are considering implementing the LDA algorithm. Currently, the largest data set we have experimented on is Mozilla's, which initially had 78,236 bugs in it. We would like to run our algorithm on some much larger data sets (at least an order of magnitude larger)!


We would also like to submit a data paper to the MSR conference. This paper will describe specifically how we collected and processed our data. The conference has a dedicated paper track for this topic.