I am a Ph.D. candidate in Computer Science at the Erik Jonsson School of Engineering and Computer Science (ECS), The University of Texas at Dallas (UTD). I have been working on Service-Oriented Computing with Dr. I-Ling Yen since Fall 2019. Before that, I spent my first two years working with Dr. Tien Nguyen in the area of Software Engineering.

Office: ECSS 3.213, UTD

Email: sonnguyen at utdallas.edu | sonnv.ict at gmail.com


Research

Research interests

  • Service-Oriented Computing
  • Program Analysis
  • Configurable Code Analysis

Main projects

MNire: Detect and Fix Inconsistent Method Names

Misleading method names in a project or in the APIs of a software library confuse developers about program functionality and API usage, leading to API misuses and defects. In this project, we introduce MNire, a machine learning approach to checking the consistency between the name of a given method and its implementation. MNire first generates a candidate name and compares the current name against it: if the two names are sufficiently similar, the method is considered consistent. To generate method names, we draw our ideas and intuition from an empirical study on the nature of method names in a large dataset. Our key finding is that a high proportion of the tokens in method names can be found in three contexts of a given method: its body, its interface (the method's parameter types and return type), and the name of its enclosing class. Even when such tokens are absent, MNire can use these contexts to predict them because of their high co-occurrence likelihoods. Our unique idea is to treat name generation as abstractive summarization over the tokens collected from the names of the program entities in these three contexts.

We conducted several experiments to evaluate MNire in method name consistency checking and method name recommendation on datasets with more than 14M methods. In detecting inconsistent method names, MNire improves over the state-of-the-art approach by 10.4% and 11% (relative) in recall and precision, respectively. In method name recommendation, MNire achieves relative improvements over the state-of-the-art technique, code2vec, in both recall (18.2% higher) and precision (11.1% higher). To assess MNire’s usefulness, we used it to detect inconsistent method names and suggest new names in several active GitHub projects. We made 50 pull requests (PRs) and received 42 responses. Among them, five PRs were merged into the main branch, and 13 were approved for later merging. In total, in 31 of the 42 cases, the developer teams agreed that our suggested names were more meaningful than the current names, showing MNire’s usefulness.
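
The consistency check described above (generate a name, then compare it with the current one) can be sketched in Python as follows. This is a minimal sketch, not MNire's implementation: it assumes the candidate name has already been produced by the name generator (the summarization model is not shown), and the identifier splitting, the Jaccard token similarity, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re

def split_name(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    tokens = []
    for part in name.split("_"):
        # An all-caps run not followed by a lowercase letter is an acronym
        # (e.g. "HTTP"); otherwise take an optional capital plus lowercase letters.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def is_consistent(current_name, generated_name, threshold=0.5):
    """Hypothetical rule: consistent if token similarity reaches the threshold."""
    return jaccard(split_name(current_name), split_name(generated_name)) >= threshold

# Hypothetical examples: the generated names stand in for the model's output.
print(is_consistent("getFileName", "getName"))   # True: 2 of 3 tokens shared
print(is_consistent("saveFile", "loadConfig"))   # False: no shared tokens
```

Set-based similarity ignores token order, which keeps the sketch short; a real checker might weigh order or token importance differently.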


CoPro: Configuration Prioritization for Configurable Code

Unexpected interactions among features induce most bugs in configurable software systems. Exhaustively analyzing the exponential number of possible configurations is prohibitively costly, so various sampling techniques have been proposed to systematically narrow down the legal configurations to be tested. Since testing all selected configurations can still require a huge amount of effort, fault-based configuration prioritization, which helps detect faults earlier, can yield practical benefits in quality assurance. In this project, we propose a novel formulation of feature-interaction bugs via the common program entities enabled/disabled by the features. Leveraging this formulation, we develop CoPro, an efficient feature-interaction-aware configuration prioritization technique that ranks the configurations of a configurable system according to their total number of potential bugs.

We conducted several experiments to evaluate CoPro's ability to detect configuration-related bugs in a public benchmark. CoPro outperforms the state-of-the-art configuration prioritization techniques when they are applied on top of advanced sampling algorithms. In 78% of the cases, CoPro ranks the buggy configurations in the top 3 positions. Interestingly, CoPro also detected 17 not-yet-discovered feature-interaction bugs.
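
The ranking step can be illustrated with a deliberately simplified Python sketch. It is not CoPro's actual bug formulation: it assumes a hypothetical mapping from each feature to the program entities it enables, treats any two enabled features that touch a common entity as one potential interaction (ignoring entities that features disable), and ranks configurations by that count. The feature names and entity sets are made up for illustration.

```python
from itertools import combinations

def potential_interactions(config, entities_of):
    """Count pairs of enabled features that share at least one program entity.

    config: dict mapping feature name -> True (enabled) / False (disabled).
    entities_of: dict mapping feature name -> set of program entities it enables
    (a hypothetical, simplified stand-in for CoPro's analysis).
    """
    enabled = sorted(f for f, on in config.items() if on)
    return sum(
        1
        for f, g in combinations(enabled, 2)
        if entities_of.get(f, set()) & entities_of.get(g, set())
    )

def prioritize(configs, entities_of):
    """Rank configurations by descending number of potential interactions.

    sorted() is stable (also with reverse=True), so ties keep their input order.
    """
    return sorted(configs, key=lambda c: potential_interactions(c, entities_of), reverse=True)

# Hypothetical features and the entities they enable.
entities_of = {"LOGGING": {"log_buf", "flush"}, "CACHE": {"flush"}, "SSL": {"ctx"}}
configs = [
    {"LOGGING": True, "CACHE": False, "SSL": True},
    {"LOGGING": True, "CACHE": True, "SSL": True},
    {"LOGGING": False, "CACHE": True, "SSL": True},
]
for c in prioritize(configs, entities_of):
    print(potential_interactions(c, entities_of), c)
```

Here only the second configuration enables two features (LOGGING and CACHE) that share an entity, so it is moved to the front; the other two keep their sampled order.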


AutoSC: Automated Code Statement Completion

Automatic code completion helps improve developers’ productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered the basic units of program execution. In this project, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in partially completed statements. AutoSC benefits from the strengths of both directions, so that the completed code statement is both frequent and valid. AutoSC is first trained on a code corpus to learn the templates of candidate statements. Then, it uses program analysis to validate and concretize the templates into syntactically valid and type-valid candidate statements. Finally, these candidates are ranked using a language model trained on the lexical form of the source code in the corpus.

Our empirical evaluation shows that AutoSC achieves 38.9–41.3% top-1 and 48.2–50.1% top-5 accuracy in statement completion, and it outperforms the state-of-the-art approach by 9X–69X in top-1 accuracy.
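
The final ranking step can be sketched with a small statistical language model in Python. This is a stand-in, not AutoSC's actual model: it assumes the candidate statements have already been generated from templates and validated by program analysis (neither step is shown), and it uses a bigram model with add-one smoothing over lexical tokens. The training corpus and candidates below are hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model over lexical code tokens, with add-one smoothing."""

    def __init__(self, corpus):
        self.context = Counter()  # how often each token appears as a bigram's first token
        self.bigrams = Counter()  # counts of (previous token, next token)
        for tokens in corpus:
            padded = ["<s>"] + tokens + ["</s>"]
            self.context.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
        # Possible next tokens: every corpus token plus the end marker.
        self.vocab_size = len({t for tokens in corpus for t in tokens}) + 1

    def log_prob(self, tokens):
        """Log-probability of a token sequence under the smoothed bigram model."""
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.context[a] + self.vocab_size))
            for a, b in zip(padded, padded[1:])
        )

def rank_candidates(lm, candidates):
    """Order already-validated candidate statements from most to least likely."""
    return sorted(candidates, key=lm.log_prob, reverse=True)

# Hypothetical tokenized Java statements used as the training corpus.
corpus = [
    ["int", "i", "=", "0", ";"],
    ["int", "n", "=", "list", ".", "size", "(", ")", ";"],
    ["String", "s", "=", "list", ".", "get", "(", "i", ")", ";"],
]
lm = BigramLM(corpus)
candidates = [
    ["int", "n", "=", "list", ".", "hashCode", "(", ")", ";"],
    ["int", "n", "=", "list", ".", "size", "(", ")", ";"],
]
print(" ".join(rank_candidates(lm, candidates)[0]))
```

Both candidates are type-valid here; the model prefers the `size()` call only because that token sequence occurs in the corpus, which is the "frequent" half of the "frequent and valid" combination.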


CPatMiner: Semantic Code Change Pattern Mining

Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, and library adaptation. These and other novel applications require accurate automated detection of repetitive changes, but the current state of the art is limited to custom-tailored algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms that rely on syntactic similarity have lower accuracy and cannot effectively detect semantic change patterns. In this project, we introduce CPatMiner, a novel graph-based mining approach that detects previously unknown repetitive changes in the wild by mining fine-grained semantic code change patterns from a large number of open-source repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, CPatMiner relies on fine-grained change graphs that capture program dependencies.

We evaluated CPatMiner by mining change patterns in a diverse corpus of 5K+ open-source projects from GitHub across a population of 170K+ developers, using three complementary methods. First, we sent the mined patterns to 108 open-source developers: 70% of respondents recognized those patterns as their meaningful frequent changes, 79% even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. The mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%), and preventive (36%, including all refactorings). Second, we compared CPatMiner with the state-of-the-art AST-based technique and found that CPatMiner detects 37% more meaningful patterns. Third, we used CPatMiner to search for patterns in a corpus of 88 projects with longer histories, totaling 164M SLOCs. It constructed 322K fine-grained change graphs containing 3M nodes and detected 17K instances of change patterns, from which we derived unique insights into the practice of change patterns among individuals and teams. We found that 75% of the change patterns from individual developers are commonly shared with others, and the same holds true for teams. Moreover, the patterns are not intermittent but spread widely over time. We therefore call for a community-based change pattern database to provide important resources for novel applications.
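
To make the idea of mining repetitive changes concrete, here is a toy Python illustration of the counting step only. It is much simpler than CPatMiner's graph-based mining: it assumes each fine-grained change has already been turned into a list of labeled edges with concrete identifiers abstracted away (the labels below are hypothetical), and it groups changes whose edge multisets are identical instead of searching for frequent isomorphic subgraphs. Because node identities are dropped, two different graphs can map to the same key, so this is only an approximation.

```python
from collections import Counter

def canonical(change_graph):
    """Order-independent key for an abstracted change graph.

    change_graph: list of (source label, edge kind, target label) triples.
    Sorting keeps repeated edges, so the key reflects the edge multiset.
    """
    return tuple(sorted(change_graph))

def frequent_patterns(change_graphs, min_support=2):
    """Return abstracted changes that occur at least min_support times."""
    counts = Counter(canonical(g) for g in change_graphs)
    return {key: n for key, n in counts.items() if n >= min_support}

# Hypothetical abstracted changes: the first two add the same null check,
# with edges listed in different orders; the third is unrelated.
changes = [
    [("insert:If", "control", "MethodCall:get"), ("insert:NullCheck", "data", "Var")],
    [("insert:NullCheck", "data", "Var"), ("insert:If", "control", "MethodCall:get")],
    [("update:MethodCall:size", "data", "Var")],
]
for pattern, support in frequent_patterns(changes).items():
    print(support, pattern)
```

Only the null-check change reaches the hypothetical support threshold of 2; a real miner would also need to find patterns shared as subgraphs of larger changes.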


Publications

  1. [ICSE'20] Son Nguyen, Hung Phan, Trinh Le and Tien Nguyen, "Suggesting Natural Method Names to Check Name Consistencies", in Proceedings of the 42nd ACM/IEEE International Conference on Software Engineering (ACM/IEEE ICSE 2020). IEEE CS Press 2020. (PDF | Slides | ACM Digital Library)
  2. [ASE'19] Son Nguyen, Hoan Anh Nguyen, Ngoc Tran, Hieu Tran, and Tien N. Nguyen, "Feature-Interaction Aware Configuration Prioritization for Configurable Code", in Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019), November 11 - 15, 2019. IEEE CS Press 2019. (PDF | Slides | IEEE Digital Library)
  3. [ASE'19] Son Nguyen, Tien N. Nguyen, Yi Li, and Shaohua Wang, "Combining Program Analysis and Statistical Language Model for Code Statement Completion", in Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019), November 11 - 15, 2019. IEEE CS Press 2019. (PDF | Slides | IEEE Digital Library)
  4. [ICSE'19] Son Nguyen, "Configuration-Dependent Fault Localization", in Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ACM/IEEE ICSE 2019), May 25 - 31, 2019. IEEE CS Press 2019. (PDF | ACM Digital Library)
  5. [ICSE'19] Hieu Tran, Ngoc Tran, Son Nguyen, Hoan Nguyen and Tien N. Nguyen, "Recovering Variable Names for Minified Code with Usage Contexts", in Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ACM/IEEE ICSE 2019), May 25 - 31, 2019. IEEE CS Press 2019. (PDF | Slides | ACM Digital Library)
  6. [ICSE'19] Hoan Nguyen, Tien N. Nguyen, Danny Dig, Son Nguyen, Hieu Tran and Michael Hilton, "Graph-based Mining of In-the-Wild, Fine-grained, Semantic Code Change Patterns", in Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ACM/IEEE ICSE 2019), May 25 - 31, 2019. IEEE CS Press 2019. (PDF | Slides | ACM Digital Library)
  7. [OOPSLA'19] Yi Li, Shaohua Wang, Tien N. Nguyen, and Son Van Nguyen, "Improving Bug Detection via Context-Based Code Representation Learning and Attention-Based Neural Networks", in Proceedings of the 2019 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2019), October 20 - 25, 2019. ACM Press 2019. (PDF | ACM Digital Library)
  8. [ASE'19] Yi Li, Shaohua Wang, Tien N. Nguyen, Son Van Nguyen, Xinyue Ye, Yan Wang, "An Empirical Study on the Characteristics of Question-Answering Process on Developer Forums", in Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019), November 11 - 15, 2019. IEEE CS Press 2019. (PDF)
  9. [ICPC'19] Ngoc Tran, Hieu Tran, Son Nguyen, Hoan Nguyen and Tien Nguyen, "Does BLEU Score Work for Code Migration?", in Proceedings of the 27th IEEE International Conference on Program Comprehension (IEEE ICPC 2019), May 25 - 31, 2019. IEEE CS Press 2019. (PDF | Slides | ACM Digital Library)
  10. [ESEC/FSE'18] Son Nguyen, "Feature-interaction aware configuration prioritization", in Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ACM ESEC/FSE 2018), November 4-9, 2018. ACM Press, 2018. (PDF | ACM Digital Library)
  11. [SoICT'15] Son Van Nguyen, Hieu Dinh Vo, and Pham Ngoc Hung, "A Correlation-Aware Negotiation Approach for Service Composition", in Proceedings of the Sixth International Symposium on Information and Communication Technology (ACM SoICT 2015), December 2015. ACM Press, 2015. (ACM Digital Library)

Software

  • Spelling error checking and plagiarism detection for Vietnamese academic documents
    This R&D product is designed especially for Vietnamese academic documents such as essays, theses, and dissertations. We developed techniques specialized for Vietnamese in both spelling error checking and plagiarism detection. For plagiarism detection in particular, we collected and processed a large-scale dataset of more than 3M high-quality web pages and more than 20K published academic documents in many different areas.
    Since 2017, the system has served more than 20K regular users, including students, instructors, and more than 15 universities and organizations, helping them manage and improve the quality of their documents.
  • Change impact analysis for Java EE applications
    In this project, we developed JCIA, a novel tool for change impact analysis of Java EE applications. In practice, because Java EE applications are complex, analyzing their source code is a great challenge. Moreover, Java EE applications are frequently developed using different frameworks such as CDI and JSF, and not only in plain Java but also in other languages such as XML and JSP. This project was funded by Mitani Sangyo Co., Japan.

Awards

  • Winner of the Student Research Competition, ESEC/FSE 2018
    Awarded at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), sponsored by Microsoft
  • Winner of the Vietnam Talent Awards 2017 (Nhân tài đất Việt)
    Awarded by the Vietnam Posts and Telecommunications Group and the Ministry of Science and Technology of Vietnam to seek out and honor talent in IT, science and technology, the environment, and medicine.
  • Winner of the Student Scientific Research Contest 2015
    Awarded by the University of Engineering and Technology, Vietnam National University, Hanoi