Suggesting Natural Method Names to Check Name Consistencies

Misleading names of methods in a project or of APIs in a software library confuse developers about program functionality and API usage, leading to API misuses and defects. In this paper, we introduce MNire, a machine learning approach to check the consistency between the name of a given method and its implementation. MNire first generates a candidate name and compares the current name against it. If the two names are sufficiently similar, we consider the method as consistent. To generate the method name, we draw our ideas and intuition from an empirical study on the nature of method names in a large dataset. Our key findings are that high proportions of the tokens of method names can be found in three contexts of a given method: its body, its interface (the method's parameter types and return type), and the enclosing class's name. Even when such tokens are not there, MNire uses the contexts to predict the tokens due to the high co-occurrence likelihoods. Our unique idea is to treat name generation as abstractive summarization over the tokens collected from the names of the program entities in the three above contexts.
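For intuition only, here is a minimal sketch of this checking pipeline. The real MNire generator is a trained neural summarizer; the frequency-based stand-in, the token-overlap similarity, and the 0.5 threshold below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an MNire-style consistency check (illustrative only).
import re
from collections import Counter

def subtokens(identifier):
    """Split a camelCase/snake_case identifier into lowercase sub-tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)]

def collect_context_tokens(body_ids, param_types, return_type, enclosing_class):
    """Gather tokens from the three contexts: body (IMP), interface (INF), enclosing class (ENC)."""
    tokens = []
    for name in body_ids:
        tokens += subtokens(name)
    for typ in param_types + [return_type]:
        tokens += subtokens(typ)
    tokens += subtokens(enclosing_class)
    return tokens

def generate_name(context_tokens, length=2):
    """Toy stand-in for MNire's learned summarizer: pick the most frequent context tokens."""
    return [tok for tok, _ in Counter(context_tokens).most_common(length)]

def name_similarity(tokens_a, tokens_b):
    """Token-overlap F-score between two names (an assumed similarity measure)."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0
    p, r = len(a & b) / len(b), len(a & b) / len(a)
    return 2 * p * r / (p + r) if p + r else 0.0

def is_consistent(current_name, context_tokens, threshold=0.5):
    """Flag the method as consistent if the current name is close enough to the generated one."""
    candidate = generate_name(context_tokens)
    return name_similarity(subtokens(current_name), candidate) >= threshold

# Toy example: a method writing users to a file inside class UserWriter.
ctx = collect_context_tokens(
    body_ids=["user", "fileWriter", "writeLine"],
    param_types=["List", "File"],
    return_type="void",
    enclosing_class="UserWriter",
)
print(generate_name(ctx))              # candidate tokens from the toy generator
print(is_consistent("writeUser", ctx))  # consistency decision under the toy setup
```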

We conducted several experiments to evaluate MNire in method name consistency checking and method name recommendation on large datasets with 14M+ methods. In detecting inconsistent method names, MNire relatively improves over the state-of-the-art approach by 10.4% in recall and 11% in precision. In method name recommendation, MNire relatively improves over the state-of-the-art technique, code2vec, in both recall (18.2% higher) and precision (11.1% higher). To assess MNire's usefulness, we used it to detect inconsistent method names and suggest new names in several active GitHub projects. We made 50 pull requests and received 42 responses. Among them, 5 PRs were merged into the main branch, and 13 were approved for later merging. In total, in 31/42 cases, the developer teams agreed that our suggested names are more meaningful than the current ones, showing MNire's usefulness.


Exploratory Study

Uniqueness of Method Names

62.9% of the full method names are unique. For a given method, one cannot rely solely on searching for a good name among previously seen method names.

78.1% of the tokens in method names can be found in other, previously seen method names.

                      Method name   Token
Mean #occurrences     4.8           400.3
Median #occurrences   1             3
#occurrences = 1      62.9%         21.9%
#occurrences > 1      37.1%         78.1%
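A small illustration of the statistic above: full method names rarely repeat, while their sub-tokens recur heavily. The five-name corpus is invented for the example; only the camelCase splitting mirrors the token notion used here.

```python
# Toy illustration: full names are mostly unique, sub-tokens are heavily reused.
import re
from collections import Counter

def subtokens(name):
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]

corpus = ["getUserName", "getUserId", "setUserName", "findUserById", "getUserName"]

print(Counter(corpus))                                   # most full names occur only once
print(Counter(t for n in corpus for t in subtokens(n)))  # tokens like 'get'/'user' recur
```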

Figure: Percentage of tokens in method names found in the contexts
Figure: Percentages of the methods whose names share certain tokens with the contexts

Common tokens shared between a method name and the contexts

High proportions of the tokens in method names are shared with the three contexts. High percentages of methods have names that share tokens with the names of the entities in their contexts.
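A sketch of how this overlap can be measured for a single method: the share of its name tokens that also appear in each context. The method and its contexts below are invented for illustration.

```python
# Share of a method's name tokens found in each of its three contexts (toy example).
import re

def subtokens(name):
    return {t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)}

method_name = "saveUserProfile"
contexts = {
    "body (IMP)":      subtokens("profileRepository") | subtokens("userProfile") | subtokens("persist"),
    "interface (INF)": subtokens("UserProfile") | subtokens("boolean"),
    "enclosing (ENC)": subtokens("ProfileService"),
}

name_toks = subtokens(method_name)
for ctx, toks in contexts.items():
    shared = name_toks & toks
    print(f"{ctx}: {len(shared)}/{len(name_toks)} name tokens shared {sorted(shared)}")
```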


The conditional occurrences of tokens in method names on the contexts

When a token occurs in the names of the program entities used in the body of a method, in 35.9% of the cases that token also appears in the method's name.

Even when the tokens are not found in the contexts, one can use the contexts to predict the tokens in the method names due to these high conditional probabilities.

Each of the contexts provides a stronger indication of the tokens occurring in good names than of those in inconsistent names.
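A sketch of how such a conditional occurrence could be estimated from a corpus: for each token appearing in a method's contexts, check whether it also appears in the method's name. The three-method corpus is a toy example, not the paper's data.

```python
# Estimate P(token appears in method name | token appears in its contexts) on a toy corpus.
import re

def subtokens(s):
    return {t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", s)}

# (method name, identifiers appearing in its contexts) -- invented examples
corpus = [
    ("getUserName", ["user", "name", "field", "return"]),
    ("closeStream", ["stream", "flush", "buffer"]),
    ("parseConfig", ["config", "file", "reader", "parse"]),
]

hits = total = 0
for name, context_ids in corpus:
    name_toks = subtokens(name)
    ctx_toks = set().union(*(subtokens(i) for i in context_ids))
    for tok in ctx_toks:
        total += 1
        hits += tok in name_toks
print(f"P(token in name | token in context) ~ {hits / total:.2f}")
```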

Figure: Average conditional occurrence of tokens in method names given the contexts

Accuracy Comparison

MNire outperformed the state-of-the-art approaches in both consistency checking and name recommendation.

43.1% of the suggested method names exactly match the oracle (even though only 37.2% of the method names occur more than once).

13.1% of the generated names were not previously seen in the training data.

Consistency Checking Comparison Results (in %)
(IC: inconsistent names, C: consistent names)
                  Liu et al.   MNire
IC  Precision     56.8         62.7
    Recall        84.5         93.6
    F-score       67.9         75.1
C   Precision     51.4         56.0
    Recall        72.2         84.2
    F-score       60.0         67.3
Accuracy          60.9         68.9
Name Recommending Comparison Results (in %)
              code2vec   MNire
Precision     63.1       70.1
Recall        54.4       64.3
F-score       58.4       67.1
Exact Match   -          43.1
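For reference, a sketch of how the name-recommendation metrics in the table above are typically computed: precision/recall/F-score over name sub-tokens (as in the code2vec evaluation setting) plus the exact-match rate. The micro-averaging and the example pairs are assumptions for illustration.

```python
# Sub-token precision/recall/F-score and exact match for recommended names (illustrative).
import re

def subtokens(name):
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]

def evaluate(pairs):
    tp = fp = fn = exact = 0
    for predicted, actual in pairs:
        p, a = set(subtokens(predicted)), set(subtokens(actual))
        tp += len(p & a)
        fp += len(p - a)
        fn += len(a - p)
        exact += predicted == actual
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score, exact / len(pairs)

# (recommended name, oracle name) pairs -- invented examples
pairs = [("getUserName", "getUserName"), ("readFile", "loadFile"), ("size", "getSize")]
print(evaluate(pairs))
```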

Study on Accuracy by Method Size

MNire works well on methods of regular size (1-25 LOCs). Even on longer methods (25+ LOCs), MNire's accuracy degrades gracefully, with precision and recall of 47.0% and 41.4%, respectively.
Figure: MNire's accuracy for different method sizes

Context Analysis Evaluation Results

Impact of Contexts on MCC Results (in %)
(IMP: implementation/body, INF: interface, ENC: enclosing class name)
                  IMP    IMP+INF   IMP+ENC   IMP+INF+ENC (= MNire)
IC  Precision     60.2   61.7      61.0      62.7
    Recall        90.0   92.1      91.3      93.6
    F-score       72.1   73.9      73.1      75.1
C   Precision     53.2   55.1      54.1      56.0
    Recall        79.3   82.3      80.6      84.2
    F-score       63.7   66.0      64.7      67.3
Accuracy          62.1   65.2      64.2      68.9
Impact of Contexts on MNR Results (in %)
               IMP    IMP+INF   IMP+ENC   IMP+INF+ENC (= MNire)
Precision      49.7   63.2      54.4      66.4
Recall         43.3   57.8      48.9      61.1
F-score        46.3   60.4      51.5      63.6
Exact Match    20.2   34.7      25.7      43.1
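The ablation above varies which contexts feed the name generator. Below is a minimal sketch of how those configurations could be assembled into one input token sequence; the ordering and data layout are assumptions, and only the choice of IMP/INF/ENC contexts mirrors the table.

```python
# Assemble the generator input for the IMP / IMP+INF / IMP+ENC / IMP+INF+ENC settings (illustrative).
import re

def subtokens(s):
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", s)]

def build_input(method, use_inf=False, use_enc=False):
    tokens = [t for ident in method["body_ids"] for t in subtokens(ident)]  # IMP: body identifiers
    if use_inf:                                                             # INF: parameter/return types
        for typ in method["param_types"] + [method["return_type"]]:
            tokens += subtokens(typ)
    if use_enc:                                                             # ENC: enclosing class name
        tokens += subtokens(method["enclosing_class"])
    return tokens

method = {
    "body_ids": ["httpClient", "sendRequest", "response"],
    "param_types": ["HttpRequest"],
    "return_type": "HttpResponse",
    "enclosing_class": "ApiGateway",
}
print(build_input(method))                              # IMP only
print(build_input(method, use_inf=True, use_enc=True))  # IMP+INF+ENC (the full MNire setting)
```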

Sensitivity Results

Accuracy with Different Representations

Impact of Representation on MCC Results (in %)
                  Lexeme   AST    Graph   MNire
IC  Precision     59.0     57.2   55.3    62.7
    Recall        88.3     85.6   80.3    93.6
    F-score       70.7     68.6   65.5    75.1
C   Precision     47.1     46.2   45.8    56.0
    Recall        78.2     73.5   72.1    84.2
    F-score       58.8     56.8   56.0    67.3
Accuracy          52.0     51.1   50.5    68.9
Impact of Representation on MNR Results (in %)
               Lexeme   AST    Graph   MNire
Precision      29.5     23.1   16.2    50.6
Recall         25.1     29.2   30.3    45.1
F-score        27.1     25.9   21.1    47.7
Exact Match    9.1      8.1    4.7     22.1

This result suggests that the naturalness of names is more important to the problem of method name suggestion. While structures and dependencies are important for code execution, to suggest a method name, which is an abstract of the entire method, using the tokens of the names in the contexts as in MNire yields better performance.

Impact of Contexts' Size and Lengths of Tokens in Contexts on Accuracy

Impact of Context's Size on MNR Results (in %)
            1-10 tokens   10-20 tokens   20-30 tokens   30+ tokens
F-score     35.9          41.1           43.2           51.0

Impact of Tokens' Length on MNR Results (in %)
            0-80%         80-90%         90-95%         95%+
F-score     37.0          39.4           42.5           48.5

Impact of Training Data’s Size on Accuracy

Impact of Training Data's Size on MCC Results (in %)
(training-set size in millions of methods)
                  1.0M   1.25M   1.5M   1.75M   2.0M
IC  Precision     59.6   60.1    60.3   61.5    62.7
    Recall        92.2   92.5    93.2   93.4    93.6
    F-score       72.4   72.9    73.2   74.2    75.1
C   Precision     52.7   53.5    54.4   55.2    56.0
    Recall        81.4   81.9    82.6   83.5    84.2
    F-score       63.9   64.7    65.6   66.5    67.3
Accuracy          62.6   63.8    65.8   67.3    68.9
Impact of Training Data's Size on MNR Results (in %)
(training-set size in thousands of projects)
               1.0K   2.5K   5.0K   7.5K   10.0K
Precision      41.1   56.2   63.1   64.9   66.4
Recall         47.8   53.7   57.6   59.5   61.1
F-score        44.2   54.9   60.2   62.0   63.6
Exact Match    19.9   29.4   34.8   26.9   38.2

Impact of Threshold for Consistency Checking

Figure: Impact of the similarity threshold on MCC results

Datasets

Corpus for Method Name Consistency Checking (Download)
                        Test data   Train data
#Methods                2,700       1,962,872
#Files                  -           250,972
#Projects               -           430
#Unique method names    -           540,237
#occurrences > 1        -           33.5%
Corpus for Method Name Recommending
                         Test data   Train data   Total
Comparison Experiment with code2vec (Download)
#Files                   61,641      1,746,272    1,807,913
#Methods                 458,800     14,000,028   14,458,828
Experiments for RQ4, RQ5, and RQ6 (Download)
#Projects                450         9,772        10,222
#Files                   51,631      1,756,282    1,807,913
#Methods                 466,800     13,992,028   14,458,828
Live Study on Real Developers (Download)
#Projects                100         -            100
#Files                   18,980      -            18,980
#Methods                 139,827     -            139,827