GITHUB PROFILE ANALYZER

Dublin Core

Title

GITHUB PROFILE ANALYZER

Author

ALI MARAT

Abstract

The GitHub Profile Analyzer assesses developers' technical abilities by examining the structure and meaning of their public repositories. Rather than relying on surface metrics such as stars or forks, it combines semantic embeddings (CodeBERT), structural analysis (ASTMiner), and rule-based heuristics to characterize coding practices.
The analyzer went through several versions: CRAv2 laid the foundation with Random Forest classifiers, CRAv3 took a semantic-first approach, and CRAv4 introduced a hybrid strategy that fuses rules, semantic pattern analysis, and weighted confidence scoring. An experimental Smart Repository Classifier (SRC) also attempted to combine rule-based and machine-learning techniques, and a side project explored design pattern detection using AST embeddings and Code Property Graphs.
The system was trained on 122 repositories and tested on 47 others across seven categories. CRAv3 reached an overall accuracy of 26.4%, while CRAv4 reached 70%, with the largest gains in web applications (+133%) and CLI tools (+45%). Random Forest experiments peaked at about 35% in some categories, whereas CRAv4's hybrid method proved both more accurate and easier to interpret.
Key hurdles included the lack of comprehensive datasets, issues with ASTMiner's dependencies, and the author's initial unfamiliarity with machine learning. Tackling these challenges taught valuable lessons in dataset construction, feature engineering, and model evaluation.

In summary, this project shows the benefit of combining semantic, structural, and rule-based strategies to gauge developer skills. Future work includes expanding the datasets, applying deep learning to design pattern detection, and extending the analyzer into a more general code intelligence tool.
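
The weighted confidence scoring attributed to CRAv4 above could look roughly like the following. This is only an illustrative sketch, not the project's actual implementation: the function name `fuse_scores`, the `Prediction` type, the category labels, and the 0.6/0.4 weights are all assumptions, since the abstract does not say how the rule-based and semantic signals are weighted.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract does not state how CRAv4 weights its signals.
RULE_WEIGHT = 0.6
SEMANTIC_WEIGHT = 0.4


@dataclass
class Prediction:
    category: str
    confidence: float


def fuse_scores(rule_scores, semantic_scores,
                rule_weight=RULE_WEIGHT, semantic_weight=SEMANTIC_WEIGHT):
    """Combine per-category scores (each in [0, 1]) into one weighted prediction."""
    categories = set(rule_scores) | set(semantic_scores)
    if not categories:
        raise ValueError("at least one category score is required")
    total = rule_weight + semantic_weight
    # A category missing from one source contributes 0.0 from that source.
    fused = {
        c: (rule_weight * rule_scores.get(c, 0.0)
            + semantic_weight * semantic_scores.get(c, 0.0)) / total
        for c in categories
    }
    # Iterate in sorted order so ties resolve deterministically.
    best = max(sorted(fused), key=fused.get)
    return Prediction(best, fused[best])


# Illustrative inputs only; these scores are made up for the example.
rule_scores = {"web_application": 0.9, "cli_tool": 0.2}
semantic_scores = {"web_application": 0.5, "cli_tool": 0.6}
result = fuse_scores(rule_scores, semantic_scores)
print(result.category, round(result.confidence, 2))
```

Keeping the two score sources separate until the final weighted sum makes it possible to report which signal drove a decision, which is one plausible reason a hybrid of this kind is easier to interpret than a single opaque classifier.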

Keywords

GitHub analysis, semantic embeddings, ASTMiner, CRAv, machine learning, rule-based classification, design pattern detection
