The industry today is on a constant look-out for good programmers. In this new age of digital services and products, it’s a premium to possess programming skills. Whenever a friend asks me to refer a good programmer for his company, I tell him – why would I refer to you, I will hire her for my team! But what does having programming skills really mean? What do we look for when we hire programmers? An ability to write functionally correct programs – those that pass test cases? Nah..
A seasoned interviewer would tell you that there is much more to writing code than passing test cases! For starters, we really care for how well a candidate understands the problem and approaches a solution than being able to write functionally correct code. “Did the person get the logic?” is generally the question discussed among interviewers. We’re also interested in seeing whether the candidate’s logic (algorithm) is efficient – with low time and space complexity. Besides this, we also care for how readable and maintainable a candidate makes her code – a very frustrating problem for the industry today is to deal with badly written code that breaks under exceptions and which is not amenable to fixes.
So then why do we all use automated programming assessments in the market which base themselves on just the number of test cases passed? It probably is because it is believed that there’s nothing better. If AI can drive cars automatically these days, can it not grade programs like humans? It can. In our KDD paper in 2014 , we showed that by using machine learning, we could grade programs on all these parameters as well as humans do. The machine’s agreement with human experts was as high as 0.8-0.9 correlation points!
Why is this useful? We looked at a sample of 90,000 candidates who took Automata, our programming evaluation product, in the US. These were seniors graduating with a computer science/engineering degree and were interested in IT jobs. They were scored on four metrics – percent test-cases passed, correctness of logic as detected by our ML algorithm (scored on a scale of 1-5), run-time efficiency (on a scale of 1-3) and best coding practices used (on a scale of 1-4). We find answers to all that an interviewer cares for –
- Is the logic/thought process right?
- Is this going to be an efficient code?
- Will it be maintainable and scalable?
Fig.1. Distribution of the different score metrics. Around 48% of the candidates who scored 4 on the code logic metric (as detected by our ML algorithm) had passed less than 50% of their test suite.
A clever man commits no minor blunders – Von Goethe
Typically, companies look for candidates who get the code nearly right; say, those who pass 80% test-cases in a test-suite. 36% of the seniors made it through such a criteria. In a typical recruiting scenario, the remaining 64% would have been rejected and not considered for further processes. We turned to our machine learning algorithm to see how many of these “left out” had actually written logically correct programs having only silly errors. What do we find? Another good 16% of these were scored 4 or above by our system, which meant they had written codes which had ‘correct control structures and critical data-dependencies but had some silly mistakes’. These folks should have been considered for the remainder of the process! Smart algorithms are able to spot what would be totally missed by test cases but which could have been spotted by human experts.
Sample codes which fail most of their testsuites but are scored high by our ML system (click to enlarge)
We find a lot of these candidates’ codes pass less than 50% test cases (see figure 1). Why would this be happening? We sifted through some examples and found some very interesting ways in which students made errors. For the problem requiring to remove vowels in a string, a candidate had missed what even the best in the industry fall prey to at times – an incorrect usage of ORs and ANDs with the negation operator! Another had implemented the logic well but messed up on the very last line. He lacked the knowledge of converting character arrays back to strings in Java. The lack of such specific knowledge is typical of those who haven’t spent enough time with Java; but this is easy to learn and shouldn’t cost them their performance in an assessment and shouldn’t stop them from differentiating themselves from those who couldn’t think of the algorithm.
Fig.2. Distribution of runtime efficiency scores and program practices scores in perspective of the nature of the attempted problems
The computing scientist’s main challenge is not to get confused by the complexities of his own making – E. W. Dijkstra
We identified those who had written nearly correct code; those who could think of algorithms and put them into a programming language. However, was this just some shabbily written code which no one would want to touch? Further, had they written efficient code or would it have resulted in the all so familiar “still loading” messages we see in our apps and software. We find that roughly 38% folks thought of an efficient algorithm while writing their codes and 30% wrote code which was acceptable to reside in professional software systems. Together, roughly 20% students write programs which are both readable and efficient. Thus, AI tells us that there are 20% programmers here whom we should really talk to and hire! If we are ready to train them and run with them a little, there are various other score cuts one could use to make an informed decision.
In all, AI can not only drive cars, but find programmers who can make a driverless car. Thinking how to find data scientists by using data science? – coming soon.
Interested in using Automata to better understand the programming talent you evaluate? Do you have a different take on this? Tell us by writing to firstname.lastname@example.org
Gursimran, Shashank and Varun
 Srikant, Shashank, and Aggarwal, Varun. “A system to grade computer programming skills using machine learning.” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014. Continue reading