AM Research is a division of Aspiring Minds. Aspiring Minds aspires to build an assessment-driven job marketplace (an SAT/GRE for jobs) to drive accountability in higher education and meritocracy in labor markets. The products developed from our research have impacted more than two million lives, and the resulting data is a source of continuous new research.


A cocktail of assessment, HR, machine learning, data science, education, social impact with two teaspoons of common sense stirred in it.

What we learn from patterns in test case statistics of student-written computer programs

Test cases evaluate whether a computer program is doing what it’s supposed to do. There are various ways to generate them – automatically from specifications, say by ensuring code coverage [1], or by subject matter experts (SMEs) who think through conditions based on the problem specification.

We asked ourselves whether there was something we could learn by looking at how student programs responded to test cases. Could this help us design better test cases or find flaws in them? By looking at such responses from a data-driven perspective, we wanted to know whether we could (a) design better test cases, (b) understand whether there existed any clusters in the way responses to test cases were obtained, and (c) discover the salient concepts needed to solve a particular programming problem, which would then inform the right pedagogical interventions.

A visualization which shows how our questions cluster by the average test case score received on them. More on this in another post :)

We built a cool tool which helped us look at statistics on over 2500 test cases spread across over fifty programming problems attempted by nearly 18,000 students and job-seekers in a span of four weeks!

We were also able to visualize how these test cases clustered for each problem, how they correlated with other cases across candidate responses and were also able to see what their item response curves looked like. Here are a couple of things we learnt in this process:

One of our problems required students to print comma-separated prime numbers starting from 2 up to a given integer N. When designing test cases for this problem, our SMEs expected certain edge cases (when N was less than 2) and some stress cases (when N was very large), while expecting the remainder of the cases to check the output for random values of N, without expecting them to behave any differently. Or so they thought. :) On clustering the responses obtained on each of the test cases for this problem (0 for failing a case, 1 for passing it), we found two very distinct clusters (see figure below), besides the lone test case which checked for the edge condition. A closer look at some of the source code helped us realize that values of N which were not prime had to be handled differently – a trailing comma remained at the very end of the list, and lots of students were not getting this right!
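
For the record, the nuance is easy to get right by joining the collected primes rather than printing a comma after each one. A minimal sketch (the function name is ours, not the problem's required signature):

```python
def primes_upto(n):
    """Return the primes from 2 to n (inclusive) as a comma-separated
    string with no trailing comma -- the detail many submissions missed."""
    primes = [str(k) for k in range(2, n + 1)
              if all(k % d for d in range(2, int(k ** 0.5) + 1))]
    return ",".join(primes)

print(primes_upto(10))  # 2,3,5,7
print(primes_upto(1))   # empty string: the N < 2 edge case
```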

A dendrogram depicting test case clustering for the prime-print problem
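
The clustering itself needs nothing fancy: treat each test case as a binary column (1 = pass, 0 = fail) over all submissions, and group test cases whose columns mostly agree. A minimal pure-Python sketch – the test case names and responses below are made up for illustration, not our actual data:

```python
def agreement(col_a, col_b):
    """Fraction of submissions on which two test cases agree (both pass or both fail)."""
    return sum(a == b for a, b in zip(col_a, col_b)) / len(col_a)

def group_test_cases(columns, threshold=0.8):
    """Greedily group test cases whose agreement with a group's first member
    exceeds the threshold; a stand-in for full hierarchical clustering."""
    groups = []
    for name, col in columns.items():
        for g in groups:
            if agreement(col, columns[g[0]]) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

# Toy responses: rows are test cases, entries are pass/fail per student.
responses = {
    "tc_random_1":    [1, 1, 0, 1, 0, 1],
    "tc_random_2":    [1, 1, 0, 1, 0, 1],
    "tc_nonprime_N":  [0, 1, 0, 0, 0, 1],  # trailing-comma cases cluster apart
    "tc_nonprime_M":  [0, 1, 0, 0, 0, 0],
    "tc_edge_N_lt_2": [0, 0, 0, 1, 0, 0],  # the lone edge-condition case
}

print(group_test_cases(responses))
```

With real data we fed the same binary matrix to hierarchical clustering to draw the dendrogram above.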

This was interesting! It showed that the problem’s hardness was linked not only to the algorithm for producing prime numbers up to a given number, but also to the nuance of printing them in a specific form. Despite getting the former right, a majority of students did not get the latter right. There are several learnings from this. If the problem designer just wants to assess whether students know the algorithm to generate primes up to a number, s/he should drop the part about printing them in a comma-separated list – it adds an uncalled-for impurity to the assessment objective. On the other hand, if both these skills are to be tested, our statistics are a way to confirm the existence of these two different skills – getting one right does not mean the other is doable (say, can this help us figure out the dominant cognitive skills needed in programming?). By separating out the test case that checks the trailing comma and reporting a score on it separately, we could give an assessor granular information on what the code is trying to achieve. Contrast this to when test cases were simply bundled together and it wasn’t clear which aspect the person got right.

More so, when we designed this problem, the assessment objective was primarily to check the algorithm for generating prime numbers. Unfortunately, the submissions that did not handle the trailing comma went down on their test case scores in spite of having met our assessment criterion. The good news here was that our machine learning algorithm [2] niftily picked this up and was able to say, by virtue of their semantic features, that they were doing the right job!

We also fit 3-PL models from Item Response Theory (more info) on each test case for some of our problems, and we have some interesting observations there on how item parameters relate to test case design – more on this in a separate post!
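
For reference, the 3-PL model gives the probability that a test-taker of ability θ passes an item with discrimination a, difficulty b, and guessing parameter c. A sketch – the parameter values in the example are made up:

```python
import math

def three_pl(theta, a, b, c):
    """3-PL item response function: c is the lower asymptote (the chance of
    passing the test case by 'guessing'), b the difficulty, a the discrimination."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A hypothetical test case that even weak programs pass 20% of the time:
print(round(three_pl(0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```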

Have ideas on how you could make use of such numbers and derive some interesting information? Write to us, or better, join our research group! :)

Kudos to Nishanth for putting together the neat tool to be able to visualize the clusters! Thanks to Ramakant and Bhavya for spotting this issue in their analysis.

– Shashank and Varun

References

[1] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs.” OSDI, 2008.

[2] Srikant, Shashank, and Varun Aggarwal. “A system to grade computer programming skills using machine learning.” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014.

Work on spoken English grading gets accepted at ACL, AM-R&D going to Beijing!

Good news! Our work on using crowdsourcing and machine learning to grade spontaneous English has been accepted at ACL 2015.

  • Ours is the first semi-automated approach to grade spontaneous speech.
  • We propose a new general technique which sits between completely automated grading and peer grading: we use the crowd for the tough human-intelligence task, derive features from it, and use ML to build high-quality models.
  • We think this is the first time anyone has used crowdsourcing to get accurate features that are then fed into ML to build great models. Correct us if we are wrong!

Figure 1: Design of our Automated Spontaneous Speech grading system.

The technique helps scale spoken English testing, which means super scale spoken English training!

Great job Vinay and Nishant.

PS: Also check out our KDD paper on programming assessment if you haven’t already.

- Varun

Who is prepared to learn computer programming?

Everyone wants to learn programming – or at least some of us want everyone to learn programming (see code.org). We believe that knowing programming doesn’t only enable you to write software; it also teaches you how to think about problems objectively, casting the solution to a problem into a structure, a step-by-step process. This makes you a better problem solver in life in general, greatly improving your ability to manage things around you. Strangely enough, we found this hypothesis getting challenged in our own research group’s recruitment drives – we found people ranking very highly on competitive programming websites doing very poorly when asked to think about an open problem, like how one would build a tool to automatically grade the quality of an email. This has led us to believe that knowing programming doesn’t cover everything – there are many other skills to take care of – but nevertheless…

In this post we want to look specifically at the question of who has the prerequisites to learn programming in a reasonable amount of time, say 3 months, in an adult education scenario. This is a very important question – for one, software engineering remains a lucrative career across the world, given what it pays and the volume of jobs it has to offer. On the other hand, there’s a dearth of skilled people available in the market for these jobs! (See our report based on outcome data here.) As a consequence, many companies want to hire ‘trainable’ candidates, whom they can train in 3 months with the right skills and deploy on live projects. Besides this, the answer to this question is equally important to students who take courses or MOOCs to make themselves more employable – they would want to know whether they will be able to pick up the required skills by the end of the course and make good on their investment.

I will share the results we got from one study involving 1371 candidates, though this result has since been confirmed multiple times over through various similar studies we’ve done. These candidates, just out of college, were to join a large IT company (150,000-plus people) as software engineers. They were to go through a 3-month training in programming. At the end of the training, the company would put them in three buckets – high, medium and low – with the lows being asked to leave. We tested all these candidates at the beginning of their training with four IRT-based adaptive tests – English, Logical Ability, Quantitative Ability and Computer Programming (more about these here). Could their scores in these skills predict who would eventually be asked to leave?

The answer is yes: we could predict with a fairly good accuracy who was successful after the training. But then, the question that follows is – what skills finally mattered in predicting this information?

First, English and Logical Ability matter – English to understand the instructions, which are all in English, and Logical Ability for basic deductive and inductive reasoning. But Quantitative Ability doesn’t matter. See the graph below: the model with Quantitative Ability scores included doesn’t do any better than the model with just English and Logical scores. Thus we should not be testing and filtering candidates on quantitative ability for programming roles – unfortunately many have been doing this :( ! With a filter on a combination of English/Logical scores, we get a 20%–20% Type 1/Type 2 error point.

Figure 1: Type 1 error: high/mid performers classified as low performers. Type 2 error: low performers classified as mid/high performers. EL: model using just English and Logical scores. ELQ: model using English, Logical and Quantitative Ability scores. The ELQ model doesn’t add significant incremental value.

When we introduce the computer programming score, we can make a much better prediction. But what scares institutions is the fear that if they put a filter on programming, very few candidates will qualify the intake metric. This is actually untrue! We verified it empirically: if you use the programming score in your model, more candidates from the population qualify the metric, and, importantly, more of the qualified succeed in training.

But for many this is counter intuitive. Given we have interpretable models, we can actually see why this is happening. Here is the rough qualification criteria in its barebones structure:

  Logical score + (1/2) * English score > Sc1
  AND  Logical score > Sc2

OR

  (1/2) * English score + Programming score + Logical score > Sc3
  AND  Programming score > Sc4

So what does this mean? English is half as important as the others!

But more so, if the candidate doesn’t know programming, he/she needs high logical ability (constrained by Sc2). On the other hand, if the person has some basic exposure to programming (Sc4 removes the bottom 30% of candidates by their programming score), a lower logical score can be offset by programming ability. This means that candidates with higher programming scores can succeed even with a lower logical score. If we do not test for programming at all, all these candidates get cut out, even though they know a level of programming that would make them succeed.
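
Rendered as code, the rule reads like this. The cutoff values Sc1–Sc4 are not published, so the numbers below are placeholders, not our actual cutoffs:

```python
SC1, SC2, SC3, SC4 = 120.0, 60.0, 180.0, 50.0  # hypothetical cutoffs

def is_trainable(english, logical, programming=None):
    """Qualify either on English + Logical alone, or let programming
    ability offset a lower logical score."""
    el_path = (logical + 0.5 * english > SC1) and (logical > SC2)
    if programming is None:
        return el_path
    prog_path = (0.5 * english + programming + logical > SC3) and (programming > SC4)
    return el_path or prog_path

# A candidate with a modest logical score but strong programming can
# still qualify via the second path:
print(is_trainable(english=100, logical=55))                  # False
print(is_trainable(english=100, logical=55, programming=90))  # True
```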

So the verdict is in – a neat result: ability in the language of instruction and logical ability predict success in a short-duration programming course. Language is half as important as Logical Ability, and Quantitative Ability is not important at all. If the person knows some programming, his/her level of programming can offset the requirement on Logical Ability and also on language skills.

So, want to try to know whether you are trainable for programming? Talk to us! We will make you take AMCAT.

Want to know the details behind this simple, neat result? Ask us for a tech report! Vinay will be happy to send it. :)

Till next time, learn how to code!

- Varun

The tree of program difficulty

What makes a programming problem hard?

Why are some programming problems solved by many students while others are not? The varying numbers we saw got us thinking about how the human brain responds to programming problems. This was also an important question for us to answer when designing an assessment or seeking guidance on pedagogy. Understanding what makes a programming problem hard would enable us to put questions of a given difficulty into a programming assessment – where neither everyone gets a perfect score nor everyone a zero – and would also help us create equivalent testforms for the test.

We tried taking a stab at it by answering it empirically. We rated 23 programming problems on four different parameters on a 2- or 3-point scale – how hard the data structure used in the implementation (and returned from the target function) is, how hard the algorithm is to conceive, how hard the algorithm is to implement, and how hard the edge cases are to handle for the given problem. [See the attached PDF for more details on the metrics and the rubric followed.] There was some nuance involved in choosing these metrics – for instance, the algorithm for a problem could be hard to conceive if, say, it requires thinking through a dynamic programming approach, but its implementation can be fairly easy, involving a couple of loops. On the other hand, the algorithms to sort and then merge a bunch of arrays are simple in themselves, but implementing such a requirement can be a hassle.

For these problems, we had responses from some 8000 CS undergraduates each. Each problem was delivered to a test-taker in a randomized testform. From this we pulled out how many people were able to write compilable code (this ranged from as low as 3.6% :( to as high as 74% across problems) and how many got all test cases right. We wanted to see how well we could predict this using our expert-driven difficulty metrics. (Our difficulties are relative and can change with the sample; for an absolute analysis we could have predicted the IRT parameters of the question – wanna try?)

So, what came out? Yes, we can predict! Here is the base correlation matrix. The correlations are negative because a harder problem has a lower pass rate.

Correlations                     Data Structure   Algorithm   Implementation   Edge-logic
Percent passing all test cases       -0.25          -0.42         -0.43          -0.05
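
These are plain correlation coefficients between the expert ratings and the observed pass rates. For completeness, a sketch of the Pearson computation on made-up ratings (the toy numbers are illustrative, not our data):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy data: higher algorithmic-difficulty ratings, lower pass rates.
ratings = [1, 1, 2, 2, 3, 3]
pass_rates = [0.70, 0.62, 0.40, 0.35, 0.10, 0.05]
print(round(pearson(ratings, pass_rates), 2))  # strongly negative
```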

We tried a first-cut analysis of our data by building a regression tree with some simple cross-validation. We got a really cool, intuitive tree and a prediction accuracy of 0.81! This is our ‘Tree of Program Difficulty’ ;-) . So what do we learn?

The primary metric in predicting whether a good percentage of people solve a problem right is algorithmic difficulty. Problems for which the algorithm is easy to deduce (<1.5) immediately witness a high pass rate, whereas those for which it is hard (>2.5) witness a very poor pass rate. For those that are moderately hard algorithmically (between 1.5 and 2.5), the next criterion deciding the pass percentage is the difficulty of implementing the algorithm. If it is easy to implement (<2), a high pass rate is predicted. For those that are moderately hard in both implementation and algorithm, the difficulty of the data structures used in the problem then predicts the pass rate: if an advanced data structure is used, the rate falls to less than 6%, and is around a moderate 11% otherwise.
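
The splits above can be read off as plain if/else rules. A sketch – the leaf labels and the data-structure cutoff are our paraphrase of the tree, not its literal dump:

```python
def predicted_pass_band(algorithm, implementation, data_structure):
    """Ratings on the expert rubric (roughly 1..3); returns a coarse
    predicted band for the fraction of students passing all test cases."""
    if algorithm < 1.5:
        return "high"        # easy-to-deduce algorithm
    if algorithm > 2.5:
        return "very low"    # hard algorithm dominates everything else
    # Moderately hard algorithm: implementation difficulty decides next.
    if implementation < 2:
        return "high"
    # Moderate algorithm and implementation: data structure decides.
    if data_structure >= 2.5:  # hypothetical cutoff for 'advanced'
        return "below 6%"
    return "around 11%"

print(predicted_pass_band(algorithm=2.0, implementation=2.5, data_structure=3))
```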

So, what nodes do your problems fall on? Does it match our result? Tell us!

Thanks Ramakant for the nifty work with data!

-Shashank and Varun

March 2015

A re-beginning : Welcome to AM Research!

We finally have a place to feature the work which we began five years ago. Great effort, Tarun, to get this up and running.

We thought this was important since education technology and assessments are going through a revolution. We wish to add our two teaspoons of wisdom (did I actually say that!) to the ongoing battle against the conventional, non-scalable and unscientific ways of training, assessing and skill matching. We look forward to making this a means to collaborate with academia, the industry and anyone who feels positively about education technology.

Sector/Roles Employability(%)
BUSINESS FUNCTIONS
Sales and Business Development 15.88
Operations/Customer Service 14.23
Clerical/Secretarial Roles 35.95
ANALYTICS AND COMMUNICATION
Analyst 3.03
Corporate Communication/Content Development 2.20
IT AND ITeS INDUSTRY
IT Services 12.97
ITes and BPO 21.37
IT Operations 15.66
ACCOUNTING ROLES
Accounting 2.55
TEACHING
Teaching 15.23

Table 1: By using standardized assessments of job suitability, in a study of 60,000 Indian undergraduates, we find that a strikingly low proportion of them have skills required for the industry. All these students got detailed feedback from us to improve. The table shows the percentage of students that have the required skills for different jobs. (Refer: National Employability Report for Graduates, under Reports in Publications)

We think assessments will be the key to democratizing learning and employment opportunity: they provide a benchmark for measuring the success of training interventions, provide feedback to learners, creating a ‘dialogue’ in the learning process, and, most importantly, help link learning to tangible outcomes, in terms of jobs and otherwise.

Let me state it simply: To scale learning and make employment markets meritocratic, we need to scale automated assessments. This is the space we dabble in!

If you are thirsty for data, refer to the table and figure in this post. They tell the story of the problem we are up against and trying to solve.

Figure 1: 2500 undergraduates were surveyed to find their employment outcomes one year after their undergraduate education. We categorized their colleges into three tiers (1–3) based on students’ overall performance in AMCAT, our employability test. We find that a candidate from a tier 3 college has 24% lower odds of getting a job and a 26% lower salary than a tier 1 student with the same merit (AMCAT scores). Similarly, a 1-point drop in college GPA (on a 10-point scale) decreases job odds by 16% and salary by 9%. Neither of these two parameters is a useful predictor of job success beyond AMCAT scores. This shows a clear bias in the employment ecosystem. (Refer ‘Who gets a job’ under Reports in Publications.)

How do we solve it? Stay tuned to our subsequent posts…

Varun
