AM Research is a division of Aspiring Minds. Aspiring Minds aspires to build an assessment-driven job marketplace (an SAT/GRE for jobs) to drive accountability in higher education and meritocracy in labor markets. The products developed from our research have impacted more than two million lives, and the resulting data is a source of continuous new research.


A cocktail of assessment, HR, machine learning, data science, education and social impact, with two teaspoons of common sense stirred in.

Data science camp for kids!

It is an open secret that data science is becoming pervasive. What was once the preserve of statisticians and computer scientists – deft at trudging through mountains of data – has found its tools and techniques percolating into every industry and every level. Peer into the crystal ball and you don’t need to suspend reality too much to imagine a future in which a factory manager looks at production data to predict which machine might break down soon. A cab operator analyzes his Uber receipts to figure out where he should drive to make the most money. A sales manager looks at which kinds of customers his sales agents are most successful with to decide whom to deploy where. Decidedly, the future belongs to the data scientist. Where will these data scientists come from? Who is going to train them?

The very nature of the subject eschews traditional learning modes. The data scientist must quickly learn the context of the data, build hypotheses, use techniques to confirm or refute them, and then construct predictors or automated systems. It marries technology with knowledge; intuition with scientific rigor. Our education systems will be slow to adapt – they will have to devise new methodologies, develop syllabi and learn to simultaneously involve multiple teachers. In the meanwhile, a whole generation of students might graduate without the skills that industry expects from them in a data-rich environment.

At Aspiring Minds, we’re passionate about helping students reach their full potential. We plan to pursue a series of initiatives to help advance data science education in India and around the world. As a first step, we held a data science camp for elementary school students! The participants continuously surprised us – with their knowledge, their understanding and even their wit. Two things became clear quickly: (a) kids seldom confront open-ended problems, and it took some getting used to the idea of there being no one correct, pre-decided answer, and (b) with some guidance, they learn astonishingly quickly.

Read more about our exciting and rewarding weekend here!

At the end of the camp, the participating kids blogged about their experiences and the plots/analysis that they came up with. Read about them here.

Our team got enthusiastically involved in mentoring the students through the exercise and ended up learning more about their own teaching styles in the process.

We’ve also put out the exercises and resources we used for the camp so you can replicate it in your school/university/workplace. If the thought of engaging high schoolers in data science seems absurd to you, snap out of it! It is possible; we tried it, and the kids had a fun time picking up these concepts.

Let us know what you thought of our data camp. Please do write to us if you go ahead and try this out with students around you. We’ll eagerly look forward to that!

Samarth Singal
Research Intern, Aspiring Minds
Class of 2017, Computer science, Harvard.

Papers accepted at ICML and KDD!

Some more good news!

Soon after our spoken English grading work was accepted at ACL, our work on learning models for job selection and personalized feedback has been accepted at the Machine Learning for Education workshop at ICML! Some results from this paper were discussed in one of our previous posts. The tool was built five years ago and has since helped a couple of million students get personalized feedback and aided 200+ companies in hiring better. I shall also be giving an invited talk at this workshop.

Earlier this month, we also had a paper accepted at KDD, which builds on our previous work in spontaneous speech evaluation. We examine how well we can grade the spontaneous speech of speakers from different countries and analyze the benefits industry gains from such an evaluation system.

A busy year lies ahead, it seems – paper presentations in France, Beijing, Australia and finally New Jersey, where we’re organizing the second edition of ASSESS, our annual workshop on data mining for educational assessment and feedback. It is being organized at ICDM 2015 this winter; July 20th is the submission deadline for the workshop. Here is a list of submissions we saw at last year’s edition of the workshop, at KDD. Spread the word!

– Varun

What we learn from patterns in test case statistics of student-written computer programs

Test cases evaluate whether a computer program is doing what it’s supposed to do. There are various ways to generate them: automatically from specifications, say by ensuring code coverage [1], or manually by subject matter experts (SMEs) who think through conditions based on the problem specification.

We asked ourselves whether there was something we could learn by looking at how student programs responded to test cases. Could this help us design better test cases or find flaws in them? By looking at such responses from a data-driven perspective, we wanted to know whether we could (a) design better test cases, (b) understand whether there existed any clusters in the way responses on test cases were obtained, and (c) discover the salient concepts needed to solve a particular programming problem, which would then inform the right pedagogical interventions.

A visualization which shows how our questions cluster by the average test case score received on them. More on this in another post :)

We built a cool tool which helped us look at statistics on over 2500 test cases spread across over fifty programming problems attempted by nearly 18,000 students and job-seekers in a span of four weeks!
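The clustering itself can be sketched in a few lines. The following is a minimal illustration, not our actual tool: it builds a hypothetical 0/1 pass/fail matrix in which two groups of test cases co-vary (as if driven by two latent skills) and recovers the two groups with hierarchical clustering.

```python
# A minimal sketch (illustrative data, not our actual tool): hierarchically
# cluster test cases by how similarly candidates pass or fail them.
# Rows: candidates; columns: test cases; entries are 1 (pass) or 0 (fail).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Hypothetical data: 100 candidates, 6 test cases. Cases 0-2 and 3-5
# are made to co-vary, mimicking two latent "skills".
skill_a = rng.random(100) < 0.6
skill_b = rng.random(100) < 0.4
responses = np.column_stack([skill_a] * 3 + [skill_b] * 3).astype(int)

# Distance between test cases (columns): 1 - correlation of pass patterns.
dist = pdist(responses.T, metric="correlation")
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)  # test cases 0-2 land in one cluster, 3-5 in the other
```

The same `tree` can be fed to `scipy.cluster.hierarchy.dendrogram` to draw plots like the ones in this post.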

We were also able to visualize how these test cases clustered for each problem, how they correlated with one another across candidate responses, and what their item response curves looked like. Here are a couple of things we learnt in the process:

One of our problems required students to print comma-separated prime numbers starting from 2 up to a given integer N. When designing test cases for this problem, our SMEs expected there to be certain edge cases (when N was less than 2) and some stress cases (when N was very large), while expecting the remainder of the cases to check the output for random values of N without behaving any differently. Or so they thought. :) On clustering the responses obtained on each of the test cases for this problem (0 for failing a case and 1 for passing it), we found two very distinct clusters being formed (see figure below), besides the lone test case which checked for the edge condition. A closer look at some of the source code helped us realize that values of N which were not prime had to be handled differently – a trailing comma remained at the very end of the list, and lots of students were not getting this right!
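The nuance the clusters exposed can be made concrete in code. Here is a minimal sketch (illustrative, not an actual student submission from our data) of the buggy and the correct way to print the list:

```python
# The trailing-comma nuance: printing primes up to N as a comma-separated
# list. Illustrative code, not a student submission from the study.
def primes_upto(n):
    """Return the primes from 2 to n inclusive (simple trial division)."""
    return [k for k in range(2, n + 1)
            if all(k % d for d in range(2, int(k ** 0.5) + 1))]

def print_primes_buggy(n):
    # Passes the "algorithm" test cases but appends a comma after every
    # prime, leaving a trailing comma at the end of the output.
    return "".join(str(p) + "," for p in primes_upto(n))

def print_primes_fixed(n):
    # join() inserts separators only *between* items, so the output is
    # correct even when N itself is not prime.
    return ",".join(str(p) for p in primes_upto(n))

print(print_primes_buggy(10))  # 2,3,5,7,  <- trailing comma
print(print_primes_fixed(10))  # 2,3,5,7
```

Both functions produce identical primes; only the formatting differs, which is exactly why these two skills separated into distinct test case clusters.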

A dendrogram depicting test case clustering for the prime-print problem

This was interesting! It showed that the problem’s hardness was linked not only to the algorithm for producing prime numbers up to a given number, but also to the nuance of printing them in a specific form. In spite of getting the former right, a majority of students did not get the latter right. There are several learnings from this. If the problem designer just wants to assess whether students know the algorithm to generate primes up to a number, s/he should drop the part about printing them as a comma-separated list – it adds an unwarranted impurity to the assessment objective. On the other hand, if both these skills are to be tested, our statistics are a way to confirm the existence of these two different skills – getting one right does not mean the other is doable (say, can this help us figure out the dominant cognitive skills needed in programming?). By separating out the test cases that check the trailing-comma case and reporting a score on them separately, we could give an assessor granular information on what the code is trying to achieve. Contrast this with test cases simply bundled together, where it wasn’t clear which aspect the person got right.

More so, when we designed this problem, the assessment objective was primarily to check the algorithm for generating prime numbers. Unfortunately, submissions that did not handle the trailing comma lost test case marks in spite of having met our assessment criterion. The good news here was that our machine learning algorithm [2] niftily picked this up and, by virtue of their semantic features, was able to say that they were doing the right job!

We also fit 3-PL models from Item Response Theory (more info) on each test case for some of our problems and have some interesting observations there on how we could relate item-parameters to test case design – more on this in a separate post!
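For reference, the 3-PL model expresses the probability that a candidate of ability θ passes an item with discrimination a, difficulty b and guessing parameter c. A small sketch with illustrative parameter values (not fitted values from our data):

```python
# A small sketch of the 3-PL item response function fitted to each test
# case. The parameter values below are illustrative, not from our data.
import math

def p_correct(theta, a, b, c):
    """3-PL: P(pass) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An "easy" test case (b = -1) next to a harder one (b = +1) with the
# same discrimination and guessing floor: the curves shift along theta.
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(p_correct(theta, a=1.5, b=-1.0, c=0.1), 3),
          round(p_correct(theta, a=1.5, b=1.0, c=0.1), 3))
```

Per-test-case estimates of a, b and c are what let us relate item parameters back to test case design.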

Have ideas on how you could make use of such numbers and derive some interesting information? Write to us, or better, join our research group! :)

Kudos to Nishanth for putting together the neat tool to be able to visualize the clusters! Thanks to Ramakant and Bhavya for spotting this issue in their analysis.

– Shashank and Varun

References

[1] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs.” OSDI, Vol. 8, 2008.

[2] Srikant, Shashank, and Varun Aggarwal. “A system to grade computer programming skills using machine learning.” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.

Work on spoken English grading gets accepted at ACL, AM-R&D going to Beijing!

Good news! Our work on using crowdsourcing and machine learning to grade spontaneous English has been accepted at ACL 2015.

  • Ours is the first semi-automated approach to grade spontaneous speech.
  • We propose a new general technique that sits between completely automated grading and peer grading: we use the crowd for the tough human-intelligence task, derive features from it, and use ML to build high-quality models.
  • We believe this is the first time anyone has used crowdsourcing to get accurate features that are then fed into ML to build great models. Correct us if we are wrong!


Figure 1: Design of our Automated Spontaneous Speech grading system.

The technique helps scale spoken English testing, which means super scale spoken English training!

Great job Vinay and Nishant.

PS: Also check out our KDD paper on programming assessment if you haven’t already.

- Varun

Who is prepared to learn computer programming?

Everyone wants to learn programming – or at least some of us want everyone to learn programming (see code.org). We believe that knowing programming doesn’t just enable one to write software; it also teaches one to think about problems objectively, casting the solution to a problem into a structure, a step-by-step process. This makes you a better problem solver in life in general, greatly improving your ability to manage things around you. Strangely enough, we found this hypothesis challenged in our own research group’s recruitment drives – we found people ranked very highly on competitive programming websites doing very poorly when asked to think about an open problem, such as how one would build a tool to automatically grade the quality of an email. This has led us to believe that knowing programming doesn’t cover everything; there are many other skills to take care of. But nevertheless…

In this post we want to look specifically at the question of who has the prerequisites to learn programming in a reasonable amount of time, say three months, in an adult education scenario. This is a very important question. For one, software engineering remains a lucrative career across the world, given what it pays and the volume of jobs it offers; on the other hand, there’s a dearth of skilled people available in the market for these jobs (see our report based on outcome data here). Many companies hence want to hire ‘trainable’ candidates, whom they can train in three months with the right skills and deploy on live projects. Besides this, the answer is equally important to students who take courses or MOOCs to make themselves more employable – they would want to know whether they will be able to pick up the required skills by the end of the course and make good on their investment.

I will share the results from one study we did involving 1,371 candidates; this result has since been confirmed multiple times over through various similar studies. These candidates, just out of college, were to join a large IT company (150,000-plus people) as software engineers and go through a three-month training in programming. At the end of the training, the company would put them in three buckets – high, medium and low – with the low bucket asked to leave. We tested all these candidates at the beginning of their training with four IRT-based adaptive tests: English, Logical Ability, Quantitative Ability and Computer Programming (more about these here). Could their scores in these skills predict who would eventually be asked to leave?

The answer is yes: we could predict with a fairly good accuracy who was successful after the training. But then, the question that follows is – what skills finally mattered in predicting this information?

First, English and Logical Ability matter: English to understand the instructions, which are all in English, and Logical Ability for basic deductive and inductive reasoning. But Quantitative Ability doesn’t matter. See the graph below – the model with Quantitative Ability scores included doesn’t do any better than the model using just English and Logical scores. Thus we should not be testing and filtering candidates on quantitative ability for programming roles – unfortunately, many have been doing this :( ! With a filter on a combination of English/Logical scores, we get a 20%-20% Type 1/Type 2 error operating point.


Figure 1: Type 1 error: high/mid performers classified as low performers. Type 2 error: low performers classified as mid/high performers. EL: model using just English and Logical scores. ELQ: model using English, Logical and Quantitative Ability scores. ELQ doesn’t add significant incremental value.

When we introduce the Computer Programming score, we can make a much better prediction. But what scares institutions is that if they put a filter on programming, very few candidates will qualify the intake metric. This is actually untrue! We verified it empirically: if you use the programming score in your model, more candidates from the population qualify the metric, and, importantly, more of the qualified succeed in training.

But for many this is counterintuitive. Given that we have interpretable models, we can actually see why this happens. Here is the rough qualification criterion in its barebones structure:

  Logical score + (1/2) × English score > Sc1
  AND Logical score > Sc2

OR

  (1/2) × English score + Programming score + Logical score > Sc3
  AND Programming score > Sc4

So what does this mean? English is half as important as the others!

But more so, if the candidate doesn’t know programming, he/she needs high logical ability (constrained by Sc2). On the other hand, if the person has some basic exposure to programming (Sc4 removes the bottom 30% of candidates by their programming score), a lower logical score can be offset by programming ability. This means that candidates with higher programming scores can succeed even with a lower logical score. If we do not test for programming at all, all these candidates get cut out, even those who know enough programming to succeed.
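The two-branch rule can be sketched as a function. The threshold values below are hypothetical placeholders, not our calibrated cut-offs (those are in the tech report):

```python
# A minimal sketch of the two-branch qualification rule. The thresholds
# SC1..SC4 are hypothetical placeholders, not the calibrated cut-offs.
SC1, SC2, SC3, SC4 = 120.0, 60.0, 170.0, 55.0

def qualifies(english, logical, programming):
    # Branch 1: no programming needed, but logical ability must be high.
    branch1 = (logical + 0.5 * english > SC1) and (logical > SC2)
    # Branch 2: programming ability can offset a lower logical score.
    branch2 = (0.5 * english + programming + logical > SC3) and (programming > SC4)
    return branch1 or branch2

# A candidate with modest logical ability but solid programming qualifies
# via branch 2; without the programming score the same candidate is cut.
print(qualifies(english=80, logical=55, programming=80))  # True
print(qualifies(english=80, logical=55, programming=0))   # False
```

Note how English enters both branches with a 1/2 weight, which is exactly the “language is half as important” observation.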

So the verdict is in – a neat result: ability in the language of instruction and logical ability predict success in a short-duration programming course. Language is half as important as Logical Ability, and Quantitative Ability is not important at all. If the person knows some programming, his/her level of programming can offset the requirement for Logical Ability and also language skills.

So, want to find out whether you are trainable for programming? Talk to us! We will have you take AMCAT.

Want to know the details behind this simple, neat result? Ask us for the tech report! Vinay will be happy to send it. :)

Till next time, learn how to code!

- Varun
