AM Research is a division of Aspiring Minds. Aspiring Minds aspires to build an assessment-driven job marketplace (an SAT/GRE for jobs) to drive accountability in higher education and meritocracy in labor markets. The products developed from our research have impacted more than two million lives, and the resulting data is a source of continuous new research.


A cocktail of assessment, HR, machine learning, data science, education, and social impact, with two teaspoons of common sense stirred in.

The first interactive US Skill Demand Map – a big data approach

Jobseekers wish to know what skills the industry requires in their region and which skills pay the most. So do institutions of higher and vocational education. Unfortunately, little such information exists. It is considered hard to collate, and the old-school way of running surveys with corporations is time-consuming, expensive and mired in subjectivity.

We went after this problem the big data way – we scraped some 4 million open job postings from the web for the US and automatically matched them to our taxonomy of 1064 job roles and the 200+ skills these roles require. What did we get out of this? The US Skill Demand Map – for each state in the US, we know what percent of open jobs require a given skill and how much that skill pays. For instance, see the heat map below — it shows how much the software engineering skill pays in different US states. All this is generated automatically and can be updated in minutes every month based on the current open jobs in the market!

Figure 1: Compensation for software engineering skill

This map is interactive. A jobseeker can enter a key skill to find which states demand it the most and which states pay the most for it. Additionally, s/he can scroll across the map to find the demand and compensation for a given skill in each state. Conversely, a candidate can enter a state and find the top-paying and highest-demand skills there. Try it now!
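Under the hood, the map boils down to a simple aggregation over matched postings. Here is a minimal sketch in Python — the toy postings, field names and salaries below are invented for illustration; the real pipeline runs over roughly 4 million scraped openings matched to our taxonomy:

```python
from collections import defaultdict
from statistics import median

# Toy sample of matched postings as (state, skill, advertised salary);
# the real pipeline aggregates ~4 million scraped openings matched to
# the 1064-role / 200+-skill taxonomy.
postings = [
    ("CA", "software engineering", 120000),
    ("CA", "software engineering", 110000),
    ("CA", "analytical skills", 90000),
    ("NY", "software engineering", 115000),
    ("NY", "analytical skills", 105000),
    ("VA", "analytical skills", 95000),
]

def skill_demand_map(postings):
    """For each state, compute the fraction of openings requiring each
    skill and the median advertised pay for that skill."""
    per_state = defaultdict(list)
    for state, skill, salary in postings:
        per_state[state].append((skill, salary))
    result = {}
    for state, rows in per_state.items():
        by_skill = defaultdict(list)
        for skill, salary in rows:
            by_skill[skill].append(salary)
        result[state] = {
            skill: {"demand": len(sals) / len(rows), "median_pay": median(sals)}
            for skill, sals in by_skill.items()
        }
    return result

demand = skill_demand_map(postings)
```

The per-state dictionary is exactly what a heat map needs: one number per state for demand and one for pay, refreshed whenever a new batch of postings is scraped.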

Such analysis also helps us uncover policy trends (see our report). We found that agreeableness and finger dexterity are the most in-demand skills after Information Gathering and Synthesis, which has the highest demand. The map below shows which states have a higher percentage of jobs requiring agreeableness and which require finger dexterity more often.

 

Figure 2: Skills in highest demand in each U.S. state (other than Information Gathering & Synthesis)

On the other hand, we can find the states with the most demand and the highest pay for, say, analytical skills. New York pays the most for the skill, whereas Virginia has the highest percentage of jobs needing it. (See Figure 3)

Figure 3: Heat maps for demand and compensation for analytical skills

The U.S. Skill Demand Map fills a major information gap in the labor market. To our knowledge, this is the first effort to objectively present the demand for skills across US states to aid better decision-making by job seekers. It is based on objective data and is quick, accurate and user-friendly.

Trying to understand what skill to gain or how best to utilize your skills? Use our interactive map now!

-Varun

Scaling up machine learning to grade computer programs for 1000s of questions in multiple languages

Machine learning has helped solve many grading challenges – spoken English, essays, computer programs and math problems, to cite a few examples. However, there is a big impediment to using these methods in real-world settings: one needs to build an ML model for every question/prompt. In essay grading, for instance, a model designed to grade essays on ‘Socialism’ will be very different from one that grades essays on ‘Theatre’. These models require a large number of expert-rated samples and a fresh model-building exercise each time. A real-world practical assessment works on 100s of questions, which translates to requiring 100s of graders and 100s of models. The approach doesn’t scale, takes too much time and is often impractical.

In our KDD paper accepted today, we solve much of this challenge for grading computer programs. In KDD 2014, we presented the first machine learning approach to grading computer programs, but we had to build a model per problem. We have now invented a technique that needs no expert-graded samples for a new problem and no new models! As soon as we have a few tens of ‘good’ codes for a problem (identified automatically using test case coverage and static analysis), our newly invented question-agnostic models take charge. How does this help? With this technology, our machine learning models can scale, in an automated way, to grade 1000s of questions in multiple languages in a really short span of time. Within a couple of weeks of a new question being introduced into our question pool, machine learning evaluation kicks in.
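To get a feel for how a ‘good set’ can be identified without any expert grading, here is a minimal sketch — the toy question, the helper names and the submissions are all hypothetical; in practice static analysis further filters out degenerate solutions such as hard-coded outputs:

```python
# Auto-identify the 'good set' for a new question by running each
# submission against the question's test suite and keeping only those
# that pass every case.

def passes_all_tests(program, test_cases):
    """program: a callable; test_cases: list of (input, expected_output)."""
    return all(program(inp) == out for inp, out in test_cases)

def find_good_set(programs, test_cases):
    return [p for p in programs if passes_all_tests(p, test_cases)]

# Toy question: return the maximum of a non-empty list.
tests = [([3, 1, 2], 3), ([-5, -2], -2), ([7], 7)]
submissions = [
    max,                        # correct
    lambda xs: xs[0],           # wrong: returns the first element
    lambda xs: sorted(xs)[-1],  # correct, if less efficient
]
good_set = find_good_set(submissions, tests)
```

Everything downstream — the question-agnostic features and models — only needs this automatically harvested `good_set`, which is why no fresh expert grading is required per question.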

There were a couple of innovations that led to this work, a semi-supervised approach to model building:

  • We can identify a subset of the ‘good’ set automatically. For programs, the ‘good set’ – codes that get a high grade – can be identified automatically using test cases. We exploit this to find other programs similar to these in a feature space that we define. To get a sense of this, think of a distance measure from the programs in the ‘good set’. Such a ‘nearness’ feature correlates with grades across questions, irrespective of whether it is a binary search problem or a tree traversal problem. Such features help us build generic models across questions.

  • We design a number of such features which are invariant to the question and correlate with the expert grade. These features are inspired by the grammar we proposed in our earlier work. For instance, one feature is how different an unseen program is from the set of keywords present in the ‘good set’; another is the difference in the kinds of computations the programs perform. Using such features, we learn generic models over a set of problems using supervised learning. These generic models work super well for any new problem as soon as we get our set of good codes!
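The ‘nearness’ idea above can be sketched in a few lines — the feature vectors and the plain Euclidean distance here are illustrative stand-ins for the grammar-based features in the paper:

```python
import math

# Represent each program as a feature vector (e.g. keyword counts,
# control-structure counts -- the vectors below are made up) and score
# an unseen program by its distance to the centroid of the 'good set'.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearness(vec, good_vectors):
    """Negated Euclidean distance to the good-set centroid, so that
    higher values mean 'closer to known-good programs'."""
    c = centroid(good_vectors)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, c)))

good = [[2.0, 1.0, 0.0], [2.0, 1.0, 1.0]]  # feature vectors of good codes
candidate_ok = [2.0, 1.0, 0.5]             # close to the good cluster
candidate_bad = [8.0, 0.0, 4.0]            # far from it
```

Because `nearness` is defined relative to whatever good set a question produces, the same feature (and hence the same trained model) transfers across questions.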

Check out this illustrative and easy-to-grasp video which demonstrates our latest innovation.

 

The table presents a snapshot of the results from the paper. As shown in the last two columns, the ‘question-independent’ machine learning model (ML Model) consistently outperforms the test-suite-based baseline (Baseline). The claim of ‘question-independence’ is corroborated by similar and encouraging results (last three rows) on totally unseen questions, which were not used to train the model.

Metric   Question Set            #Questions   ML Model   Baseline
Correl   All questions           19           0.80       0.65
Bias     All questions           19           0.24       0.35
MAE      All questions           19           0.57       0.85
Correl   Unseen questions only   11           0.81       0.65
Bias     Unseen questions only   11           0.27       0.31
MAE      Unseen questions only   11           0.59       0.84
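The three metrics in the table are standard and easy to compute; a minimal, self-contained sketch (the grade vectors below are made up for illustration, on a 1–5 scale):

```python
from statistics import mean, pstdev

def pearson(pred, actual):
    """Pearson correlation between predicted and expert grades."""
    mp, ma = mean(pred), mean(actual)
    cov = mean([(p - mp) * (a - ma) for p, a in zip(pred, actual)])
    return cov / (pstdev(pred) * pstdev(actual))

def bias(pred, actual):
    """Mean signed error: positive means the model over-grades."""
    return mean([p - a for p, a in zip(pred, actual)])

def mae(pred, actual):
    """Mean absolute error between predicted and expert grades."""
    return mean([abs(p - a) for p, a in zip(pred, actual)])

expert = [1, 2, 3, 4, 5, 3, 2]  # hypothetical expert grades
model = [1, 2, 4, 4, 5, 2, 2]   # hypothetical model predictions
```

Higher correlation and lower bias/MAE are better, which is how to read the ML Model vs. Baseline columns above.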

What does this all mean?

  • We can really scale ML based grading of computer programs. We can continue to add new problems and the models will automatically start working within a couple of weeks.
  • This set of innovations applies to a number of other problems where a good set can be identified automatically. For instance, in circuit-solving problems, solutions with the correct final answer could be considered the good set; the same applies to mathematics problems or automata design problems – problems where computer science techniques are mature enough to verify the functional correctness of a solution. Machine learning can then automatically help grade other unseen responses using this information.

Hoping to see more and more ML applied to grading!

Varun

Work done with Gursimran Singh and Shashank Srikant

An Automated Test of Motor Skills for Job Prediction and Feedback

We’re pleased to announce that our recent work on designing automated assessments to test motor skills (skills like finger dexterity and wrist dexterity) has been accepted for publication at the 9th International Conference on Educational Data Mining (EDM 2016).
Here are some highlights of our work –

  • The need: Motor skills are required in a large number of blue collar jobs today. However, no automated means exist to test and provide feedback on these skills. We explore the use of touch-screen surfaces and tablet-apps to measure these skills.
  • Gamified apps: We design novel app-based gamified tests to measure one’s motor skills. We’ve designed apps specifically to measure finger dexterity, manual dexterity and multi-limb coordination.

  • Validation on three jobs: We validated the scores from the apps on three different job roles – tailoring, plumbing and carpentry. The results we present make a strong case for using such automated, touch-screen based tests in job selection and to provide automatic feedback for test-takers to improve their skills!

If you’re interested in the work and would like to learn more, please feel free to write to research@aspiringminds.com

Data Science For Kids Goes International

We successfully organised our first international data science workshop for kids at the University of Illinois as part of SAIL, a one-day event to learn more about life on campus by attending classes taught by current students.
The workshop aimed to introduce the ideas of machine learning and data-driven techniques to middle- and high-school kids. Participants went through a fun exercise covering the complete data science pipeline, from problem formulation to prediction and analysis.
Special mention and thanks to the mentors, Narender Gupta, Colin Graber and Raghav Batta, students at the university who helped us execute the academic and peripheral logistics of the workshop efficiently and made the experience engaging and interesting for the attendees.


Narender Gupta, Colin Graber, Raghav Batta

To read about the mentor experiences, click here.
Visit sail.cs.illinois.edu for more information on the event or workshop.

What AM Research told you in 2015 – the data science way?

As the year came to an end, we looked back on what we shared with the world in 2015. As data nerds, we pushed all our blog articles into an NLP engine to cluster them and identify key themes. Given the small sample size and the challenge of finding semantic similarity in our specialized area, we waded through millions of unsupervised samples with deep learning in a Bayesian framework, ran it on a cluster of GPUs for a month…yada yada. Well, some problems humans can just do easily and efficiently; so that is what we actually did.

The key themes were:

Grading of programs – 4 posts

We need to grade programs better to be able to give automated feedback to learners and help companies hire more efficiently while expanding the pool considered for hiring. We at AM dream of having an automated teaching assistant – we think it is possible and will be disruptive. Thus we dedicated 4 of our posts to telling you about automatically grading programs and its impact.

The tree of program difficulty – We found that we could determine the empirical difficulty of a programming problem from the data structures it uses, the control structures used and its return type, among other parameters. We fed these features into a decision tree to predict how many test takers would answer a question correctly – and we predicted with a correlation of 0.81! This tells us about human cognition, helps improve pedagogy and helps generate the right questions for a balanced test. And this is just the tip of the iceberg. Second, we approached the same problem by looking at the difficulty of test cases and their inter-correlation. We learned what conceptual mistakes people make, got a recipe for making better test cases for programs and gained insights on how to score them. For instance, we found that a trailing comma in a test case can make it unnecessarily difficult!
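To make the decision-tree idea concrete, here is a toy, hand-written tree in that spirit — the feature names, splits and leaf values are invented for this sketch, not the fitted tree from our study:

```python
# Predict the fraction of test takers expected to solve a problem from
# coarse structural features of the problem (all values illustrative).

def predict_solve_rate(uses_tree, has_nested_loops, returns_collection):
    if uses_tree:             # tree data structures are empirically harder
        return 0.15 if returns_collection else 0.25
    if has_nested_loops:      # nested control flow adds difficulty
        return 0.40
    return 0.70               # simple linear-scan style problems

rate = predict_solve_rate(uses_tree=False, has_nested_loops=True,
                          returns_collection=False)
```

A real tree would be fitted on labeled data (features of past problems against observed solve rates), but the prediction step is exactly this kind of cascade of feature tests.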

Finding super good programmers – Given these thoughts on how to construct and score a programming test, we showed you how, putting all this intelligence together with our semantic machine learning algorithm, we can spot the 16% of good programmers missed by test-case-based measures. Additionally, we automatically found the super good ones writing efficient and maintainable code. So please say a BIG NO to test-case-based programming assessment tools!


Reproduced from “AI can help you spot the right programmers”. It shows that a test case metric misses 16% of good programmers. Furthermore, AI can help spot 20% super good coders.

Pre-reqs to learn programming – Stepping back, we tried to determine who could learn programming through a short-duration course. We found that it was a function of a person’s logical ability and English, but did not depend on her/his quantitative skills. Interestingly, we found that basic exposure to a programming language could compensate for lower logical ability in predicting a successful student. A data-driven way to find course prerequisites!

Building a machine learning ecosystem – 3 posts

Catching them young! We designed a cognitively manageable, hands-on supervised learning exercise for 5th–9th graders. In three workshops spread across different cities, we helped kids build fairly accurate friend predictors, with great success! We think data science is going to become a horizontal skill across job roles and want to find ways to get it into schools, universities and informal education.

“Exams. I would take my exam results, from the report card of every year. And then I will make it on excel and then I will remember the grades and the one I get more grades I will take a gift” [sic.]


Reproduced from datasciencekids.org. Whom will you befriend? Can machine learning models devised by high school kids predict this?

The ML India ecosystem – Our next victims were those in universities. We launched ml-india.org to catalyse the Indian machine learning ecosystem. Given India’s very low research output in machine learning, we have put together a resource center and a mailing list to promote the field. We have also declared ourselves self-styled evaluators of machine learning research in India and promise to share monthly updates.

Employment outcome data release – We recently launched AMEO, our employability outcome data set, at CODS. This unique data set has assessment, education and demographic details of close to 6000 students, together with their employment outcomes – first job designation and salary. It can tell us so much about the labor market to guide students, and also identify gaps to guide policy makers. We are keenly looking forward to what wonderful insights we get from the crowd! Come, contribute!

Pat our back! – 3 posts 


Reproduced from “Work on spoken English grading gets accepted at ACL, AM-R&D going to Beijing!”. We describe our system that mixes machine learning with crowdsourcing to do spontaneous speech evaluation

We told you about our KDD and ACL papers on automatic spoken English evaluation – the first semi-automated grading of free speech. We loved mixing crowdsourcing with machine learning – a cross between peer and machine grading – to do super reliable automated evaluation.

And then our ICML workshop paper talked about how to build models of ‘employability’ – interpretable, theoretically plausible yet non-linear models that predict outcomes based on grades. More than 200 organizations have benefited from using these models in recruiting talent, and they do way better than linear models!

Other posts

Outside these three clusters, we told you about:

– Why we exist – why we need data science to promote labor market meritocracy

– The state of the art and goals for assessment research for the next decade (see ASSESS 2015)

– Our work on classifying with 80-80 accuracy for 1500+ classes

It has been an interesting year at AM, learning from all our peers and contributing our bit to research, while using it to build super products. We promise to treat you to a lot more interesting stuff in open-response grading and labor market standardization and understanding next year. Stay tuned to this space!

Varun

Tweets

Brainstorming session in progress: Discussing ways to improve #employability of #engineers #NECBangalore @myamcat pic.twitter.com/XA3pO8xlGV


Naveen talks on significance of quantifiable #jobs at #NECBangalore @Mindtree_Ltd @myamcat

Aspiring Minds National #Employability Conclave 2016 opens in #Bengaluru today. #NEC #jobs #skillgap

More than 80% Indian #engineers #unemployable, says @WSJ quoting Aspiring Minds #NER Engineers 2016 bit.ly/2aRbcME @makeinindia

Digital #job search platforms like @myamcat increasingly help connect #jobseekers with opportunities @himanshu0820 #Blum2016

Importance of #job credentials in bridging the job #skills gap @himanshu0820 @BrookingsGlobal #Blum2016

Generic #skills like Inductive reasoning and Openness to experience draw the highest #compensation in U.S. ht.ly/8H12301vDVh #data

Lack of knowledge of the demand for #skills required by the #industry; major information gap in US #labor market. ht.ly/odTK301m05I

Do you know which #skills to cultivate in order to find #employment in the #US ? Find out here! ht.ly/pKEn301lWpL #career

Using #crowdsourcing and #datascience to find the #skills the industry wants and is ready to pay for. ht.ly/lHhA301lVYm #bigdata

Stay ahead of the curve! Know the #skills in demand in your state. ht.ly/33dw301jvAy #jobs #jobsearch #career #datascience #data

Know what #skills are needed by the industry and which #jobs pay the best. #datascience ht.ly/LmfD301jvtw pic.twitter.com/VWI1eenlrF


Aspiring Minds launches US Skill Map – the first ever mapping of #job #skills in the U.S. ht.ly/zNbT301jrP4 pic.twitter.com/Z9fcldX93Q


Got some intriguing #datascience / #machinelearning #ideas to brainstorm? Lets get a room. bit.do/b263E @ml_india #ThinkContent

Family run businesses in #Asian markets play major role in filling the existing #institutionalvoids @TarunKhannaHBS goo.gl/0B7nmZ

#hiring right candidates is more of a #science and here’s how you can nail it! @himanshu0820 @businessinsider linkedin.com/pulse/hiring-r…?

Aspiring Minds in #SkillIndia 's #SkillCertification and Placement Ceremony of Retiring #Indian #AirForce personnel. pic.twitter.com/qjL5T1h6xn


"I wasn’t expecting students to know as much as they actually did,"says Narender Gupta, from #datascience camp #UIUC goo.gl/rYMLzD

#ML-India's #Bangalore chapter is organizing its 3rd meetup on 30th of April. #datascience enthusiasts, join today! lnkd.in/bWeE4Xz

#Engineering students at Universiti Malaysia Perlis take #AMCAT to get certified for #employability pic.twitter.com/FlYAEfZInC


Aspiring Minds at the IFC 7th Global Private Education Conference in Hong Kong (25 - 27 April). Looking forward to a great event.

Our research paper on #Automated test of #MotorSkills for #Job Selection and Feedback gets selected at #EDM2016 educationaldatamining.org/EDM2016/

Aspiring Minds at the #ASUGSVSummit this year. Looking forward to some great conversations!

#ML-India interviews Avishek Lahiri, doctoral student at #IITKharagpur. goo.gl/fZXHqm #datascience #data

#Datascience for kids goes international! - At #UIUC this weekend! sail.cs.illinois.edu

ml-india interviews Prof. Parag Singla 'Lifted Inference Learning' lnkd.in/btY8nYz #machinelearning #datascience

Why Job Portals Are Outliving Their Usefulness. @himanshu0820 @HuffingtonPost lnkd.in/bkHd882

"How do you see the #machinelearning field evolving in the next 5 years?" Intriguing question to @varaggarwal pic.twitter.com/R5hPZDzpJ6


Finding an interesting problem is still a problem. Problem formulation is an art and needs imagination @varaggarwal #machinelearning #data

#Data is always going to go big and more interesting. @UnaMayMIT #bootcamp @myamcat

How to predict and evaluate the performance of our #machinelearning models. @UnaMayMIT #data #bootcamp pic.twitter.com/Iwu83wwthf


Final chance to participate in the AM #data #bootcamp. Learn from @UnaMayMIT and @varaggarwal. Hurry up! Enroll at: lms.aspiringminds.in

Salute to womanhood! #WomenDay twitter.com/IndianYash/sta…

#Indian graduates lack basic numerical, logical & communication #skills which affects their #employability ow.ly/YRCfP @BT_India

#Electronics industry #Skillsgap :Students #fail to apply basic Kirchhoff's law & assemble a circuit - @BT_India ow.ly/YRCcz

40% #engineers can't comprehend English text. How do they understand their #curriculum ? ow.ly/YDI9g @BT_India #education #india

"Write a program of 15-20 lines, #IT companies will #hire you. But students FAIL" - @varaggarwal speaks to @BT_India ow.ly/YDI9g

Dive deep into the nuances of #machinelearning with @UnaMayMIT of @MIT Register for AM #Data #Bootcamp 2016: ow.ly/YJoS3

AM Datathon + Online Course + Workshop with @UnaMayMIT Register now: lms.aspiringminds.in #datascience #machinelearning