AM Research is a division of Aspiring Minds. Aspiring Minds aspires to build an assessment-driven job marketplace (a SAT/GRE for jobs) to drive accountability in higher education and meritocracy in labor markets. The products developed from our research have impacted more than two million lives, and the resulting data is a continuous source of new research.
A cocktail of assessment, HR, machine learning, data science, education, social impact with two teaspoons of common sense stirred in it.
Special mention and thanks to the mentors, Narender Gupta, Colin Graber and Raghav Batta, students at the university who helped us execute the academic and peripheral logistics of the workshop efficiently and made the experience engaging and interesting for the attendees.
As the year came to an end, we looked back on what we shared with the world in 2015. As data nerds, we pushed all our blog articles into an NLP engine to cluster them and identify key themes. Given the small sample size and the challenge of finding semantic similarity in our specialized area, we waded through millions of unsupervised samples through deep learning with a Bayesian framework, ran it on a cluster of GPUs for a month…yada yada. Well, some problems are simply ones that humans can solve easily and efficiently; so that is what we actually did.
The key themes were:
Grading of programs – 4 posts
We need to grade programs better to be able to give automated feedback to learners, help companies hire more efficiently and expand the pool considered for hiring. We at AM dream of having an automated teaching assistant – we think it is possible and will be disruptive. Thus we dedicated four of our posts to telling you about automatically grading programs and its impact.
The tree of program difficulty – We found that we could determine the empirical difficulty of a programming problem based on the data structures it uses, the control structures it uses and its return type, among other parameters. We used these features in a nice decision tree to predict how many test takers would answer the question correctly, and our predictions had a correlation of 0.81 with the actual numbers! This tells us about human cognition, helps improve pedagogy and also helps generate the right questions for a balanced test. And this is just the tip of the iceberg. Second, we approached the same problem by looking at the difficulty of test cases and their inter-correlation. We understood what conceptual mistakes people make, got a recipe for making better test cases for programs and gained insights on how to score them. For instance, we found that a trailing comma in a test case can make it unnecessarily difficult!
Finding super good programmers – Given these thoughts on how to construct a programming test and score it, we showed you how, with all this intelligence put together with our super semantic machine learning algorithm, we can spot 16% good programmers who are missed by test case based measures. Additionally, we automatically found the super good ones who write efficient and maintainable code. So please say a BIG NO to test case based programming assessment tools!
Pre-reqs to learn programming – Stepping back, we tried to determine who could learn programming through a short-duration course. We found that it was a function of a person’s logical ability and English but did not depend on her/his quantitative skills. Interestingly, we found that basic exposure to a programming language could compensate for lower logical ability in predicting which students would successfully learn programming. A data way to find course prerequisites!
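To make the 0.81 figure concrete: a correlation like this compares the predicted share of test takers who solve each problem with the share who actually did. The snippet below is only a minimal sketch of that evaluation step, not our actual pipeline; the function name `pearson` and every number in it are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not Aspiring Minds data):
# fraction of test takers predicted vs. observed to solve five problems.
predicted = [0.82, 0.64, 0.41, 0.30, 0.15]
observed = [0.78, 0.70, 0.35, 0.33, 0.10]

r = pearson(predicted, observed)
print(f"correlation between predicted and observed difficulty: {r:.2f}")
```

A value close to 1 means the model ranks and spaces the problems much as real test takers do; a value near 0 means the predictions carry little information about empirical difficulty.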
Building a machine learning ecosystem – 3 posts
Catching them young! We designed a cognitively manageable hands-on supervised learning exercise for 5th-9th graders. We helped kids, in three workshops spread across different cities, make fairly accurate friend predictors with great success! We think data science is going to become a horizontal skill across job roles and want to find ways to get it into schools, universities and informal education.
“Exams. I would take my exam results, from the report card of every year. And then I will make it on excel and then I will remember the grades and the one I get more grades I will take a gift” [sic.]
The ML India ecosystem – Our next victims were those in universities. We launched ml-india.org to catalyse the Indian machine learning ecosystem. Given India’s very low research output in machine learning, we have put together a resource center and a mailing list to promote machine learning. We also declared ourselves self-styled evaluators of machine learning research in India and promise to share monthly updates.
Employment outcome data release – We recently launched AMEO, our employability outcome data set at CODS. This unique data set has assessment details, education and demographic details of close to 6000 students together with their employment outcomes – first job designation and salary. This can tell us so much about the labor market to guide students and also identify gaps – to guide policy makers. We are keenly looking forward to what wonderful insights we get from the crowd! Come, contribute!
Pat our back! – 3 posts
We told you about our KDD and ACL papers on automatic spoken English evaluation – the first semi-automated grading of free speech. We loved mixing crowdsourcing with machine learning – a cross between peer and machine grading – to do super reliable automated evaluation.
And then our ICML workshop paper talked about how to build models of ‘employability’ – interpretable, theoretically plausible yet non-linear models which could predict outcomes based on grades. More than 200 organizations have benefited from using these models in recruiting talent, and the models do way better than linear ones!
In the posts outside these three clusters, we told you about –
– Why we exist – why we need data science to promote labor market meritocracy
– Our work on classifying with 80-80 accuracy for 1500+ classes
It has been an interesting year at AM, learning from all our peers and contributing our bit to research, while using it to build super products. We promise to treat you to a lot more interesting stuff in open-response grading, labor market standardizing and understanding next year. Stay tuned to this space!
Aspiring Minds Research is pleased to announce that it will be co-organizing this year’s data challenge at CODS 2016, the annual top-tier conference on machine learning and data science organized by the Indian chapter of KDD.
Undergraduates – performance and salaries
This year, we wanted data science enthusiasts to get a flavor of the kind of data we have and work on. We have released AMEO 2015, a dataset on Aspiring Minds’ Employability Outcomes, which captures the academic and demographic information of engineering undergraduates who took AMCAT, Aspiring Minds’ battery of standardized assessments. What makes this dataset unique and rich is that it also has employment outcomes (annual salaries of students’ first jobs) along with standardized test scores.
The answers to a lot of interesting questions possibly lie in this dataset –
- Can we predict the salaries a particular undergraduate would get on graduating?
- Is the recruitment industry meritocratic – Do people with higher skills get paid higher? Or are there biases which don’t allow for these?
- How important are English skills in getting a job?
and many more!
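As a starting point for the meritocracy question above, one simple check is whether students with higher skill scores also rank higher on salary. The sketch below computes a Spearman rank correlation in plain Python; the helper names and the numbers are hypothetical and are not taken from AMEO, so treat it as an illustration of the idea rather than an analysis.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up toy values, for illustration only (not from the AMEO dataset).
skill_scores = [455, 520, 610, 390, 580, 700]
salaries = [300000, 325000, 500000, 200000, 320000, 650000]

rho = spearman(skill_scores, salaries)
print(f"rank correlation between skill score and salary: {rho:.2f}")
```

A rank correlation is used here rather than a linear one because salaries tend to be skewed; a value well below 1 would hint that something other than measured skill is driving pay.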
Participate and spread the word – 1000 USD cash prizes!
Interested in finding out the answers to these questions?
Take a stab at the data right away by downloading it from the contest website (mentioned below).
Get started right away and help spread the word!
1000 USD cash prizes to those with the best submissions!
Machine learning is the science of learning to do tasks by observing examples. It is transforming the world by enabling machines to do all sorts of ‘intelligent’ tasks such as understanding images and human speech, predicting preferences and diseases, and many others. With tremendous amounts of data, interconnectedness, sophisticated algorithms and huge processing power in small devices, machines do things which were beyond their reach until recently. On the other hand, machines are still unable to do many tasks which humans do effortlessly, say understanding a story – this constitutes the next big challenge for machines or, well, for the humans that build these machines!
In some ways, it has never been so exciting! Where should India be, as machines are becoming more intelligent? It is simple – it should be making the most of the opportunity. We need to participate in and contribute to high-quality research and innovation, and also convert new results into effective business models. The opportunity is global – the location of a digital business doesn’t constrain its market – a company in a Bangalore or a Gurgaon could serve the US market, the European market or even the whole world. Machine learning is not just a scientific or an academic pursuit. The economy and society can get great returns from research and innovation in the area.
But are we there yet? Where are we placed on the global scene in both academic and industrial research?
Read the full article here – http://ml-india.org/where-does-india-stand-machine-learning/
In fall 2014, we organized ASSESS, the first workshop on data mining for educational assessment and feedback, at KDD 2014. The workshop brought together a total of 80 participants, including education psychologists, computer scientists and practitioners, under one roof and led to a thoughtful discussion. We have put together a white paper which captures our key discussions from the workshop. The paper primarily discusses why assessments are important, what the state of the art is and what goals we should pursue as a community. It is a brief exposition and serves as a starting point for a discussion to set the agenda for the next decade.
Why are assessments important?
Automated and semi-automated assessments are key to scaling learning, validating pedagogical innovations, and delivering the socio-economic benefits of learning.
- Practice and Feedback: Whether considering large-scale learning for vocational training or non-vocational education, automating delivery of high-quality content is not enough. We need to be able to automate or semi-automate assessments for formative purposes. Substantial evidence indicates that learning is enhanced through doing assignments and obtaining feedback on one’s attempts. In addition, the so-called “testing effect” demonstrates that repeated testing with feedback enhances students’ long-term retention of information. By automating assessments, students can get real-time feedback on their learning in a way that scales with the number of students. Automated assessments may become, in some sense, “automated teaching assistants”.
- Education Pedagogy: There is a great need to understand which teaching/learning/delivery models of pedagogy are better than others, especially with new emerging modes and platforms for education. To understand the impact of and compare different pedagogies, we need assessments that can summatively measure learning outcomes precisely and accurately. Without valid assessments, empirical research on learning and pedagogy becomes questionable.
- Learning to socio-economic mobility: For learners that seek vocational benefits, there need to be scalable ways of measuring and certifying learning so that they may garner socio-economic benefits from what they’ve learnt. There need to be scalable ways of measuring learning so as to predict the KSOAs (knowledge, skills and other abilities) of learners to do specific tasks. This will help both learners and employers by driving meritocracy in labor markets through reduced information asymmetries and transaction costs. Matching of people to jobs can become more efficient.
We look forward to hearing your thoughts on the paper! Do feel free to write to firstname.lastname@example.org
This is an excerpt from the white paper ‘On Assessments – State of the art and goals’, which had contributions from Varun Aggarwal, Steven Stemler, Lav Varshney and Divyanshu Vats, co-organizers, ASSESS 2014 at KDD. The full paper can be accessed here.