Resources



We are keen to see a vibrant research community grow around assessments and to help it deliver social good. We do our bit by making some of our data sets and code/APIs available for academic use. We also actively collaborate with various groups and look forward to engaging with more where our priorities align. Feel free to write to us!

a] Automata:
Automata is an automated test of programming skills. It produces a detailed report covering the test-case pass status of the code, its time complexity, program maintainability/readability, and a machine-learning-based score. Feel free to use the tool in your programming class, with our problems or your own. Get more detailed feedback for your students and your class. Run a learning study based on pre- and post-assessments. Work with us to build machine-learning-based prediction models for your problems!

b] Programming Features API:
In our KDD 2014 paper, we describe a new grammar for extracting meaningful features from programs; these features are highly predictive of the algorithm used to solve the problem. We show that the features work well with supervised learning, and they support far more than naive features such as keyword counts, AST height and syntax errors. Want to try them out? We are happy to provide an API. The features are currently available for C, C++, Java and Python.
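For a sense of the baseline, here is a minimal Python sketch of the naive features named in this item (keyword counts, AST height and a syntax-error flag), built only on the standard `ast`, `tokenize` and `keyword` modules. It is not the grammar from the KDD 2014 paper and not the API we offer; the function names `keyword_counts`, `ast_height` and `naive_features` are illustrative assumptions.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter


def keyword_counts(source: str) -> Counter:
    """Count how often each Python keyword appears in the source text."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


def ast_height(node: ast.AST) -> int:
    """Height of the syntax tree: 1 for a leaf, else 1 + the tallest child."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_height(child) for child in children)


def naive_features(source: str) -> dict:
    """Baseline features only (hypothetical helper, not the paper's feature set)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Code that does not parse gets a flag and empty structural features.
        return {"syntax_error": 1, "ast_height": 0, "keyword_counts": Counter()}
    return {
        "syntax_error": 0,
        "ast_height": ast_height(tree),
        "keyword_counts": keyword_counts(source),
    }


sample = "for i in range(3):\n    if i:\n        print(i)\n"
features = naive_features(sample)
print(features["syntax_error"], features["ast_height"], dict(features["keyword_counts"]))
```

Features like these say little about which algorithm a program implements, which is the gap the richer features are meant to fill.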

c] Code Data Set:
We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. Want to play with them? Write to us!

d] Assessment-Performance Data Set:
This is a data set we released for a machine learning competition in 2011. It contains candidates' assessment scores in multiple areas and their performance in a company as rated by their managers. Want to try building the best models for different Type I/Type II error trade-offs? Try this! To download the dataset, click here.
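To illustrate what a Type I/Type II trade-off looks like, here is a minimal Python sketch that sweeps a cut-off over assessment scores and reports both error rates. The scores and labels are made-up toy values, not taken from the data set, and the function name `error_rates` is an assumption for illustration.

```python
def error_rates(scores, labels, threshold):
    """Return (type1_rate, type2_rate) for the rule "select if score >= threshold".

    Type I error:  selected, but actually a poor performer (false positive).
    Type II error: rejected, but actually a good performer (false negative).
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    type1 = false_pos / negatives if negatives else 0.0
    type2 = false_neg / positives if positives else 0.0
    return type1, type2


# Toy, made-up data: 1 = rated a good performer by the manager, 0 = not.
scores = [35, 42, 50, 58, 61, 70, 77, 85]
labels = [0, 0, 1, 0, 1, 1, 0, 1]

for threshold in (40, 60, 80):
    type1, type2 = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: type I={type1:.2f}, type II={type2:.2f}")
```

Raising the cut-off lowers the Type I rate and raises the Type II rate; which point on that curve is best depends on the cost a company attaches to each kind of mistake.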

e] Aspiring Minds’ Employability Outcomes 2015 (AMEO 2015):
The dataset contains information about a set of engineering candidates and their employment outcomes. For every candidate, the data contains both profile information and employment outcome information. Read more about it here.


Apart from this, we have tested more than two million candidates in language, cognitive skills, personality and functional skills. We have various data sets on the workplace performance of individuals whom we have assessed. If you can think of joint research projects around this data, do write to us.
Also, we are a data-hungry group, so if you can provide us with any kind of data and have joint research goals in mind, do write to us.



Aspiring Minds’ Employability Outcomes 2015 (AMEO 2015) is a unique dataset which contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas: cognitive skills, technical skills and personality. A relevant question is what determines the salary and the jobs these engineers are offered right after graduation. Various factors determine this, such as college grades, candidate skills, proximity of the college to industrial hubs, the candidate's specialization, and market conditions for specific industries. To our knowledge, this is the first time anyone has publicly released employment outcome data together with standardized assessment scores.

Coupled with biodata information, AMEO 2015 provides an opportunity for a unique and comprehensive study of the entry-level labor market. The data can be used not only to build an accurate salary predictor but also to understand what influences salaries and job titles in the labor market.
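As a starting point, a salary predictor can be as simple as a least-squares line through one feature. The Python sketch below fits such a line using only the standard library. The field names `college_gpa` and `salary` and all the numbers are made-up assumptions for illustration; they are not rows or column names from AMEO 2015.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy, made-up records (hypothetical field names, salary in thousands).
records = [
    {"college_gpa": 60, "salary": 240},
    {"college_gpa": 65, "salary": 280},
    {"college_gpa": 70, "salary": 300},
    {"college_gpa": 75, "salary": 320},
    {"college_gpa": 80, "salary": 360},
]

gpas = [r["college_gpa"] for r in records]
salaries = [r["salary"] for r in records]
slope, intercept = fit_line(gpas, salaries)


def predict_salary(gpa):
    """Predicted salary for a given GPA under the fitted line."""
    return slope * gpa + intercept


print(f"salary = {slope:.2f} * gpa + {intercept:.2f}")
```

The fitted slope is the "what influences salary" half of the question: it estimates how much the predicted salary moves per unit of the feature. A real study would use many features and hold out data to check the predictions.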

This data was first released as part of the IKDD CoDS 2016 Data Challenge. The competition attracted 11 entries with complete reports, of which 5 were announced as winners at IKDD. In addition, 137 predictions were submitted by various people on our Leaderboard. You can try out your own predictions on the Leaderboard. Read about the Data Challenge here.

Click here to download the dataset.

To read more about the dataset in our paper, click here.

Projects/Papers using AMEO data sets

  1. Shekhar, S. Putting India to Work: Resolving Information Failures to Fill the Skill Gap, Master's thesis, Harvard Kennedy School, 2016.
  2. Singh, R. A Regression Study of Salary Determinants in Indian Job Markets for Entry Level Engineering Graduates, Dissertation, Dublin Institute of Technology, 2016.
  3. S. Shariat Torbaghan, H. Rachakonda, P. Krishna Gattam, Predict job success based on student’s credentials, Machine Learning class, NYU, 2016.
  4. IKDD CoDS 2016 Data Challenge Finalists


Quick Links: AMEO Data Set | Press Release | Leaderboard

If you have a project report based on this data, please feel free to mail us!

If you are interested, please fill in the form below.

Request Data Set

 
