Introducing Data Science to School Kids

Shashank Srikant, Varun Aggarwal SIGCSE 2017

Data-driven decision making is fast becoming a necessary skill in jobs across the board. The industry today uses analytics and machine learning to get useful insights from a wealth of digital information in order to make decisions. With data science becoming an important skill needed in varying degrees of complexity by the workforce of the near future, we felt the need to expose school-goers to its power through a hands-on exercise. We organized a half-day data science tutorial for kids in grades 5 through 9 (10-15 years old). Our aim was to expose them to the full cycle of a typical supervised learning approach: data collection, data entry, data visualization, feature engineering, model building, model testing and data permissions. We discuss herein the design choices made while developing the dataset, the method and the pedagogy for the tutorial. These choices aimed to maximize student engagement while requiring minimal prerequisite knowledge. This was a challenging task, given that we limited the prerequisites for the kids to counting, addition, percentages, comparisons and a basic exposure to operating computers. By designing an exercise with these principles, we were able to give kids an exciting, hands-on introduction to data science, as confirmed by their experiences. Check them out on www.datasciencekids.org. This tutorial is the first of its kind. We hope that educators across the world are encouraged to introduce data science in their respective curricula for high-schoolers and are able to use the principles laid out in this work to build full-fledged courses.

Apps to Measure Motor Skills of Vocational Workers

Bhanu Pratap Singh, Varun Aggarwal UbiComp 2016

Motor skills are required in a large number of vocational jobs today. However, no automated means exist to test and provide feedback on these skills. In this paper, we explore the use of touch-screen surfaces and tablet apps to measure these skills. We design novel gamified apps to predict the performance of candidates in doing manual tasks in the industry. We demonstrate two important results. First, we use the information captured on a touch-screen device to successfully predict the scores of traditional, non-automated motor skill tests. Second, we show that this information successfully predicts the performance of workers in their respective jobs. The results presented in this work make a strong case for using such automated, touch-screen-based apps in job selection and to provide automatic feedback. This is the first attempt at using touch-screen devices to scalably and reliably measure motor skills.

Question Independent Grading using Machine Learning: The Case of Computer Program Grading

Gursimran Singh, Shashank Srikant, Varun Aggarwal KDD 2016

Learning supervised models to grade open-ended responses is an expensive process. A model has to be trained for every prompt/question separately, which in turn requires graded samples. In automatic programming evaluation specifically, the focus of this work, this issue is amplified: the models have to be trained not only for every question but also for every language the question is offered in. Moreover, the availability and time of experts to create a labeled set of programs for each question is a major bottleneck in scaling such a system. We address this issue by presenting a method to grade computer programs which requires no labeled samples for grading responses to a new, unseen question. We extend our previous work, wherein we introduced a grammar of features to learn question-specific models. In this work, we propose a method to transform those features into a set of features that maintain their structural relation with the labels across questions. Using these features, we learn one supervised model across questions, which can then be applied to an ungraded response to an unseen question. We show that our method rivals the performance of both question-specific models and the consensus among human experts, while substantially outperforming extant ways of evaluating code. The learning from this work is transferable to other grading tasks such as math question grading, and it also provides a new variation on the supervised learning approach.

An Automated Test of Motor Skills for Job Selection and Feedback

Bhanu Pratap Singh, Varun Aggarwal EDM 2016

Motor skills are required in a large number of blue-collar jobs today. However, no automated means exist to test and provide feedback on these skills. In this paper, we explore the use of touch-screen surfaces and tablet apps to measure these skills. We design novel app-based gamified tests to measure one's motor skills. We show that this information strongly predicts the job performance of skilled workers in three different occupational roles. The results presented in this work make a strong case for using such automated, touch-screen-based tests in job selection and to provide automatic feedback. This is the first attempt at using touch-screen devices to scalably and reliably measure motor skills.

AMEO 2015 - A dataset comprising AMCAT test scores, biodata details and employment outcomes of job seekers

Varun Aggarwal, Shashank Srikant and Harsh Nisar IKDD CODS 2016

The dataset contains information about a set of engineering candidates. For every candidate, it includes both profile information and employment outcome information.

On Automated Assessments: The State of the Art and Goals

Varun Aggarwal, Steven Stemler, Lav Varshney and Divyanshu Vats ASSESS at KDD 2014

This white paper is an outcome of the ASSESS workshop held at KDD 2014. The paper primarily discusses why assessments are important, what the state of the art is, and what goals we should pursue as a community. It is a brief exposition and serves as a starting point for a discussion to set the agenda for the next decade.

Spoken English Grading: Machine Learning with Crowd Intelligence

Vinay Shashidhar, Nishant Pandey, Varun Aggarwal KDD 2015

In this paper, we address the problem of grading spontaneous speech using a combination of machine learning and crowdsourcing. Traditional machine learning techniques solve the stated problem inadequately because automatic speaker-independent speech transcription is inaccurate. The features derived from it are also inaccurate, and so is the machine learning model developed for speech evaluation. We propose a framework that combines machine learning with crowdsourcing. This entails identifying human intelligence tasks in the feature derivation step and using crowdsourcing to get them completed. We post the task of speech transcription to a large community of online workers (crowd). We also get spoken English grades from the crowd. We achieve 95% transcription accuracy by combining transcriptions from multiple crowd workers. Speech and prosody features are derived by force-aligning the speech samples on these highly accurate transcriptions. Additionally, we derive surface and semantic level features directly from the transcription. We demonstrate the efficacy of our approach by predicting expert grades for speech samples of 566 adult non-native speakers across two countries: India and the Philippines. Using regression modeling, we are able to achieve a Pearson correlation with expert grades of 0.79 on the Philippines set and 0.74 on the Indian set, an accuracy much higher than any previously reported machine learning approach. Our approach has an accuracy that rivals that of expert agreement. We show the value of the system through a case study in a real-world industrial deployment. This work is timely given the huge requirement of spoken English training and assessment.
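Agreement with expert grades is reported above as a Pearson correlation. As a minimal sketch of that metric only (not the authors' pipeline), the coefficient between model-predicted and expert grades can be computed in plain Python as below; the grade values are hypothetical placeholders, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized;
    # the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy grades for five speakers (illustrative only).
predicted = [2.5, 3.0, 3.8, 4.1, 4.6]
expert = [2.0, 3.5, 3.5, 4.0, 5.0]
print(round(pearson(predicted, expert), 3))
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same coefficient.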

Learning Models for Personalized Educational Feedback and Job Selection

Vinay Shashidhar, Shashank Srikant, Varun Aggarwal ML-ED at ICML 2015

Every year, millions of students enter the job market in search of employment opportunities. Multiple studies show that these students do not have skills commensurate with the job requirements in industry. They have no feedback on their skill gap with respect to jobs in the market, nor on steps to improve. Also, learners have no easy way to signal their employability, i.e. their job suitability, to corporates, thus making the employment market inefficient. There is a need to objectively quantify employability for different job profiles, to give learners employability feedback, and to provide an easy way to connect meritorious students with matching job opportunities. We propose a new class of models to predict employability as a function of test scores. These models satisfy the constraints of the problem space: coordinate-wise monotonicity, simplicity and human interpretability. Learning these models requires non-convex optimization. To address this, we use particle swarm optimization, a population-based optimization method, to learn multiple trade-off models. Through a case study, we show that the modelling approach is useful for predicting employability for the software engineering role, does better than extant models in hiring accuracy, and provides new insight into what constitutes employability for the software engineering profile in the IT services industry. We believe that our modelling language and technique have the potential to drive greater meritocracy into the job markets.
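The abstract names particle swarm optimization (PSO) as the method for learning these non-convex models. The following is a generic, textbook PSO sketch in plain Python, not the authors' implementation. The parameter values, the cutoff-style model and the toy candidates are all hypothetical: the model predicts "employable" only if every test score meets a per-test cutoff, which is coordinate-wise monotone and easy to interpret, and its error rate is piecewise constant, so gradient-based training does not apply.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Minimal particle swarm optimizer over a box given as [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the box with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to personal best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical toy data: ((test score 1, test score 2), employable label).
candidates = [
    ((80, 85), 1), ((75, 90), 1), ((90, 80), 1),
    ((30, 90), 0), ((85, 20), 0), ((20, 25), 0),
]


def predict(cutoffs, scores):
    # Employable only if every score meets its cutoff, so raising any single
    # score can never flip a "yes" to a "no" (coordinate-wise monotone).
    return int(all(s >= t for s, t in zip(scores, cutoffs)))


def error_rate(cutoffs):
    wrong = sum(predict(cutoffs, s) != y for s, y in candidates)
    return wrong / len(candidates)


cutoffs, err = pso_minimize(error_rate, bounds=[(0, 100), (0, 100)])
print([round(c, 1) for c in cutoffs], err)
```

Learning several models that trade off accuracy against simplicity, as the abstract describes, would mean running such a search with different objectives; that extension is not shown here.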

Automatic Spontaneous Speech Grading: A Novel Feature Derivation Technique using the Crowd

Vinay Shashidhar, Nishant Pandey, Varun Aggarwal ACL 2015

In this paper, we address the problem of evaluating spontaneous speech using a combination of machine learning and crowdsourcing. Machine learning techniques inadequately solve the stated problem because automatic speaker-independent speech transcription is inaccurate. The features derived from it are also inaccurate, and so is the machine learning model developed for speech evaluation. To address this, we post the task of speech transcription to a large community of online workers (crowd). We also get spoken English grades from the crowd. We achieve 95% transcription accuracy by combining transcriptions from multiple crowd workers. Speech and prosody features are derived by force-aligning the speech samples on these highly accurate transcriptions. Additionally, we derive surface and semantic level features directly from the transcription. To demonstrate the efficacy of our approach, we performed experiments on expert-graded speech samples of 319 adult non-native speakers. Using these features in a regression model, we are able to achieve a Pearson correlation of 0.76 with expert grades, an accuracy much higher than any previously reported machine learning approach. Our approach has an accuracy that rivals that of expert agreement. This work is timely given the huge requirement of spoken English training and assessment.

A system to grade computer programming skills using machine learning

Shashank Srikant, Varun Aggarwal KDD 2014

The automatic evaluation of computer programs is a nascent area of research with potential for large-scale impact. Extant program assessment systems score mostly based on the number of test cases passed, providing no insight into the competency of the programmer. In this paper, we present a system to grade computer programs automatically. In addition to grading a program on its programming practices and complexity, the core of the system is a machine-learning-based algorithm which determines how close the logic of the given program is to that of a correct program. This algorithm uses a set of highly informative features, derived from abstract representations of a given program, that capture the program's functionality. These features are then used to learn a model to grade the programs, trained against evaluations done by experts. We show that the regression models provide much better grading than the ubiquitous test-case-pass based grading and rival the grading accuracy of other open-response problems such as essay grading. We also show that our novel features add significant value over and above basic keyword/expression count features. In addition, we propose a novel way of posing computer-program grading as a one-class modeling problem and report encouraging preliminary results. We show the value of the system through a case study in a real-world industrial deployment. To the best of the authors' knowledge, this is the first time a system using machine learning has been developed and used for grading programs. The work is timely with regard to the recent boom in Massive Open Online Courses (MOOCs), which promise to produce a significant amount of hand-graded digitized data.
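The features above are described as being derived from abstract representations of a program rather than from keyword counts. The paper's feature grammar is not reproduced here; as a simplified, hypothetical illustration only, Python's standard `ast` module can parse source into a tree from which structural counts are read, including a nested pattern (a conditional inside a loop) that a flat keyword count cannot express. The function names and the sample program are assumptions for illustration.

```python
import ast
from collections import Counter


def ast_node_counts(source: str) -> Counter:
    """Count AST node types in a Python program (a flat structural feature vector)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def ifs_inside_loops(source: str) -> int:
    """Count `if` statements nested inside `for`/`while` loops.

    Hypothetical example of a structural feature; an `if` inside nested
    loops is counted once per enclosing loop.
    """
    tree = ast.parse(source)
    total = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            total += sum(isinstance(child, ast.If) for child in ast.walk(node))
    return total


# Hypothetical student submission: sum of the positive elements of a list.
program = '''
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
'''

counts = ast_node_counts(program)
print(counts["For"], counts["If"], counts["AugAssign"], ifs_inside_loops(program))
```

Such counts would then serve as inputs to a supervised model trained on expert grades; the choice of features here is only a stand-in for the richer grammar the paper describes.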

Principles for using Machine Learning in the Assessment of Open Response Items: Programming Assessment as a Case Study

Varun Aggarwal, Shashank Srikant, Vinay Shashidhar NIPS 2013

Questions demanding subjective (open) responses have been considered the most desirable assessment format for gauging candidate learning. Such questions allow candidates to express themselves creatively and help evaluators understand a candidate's thought process. The evaluation of such subjective responses, however, has traditionally required human expertise and is challenging to automate. On the other hand, automated assessments provide scalability, standardization and efficiency. Given the recent shift towards online learning and its massive scale of operations, there is a need to develop systems which combine the advantages of both expert assessors and automated systems. Drawing from attempts made by both the machine learning community and educational psychologists, we provide general principles on how any subjective evaluation problem can be cast in the framework of machine learning. These principles highlight the various choices and challenges one would need to consider while devising a machine-learning-based assessment system. We go on to demonstrate, as a case study, how a system to assess computer programs has been successfully designed using the principles described.

When predicting performance, less of a bad thing is better than more of a good thing

Steven Stemler, Varun Aggarwal, Siddharth Nithyanand, Nisha Bhatt AERA 2014

The success of schools, as well as organizations more generally, depends heavily upon the characteristics of employees at every level. While strong leadership at the administrative level is crucial (Bass, 1985), it is equally important that staff possess a variety of domain-general skills (e.g., communication ability) and domain-specific knowledge. In light of the current conversation in education surrounding the topic of what it means to be career and workforce ready, this presentation will focus on the question of whether the concept of “dark traits” is useful for predicting individuals who will and will not thrive within an organization.

Examining the Structure of Opportunity and Social Mobility in India: Who Becomes an Engineer?

Based on AMCAT data
Authored by Prof. Anirudh Krishna (Duke University)
Development and Change 2014

Rising inequality alongside rapid economic growth reinforces the need to examine patterns of social mobility in India. Are children from less well-off sections also able to rise to higher-paying positions, newly created by the growing economy, or are these positions mainly accessible to established elites? Powered in particular by the software industry, no sector has grown as fast as engineering in India. Examining the social origins of students at a range of engineering colleges, including higher- and lower-ranked ones, provides a useful lens for understanding how different social segments have been able to avail themselves of the new opportunities. These results provide some grounds for optimism: women, scheduled castes, and sons and daughters of agriculturists have improved upon historical trends. However, the rural–urban divide remains deep: the more rural one is, the lower one's chances of getting into any engineering college. Multiple simultaneous handicaps (being poor and rural, or scheduled caste and rural) reduce these chances to virtually zero. Improving education quality together with better information provision and more accessible career advice are critical for making opportunity more equitable.

National Employability Report, Engineers - Annual Report, 2016

The National Employability Report - Engineers, 2016 is an analytical study of the employability quotient of 150,000 engineering students from 650+ engineering colleges across the country. In its fourth edition this year, the report has widened its scope to create a skill map, through theoretical and empirical evidence, for alternate careers for engineers, such as sales engineer and technical content developer, and reports employability for these roles. It also looks into the job aspirations of engineers.

SKILLS, Plumbers 2015

This analysis provides insight into what skills currently trained plumbers lack and the areas where we should focus our training interventions. We believe this will help training institutions develop their interventions better and will influence both curriculum and pedagogy. It will also give all stakeholders a better idea of what the industry requires. In turn, it will also help the industry design better programs. Although the data in this study is from India, it is a case study which may provide insights for programs across the world. The analysis is done with the same rigor as all earlier studies by Aspiring Minds. The instruments are scientifically designed, follow global standards and are validated by the industry. They measure competencies spanning soft skills, functional knowledge, handling real-world situations and understanding of health and safety. Most importantly, they were delivered credibly and scalably, electronically on tablets.

National Employability Report, Engineers - Annual Report, 2014

The National Employability Report - Engineers, 2014 is an analytical study of the employability quotient of Indian engineers. It has over the years become an authoritative source of employability statistics for engineers and an audit mechanism for higher education. The present report delves further into the employability situation and also focuses on the job aspirations of engineers. It also aims to understand which factors apart from merit influence employability outcomes.

National Employability Report, Graduates - Annual Report, 2013

The Aspiring Minds National Employability Report, Graduates - Annual Report, 2013 is based on objective assessment (AMCAT) data of 60,000 graduating students from various colleges across many states in India. The report is the first-ever national audit of the employability of 3-year bachelor's degree graduates. It aims to succinctly analyze the variations in employability across verticals. Some stark findings from the report highlight the lack of English communication skills and an emphasis on rote learning over application of concepts as major impediments to the employability of Indian graduates.

Computer Programming Learning Levels, Engineering Graduates - Annual Report, 2013

The Aspiring Minds' Computer Programming Learning Levels Report, 2013 is based on objective assessment (AMCAT) data of 55,000+ engineering graduates of 2013 from 250+ engineering colleges across India. The analysis is based on the performance of students in the Computer Programming module of AMCAT. A stark finding of the report is that around 30% of IT/CS engineers lack basic theoretical knowledge of programming, which is a grave concern. The report aims to succinctly analyze skill gaps and areas of improvement in the concerned vertical.

Practical Intelligence in Top B-Schools in India - An empirical quantitative study, 2013

The Aspiring Minds' Practical Intelligence in Top B-Schools in India report is based on Situational Judgment Test (SJT) data of 230 fresh graduates from 3 of the top 10 B-Schools in India. The report is an empirical study of the practical intelligence of fresh MBA graduates on mid-level managerial skills. The results show a large difference between the scores of management graduates and those of experienced industry professionals on a test of practical intelligence and situation handling. The major areas of concern are client management and work management skills.

National Employability Report, MBA Graduates - Annual Report, 2012

The National Employability Report, MBA Graduates, 2012 is based on a sample of more than 32,000 MBA students from over 220 business schools across India. This is the most authoritative study on the current state of employability of management graduates in India, covering the breadth of roles and skill sets that MBA graduates bring to the table. The report presents an in-depth comparison of the available MBA talent pool across multiple parameters such as gender, specialization areas, tiers of cities, zones, job sectors and campuses. Among the report's grave findings is that 40% of employable MBA graduates are hidden from recruiters due to market inefficiencies.

Women in Engineering: A comparative study of barriers across nations

The Aspiring Minds report titled "Women in Engineering: A comparative study of barriers across nations" is an analytical study of trends for women in engineering in India. It aims to understand the ways in which engineering represents a space for progress for women in India. It attempts to identify the barriers, in terms of environmental factors and the male-female ratio in Indian engineering colleges, in comparison with the United States.

Which engineer gets a job?

The engineering degree has emerged in India as one of the most preferred higher education qualifications for getting a well-paid job leading to a life of dignity. The large demand for the degree has resulted in engineering seats growing by more than 200% in the last 10 years. A number of large companies in the IT space hire thousands of engineers from engineering colleges every year. Besides these, companies in other engineering domains such as mechanical, electrical or electronics, and a number of small and medium-sized enterprises, hire fresh engineers in reasonably large numbers.