Life of a Data Scientist in an Internet Company
There are many articles about "What wonders analytics can do for the business" and about data science careers, such as "How to be a data scientist" and "How to hire data scientists". But there is little information about what data scientists actually do and what their lives are like in an internet company. I think I can cover this topic genuinely. I have worked as a data scientist for many years in industries such as banking, BPO and defence, and I currently lead a team of data scientists at InfoEdge, the parent company of popular portals like Naukri.Com, JeevanSathi.Com, 99Acres.Com and Shiksha.Com. In this article, I will cover what makes data science problems in an internet company unique in terms of data and user behaviour, why these problems excite data scientists like me, and how data scientists spend their days building scalable, real-time, personalized solutions to the industry's challenges.
- Huge Amount of Data
All three V's of data (Volume, Variety and Velocity) exist in the internet domain. An internet company like Naukri, which has more than 70% market share in the jobs category in India, generates huge amounts of data every day in all varieties: structured, semi-structured and unstructured. Every day, Naukri generates data about many millions of searches, page views, applications, registrations and more.
- User Behavior
It is always a challenge to engage internet users and bring them back, especially on mobile, where conventional ways of engaging, such as searching, are difficult. Users are interested only in products and services that are personalized to them. Nobody will engage with the site if we show everyone the same generic or segment-level products and services.
Why Data Science Problems in the Internet Domain Excite Data Scientists
For a data scientist, a large amount of data is the prerequisite to play with. Our job as data scientists is to torture the data until it confesses. Data scientists love challenges, and we get even more excited when the data is large and of varied types, such as unstructured and semi-structured.
Challenges in the internet domain provide opportunities to test and learn various advanced techniques (machine learning, deep learning, NLP, semantic technologies, etc.) and technologies such as Hadoop, NoSQL stores like MongoDB, and Neo4j. At Naukri, we leverage most of these techniques and technologies to build scalable and accurate real-time personalized recommendation engines, notification systems, and semantic search and alert systems.
A Typical Day of a Data Scientist in the Internet Domain
Here is what a typical day looks like for data scientists @ InfoEdge.
The day starts with a look at the numbers from previous experiments, followed by a discussion of how to improve them further. Then we build new features by processing large volumes of profile and behaviour data. Examples include how people move from one location to another when switching jobs, whether there is any industry or functional bias across experience groups, and how important particular skills and roles are for an individual and whether that differs from the population or segment. Next, we rebuild the model with the new set of features and test its performance in the Testbed. Once the model shows gains in the Testbed, we productionize the code and integrate it into the respective live systems (real-time recommendation engines, alerts, semantic search, etc.). To test and learn from each new feature, we always run an A/B test and evaluate its performance. Once an experiment is successful, we roll it out to all users and replicate it in other applications. The pace of experimentation is very fast: because most of the systems are online, we can build features and evaluate their performance within a single week.
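The article does not say how A/B results are evaluated. As a minimal sketch, assume the metric is a conversion rate (for example, the share of users who apply to a recommended job) and assume a standard two-proportion z-test is used. That is a common choice but not necessarily the method used at InfoEdge. Under those assumptions, comparing a control against a variant could look like this. The function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variant B against control A (hypothetical helper).

    conv_* are conversion counts (e.g. users who applied) and n_* are the
    numbers of users exposed to each arm. Returns (relative lift, z, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    lift = (p_b - p_a) / p_a
    return lift, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control vs. variant with a new recommendation feature
    lift, z, p = two_proportion_z_test(1200, 40000, 1320, 40000)
    print(f"lift={lift:.1%} z={z:.2f} p={p:.4f}")
```

With these made-up counts (1,200 applies out of 40,000 control users versus 1,320 out of 40,000 variant users), the variant shows a 10% relative lift, with z of roughly 2.43 and a two-sided p-value of roughly 0.015. In practice, the sample size should be fixed before the test starts, because repeatedly checking the p-value and stopping early inflates the false-positive rate.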
After an experiment succeeds, it's time to have fun. Team outings (bowling, movies, lunch, gupshup (chit-chat) and pulling each other's legs) are always fun. Having samosas and jalebis at a nearby sweet shop is a favourite of ours.
We also hold frequent technical discussions about new tools and emerging techniques to stay up to date with the industry, and we often run POCs and pilots of emerging techniques.
We at InfoEdge are proud not just of contributing to the growth of the business but also of making life better for so many users: helping them get desirable jobs (Naukri.Com), find the right life partners (JeevanSathi.Com), shortlist the right properties (99Acres.Com) and make the right education choices (Shiksha.Com).
Author: Manish Gupta
Dr. Manish Gupta is an advanced analytics professional with more than 14 years of experience building and leading data science, analytics and BI teams, developing competencies in customer analytics, real-time recommendation systems, big data analytics, clickstream data analytics, web analytics, text analytics, financial analytics, fraud analytics, voice analytics and target marketing across industries such as internet/e-commerce, banking, BPO and defence. He holds a Ph.D. from the Department of Mathematics, IIT Delhi, in the area of data mining and machine learning, and has over 15 research/technical publications in leading international journals and conferences and 1 US patent. He is currently Senior Vice President-Analytics at InfoEdge, the parent company of popular portals like Naukri.Com, JeevanSathi.Com, 99Acres.Com and Shiksha.Com, where he leads a team of data scientists building innovative and disruptive solutions for business insight from huge amounts of structured and unstructured data using cutting-edge machine learning, data mining and text mining algorithms. He previously worked as Assistant Vice President at Citigroup, Principal Analytics Consultant (Head R&D) at Innovation Labs@24/7 Inc., and Scientist at DRDO. He has received several awards, including the DRDO Scientist of the Year Award, the DRDO Technology Award and the Citi Super Star Award.