Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Data Scientist-II

Role Overview
We are seeking a Senior Data Scientist with expertise in both data science and data engineering, specifically in building data pipelines and implementing AI solutions. This individual will play a key role in identifying customer challenges, translating them into AI-driven business opportunities, and deploying advanced machine learning solutions. The ideal candidate will have hands-on experience with AI/ML models, complex data handling, and the ability to integrate solutions into business processes.
Key Responsibilities
• Data Science and Advanced Analytics
○ Identify and understand customer challenges, working cross-functionally to translate them
into business opportunities focused on AI solutions.
○ Develop, implement, and deploy AI solutions, including GenAI models, to solve complex
business problems and enhance operational efficiency.
○ Perform exploratory data analysis to understand data structure and characteristics, using
advanced statistical methods and machine learning algorithms.
○ Analyze and interpret model results to generate actionable insights and recommendations
for stakeholders.
○ Conduct market and trend scouting to stay current with the latest tools, techniques, and
trends in AI and data science.
• Data Engineering and Pipeline Development (preferred; a strong plus)
○ Architect, build, and maintain scalable data pipelines and ETL processes, ensuring high-
quality data for analytical and machine learning models.
○ Work with large, complex datasets, applying advanced analytical methods and leveraging
ETL tools, data warehouses, and big data technologies (e.g., Hadoop, Spark).
○ Collaborate with engineering teams to define data requirements, build robust data models,
and optimize pipeline performance.
• Collaboration and Integration
○ Work closely with cross-functional teams to integrate AI solutions into business processes
and workflows, driving measurable impact.
○ Interact with stakeholders at various levels, effectively presenting findings through
simplified visual displays of complex quantitative information.
○ Act as an advocate for AI and data-driven solutions, fostering a culture of continuous
learning and development.
• Deployment and Operationalization
○ Use CI/CD tools such as Azure DevOps (ADO) or Jenkins to streamline model deployment and
monitor model performance in production.
○ Perform statistical evaluation, tuning, and validation of models to optimize neural
networks and machine learning workflows.
Basic Qualifications
• Educational Background: Bachelor’s/Master’s in Computer Science, Data Science, Statistics,
Engineering, or a related field. (B.Tech in Computer Science highly preferred)
• Experience:
○ 5+ years in data science or data engineering projects, with a strong foundation in
statistics, machine learning algorithms, and end-to-end ML component development.
○ Proven expertise in working on engagements from use-case conceptualization to design,
build, train, and deploy ML models.
○ 6+ years working with Big Data technologies (e.g., Hadoop, Spark) and cloud platforms
(AWS, Azure, Google Cloud).
• Programming and Development:
○ Advanced proficiency in Python and R for data science and machine learning
applications.
○ Experience with data science libraries such as NumPy, pandas, scikit-learn, PyTorch,
TensorFlow, Keras, and spaCy.
○ Familiarity with additional languages like SQL, Scala, and Java for data pipeline
integration.
• Data Engineering and Pipeline Tools:
○ Strong experience with ETL tools (e.g., Apache NiFi, Talend, Informatica) and orchestration
frameworks (e.g., Apache Airflow, Luigi).
○ Hands-on experience with big data processing frameworks such as Apache Spark and Hadoop
for batch processing, and Kafka for data streaming.
○ Proficiency with data warehousing and storage solutions (e.g., Snowflake, BigQuery, Amazon
Redshift, Azure Synapse).
• Machine Learning and Deep Learning:
○ Solid understanding of statistical and machine learning algorithms (e.g., regression,
classification, clustering, decision trees, neural networks).
○ Proficient in deep learning frameworks such as TensorFlow, PyTorch, and Keras, and with
specialized NLP libraries and models (e.g., Hugging Face Transformers, BERT, GPT).
○ Familiarity with model deployment frameworks, including TensorFlow Serving, ONNX, and
MLflow for model tracking and versioning.
• Data Visualization and Reporting:
○ Expertise with data visualization tools like Tableau, Power BI, Plotly, and Seaborn for creating
dashboards and visual insights.
○ Proficiency in Matplotlib, ggplot2 (R), or other visualization libraries for in-depth analysis and
presentation.
• Data Storage and Management:
○ Strong understanding of SQL, NoSQL (e.g., MongoDB, Cassandra), and data lake solutions
(e.g., AWS S3, Azure Data Lake, Google Cloud Storage).
○ Experience in database optimization, indexing, and query tuning for RDBMS such as
PostgreSQL, MySQL, and Oracle.
• Cloud Platforms and Infrastructure:
○ Proficient in cloud services, including AWS (SageMaker, Redshift, Lambda), GCP (BigQuery,
Dataflow, AutoML), and Azure (Databricks, Synapse, ML).
○ Experience with containerization and orchestration tools (e.g., Docker, Kubernetes) for
scalable deployment.
○ Familiarity with multi-cloud AI deployments.
• CI/CD and DevOps:
○ Working knowledge of CI/CD pipelines with tools such as Jenkins, GitLab CI, or Azure DevOps
(ADO) for model deployment and data pipeline automation.
○ Proficient in version control with Git and experience with Docker for containerization and
reproducibility of models.
• Statistical and Advanced Data Analysis:
○ Expertise in statistical analysis, A/B testing, and hypothesis testing.
○ Knowledgeable in time series analysis, forecasting, and natural language processing (NLP)
techniques.
○ Familiarity with open-source packages for specialized analytics, such as Prophet for time
series, XGBoost, and LightGBM for gradient boosting.
○ Experience with deep learning models, such as those built in Keras, and expertise with
Jupyter notebooks.
○ Big Data certifications (e.g., Cloudera, Hortonworks, AWS, GCP, Azure) are highly preferred.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. It is therefore expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
Abide by Mastercard’s security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach; and
Complete all periodic mandatory security training in accordance with Mastercard’s guidelines.