Advanced Data Scientist

Honeywell | India | 3 weeks ago

Data and Analytics | Research and Science

Full-time

Job Description

Advanced Data Scientist
Location: Bangalore, India

Role Overview

We are looking for an Advanced Data Scientist who can own end-to-end data science and machine learning solutions, from problem formulation to production deployment. This role requires a strong blend of machine learning expertise, data engineering, MLOps, cloud platforms, and technical leadership. You will work closely with product, engineering, and business stakeholders to design scalable data and ML systems that drive measurable business impact.

Key Responsibilities

Data Science & Machine Learning
- Translate business problems into data science and ML solutions
- Perform advanced EDA, feature engineering, and model development
- Build and optimize:
  - Classical ML models (regression, classification, tree-based models)
  - Time-series, anomaly detection, and recommendation systems
- Develop and fine-tune deep learning models using PyTorch / TensorFlow
- Design and evaluate experiments (A/B testing, statistical validation)

GenAI, NLP & LLM Solutions
- Build NLP and GenAI applications using modern LLMs
- Implement RAG pipelines, prompt engineering, and vector search
- Integrate LLMs using OpenAI / Azure OpenAI APIs
- Evaluate model quality, latency, and cost for production LLM systems

Data Engineering & Pipelines (Good to Have)
- Design and build scalable data pipelines for batch and streaming use cases
- Work with distributed processing frameworks like Apache Spark
- Orchestrate workflows using Airflow / Dagster / Prefect / Azure Data Factory / Databricks
- Handle real-time data using Kafka or cloud-native streaming services
- Ensure data reliability, quality, and performance at scale

MLOps, Deployment & Production
- Own the full ML lifecycle: experimentation → training → deployment → monitoring
- Implement model versioning, reproducibility, and CI/CD pipelines
- Deploy models using REST APIs or batch inference pipelines
- Monitor model performance, drift, and data quality in production
- Work with Docker and Kubernetes for scalable deployments

Cloud & Platform Engineering
- Build solutions on AWS / Azure / GCP (at least one in depth)
- Work with cloud data platforms like Databricks, Snowflake, BigQuery
- Optimize system performance and cloud costs
- Ensure security, access control, and compliance best practices

Architecture, Collaboration & Leadership
- Design end-to-end data and ML architectures
- Make tradeoffs between batch vs streaming, cost vs performance
- Mentor junior data scientists and review code and models
- Set data science and ML best practices across teams
- Communicate insights clearly to technical and non-technical stakeholders

Required Skills & Qualifications

Core Technical Skills
- Strong proficiency in Python and advanced SQL
- Solid foundation in statistics, probability, and linear algebra
- Hands-on experience with XGBoost, LightGBM
- Experience with PyTorch or TensorFlow

Data Engineering (Good to Have)
- Strong experience with Spark / PySpark
- Pipeline orchestration using Airflow or similar tools
- Experience with relational, NoSQL, and analytical databases
- Understanding of data lakes and warehouse architectures

MLOps & DevOps (Optional)
- Experience with MLflow, DVC, or W&B
- Model deployment using FastAPI
- Containers and orchestration: Docker, Kubernetes
- CI/CD and monitoring tools

Cloud Platforms
- Deep expertise in at least one cloud provider: AWS, Azure, or GCP
- Experience with managed ML and data services

Preferred / Nice-to-Have
- Experience with LLM frameworks (LangChain, LlamaIndex)
- Vector databases (FAISS, Pinecone, Weaviate)
- Streaming frameworks (Flink)
- Knowledge of data governance, privacy, and compliance
- Experience leading cross-functional technical initiatives

Machine Learning Algorithms & Techniques (Hands-On)

Supervised Learning
- Linear Models:
  - Linear Regression
  - Logistic Regression
  - Regularization (L1, L2, Elastic Net)
- Tree-Based Models:
  - Decision Trees
  - Random Forest
  - Gradient Boosting (XGBoost, LightGBM, CatBoost)

Clustering Techniques
- K-Means
- Hierarchical Clustering
- DBSCAN

Dimensionality Reduction
- PCA (feature reduction)
- t-SNE / UMAP (visualization & analysis)

Time Series & Forecasting (Basic–Intermediate)
- Statistical forecasting:
  - Moving averages
  - ARIMA / SARIMA (conceptual + basic use)
- ML-based forecasting using regression and tree-based models

Model Evaluation & Optimization
- Cross-validation techniques
- Hyperparameter tuning (Grid Search, Random Search)
- Bias–variance tradeoff
- Handling class imbalance
- Selection of appropriate evaluation metrics

Experience
8–12+ years
