Job Description
The Sr. Advanced Software Engineer must have a strong sense of responsibility for a "production first" approach and a rapid-response mindset, supporting 24x7 operations and incident management both on-site and on call.
The ideal candidate will have a strong background in DevOps and SRE practices, with at least 5 years of hands-on experience in designing, implementing, and maintaining scalable, reliable, and secure infrastructure for cloud-native applications.
YOU MUST HAVE:
Bachelor’s degree
Oral communication skills in English
REQUIRED:
Operations and maintenance or development experience in cloud computing or a related field
Knowledge of China regulatory and compliance requirements (MLPS, XinChuang, etc.)
Ensure the continuous and stable operation of production systems, meeting Service Level Agreements (SLAs) and Service Level Objectives (SLOs).
Respond quickly to and resolve production faults, driving root-cause remediation to prevent recurrence.
Automated Operations and Maintenance: Replace repetitive manual operations with code (e.g., Python and Shell) to automate processes such as deployment, monitoring, and scaling.
Monitoring and Alerting System Construction: Design and maintain a monitoring system that spans application, infrastructure, and business metrics.
Capacity Planning and Performance Optimization: Predict system load, optimize resource utilization, and improve system scalability and response efficiency.
Develop and maintain observability solutions (monitoring, logging, alerting) using tools like Prometheus, Grafana, ELK, Datadog, etc.
Collaborate with development teams to ensure best practices in application reliability, scalability, and security.
Automate operational tasks and improve system efficiency through scripting and tooling.
Mentor and guide junior engineers in SRE and DevOps practices.
Ensure compliance with security standards and participate in audits.
