We are seeking a talented DevOps/Site Reliability Engineer (SRE) with a strong background in DevOps practices and Linux environments, and proficiency in Bash/Shell scripting and a programming language such as Python or Golang. You will be responsible for managing and automating the deployment, monitoring, and reliability of our services. You will work closely with development teams to ensure systems are scalable, resilient, and optimized for performance.
Responsibilities
- Design, build, and maintain CI/CD pipelines to support continuous integration and deployment.
- Develop and implement tools to automate operational processes.
- Manage and monitor Linux-based systems for performance, availability, and security.
- Collaborate with cross-functional teams to optimize system architecture, ensuring high availability and reliability.
- Implement and manage infrastructure using infrastructure as code (IaC) tools.
- Conduct root cause analysis on production issues and implement corrective actions to prevent recurrence.
- Monitor application and infrastructure performance and implement improvements.
- Write and maintain scripts in Bash/Shell, Python, or Golang to automate tasks and support infrastructure operations.
- Participate in on-call rotation to support production systems.
Experience: 4+ years in DevOps, SRE, or related roles.
Technical Skills:
- Proficiency in Bash/Shell scripting.
- Solid programming skills in Python or Golang.
- Strong understanding of Linux systems.
- Familiarity with cloud platforms (AWS, GCP, Azure) and with container and orchestration tools such as Docker and Kubernetes.
- Infrastructure as Code (IaC): Experience with tools such as Terraform, Ansible, or CloudFormation.
- Monitoring & Logging: Familiarity with monitoring and logging tools such as Prometheus, Grafana, Datadog, or Splunk.
- Fluency in Mandarin is a significant advantage for collaborating with Chinese-speaking teams. The ability to translate technical documentation or discuss technical topics with stakeholders in Chinese is highly valued.