Site Reliability Engineer, Trello
Austin, Texas, United States • March 18, 2023
Working at Atlassian
Atlassian can hire people in any country where we have a legal entity. Assuming you have eligible working rights and a sufficient time zone overlap with your team, you can choose to work remotely or from an office (unless it’s necessary for your role to be performed in the office). Interviews and onboarding are conducted virtually, as part of being a distributed-first company.
We are looking for an experienced Site Reliability Engineer to join our Storage Layer SRE (SLS) team. The successful candidate will have a strong background in MongoDB and experience using Golang and Python for automation. This individual will also be responsible for managing the cluster on the AWS cloud and will have a strong understanding of Linux operating systems.
The SLS team is focused on ensuring the highest levels of availability, reliability and performance for our MongoDB clusters using Ops Manager. We use the automation features of Ops Manager to automate routine tasks, such as provisioning, scaling, and backups, which enables us to quickly and easily manage the entire lifecycle of our MongoDB deployments. Additionally, we closely monitor the performance of our clusters and proactively troubleshoot and resolve any issues that may arise. Our team is dedicated to continuously improving our processes and best practices in order to provide the best possible experience for our customers and end-users.
Responsibilities:
- Ensure high availability and reliability of the MongoDB cluster
- Hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS, S3, DynamoDB, SQS, Kinesis - or equivalents, e.g. in GCP / Azure)
- Develop and maintain automation tools in Golang/Python to manage the cluster and perform routine tasks
- Monitor and troubleshoot issues related to the MongoDB cluster and take appropriate action to resolve them
- Collaborate with the team to implement best practices for cluster management and performance optimization
- Continuously improve the monitoring and alerting systems for the MongoDB cluster
- Ensure compliance with security standards and best practices for the MongoDB cluster
- Build and maintain infrastructure as code
- Utilize the automation features of Ops Manager to automate routine tasks such as provisioning, scaling, and backups
- Monitor the performance of the clusters and proactively troubleshoot and resolve any issues
- Continuously improve processes and best practices to provide the best possible experience for customers and end-users.
- Incident response, incident management, and experience in an on-call rotation.
Requirements:
- Strong experience with MongoDB and cluster management
- Proficiency in Golang or Python and experience using it for automation
- Experience with the AWS cloud and system administration knowledge
- Strong understanding of Linux operating systems, including Ubuntu
- Experience with monitoring and alerting systems
- Strong problem-solving and analytical skills
- Experience with infrastructure as code
- Excellent written and verbal communication skills
- Experience working in an Agile environment is a plus.