Senior Site Reliability Engineer
MX Technologies
Life at MX
We are driven by our moral imperative to advance mankind, and it all starts with our people, product and purpose. We always carry a deep sense of drive and passion with us. If you thrive in a challenging work environment, surrounded by incredible team members who will help you grow, MX is the right place for you.
Come build with us and be part of an award-winning company that’s helping create meaningful and lasting change in the financial industry.
Mission Alignment: Drive MX reliability via prevention engineering, AI automation, and SRE excellence
Key Responsibilities
Incident Response (Hands-On): Triage and mitigate production incidents using observability of the Golden Signals (latency, traffic, errors, saturation). Identify root causes in MX architecture (e.g., database sharding, webhook failures) and escalate just in time under unified command to keep MTTR under 1 hour.
Incident Response (Coordination): Drive a coordinated response to an active incident, ensuring that communication is clear, roles are assigned, and the incident is mitigated efficiently and safely.
Analysis & Pattern Recognition: Analyze incidents, postmortems, and architectural weak points (e.g., RMQ overflows) to identify recurring issues. Develop knowledge bases and AI-driven early warning systems for proactive prevention.
Prevention Engineering: Identify incident trends and drive cross-org projects with multiple engineering teams to prevent repeat incidents. Identify and implement mitigation mechanisms, and improve observability, alerting, and runbooks.
Participate in on-call rotations; enforce the priority matrix; lead root cause analyses (RCAs) with client-readable, legal-approved outputs.
Collaborate cross-functionally to enhance observability (e.g., black-box probes) and reduce technical debt.
Required Qualifications
7+ years in SRE/DevOps, including 1+ year applying AI/ML to reliability (e.g., predictive analytics on incident data).
Proficiency in Datadog, Kubernetes, and Python/JavaScript for AI and automation.
Experience reducing MTTR and repeat incidents in distributed systems; familiarity with SLOs and error budgets.
BS in CS or equivalent experience; availability for on-call shifts covering the Americas.
Preferred Skills
Experience with Datadog or similar observability platforms
Fintech experience with MX-like architectures
Google SRE practices: toil elimination, incident management, automation for self-healing.
Proficiency in JavaScript for frontend observability and tooling.
Experience with Golang and Ruby on Rails
At MX, we are a high-performance organization that thrives on trust and results. This role is based in Lehi, Utah, with flexibility for both in-office and remote work. We believe in empowering our team members to deliver exceptional outcomes while taking advantage of our incredible office space when it best supports their work. Our Utah office features onsite perks such as company-paid meals, massage therapists, a sports simulator, a gym, a mother’s lounge, and a meditation room, along with meaningful interactions with amazing people. We encourage team members to come together in the office to collaborate, kick off key projects, or strategize cross-functionally, fostering connection and innovation.
MX is proudly committed to recruiting and retaining a diverse and inclusive workforce. As an Equal Opportunity Employer, we never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, military or veteran status, status as an individual with a disability, or other applicable legally protected characteristics. We particularly welcome applications from veterans and military spouses. All your information will be kept confidential according to EEO guidelines. You may request reasonable accommodations by sending an email to hr@mx.com.