📍
Sunnyvale, CA

Software Engineer II - Site Reliability Operations Engineer

2 years experience
Retail & Consumer
Software engineering
Posted:
January 8, 2026

Walmart

Multinational retail corporation
81.3
Palpable Score

What you'll do...

As a Site Reliability Operations Engineer within the Global Technology Platforms (GTP) Command and Control Center (CCC) Team, you will work with other CCC, SRE, DevOps, and Engineering practitioners to proactively maintain the mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that ensure the highest levels of availability and reliability for Walmart’s technology stack.

You're right for the job if you are comfortable with monitoring, detection, and major incident response alongside a technical team of engineers laser-focused on restoring service across complex distributed systems. To achieve this, you will draw upon your knowledge of the tech stack and tools to surface key data. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work directly with our Application, DevOps, and other cross-functional engineering teams to support our next-generation “always up” cloud-based technology platforms.
You will use your software engineering skills to understand the technology stack and apply this knowledge to ensure systems continue to meet production-ready standards. Operational excellence is key! Good judgement is crucial, as you will own detection, prioritization, critical engagement, and communication throughout the incident process until the issue is remediated. Your ability to continuously challenge yourself and develop a strong network with peers and stakeholders cross-functionally will see you excel in this role. Our goal is to protect the customer, merchant, and associate experience and deliver outstanding levels of availability across Walmart Global Technology.


About the Role


· Omnichannel eCommerce production support
o Acquire in-depth technical knowledge of omnichannel cloud platforms, web traffic flows, micro-services, and service dependencies for major incident resolution.
· Unix/Linux administration
o Provide support for Unix and Linux systems from Kernel to Shell and beyond, taking into consideration system libraries, file systems, and client-server protocols.
· Networking knowledge and troubleshooting
o Leverage knowledge of network technologies such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, CDN, OSI layers, firewalls, gateways, proxies, and load balancers.
· Cloud understanding and triaging
o Provide L1 and L2 production support for multiple cloud technologies such as OpenStack, Cloud Native platform, Microsoft Azure, and Google Cloud Platform, triaging critical issues using various internal and vendor-related tools.
· Alert, Monitoring, Log analysis
o Detect and analyze monitoring graphs and alerts to identify systems causing production impacts with various tools like Grafana, Prometheus, MMS, Kibana, Graphite, ServiceNow, JIRA, Dynatrace, New Relic, Omniture, Splunk, and CDN logs [Reduce MTTD – Mean Time to Detect].
· Incident triage, Escalation and Resolution
o Triage site-impacting production issues by quantifying impact, severity, and urgency, analyzing systems for quick remediation, engaging the right teams for recovery [Reduce MTTE – Mean Time to Engage], and focusing on immediate restoration [Reduce MTTR – Mean Time to Restore] of large-scale enterprise systems.
· Enhance Monitoring solutions
o Develop enterprise monitoring and utilize tooling software solutions such as Grafana, Kibana, Splunk, Graphite, and New Relic to improve visibility, proactively detect issues, and restore system availability.
· Enhance Alerting solutions
o Design and implement JavaScript for the integration of alerting tools with service API endpoints, using tools such as ServiceNow, Spotlight, and xMatters.
· Develop Tools and support
o Design and develop solutions for widespread internal communications, for cloud application support, or for workflows addressing infrastructure availability issues across various internal applications, using programming languages such as Java, JavaScript (React, Node.js), Python, and Shell, along with technologies such as Prometheus and database query languages.
· Automation and Self-healing
o Demonstrate knowledge of scripting and software development for automation and self-healing of multi-cloud environments. Help enhance existing solutions by developing automation with Docker and Kubernetes and by working with DevOps and Engineering partners.


Required Skills:


· 2+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
· Bachelor's Degree in Computer Science or a related field, or relevant work experience.
· Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
· Experience and exposure working in a 24/7 operations support environment.
· Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
· Experience investigating, analyzing and troubleshooting large scale enterprise systems.
· Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
· Experience administering Unix/Linux in a production environment.
· Experience working with and developing enterprise monitoring/tooling/logging solutions like Grafana, Kibana, Splunk, OpenObserve, Graphite, Nagios, New Relic, Dynatrace, and Prometheus.
· Working knowledge of one or more cloud technologies such as Azure, GCP, or OpenStack.
· Experience with distributed version control like Git or similar.
· Experience designing and implementing JavaScript for the integration of alerting tools with service API endpoints, using tools such as ServiceNow, Spotlight, Splunk, and xMatters.
· Programming experience in one or more of the following languages: Go, Java, Python, Shell, etc.
· Experience in data science/machine learning would be advantageous.

About the company

Walmart

Company overview
Walmart is a global retailer that sells groceries, household essentials, and general merchandise through supercenters, neighborhood formats, and online channels. The company also runs Sam’s Club, a membership warehouse business, and operates eCommerce marketplaces and delivery services in multiple countries. Walmart supports large-scale logistics and fulfillment networks that move products from suppliers to stores and customers’ homes. Walmart Global Tech builds and runs the technology platforms behind shopping, supply chain, data, and internal tools used across the business.

Locations and presence

Walmart is headquartered in Bentonville, Arkansas, with major early-career office hubs highlighted for students including Bentonville and Hoboken, plus Silicon Valley locations such as Sunnyvale and San Bruno. Many student and intern roles are set up as in-person office internships, and some corporate and tech teams are organised around hub-based working rather than fully remote setups.

Palpable Score

81.3
/ 100
Walmart is one of the most accessible entry points in the market because the company hires at scale across stores, supply chain, corporate, and tech, with multiple student pipelines feeding into full-time roles. The biggest scoring drag is transparency: there is some helpful candidate guidance and well-described intern programming, but less consistent end-to-end clarity on stages and timelines across all early-career hiring.