IT Engineer Reliability
Company: Retail Business Services
Location: Mauldin
Posted on: May 27, 2023
Job Description:
Address: USA-SC-Mauldin-211 BiLo Boulevard
Store Code: Greenville Data Center - IT (5118626)
Retail Business Services, ranked No. 25 on Fast Company's 2022 100
Best Workplaces for Innovators, is the services company of leading
grocery retail group Ahold Delhaize USA, which includes Food Lion,
Giant Food, The GIANT Company, Hannaford and Stop & Shop.
Primary Purpose:
The Platform Reliability Engineer will help ensure service
availability, identify and automate manual processes, and bridge
the gaps between product development teams and operations.
Operational improvements in availability, latency, performance,
efficiency, change management, monitoring, incident response,
patch management and capacity planning are all within scope for
this role. Whether it's done through code, the introduction of
modern tools, and/or better processes, continuous improvement and
efficiency is the goal.
You'll provide operational excellence with troubleshooting skills
and ownership in supporting various Azure services.
Duties and Responsibilities:
--- Builds, manages, and operates Azure Core Services with
automation and infrastructure as code
--- Manages and operates the continuous delivery framework and
tools, manages and automates the lifecycle of the different
platform components, and helps support product teams
--- Leverage cloud architecture, applying site reliability
principles and full-stack troubleshooting skills across network,
application, security, identity, OS, containers, on-prem, and
distributed services layers.
--- Lead and set the strategy and roadmap for cloud reliability and
recommend best practices for Operations
--- Mentor the team members to follow the frameworks and guide them
to accelerate delivery of projects
--- Provide reasoning about system & application architecture as
well as be comfortable looking at code and offering feedback on how
it can be improved to increase reliability.
--- Identify opportunities and drive the implementation of
automation to improve patch management, service health,
manageability, reliability, and telemetry.
--- Own, triage, investigate and resolve service issues with an
emphasis on broad communications, learning & teaching throughout
the process
--- Design process or technology solutions that monitor, identify,
and resolve platform, system, deployment, and environmental issues
both before and after production releases, and ensure measurable
improvements against Service KPIs.
--- Drive Security and compliance aspects for services in
accordance with Azure compliance requirements.
--- Engage in service capacity planning, demand forecasting and
work towards Azure cost optimizations.
--- Create and document runbooks, operational procedures, and
standards on Confluence
--- Communicate on a deeply technical level with product
engineering, project management and product teams to improve and
optimize products, improve infrastructure, and evolve services.
--- Work within project management/agile scrum teams in a support
role as part of a wider team
--- Remain current on new technologies, methods and procedures
including, but not limited to, coding practices such as Test Driven
Development, Continuous Integration, Continuous Deployment and
Operational Excellence.
Qualifications:
--- Bachelor's Degree in Computer Science, Information Technology,
Engineering, or related field
--- 8+ years of IT experience focused on infrastructure which
includes server, storage, network, security, Identity
--- 4+ years of experience supporting, maintaining, and automating
Azure environments
--- 3+ years of experience using IaC tools (ARM, Terraform,
JSON, YAML, PowerShell, GitHub, etc.)
--- Production experience in Cloud technologies - Azure IaaS, PaaS,
networking, Azure Functions, Azure Automation and runbooks,
Workbooks, Insights, Security Center, Azure Monitor, Log
Analytics.
--- Ability to read, write, configure, design, and script
end-to-end service telemetry, alerting and self-healing
capabilities for platform services, lead the execution and ongoing
management of services
--- Ability to work in an Extreme Programming environment and in a
pair programming/operating model, able to lead the team and
help remove roadblocks
--- Able to facilitate diverse teams, multi-task, and work under
pressure to meet aggressive schedule targets
--- Hands-on experience with IaC tools like ADO, ARM, Terraform,
Ansible, PowerShell, Python, Azure CLI, GitHub
--- Experience in service capacity planning, demand forecasting,
software performance analysis and system tuning
--- Technical and operational expertise in
Windows/Linux/VMware/Hyper-V/AKS, SQL and NoSQL DBs, IaaS, PaaS,
FaaS, Data, BCDR, Security, Management, Storage, Networking,
Monitoring, Identity and Connectivity
--- Experience managing and maintaining code repos, build systems,
and CICD pipelines
--- Experience in infrastructure and configuration as code, as well
as service auto-scale capabilities.
--- Experience working in DevOps and Agile environments, with a
blend of both Development and SRE mindsets
--- Systematic problem-solving and troubleshooting skills coupled
with a strong sense of ownership and drive.
--- Participate in on call rotation. Participate, collaborate, and
provide guidance in retrospectives.
--- At least 4 years of hands-on operational experience supporting
the following or related experience:
o Azure Virtual Network, Virtual WAN, ExpressRoute, Load Balancer
(L4/L7), Traffic Manager, CDN, Azure DNS, routing & routing
protocols like BGP, firewall concepts
o Azure Identity including any of the following: Azure AD, PIM,
Conditional Access, MFA, Azure AD Connect, passwordless sign-ins,
Microsoft Defender, Key Vault
o Azure Governance, Security, Monitoring, Workbooks, Compliance,
and cost awareness
o Azure Virtual Machines, Containers and/or Kubernetes and/or
OpenShift (infrastructure perspective)
o Azure Storage Account, Disk, Snapshot, Backup, Site Recovery,
File Sync, Data Lake
Preferred Qualifications:
--- Certifications: Azure Administrator (required), Azure DevOps
(preferred), Azure Solutions Architect (preferred)
LI-RV1 #DICEJobs #LI-Hybrid
Retail Business Services currently provides services to five
omnichannel grocery brands, including Food Lion, Giant Food, The
GIANT Company, Hannaford and Stop & Shop. Retail Business Services
leverages the scale of the local brands to drive synergies and
provide industry-leading expertise, insights and analytics to local
brands to support their strategies. We are committed to diversity,
equity and inclusion and we foster a community of belonging where
everyone is valued.
Retail Business Services is an equal opportunity employer. We
comply with all applicable federal, state and local laws. Qualified
applicants are considered without regard to sex, race, color,
ancestry, national origin, citizenship status, religion, age,
marital status (including civil unions), military service, veteran
status, pregnancy (including childbirth and related medical
conditions), genetic information, sexual orientation, gender
identity, legally recognized disability, domestic violence victim
status or any other characteristic protected by law. We provide
reasonable accommodations to applicants and employees with
disabilities. As important as what we do is how we do it. Our team
embodies our values of Courage, Care, Teamwork, Integrity and Humor
in everything that they do. We have a culture of care that values
and celebrates the qualities and perspectives that make us all
unique.
If you have a disability and require assistance in the application
process, please contact our Talent Acquisition Department at
tad@retailbusinessservices.com.
For more information, visit https://www.retailbusinessservices.com.
Job Requisition: 306572_external_USA-SC-Mauldin_4182023