Unfortunately, this position is no longer available

Site Reliability Engineer

3-4 years | Full-time | Posted 17 hours ago
Job description

PTC is a global software company that delivers a technology platform and solutions to help companies design, manufacture, operate, and service things for a smart, connected world.

Since 1985, PTC has been enabling customers to stay one step ahead of the competition by combining our strategic vision with leading, field-proven technology. PTC technology helps companies quickly unlock the value now being created at the convergence of the physical and digital worlds through the IoT, AR, 3D Printing, Digital Twin, and Industrie 4.0. With PTC, global manufacturers and an ecosystem of partners and developers can capitalize on this promise of physical-digital convergence today and drive the future of innovation.
We are looking for a hands-on engineer, experienced with site reliability and operations, for a leading PLM SaaS solution.
Your Day-to-Day Work

Automation

SREs build automation tools to manage platform operations, so that these functions are automated rather than performed manually. Such functions include:

Continuous delivery and deployment
Toil reduction and operations automation
Monitoring
Rapid incident response
Alerts

Monitoring

SRE engineers are responsible for ensuring that the underlying infrastructure is running smoothly, and that systems and tools are working as expected.

They also monitor critical applications and services to minimize downtime and ensure their availability.

Issue resolution

The engineer works closely with developers, especially when issues arise, helping with troubleshooting and providing consultation when alerts are issued.

When a developer runs into a problem, the engineer investigates and resolves the issue. After the incident is resolved, the engineer revisits it and determines the root cause to ensure it does not happen again.

Cross-team collaboration

Based on the above, SREs work across different teams, mainly operations and development. By building reliable systems and supporting these teams, SREs free them to focus on building new features and getting those features out to customers faster.

Job requirements

3-5 years of hands-on experience in SRE skills:
Experience building automated monitoring with tools like Datadog, Dynatrace, Splunk, Sumo Logic, etc.
Experience in logging/monitoring/insights: Zabbix, Grafana, Azure Monitoring, OpenTelemetry
Experience with tools like ServiceNow, PagerDuty, Catchpoint, and Pingdom
Experience performing root cause analysis and writing runbooks – a must
Well versed in application, platform, and operational security.
Deep working knowledge of cloud-native technologies, especially on Azure, and up-to-date familiarity with IaaS and PaaS services, including appropriate backup, rollback, and HA/DR technologies.
Deep working knowledge of infrastructure: IAM solutions, AAD/LDAP, networking (DNS, firewalls, gateways, load balancers), storage, etc. – an advantage
Must be proficient in programming or scripting with Java, Go, Python, shell, or Groovy. Familiarity with DSLs like YAML is expected.
Practical experience working with and troubleshooting Oracle – an advantage
Experience building configuration management and deployment tooling with SaltStack/Ansible, Terraform, and ARM.
BSc in Computer Science - a must

* This position is open to women and men alike.