We’re PetDesk and we work in the best industry in the world. We build technology for the care providers of the best beings in the world - our pets. We help keep them healthy and happy, and give them more time to play with their owners - us!
We treat our local business customers like family. We stand up for them in the face of corporate giants. We win their loyalty by being amazing service providers - not with contracts. We are helping to preserve the high quality of care you get from your neighborhood vet, groomer, and boarder.
To do this, we have to be different. We have to set high expectations, we have to work harder AND smarter, and we have to leave work feeling content every night because we killed it that day. Being content keeps you sane during the bad times and aggressive during the good times. We will always strive to do better and grow the company, but in that pursuit we will find contentment every day.
About the Role
The purpose of the Site Reliability Engineer is to 1) build, expand, and be responsible for the PetDesk cloud infrastructure, and 2) improve the platform’s availability, scalability, performance, DevOps, and security.
PetDesk is made possible by reliable and smoothly running back-end services and applications. We take pride in delivering quality, highly available software solutions to our customers, and that depends on an infrastructure that not only performs well now but also scales as we grow. Our back end is built on Microsoft technologies (MS SQL Server, IIS, C#, and the .NET Framework), all hosted on Amazon Web Services.
You will be responsible for the PetDesk cloud infrastructure and be accountable for the following core principles.
Availability - Ensure high availability of the PetDesk product platform 24/7.
Willing to “carry the pager” but also always striving to build such a kick-butt system that you never need the pager.
Scalability / Elasticity - Ensure system scalability to handle continued high customer growth rates.
Design, implement, support, and scale server architecture on AWS to provide fast and reliable services to a rapidly growing customer base.
You will lead the infrastructure roadmap detailing where we are today and where we want to be.
Performance - Monitor and tune the system to continually improve the performance of PetDesk products.
Write, analyze, optimize, and troubleshoot simple and complex SQL queries.
DevOps - Manage and improve on the processes to successfully and reliably deliver applications to the PetDesk cloud as well as internal systems required by the development team.
Generate and maintain documentation of data store architectures, configurations, operations, and maintenance.
Strong interpersonal skills and the ability to work in dynamic environments with highly skilled and motivated technical team members across the organization and in other geographies, and to "roll up your sleeves" to accomplish all necessary tasks.
Security - Develop and maintain best practices to protect PetDesk systems and data.
You will maintain up-to-date knowledge of new technologies and services that will help PetDesk maintain its technical edge.
Deep desire to design, build and scale the back end architecture of a rapidly growing startup.
5+ years of designing/implementing/administering large and complex SQL Server or MySQL databases.
3+ years of experience designing, deploying, scaling, and maintaining critical environments in the Amazon Web Services (AWS) cloud for a customer-facing software platform of reasonable scale.
Experience with many AWS and internet-scale services, and an understanding of how each could be used to help scale a system.
Practical experience with Application Performance Management tools and their use in a production environment.
Monitoring system performance and seeing it handle anything you throw at it makes you smile.
Proven problem-solving skills, with a demonstrated track record of identifying bottlenecks and performance issues and resolving them.
Demonstrated availability metrics on previous systems.
Able to design secure distributed systems and ensure sound operational processes for systems that are publicly accessible in the cloud.
Comfortable administering Database Replication/Mirroring, Disaster Recovery, Backups, and Restores.
Ability to triage issues, react well to changes, work with teams, and self-manage time and effort across multiple projects.
We want people who are smarter than us, building a system that is far better than anything we can do ourselves. You are the kind of person who looks for things to continually improve because you are obsessed with performance.
Proficient in at least one object-oriented programming language.
Nice to haves
Experience in C#
.NET data access technologies such as Entity Framework
Exposure to “NoSQL” data stores, data warehouse, and web service data exchange
Understanding of DevOps and tools to automate deployments
Database caching with Memcached, Amazon ElastiCache, or similar