Apply for the Site Reliability Engineer position at Ford Motor Company.

Job Description
<div> We are the movers of the world and the makers of the future. We get up every day, roll up our sleeves and build a better world -- together. At Ford, we’re all a part of something bigger than ourselves. Are you ready to change the way the world moves? <br/> <br/> <strong> Enterprise Technology </strong> plays a critical part in shaping the future of mobility. If you’re looking for the chance to leverage advanced technology to redefine the transportation landscape, enhance the customer experience, and improve people’s lives, this is the opportunity for you. Join us and challenge your IT expertise and analytical skills to help create vehicles that are as smart as you are. <br/> <br/> The Monitoring as a Service (MaaS) team is building and evolving its services with customers in mind. MaaS will enable teams to modernize and disrupt by providing robust AI-powered monitoring tools and easy-to-use dashboards. Monitoring gives end-to-end visibility into application performance regardless of hosting location (on-prem or cloud), which gives us a better view of how to proactively manage our apps and improve their performance. <br/> <br/> <strong> SOUTHEAST MI RESIDENTS: </strong> Please note: this job is posted as remote, but if the selected candidate lives within 50 miles of Dearborn, MI, it may require a hybrid onsite schedule of up to 60% of the time. <br/> <br/> <strong> In this position... <br/> <br/> </strong> We are seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform. 
As an SRE, you will combine software engineering and systems engineering disciplines to ensure that software systems are available, scalable, and maintainable. This individual will play a pivotal role in addressing the evolving needs of our customers, including the development of Service Level Indicators and Objectives (SLIs/SLOs), best practices with associated templates, and automation that removes toil and facilitates adoption. <br/> <br/> <strong> Responsibilities <br/> <br/> </strong> <strong> What you'll do... <br/> <br/> </strong> <ul> <li> Apply a strong background in software development and systems administration, along with excellent problem-solving, troubleshooting, and communication skills. </li> <li> Leverage experience to safely perform destructive testing that uncovers vulnerabilities. </li> <li> Architect, design, and develop automation to improve the resilience, recoverability, availability, and scalability of supported applications. </li> <li> Recognize, validate, and evangelize emerging technologies and architectures that align with business objectives. </li> <li> Develop tooling to improve reliability, quality, and time-to-market for software solutions. </li> <li> Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve. </li> <li> Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation. </li> <li> Collaborate with development teams to design, build, and operate scalable and resilient software systems using cloud-native principles. </li> <li> Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans. 
</li> <li> Regularly review key technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, capacity, and resource utilization. </li> <li> Establish error budgets by identifying the right SLIs and SLOs, and drive their effective use to maximize availability and uptime. </li> <li> Conduct performance analysis and optimization of new and in-production systems. </li> <li> Provide technical guidance and mentorship to other team members. </li> <li> Participate in incident response, support, recovery, and postmortem analysis. <br/> <br/> <br/> </li> </ul> <strong> Qualifications <br/> <br/> </strong> <strong> You'll have... <br/> <br/> </strong> <ul> <li> Bachelor’s degree in Computer Science, Computer Engineering, Systems Engineering, or a related field, or an equivalent combination of education and work experience </li> <li> 5+ years of programming experience with one or more of these languages: Python, Go, Java/Scala, C, or C++ </li> <li> 5+ years of experience solving complex architecture, design, and business problems (simplifying, optimizing, removing bottlenecks, etc.) </li> <li> 3+ years of experience as a Site Reliability Engineer with APM or other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, and DataDog </li> <li> 3+ years of experience developing multi-tier applications with J2EE, SQL/NoSQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/K8s </li> <li> 3+ years of experience with cloud platforms (Google Cloud preferred) </li> <li> 3+ years of experience with automated test-driven development in CI/CD pipelines </li> <li> 3+ years of experience with RESTful APIs and microservices platforms <br/> <br/> <br/> </li> </ul> <strong> Even better, you may have... 
<br/> <br/> </strong> <ul> <li> Master’s degree in Computer Science, Computer Engineering, Systems Engineering, or a related field </li> <li> Thorough understanding of software development and agile practices </li> <li> Understanding of, and ability to implement, effective observability strategies that improve MTTD/MTTR </li> <li> Working knowledge of the TCP/IP stack, internet routing, and load balancing <br/> <br/> <br/> </li> </ul> You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply! <br/> <br/> As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder… or all of the above? No matter what you choose, we offer a work life that works for you, including: <br/> <br/> <ul> <li> Immediate medical, dental, and prescription drug coverage </li> <li> Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up childcare, and more </li> <li> Vehicle discount program for employees and family members, and management leases </li> <li> Tuition assistance </li> <li> Established and active employee resource groups </li> <li> Paid time off for individual and team community service </li> <li> A generous schedule of paid holidays, including the week between Christmas and New Year’s Day </li> <li> Paid time off and the option to purchase additional vacation time. <br/> <br/> <br/> </li> </ul> For a detailed look at our benefits, see: <br/> <br/> https://corporate.ford.com/content/dam/corporate/us/en-us/documents/careers/2024-benefits-and-comp-GSR-sal-plan-2.pdf <br/> <br/> This position spans salary grades <strong> 6-8 </strong>.
<br/> <br/> Visa sponsorship is not available for this position. <br/> <br/> Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire. <br/> <br/> We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call 1-888-336-0660. <br/> <br/> </div>
AI-Powered Job Insights
Are you ready to dive into the future of mobility? Ford Motor Company is seeking a Site Reliability Engineer to join its Monitoring as a Service (MaaS) team! This role combines software and systems engineering to enhance Ford's global monitoring and observability platform, driving innovation and reliability.
📍 Location: Remote (candidates living within 50 miles of Dearborn, MI may be required to work a hybrid onsite schedule up to 60% of the time)
💼 Position: Site Reliability Engineer
⏰ Type: Full-time
📅 Date Posted: July 17, 2024
Role Summary:
- Develop and enhance AI-powered monitoring tools that give end-to-end visibility into application performance.
- Improve the resilience, recoverability, and scalability of applications through automation.
- Collaborate with development teams to build and operate scalable software systems.
What You'll Do:
- Architect and design automation that improves application resilience and performance.
- Identify stability risks and work with engineering leadership on mitigation plans.
- Conduct performance analysis and optimization of new and in-production systems.
- Measure and optimize system performance, and define Service Level Indicators (SLIs) and Objectives (SLOs) that drive error budgets.
- Mentor team members and participate in incident response and postmortems.
What’s Needed:
- Bachelor's degree in a relevant field, or equivalent experience.
- 5+ years of programming experience in languages such as Python, Go, or Java.
- 3+ years of SRE experience with APM or monitoring tools such as Dynatrace, New Relic, or similar.
- 3+ years of experience with cloud platforms (Google Cloud preferred), CI/CD pipelines, and microservices.
If you have a passion for innovative technology and a drive to improve operational reliability, don't miss this opportunity at Ford! Ford values diversity and encourages all qualified applicants to apply.
Top Interview Questions
A: My approach to implementing SLIs and SLOs starts with understanding the business needs and user expectations. I collaborate with stakeholders to identify key metrics that define success, such as uptime and response times. Then, I establish SLIs that align with these metrics, ensuring they are measurable and meaningful. Afterward, I set realistic SLOs based on historical performance data and team capabilities. Finally, I ensure these are continuously monitored using tools like Prometheus or DataDog, and regularly review them to adapt to changing requirements.
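The SLI/SLO workflow in this answer comes down to a small piece of arithmetic. Below is a minimal Python sketch, assuming a request-based availability SLI (successful requests divided by total requests over one window) and an objective expressed as a fraction such as 0.999; the function names and example numbers are illustrative and are not taken from Prometheus, DataDog, or any other tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests
    objective: float         # target fraction, e.g. 0.999 for "three nines"
    budget_remaining: float  # 1.0 = budget untouched, 0.0 = spent, < 0 = overspent

    @property
    def met(self) -> bool:
        return self.sli >= self.objective


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Request-based availability SLI (assumed definition): share of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def evaluate_slo(good_requests: int, total_requests: int, objective: float) -> SloReport:
    """Compare the SLI against the SLO and report how much error budget is left."""
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be strictly between 0 and 1")
    sli = availability_sli(good_requests, total_requests)
    allowed_bad = (1.0 - objective) * total_requests  # error budget, in requests
    actual_bad = total_requests - good_requests
    return SloReport(sli=sli, objective=objective,
                     budget_remaining=1.0 - actual_bad / allowed_bad)


# Hypothetical window: 1,000,000 requests, 500 failed, 99.9% objective.
report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, objective=0.999)
print(f"SLI={report.sli:.4%} met={report.met} budget left={report.budget_remaining:.0%}")
```

With 500 failures against a budget of 1,000 (0.1% of 1,000,000 requests), half the error budget remains; a negative value would mean the SLO was breached for that window.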
A: In my previous role, I identified a repetitive deployment process that required manual intervention, which created bottlenecks. I automated this process using Jenkins for CI/CD and Docker for containerization. By scripting the deployment pipeline and integrating automated testing, we reduced deployment times from several hours to just minutes, significantly increasing team efficiency and allowing more frequent releases.
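The key idea in this answer, that automated tests gate every deployment, can be sketched as a fail-fast stage runner. This is a hypothetical Python illustration, not Jenkins or Docker code: each stage is a plain function standing in for a real build, test, or deploy command, and the pipeline stops at the first stage that reports failure.

```python
from typing import Callable, List, Tuple

# A stage is a (name, step) pair; a step returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail-fast).

    Returns (succeeded, names of the stages that were attempted).
    """
    attempted: List[str] = []
    for name, step in stages:
        attempted.append(name)
        if not step():
            print(f"stage '{name}' failed; halting pipeline")
            return False, attempted
    return True, attempted


# Hypothetical stand-ins for real build, test, and deploy commands.
deployed: List[str] = []


def build_image() -> bool:
    return True


def run_tests() -> bool:
    return False  # simulate a failing test suite


def deploy() -> bool:
    deployed.append("v2")  # record that a release went out
    return True


ok, attempted = run_pipeline([("build", build_image), ("test", run_tests), ("deploy", deploy)])
```

Because the deploy stage is listed after the test stage, a failing test run leaves `deployed` empty; ordering the stages is what turns the tests into a release gate.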
A: To ensure performance and availability, I employ a combination of proactive monitoring, automated scaling, and regular load testing. I use tools like Grafana and the ELK Stack for real-time monitoring of application performance and infrastructure health. Auto Scaling groups in AWS, or autoscaled managed instance groups in GCP, let resources adjust to demand. Additionally, I run regular load tests with tools like JMeter to identify performance bottlenecks before they impact users.
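The answer mentions auto-scaling without stating a scaling rule. As one concrete, assumed example, the sketch below implements the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: scale the replica count by the ratio of the observed metric to its target, and do nothing while that ratio is within a tolerance band (0.1 here, matching the HPA default). AWS and GCP autoscalers apply their own policies, so treat this as an illustration of target tracking rather than their exact algorithm; the function and parameter names are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Proportional (target-tracking) scaling decision, HPA-style.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped while the ratio stays within `tolerance` of 1.0, then
    clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: hold steady
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical load: 4 instances averaging 90% CPU against a 60% target.
print(desired_replicas(current_replicas=4, current_metric=90.0, target_metric=60.0))
```

Rounding up with `ceil` errs toward extra capacity, and the tolerance band keeps small metric fluctuations from causing constant scale-in/scale-out churn.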
A: To reduce 'toil', I prioritize automating manual tasks, creating self-service tools for development teams, and implementing infrastructure as code (IaC) practices with Terraform or CloudFormation. By continuously reviewing processes to identify repetitive tasks, I can create scripts and workflows that automate these efforts. I also encourage a culture where team members document their knowledge and automation techniques, aiding future efficiencies and innovations.
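Infrastructure as code removes toil because you declare the desired state and let a tool work out the changes. The Python sketch below shows that idea in miniature: it diffs a hypothetical desired-resource map against the current one and lists create, update, and delete actions. It is a conceptual illustration only, not how Terraform or CloudFormation are implemented, and the resource names are made up.

```python
from typing import Dict, List

# Resource name -> attributes, e.g. {"vm-web": {"size": "large"}} (hypothetical schema).
Resources = Dict[str, Dict[str, str]]


def plan(desired: Resources, actual: Resources) -> List[str]:
    """List the changes needed to make `actual` match `desired`."""
    changes: List[str] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(f"create {name}")
        elif name not in desired:
            changes.append(f"delete {name}")
        elif desired[name] != actual[name]:
            changes.append(f"update {name}")
    return changes


desired = {"vm-web": {"size": "large"}, "bucket-logs": {"region": "us"}}
actual = {"vm-web": {"size": "small"}, "vm-old": {"size": "small"}}
for change in plan(desired, actual):
    print(change)
```

Running `plan` against infrastructure that already matches returns an empty list; that idempotency is what makes re-applying the same definition safe and lets it replace hand-run provisioning steps.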
A: In a past project, I noticed that an application was experiencing increased latency during peak hours, which posed a stability risk. I conducted a performance analysis to identify the root cause, which was related to inefficient database queries. I worked with the development team to optimize these queries and implement caching strategies using Redis. Additionally, I established new SLIs to closely monitor performance metrics. After these changes, we saw a 50% reduction in response times, significantly enhancing system reliability.
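The Redis caching fix described here usually follows the cache-aside pattern: read from the cache first and, on a miss, query the database and store the result with an expiry. A minimal Python sketch follows, assuming an in-process dictionary with expiry times as a stand-in for Redis so it runs without a server; in a real deployment the `get` and `set` calls would map to a Redis GET and a SET with an expiry. The key name and loader function are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Redis GET / SET-with-expiry (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def cached_query(cache: TTLCache, key: str, loader: Callable[[], Any],
                 ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to the database on a miss.

    Note: in this sketch a loader that returns None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = loader()  # the slow database query
        cache.set(key, value, ttl_seconds)
    return value


# Usage with a controllable clock so expiry is easy to observe.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls: list = []


def load_orders() -> list:
    db_calls.append(1)  # count how often the database is hit
    return ["order-1", "order-2"]


cached_query(cache, "orders:42", load_orders, ttl_seconds=30)  # miss: queries the DB
cached_query(cache, "orders:42", load_orders, ttl_seconds=30)  # hit: served from cache
now[0] = 31.0                                                  # move past the TTL
cached_query(cache, "orders:42", load_orders, ttl_seconds=30)  # expired: queries again
print(f"database queries: {len(db_calls)}")
```

The TTL bounds how stale a cached result can get; choosing it is a trade-off between database load and data freshness.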

Salary and Benefits
Salary details not provided

Want to apply directly?
Apply for the Site Reliability Engineer position at Ford Motor Company using https://www.linkedin.com/jobs/view/3978681319


