Site Reliability Engineer at Material Bank®

Apply for the Site Reliability Engineer position at Material Bank®. Find the best jobs for you effortlessly with InJob.AI, your ultimate solution for job search. Discover top job opportunities and streamline your job search process.

Job Description

<div>
 <em>
  Material Bank is a fast-paced, high-growth technology company that created
  <strong>
   the world's largest material marketplace for the Architecture and Design industry
  </strong>
  , providing the fastest and most powerful way to start and manage a design project. Learn more about us at www.materialbank.com or see below.
  <br/>
  <br/>
 </em>
 --
 <br/>
 <br/>
 Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other Material Bank production systems running reliably and efficiently.
 <br/>
 <br/>
 SREs at Material Bank specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with an interest in algorithms and distributed systems.
 <br/>
 <br/>
 <strong>
  What you&rsquo;ll do:
  <br/>
  <br/>
 </strong>
 <ul>
  <li>
   Join an on-call (OpsGenie) rotation to respond to incidents that impact Material Bank&rsquo;s availability, and support service engineers who are handling customer incidents.
  </li>
  <li>
   Prevent incidents from recurring.
  </li>
  <li>
   Run our infrastructure with Terraform, GitHub CI/CD, Kubernetes, and ECS.
  </li>
  <li>
   Use NewRelic and Prometheus to build monitoring that alerts on symptoms rather than on outages.
  </li>
  <li>
   Document EVERYTHING so your findings turn into run-books/SOPs and then into automation.
  </li>
  <li>
   Improve operational processes (such as deployments and upgrades) to make them as uneventful as possible.
  </li>
  <li>
   Design, build, and maintain core infrastructure that enables Material Bank to scale to thousands of concurrent users.
  </li>
  <li>
   Debug production issues across services and levels of the stack.
  </li>
  <li>
   Plan the growth of Material Bank infrastructure.
   <br/>
   <br/>
   <br/>
  </li>
 </ul>
 <strong>
  What you&rsquo;ll bring:
  <br/>
  <br/>
 </strong>
 <ul>
  <li>
   Think about systems: edge cases, failure modes, behaviors, specific implementations.
  </li>
  <li>
   Know your way around Linux.
  </li>
  <li>
   Have strong programming skills in Shell and Python.
  </li>
  <li>
   Collaborate and communicate asynchronously.
  </li>
  <li>
   Document all the things to inform and mentor others.
  </li>
  <li>
   Have a bias for action.
  </li>
  <li>
   Deliver quickly and effectively, and iterate fast.
  </li>
  <li>
   Have experience with Nginx, container technologies, Kubernetes, Terraform, Kafka, or similar technologies.
   <br/>
   <br/>
   <br/>
  </li>
 </ul>
 <strong>
  Projects you can work on
 </strong>
 :
 <br/>
 <br/>
 <ul>
  <li>
   Coding infrastructure automation with Terraform and common CI/CD tools.
  </li>
  <li>
   Improving our monitoring and building new metrics
  </li>
  <li>
   Helping release managers deploy and fix new versions of Material Bank in all geographies.
  </li>
  <li>
   Planning, preparing for, and executing the provisioning of new infrastructure for our future expansions.
  </li>
  <li>
   Developing a relationship with our product and business teams to define their SLAs, iterate on those SLAs, and improve their reliability.
  </li>
  <li>
   Defining SLOs and error budgets.
   <br/>
   <br/>
   <br/>
  </li>
 </ul>
 <strong>
  <em>
   What you&rsquo;ll get from us:
   <br/>
   <br/>
  </em>
 </strong>
 <ul>
  <li>
   Our people: If you thrive in an inclusive, innovative, and fast-paced organization, look no further! You will get to work alongside some of the brightest minds. Join a genuinely fun and supportive workplace where we keep our employees consistently engaged through internal communication and corporate events.
  </li>
  <li>
   Relaxation and Celebrations: Generous PTO, Sick Days, Paid National Holidays, and even more (ask us about this when we connect).
  </li>
  <li>
   Health Benefits: We contribute to your medical, dental, vision and short-term/long-term disability plans and have a strong employee assistance program.
  </li>
  <li>
   Plan for your Retirement: 401(k) eligible after your first 90 days of employment!
  </li>
  <li>
   Giving Back: We sponsor multiple events throughout the year to help out our communities. You will receive time off to give back as well.
  </li>
  <li>
   Growth: We&rsquo;ll help you take your career to the next level. We want you to be creative and take initiative, which will allow you to grow and create within the company. Most importantly, be the best at what matters!
  </li>
  <li>
   Flexible Work Schedules: With business units and employees across the globe, Material Technologies has embraced a hybrid working model that lets department leaders decide on the best approach for their respective teams, whether that be remote, in person, or a little of both.
   <br/>
   <br/>
   <br/>
  </li>
 </ul>
 <strong>
  About Material Bank
  <br/>
  <br/>
 </strong>
 Material Bank is the world&rsquo;s largest material marketplace for the architecture and design industry, providing the fastest and most powerful way to search and sample materials. Material Bank connects design professionals to hundreds of manufacturers by facilitating brand discovery, rep engagement, and material sampling.
 <br/>
 <br/>
 Material Bank has transformed the way an entire industry discovers and samples materials. By removing the friction that exists in the process, we drive business between architects and designers (members) and our Brand Partners (clients).
 <br/>
 <br/>
 Our powerful material database and proprietary robotic distribution facility allow members to order samples until midnight (ET) to be delivered free of charge anywhere in the US, in one box, by 10:30 AM the next morning.
 <br/>
 <br/>
 Connect with us and discover your career at Material Bank.
 <br/>
 <br/>
 --
 <br/>
 <br/>
 Material Bank is proud to be an equal opportunity employer. We value diversity, and all applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, age, national origin, veteran or disability status or other status protected under any applicable federal, state or local law.
</div>

AI Powered Job Insights

Exciting opportunity for a Site Reliability Engineer at Material Bank, a leader in the architecture and design industry! They are focused on ensuring high availability of their user-facing services and production systems while leveraging cutting-edge technology.

📍 Location: Remote/Hybrid 
💼 Position: Site Reliability Engineer 
⏰ Type: Full-time 
📅 Date Posted: 2024-07-18 

Role Summary:
- Responsible for maintaining reliability and efficiency of production systems.
- Specializes in systems including operating systems, storage, and networking.

What You'll Do:
- Participate in an on-call rotation to manage incidents impacting availability.
- Prevent recurring incidents and enhance system reliability.
- Utilize tools like Terraform, GitHub CI/CD, Kubernetes, and ECS for infrastructure management.
- Develop monitoring systems that focus on symptoms instead of outages.
- Document processes to create run-books, SOPs, and automation.
- Debug production issues and plan infrastructure growth.

What's Needed:
- Strong knowledge of Linux and programming (Shell, Python).
- Experience with technologies such as Nginx, Container technologies, Kubernetes, Terraform, or Kafka.
- Ability to collaborate asynchronously and mentor others through documentation.
- Focus on delivering quickly and iterating effectively.

Material Bank offers a supportive work culture with flexible schedules, generous benefits including PTO, health coverage, 401(k) plans, and opportunities for community involvement and personal growth. This role is perfect for someone eager to take on challenges in a fast-paced environment and contribute to a transformative service in the design industry.

Top Interview Questions

  • Q: How do you ensure system reliability and minimize downtime in a production environment?

    A: To ensure system reliability, I implement a robust monitoring solution that alerts me to issues before they escalate into outages. I use metrics to understand traffic patterns and peak loads, allowing for proactive scaling of resources. Additionally, I regularly conduct incident response drills and post-mortems to learn from past incidents and refine our operational processes.

  • Q: Can you describe your experience with Infrastructure as Code (IaC) tools like Terraform?

    A: I have extensive experience using Terraform to automate the deployment and management of cloud infrastructure. For example, I designed a full stack application deployment, which included setting up VPCs, subnets, EC2 instances, and RDS databases. By using Terraform's modular approach, I ensured that our infrastructure could be versioned and easily replicated across different environments, improving consistency.

  • Q: How do you approach incident management and what steps do you take post-incident?

    A: In incident management, I follow a structured approach: first, I assess the severity and impact, then take immediate action to restore service. Post-incident, I conduct a thorough investigation to determine root causes, documenting all findings. I then create run-books and Standard Operating Procedures (SOPs) to ensure similar issues can be prevented in the future, and I prioritize communication with affected stakeholders throughout the process.

  • Q: What strategies do you employ to handle scaling challenges in a distributed system?

    A: To handle scaling challenges, I first analyze the system's bottlenecks using performance metrics from tools like Prometheus and NewRelic. I then implement horizontal scaling techniques, ensuring that load balancers efficiently distribute traffic. Additionally, I routinely review our applications for inefficiencies and optimize them to handle increased loads, while ensuring that distributed storage solutions are reliable and performant.

  • Q: How do you document processes and create run-books for your team?

    A: I believe in documenting processes thoroughly and clearly to support knowledge sharing within the team. I start by outlining each step in the process, then expand upon each step with detailed instructions, examples, and links to relevant resources. I also use collaborative tools to gather feedback from colleagues, ensuring that the documentation is user-friendly and up-to-date. This practice not only helps in onboarding new team members but also improves overall operational efficiency.

Salary Benefits

Salary details not provided

Want to apply directly?

Apply for the Site Reliability Engineer position at Material Bank® using https://www.linkedin.com/jobs/view/3979378301

Frequently Asked Questions

Still have a question? Check out our FAQ section below.

InJob searches for the best jobs based on your profile and automatically generates customized cover letters for you. It saves hours of job-hunting time.

InJob creates your profile by having a conversation with you to learn about your skills and requirements. It also scans your resume to gather information about your experience, skills, and achievements. This information is used to build your profile in the backend, which is then used to match jobs and to generate a personalized cover letter for each job opportunity.

InJob searches for job opportunities across a wide range of sources, including LinkedIn, Indeed, and hundreds of other job boards, to find hidden gems. Its search is not limited to these, so it covers as many potential job listings as possible. It also searches the career pages of individual companies that suit your target industry and location, and applies there on your behalf.

InJob is constantly active, scanning for fresh job opportunities every single minute. This ensures that you are the first person to apply to new job listings that align with your profile.

InJob plays matchmaker by comparing your profile and resume with job listings. Each job receives a score from 1-10, indicating how well you match with it.

In the upcoming update, InJob will apply for jobs on your behalf, and this will be its main differentiator. It will target top matches and craft custom cover letters for each job, ensuring your application stands out. InJob will also handle the application process, including visiting company websites and filling out forms.

In the upcoming update, InJob will provide an interactive dashboard that serves as mission control for your job search. It will display all the jobs InJob has applied to for you and their current status. You will also be able to track which companies have shown interest in your profile and view the feedback they provided.

In an upcoming feature, InJob will collect all feedback, both positive and constructive, and present it to you. This will show you exactly where you stand in the job market and provide insights on how to improve your skills.