Monarch, Google’s Planet-Scale Monitoring Infrastructure
Google runs very large and complex distributed systems. Keeping them healthy is a full-time job in itself. The first step is knowing what is going on across our planet-wide infrastructure. This talk is a technical deep dive into our state-of-the-art monitoring systems: what challenges we faced, what solution we found, and what lessons we learned.
Roberto has been a Google SRE for 4 years, working on production monitoring and machine learning. Prior to Google, he worked for 14 years in software and systems engineering, leading web, virtual reality, and health care projects for small and large enterprises, startups, and more than 20 banks.