Sam Newman (2016), Building Microservices, O′Reilly.
Monitoring
For each service
- Track inbound response time at a bare minimum. Then track error rates and build your way up to application-level metrics.
- Track the health of all downsteam responses, including the response time fo downstream calls. Track error rates return by these services.
- Standardize how and where metrics are collected.
- Log into a standard location, in a standard format.
- Monitor the underlying operating system so you can track down processes and do capacity planning.
For the system
- Aggregate host-level metrics (e.g CPU) together with application-level metrics.
- Ensure your metric storage tool allows for aggregation at a system or service level, and drill down to individual hosts.
- Have a single, queryable tool for aggregating and storing logs.
- Strongly consider standardizing on the use of corrlation IDs.
- Understand what requires a call to action, and structure alerting and dashboard accordingly.
- Investigate the possibility of unifying how you aggregate all your various metrics