This post is brought to you by Ravello R&D based on our own internal best practices and lessons learned. We need to continually monitor our servers in both production and dev environments. As our environments grow and scale out, it becomes increasingly difficult to debug failures, and crisis analysis requires SSH-ing into many different servers. We therefore wanted to view the logs for all our servers from a single entry point. We also wanted to be notified of abnormal activity in our logs, because we can’t sit and watch them all day long.
Centralized logging tools, with their many features, allow Ops teams to analyze the root cause of an infrastructure crisis and help DevOps teams analyze and troubleshoot production and development issues. To this end, we surveyed the capabilities of the available tools before choosing one for ourselves. Here is what we found.
5 Common Features of Central Logging Tools
- Log collection: Logs are collected from any static or on-demand resource into a single application with secure sign-in, accessible from anywhere (not only from our own VPN).
- Alerts: Stakeholders receive notifications about specific exceptions in the logs based on configurable criteria, which helps detect production issues even before a user complains about them.
- Aggregation: Scaled-out servers behind load balancers each produce their own log files, making it impossible to debug a single action flow that is distributed across servers unless the logs converge in a single place.
- History: Keeping old logs can be very helpful when trying to understand why and when a specific product behavior began.
- Visual indicators: Abnormal behaviors can be detected faster when we see them in a visual instrument such as a graph, where peak points are easily noticed.
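To make the aggregation and alert ideas concrete, here is a minimal Python sketch. It is not how any of the tools below work internally. It assumes each server's log is already ordered by time and that every line starts with an ISO-8601 timestamp; the server names, log lines, and the `ERROR|Exception` rule are all made up for illustration.

```python
import heapq
import re
from datetime import datetime


def parse(line, server):
    # Assumed line format: "<ISO-8601 timestamp> <message>"
    stamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(stamp), server, message)


def aggregate(logs_by_server):
    """Merge per-server logs (each already time-ordered) into one stream."""
    streams = [
        [parse(line, server) for line in lines]
        for server, lines in logs_by_server.items()
    ]
    # heapq.merge keeps the combined output sorted by timestamp
    return list(heapq.merge(*streams))


def find_alerts(entries, pattern=r"ERROR|Exception"):
    """Return the entries whose message matches a configurable pattern."""
    rule = re.compile(pattern)
    return [entry for entry in entries if rule.search(entry[2])]


# Hypothetical logs from two servers behind a load balancer
logs_by_server = {
    "web-1": [
        "2014-01-01T10:00:00 GET /login",
        "2014-01-01T10:00:02 ERROR db timeout",
    ],
    "web-2": [
        "2014-01-01T10:00:01 POST /session",
    ],
}

merged = aggregate(logs_by_server)
alerts = find_alerts(merged)

for stamp, server, message in merged:
    print(stamp.isoformat(), server, message)
```

With the logs merged into one time-ordered stream, a request that bounced between `web-1` and `web-2` can be read top to bottom, and the alert rule only has to watch a single stream.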
5 Popular Log Tools
Most of the tools available are based on configuring the syslog on the required server to send data to the remote applications that handle them. Here are examples of some of the popular tools:
- Splunk Storm – Provides cloud-based operation analysis and troubleshooting for your application. Splunk Storm supports multiple integrations with applications such as AWS and Heroku. It also allows searching, visualizing, and sharing logs and monitoring data. www.splunkstorm.com
- Graylog – An open source, self-hosted application for searching logs, creating charts and reports, and adding alerts for incidents. graylog2.org
- Sumo Logic – Collects, centralizes, alerts on, and visualizes logs. A cloud-based SaaS app, it requires an agent on designated servers. Sumo Logic includes “Prediction” features to detect issues before they arise. www.sumologic.com
- Logentries – A cloud-based SaaS application that is a simple and powerful tool to search, tag, alert on, and track log data from a single location. Logentries supports integration with AWS CloudWatch and Heroku. www.logentries.com
- Papertrail – A clean and simple cloud-based SaaS application that collects and aggregates logs from multiple sources. It provides powerful search capabilities, alerting, and visual indicators, as well as alert integrations with HipChat, PagerDuty and more. It features simple and secure configuration and setup, an intuitive UI, and an API. www.papertrailapp.com
We sought a simple tool that provides just what we need: one that centralizes the logs of all our servers (production and development environments) in a single place, where we can just log in and start debugging. We also wanted a tool that would alert us to special events in the server logs and allow a secure but simple setup on each server. We chose Papertrail.
The pricing was fairly reasonable: about 50 GB of log data, saved for 2 weeks and archived for 1 year, covering both our dev and prod accounts, which are separate but easy to switch between.
At Ravello Systems, because we deploy many on-demand cloud VMs, it is important for us to monitor and access their logs. We were able to easily set up a simple configuration on each server as it goes live, so we can immediately track its behavior, detect issues, and receive alerts.
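As a rough illustration of what forwarding logs from a new server to a remote syslog endpoint can look like at the application level, here is a minimal Python sketch using the standard library's `SysLogHandler`. This is not our actual configuration: the function name `build_remote_logger`, the logger name, and the host and port in the usage comment are hypothetical, and the real destination would come from whichever logging service you use.

```python
import logging
import logging.handlers


def build_remote_logger(host, port, name="app"):
    """Return a logger that forwards records to a remote syslog endpoint.

    SysLogHandler sends over UDP by default; host and port are whatever
    destination the chosen logging service assigns (assumed here).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Hypothetical usage on a freshly started server:
#   log = build_remote_logger("logs.example.com", 514)
#   log.info("server is live")
```

In practice the same effect is often achieved by pointing the system syslog daemon at the remote destination, so that every process on the machine is covered without code changes.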