Monitoring .NET applications with ELK
Performance monitoring is a very important part of the application development lifecycle, but only recently has it started gaining enough attention in the developer community. As a result, we can observe a constantly growing number of monitoring solutions available on the market. Cloud providers also offer their own services (such as Azure Application Insights or Amazon CloudWatch), which integrate nicely with applications hosted in the cloud.
In this post I would like to present yet another set of tools, known as ELK. ELK is composed of three products: Elasticsearch, Logstash and Kibana (hence the acronym). It is best known as a platform for storing and querying application logs. However, not everyone knows about its capabilities when it comes to managing performance data. In the subsequent paragraphs I'm going to show you how performance data travels through the ELK components, and how we can effectively manage it. I am going to focus mainly on .NET applications, but the presented concepts can be easily applied to other frameworks as well.
ELK architecture
As mentioned above, ELK is built from three components: Elasticsearch, Logstash and Kibana. Elasticsearch is a storage engine based on Lucene, perfectly suited for full-text queries. At the same time, it is easily scalable and maintainable. It may use a predefined schema or dynamic fields for the incoming data. Later, we will use its index templating system, which gives you the ability to predefine schemas for dynamically created indexes. Logstash is a log receiver and forwarder. Its main purpose is to reliably transport your logs from the application/server to the Elasticsearch store. A very simple Logstash configuration, which reads new lines from the standard input and saves them in a local Elasticsearch instance, might look as follows:
input {
  stdin {}
}

output {
  elasticsearch {
    index => "myindex"
  }
}
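After feeding this pipeline a few test lines, you can quickly verify that they were indexed, for instance with a simple search request (a sketch, assuming Elasticsearch listens locally on its default port 9200):

PS:> Invoke-WebRequest -Method GET "http://localhost:9200/myindex/_search?q=*"

The indexed lines should appear in the hits section of the response.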
There may be multiple instances of Logstash agents deployed, each performing custom filtering or modifications on the processed data. Finally, Kibana is a web application which allows you to visualize the collected data. You can create dashboards with graphs based on your custom Elasticsearch queries. The image below summarizes the whole pipeline:
System monitoring
System monitors are developed under the common name Beats; you may find a list of them on the project's main page. For our performance monitoring scenario, the most interesting ones are topbeat and metricbeat. The latter is still under heavy development and will eventually replace the former. Nevertheless, in this post I will focus on topbeat as it is the more mature solution. After downloading the package, we start by importing the Elasticsearch index template:
PS:> Invoke-WebRequest -Method PUT http://<elasticsearch-address>/_template/topbeat -InFile topbeat.template.json
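You can check that the template has been registered with a simple GET request against the same endpoint:

PS:> Invoke-WebRequest -Method GET http://<elasticsearch-address>/_template/topbeat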
We may connect topbeat directly to the Elasticsearch instance or use Logstash as a log buffer. I will choose the latter option, as I would like to pass all our logs through Logstash. It is also a safer choice: the events are processed asynchronously, and we won't lose them if, for instance, Elasticsearch has a hiccup. The next step is to configure our Logstash pipeline. A sample configuration file (let's name it logstash.conf) might look as follows:
input {
  beats {
    port => 5044
    # ssl => true
    # ssl_certificate => "c:\logstash\logstash.pem"
    # ssl_key => "c:\logstash\logstash.key"
    # ssl_key_passphrase => "mysecretpassword"
  }
}

output {
  if [type] == "system" or [type] == "process" {
    elasticsearch {
      index => "topbeat-%{+YYYY.MM.dd}"
      template_name => "topbeat"
    }
  } else {
    file { path => "c:\logstash\skipped.out" }
  }
}
As you can see, we are listening for logs on port 5044 using the Beats protocol. Notice that the SSL lines are commented out, which means the connection will be neither encrypted nor authenticated. I recommend enabling SSL, as it only requires creating a set of certificates. The performance impact should be negligible, and you can rest assured that your logs won't be intercepted. The output endpoint is the local Elasticsearch server, with a new topbeat index created daily.
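On the topbeat side, the output section of its configuration file (topbeat.yml) needs to point at Logstash instead of Elasticsearch. A minimal sketch (assuming Logstash is reachable under the host name logstash and SSL stays disabled, as in the commented-out lines above):

output:
  logstash:
    hosts: ["logstash:5044"]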
It’s time to start the Logstash agent:
logstash agent -f logstash.conf
and the topbeat service:
PS:> Start-Service topbeat
Soon, events should start arriving in our Elasticsearch instance. Let's visualize them on a Kibana dashboard. I want to have four graphs available: system CPU usage, system memory usage, process CPU usage and process memory usage. System CPU usage can be visualized with the help of the cpu.system_p and cpu.user_p metrics. These are the settings I chose on the Visualize tab:
In order to display a nice percentage scale on the Y-axis, you need to go to the Kibana settings for the topbeat index and set the format of the metric fields to percentage:
A very similar setup can be done for the system memory, but now using the mem.used_p metric.
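Under the hood, such a visualization is just a date histogram aggregation over the topbeat indexes. If you ever want to run it outside Kibana, a roughly equivalent raw request could look like this (a sketch; cpu-query.json is a hypothetical file name and the one-minute interval is arbitrary):

PS:> Invoke-WebRequest -Method POST "http://<elasticsearch-address>/topbeat-*/_search" -InFile cpu-query.json

with cpu-query.json containing:

{
  "size": 0,
  "query": { "term": { "type": "system" } },
  "aggs": {
    "cpu_over_time": {
      "date_histogram": { "field": "@timestamp", "interval": "1m" },
      "aggs": {
        "avg_user": { "avg": { "field": "cpu.user_p" } },
        "avg_system": { "avg": { "field": "cpu.system_p" } }
      }
    }
  }
}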
For processes we need to add an additional sub-bucket:
and we will receive multiple graphs, one per process running on the system. For the Y-axis on the CPU usage graph I used the proc.cpu.user_p metric, and proc.mem.size for the memory one. Finally, after saving all these graphs, we are ready to place them on our system dashboard:
You may create dashboards for each of your machines (by specifying filters in the data queries) or aggregate system data from various machines on one graph (by sub-bucketing on the beat.hostname field). As you can see, the options are endless.
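For example, a per-machine, per-process filter typed into the Kibana search bar might look as follows (WEBSRV01 is a placeholder host name; proc.name is a standard topbeat field):

type:process AND beat.hostname:"WEBSRV01" AND proc.name:"w3wp"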
.NET world monitoring
In the previous section we configured some basic system monitoring. That might be enough, but for our own .NET applications we may be interested in more detailed data. .NET provides a lot of interesting performance counters. Among them, I consider those grouped under the .NET CLR Memory category the most valuable. I also like to know the number of thrown exceptions (the .NET CLR Exceptions category), which often points to problems in the application. Finally, for .NET web applications there is a bunch of counters under the ASP.NET Applications category.
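To give you an idea of what these counters expose, the snippet below reads two of them with the standard System.Diagnostics API (a minimal sketch, not Musketeer's actual implementation; the w3wp instance name is only an example):

using System;
using System.Diagnostics;
using System.Threading;

class CounterSample
{
    static void Main()
    {
        // the instance name is the process name as shown in perfmon,
        // e.g. "w3wp" for an IIS worker process - adjust it to your process
        const string instance = "w3wp";

        var gen0Collections = new PerformanceCounter(".NET CLR Memory", "# Gen 0 Collections", instance, readOnly: true);
        var exceptionsPerSec = new PerformanceCounter(".NET CLR Exceptions", "# of Exceps Thrown / sec", instance, readOnly: true);

        while (true)
        {
            // rate counters need at least two samples, hence the polling loop
            Console.WriteLine("Gen 0 collections: {0}, exceptions/sec: {1}",
                gen0Collections.NextValue(), exceptionsPerSec.NextValue());
            Thread.Sleep(1000);
        }
    }
}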
As the Elasticsearch team does not provide anything for monitoring these counters, I decided to add this functionality to my Musketeer service. Starting from version 2.0 it can send data not only to the Diagnostics Kit dashboard, but also to Logstash. It is also possible to turn off Diagnostics Kit logging completely and use Musketeer only as a Logstash agent. Musketeer uses the same Beats protocol as topbeat and requires its own index in the Elasticsearch server. You may find a template in the repository and apply it to your Elasticsearch instance:
PS:> Invoke-WebRequest -Method PUT http://<elasticsearch-address>/_template/mperfstash -InFile musketeer-mperfstash-template.json
Musketeer identifies applications by their paths; thus, if two applications share the same path, they are considered the same. I decided that a path is a more stable identifier than a process id, especially in the IIS world, where multiple applications may run in one application pool. To correctly match ASP.NET performance counters with applications, Musketeer needs to enumerate the AppDomains in the w3wp process. This is because ASP.NET counter instance names are built from app domain names. Enumerating app domains is not that easy, and after some research I decided to use ETW events from the .NET rundown provider for this purpose.
Let’s make changes in the Musketeer.exe.config file so it sends logs to Logstash:
…
<appSettings>
  …
  <!--<add key="lowleveldesign.diagnostics.url" value="http://diag.local" />-->
  <add key="logstash:url" value="tcp://logstash:5044" />
  <!--<add key="logstash:certthumb" value="" />-->
  …
</appSettings>
…
And install the service:
PS:> Musketeer.exe install
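Once installed, the service can be started just like any other Windows service (the service name below is the one used later in this post):

PS:> Start-Service LowLevelDesign.Musketeer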
We also need to change the Logstash configuration so that it picks the correct Elasticsearch index:
input {
  beats {
    port => 5044
  }
}

output {
  if [type] =~ "Musketeer.PerfCounter" {
    elasticsearch {
      index => "mperfstash-%{+YYYY.MM.dd}"
      template_name => "mperfstash"
    }
  } else if [type] == "system" or [type] == "process" {
    elasticsearch {
      index => "topbeat-%{+YYYY.MM.dd}"
      template_name => "topbeat"
    }
  } else {
    file { path => "c:\temp\skipped.out" }
  }
}
After starting the LowLevelDesign.Musketeer service we should observe new events coming into the mperfstash index. The table below maps the collected performance counters to the fields in the Elasticsearch index:
Performance counter | Field in the ES index |
Process\% Processor Time | PerfData.CPU |
Process\Working Set | PerfData.Memory |
Process\IO Read Bytes/sec | PerfData.IOReadBytesPerSec |
Process\IO Write Bytes/sec | PerfData.IOWriteBytesPerSec |
.NET CLR Memory\# Gen 0 Collections | PerfData.DotNetGen0Collections |
.NET CLR Memory\# Gen 1 Collections | PerfData.DotNetGen1Collections |
.NET CLR Memory\# Gen 2 Collections | PerfData.DotNetGen2Collections |
.NET CLR Memory\Gen 0 heap size | PerfData.DotNetGen0HeapSize |
.NET CLR Memory\Gen 1 heap size | PerfData.DotNetGen1HeapSize |
.NET CLR Memory\Gen 2 heap size | PerfData.DotNetGen2HeapSize |
.NET CLR Memory\% Time in GC | PerfData.DotNetCpuTimeInGc |
.NET CLR Exceptions\# of Exceps Thrown | PerfData.DotNetExceptionsThrown |
.NET CLR Exceptions\# of Exceps Thrown / sec | PerfData.DotNetExceptionsThrownPerSec |
ASP.NET Applications\Errors Total | PerfData.AspNetErrorsTotal |
ASP.NET Applications\Requests Executing | PerfData.AspNetRequestExecuting |
ASP.NET Applications\Requests Failed | PerfData.AspNetRequestsFailed |
ASP.NET Applications\Requests Not Found | PerfData.AspNetRequestsNotFound |
ASP.NET Applications\Requests Not Authorized | PerfData.AspNetRequestsNotAuthorized |
ASP.NET Applications\Requests In Application Queue | PerfData.AspNetRequestsInApplicationQueue |
ASP.NET Applications\Requests Timed Out | PerfData.AspNetRequestsTimedOut |
ASP.NET Applications\Requests Total | PerfData.AspNetRequestsTotal |
ASP.NET Applications\Requests/Sec | PerfData.AspNetRequestsPerSec |
ASP.NET Applications\Request Execution Time | PerfData.AspNetRequestExecutionTime |
ASP.NET Applications\Request Wait Time | PerfData.AspNetRequestWaitTime |
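With these fields in place it is easy to query the collected data, for example to spot applications throwing a suspicious number of exceptions. A sketch of such a query for the Kibana search bar (assuming the event type matches the value used in the Logstash condition above, and with an arbitrary threshold):

type:"Musketeer.PerfCounter" AND PerfData.DotNetExceptionsThrownPerSec:>10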
You may define which applications Musketeer should monitor: web applications are monitored by default, while for Windows services you need to specify inclusion/exclusion regexes (have a look at the project wiki).
Now, it's time to create custom Kibana dashboards for our applications. I recommend creating one dashboard per application, and possibly one global dashboard to monitor Request Wait Time and Request Execution Time for all our web applications. Remember that applications are identified by their paths, so to filter events for a web application deployed under the c:\temp\testwebapp folder, we need to add the following condition to the search query:
Our ASP.NET application request graphs might look as follows:
Next, we may visualize the .NET memory statistics with the following graphs:
Let me finish here. I hope you will easily visualize the other available metrics, such as the exception throw rate or the request execution/wait times. Creating a custom dashboard from the predefined graphs is also very intuitive in Kibana.
Final words
ELK is a very powerful set of tools. In this post we have examined its capabilities when it comes to performance monitoring, but you should definitely try it as your application log store too. Keep in mind that all the components are open source and, if required, you may extend them to fit your needs. Additionally, the Elasticsearch team provides some paid plugins, which may help you secure and better monitor your Elasticsearch/Kibana servers.
Sebastian Solnica
I'm a System Engineer. For more than 10 years I worked as a .NET developer, which gives me a good understanding of the necessity of automation in the development process. Another area of interest of mine (which I could even call a hobby) is application/system diagnostics and security. In my constant need to understand how things work, I debug and trace whatever runs on my machine. You may reach out to me on Twitter: @lowleveldesign or through my personal site: http://www.lowleveldesign.org/.