In this long-awaited second part, we will learn how to write InfluxDB queries manually and will use this knowledge to create Continuous Queries, which help us save disk space for long-term data. On top of that, we will install the telegraf agent to gather various system metrics from our network hosts and store them in InfluxDB. We will use Home Assistant 0.92.2 and the Hass.io InfluxDB addon 3.0.5 (InfluxDB 1.7.6).
Part one can be found here.
Manual InfluxDB queries
Let’s create a query to get data from an HA sensor. Open the InfluxDB addon UI, go to Explore, type our first query, then click the Submit Query button:
select * from home_assistant.autogen.temp group by entity_id
We should get something like this:
Using Visualization->Visualization Type it is possible to choose between Line Graph and Table representations of our data. When the Line Graph is selected, it makes sense to add a legend to our graph. This can be done in Visualization->Customize->Static Legend. In order to apply the changes, it might be necessary to switch to another visualization type and then back to the Line Graph.
Now it’s time to try an aggregation query. The idea is to decrease the number of points by calculating the mean value for a specified time interval (e.g. an hour) and use this calculated value, which represents the whole hour of data, for the new graph:
select mean(value) from home_assistant.autogen.temperature group by time(1h), entity_id
As one can see, the graph looks very similar to the previous one, but now we need significantly fewer points to draw it:
Aggregation becomes even more important for frequently changing data, e.g. CPU load or network speed. There is a bunch of other functions you may use instead of mean in your queries. Now we are ready to create our first Continuous Query, which will run automatically every specified period of time, calculate aggregated results, and store them into a measurement with the required retention policy. If these terms are going over your head, I would recommend refreshing the InfluxDB terminology given in part one.
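As a quick illustration, a few of the other InfluxQL aggregation functions can be combined in one query (a sketch against the same temperature measurement used above):

```sql
-- hourly maximum, minimum, and median per sensor
select max(value), min(value), median(value) from home_assistant.autogen.temperature group by time(1h), entity_id
```

Other options include sum, count, spread, and percentile; the full list is in the InfluxQL function reference.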
Important note #1
If your query contains a date/time, the InfluxDB parser expects by default that it is in UTC, as the data points themselves are stored as UTC in InfluxDB. However, some clients (the InfluxDB Web UI in our case) will convert query results to local time using your browser settings. This may look confusing at first:
select value from home_assistant.year.temperature where time = '2019-05-21 21:00:00' group by entity_id
As we can see, InfluxDB added 3 hours to the query results as my time zone is UTC+3. You may specify your time zone by means of a tz clause added at the end of your query:
select value from home_assistant.year.temperature where time = '2019-05-22 00:00:00' group by entity_id tz('Europe/Moscow')
Now we have the same time zone both in the request and in the response. The list of available time zones can be found here.
Important note #2
It is essential to understand that data points with the same timestamp will be stored as one point if they have equal tags or tags are missing. More specifically, a point which comes later will overwrite previous points due to the time series database model.
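This can be demonstrated from the influx CLI with a hypothetical demo measurement (the timestamp is arbitrary): both writes below share the same timestamp and carry no tags, so after the second INSERT only value=21 remains.

```sql
> INSERT demo_temp value=20 1558472400000000000
> INSERT demo_temp value=21 1558472400000000000
> SELECT * FROM demo_temp
```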
Why it matters: this rule also applies to any query, including Continuous Queries. When we create a CQ over a measurement holding more than one sensor, it is important to specify at least one tag in the group by clause, which helps InfluxDB distinguish the values of one sensor from another. Otherwise, we may get a list of “random“ values without any information about the sensor they belong to:
> select mean(value) from home_assistant.autogen.temperature group by time(1d)
name: temperature
time                 mean
----                 ----
2019-05-19T00:00:00Z 17.75294117647059
2019-05-20T00:00:00Z 19.30921052631579
2019-05-21T00:00:00Z 21.379761904761903
2019-05-22T00:00:00Z 19.62985074626866
One can see that the temperature values were grouped for each day, and there is no way to distinguish between the sensors they belong to. Most probably that’s not what you wanted to achieve. Now let’s add entity_id to the group by clause of our query:
> select mean(value) from home_assistant.autogen.temperature group by time(1d), entity_id
name: temperature
tags: entity_id=bedroom_temperature
time                 mean
----                 ----
2019-05-19T00:00:00Z 24.661538461538463
2019-05-20T00:00:00Z 24.804347826086957
2019-05-21T00:00:00Z 25.37941176470588
2019-05-22T00:00:00Z 25.353846153846156

name: temperature
tags: entity_id=yandex_weather_apparent_temperature
time                 mean
----                 ----
2019-05-19T00:00:00Z 12
2019-05-20T00:00:00Z 15.620689655172415
2019-05-21T00:00:00Z 18.653846153846153
2019-05-22T00:00:00Z 16.041666666666668

name: temperature
tags: entity_id=yandex_weather_temperature
time                 mean
----                 ----
2019-05-19T00:00:00Z 14.818181818181818
2019-05-20T00:00:00Z 18.5
2019-05-21T00:00:00Z 18.666666666666668
2019-05-22T00:00:00Z 15.941176470588236
Now the results look better: they are grouped by sensor name (entity_id), and the values within each group are sorted by timestamp.
Create a Retention Policy
First of all, we have to create a Retention Policy which, among other things, defines how long InfluxDB will keep our data. The query below creates a year RP:
CREATE RETENTION POLICY "year" ON "home_assistant" DURATION 52w REPLICATION 1
You may ignore the message “Your query is syntactically correct but returned no results“: our query indeed returns no results, it just creates a new entity within InfluxDB.
Let’s create another RP which will hold data forever:
CREATE RETENTION POLICY "infinite" ON "home_assistant" DURATION INF REPLICATION 1
Make sure our RPs were created:
> show retention policies on home_assistant
name     duration  shardGroupDuration replicaN default
----     --------  ------------------ -------- -------
autogen  168h0m0s  168h0m0s           1        true
infinite 0s        168h0m0s           1        false
year     8736h0m0s 168h0m0s           1        false
More info on Retention Policies can be found in the official docs.
Create a Continuous Query
A Continuous Query is an InfluxDB facility which runs a specific query periodically. More importantly, the result of this query may be stored into another measurement with its own retention policy. That is exactly how we will use it: we create a query to get aggregated hourly data from a sensor and store it into a new measurement with a very long retention period (retention policy). Needless to say, you may create as many CQs and measurements as you need. It is limited only by your imagination!
CREATE CONTINUOUS QUERY "cq_1h_temp10" ON "home_assistant" BEGIN SELECT mean("value") AS value INTO "year"."temperature" FROM "autogen"."temperature" GROUP BY time(1h), entity_id FILL(previous) END
If no updates are received from a sensor during the last hour, by default the CQ will write an empty point. We use FILL(previous) to substitute the last known value from the sensor instead of an empty one.
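For reference, InfluxQL supports several other fill strategies; a sketch of the same aggregation with each of them (pick the one matching your data semantics):

```sql
-- skip intervals with no data entirely
SELECT mean("value") FROM "autogen"."temperature" GROUP BY time(1h), entity_id FILL(none)
-- write an explicit zero for empty intervals
SELECT mean("value") FROM "autogen"."temperature" GROUP BY time(1h), entity_id FILL(0)
-- linearly interpolate between neighbouring points
SELECT mean("value") FROM "autogen"."temperature" GROUP BY time(1h), entity_id FILL(linear)
```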
Let’s make sure that CQ is created:
> show continuous queries
name: home_assistant
name         query
----         -----
cq_1h_temp10 CREATE CONTINUOUS QUERY cq_1h_temp10 ON home_assistant BEGIN SELECT mean(value) AS value INTO home_assistant.year.temperature FROM home_assistant.autogen.temperature GROUP BY time(1h), entity_id fill(none) END
The job is done; now we should wait a few hours and check how the new measurement year.temperature fills up with our temperature data.
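Once a few hours have passed, a quick sanity check might look like the sketch below (run it in Explore; the entity names follow the examples above):

```sql
select value, entity_id from home_assistant.year.temperature where time > now() - 6h
```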
More on CQs can be found in the official documentation.
Display a graph upon a Continuous Query
This is easy: the data in a measurement fed by a CQ is not so different from the source data from our sensors. Move on to Grafana, select a dashboard or create a new one, and create a new panel (Add Panel->Add Query).
GROUP BY field:
ALIAS BY field:
The last one tells Grafana to use the entity_id tag from InfluxDB for legend labels, which is very convenient. Now we should see our graph, which can be embedded into a Home Assistant Lovelace card in the same way as described in the first part of this post.
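For reference, the raw InfluxQL behind such a panel might look like the sketch below. $timeFilter and $__interval are Grafana template variables substituted at query time, and $tag_entity_id in the ALIAS BY field picks up the tag value for the legend:

```sql
SELECT mean("value") FROM "year"."temperature" WHERE $timeFilter GROUP BY time($__interval), "entity_id"
```

ALIAS BY: $tag_entity_id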
telegraf is a tiny single executable file with no dependencies and a very low memory and CPU footprint, like most programs written in Go, including InfluxDB. Telegraf is responsible for collecting a huge list of metrics and delivering them to InfluxDB for storage and analysis.
With telegraf we can gather important metrics from the machines on our home network, including those not supported by HA, e.g. memory and CPU usage of Hass.io docker containers.
We specify which metrics to collect in a configuration file. Unfortunately, there is no Hass.io addon for telegraf so far, so the only way to use it is direct access to the host OS. You may choose between two options: a manual install from the Linux command line, or a docker container. We will try the second one as it looks faster and more convenient.
telegraf in a docker container
In order to use telegraf we need one of the following:
1. hassio, installed on a generic Linux server according to this guide
2. hassos with host access
3. any other PC or single-board computer with Linux and docker installed
4. the hassio portainer addon
Below is a guide which works for options 1, 2 and 3 from the list above. Open the OS prompt and make sure that docker is up and running:
user@host:~$ docker ps
You should see a list of docker containers along with their status and other auxiliary information. Next, we need to create a folder for the telegraf configuration file:
user@host:~$ mkdir -p ~/docker/telegraf && cd ~/docker/telegraf
Pull the latest version of telegraf.conf:
user@host:~$ wget https://github.com/influxdata/telegraf/raw/master/etc/telegraf.conf
Let’s create a telegraf user in InfluxDB in the same way as we did in part one, make a telegraf database (check part one), and add a retention policy day which will keep our data for one day:
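Following the same InfluxQL syntax used for the RPs above, this could look like the sketch below (the telegraf/password credentials are placeholders matching the config further down; adjust them to your setup):

```sql
CREATE USER "telegraf" WITH PASSWORD 'password'
CREATE DATABASE "telegraf"
CREATE RETENTION POLICY "day" ON "telegraf" DURATION 24h REPLICATION 1 DEFAULT
GRANT ALL ON "telegraf" TO "telegraf"
```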
Short-term storage will not let the database eat all disk space with frequently updated data like CPU load. When both the user and the DB are ready, we should edit the outputs.influxdb section of telegraf.conf as shown below:
[[outputs.influxdb]]
  urls = ["http://a0d7b954-influxdb:8086"]
  database = "telegraf"
  retention_policy = "day"
  precision = "s"
  timeout = "5s"
  username = "telegraf"
  password = "password"
Now we should jump to the inputs section. Some metrics are already activated by default, like processor, mem, and disk usage. You may leave them as they are or comment out unnecessary ones. At the same time, we want to add docker container metrics, for example, to know if a container consumes too many resources. Uncomment the inputs.docker section and edit the following lines:
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_names = []
  timeout = "5s"
Run our docker container:
user@host:~$ docker run -d --name telegraf --restart=always --net="hassio" \
    -v /etc/localtime:/etc/localtime:ro \
    -v /home/user/docker/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
    -v /var/run/docker.sock:/var/run/docker.sock \
    telegraf:latest
/home/user/docker/telegraf/telegraf.conf should point to the place where telegraf.conf is. The given command assumes that the telegraf container runs on the same machine where hassio is located, therefore the --net="hassio" option will let telegraf access other containers by their names in the hassio internal docker network. If the container runs on another machine, one should use --net="host" and use the IP or host name of the InfluxDB machine in the urls param of telegraf.conf. Let’s check the telegraf logs:
user@host:~$ docker logs telegraf
2019-05-27T20:38:55Z I! Starting Telegraf 1.10.4
2019-05-27T20:38:55Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-27T20:38:55Z I! Loaded inputs: disk diskio kernel swap system cpu mem processes docker
2019-05-27T20:38:55Z I! Loaded aggregators:
2019-05-27T20:38:55Z I! Loaded processors:
2019-05-27T20:38:55Z I! Loaded outputs: influxdb
2019-05-27T20:38:55Z I! Tags enabled: host=c85cf92dfb59
2019-05-27T20:38:55Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"c85cf92dfb59", Flush Interval:10s
Everything is fine, there are no errors; now it is time to open InfluxDB and check whether data is flowing. In Explore we can see a new database telegraf and a set of new measurements holding the metrics of our host:
Below is a sample query which can be used to build a graph of CPU usage per docker container:
SELECT usage_percent FROM "telegraf"."day"."docker_container_cpu" where time > :dashboardTime: group by container_name
Here we are using docker_container_cpu as a measurement and usage_percent as a field. This query can easily be modified by changing either the field or the measurement to other values from the Measurement & Tags and Fields columns:
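For example, a similar sketch for per-container memory consumption (the docker_container_mem measurement and its usage_percent field come from the same telegraf docker plugin):

```sql
SELECT usage_percent FROM "telegraf"."day"."docker_container_mem" WHERE time > :dashboardTime: GROUP BY container_name
```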
Now let’s try it in Grafana. First of all, we need to set up an additional data source for our telegraf DB in the same way we did in the first part for the home_assistant DB. Once the data source is ready, choose it from the Queries to combo box and make the following query:
Go to Visualization tab and choose the following options:
If you plan to keep the metrics collected by telegraf for a long time, you can create another CQ for them to reduce disk space in the same way we did before. Needless to say, the graph can be integrated into a Lovelace card as demonstrated in part one.
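As a sketch, such a CQ could downsample the CPU stats into an hourly series. This assumes a longer-lived RP (a year one here, mirroring what we created for home_assistant) exists on the telegraf database:

```sql
CREATE RETENTION POLICY "year" ON "telegraf" DURATION 52w REPLICATION 1
CREATE CONTINUOUS QUERY "cq_1h_docker_cpu" ON "telegraf" BEGIN SELECT mean("usage_percent") AS usage_percent INTO "year"."docker_container_cpu" FROM "day"."docker_container_cpu" GROUP BY time(1h), container_name END
```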
Sources supported by telegraf
The full list of telegraf plugins is given here.
Nerd bonus: InfluxQL queries from the command line (CLI)
Most InfluxQL queries can be executed from the InfluxDB Web UI called Chronograf. Sometimes this may be inconvenient, especially when you need to copy query results somewhere else.
Fortunately, you may run InfluxQL queries straight from the command line! The only trick is to run the influx command line utility from within the Hass.io docker container running InfluxDB. This can be achieved in two ways:
- Using OS host command line (when Hass.io was installed on a generic Linux server by a script)
- Using hassio portainer addon (when you don’t have access to host OS, e.g. hassos)
We will consider the first option. You may try the Portainer container console for that, but the web console is quite buggy: it does not support copy and paste and handles queries that do not fit into one line really badly, so it cannot be recommended. Still, you can try it to check whether this is what you need, as it’s way easier than obtaining host access on a hassos machine.
Run from the host command line
user@host:~$ docker exec -it addon_a0d7b954_influxdb influx -precision rfc3339
Connected to http://localhost:8086 version 1.7.2
InfluxDB shell version: 1.7.2
Enter an InfluxQL query
>
We should authenticate and choose a database to work with:
> auth
username: homeassistant
password:
> use home_assistant
Using database home_assistant
An important difference from Chronograf is that you don’t have to specify the DB in every query, as it was already selected by the use command. Let’s get the list of measurements in our DB:
> show measurements
name: measurements
name
----
binary
downsampled_temp
energy
humidity
temperature
The list of field names:
> show field keys
name: binary
fieldKey                fieldType
--------                ---------
No motion since         float
battery_level           float
device_class_str        string
friendly_name_str       string
state                   string
value                   float

name: energy
fieldKey                fieldType
--------                ---------
friendly_name_str       string
icon_str                string
unit_of_measurement_str string
value                   float

name: humidity
fieldKey                fieldType
--------                ---------
battery_level           float
device_class_str        string
friendly_name_str       string
unit_of_measurement_str string
value                   float

name: temperature
fieldKey                fieldType
--------                ---------
battery_level           float
device_class_str        string
friendly_name_str       string
unit_of_measurement_str string
value                   float
Get the data from a measurement:
> select value, battery_level, entity_id from autogen.temperature WHERE time > now() - 3h
name: temperature
time                           value battery_level entity_id
----                           ----- ------------- ---------
2019-05-28T13:37:05.37566592Z  17                  yandex_weather_apparent_temperature
2019-05-28T14:37:06.288192Z    18                  yandex_weather_temperature
2019-05-28T15:37:05.333918976Z 17                  yandex_weather_temperature
2019-05-28T15:37:05.36456192Z  14                  yandex_weather_apparent_temperature
In the example above we get the values of the fields named value and battery_level, as well as the value of the entity_id tag. We also narrow our results down to the last 3 hours of data.
Please note that the command line client does not automatically convert time to the local time zone as Chronograf does, so the timestamps are in UTC. As my time zone is UTC+3, the local time of the last record should be read as 18:37:05.
We can use another retention policy:
> select value, entity_id from year.temperature
name: temperature
time                 value entity_id
----                 ----- ---------
2019-05-21T21:00:00Z 25.3  bedroom_temperature
2019-05-21T21:00:00Z 12    yandex_weather_apparent_temperature
2019-05-21T21:00:00Z 13    yandex_weather_temperature
2019-05-21T22:00:00Z 25.4  bedroom_temperature
2019-05-21T22:00:00Z 12.5  yandex_weather_apparent_temperature
2019-05-21T22:00:00Z 13.5  yandex_weather_temperature
2019-05-22T00:00:00Z 25.5  bedroom_temperature
2019-05-22T00:00:00Z 11    yandex_weather_apparent_temperature
2019-05-22T00:00:00Z 12    yandex_weather_temperature
Full specs of DB schema exploration commands, as well as InfluxQL syntax description, are given here.
In this second part, we learned how to create an arbitrary number of our own Continuous Queries, as well as measurements to hold the aggregated data forever. We also managed to set up the telegraf agent and got insights into the performance characteristics of our hardware. If I ever come up with a part 3, we will discuss how to set up centralized logging for most of the devices in your home network, use it to detect anomalies, and report errors via different communication channels. Stay tuned.