Alright, so I use lm-sensors compiled with the sensord USE flag on gentoo. this enables logging logged temps to an rrd file

I've reconfigured sensord daemon with sysvinit to boot with these flags (with /etc/conf.d/sensord):

SENSORD_OPTS="--rrd-file=/var/log/sensord.rrd --rrd-interval=1s --load-average"

I do this for a reason. One day I'll document in under [[framework sensor instrumentation]]

I'm also running a graph on my machine that's querying this file /var/log/sensord.rrd regularly.

SOMETIMES sensord just stops working. I've documented this at the bug ticket (TODO: insert bug ticket)

the log file is cryptic and forever growing. I've been having trouble reproducing it.

I'm doing a deep dive today

Useful technologies:

A spattering of my bash history:

TIME     COMMAND           PID  TYPE   SIZE MODE  M      START        END PATH
15:33:40 rrdtool         16146 POSIX        READ  0          0          0 /...
15:33:51 rrdtool         19004 POSIX  11.9M READ  0          0          0 /home/msirabella/proj/framework_thermal_instrumentation/gpu.rrd
15:33:56 rrdupdate       23645 POSIX        WRITE 0          0          0 /...
15:33:59 rrdupdate       25561 POSIX  46.1M WRITE 0          0          0 /var/log/sensord.rrd
15:34:01 rrdupdate       27558 POSIX        WRITE 0          0          0 /...
15:34:03 rrdtool         28883 POSIX        READ  0          0          0 /...

Oh yeah, this is what my log dir looks like:

-rw-r--r-- 1 root root 138K Nov  2 23:34 /var/log/syslog.10.gz
-rw-r--r-- 1 root root 328K Nov 12 22:22 /var/log/syslog.9.gz
-rw-r--r-- 1 root root 118K Nov 12 23:43 /var/log/syslog.8.gz
-rw-r--r-- 1 root root 121K Nov 13 01:05 /var/log/syslog.7.gz
-rw-r--r-- 1 root root 121K Nov 13 02:26 /var/log/syslog.6.gz
-rw-r--r-- 1 root root 121K Nov 13 03:48 /var/log/syslog.5.gz
-rw-r--r-- 1 root root 123K Nov 13 05:11 /var/log/syslog.4.gz
-rw-r--r-- 1 root root 121K Nov 13 06:31 /var/log/syslog.3.gz
-rw-r--r-- 1 root root 122K Nov 13 07:53 /var/log/syslog.2.gz
-rw-r--r-- 1 root root 121K Nov 13 09:14 /var/log/syslog.1.gz
-rw-r--r-- 1 root root 9.6M Nov 13 13:35 /var/log/syslog.0
-rw-r--r-- 1 root root 4.2M Nov 13 15:35 /var/log/syslog

Thanks, sensord.

Nov 13 15:35:27 Ne0Ridl3y sensord: Error updating RRD file: '"sensord" "/var/log/sensord.rrd" "N:55.0:52.0:52.0:55.0:51.0:16.76:41.0:46.9:59.8:1.860000"' /var/log/sensord.rr
d: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord
.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sens
ord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/s
ensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/lo
g/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var
/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log
Nov 13 15:35:27 Ne0Ridl3y sensord: rrd update error (-1)
Nov 13 15:35:28 Ne0Ridl3y sensord: Error updating RRD file: '"sensord" "/var/log/sensord.rrd" "N:54.0:50.0:53.0:51.0:51.0:16.90:41.0:46.9:54.8:1.790000"' /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log

I had previously patched my sensord to have slightly better logging for this particular issue:

$ cat /etc/portage/patches/sys-apps/lm-sensors/rrd-debug.patch
--- a/prog/sensord/rrd.c
+++ b/prog/sensord/rrd.c	2024-10-19 14:18:57.576659193 -0700
@@ -456,8 +456,8 @@
 			"sensord", sensord_args.rrdFile, rrdBuff, NULL
 		};
 		if ((ret = rrd_update(3, (char **) /* WEAK */ argv))) {
-			sensorLog(LOG_ERR, "Error updating RRD file: %s: %s",
-				  sensord_args.rrdFile, rrd_get_error());
+			sensorLog(LOG_ERR, "Error updating RRD file: '\"%s\" \"%s\" \"%s\"' %s: %s",
+				  "sensord", sensord_args.rrdFile, rrdBuff, sensord_args.rrdFile, rrd_get_error());
 		}
 	}
 	sensorLog(LOG_DEBUG, "sensor rrd updated");

Shoutout links:

TIME     COMMAND           PID  TYPE   SIZE MODE  M      START        END PATH



15:33:40 rrdtool         16146 POSIX        READ  0          0          0 /...
15:33:51 rrdtool         19004 POSIX  11.9M READ  0          0          0 /home/msirabella/proj/framework_thermal_instrumentation/gpu.rrd
15:33:56 rrdupdate       23645 POSIX        WRITE 0          0          0 /...
15:33:59 rrdupdate       25561 POSIX  46.1M WRITE 0          0          0 /var/log/sensord.rrd
15:34:01 rrdupdate       27558 POSIX        WRITE 0          0          0 /...
15:34:03 rrdtool         28883 POSIX        READ  0          0          0 /...

TIL that there are multiple kinds of file locks on linux. props to lslocks for making it very clear the kind of locks that are possible / distinguishing them for me.

I started getting failures with lockrun -L /var/log/sensord.rrd -- sleep 1, but this is wierd since nothing else has been making read locks on the file except maybe sensord iteself.

oh, also. the above while true; do lslocks command sucks because it's just hoping that the loop hits when the file is locked. which apparently NEVER happens. was looking into hooking intofiyd into /proc/locks or something.

I then compiled my own fcntl tooling because APPARENTLY there's no bash tooling to hook into it really other than lockrun which has a purpose rather than an exploratory thing.

(mostly pulled from the gavv.net post)

The juice:

    FILE *fp = fopen("/var/log/sensord.rrd", "a+");
    int fd = fileno(fp);




    struct flock fl;
    memset(&fl, 0, sizeof(fl));

    // lock in shared mode
    fl.l_type = F_RDLCK;

    // lock entire file
    fl.l_whence = SEEK_SET; // offset base is start of the file
    fl.l_start = 0;         // starting offset is zero
    fl.l_len = 0;           // len is zero, which is a special value representing end
                            // of file (no matter how large the file grows in future)

    fl.l_pid = 0; // F_SETLK(W) ignores it; F_OFD_SETLK(W) requires it to be zero

    // F_SETLKW specifies blocking mode
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        exit(1);
    }

    usleep(100000);

    fl.l_type = F_UNLCK;

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        exit(1);
    }

    close(fd);
    exit(0);

I've also pulled out the guts of sensord to really narrow down where the issue is:

#include <rrd.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    char * rrdFile = "/var/log/sensord.rrd";
    while (1) {
        char * rrdBuff = "N:53.0:53.0:53.0:53.0:53.0:15.76:40.0:35.9:45.8:0.540000";
        const char *argv[] = {
            "sensord", rrdFile, rrdBuff, NULL
        };
        int ret = rrd_update(3, (char **) /* WEAK */ argv);
        printf("%i\n", ret);
        printf("Error updating RRD file: %s: %s\n", rrd_get_error());
        sleep(1);
    };
}
$ ./a.out
Error updating RRD file: :
0
Error updating RRD file: :
0
Error updating RRD file: :
0
Error updating RRD file: :
-1
Error updating RRD file: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord

Yep, cool. looks like this is really an rrd problem. you'de think rrd_updates would be insolated between calls.

Updating rrd just to make sure I'm not being stupid and this has been a solved problem....

the solution was a missing rrd_error_clear() in sensord. who woulda thunk.

--- lm-sensors-3.6.2.tar.gz/prog/sensord/rrd.c
+++ ./prog/sensord/rrd.c
@@ -458,6 +458,7 @@
 		if ((ret = rrd_update(3, /* WEAK */ argv))) {
 			sensorLog(LOG_ERR, "Error updating RRD file: %s: %s",
 				  sensord_args.rrdFile, rrd_get_error());
+			rrd_clear_error();
 		}
 	}
 	sensorLog(LOG_DEBUG, "sensor rrd updated");

Update: apparently setting RRD_LOCKING=block in the sensord environment will also fix this.

I forget who I heard this from, maybe it came to me in a dream?

some tooling I want to make: