Alright, so I use lm-sensors compiled with the sensord USE flag on gentoo. this enables logging logged temps to an rrd file
I've reconfigured sensord daemon with sysvinit to boot with these flags (with /etc/conf.d/sensord):
SENSORD_OPTS="--rrd-file=/var/log/sensord.rrd --rrd-interval=1s --load-average"
I do this for a reason. One day I'll document in under [[framework sensor instrumentation]]
I'm also running a graph on my machine that's querying this file /var/log/sensord.rrd regularly.
SOMETIMES sensord just stops working. I've documented this at the bug ticket (TODO: insert bug ticket)
the log file is cryptic and forever growing. I've been having trouble reproducing it.
I'm doing a deep dive today
Useful technologies:
- i3 for multi-window view
- a mad amount of
bashscripting, combining for loops,watch, etc rrdtoollslocksmoreutilsflock(or not lol)lockrungccw3mbecause firefox is currently ramping up my fan to points where it chokes out. also just better for focus (working on it :P)- bash vim mode
forkstat, didn't end up being useful but still cool to seeps,pstree,pgrep, etcinotifywatchstracedebugedit
A spattering of my bash history:
echo -n 'TIME '; lslocks | head -n 1; while true; do lslocks | grep -i rrd | ts '%T' ; done:
TIME COMMAND PID TYPE SIZE MODE M START END PATH
15:33:40 rrdtool 16146 POSIX READ 0 0 0 /...
15:33:51 rrdtool 19004 POSIX 11.9M READ 0 0 0 /home/msirabella/proj/framework_thermal_instrumentation/gpu.rrd
15:33:56 rrdupdate 23645 POSIX WRITE 0 0 0 /...
15:33:59 rrdupdate 25561 POSIX 46.1M WRITE 0 0 0 /var/log/sensord.rrd
15:34:01 rrdupdate 27558 POSIX WRITE 0 0 0 /...
15:34:03 rrdtool 28883 POSIX READ 0 0 0 /...
watch -w -n 0.1 rrdtool fetch --start 'now-${LINES}s' /var/log/sensord.rrd AVERAGEwatch -n 0.1 'tail -n 300 syslog | grep "$(date -d "now - 1 second" +%T)"'flock -s /var/log/sensord.rrd sleep infinitywhile true; do rrdupdate "/var/log/sensord.rrd" "N:53.0:53.0:53.0:53.0:53.0:15.76:40.0:35.9:45.8:0.540000"; done- LOTS of
/etc/init.d/sensord restart tail -f /var/log/syslogtput rmamapparently (TIL)strace -ff --trace=fcntlUSE=debug FEATURES="nostrip installsources" CFLAGS="-ggdb3" emerge rrdtool
Oh yeah, this is what my log dir looks like:
-rw-r--r-- 1 root root 138K Nov 2 23:34 /var/log/syslog.10.gz
-rw-r--r-- 1 root root 328K Nov 12 22:22 /var/log/syslog.9.gz
-rw-r--r-- 1 root root 118K Nov 12 23:43 /var/log/syslog.8.gz
-rw-r--r-- 1 root root 121K Nov 13 01:05 /var/log/syslog.7.gz
-rw-r--r-- 1 root root 121K Nov 13 02:26 /var/log/syslog.6.gz
-rw-r--r-- 1 root root 121K Nov 13 03:48 /var/log/syslog.5.gz
-rw-r--r-- 1 root root 123K Nov 13 05:11 /var/log/syslog.4.gz
-rw-r--r-- 1 root root 121K Nov 13 06:31 /var/log/syslog.3.gz
-rw-r--r-- 1 root root 122K Nov 13 07:53 /var/log/syslog.2.gz
-rw-r--r-- 1 root root 121K Nov 13 09:14 /var/log/syslog.1.gz
-rw-r--r-- 1 root root 9.6M Nov 13 13:35 /var/log/syslog.0
-rw-r--r-- 1 root root 4.2M Nov 13 15:35 /var/log/syslog
Thanks, sensord.
Nov 13 15:35:27 Ne0Ridl3y sensord: Error updating RRD file: '"sensord" "/var/log/sensord.rrd" "N:55.0:52.0:52.0:55.0:51.0:16.76:41.0:46.9:59.8:1.860000"' /var/log/sensord.rr
d: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord
.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sens
ord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/s
ensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/lo
g/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var
/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log
Nov 13 15:35:27 Ne0Ridl3y sensord: rrd update error (-1)
Nov 13 15:35:28 Ne0Ridl3y sensord: Error updating RRD file: '"sensord" "/var/log/sensord.rrd" "N:54.0:50.0:53.0:51.0:51.0:16.90:41.0:46.9:54.8:1.790000"' /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log
I had previously patched my sensord to have slightly better logging for this particular issue:
$ cat /etc/portage/patches/sys-apps/lm-sensors/rrd-debug.patch
--- a/prog/sensord/rrd.c
+++ b/prog/sensord/rrd.c 2024-10-19 14:18:57.576659193 -0700
@@ -456,8 +456,8 @@
"sensord", sensord_args.rrdFile, rrdBuff, NULL
};
if ((ret = rrd_update(3, (char **) /* WEAK */ argv))) {
- sensorLog(LOG_ERR, "Error updating RRD file: %s: %s",
- sensord_args.rrdFile, rrd_get_error());
+ sensorLog(LOG_ERR, "Error updating RRD file: '\"%s\" \"%s\" \"%s\"' %s: %s",
+ "sensord", sensord_args.rrdFile, rrdBuff, sensord_args.rrdFile, rrd_get_error());
}
}
sensorLog(LOG_DEBUG, "sensor rrd updated");
Shoutout links:
- https://gavv.net/articles/file-locks/
- https://wiki.gentoo.org/wiki/Debugging#Installing_debugging_information_for_packages
TIME COMMAND PID TYPE SIZE MODE M START END PATH
15:33:40 rrdtool 16146 POSIX READ 0 0 0 /...
15:33:51 rrdtool 19004 POSIX 11.9M READ 0 0 0 /home/msirabella/proj/framework_thermal_instrumentation/gpu.rrd
15:33:56 rrdupdate 23645 POSIX WRITE 0 0 0 /...
15:33:59 rrdupdate 25561 POSIX 46.1M WRITE 0 0 0 /var/log/sensord.rrd
15:34:01 rrdupdate 27558 POSIX WRITE 0 0 0 /...
15:34:03 rrdtool 28883 POSIX READ 0 0 0 /...
TIL that there are multiple kinds of file locks on linux. props to lslocks for making it very clear the kind of locks that are possible / distinguishing them for me.
I started getting failures with lockrun -L /var/log/sensord.rrd -- sleep 1, but this is wierd since nothing else has been making read locks on the file except maybe sensord iteself.
oh, also. the above while true; do lslocks command sucks because it's just hoping that the loop hits when the file is locked. which apparently NEVER happens. was looking into hooking intofiyd into /proc/locks or something.
I then compiled my own fcntl tooling because APPARENTLY there's no bash tooling to hook into it really other than lockrun which has a purpose rather than an exploratory thing.
(mostly pulled from the gavv.net post)
The juice:
FILE *fp = fopen("/var/log/sensord.rrd", "a+");
int fd = fileno(fp);
struct flock fl;
memset(&fl, 0, sizeof(fl));
// lock in shared mode
fl.l_type = F_RDLCK;
// lock entire file
fl.l_whence = SEEK_SET; // offset base is start of the file
fl.l_start = 0; // starting offset is zero
fl.l_len = 0; // len is zero, which is a special value representing end
// of file (no matter how large the file grows in future)
fl.l_pid = 0; // F_SETLK(W) ignores it; F_OFD_SETLK(W) requires it to be zero
// F_SETLKW specifies blocking mode
if (fcntl(fd, F_SETLKW, &fl) == -1) {
exit(1);
}
usleep(100000);
fl.l_type = F_UNLCK;
if (fcntl(fd, F_SETLK, &fl) == -1) {
exit(1);
}
close(fd);
exit(0);
I've also pulled out the guts of sensord to really narrow down where the issue is:
#include <rrd.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
char * rrdFile = "/var/log/sensord.rrd";
while (1) {
char * rrdBuff = "N:53.0:53.0:53.0:53.0:53.0:15.76:40.0:35.9:45.8:0.540000";
const char *argv[] = {
"sensord", rrdFile, rrdBuff, NULL
};
int ret = rrd_update(3, (char **) /* WEAK */ argv);
printf("%i\n", ret);
printf("Error updating RRD file: %s: %s\n", rrd_get_error());
sleep(1);
};
}
$ ./a.out
Error updating RRD file: :
0
Error updating RRD file: :
0
Error updating RRD file: :
0
Error updating RRD file: :
-1
Error updating RRD file: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: /var/log/sensord.rrd: could not lock RRD:
-1
Error updating RRD file: /var/log/sensord.rrd: /var/log/sensord
Yep, cool. looks like this is really an rrd problem. you'de think rrd_updates would be insolated between calls.
Updating rrd just to make sure I'm not being stupid and this has been a solved problem....
the solution was a missing rrd_error_clear() in sensord. who woulda thunk.
--- lm-sensors-3.6.2.tar.gz/prog/sensord/rrd.c
+++ ./prog/sensord/rrd.c
@@ -458,6 +458,7 @@
if ((ret = rrd_update(3, /* WEAK */ argv))) {
sensorLog(LOG_ERR, "Error updating RRD file: %s: %s",
sensord_args.rrdFile, rrd_get_error());
+ rrd_clear_error();
}
}
sensorLog(LOG_DEBUG, "sensor rrd updated");
Update: apparently setting RRD_LOCKING=block in the sensord environment will also fix this.
I forget who I heard this from, maybe it came to me in a dream?
some tooling I want to make:
- better lock experimental tooling.
flockis pretty good but only does one thing. flock withfcntl.- (Update: apparently since commit 8a0dc11 flock also supports fcntl, just hasn't been released)
lslocksis cool except it doesn't watch