The cap_get_bound() function and CAP_IS_SUPPORTED macro were added in
libcap-2.21. Check if the macro is defined before use.
The sys/capability.h header from libcap-2.16 and earlier disables the
linux/types.h header, which breaks the linux/ptp_clock.h header. Change
the order to include sys/capability.h as the last system header.
In the next Linux version the recvmmsg() system call will be probably
fixed to not return socket errors (e.g. due to ICMP) when reading from
the error queue.
The NTP I/O code assumed this was the correct behavior. When the system
call is fixed, a socket error on a client socket will cause chronyd to
enter a busy loop consuming the CPU until the receive timeout is reached
(8 seconds by default).
Use getsockopt(SO_ERROR) to clear the socket error when reading from the
error queue failed.
Instead of having adjtimex just fail with a permission issue
improve the error messaging by warning for the lack of
CAP_SYS_TIME on SYS_Linux_Initialise.
Message will look like (instead of only the latter message):
CAP_SYS_TIME not present
adjtimex(0x8001) failed : Operation not permitted
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
When dropping the root privileges, don't try to keep the CAP_SYS_TIME
capability if the -x option was enabled. This allows chronyd to be
started without the capability (e.g. in containers) and also drop the
root privileges.
Programming pins for external PHC timestamping was added in Linux 3.15,
but the PHC subsystem is older than that. Compile the programming code
only when the ioctl is defined.
When processing data from the PTP_SYS_OFFSET ioctl, the sample is
dropped when an interval between two consecutive readings of the system
clock is negative or zero, assuming the clock has been stepped between
the two readings.
With a real PHC the interval is normally expected to be at least a
microsecond, but with a virtual PHC and a low-resolution system clock
it's possible to get two readings with the same system time. Modify the
check to drop only samples with a negative delay.
It was never used for anything and messages in debug output already
include filenames, which can be easily grepped if there is a need
to see log messages only from a particular file.
Enable SCM_TIMESTAMPING control messages and the socket's error queue in
order to receive our transmitted packets with a more accurate transmit
timestamp. Add a new file for Linux-specific NTP I/O and implement
processing of these messages there.
Enable the PRV_Name2IPAddress() function with seccomp support and start
the helper process before loading the seccomp filter (but after dropping
root privileges). This will move the getaddrinfo() call outside the
seccomp filter and should make it more reliable as the list of required
system calls won't depend on what glibc NSS modules are used on the
system.
The system drivers may implement their own slewing which the generic
driver can use to slew faster than the maximum frequency the driver is
allowed to set directly.
Remove functions that are included in the new timex driver. Keep only
functions that have extended functionality, i.e. read and set the
frequency using the timex tick field and apply step offset with
ADJ_SETOFFSET.
Merge the code from wrap_adjtimex.c that is still needed with
sys_linux.c and remove the file.
The Linux secure computing (seccomp) facility allows a process to
install a filter in the kernel that will allow only specific system
calls to be made. The process is killed when trying to make other system
calls. This is useful to reduce the kernel attack surface and possibly
prevent kernel exploits when the process is compromised.
Use the libseccomp library to add rules and load the filter into the
kernel. Keep a list of system calls that are always allowed after
chronyd is initialized. Restrict arguments that may be passed to the
socket(), setsockopt(), fcntl(), and ioctl() system calls. Arguments
to socketcall(), which is used on some architectures as a multiplexer
instead of separate socket system calls, are not restricted for now.
The mailonchange directive is not allowed as it calls sendmail.
Calls made by the libraries that chronyd is using have to be covered
too. It's difficult to determine which system calls they need as it may
change after an upgrade and it may depend on their configuration (e.g.
resolver in libc). There are also differences between architectures. It
can all break very easily and is therefore disabled by default. It can
be enabled with the new -F option.
This is based on a patch from Andrew Griffiths <agriffit@redhat.com>.
The optimization avoiding unnecessary setting of the kernel leap status
can cause a problem when something outside chronyd sets the status to
the new expected value. There will be no TMX_SetLeap() call which would
update the saved status and the kernel status will be overwritten with
the old (incorrect) value in a later TMX_*() call.
Always call TMX_SetLeap() to save the new value and for the log message
selection just check if a leap second has been applied.
When a leap second is applied by the kernel, it doesn't actually clear
the STA_INS|STA_DEL bits from the status word, but the state returned
by ntp_adjtime()/adjtimex() is TIME_WAIT until the application clears
the bits.
Add "System clock status reset after leap second" log message for this
case.
Different systems may consider different time values to be valid.
Don't exit on settimeofday()/adjtimex() error in case the check in
UTI_IsTimeOffsetSane() isn't restrictive enough.
This will be used to set the kernel adjtimex() variables to allow other
applications running on the system to know if the system clock is
synchronized and the estimated error and the maximum error.
Since the kernel USER_HZ constant was introduced and the internal HZ
can't be reliably detected in user-space, the frequency scaling constant
used with older kernels is just a random guess.
Remove the scaling completely and let the closed loop compensate for the
error. To prevent thrashing between two states when the system's
frequency error is close to a multiple of USER_HZ, stick to the current
tick value if it's next to the new required tick. This is used only on
archs where USER_HZ is 100 as the frequency adjustment is limited to 500
ppm.
The linux_hz and linux_freq_scale directives are no longer supported,
but allowed by the config parser.
Strip all slewing code (adjtime(), freq locked nano PLL, fast tick
slewing) from the Linux driver and use the new generic frequency only
slewing instead. The advantages include stable clock control with very
short update intervals, good control of the slewing frequency, cheap
cooking of raw time stamps and unlimited frequency offset.
The kernel currently doesn't support a linear adjustment with
programmable rate, extend the use of the kernel PLL with locked
frequency instead.
Set the PLL time constant according to the correction time corresponding
to the correction rate and corrected offset.
On kernels with nano PLL adjtime() is no longer used.
We want to correct the offset quickly, but we also want to keep the
frequency error caused by the correction itself low.
Define correction rate as the area of the region bounded by the graph of
offset corrected in time. Set the rate so that the time needed to correct
an offset equal to the current sourcestats stddev will be equal to the
update interval (assuming linear adjustment). The offset and the
time needed to make the correction are inversely proportional.
This is only a suggestion and it's up to the system driver how the
adjustment will be executed.
This is needed to keep sourcestats accurate when the actual frequency is
different from the requested frequency due to clamping (or possibly
rounding in future system drivers).