Don't leave dangling pointers to timer queue entries when they are
freed in the scheduler finalization in case some code tried to remove
a timer later.
Fixes: 6ea1082a72 ("sched: free timer blocks on exit")
Wait for four consecutive source selections giving a bad status
(falseticker, bad distance or jittery) before triggering the source
replacement. This should reduce the rate of unnecessary replacements
and shorten the time needed to find a solution when unreplaceable
falsetickers are preventing other sources from forming a majority due
to switching back and forth to unreachable servers.
Instead of waiting for the next update of reachability, trigger
replacement of falsetickers, jittery and distant sources as soon as
the selection status is updated in their SRC_SelectSource() call.
When starting the daemon, wait in the grandparent process for the parent
process to terminate before exiting to avoid systemd logging a warning
"Supervising process $PID which is not our child". Waiting for the pipe
to be closed by the kernel when the parent process exits is not
sufficient.
Reported-by: Jan Pazdziora <jpazdziora@redhat.com>
Replacement attempts are globally rate limited to one per 7*2^8 seconds
to limit the rate of DNS requests for public servers like pool.ntp.org.
If multiple sources are repeatedly attempting replacement (at their
polling intervals), one source can be getting all attempts for periods
of time.
Use a randomly generated interval to randomize the order of source
replacements without changing the average rate.
The clang memory sanitizer seems to trigger on an uninitialized value
passed to format_name() when the source is a refclock, even though the
value is not used for anything. Pass 0 in this case to avoid the error.
valgrind 3.21.0 reports realloc() of 0 bytes as an error due to having
different behavior on different systems. The only place where this can
happen in chrony is the array, which doesn't care what value realloc()
returns.
Modify the realloc wrapper to call free() in this case to make valgrind
happy.
Don't load keys and cookies from the client's dump file if it has an
unsupported algorithm and unparseable keys (matching the algorithm's
expected length of zero). They would fail all SIV operations and trigger
new NTS-KE session.
Initialize the unused part of shorter server NTS keys (AES-128-GCM-SIV)
loaded from ntsdumpdir to avoid sending uninitialized data in requests
to the NTS-KE helper process.
Do that also for newly generated keys in case the memory will be
allocated dynamically.
Fixes: b1230efac3 ("nts: add support for encrypting cookies with AES-128-GCM-SIV")
If the resolver orders addresses by IP family, there is more than one
address in the preferred IP family, and they are all reachable, but
not selectable (e.g. falsetickers in a small pool which cannot remove
them from DNS), chronyd is unable to switch to addresses in the other IP
family as it follows the resolver's order.
Enable randomization of the address selection for all source
replacements and not just replacement of (unreachable) tentative
sources. If the system doesn't have connectivity in the other family,
the addresses will be skipped and no change in behavior should be
observed.
The polltarget value is used in a floating-point division in the
calculation of the poll adjustment. Set 1 as the minimum accepted
polltarget value to avoid working with infinite values.
Set the polling interval to minpoll when changing address of a source,
but only if it is reachable to avoid increasing load on server or
network in case that is the reason for the source being unreachable.
This shortens the time needed to replace a falseticker or
unsynchronized source with a selectable source.
When the refresh command is issued, instead of trying to replace all
NTP sources as if they were unreachable or falsetickers, keep using the
current address if it is still returned by the resolver for the name.
This avoids unnecessary loss of measurements and switching to
potentially unreachable addresses.
Rework handling of late HW TX timestamps. Instead of suspending reading
from client-only sockets that have HW TX timestamping enabled, save the
whole response if it is valid and a HW TX timestamp was received for the
source before. When the timestamp is received, or the configurable
timeout is reached, process the saved response again, but skip the
authentication test as the NTS code allows only one response per
request. Only one valid response per source can be saved. If a second
valid response is received while waiting for the timestamp, process both
responses immediately in the order they were received.
The main advantage of this approach is that it works on all sockets, i.e.
even in the symmetric mode and with NTP-over-PTP, and the kernel does
not need to buffer invalid responses.
Previously, in the calculation of the next transmission time
corresponding to the current polling interval, the reference point was
the current time in the client mode (i.e. the time when the response is
processed) and the last transmission time in the symmetric mode.
Rework the code to use the last transmission in both modes and make it
independent from the time when the response is processed to avoid extra
delays due to waiting for HW TX timestamps.
Update the serverstats response to use the new 64-bit integers.
Don't define a new value for the response as it already had an
incompatible change since the latest release (new fields added for
timestamp counters).
Add a structure for 64-bit integers without requiring 64-bit alignment
to be usable in CMD_Reply without struct packing.
Add utility functions for conversion to/from network order. Avoid using
be64toh() and htobe64() as they don't seem to be available on all
supported systems.
Count served timestamps in all combinations of RX/TX and
daemon/kernel/hardware. Repurpose CLG_LogAuthNtpRequest() to update all
NTP-specific stats in one call per accepted request and response.
After 5f4cbaab7e ("ntp: optimize detection of clients using
interleaved mode") the local TX timestamp is saved for all requests
indicating interleaved mode even when no previous RX timestamp is found.
Specify maxpoll for HW timestamping (default minpoll + 1) to track the
PHC well even when there is little NTP traffic on the interface. After
each PHC reading schedule a timeout according to the maxpoll. Polling
between minpoll and maxpoll is still triggered by HW timestamps.
Wait for the first HW timestamp before adding the timeout to avoid
polling PHCs on interfaces that are enabled in the configuration but
not used for NTP. Add a new scheduling class to separate polling of
different PHCs to avoid too long intervals between processing I/O
events.