The commit c43efccf02 ("sources: update source selection with
unreachable sources") caused a high rate of failures in the
148-replacement test (1 falseticker vs 2 unreachable sources). This was
due to a larger fraction of the replacement attempts being made for the
source incorrectly marked as a falseticker instead of the second
unreachable source and the random process needed more time to get to the
expected state with both unreachable sources replaced.
When updating reachability of an unreachable source, try to request the
replacement of the source before calling the source selection, where
other sources may be replaced, to better balance the different
replacement attempts.
Add ptpdomain directive to set the domain number of transmitted and
accepted NTP-over-PTP messages. It might need to be changed in networks
using a PTP profile with the same domain number. The default domain
number of 123 follows the current NTP-over-PTP specification.
When an NTP source is specified with the offset option, the corrected
offset may get outside of the supported NTP interval (by default -50..86
years around the build date). If the source passed the source selection,
the offset would be rejected only later in the adjustment of the local
clock.
Check the offset validity as part of the NTP test A to make the source
unselectable and make it visible in the measurements log and ntpdata
report.
Allow one message about failed selection (e.g. no selectable sources)
to be logged before first successful selection when a source has
full-size reachability register (8 polls with a received or missed
response).
This should make it more obvious that chronyd has a wrong configuration
or there is a firewall/networking issue.
Add "kod" option to the ratelimit directive to respond with the KoD
RATE code to randomly selected requests exceeding the configured limit.
This complements the client support of KoD RATE. It's disabled by
default.
There can be only one KoD code in one response. If both NTS NAK and RATE
codes are triggered, drop the response. The KoD RATE code can be set in
an NTS-authenticated response.
If the reload sources command was received in the chronyd start-up
sequence with initstepslew and/or RTC init (-s option), the sources
loaded from sourcedirs caused a crash due to failed assertion after
adding sources specified in the config.
Ignore the reload sources command until chronyd enters the normal
operation mode.
Fixes: 519796de37 ("conf: add sourcedirs directive")
Use leapseclist instead of leapsectz and test also negative leap
seconds. Add a test for leapsectz when the date command indicates
right/UTC is available on the system and mktime() works as expected.
Check TAI offset in the server's log.
The presend option in interleaved mode uses two presend requests instead
of one to get an interleaved response from servers like chrony which
delay the first interleaved response due to an optimization saving
timestamps only for clients actually using the interleaved mode.
After commit 0ae6f2485b ("ntp: don't use first response in interleaved
mode") the first interleaved response following the two presend
responses in basic mode is dropped as the preferred set of timestamps
minimizing error in delay was already used by the second sample in
basic mode. There are only three responses in the burst and no sample is
accumulated.
Increasing the number of presend requests to three to get a fourth
sample would be wasteful. Instead, allow reusing timestamps of the
second presend sample in basic mode, which is never accumulated.
Reported-by: Aaron Thompson
Fixes: 0ae6f2485b ("ntp: don't use first response in interleaved mode")
If the network correction is known for both the request and response,
and their sum is not larger that the measured peer delay, allowing the
transparent clocks to be running up to 100 ppm faster than the client's
clock, apply the corrections to the NTP offset and peer delay. Don't
correct the root delay to not change the estimated maximum error.
Rename the exp1 extension field to exp_mono_root (monotonic timestamp +
root delay/dispersion) to better distinguish it from future experimental
extension fields.
Avoid frequently ending in the middle of a client/server exchange with
long delays. This changed after commit 4a11399c2e ("ntp: rework
calculation of transmit timeout").
If noselect is present in the configured options, don't assume it
cannot change and the effective options are equal. This fixes chronyc
selectopts +noselect command.
Fixes: 3877734814 ("sources: add function to modify selection options")
Refresh NTP sources specified by hostname periodically (every 2 weeks
by default) to avoid long-running instances using a server which is no
longer intended for service, even if it is still responding correctly
and would not be replaced as unreachable, and help redistributing load
in large pools like pool.ntp.org. Only one source is refreshed at a time
to not interrupt clock updates if there are multiple selectable servers.
The refresh directive configures the interval. A value of 0 disables
the periodic refreshment.
Suggested-by: Ask Bjørn Hansen <ask@develooper.com>
Wait for four consecutive source selections giving a bad status
(falseticker, bad distance or jittery) before triggering the source
replacement. This should reduce the rate of unnecessary replacements
and shorten the time needed to find a solution when unreplaceable
falsetickers are preventing other sources from forming a majority due
to switching back and forth to unreachable servers.
Instead of waiting for the next update of reachability, trigger
replacement of falsetickers, jittery and distant sources as soon as
the selection status is updated in their SRC_SelectSource() call.
Replacement attempts are globally rate limited to one per 7*2^8 seconds
to limit the rate of DNS requests for public servers like pool.ntp.org.
If multiple sources are repeatedly attempting replacement (at their
polling intervals), one source can be getting all attempts for periods
of time.
Use a randomly generated interval to randomize the order of source
replacements without changing the average rate.
Set the polling interval to minpoll when changing address of a source,
but only if it is reachable to avoid increasing load on server or
network in case that is the reason for the source being unreachable.
This shortens the time needed to replace a falseticker or
unsynchronized source with a selectable source.
When the refresh command is issued, instead of trying to replace all
NTP sources as if they were unreachable or falsetickers, keep using the
current address if it is still returned by the resolver for the name.
This avoids unnecessary loss of measurements and switching to
potentially unreachable addresses.
Specify maxpoll for HW timestamping (default minpoll + 1) to track the
PHC well even when there is little NTP traffic on the interface. After
each PHC reading schedule a timeout according to the maxpoll. Polling
between minpoll and maxpoll is still triggered by HW timestamps.
Wait for the first HW timestamp before adding the timeout to avoid
polling PHCs on interfaces that are enabled in the configuration but
not used for NTP. Add a new scheduling class to separate polling of
different PHCs to avoid too long intervals between processing I/O
events.
In a non-tty session with chronyc it is not possible to detect the
end of the response without relying on timeouts, or separate responses
to a repeated command if using the -c option.
Add -e option to end each response with a line containing a single dot.
Add a new test for maximum delay using a long-term estimate of a
p-quantile of the peer delay. If enabled, it replaces the
maxdelaydevratio test. It's main advantage is that it is not sensitive
to outliers corrupting the minimum delay.
As it can take a large number of samples for the estimate to reach the
expected value and adapt to a new value after a network change, the
option is recommended only for local networks with very short polling
intervals.
With the first interleaved response coming after a basic response the
client is forced to select the four timestamps covering most of the last
polling interval, which makes measured delay very sensitive to the
frequency offset between server and client. To avoid corrupting the
minimum delay held in sourcestats (which can cause testC failures),
reject the first interleaved response in the client/server mode as
failing the test A.
This does not change anything for the symmetric mode, where both sets of
the four timestamps generally cover a significant part of the polling
interval.
Add "local" option to specify that the reference clock is an
unsynchronized clock which is more stable than the system clock (e.g.
TCXO, OCXO, or atomic clock) and it should be used as a local standard
to stabilize the system clock.
Handle the local refclock as a PPS refclock locked to itself which gives
the unsynchronized status to be ignored in the source selection. Wait
for the refclock to get at least minsamples samples and adjust the clock
directly to follow changes in the refclock's sourcestats frequency and
offset.
There should be at most one refclock specified with this option.