The commit c43efccf02 ("sources: update source selection with
unreachable sources") caused a high rate of failures in the
148-replacement test (1 falseticker vs 2 unreachable sources). This was
due to a larger fraction of the replacement attempts being made for the
source incorrectly marked as a falseticker instead of the second
unreachable source and the random process needed more time to get to the
expected state with both unreachable sources replaced.
When updating reachability of an unreachable source, try to request the
replacement of the source before calling the source selection, where
other sources may be replaced, to better balance the different
replacement attempts.
Allow one message about failed selection (e.g. no selectable sources)
to be logged before first successful selection when a source has
full-size reachability register (8 polls with a received or missed
response).
This should make it more obvious that chronyd has a wrong configuration
or there is a firewall/networking issue.
When updating the reachability register of a source with zero, call the
source selection even if the source is not the currently selected as the
best source. But do that only if all reachability bits are zero, i.e.
there was no synchronized response for last 8 polls.
This will enable the source selection to log a message when only
unreachable sources are updating reachability and it decreases the
number of unnecessary source selections.
In the source selection, check for the unsynchronized leap status after
getting sourcestats data. The unsynchronized source status is supposed
to indicate an unsynchronized source that is providing samples, not a
source which doesn't have any samples.
Also, fix the comment describing the status.
Fixes: 4c29f8888c ("sources: handle unsynchronized sources in selection")
The commit 5dd288dc0c ("sources: reselect earlier when removing
selected source") didn't cover all paths that can lead to a missing log
message when all sources are removed.
Add a flag to track the loss of selection and postpone the log message
in transient states where no message is logged to avoid spamming in
normal operation. Call SRC_SelectSource() after removing the source
to get a log message if there are no (selectable) sources left.
Reported-by: Thomas Lange <thomas@corelatus.se>
With forced reselection during source removal selected_source_index
can only be INVALID_SOURCE if there are no sources. The "Can't
synchronise: no sources" message couldn't be logged even before that as
SRC_ReselectSource() resets the index before calling SRC_SelectSource().
Replace the message with an assertion.
When a selected source is being removed, reset the instance and rerun
the selection while the source is still marked as selected. This forces
a "Can't synchronise" message to be logged when all sources are removed.
Reported-by: Thomas Lange <thomas@corelatus.se>
If noselect is present in the configured options, don't assume it
cannot change and the effective options are equal. This fixes chronyc
selectopts +noselect command.
Fixes: 3877734814 ("sources: add function to modify selection options")
Wait for four consecutive source selections giving a bad status
(falseticker, bad distance or jittery) before triggering the source
replacement. This should reduce the rate of unnecessary replacements
and shorten the time needed to find a solution when unreplaceable
falsetickers are preventing other sources from forming a majority due
to switching back and forth to unreachable servers.
Instead of waiting for the next update of reachability, trigger
replacement of falsetickers, jittery and distant sources as soon as
the selection status is updated in their SRC_SelectSource() call.
Log a warning message for each detected falseticker, but only once
between changes in the selection of the best source. Don't print all
sources when no majority is reached as that case has its own warning
message.
Add a function to add new selection options or remove existing options
specified in the configuration for both NTP sources and reference
clocks.
Provide a pair of IP address and reference ID to identify the source
depending on the type. Find the source directly in the array of sources
instead of going through the NSR hashtable for NTP sources to not
complicate it unnecessarily.
Log important changes from chronyc for auditing purposes.
Add log messages for:
- loaded symmetric keys and server NTS keys (logged also on start)
- modified maxupdateskew and makestep
- enabled/disabled local reference mode (logged also on start)
- reset time smoothing (logged also on clock steps)
- reset sources
Allow sources to accumulate samples with the leap status set to not
synchronized. Define a new state for them to be ignored in the
selection. This is intended for sources that are never synchronized and
will be used only for stabilization.
Don't print the original IP address in parentheses in the "Selected
source ..." message if it is identical to the current address. That is
expected to be the usual case for sources specified by IP address.
For reference clocks, which don't have a name, print "." instead of
NULL.
Fixes: f8610d69f0 ("sources: improve handling of dump files and their format")
After loading the dump files with the -r option, immediately perform a
source selection with forced setting of the reference. This shortens the
interval when a restarted server doesn't respond with synchronized time.
It no longer needs to wait for the first measurement from the best
source (which had to pass all the filters).
Check for write errors when saving dump files. Don't save files with no
samples. Add more sanity checks for loaded data.
Extend the file format to include an identifier, the reachability
register, leap status, name, and authentication flag. Avoid loading
unauthenticated data after switching authentication on. Change format
and order of some fields to simplify parsing. Drop fields that were kept
only for compatibility.
The dump files now contain all information needed to perform the source
selection and update the reference.
There is no support kept for the old file format. Loading of old dump
files will fail after upgrading to new version.
Remove stratum from the NTP sample and update it together with the leap
status. This enables a faster update when samples are dropped by the NTP
filters.
For sources specified by an IP address, keep the original address as the
source's name and pass it to the NCR instance. Allow the sources to go
through the replacement process if their address has changed.
This will be useful with NTS-KE negotiation.
The IP-based source names are now provided via cmdmon. This means
chronyc -n and -N can show two different addresses for a source.
Show untrusted sources with the '?' symbol instead of '-' to make them
consistent with not selectable and selectable sources in the selectdata
description.
Instead of making the initial burst only once and immediately after
chronyd start (even when iburst is specified together with the offline
option), trigger the burst whenever the connectivity changes from
offline to online.
This is not expected to happen, but make sure the endpoints of each
source are in the right order (i.e. the distance is not negative) to
prevent getting a negative depth in the selection.
Handle trusted sources as a separate set of sources which is required to
have a majority for the selection to proceed. This should improve the
selection with multiple trusted sources (e.g. due to the auth selection
mode).
When the selection has some trusted sources, don't require non-trusted
sources to be contained in the best interval as that can usually pass
only one source if the best interval is the interval of the source, or
no source at all if the best interval is an intersection of multiple
sources.
Relax the requirement for non-trusted sources to be contained in the
best interval of trusted sources alone instead of all sources in the
trusted interval.
The selection options returned as flags are not reported by the
client and will be better reported in a separate command with other
selection-specific data.
When authentication is enabled for an NTP source, unauthenticated NTP
sources need to be disabled or limited in selection. That might be
difficult to do when the configuration comes from different sources
(e.g. networking scripts adding servers from DHCP).
Define four modes for the source selection to consider authentication:
require, prefer, mix, ignore. In different modes different selection
options (require, trust, noselect) are added to authenticated and
unauthenticated sources.
The mode can be selected by the authselectmode directive. The mix mode
is the default. The ignore mode enables the old behavior, where all
sources are used exactly as specified in the configuration.