In glibc 2.36 was added the arc4random family of functions. However,
unlike on other supported systems, it is not a user-space PRNG
implementation. It just wraps the getrandom() system call with no
buffering, which causes a performance loss on NTP servers due to
the function being called twice for each response to add randomness
to the RX and TX timestamp below the clock precision.
Don't check for arc4random on Linux to keep using the buffered
getrandom().
Replace NULL in test code of functions which have (at least in glibc) or
could have arguments marked as nonnull to avoid the -Wnonnull warnings,
which breaks the detection with the -Werror option.
If the randomly generated timestamps are close to the current time, the
source can be selected for synchronization, which causes a crash when
logging the source name due to uninitialized ntp_sources.
Specify the source with the noselect option to prevent selection.
Call the function with current time instead of latest sample of the
first source to avoid undefined conversion of negative double to long
int.
Fixes: 07600cbd71 ("test: extend sources unit test")
Add a new test for maximum delay using a long-term estimate of a
p-quantile of the peer delay. If enabled, it replaces the
maxdelaydevratio test. It's main advantage is that it is not sensitive
to outliers corrupting the minimum delay.
As it can take a large number of samples for the estimate to reach the
expected value and adapt to a new value after a network change, the
option is recommended only for local networks with very short polling
intervals.
Instead of waiting for the sample filter to accumulate the specified
number of samples and then deciding if the result is acceptable, count
missing samples and get the result after the specified number of polls.
This should work better when samples are dropped at a high rate. The
source and clock update interval will be stable as long as at least
one sample can be collected.
When the minimum round-trip time is checked to enable a sub-second
polling interval, consider also the last sample in the filter to avoid
waiting for the first sample to be accumulated in sourcestats.
If a sub-second polling interval is configured, initialize the local
poll to 0 to avoid a shorter interval between the first and second
request in case no response to the first request is received (in time).
Return with an error code from chronyc if the command is expected to
print some data and fflush() or ferror() indicates an error. This should
make it easier for scripts to detect missing data when redirected to a
file.
Filtering was moved to a separate source file in commit
c498c21fad ("refclock: split off median filter). It looks like
MedianFilter struct somehow survived the split. Remove it to reduce
confusion.
With the first interleaved response coming after a basic response the
client is forced to select the four timestamps covering most of the last
polling interval, which makes measured delay very sensitive to the
frequency offset between server and client. To avoid corrupting the
minimum delay held in sourcestats (which can cause testC failures),
reject the first interleaved response in the client/server mode as
failing the test A.
This does not change anything for the symmetric mode, where both sets of
the four timestamps generally cover a significant part of the polling
interval.
If the computer is overloaded so much that chronyd cannot stop a slew
within one second of the scheduled end and the actual duration is more
than doubled (2 seconds with the minimum duration of 1 second), the
overshoot will be larger than the intended correction. If these
conditions persist, the oscillation will grow up to the maximum offset
allowed by maxslewrate and the delay in stopping.
Monitor the excess duration as an exponentially decaying maximum value
and don't allow any slews shorter than 5 times the value to damp the
oscillation. Ignore delays longer than 100 seconds, assuming they have a
different cause (e.g. the system was suspended and resumed) and are
already handled in the scheduler by triggering cancellation of the
ongoing slew.
This should also make it safer to shorten the minimum duration if
needed.
Reported-by: Daniel Franke <dff@amazon.com>
Estimate the 1st and 2nd 10-quantile of the reading delay and accept
only readings between them unless the error of the offset predicted from
previous samples is larger than the minimum reading error. With the 25
PHC readings per ioctl it should combine about 2-3 readings.
This should improve hwclock tracking and synchronization stability when
a PHC reading delay occasionally falls below the normal expected
minimum, or all readings in the batch are delayed significantly (e.g.
due to high PCIe load).
Add estimation of quantiles using the Frugal-2U streaming algorithm
(https://arxiv.org/pdf/1407.1121v1.pdf). It does not need to save
previous samples and adapts to changes in the distribution.
Allow multiple estimates of the same quantile and select the median for
better stability.
Move processing of PHC readings from sys_linux to hwclock, where
statistics can be collected and filtering improved.
In the PHC refclock driver accumulate the samples even if not in the
external timestamping mode to update the context which will be needed
for improved filtering.
Increase the number of requested readings from 10 to 25 - the maximum
accepted by the PTP_SYS_OFFSET* ioctls. This should improve stability of
HW clock tracking and PHC refclock.
Latest versions of ethtool print only the shorter lower-case names of
capabilities and filters. Explain that chronyd doesn't synchronize the
PHC and refer to the new vclock feature of the kernel, which should be
used by applications that need a synchronized PHC (e.g. ptp4l and
phc2sys) in order to not interfere with chronyd.
Instead of the generic clock driver silently zeroing the remaining
offset after detecting an external step, cancel it properly with the
slew handlers in order to correct timestamps that are not reset in
handling of the unknown step (e.g. the NTP local TX).
Use 3 as the minimum maxlockage in the local mode to avoid disruptions
due to losing the lock when a single sample is missed, e.g. when the PPS
driver polling interval is slightly longer than the pulse interval and a
pulse is skipped.
A refclock in the local mode is locked to itself. When the maxlockage
check failed after missing some samples, it failed permanently and the
refclock was not able to accumulate any new samples.
When the check fails, drop all samples and reset the source to start
from scratch.
Reported-by: Dan Drown <dan-ntp@drown.org>
A new function is provided by the latest gnutls (should be in 3.7.5) to
set the key of an AEAD cipher. If available, use it to avoid destroying
and creating a new SIV instance with each key change.
This improves the server NTS-NTP performance if using gnutls for SIV.