Replace struct timeval with struct timespec as the main data type for
timestamps. This will allow the NTP code to work with timestamps in
nanosecond resolution.
Crypto-NAK is useful only with Autokey where it allows quick reset
of the association. There is no plan to support Autokey and NTS will
specify its own message for authentication errors.
When selecting sources from a pool, ignore responses which didn't
produce a new sample. Sources with acceptable delay (as configured by
the maxdelay* options) should be prefered.
When a valid packet is received from an unsynchronised source (i.e. only
a test of leap, stratum or root distance failed), there is no point in
waiting for another packet or the RX timeout, and the client socket can
be immediately closed.
Add support for authenticating MS-SNTP responses in Samba (ntp_signd).
Supported is currently only the old MS-SNTP authenticator field. It's
disabled by default. It can be enabled with the --enable-ntp-signd
configure option and the ntpsigndsocket directive, which specifies the
location of the Samba ntp_signd socket.
When a received packet fails to authenticate, check if the digest
contains zeroes and treat it as an MS-SNTP packet with authenticator or
extended authenticator field. For now, discard these packets, i.e. don't
respond with a crypto-NAK.
Replace the flag that enables authentication using a symmetric key with
an enum. Specify crypto-NAK as a special mode used for responses instead
of relying on zero key ID. Also, rework check_packet_auth() to always
save the mode and key ID.
Add offset option to the server/pool/peer directive. It specifies a
correction which will be applied to offsets measured with the NTP
source. It's particularly useful to compensate for a known asymmetry in
network delay or timestamping errors.
If a special reference mode is enabled, always pass the test for
synchronization loop. This allows chronyd using the initstepslew
directive (or the -q/-Q option) to accept time from its own clients
after restart as is documented in the chrony.conf man page.
This was broken since update to NTPv4.
If local reference is active, return normal leap, but unsynchronised
status. Update the callers of the function to work with the leap
directly and not change their behaviour.
REF_IsLocalActive() is no longer needed.
When ntpd as an NTP server has active orphan mode, it doesn't update
its reference time and the reference timestamp may fail the NTP test
3 and 7. (https://bugs.ntp.org/show_bug.cgi?id=1098)
Remove both checks of the timestamp to allow chronyd to operate as
a client of ntpd server in the orphan mode. When ntpd is fixed and
old versions are no longer used, this may be reverted.
When a valid NTP reply is received, save the local address (e.g. from
IP_PKTINFO), so the reference ID which would the source use for this
host can be calculated when needed.
After restricting authentication of servers and peers to the specified
key, a short key in the key file is a security problem from the client's
point of view only if it's specified for a source.
When a server/peer was specified with a key number to enable
authentication with a symmetric key, packets received from the
server/peer were accepted if they were authenticated with any of
the keys contained in the key file and not just the specified key.
This allowed an attacker who knew one key of a client/peer to modify
packets from its servers/peers that were authenticated with other
keys in a man-in-the-middle (MITM) attack. For example, in a network
where each NTP association had a separate key and all hosts had only
keys they needed, a client of a server could not attack other clients
of the server, but it could attack the server and also attack its own
clients (i.e. modify packets from other servers).
To not allow the server/peer to be authenticated with other keys
extend the authentication test to check if the key ID in the received
packet is equal to the configured key number. As a consequence, it's
no longer possible to authenticate two peers to each other with two
different keys, both peers have to be configured to use the same key.
This issue was discovered by Matt Street of Cisco ASIG.
Instead of time_t use a 32-bit fixed point representation with 4-bit
fraction to save the time of the last hit. The rate can now be measured
up to 16 packets per second. Maximum interval between hits is about 4
years.
When the measured NTP or command request rate of a client exceeds
a threshold, reply only to a small fraction of the requests to reduce
the network traffic. Clients are allowed to send a burst of requests.
Try to detect broken clients which increase the request rate when not
getting replies and suppress the rate limiting for them.
Add ratelimit and cmdratelimit directives to configure the thresholds,
bursts and leak rates independently for NTP and command response rate
limiting. Both are disabled by default. Commands from localhost are
never limited.
Don't log NTP peer access and auth/bad command access. Also, change
types for logging number of hits from long to uint32_t. This reduces the
size of the node and allows more clients to be monitored in the same
amount of memory.
The meaning of the poll value in KoD RATE packets is not currently
defined in the NTP specification (RFC 5905). In the reference NTP
implementation it signals the minimum acceptable polling interval to the
clients. In chrony the minimum poll is set to the KoD RATE poll if it's
larger, but not to a larger value than 10.
The problem is that ntpd as a server sets the KoD RATE poll to the
maximum of the client's poll and the configured rate limiting interval.
An attacker can send a burst of spoofed packets to the server to trigger
the client's request rate limit. When the client sends its next request
and the server responds with a KoD RATE packet, the client will set its
minimum poll to the current poll and it will no longer be able to switch
to a shorter poll when needed.
ntpd could be fixed to always set the KoD RATE poll to the rate limiting
interval. Unfortunately, ntpd as a client seems to depend on the current
behavior. It tries to follow the server poll and if the KoD RATE poll
was shorter than the current poll, the polling interval would be
reduced, defeating the purpose of KoD RATE. The server fix will probably
need to wait until clients are fixed and that could take a very long
time.
For now, ignore the poll value in KoD RATE packets. Just add an extra
delay based on the current poll to the next transmit timeout and stop an
ongoing burst.
First packet after setting a source to online was sent with constant
delay (0.2s). If the period in which the source was offline was shorter
than the current polling interval, the new packet was sent sooner than
it would be if the source wasn't switched to offline and back.
Don't reset the local tx timestamp when mode is changed. When starting
the initial transmit timeout, adjust the delay to make the interval
between the two packets at least as long as the current polling
interval.
In client packets set the leap, stratum, reference ID, reference time,
root delay and root dispersion to constant values to not reveal the
state of the synchronization. Use precision 32 to make the receive and
transmit timestamps completely random and not reveal the local time.
Use UTI_GetRandomBytes() instead of random() to generate random bits
below precision. Save the result in NTP_int64 in the network order and
allow precision in the full range from -32 to 32. With precision 32
the fuzzing now makes the timestamp completely random and can be used to
hide the time.
After sending a client packet, schedule a timeout to close the socket
at the time when all server replies would fail the delay test, so the
socket is not open for longer than necessary (e.g. when the server is
unreachable). With the default maxdelay of 3 seconds the timeout is 7
seconds.
For testA in the client mode require also that the time the server
needed to process the client request is not longer than 4 seconds.
With maximum peer delay this limits the interval in which the client can
accept a server reply.
Timeout ID of zero can be now safely used to indicate that the timer is
not running. Remove the extra timer_running variables that were
necessary to track that.
Set refid in server/broadcast packets to 127.127.1.255 when a time
smoothing offset is applied to the timestamps. This allows the clients
and administrators to detect that the server is not serving its best
estimate of the true time.
Time smoothing determines an offset that needs to be applied to the
cooked time to make it smooth for external observers. Observed offset
and frequency change slowly and there are no discontinuities. This can
be used on an NTP server to make it easier for the clients to track the
time and keep their clocks close together even when large offset or
frequency corrections are applied to the server's clock (e.g. after
being offline for longer time).
Accumulated offset and frequency are smoothed out in three stages. In
the first stage, the frequency is changed at a constant rate (wander) up
to a maximum, in the second stage the frequency stays at the maximum for
as long as needed and in the third stage the frequency is brought back
to zero.
Time smoothing is configured by the smoothtime directive. It takes two
arguments, maximum frequency offset and maximum wander. It's disabled by
default.
An attacker knowing that NTP hosts A and B are peering with each other
(symmetric association) can send a packet with random timestamps to host
A with source address of B which will set the NTP state variables on A
to the values sent by the attacker. Host A will then send on its next
poll to B a packet with originate timestamp that doesn't match the
transmit timestamp of B and the packet will be dropped. If the attacker
does this periodically for both hosts, they won't be able to synchronize
to each other. It is a denial-of-service attack.
According to [1], NTP authentication is supposed to protect symmetric
associations against this attack, but in the NTPv3 (RFC 1305) and NTPv4
(RFC 5905) specifications the state variables are updated before the
authentication check is performed, which means the association is
vulnerable to the attack even when authentication is enabled.
To fix this problem in chrony, save the originate and local timestamps
only when the authentication check (test5) passed.
[1] https://www.eecis.udel.edu/~mills/onwire.html
Server sockets are now explicitly opened and closed for normal NTP
server, NTP broadcast and NTP peering. This will allow closing the
NTP port when not needed.
The minsamples and maxsamples directives now set the default value,
which can be overriden for individual sources in the server/peer/pool
and refclock directives.
Add new functions to change source's reference ID/address and reset the
instance. Use that instead of destroying and creating a new instance
when the NTP address is changed.
When using server socket to send client requests (acquisitionport 123)
and currently not waiting for a reply, the socket check will fail for
client requests from the source.
The check needs to be moved to correctly handle the requests as from an
unknown source.
Don't stop online burst for unreachable sources until sending succeeds.
This is mainly useful with iburst when chronyd is started before the
network is configured.
Switch to NTP for presend as the echo service (RFC 862) is rarely
enabled. When presend is active, send an NTP client packet to the
server/peer and ignore the reply.
This also fixes presend with separate client sockets. The destination
port can't be changed on connected sockets, so the echo packet was sent
to the NTP port instead of the echo port.
NTP timestamps use only 32 bits to count seconds and the current NTP era
ends in 2036. Add support for converting NTP timestamps from other NTP
eras on systems with 64-bit time_t.
The earliest assumed NTP time is set by the configure script (by default
to 50 years before the date of the build) and earlier NTP timestamps
underflow to the following NTP era.
Create a new connected client socket before each request and close it
when a valid reply is received.
This is useful when the network configuration is changed and the client
socket should be reconnected, but the old bound address remains valid
and sendmsg() doesn't return with an error.
This will be needed to prevent loading of dump files after sources have
already accumulated samples and possibly reference was already updated
when async resolving of sources is implemented.
When source is set as active, it's receiving reachability updates (e.g.
offline NTP sources are not active).
Also add function to count active sources.
Explicitly set the number of iburst samples to the size of the register
to make sure there are at least 7 reachability updates and the
initstepslew mode can be ended.
If acquisitionport is set to 0 (default), create and connect a new
socket for each server instead of using one socket per address family
for all servers.
If the remote stratum is higher than ours, try to lock on the peer's
polling to minimize our response time by slightly extending our delay or
waiting for the peer to catch up with us as the random part in the
actual interval is reduced. If the remote stratum is equal to ours, try
to interleave evenly with the peer.
If the remote peer uses a polling interval shorter than the local
minimum, the local peer will be unable to send any packets as the
timeout will be updated on every received valid packet and will never
expire.
Modify the delay calculation to aim at poll interval away since the last
transmit.
Also, share the delay calculation code with transmit_timeout().
This should prevent chronyd from getting stuck and refusing new samples
due to failing test4 when the current measured frequency offset is close
to 1.0. That can happen when the system clock is stepped forward behind
chronyd's back.
Allow different hash functions to be used in the NTP and cmdmon
protocols. This breaks the cmdmon protocol compatibility. Extended key
file format is used to specify the hash functions for chronyd and new
authhash command is added to chronyc. MD5 is the default and the only
function included in the chrony source code, other functions will be
available from libraries.
Add new flag to source struct to indicate when source is selectable
(packets with good headers are received) and use a reachability
register for last 8 samples instead of the reachable flag. Source
drivers now provide only reachability updates.
Split test7 into two and to accumulate the sample require that all tests
pass, except the one which checks packet stratum is not higher than ours.
Also, don't mark the source as unreachable when only test4c fails.
Require that the ratio of the increase in delay from the minimum one in
the stats data register to the standard deviation of the offsets in the
register is less than maxdelaydevratio or the difference between
measured offset and predicted offset is larger than the increase in
delay. In the allowed delay increase is included also skew and maximum
clock frequency error.
maxdelaydevratio is 10.0 by default.