In a burst of three requests (two presend + one normal) the server can
detect the client is using the interleaved mode and save the transmit
timestamp of the second response for the third response. This shortens
the interval in which the server has to keep the state.
Rework the code to make a real request for presend and process the
response, but don't accumulate the sample. This allows presend to work
in the interleaved client mode.
Always allow update from the first valid response, even if its transmit
timestamp is not newer than the currently saved timestamp. This shoud
provide a temporary protection in the case where the attacker does have
an authenticated packet from future, but the peers are using the same
polling interval and the protocol is already synchronised. This could be
also useful in the case where the attacker cannot observe the traffic
and authentication is disabled.
A recently published paper [1] (section VIII) describes a DoS attack
on symmetric associations authenticated with a symmetric key where the
attacker can only observe and replay packets. Although the attacker
cannot prevent packets from reaching the other peer (not even by
flooding the network for example), s/he has the same power as a MitM
attacker.
As the authors explain, this is a fundamental flaw of the protocol,
which cannot be fixed in the general case. However, we can at least try
to protect associations in a case where the peers use the same polling
interval (i.e. for each request is expected one response) and all peers
that share the symmetric key never start with clocks in future or very
distant past (i.e. the attacker does not have any packets from future
that could be replayed).
Require that updates of the NTP state between requests have increasing
transmit timestamp and when a packet that passed all NTP tests to be
considered a valid response was received, don't allow any more updates
of the state from packets that don't pass the tests. This should ensure
the last update of the state is from the first time the last real
response was received and still allow the protocol to recover in case
one of the peers steps its clock back or the attacker does have a packet
from future and the attack stops.
[1] Aanchal Malhotra, Matthew Van Gundy, Mayank Varia, Haydn Kennedy,
Jonathan Gardner, and Sharon Goldberg. The Security of NTP's
Datagram Protocol. https://eprint.iacr.org/2016/1006
Adapt the interleaved symmetric mode for client/server associations.
On server, save the state needed for detection and responding in the
interleaved mode in the client log. On client, enable the interleaved
mode when the server is specified with the xleave option. Always accept
responses in basic mode to allow synchronization with servers that
don't support the interleaved mode, have too many clients, or have
multiple clients behing the same IP address. This is also necessary to
prevent DoS attacks on the client by overwriting or flushing the server
state. Protect the client's state variables against replay attacks as
the timestamps are now needed when processing the subsequent packet.
Add xleave option to the peer directive to enable an interleaved mode
compatible with ntpd. This allows peers to exchange transmit timestamps
captured after the actual transmission and significantly improve
the accuracy of the measurements.
Introduce a new structure for local timestamps that will hold the
timestamp with its estimated error and also its source (daemon, kernel
or HW). While at it, reorder parameters of the functions that accept the
timestamps.
Add new functions for processing of packets after they are actually
sent by the kernel or HW in order to get a more accurate transmit
timestamp. Rename old functions for processing of received packets and
their parameters to make the naming more consistent.
Replace struct timeval with struct timespec as the main data type for
timestamps. This will allow the NTP code to work with timestamps in
nanosecond resolution.
Crypto-NAK is useful only with Autokey where it allows quick reset
of the association. There is no plan to support Autokey and NTS will
specify its own message for authentication errors.
When selecting sources from a pool, ignore responses which didn't
produce a new sample. Sources with acceptable delay (as configured by
the maxdelay* options) should be prefered.
When a valid packet is received from an unsynchronised source (i.e. only
a test of leap, stratum or root distance failed), there is no point in
waiting for another packet or the RX timeout, and the client socket can
be immediately closed.
Add support for authenticating MS-SNTP responses in Samba (ntp_signd).
Supported is currently only the old MS-SNTP authenticator field. It's
disabled by default. It can be enabled with the --enable-ntp-signd
configure option and the ntpsigndsocket directive, which specifies the
location of the Samba ntp_signd socket.
When a received packet fails to authenticate, check if the digest
contains zeroes and treat it as an MS-SNTP packet with authenticator or
extended authenticator field. For now, discard these packets, i.e. don't
respond with a crypto-NAK.
Replace the flag that enables authentication using a symmetric key with
an enum. Specify crypto-NAK as a special mode used for responses instead
of relying on zero key ID. Also, rework check_packet_auth() to always
save the mode and key ID.
Add offset option to the server/pool/peer directive. It specifies a
correction which will be applied to offsets measured with the NTP
source. It's particularly useful to compensate for a known asymmetry in
network delay or timestamping errors.
If a special reference mode is enabled, always pass the test for
synchronization loop. This allows chronyd using the initstepslew
directive (or the -q/-Q option) to accept time from its own clients
after restart as is documented in the chrony.conf man page.
This was broken since update to NTPv4.
If local reference is active, return normal leap, but unsynchronised
status. Update the callers of the function to work with the leap
directly and not change their behaviour.
REF_IsLocalActive() is no longer needed.
When ntpd as an NTP server has active orphan mode, it doesn't update
its reference time and the reference timestamp may fail the NTP test
3 and 7. (https://bugs.ntp.org/show_bug.cgi?id=1098)
Remove both checks of the timestamp to allow chronyd to operate as
a client of ntpd server in the orphan mode. When ntpd is fixed and
old versions are no longer used, this may be reverted.
When a valid NTP reply is received, save the local address (e.g. from
IP_PKTINFO), so the reference ID which would the source use for this
host can be calculated when needed.
After restricting authentication of servers and peers to the specified
key, a short key in the key file is a security problem from the client's
point of view only if it's specified for a source.
When a server/peer was specified with a key number to enable
authentication with a symmetric key, packets received from the
server/peer were accepted if they were authenticated with any of
the keys contained in the key file and not just the specified key.
This allowed an attacker who knew one key of a client/peer to modify
packets from its servers/peers that were authenticated with other
keys in a man-in-the-middle (MITM) attack. For example, in a network
where each NTP association had a separate key and all hosts had only
keys they needed, a client of a server could not attack other clients
of the server, but it could attack the server and also attack its own
clients (i.e. modify packets from other servers).
To not allow the server/peer to be authenticated with other keys
extend the authentication test to check if the key ID in the received
packet is equal to the configured key number. As a consequence, it's
no longer possible to authenticate two peers to each other with two
different keys, both peers have to be configured to use the same key.
This issue was discovered by Matt Street of Cisco ASIG.
Instead of time_t use a 32-bit fixed point representation with 4-bit
fraction to save the time of the last hit. The rate can now be measured
up to 16 packets per second. Maximum interval between hits is about 4
years.
When the measured NTP or command request rate of a client exceeds
a threshold, reply only to a small fraction of the requests to reduce
the network traffic. Clients are allowed to send a burst of requests.
Try to detect broken clients which increase the request rate when not
getting replies and suppress the rate limiting for them.
Add ratelimit and cmdratelimit directives to configure the thresholds,
bursts and leak rates independently for NTP and command response rate
limiting. Both are disabled by default. Commands from localhost are
never limited.
Don't log NTP peer access and auth/bad command access. Also, change
types for logging number of hits from long to uint32_t. This reduces the
size of the node and allows more clients to be monitored in the same
amount of memory.
The meaning of the poll value in KoD RATE packets is not currently
defined in the NTP specification (RFC 5905). In the reference NTP
implementation it signals the minimum acceptable polling interval to the
clients. In chrony the minimum poll is set to the KoD RATE poll if it's
larger, but not to a larger value than 10.
The problem is that ntpd as a server sets the KoD RATE poll to the
maximum of the client's poll and the configured rate limiting interval.
An attacker can send a burst of spoofed packets to the server to trigger
the client's request rate limit. When the client sends its next request
and the server responds with a KoD RATE packet, the client will set its
minimum poll to the current poll and it will no longer be able to switch
to a shorter poll when needed.
ntpd could be fixed to always set the KoD RATE poll to the rate limiting
interval. Unfortunately, ntpd as a client seems to depend on the current
behavior. It tries to follow the server poll and if the KoD RATE poll
was shorter than the current poll, the polling interval would be
reduced, defeating the purpose of KoD RATE. The server fix will probably
need to wait until clients are fixed and that could take a very long
time.
For now, ignore the poll value in KoD RATE packets. Just add an extra
delay based on the current poll to the next transmit timeout and stop an
ongoing burst.
First packet after setting a source to online was sent with constant
delay (0.2s). If the period in which the source was offline was shorter
than the current polling interval, the new packet was sent sooner than
it would be if the source wasn't switched to offline and back.
Don't reset the local tx timestamp when mode is changed. When starting
the initial transmit timeout, adjust the delay to make the interval
between the two packets at least as long as the current polling
interval.
In client packets set the leap, stratum, reference ID, reference time,
root delay and root dispersion to constant values to not reveal the
state of the synchronization. Use precision 32 to make the receive and
transmit timestamps completely random and not reveal the local time.