Francis Dupont writes and Jari Arkko responds:

> If anti-replay has been enabled, receivers MUST be configured with an
> allowed Delta value and maintain a cache of messages received within
> this time period from each specific source address.
>
> => is it really important to support not-in-order messages? Reordering
> link-layers are not so frequent, and we talk about ND...

I don't know. It seems like for the solicited case nonces take care of all this, and for the unsolicited case... if I receive reordered adverts, I'm not sure I care. But wait: what if an RA is split due to MTU and large content? We wouldn't want to drop the reordered "fragment" of the RA.

> Recommended default value for the allowed Delta is 3,600 seconds.
>
> => this is far too large, especially for a single mechanism.

Can you justify this? Too much memory, or what? We were thinking that most machines that are at all in the right time are within this Delta...

> o  A packet that passes both of the above tests MUST be registered in
>    the cache for the given source address.
>
> => it is too soon to register it in the cache: the packet should pass
> the AH signature check too.

Yes.

> o  If the cache becomes full, the receiver SHOULD temporarily reduce
>    the Delta value for that source address so that all messages
>    within that value can still be stored.
>
> => this doesn't solve the problem but opens doors to DoS...
>
> I propose for non-reordering link-layers (the reordering case can be
> supported with small modifications, but this should be made clearer):
>
> - a per-peer (using the source address of received messages as the
>   index) cache entry with:
>   * last received time stamp (TSlast)
>   * date of reception of last message (RDlast)
>
> - when a message is received from a new peer (i.e., a new source),
>   the packet is accepted if the timestamp is in the range:
>     -Max_delta < RDnew - TSnew < +Max_delta
>   An entry is created when the signature check passes.
>
> - when a message is received from a known peer, the time stamp (TSnew)
>   is checked against the last valid message:
>     TSlast + (RDnew - RDlast) x (1 - allowed_drift) < TSnew and
>     TSnew < TSlast + (RDnew - RDlast) x (1 + allowed_drift)
>
> - parameters are Max_delta and allowed_drift (10%?)
>
> For reordering link-layers (i.e., the complete rules) we also need:
> - a max_link_layer_delay term in the last rules
> - to retain only the valid message with the highest timestamp.

This sounds pretty good. What do others think?

> - 7.1.6 Processing Rules for Receivers (comment)
>
>   Packets that do not pass all the above tests MUST be silently
>   discarded.
>
> => this is the place where to update the timestamp cache. Note that
> "silently discarded" should imply "doesn't update the cache"...

Ok.

> - 13.2.5 Replay Attacks (comment/wording)
>
>   Since most SEND nodes are likely to use fairly coarse grained
>   timestamps, as explained in Section 7.1.4, this may affect some
>   nodes.
>
> => a millisecond is not so coarse grained. IMHO the statement is more
> about the fairly large one hour delta...

But the Delta does not affect replay attacks, only memory usage.
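For illustration, a rough Python sketch of the per-peer check Francis proposes above, for the non-reordering case. The names (MAX_DELTA, ALLOWED_DRIFT, PeerEntry, timestamp_ok) are placeholders rather than anything from the draft, and the signature check is assumed to happen elsewhere:

    from dataclasses import dataclass
    from typing import Dict

    MAX_DELTA = 3600.0      # seconds; the value itself is under discussion
    ALLOWED_DRIFT = 0.10    # 10% relative clock drift, as suggested above

    @dataclass
    class PeerEntry:
        ts_last: float   # timestamp carried in the last accepted message
        rd_last: float   # local receive time of the last accepted message

    cache: Dict[str, PeerEntry] = {}

    def timestamp_ok(src: str, ts_new: float, rd_new: float) -> bool:
        """Apply the proposed per-peer timestamp check (non-reordering case)."""
        entry = cache.get(src)
        if entry is None:
            # New peer: timestamp must be within +/- Max_delta of local time.
            return -MAX_DELTA < (rd_new - ts_new) < MAX_DELTA
        # Known peer: the timestamp must advance consistently with local
        # time, within the allowed relative clock drift.
        elapsed = rd_new - entry.rd_last
        lower = entry.ts_last + elapsed * (1 - ALLOWED_DRIFT)
        upper = entry.ts_last + elapsed * (1 + ALLOWED_DRIFT)
        return lower < ts_new < upper

    def record(src: str, ts_new: float, rd_new: float) -> None:
        """Create or update the cache entry, but only after the signature check passes."""
        cache[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)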
------------------

Pekka Nikander:

My (perhaps flawed) analysis of the issue boils down to two points:

1. The value for the timestamp Delta
2. What to do if the ND cache becomes full

I'll describe these in separate messages, focusing on the full-cache situation here.

The current specification states the following:

> o  If the cache becomes full, the receiver SHOULD temporarily reduce
>    the Delta value for that source address so that all messages
>    within that value can still be stored.

Francis Dupont has proposed an alternative method for handling the situation. Trying to rephrase his suggestion:

o  For each peer, store the following information:

   *  the receive time of the last received ND message (RDlast)
   *  the time stamp in the last received ND message (TSlast)

o  When a message is received from a new peer, i.e., one that is not
   stored in the cache, the received timestamp is checked, and the
   packet is accepted if the timestamp is recent enough:

      -Delta < RDnew - TSnew < +Delta

   where RDnew is the local time at which the packet was received and
   TSnew is the timestamp in the new packet. A new cache entry is
   created if the other checks, including the signature check, are
   passed.

o  When a message is received from a known peer, i.e., one that has an
   entry in the cache, the time stamp is checked against the last
   received message:

      TSnew > TSlast + (RDnew - RDlast) x (1 - drift)
      TSnew < TSlast + (RDnew - RDlast) x (1 + drift)

   The suggested value for /drift/ is 10%, allowing a clock skew of at
   most 10% between the hosts.

   If TSnew < TSlast, which is possible if packets arrive rapidly and
   out of order, TSlast MUST NOT be updated; i.e., the stored TSlast
   for a given host MUST NOT ever decrease.

Now, while I agree that Francis' suggestion makes it clearer when to accept new packets, I don't immediately see how it helps with the cache getting full. On the other hand, if we start dropping the cache entries with the oldest RDlast, and at the same time temporarily reduce Delta so much that the dropped entries would not be accepted again, that might help.

Second issue:

The current specification states the following:

> Recommended default value for the allowed Delta is 3,600 seconds.

Francis Dupont commented that this is far too large. The function of the Delta value can be described as follows:

- A larger Delta allows hosts with differently running clocks to communicate, while a small Delta requires better clock synchronization.

- A larger Delta potentially requires more memory, since it may require the host to remember more other hosts.

- A larger Delta exposes the host to replay attacks in the case that its cache becomes full, thereby requiring it to accept ND messages from new hosts.

The situation that seems most worrisome is one where an attacker fills up the cache, thereby forcing the host to drop some cache entries, and then launches a replay attack. It looks like the suggestion in the Part 1 message of this issue plugs that attack. If so, the exact value of Delta is not a concern from the memory usage and DoS/replay attack point of view. Hence, it looks like the default value for Delta becomes mostly a policy issue.

On the other hand, there seem to be some possible replay attack scenarios against nodes that arrive on a new link. (More on that in a separate message.) Anyway, I suggest that if we can change the caching algorithm so that the default Delta value is a pure policy issue, we keep 3,600 seconds or even use a larger value.
------------------

Pekka Nikander:

Doesn't this open a potential replay attack against a new node that arrives on a link?

Let us assume that an attacker is on the link and collects ND messages. Let us further assume that one node on the link, say Alice, changes its link-layer address. Now a new node, say Bob, arrives on the link within the Delta time from Alice changing her address. If the attacker notices Bob arriving before Alice does, it can replay Alice's earlier recorded message that still uses the old link-layer address. This causes Bob to record TSlast and RDlast from the replayed message, together with the old link-layer address.

When Alice now notices Bob's arrival, she will send a new message. However, the new message will not pass the check, since apparently her clock has drifted more than what is allowed. That is, (RDnew - RDlast) is small due to the replayed message from which RDlast was recorded, but TSnew - TSlast will be much larger, since TSlast was the timestamp in the replayed message.

Consequently, I propose that we remove the latter check from the second rule, allowing TSnew to be arbitrarily much larger than TSlast. I don't see this opening any new attacks, and it fixes the scenario above.

------------------

Jari Arkko:

At first I thought it was more serious, but the rule for new nodes limits the starting time to be within Delta of the real time. If that had not been the case, an attacker could have replayed any old message and made it impossible for the real node to communicate using its current time value.

Still, even with the new-node rule we have the possibility that the Delta difference is larger than the drift difference, making the attack you mention possible. I agree with the fix that you propose.

------------------

Jon Wood:

I agree that this would help in a non-attack case, where you have a host talking with many other hosts with poorly synced clocks. However... Maybe I just need some coffee this morning, but I still don't see how this would prevent an attack. Even if you reduce the Delta to a pretty small value, on many link layers an attacker should still be able to flush a victim's cache in less than 1 second (for example, by using a large number of pre-computed CGAs and flooding the victim with NS messages) and then execute a replay attack. What am I missing?

------------------

Jari Arkko:

I agree with Jon that it seems like it is still possible to fill the cache. The point is that attackers can come up with new (perhaps precomputed) CGA addresses, and the timestamp cache needs to be per source for multicast messages. Tuomas' CGA generation formula does prevent the attack if Sec > 0, but we may not be able to rely on that, or can we?

There seem to be two ways to deal with unbounded caches, like the ones that exist for tracking something related to the source address of a packet:

1. Throw out entries.
2. Reduce the allowed clock difference.

In the past, I have assumed we are doing #2, though I can't claim I have thought it through fully. Here's what #2 would do: if there is no lack of space, accept all messages and store all cache entries that have used a timestamp within Delta. If you can store only half of the entries that would be required for this, reduce Delta to half and remove those entries that were furthest away from the node's own time. If the attacker is sending you a constant stream of messages from new source addresses with exactly the right time, reduce Delta to 0. At this point, you can still communicate with legitimate peers, but only if they have exactly the same clock as you do. When new space becomes available, Delta can again be increased.

Does this work? You tell me...
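As a rough illustration of what option #2 above could look like, here is a sketch only; the capacity limit, the halving step, and the entry structure are assumptions made up for the example, not anything from the draft:

    from dataclasses import dataclass
    from typing import Dict

    DEFAULT_DELTA = 3600.0   # seconds
    CAPACITY = 1024          # assumed maximum number of cache entries

    @dataclass
    class PeerEntry:
        ts_last: float
        rd_last: float

        def clock_offset(self) -> float:
            # How far this peer's clock was from our own at the last message.
            return abs(self.rd_last - self.ts_last)

    class TimestampCache:
        def __init__(self) -> None:
            self.delta = DEFAULT_DELTA
            self.entries: Dict[str, PeerEntry] = {}

        def add(self, src: str, ts_new: float, rd_new: float) -> bool:
            # A new peer is only accepted if it is within the *current* Delta.
            if abs(rd_new - ts_new) >= self.delta:
                return False
            self.entries[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)
            self._shrink_if_needed()
            return True

        def _shrink_if_needed(self) -> None:
            # Under pressure, halve Delta and drop the entries whose clocks are
            # furthest from our own; in the worst case Delta goes to 0 and only
            # peers with exactly our clock remain.
            while len(self.entries) > CAPACITY:
                self.delta = self.delta / 2 if self.delta > 1.0 else 0.0
                self.entries = {
                    src: e for src, e in self.entries.items()
                    if e.clock_offset() <= self.delta
                }
                if self.delta == 0.0:
                    break

        def relax(self) -> None:
            # When space becomes available again, Delta can be increased.
            if len(self.entries) < CAPACITY // 2:
                self.delta = min(self.delta * 2, DEFAULT_DELTA)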
------------------

Pekka Nikander:

I think that we basically have two distinct attack scenarios:

1. There are a number of hosts on a link, and someone launches an attack. The goal here is clearly to make sure that the hosts can continue to communicate even while the attack is going on.

2. There is an attack going on, and a new host arrives on the link. The goal here is to make it possible for the new host to become attached to the network, in spite of the attack.

From this point of view, it is clearly better to be very selective in how to throw out entries. Reducing Delta is very discriminating against those hosts that have a large clock difference, while an attacker can make its own clock difference arbitrarily small. Throwing out old entries just because their clock difference is large seems like a bad approach.

In my opinion, the exact algorithm for expiring cache entries in the case of a full cache is clearly a local policy issue and should not be specified in the document. However, it might be a good idea to give some guidance for implementors.

It also looks like all the previous ideas of reducing Delta are really bad, since the attacker can easily select its own Delta and make it arbitrarily small. A better idea seems to be to have separate cache space for new entries and old entries, and under an attack more eagerly drop new cache entries than old ones. One could track traffic, and only allow those new entries that receive genuine traffic to be converted into old cache entries.

It also looks like a good idea to consider the sec parameter when forcing cache entries out, and give those entries with a larger sec a higher chance of staying in.

------------------

Pekka Nikander:

> I think it is becoming apparent that a method that will fully prevent
> replay attacks will be very complex to implement and administer.

I don't believe it is possible to fully prevent replay attacks, given all the other requirements we have. Hence, the real question is to find out what is reasonable effort and what is clearly too little. Furthermore, as I wrote ...

>> In my opinion, the exact algorithm for expiring cache entries
>> in the case of a full cache is clearly a local policy issue,...

... we probably do not want to specify too much in the doc. More like hints and concerns than a recommendation for a full-blown algorithm.

>> ... seems to be to have separate cache space for new entries and
>> old entries, and under an attack more eagerly drop new cache
>> entries than old ones. One could track traffic, and only allow
>> those new entries that receive genuine traffic to be converted
>> into old cache entries.
>>
>> It also looks like a good idea to consider the sec parameter
>> when forcing cache entries out, and give those entries with
>> a larger sec a higher chance of staying in.
>
> While such a scheme would make attacks harder, it would not
> fully prevent them. For example, an attacker could send a little
> traffic (i.e., a ping or TCP SYN) after each NS to trick the victim
> into promoting its cache entry to the old cache.

Right. The whole point is to make the effort required by an attacker higher. If an attacker has to generate a number of keys with sec > 0, change its MAC address, send not only unsolicited messages but also trigger solicitations and answer them, and also send some regular traffic, it needs to do quite a lot of work. That reduces the "appeal" of cache-filling attacks, probably leading a potential attacker to try to find a weaker spot somewhere else.

> Perhaps we should drop the cache idea altogether, and just use
> timestamp delta checking?
> The security provided, while not perfect, could still be reasonably
> good (depending on the local policy). It would also be very simple
> for local admins to understand and tweak.

I think having Delta and cache dropping as separate mechanisms seems like a good idea. It clearly improves resilience in many situations. For example, it even allows hosts with a large clock difference to communicate if they somehow are able to create cache entries through solicited (nonce-carrying) packets.

On the other hand, I think we can leave all these concerns as SHOULD, and leave the implementation the freedom to ignore them and just use e.g. Delta, if really needed.

------------------

Jari Arkko:

> From this point of view, it is clearly better to be very selective
> in how to throw out entries. Reducing Delta is very discriminating
> against those hosts that have a large clock difference, while an
> attacker can make its own clock difference arbitrarily small.
> Throwing out old entries just because their clock difference is
> large seems like a bad approach.

Ok, I'm convinced.

> In my opinion, the exact algorithm for expiring cache entries in the
> case of a full cache is clearly a local policy issue and should not
> be specified in the document. However, it might be a good idea to
> give some guidance for implementors.
>
> It also looks like all the previous ideas of reducing Delta are
> really bad, since the attacker can easily select its own Delta and
> make it arbitrarily small. A better idea seems to be to have
> separate cache space for new entries and old entries, and under an
> attack more eagerly drop new cache entries than old ones. One could
> track traffic, and only allow those new entries that receive genuine
> traffic to be converted into old cache entries.

Ok. We could perhaps say the following:

   "When there is a very large number of hosts on the same link, or
   when an attack is in progress, it is possible that the cache
   holding the most recent timestamp per sender becomes full. In this
   case the node MUST remove some entries from the cache or refuse
   some new requested entries. The specific policy as to which
   entries are preferred over the others is left as an implementation
   decision. However, typical policies may prefer existing entries
   over new ones, CGAs with a large Sec value over smaller Sec
   values, and so on."

> It also looks like a good idea to consider the sec parameter when
> forcing cache entries out, and give those entries with a larger sec
> a higher chance of staying in.

Yes.
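Purely for illustration, one way to express the kind of preference that proposed text allows; the scoring order and the entry fields are invented for the example, not taken from the draft:

    from dataclasses import dataclass
    from typing import Dict

    @dataclass
    class CacheEntry:
        ts_last: float
        rd_last: float
        sec: int          # Sec parameter of the peer's CGA
        is_new: bool      # entry not yet confirmed by genuine traffic

    def eviction_score(entry: CacheEntry) -> tuple:
        """Lower scores are evicted first: prefer keeping existing (old)
        entries, and entries whose CGA was generated with a larger Sec."""
        return (0 if entry.is_new else 1, entry.sec, entry.rd_last)

    def evict_one(cache: Dict[str, CacheEntry]) -> None:
        """Drop the least-preferred entry when the cache is full."""
        if cache:
            victim = min(cache, key=lambda src: eviction_score(cache[src]))
            del cache[victim]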
------------------

Jon Wood:

>> From this point of view, it is clearly better to be very selective
>> in how to throw out entries. Reducing Delta is very discriminating
>> against those hosts that have a large clock difference, while an
>> attacker can make its own clock difference arbitrarily small.
>> Throwing out old entries just because their clock difference is
>> large seems like a bad approach.

I agree.

>> In my opinion, the exact algorithm for expiring cache entries in the
>> case of a full cache is clearly a local policy issue and should not
>> be specified in the document. However, it might be a good idea to
>> give some guidance for implementors.
>>
>> It also looks like all the previous ideas of reducing Delta are
>> really bad, since the attacker can easily select its own Delta and
>> make it arbitrarily small. A better idea seems to be to have
>> separate cache space for new entries and old entries, and under an
>> attack more eagerly drop new cache entries than old ones. One could
>> track traffic, and only allow those new entries that receive genuine
>> traffic to be converted into old cache entries.
>>
>> It also looks like a good idea to consider the sec parameter when
>> forcing cache entries out, and give those entries with a larger sec
>> a higher chance of staying in.

While such a scheme would make attacks harder, it would not fully prevent them. For example, an attacker could send a little traffic (i.e., a ping or TCP SYN) after each NS to trick the victim into promoting its cache entry to the old cache.

I think it is becoming apparent that a method that will fully prevent replay attacks will be very complex to implement and administer. Perhaps we should drop the cache idea altogether, and just use timestamp delta checking? The security provided, while not perfect, could still be reasonably good (depending on the local policy). It would also be very simple for local admins to understand and tweak.

------------------

Jon Wood:

> On the other hand, I think we can leave all these concerns as
> SHOULD, and leave the implementation the freedom to ignore them and
> just use e.g. Delta, if really needed.

This pretty much works for me. Right now my feeling is that the complexity of developing a reasonably effective cache scheme would outweigh its security benefits, so I would be a little happier with "MAY" rather than "SHOULD".

------------------

Pekka Nikander:

I have now tried to edit in the results of the discussions on Issues 11 and 22. This has resulted in a new combined section for the Nonce and Timestamp options, and an appendix discussing the issue at somewhat greater length. I am including the new text below. If there are no comments, I will consider these issues closed.

--Pekka Nikander

5.4 Timestamp and Nonce options

5.4.1 Timestamp Option

5.4.2 Nonce Option

5.4.3 Processing rules for senders

   All solicitation messages MUST include a Nonce. All solicited-for
   advertisements MUST include a Nonce, copying the nonce value from
   the received solicitation. When sending a solicitation, the sender
   MUST store the nonce internally so that it can recognize any
   replies containing that particular nonce.

   All NDP messages MUST include a Timestamp. Senders SHOULD set the
   Timestamp field to the current time, according to their real time
   clock.

   If a message has both Nonce and Timestamp options, the Nonce option
   MUST precede the Timestamp option in order.

5.4.4 Processing rules for receivers

   The processing of the Nonce and Timestamp options depends on
   whether a packet is a solicited-for advertisement or not. A system
   may implement the distinction in various ways. Section 5.4.4.1
   defines the processing rules for solicited-for advertisements.
   Section 5.4.4.2 defines the processing rules for all other
   messages.

   When there is a very large number of hosts on the same link, or
   when a cache filling attack is in progress, it is possible that the
   cache holding the most recent timestamp per sender becomes full. In
   this case the node MUST remove some entries from the cache or
   refuse some new requested entries. The specific policy as to which
   entries are preferred over the others is left as an implementation
   decision. However, typical policies may prefer existing entries
   over new ones, CGAs with a large Sec value over smaller Sec values,
   and so on. The issue is briefly discussed in Appendix C.
5.4.4.1 Processing solicited-for advertisements

   The receiver MUST verify that it has recently sent a matching
   solicitation, and that the received advertisement contains a copy
   of the Nonce sent in the solicitation.

   If the message does not contain a Nonce option, it MAY be
   considered as a non-solicited-for advertisement and processed
   according to Section 5.4.4.2. If the message does contain a Nonce
   option, but the Nonce value is not recognized, the message MUST be
   silently dropped.

   If the message is accepted, the receiver SHOULD store the receive
   time of the message and the time stamp in the message, as specified
   in Section 5.4.4.2.

5.4.4.2 Processing all other messages

   Receivers SHOULD be configured with an allowed timestamp Delta
   value and an allowed clock drift parameter. The recommended default
   value for the allowed Delta is 3,600 seconds (1 hour), and 1% for
   the clock drift.

   To facilitate timestamp checking, each node SHOULD store the
   following information for each peer:

      The receive time of the last received, accepted ND message.
      This is called RDlast.

      The time stamp in the last received, accepted ND message. This
      is called TSlast.

   Receivers SHOULD then check the Timestamp field as follows:

   o  When a message is received from a new peer, i.e., one that is
      not stored in the cache, the received timestamp, TSnew, is
      checked, and the packet is accepted if the timestamp is recent
      enough with respect to the reception time of the packet, RDnew:

         -Delta < (RDnew - TSnew) < +Delta

      The RDnew and TSnew values SHOULD be stored in the cache as
      RDlast and TSlast.

   o  If the timestamp is NOT within the boundaries but the message is
      a Neighbor Solicitation message that should be responded to by
      the receiver, the receiver MAY respond to the message. However,
      if it does respond to the message, it MUST NOT create a neighbor
      cache entry. This allows hosts that have a large difference in
      their clocks to still communicate with each other by exchanging
      NS/NA pairs.

   o  When a message is received from a known peer, i.e., one that
      already has an entry in the cache, the time stamp is checked
      against the last received message:

         TSnew > TSlast + (RDnew - RDlast) x (1 - drift)

   o  If TSnew < TSlast, which is possible if packets arrive rapidly
      and out of order, TSlast MUST NOT be updated; i.e., the stored
      TSlast for a given host MUST NOT ever decrease. Otherwise TSlast
      SHOULD be updated. RDlast MUST be updated in any case.
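As a rough, non-normative sketch of the receiver check described in Section 5.4.4.2 above (the data structure and names are illustrative; the cache-full policy and the NS/NA exception are omitted):

    from dataclasses import dataclass
    from typing import Dict, Optional

    DELTA = 3600.0   # allowed timestamp Delta, seconds
    DRIFT = 0.01     # allowed clock drift, 1%

    @dataclass
    class PeerEntry:
        ts_last: float   # timestamp in the last accepted message
        rd_last: float   # local receive time of the last accepted message

    cache: Dict[str, PeerEntry] = {}

    def receive(src: str, ts_new: float, rd_new: float) -> bool:
        """Return True if the Timestamp check passes, updating the cache accordingly."""
        entry: Optional[PeerEntry] = cache.get(src)
        if entry is None:
            # New peer: timestamp must be within +/- Delta of local time.
            if not (-DELTA < (rd_new - ts_new) < DELTA):
                return False
            cache[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)
            return True
        # Known peer: only a lower bound is checked, so the peer's clock
        # may jump forward but may not run slower than the allowed drift.
        if not (ts_new > entry.ts_last + (rd_new - entry.rd_last) * (1 - DRIFT)):
            return False
        # TSlast never decreases (out-of-order arrival); RDlast is always
        # updated for an accepted message.
        entry.ts_last = max(entry.ts_last, ts_new)
        entry.rd_last = rd_new
        return True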
Appendix C. Cache Management

   In this section we outline a cache management approach that allows
   a host to remain partially functional even under a cache-filling
   DoS attack. This appendix is informational, and real
   implementations SHOULD use different algorithms in order to avoid
   the dangers of monocultural code.

   There are at least two distinct cache-related attack scenarios:

   1. There are a number of hosts on a link, and someone launches a
      cache-filling attack. The goal here is clearly to make sure that
      the hosts can continue to communicate even while the attack is
      going on.

   2. There is already a cache-filling attack going on, and a new host
      arrives on the link. The goal here is to make it possible for
      the new host to become attached to the network, in spite of the
      attack.

   From this point of view, it is clearly better to be very selective
   in how to throw out entries. Reducing the timestamp Delta value is
   very discriminating against those hosts that have a large clock
   difference, while an attacker can make its own clock difference
   arbitrarily small. Throwing out old entries just because their
   clock difference is large seems like a bad approach.

   A reasonable idea seems to be to have separate cache space for new
   entries and old entries, and under an attack more eagerly drop new
   cache entries than old ones. One could track traffic, and only
   allow those new entries that receive genuine traffic to be
   converted into old cache entries. While such a scheme will make
   attacks harder, it will not fully prevent them. For example, an
   attacker could send a little traffic (i.e., a ping or TCP SYN)
   after each NS to trick the victim into promoting its cache entry to
   the old cache. Hence, the host may need to be more intelligent in
   keeping its cache entries, and not just have a black-and-white
   old/new boundary.

   It also looks like a good idea to consider the Sec parameter when
   forcing cache entries out, and give those entries with a larger Sec
   a higher chance of staying in.
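For illustration only, a minimal sketch of the two-tier idea outlined in Appendix C; the tier sizes, the promotion trigger, and the eviction order are assumptions made up for the example:

    from dataclasses import dataclass
    from typing import Dict, Optional

    NEW_CAPACITY = 256   # assumed size of the "new" tier
    OLD_CAPACITY = 1024  # assumed size of the "old" tier

    @dataclass
    class Entry:
        ts_last: float
        rd_last: float
        sec: int = 0     # Sec parameter of the peer's CGA

    class TwoTierCache:
        def __init__(self) -> None:
            self.new: Dict[str, Entry] = {}
            self.old: Dict[str, Entry] = {}

        def insert(self, src: str, entry: Entry) -> None:
            # New peers start in the "new" tier; under pressure, the entry
            # with the smallest Sec and the oldest receive time is dropped.
            if len(self.new) >= NEW_CAPACITY:
                victim = min(self.new, key=lambda s: (self.new[s].sec,
                                                      self.new[s].rd_last))
                del self.new[victim]
            self.new[src] = entry

        def saw_genuine_traffic(self, src: str) -> None:
            # Promote a peer to the "old" tier once it exchanges real
            # traffic; old entries are dropped less eagerly than new ones.
            entry = self.new.pop(src, None)
            if entry is not None and len(self.old) < OLD_CAPACITY:
                self.old[src] = entry

        def lookup(self, src: str) -> Optional[Entry]:
            return self.old.get(src) or self.new.get(src)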