Francis Dupont writes and Jari Arkko responds:

> If anti-replay has been enabled, receivers MUST be configured with an
> allowed Delta value and maintain a cache of messages received within
> this time period from each specific source address.
>
> => is it really important to support not-in-order messages? Reordering
> link-layers are not so frequent, and we talk about ND...

I don't know. It seems like for the solicited case nonces take care of all this, and for the unsolicited case... if I receive reordered adverts, I'm not sure I care. But wait: what if an RA is split due to MTU and large content? We wouldn't want to drop the reordered "fragment" of the RA.

> Recommended default value for the allowed Delta is 3,600 seconds.
>
> => this is far too large, especially for a single mechanism.

Can you justify this? Too much memory, or what? We were thinking that most machines that are at all in the right time are within this Delta...

> o  A packet that passes both of the above tests MUST be registered in
>    the cache for the given source address.
>
> => it is too soon to register it in the cache: the packet should pass
> the AH signature check too.

Yes.

> o  If the cache becomes full, the receiver SHOULD temporarily reduce
>    the Delta value for that source address so that all messages
>    within that value can still be stored.
>
> => this doesn't solve the problem but opens doors to DoS...
>
> I propose for non-reordering link-layers (the reordering case can be
> supported with small modifications, but this should be made clearer):
>
> - a per-peer (using the source address of received messages as the
>   index) cache entry with:
>   * last received time stamp (TSlast)
>   * date of reception of last message (RDlast)
>
> - when a message is received from a new peer (i.e., a new source),
>   the packet is accepted if the timestamp is in the range:
>     -Max_delta < RDnew - TSnew < +Max_delta
>   An entry is created when the signature check passes.
>
> - when a message is received from a known peer, the time stamp (TSnew)
>   is checked against the last valid message:
>     TSlast + (RDnew - RDlast) x (1 - allowed_drift) < TSnew and
>     TSnew < TSlast + (RDnew - RDlast) x (1 + allowed_drift)
>
> - parameters are Max_delta and allowed_drift (10%?)
>
> For reordering link-layers (i.e., the complete rules) we also need:
> - a max_link_layer_delay term in the last rules
> - to retain only the valid message with the highest timestamp.

This sounds pretty good. What do others think?

> - 7.1.6 Processing Rules for Receivers (comment)
>
>   Packets that do not pass all the above tests MUST be silently
>   discarded.
>
> => this is the place where to update the timestamp cache. Note that
> "silently discarded" should imply "doesn't update the cache"...

Ok.

> - 13.2.5 Replay Attacks (comment/wording)
>
>   Since most SEND nodes are likely to use fairly coarse grained
>   timestamps, as explained in Section 7.1.4, this may affect some
>   nodes.
>
> => a millisecond is not so coarse grained. IMHO the statement is more
> about the fairly large one hour delta...

But the Delta does not affect replay attacks, only memory usage.
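For illustration, a rough Python sketch of the per-peer check Francis proposes above, for the non-reordering case. The names (MAX_DELTA, ALLOWED_DRIFT, PeerEntry, timestamp_ok) are placeholders rather than anything from the draft, and the signature check is assumed to happen elsewhere:

    from dataclasses import dataclass
    from typing import Dict

    MAX_DELTA = 3600.0      # seconds; the value itself is under discussion
    ALLOWED_DRIFT = 0.10    # 10% relative clock drift, as suggested above

    @dataclass
    class PeerEntry:
        ts_last: float   # timestamp carried in the last accepted message
        rd_last: float   # local receive time of the last accepted message

    cache: Dict[str, PeerEntry] = {}

    def timestamp_ok(src: str, ts_new: float, rd_new: float) -> bool:
        """Apply the proposed per-peer timestamp check (non-reordering case)."""
        entry = cache.get(src)
        if entry is None:
            # New peer: timestamp must be within +/- Max_delta of local time.
            return -MAX_DELTA < (rd_new - ts_new) < MAX_DELTA
        # Known peer: the timestamp must advance consistently with local
        # time, within the allowed relative clock drift.
        elapsed = rd_new - entry.rd_last
        lower = entry.ts_last + elapsed * (1 - ALLOWED_DRIFT)
        upper = entry.ts_last + elapsed * (1 + ALLOWED_DRIFT)
        return lower < ts_new < upper

    def record(src: str, ts_new: float, rd_new: float) -> None:
        """Create or update the cache entry, but only after the signature check passes."""
        cache[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)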
------------------

Pekka Nikander:

My (perhaps flawed) analysis of the issue boils down to two points:

1. The value for the timestamp Delta
2. What to do if the ND cache becomes full

I'll describe these in separate messages, focusing on the full-cache situation here.

The current specification states the following:

> o  If the cache becomes full, the receiver SHOULD temporarily reduce
>    the Delta value for that source address so that all messages
>    within that value can still be stored.

Francis Dupont has proposed an alternative method for handling the situation. Trying to rephrase his suggestion:

o  For each peer, store the following information:

   *  the receive time of the last received ND message (RDlast)
   *  the time stamp in the last received ND message (TSlast)

o  When a message is received from a new peer, i.e., one that is not
   stored in the cache, the received timestamp is checked, and the
   packet is accepted if the timestamp is recent enough:

      -Delta < RDnew - TSnew < +Delta

   where RDnew is the local time at which the packet was received and
   TSnew is the timestamp in the new packet. A new cache entry is
   created if the other checks, including the signature check, are
   passed.

o  When a message is received from a known peer, i.e., one that has an
   entry in the cache, the time stamp is checked against the last
   received message:

      TSnew > TSlast + (RDnew - RDlast) x (1 - drift)
      TSnew < TSlast + (RDnew - RDlast) x (1 + drift)

   The suggested value for /drift/ is 10%, allowing a clock skew of at
   most 10% between the hosts.

   If TSnew < TSlast, which is possible if packets arrive rapidly and
   out of order, TSlast MUST NOT be updated; i.e., the stored TSlast
   for a given host MUST NOT ever decrease.

Now, while I agree that Francis' suggestion makes it clearer when to accept new packets, I don't immediately see how it helps with the cache getting full. On the other hand, if we start dropping the cache entries with the oldest RDlast, and at the same time temporarily reduce Delta so much that the dropped entries would not be accepted again, that might help.

Second issue:

The current specification states the following:

> Recommended default value for the allowed Delta is 3,600 seconds.

Francis Dupont commented that this is far too large. The function of the Delta value can be described as follows:

- A larger Delta allows hosts with differently running clocks to communicate, while a small Delta requires better clock synchronization.

- A larger Delta potentially requires more memory, since it may require the host to remember more other hosts.

- A larger Delta exposes the host to replay attacks in the case that its cache becomes full, thereby requiring it to accept ND messages from new hosts.

The situation that seems most worrisome is one where an attacker fills up the cache, thereby forcing the host to drop some cache entries, and then launches a replay attack. It looks like the suggestion in the Part 1 message of this issue plugs that attack. If so, the exact value of Delta is not a concern from the memory usage and DoS/replay attack point of view. Hence, it looks like the default value for Delta becomes mostly a policy issue.

On the other hand, there seem to be some possible replay attack scenarios against nodes that arrive on a new link. (More on that in a separate message.) Anyway, I suggest that if we can change the caching algorithm so that the default Delta value is a pure policy issue, we keep 3,600 seconds or even use a larger value.
------------------

Pekka Nikander:

Doesn't this open a potential replay attack against a new node that arrives on a link?

Let us assume that an attacker is on the link and collects ND messages. Let us further assume that one node on the link, say Alice, changes its link-layer address. Now a new node, say Bob, arrives on the link within the Delta time from Alice changing her address. If the attacker notices Bob arriving before Alice does, it can replay Alice's earlier recorded message that still uses the old link-layer address. This causes Bob to record TSlast and RDlast from the replayed message, together with the old link-layer address.

When Alice now notices Bob's arrival, she will send a new message. However, the new message will not pass the check, since apparently her clock has drifted more than what is allowed. That is, (RDnew - RDlast) is small due to the replayed message from which RDlast was recorded, but TSnew - TSlast will be much larger, since TSlast was the timestamp in the replayed message.

Consequently, I propose that we remove the latter check from the second rule, allowing TSnew to be arbitrarily much larger than TSlast. I don't see this opening any new attacks, and it fixes the scenario above.

------------------

Jari Arkko:

At first I thought it was more serious, but the rule for new nodes limits the starting time to be within Delta of the real time. If that had not been the case, an attacker could have replayed any old message and made it impossible for the real node to communicate using its current time value.

Still, even with the new-node rule we have the possibility that the Delta difference is larger than the drift difference, making the attack you mention possible. I agree with the fix that you propose.

------------------

Jon Wood:

I agree that this would help in a non-attack case, where you have a host talking with many other hosts with poorly synced clocks. However... Maybe I just need some coffee this morning, but I still don't see how this would prevent an attack. Even if you reduce the Delta to a pretty small value, on many link layers an attacker should still be able to flush a victim's cache in less than 1 second (for example, by using a large number of pre-computed CGAs and flooding the victim with NS messages) and then execute a replay attack. What am I missing?

------------------

Jari Arkko:

I agree with Jon that it seems like it is still possible to fill the cache. The point is that attackers can come up with new (perhaps precomputed) CGA addresses, and the timestamp cache needs to be per source for multicast messages. Tuomas' CGA generation formula does prevent the attack if Sec > 0, but we may not be able to rely on that, or can we?

There seem to be two ways to deal with unbounded caches, like the ones that exist for tracking something related to the source address of a packet:

1. Throw out entries.
2. Reduce the allowed clock difference.

In the past, I have assumed we are doing #2, though I can't claim I have thought it through fully. Here's what #2 would do: if there is no lack of space, accept all messages and store all cache entries that have used a timestamp within Delta. If you can store only half of the entries that would be required for this, reduce Delta to half and remove those entries that were furthest away from the node's own time. If the attacker is sending you a constant stream of messages from new source addresses with exactly the right time, reduce Delta to 0. At this point, you can still communicate with legitimate peers, but only if they have exactly the same clock as you do. When new space becomes available, Delta can again be increased.

Does this work? You tell me...
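As a rough illustration of what option #2 above could look like, here is a sketch only; the capacity limit, the halving step, and the entry structure are assumptions made up for the example, not anything from the draft:

    from dataclasses import dataclass
    from typing import Dict

    DEFAULT_DELTA = 3600.0   # seconds
    CAPACITY = 1024          # assumed maximum number of cache entries

    @dataclass
    class PeerEntry:
        ts_last: float
        rd_last: float

        def clock_offset(self) -> float:
            # How far this peer's clock was from our own at the last message.
            return abs(self.rd_last - self.ts_last)

    class TimestampCache:
        def __init__(self) -> None:
            self.delta = DEFAULT_DELTA
            self.entries: Dict[str, PeerEntry] = {}

        def add(self, src: str, ts_new: float, rd_new: float) -> bool:
            # A new peer is only accepted if it is within the *current* Delta.
            if abs(rd_new - ts_new) >= self.delta:
                return False
            self.entries[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)
            self._shrink_if_needed()
            return True

        def _shrink_if_needed(self) -> None:
            # Under pressure, halve Delta and drop the entries whose clocks are
            # furthest from our own; in the worst case Delta goes to 0 and only
            # peers with exactly our clock remain.
            while len(self.entries) > CAPACITY:
                self.delta = self.delta / 2 if self.delta > 1.0 else 0.0
                self.entries = {
                    src: e for src, e in self.entries.items()
                    if e.clock_offset() <= self.delta
                }
                if self.delta == 0.0:
                    break

        def relax(self) -> None:
            # When space becomes available again, Delta can be increased.
            if len(self.entries) < CAPACITY // 2:
                self.delta = min(self.delta * 2, DEFAULT_DELTA)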
------------------

Pekka Nikander:

I think that we basically have two distinct attack scenarios:

1. There are a number of hosts on a link, and someone launches an attack. The goal here is clearly to make sure that the hosts can continue to communicate even while the attack is going on.

2. There is an attack going on, and a new host arrives on the link. The goal here is to make it possible for the new host to become attached to the network, in spite of the attack.

From this point of view, it is clearly better to be very selective in how to throw out entries. Reducing Delta is very discriminating against those hosts that have a large clock difference, while an attacker can make its own clock difference arbitrarily small. Throwing out old entries just because their clock difference is large seems like a bad approach.

In my opinion, the exact algorithm for expiring cache entries in the case of a full cache is clearly a local policy issue and should not be specified in the document. However, it might be a good idea to give some guidance for implementors.

It also looks like all the previous ideas of reducing Delta are really bad, since the attacker can easily select its own Delta and make it arbitrarily small. A better idea seems to be to have separate cache space for new entries and old entries, and under an attack more eagerly drop new cache entries than old ones. One could track traffic, and only allow those new entries that receive genuine traffic to be converted into old cache entries.

It also looks like a good idea to consider the sec parameter when forcing cache entries out, and give those entries with a larger sec a higher chance of staying in.

------------------

Pekka Nikander:

> I think it is becoming apparent that a method that will fully prevent
> replay attacks will be very complex to implement and administer.

I don't believe it is possible to fully prevent replay attacks, given all the other requirements we have. Hence, the real question is to find out what is reasonable effort and what is clearly too little. Furthermore, as I wrote ...

>> In my opinion, the exact algorithm for expiring cache entries
>> in the case of a full cache is clearly a local policy issue,...

... we probably do not want to specify too much in the doc. More like hints and concerns than a recommendation for a full-blown algorithm.

>> ... seems to be to have separate cache space for new entries and
>> old entries, and under an attack more eagerly drop new cache
>> entries than old ones. One could track traffic, and only allow
>> those new entries that receive genuine traffic to be converted
>> into old cache entries.
>>
>> It also looks like a good idea to consider the sec parameter
>> when forcing cache entries out, and give those entries with
>> a larger sec a higher chance of staying in.
>
> While such a scheme would make attacks harder, it would not
> fully prevent them. For example, an attacker could send a little
> traffic (i.e., a ping or TCP SYN) after each NS to trick the victim
> into promoting its cache entry to the old cache.

Right. The whole point is to make the effort required by an attacker higher. If an attacker has to generate a number of keys with sec > 0, change its MAC address, send not only unsolicited messages but also trigger solicitations and answer them, and also send some regular traffic, it needs to do quite a lot of work. That reduces the "appeal" of cache-filling attacks, probably leading a potential attacker to try to find a weaker spot somewhere else.

> Perhaps we should drop the cache idea altogether, and just use
> timestamp delta checking?
> The security provided, while not perfect, could still be reasonably
> good (depending on the local policy). It would also be very simple
> for local admins to understand and tweak.

I think having Delta and cache dropping as separate mechanisms seems like a good idea. It clearly improves resilience in many situations. For example, it even allows hosts with a large clock difference to communicate if they somehow are able to create cache entries through solicited (nonce-carrying) packets.

On the other hand, I think we can leave all these concerns as SHOULD, and leave the implementation the freedom to ignore them and just use e.g. Delta, if really needed.

------------------

Jari Arkko:

> From this point of view, it is clearly better to be very selective
> in how to throw out entries. Reducing Delta is very discriminating
> against those hosts that have a large clock difference, while an
> attacker can make its own clock difference arbitrarily small.
> Throwing out old entries just because their clock difference is
> large seems like a bad approach.

Ok, I'm convinced.

> In my opinion, the exact algorithm for expiring cache entries in the
> case of a full cache is clearly a local policy issue and should not
> be specified in the document. However, it might be a good idea to
> give some guidance for implementors.
>
> It also looks like all the previous ideas of reducing Delta are
> really bad, since the attacker can easily select its own Delta and
> make it arbitrarily small. A better idea seems to be to have
> separate cache space for new entries and old entries, and under an
> attack more eagerly drop new cache entries than old ones. One could
> track traffic, and only allow those new entries that receive genuine
> traffic to be converted into old cache entries.

Ok. We could perhaps say the following:

   "When there is a very large number of hosts on the same link, or
   when an attack is in progress, it is possible that the cache
   holding the most recent timestamp per sender becomes full. In this
   case the node MUST remove some entries from the cache or refuse
   some new requested entries. The specific policy as to which
   entries are preferred over the others is left as an implementation
   decision. However, typical policies may prefer existing entries
   over new ones, CGAs with a large Sec value over smaller Sec
   values, and so on."

> It also looks like a good idea to consider the sec parameter when
> forcing cache entries out, and give those entries with a larger sec
> a higher chance of staying in.

Yes.
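Purely for illustration, one way to express the kind of preference that proposed text allows; the scoring order and the entry fields are invented for the example, not taken from the draft:

    from dataclasses import dataclass
    from typing import Dict

    @dataclass
    class CacheEntry:
        ts_last: float
        rd_last: float
        sec: int          # Sec parameter of the peer's CGA
        is_new: bool      # entry not yet confirmed by genuine traffic

    def eviction_score(entry: CacheEntry) -> tuple:
        """Lower scores are evicted first: prefer keeping existing (old)
        entries, and entries whose CGA was generated with a larger Sec."""
        return (0 if entry.is_new else 1, entry.sec, entry.rd_last)

    def evict_one(cache: Dict[str, CacheEntry]) -> None:
        """Drop the least-preferred entry when the cache is full."""
        if cache:
            victim = min(cache, key=lambda src: eviction_score(cache[src]))
            del cache[victim]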
------------------

Jon Wood:

>> From this point of view, it is clearly better to be very selective
>> in how to throw out entries. Reducing Delta is very discriminating
>> against those hosts that have a large clock difference, while an
>> attacker can make its own clock difference arbitrarily small.
>> Throwing out old entries just because their clock difference is
>> large seems like a bad approach.

I agree.

>> In my opinion, the exact algorithm for expiring cache entries in the
>> case of a full cache is clearly a local policy issue and should not
>> be specified in the document. However, it might be a good idea to
>> give some guidance for implementors.
>>
>> It also looks like all the previous ideas of reducing Delta are
>> really bad, since the attacker can easily select its own Delta and
>> make it arbitrarily small. A better idea seems to be to have
>> separate cache space for new entries and old entries, and under an
>> attack more eagerly drop new cache entries than old ones. One could
>> track traffic, and only allow those new entries that receive genuine
>> traffic to be converted into old cache entries.
>>
>> It also looks like a good idea to consider the sec parameter when
>> forcing cache entries out, and give those entries with a larger sec
>> a higher chance of staying in.

While such a scheme would make attacks harder, it would not fully prevent them. For example, an attacker could send a little traffic (i.e., a ping or TCP SYN) after each NS to trick the victim into promoting its cache entry to the old cache.

I think it is becoming apparent that a method that will fully prevent replay attacks will be very complex to implement and administer. Perhaps we should drop the cache idea altogether, and just use timestamp delta checking? The security provided, while not perfect, could still be reasonably good (depending on the local policy). It would also be very simple for local admins to understand and tweak.

------------------

Jon Wood:

> On the other hand, I think we can leave all these concerns as
> SHOULD, and leave the implementation the freedom to ignore them and
> just use e.g. Delta, if really needed.

This pretty much works for me. Right now my feeling is that the complexity of developing a reasonably effective cache scheme would outweigh its security benefits, so I would be a little happier with "MAY" rather than "SHOULD".

------------------

Pekka Nikander:

I have now tried to edit in the results of the discussions on Issues 11 and 22. This has resulted in a new combined section for the Nonce and Timestamp options, and an appendix discussing the issue at somewhat greater length. I am including the new text below. If there are no comments, I will consider these issues closed.

--Pekka Nikander

5.4 Timestamp and Nonce options

5.4.1 Timestamp Option

5.4.2 Nonce Option

5.4.3 Processing rules for senders

   All solicitation messages MUST include a Nonce. All solicited-for
   advertisements MUST include a Nonce, copying the nonce value from
   the received solicitation. When sending a solicitation, the sender
   MUST store the nonce internally so that it can recognize any
   replies containing that particular nonce.

   All NDP messages MUST include a Timestamp. Senders SHOULD set the
   Timestamp field to the current time, according to their real time
   clock.

   If a message has both Nonce and Timestamp options, the Nonce option
   MUST precede the Timestamp option in order.

5.4.4 Processing rules for receivers

   The processing of the Nonce and Timestamp options depends on
   whether a packet is a solicited-for advertisement or not. A system
   may implement the distinction in various ways. Section 5.4.4.1
   defines the processing rules for solicited-for advertisements.
   Section 5.4.4.2 defines the processing rules for all other
   messages.

   When there is a very large number of hosts on the same link, or
   when a cache filling attack is in progress, it is possible that the
   cache holding the most recent timestamp per sender becomes full. In
   this case the node MUST remove some entries from the cache or
   refuse some new requested entries. The specific policy as to which
   entries are preferred over the others is left as an implementation
   decision. However, typical policies may prefer existing entries
   over new ones, CGAs with a large Sec value over smaller Sec values,
   and so on. The issue is briefly discussed in Appendix C.
5.4.4.1 Processing solicited-for advertisements

   The receiver MUST verify that it has recently sent a matching
   solicitation, and that the received advertisement contains a copy
   of the Nonce sent in the solicitation.

   If the message does not contain a Nonce option, it MAY be
   considered as a non-solicited-for advertisement and processed
   according to Section 5.4.4.2. If the message does contain a Nonce
   option, but the Nonce value is not recognized, the message MUST be
   silently dropped.

   If the message is accepted, the receiver SHOULD store the receive
   time of the message and the time stamp in the message, as specified
   in Section 5.4.4.2.

5.4.4.2 Processing all other messages

   Receivers SHOULD be configured with an allowed timestamp Delta
   value and an allowed clock drift parameter. The recommended default
   value for the allowed Delta is 3,600 seconds (1 hour), and 1% for
   the clock drift.

   To facilitate timestamp checking, each node SHOULD store the
   following information for each peer:

      The receive time of the last received, accepted ND message.
      This is called RDlast.

      The time stamp in the last received, accepted ND message. This
      is called TSlast.

   Receivers SHOULD then check the Timestamp field as follows:

   o  When a message is received from a new peer, i.e., one that is
      not stored in the cache, the received timestamp, TSnew, is
      checked, and the packet is accepted if the timestamp is recent
      enough with respect to the reception time of the packet, RDnew:

         -Delta < (RDnew - TSnew) < +Delta

      The RDnew and TSnew values SHOULD be stored in the cache as
      RDlast and TSlast.

   o  If the timestamp is NOT within the boundaries but the message is
      a Neighbor Solicitation message that should be responded to by
      the receiver, the receiver MAY respond to the message. However,
      if it does respond to the message, it MUST NOT create a neighbor
      cache entry. This allows hosts that have a large difference in
      their clocks to still communicate with each other by exchanging
      NS/NA pairs.

   o  When a message is received from a known peer, i.e., one that
      already has an entry in the cache, the time stamp is checked
      against the last received message:

         TSnew > TSlast + (RDnew - RDlast) x (1 - drift)

   o  If TSnew < TSlast, which is possible if packets arrive rapidly
      and out of order, TSlast MUST NOT be updated; i.e., the stored
      TSlast for a given host MUST NOT ever decrease. Otherwise TSlast
      SHOULD be updated. RDlast MUST be updated in any case.
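As a rough, non-normative sketch of the receiver check described in Section 5.4.4.2 above (the data structure and names are illustrative; the cache-full policy and the NS/NA exception are omitted):

    from dataclasses import dataclass
    from typing import Dict, Optional

    DELTA = 3600.0   # allowed timestamp Delta, seconds
    DRIFT = 0.01     # allowed clock drift, 1%

    @dataclass
    class PeerEntry:
        ts_last: float   # timestamp in the last accepted message
        rd_last: float   # local receive time of the last accepted message

    cache: Dict[str, PeerEntry] = {}

    def receive(src: str, ts_new: float, rd_new: float) -> bool:
        """Return True if the Timestamp check passes, updating the cache accordingly."""
        entry: Optional[PeerEntry] = cache.get(src)
        if entry is None:
            # New peer: timestamp must be within +/- Delta of local time.
            if not (-DELTA < (rd_new - ts_new) < DELTA):
                return False
            cache[src] = PeerEntry(ts_last=ts_new, rd_last=rd_new)
            return True
        # Known peer: only a lower bound is checked, so the peer's clock
        # may jump forward but may not run slower than the allowed drift.
        if not (ts_new > entry.ts_last + (rd_new - entry.rd_last) * (1 - DRIFT)):
            return False
        # TSlast never decreases (out-of-order arrival); RDlast is always
        # updated for an accepted message.
        entry.ts_last = max(entry.ts_last, ts_new)
        entry.rd_last = rd_new
        return True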
Appendix C. Cache Management

   In this section we outline a cache management approach that allows
   a host to remain partially functional even under a cache-filling
   DoS attack. This appendix is informational, and real
   implementations SHOULD use different algorithms in order to avoid
   the dangers of monocultural code.

   There are at least two distinct cache-related attack scenarios:

   1. There are a number of hosts on a link, and someone launches a
      cache-filling attack. The goal here is clearly to make sure that
      the hosts can continue to communicate even while the attack is
      going on.

   2. There is already a cache-filling attack going on, and a new host
      arrives on the link. The goal here is to make it possible for
      the new host to become attached to the network, in spite of the
      attack.

   From this point of view, it is clearly better to be very selective
   in how to throw out entries. Reducing the timestamp Delta value is
   very discriminating against those hosts that have a large clock
   difference, while an attacker can make its own clock difference
   arbitrarily small. Throwing out old entries just because their
   clock difference is large seems like a bad approach.

   A reasonable idea seems to be to have separate cache space for new
   entries and old entries, and under an attack more eagerly drop new
   cache entries than old ones. One could track traffic, and only
   allow those new entries that receive genuine traffic to be
   converted into old cache entries. While such a scheme will make
   attacks harder, it will not fully prevent them. For example, an
   attacker could send a little traffic (i.e., a ping or TCP SYN)
   after each NS to trick the victim into promoting its cache entry to
   the old cache. Hence, the host may need to be more intelligent in
   keeping its cache entries, and not just have a black-and-white
   old/new boundary.

   It also looks like a good idea to consider the Sec parameter when
   forcing cache entries out, and give those entries with a larger Sec
   a higher chance of staying in.
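For illustration only, a minimal sketch of the two-tier idea outlined in Appendix C; the tier sizes, the promotion trigger, and the eviction order are assumptions made up for the example:

    from dataclasses import dataclass
    from typing import Dict, Optional

    NEW_CAPACITY = 256   # assumed size of the "new" tier
    OLD_CAPACITY = 1024  # assumed size of the "old" tier

    @dataclass
    class Entry:
        ts_last: float
        rd_last: float
        sec: int = 0     # Sec parameter of the peer's CGA

    class TwoTierCache:
        def __init__(self) -> None:
            self.new: Dict[str, Entry] = {}
            self.old: Dict[str, Entry] = {}

        def insert(self, src: str, entry: Entry) -> None:
            # New peers start in the "new" tier; under pressure, the entry
            # with the smallest Sec and the oldest receive time is dropped.
            if len(self.new) >= NEW_CAPACITY:
                victim = min(self.new, key=lambda s: (self.new[s].sec,
                                                      self.new[s].rd_last))
                del self.new[victim]
            self.new[src] = entry

        def saw_genuine_traffic(self, src: str) -> None:
            # Promote a peer to the "old" tier once it exchanges real
            # traffic; old entries are dropped less eagerly than new ones.
            entry = self.new.pop(src, None)
            if entry is not None and len(self.old) < OLD_CAPACITY:
                self.old[src] = entry

        def lookup(self, src: str) -> Optional[Entry]:
            return self.old.get(src) or self.new.get(src)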