Previous failure model
- failstop failure: machines crash or become unavailable
- a failure detector sends heartbeat messages and, if it gets no response, removes the unreachable node
- so the detector might be overzealous, but it _will_ eventually catch all failed nodes

Byzantine failure model
- nodes fail in completely arbitrary ways
- adversarial model: someone breaks the code and messes you up in the worst way possible
- bugs and misconfigurations also fall under this model
- depends on non-correlation of failures. Just as the failstop model assumes not all nodes fail at once, here you assume N different copies of the code/operating system/root passwords, so they won't all fail in the same way.

No way to write a failure detector for the byzantine model
- an adversary will just hack the code to respond exactly the way the failure detector expects, so you'd never notice a difference

Initial setup is described below:

Same primary-backup model for replication
- primary sends all requests to backups
- backups execute them
- backups reply to the primary
- primary replies to the client
- require responses from _all_ of the nodes before responding

View change---when a node goes down/fails, run paxos to agree on a new view

Recovery from failure---state transfer from a working copy
- makes rollback easy, since you don't know in advance who will become the new primary when paxos elects one
- since you don't reply to the client until all backups succeed, the client will be none the wiser
- if we only required a majority of backups to acknowledge each state change, then state transfer after a failure would have to consult a majority of them to find the latest state

How many replicas to tolerate f failstop faults? 2f+1---f can fail, and the remaining f+1 still form a majority that gives the correct response.

How many replicas to tolerate f byzantine faults? 3f+1---if f are cut off from the network and another f are faulty, then f+1 correct replicas remain to tell you the truth (enough to outvote the f liars).

So how do we change the protocol a bit to handle all of these failures?
- each node cryptographically signs its responses so that the client can verify the primary isn't lying
- the primary responds to the client after hearing from 2f+1 of the replicas
- since the primary might be malicious, it might send two different state changes to two backups. The backups only talk to the primary, so they wouldn't know that they were inconsistent. This is solved by a pre-prepare message.
- view change must change to handle BFT

Pre-prepare message (see the quorum-counting sketch below)
- client sends the operation to the primary
- primary sends a pre-prepare to each replica
- replicas respond with a signed prepare message saying they are willing to assign that viewstamp to the given operation
- primary responds to all replicas with the 2f+1 prepare messages. Now the replicas can run the operation, after checking the signatures against the primary's public key, that all previous operations have run, and that the viewstamp is correct.
- replicas respond to the primary saying they committed the result. The primary responds to them saying which 2f+1 committers it heard from. Then 2f+1 of the replicas respond to the client, and now everyone knows that no faulty node could have harmed the system.

Note: as an optimization, get rid of the pre-prepare->prepare round trip by having all replicas multicast their prepare message to all other replicas, at which point they know they are all good to commit without an extra round trip through the primary.

The paper mentions optimizations for multiple reads in a row, and for batching a bunch of operations into one, to avoid the round trips.
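To make the 2f+1 counting concrete, here is a minimal Python sketch of the quorum bookkeeping a replica might do during the prepare/commit exchange above. The names (Replica, on_prepare, on_commit, QUORUM) and the message fields are illustrative assumptions, not the paper's interface, and signature verification is elided.

    # Minimal sketch of replica-side quorum counting for the normal-case
    # protocol above.  Tolerating f byzantine faults needs n = 3f+1
    # replicas and quorums of 2f+1.  In the real protocol every message
    # below is signed and verified; that is omitted here.

    F = 1                    # byzantine faults tolerated
    N = 3 * F + 1            # total replicas
    QUORUM = 2 * F + 1       # prepare/commit quorum size

    class Replica:
        def __init__(self, replica_id, view=0):
            self.id = replica_id
            self.view = view
            self.prepares = {}   # (view, seq, op_digest) -> set of replica ids
            self.commits = {}    # (view, seq, op_digest) -> set of replica ids

        def on_prepare(self, view, seq, op_digest, sender):
            # Record a signed prepare; the operation is "prepared" once
            # 2f+1 replicas agree to bind (view, seq) to this operation.
            key = (view, seq, op_digest)
            self.prepares.setdefault(key, set()).add(sender)
            return len(self.prepares[key]) >= QUORUM

        def on_commit(self, view, seq, op_digest, sender):
            # Record a signed commit; once 2f+1 commits are seen the
            # operation can be executed in sequence-number order and the
            # result sent back toward the client.
            key = (view, seq, op_digest)
            self.commits.setdefault(key, set()).add(sender)
            return len(self.commits[key]) >= QUORUM

The quorum size is what defeats the two-different-state-changes attack: any two quorums of 2f+1 out of 3f+1 replicas overlap in at least f+1 replicas, so at least one correct replica would have to sign both conflicting bindings, which it won't.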
How must the view change be different to handle BFT? (a sketch of the new primary's bookkeeping follows at the end of these notes)
- step through primaries in round-robin fashion on each view change
- the new primary issues a view-change request to everyone, and everyone responds with a signed certificate of all the prepared messages they have committed
- once at least 2f+1 nodes respond saying they want the view change, the primary announces the new view to everyone
- the primary also re-sends any changes that might need to be replayed on nodes that missed the commit. Replicas that already committed a change won't re-run it.

Note: unlike paxos, which needs to agree on a value (requiring a third step), here we already know the value---primaries rotate round-robin---so we only have to agree to move to the next view.

Since a crash that loses a replica's state can be treated as just another byzantine failure, you don't have to sync to disk on each commit. You do have to sync on a view change or a checkpoint, but in normal operation you save the disk sync after each write, since your replicas tolerate the failures for you (assuming at least f+1 stay alive in the 3f+1-node system).
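A similarly hedged sketch of the round-robin view change: because the next primary is a pure function of the view number, the only agreement needed is that 2f+1 replicas want to move on. The names (primary_for, NewViewCollector, ops_to_replay) and the shape of the certificates are assumptions for illustration, not the paper's API.

    # Rough sketch of the round-robin view change described above.
    # The next primary is determined by the view number alone, so there
    # is no value to agree on -- only that the view has moved forward.

    N = 4                    # 3f+1 replicas with f = 1
    QUORUM = 3               # 2f+1

    def primary_for(view):
        # Primaries rotate round-robin with the view number.
        return view % N

    class NewViewCollector:
        def __init__(self, new_view):
            self.new_view = new_view
            self.certs = {}          # replica id -> certificate of prepared ops

        def on_view_change(self, replica_id, prepared_cert):
            # prepared_cert: iterable of (view, seq, op_digest) tuples the
            # replica has prepared.  Once 2f+1 replicas have asked for this
            # view, the new primary can announce it to everyone.
            self.certs[replica_id] = list(prepared_cert)
            return len(self.certs) >= QUORUM

        def ops_to_replay(self):
            # Union of everything any certificate says was prepared: the new
            # primary re-issues these so stragglers catch up, while replicas
            # that already committed an operation ignore the replay.
            replay = {}
            for _view, seq, digest in (op for cert in self.certs.values() for op in cert):
                replay[seq] = digest
            return replay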