6.824 Lecture 4: Distributed Programming: Distributed objects

Last lecture: two designs for writing parallel applications on
distributed computer systems.  Today we see another design for
distributed programming: distributed objects, which target
client/server applications.

Outline:
  RPC + threads - how do they interact?
  Why is RPC not sufficient?
  Distributed objects goals
  Java RMI

=Threads + RPC=
say you have code
test()
  lock(m)
  rpc_call(...)
  unlock(m)
is that reasonable?  what can go wrong?

more concretely:
lock(m_list)
for (iter i = list.begin(); i != list.end(); i++)
  *i = rpc_call(...)
unlock(m_list)

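problems:
1) distributed deadlock: what if the server makes a callback to you while
  handling your RPC?  The callback runs in a different thread, which can't
  acquire m_list and so can't update the list.  The RPC waits for the
  callback, and the callback waits for the lock.
2) RPCs are slow, so holding a lock across them kills concurrency even more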

There are many ways to get around this, but the essential point is to
be aware of the race conditions and deadlocks that can arise when you
combine local locks with RPCs.
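
One common way around both problems is to copy the list while holding the
lock, release the lock, make the slow calls, and then retake the lock only
to install the results.  The sketch below is a minimal illustration under
assumptions: ListRefresher, rpcCall, and onCallback are hypothetical names,
and the "server" is simulated by a local thread that calls back into the
client before the RPC returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: all names here are illustrative.
class ListRefresher {
    private final ReentrantLock mList = new ReentrantLock(); // plays the role of m_list
    private final List<Integer> list = new ArrayList<>(List.of(1, 2, 3));
    volatile boolean callbackRan = false;

    // Stand-in for rpc_call(...). The pretend server calls back into this
    // client on another thread and only replies once the callback returns.
    private int rpcCall(int arg) {
        Thread serverCallback = new Thread(this::onCallback);
        serverCallback.start();
        try {
            serverCallback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
        return arg * 10;
    }

    // Callback handler: it needs the list lock, like the callback in problem 1.
    void onCallback() {
        mList.lock();
        try {
            callbackRan = true;
        } finally {
            mList.unlock();
        }
    }

    List<Integer> refresh() {
        List<Integer> snapshot;
        mList.lock();
        try {
            snapshot = new ArrayList<>(list);   // copy under the lock
        } finally {
            mList.unlock();                     // release before any RPC
        }

        List<Integer> results = new ArrayList<>();
        for (int v : snapshot) {
            results.add(rpcCall(v));            // slow part, no lock held
        }

        mList.lock();
        try {
            list.clear();                       // install results under the lock
            list.addAll(results);
            return new ArrayList<>(list);
        } finally {
            mList.unlock();
        }
    }

    public static void main(String[] args) {
        ListRefresher r = new ListRefresher();
        System.out.println(r.refresh() + " callbackRan=" + r.callbackRan);
    }
}
```

If refresh() held mList across rpcCall, the callback thread would block on
the lock and the join would never return.  The cost of this pattern is that
the list may change between the snapshot and the install step; this sketch
simply overwrites it, which a real application may not be able to do.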

=RMI=
Why is RPC not sufficient? Let's look at YFS RPC (admittedly a bit
primitive, but nevertheless):
  programmer has to write stubs
  one request may cause a handler to be invoked many times
  few data structures can be passed to the client or server
    for example, can you pass a C++ object?  
    a pointer and dereference it remotely?
  programmer must design a scheme for naming remote objects
    for name design, we need to maintain mapping from name to object
      locks: string name
      extents: extentid_t
      files/directories: inums

Typical goals of distributed object systems:
  transparent RPC for object methods
  avoid explicit object handles (like strings for locks, etc)
  automatic association of relevant server w/ object ref
  allow passing of object references as arguments
    not just to object's home server (as in our lock server)
    even to other client hosts
  distributed GC, needed for remote refs

YFS RPC vs Java RMI
  RMI gives marshalling/pickling for any type, not just the primitives
    that YFS RPC handles.
  RMI forces remote methods to declare exceptions (RemoteException), whereas
    YFS RPC just returns an error code, although both give at-most-once
    semantics
  RMI has uniform naming scheme -- everything is an object reference,
    and methods of a Remote interface implementer are all registered.  YFS RPC
    makes the user name objects by using intelligent strings, and the user
    registers functions without attaching them to objects/references.
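
To make the contrast concrete, here is a minimal sketch of what the RMI
style looks like to the programmer.  java.rmi.Remote and
java.rmi.RemoteException are the real RMI types; the RemoteLock interface,
its methods, and the client IDs are hypothetical.  The implementation is not
exported here, so the calls are local and only illustrate the shape of the
interface.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: the lock example and all names are illustrative.
interface RemoteLock extends Remote {
    boolean acquire(String clientId) throws RemoteException;
    void release(String clientId) throws RemoteException;
}

// Implementation of the interface. It is not exported in this sketch, so
// the calls below are ordinary local calls and never fail remotely.
class RemoteLockImpl implements RemoteLock {
    private String holder = null;

    public synchronized boolean acquire(String clientId) {
        if (holder != null) return false;
        holder = clientId;
        return true;
    }

    public synchronized void release(String clientId) {
        if (clientId.equals(holder)) holder = null;
    }
}

class RmiStyleDemo {
    // The caller holds an object reference, not a string name for the lock,
    // and must handle RemoteException because the interface declares it.
    static String tryAcquire(RemoteLock lock, String clientId) {
        try {
            return lock.acquire(clientId) ? "granted" : "busy";
        } catch (RemoteException e) {
            return "remote failure: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        RemoteLock lock = new RemoteLockImpl();
        System.out.println(tryAcquire(lock, "client-a")); // prints "granted"
        System.out.println(tryAcquire(lock, "client-b")); // prints "busy"
    }
}
```

In the YFS style the same operation would name the lock with a string and
report failure through a returned error code rather than an exception.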

The idea is thus to just pass object references around, and call by reference
  on them, rather than calling by value and copying entire objects around
  the network.

Situations in which one client might pass remote object ref to another?
  lots of modules: shopping cart, item db, checkout, front end
  would this be useful in YFS?

Are there other approaches?
  distributed shared memory, would allow direct access to object data
  move the object to caller

Some designs:
  Network objects
  CORBA
  Java RMI
    Our focus (see paper)

So in the client, you need a stub for each remote object.  The stub must
  contain an endpoint name (DNS name:port number) and a remote object number
  (the ID by which the server machine refers to the object).  Once the
  endpoint is reached, it returns a URL to the bytecode for the stub, so
  that you can learn its interface.

So a true reference to a remote object is an endpoint address/port, an
  object number, and a bytecode URL

first a simple call/return
  o = ???;
  o.fn("hello");
  which server to send to?
  what object on server?
  what about "hello"?
  what does RPC message contain?
    - classid, functionid, objectid, marshalled "hello" (passed by value)
  how does RMI s/w on server gain control? thread...
  how does server find the real object?
  where does server-side dispatch fn come from?

what does a stub object look like?
  type?
  contents?
  where did it come from?

is there anything special about the server-side "real" object?
  - it's called by a skeleton on the server, which is a dispatcher that
    takes the RMI call message (classid, functionid, etc...), and
    converts it to a call to the implementation.
  - the code written by the programmer just has to declare the implementation
    as extending RemoteServer (typically via its subclass UnicastRemoteObject),
    and declare every remote method as throwing RemoteException

how about passing an object as an argument?
  o1 = ???;
  o2 = ???;
  o1.fn(o2); // instead of "hello" we pass remote object o2 as argument
  what must o2 look like in the RPC message?
    server host, object ID, bytecode URL 
    that's passed instead of the pickled version of a local object
    this is thus a pass by reference.
  what if o1's server already knows about o2?
    must have a table mapping object ID to ptr to o2
  what if o1's server does not know about o2?
    where does it get stub type, implementation? the URL

there are probably type IDs, so client can re-use stub/skeleton code
  an object ID must contain type ID, or an RPC to fetch it
  clients and servers must have tables mapping type ID to stub code

when can a server free an object (garbage collection)?
  only when no client has a live reference
  server must somehow learn when a new client gets a reference,
    and when a client's local ref count drops to zero
  clients send reference/unreference messages to the object's server, and
    the server keeps a reference count for each object
  so clients must send RPCs to the server to note first/last knowledge
what if a client crashes?
  will server ever be able to free the object?
  Java RMI counts a reference as dropped if the client hasn't touched
    (renewed) it in a while (lease-based system).  This is needed because
    the client may fail before properly releasing the object.
  If the cause was a network partition rather than a machine failure and the
    server has already garbage collected the object, the client gets a
    RemoteException when it next tries to access the object
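
A lease scheme along these lines can be sketched as below.  It is a
simplified illustration under assumptions, not RMI's real collector:
LeaseServer and its methods are hypothetical, time is passed in explicitly
as a number so the example is deterministic, and the only real RMI type used
is java.rmi.NoSuchObjectException (a subclass of RemoteException).

```java
import java.rmi.NoSuchObjectException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical lease-based server: each client's reference expires unless
// the client touches (renews) it within leaseMillis.
class LeaseServer {
    private final long leaseMillis;
    // objectId -> (client name -> lease expiry time)
    private final Map<Long, Map<String, Long>> leases = new HashMap<>();

    LeaseServer(long leaseMillis) { this.leaseMillis = leaseMillis; }

    void export(long objectId) { leases.put(objectId, new HashMap<>()); }

    // A client touches the object at time `now`, taking or renewing its lease.
    void touch(long objectId, String client, long now) throws NoSuchObjectException {
        Map<String, Long> holders = leases.get(objectId);
        if (holders == null) {
            throw new NoSuchObjectException("object " + objectId + " was collected");
        }
        holders.put(client, now + leaseMillis);
    }

    // Drop expired leases, then free every object with no lease left.
    // Simplification: an exported object that nobody has leased is freed too.
    void collect(long now) {
        Iterator<Map<String, Long>> it = leases.values().iterator();
        while (it.hasNext()) {
            Map<String, Long> holders = it.next();
            holders.values().removeIf(expiry -> expiry <= now);
            if (holders.isEmpty()) it.remove();
        }
    }

    boolean isLive(long objectId) { return leases.containsKey(objectId); }
}

class LeaseDemo {
    public static void main(String[] args) throws NoSuchObjectException {
        LeaseServer server = new LeaseServer(100);
        server.export(1L);
        server.touch(1L, "client", 0);       // lease runs until t=100
        server.collect(50);
        System.out.println(server.isLive(1L));   // true
        server.collect(100);                     // client was silent: lease expired
        System.out.println(server.isLive(1L));   // false
        try {
            server.touch(1L, "client", 150);     // e.g. the partition heals
        } catch (NoSuchObjectException e) {
            System.out.println("RemoteException: " + e.getMessage());
        }
    }
}
```

The server cannot tell a crashed client from a partitioned one, so a client
that comes back after its lease expired sees the exception in the last step.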
    
Note: it's hard to see exactly where you benefit a lot from the ability to
pass object references between clients.  So in the end, the object-oriented
part is useful/interesting, but passing references around isn't immediately
useful, since you probably go through a server to mediate passing
ownership around anyway.