6.824 Lecture 4: Distributed Programming: Distributed objects

Last lecture: two designs for writing parallel applications on distributed computer systems.
Today we see another design for distributed programming: distributed objects, which is targeted at writing client/server applications.

Outline:
  RPC + threads - how do they interact?
  Why is RPC not sufficient?
  Distributed objects goals
  Java RMI

=Threads + RPC=

say you have code
  test()
    lock(m)
    rpc_call(...)
    unlock(m)
is that reasonable? what can go wrong?

more concretely:
  lock(m_list)
  for (iter i = list.begin(); i != list.end(); i++)
    *i = rpc_call(...)
  unlock(m_list)

problems:
  1) distributed deadlock: what if the server makes a callback to you while your RPC is outstanding?
     the callback runs in a different thread, can't acquire m_list, and so can't update the list;
     the server waits for the callback to finish while you wait for the server.
  2) RPCs are slow, so holding a lock across them kills concurrency even more

There are many ways to get around this, but essentially, you should be aware of the race conditions and deadlocks that can come up when local locks are combined with RPC calls.

=RMI=

Why is RPC not sufficient?
Let's look at YFS RPC (admittedly a bit primitive, but nevertheless):
  programmer has to write stubs
  one request may cause a handler to be invoked many times
  few data structures can be passed to the client or server
    for example, can you pass a C++ object?
    a pointer and dereference it remotely?
  programmer must design a scheme for naming remote objects
    for each name design, we need to maintain a mapping from name to object
      locks: string name
      extents: extentid_t
      files/directories: inums

Typical goals of distributed object systems:
  transparent RPC for object methods
  avoid explicit object handles (like strings for locks, etc)
    automatic association of relevant server w/ object ref
  allow passing of object references as arguments
    not just to object's home server (as in our lock server)
    even to other client hosts
  distributed GC, needed for remote refs

YFS RPC vs Java RMI
  RMI gives marshalling/pickling for any type, not just the primitives that YFS RPC handles.
  RMI forces remote methods to declare that they throw exceptions, whereas YFS RPC just returns an error code; both give at-most-once semantics.
  RMI has a uniform naming scheme: everything is an object reference, and methods of a Remote interface implementer are all registered.
  YFS RPC makes the user name objects with intelligent strings, and the user registers functions without attaching them to objects/references.

The idea is thus to just pass object references around, and call by reference on them, rather than calling by value and copying entire objects around the network.

Situations in which one client might pass remote object ref to another?
  lots of modules: shopping cart, item db, checkout, front end
  would this be useful in YFS?

Are there other approaches?
  distributed shared memory, would allow direct access to object data
  move the object to caller

Some designs:
  Network objects
  CORBA
  Java RMI
    Our focus (see paper)

So in the client, you need a stub for each remote object. The stub must contain an endpoint name (DNS name:port number) and a remote object number (which is how the server machine refers to the object).
Once the endpoint is reached, it returns a URL to the bytecode for the stub, so that the client can learn the object's interface.
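
To make the stub idea concrete, here is a minimal sketch in Python (not Java, and not how RMI is actually implemented). It models a remote reference as endpoint + object number + stub URL, and a stub that forwards method calls to the object's home server. Everything named here (RemoteRef, Stub, Server, LoopbackNetwork, the endpoint "server.example:9000", the URL layout) is invented for illustration, and the "network" is just an in-process function call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class RemoteRef:
    """Hypothetical wire form of a remote object reference."""
    endpoint: str   # "dns-name:port" of the object's home server
    object_id: int  # number the home server uses for the object
    stub_url: str   # where stub code could be fetched (not used in this sketch)


class Stub:
    """Client-side proxy: turns a method call into a request message."""

    def __init__(self, ref: RemoteRef, send: Callable[[str, dict], Any]):
        self._ref = ref
        self._send = send  # transport: (endpoint, message) -> reply

    def __getattr__(self, method: str):
        # Only reached for names that are not real attributes, i.e. remote methods.
        def call(*args):
            msg = {"object_id": self._ref.object_id,
                   "method": method,
                   "args": list(args)}  # arguments are copied (by value)
            return self._send(self._ref.endpoint, msg)
        return call


class Server:
    """Home server: object table plus a dispatcher for incoming messages."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self._objects: Dict[int, Any] = {}
        self._next_id = 1

    def export(self, obj: Any) -> RemoteRef:
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = obj
        url = f"http://{self.endpoint}/stubs/{type(obj).__name__}"  # made-up layout
        return RemoteRef(self.endpoint, oid, url)

    def dispatch(self, msg: dict) -> Any:
        target = self._objects[msg["object_id"]]  # object number -> real object
        return getattr(target, msg["method"])(*msg["args"])


class LoopbackNetwork:
    """Stand-in for the network: routes a message to the named endpoint."""

    def __init__(self):
        self._servers: Dict[str, Server] = {}

    def attach(self, server: Server) -> None:
        self._servers[server.endpoint] = server

    def send(self, endpoint: str, msg: dict) -> Any:
        return self._servers[endpoint].dispatch(msg)


class Greeter:
    def fn(self, text: str) -> str:
        return "got " + text


net = LoopbackNetwork()
server = Server("server.example:9000")
net.attach(server)
ref = server.export(Greeter())

o = Stub(ref, net.send)
reply = o.fn("hello")  # looks like a local call, but goes through send()
```

The point of the sketch is only that the stub needs nothing beyond the reference and a transport to make the call; a real system would also marshal the message into bytes and handle failures.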
So a true reference to a remote object is an endpoint address/port, an object number, and a bytecode URL

first a simple call/return
  o = ???;
  o.fn("hello");
  which server to send to?
  what object on server?
  what about "hello"?
  what does RPC message contain?
    - classid, functionid, objectid, marshalled "hello" (passed by value)
  how does RMI s/w on server gain control? thread...
  how does server find the real object?
  where does server-side dispatch fn come from?
  what does a stub object look like? type? contents?
    where did it come from?
  is there anything special about the server-side "real" object?
    - it's called by a skeleton on the server, which is a dispatcher that takes the RMI call message (classid, functionid, etc...) and converts it to a call to the implementation.
    - the code written by the programmer just has to declare the implementation as extending RemoteServer, and declare that every remote method throws RemoteException

how about passing an object as an argument?
  o1 = ???;
  o2 = ???;
  o1.fn(o2); // instead of "hello" we pass remote object o2 as argument
  what must o2 look like in the RPC message?
    server host, object ID, bytecode URL
    that's passed instead of the pickled version of a local object
    this is thus a pass by reference.
  what if o1's server already knows about o2?
    must have a table mapping object ID to ptr to o2
  what if o1's server does not know about o2?
    where does it get stub type, implementation?
      the URL
    there are probably type IDs, so client can re-use stub/skeleton code
      an object ID must contain type ID, or an RPC to fetch it
      clients and servers must have tables mapping type ID to stub code

when can a server free an object (garbage collection)?
  only when no client has a live reference
  server must somehow learn when a new client gets a reference
    reference/dereference messages are sent to the object server, and the server keeps a reference count for each object
  and when a client's local ref count drops to zero
  so clients must send RPCs to server to note first/last knowledge

what if a client crashes?
  will server ever be able to free the object?
  Java RMI will count an object as dereferenced if the client hasn't touched the reference in a while (lease-based system).
  This is because the client may fail before properly dereferencing the object.
  If it was a network partition instead of a machine failure, and the server garbage collected the object in the meantime, then the client will get a RemoteException when it tries to access the object again.

Note: it's hard to see exactly where you benefit a lot from the ability to pass object references between clients.
So in the end, the object-oriented part is useful/interesting, but passing references around isn't immediately useful, since you probably work through a server to mediate passing ownership around anyway.
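
The lease-based reference counting described above can be sketched as follows. This is a simplified Python model, not Java RMI's implementation. The class LeasedObjectTable and its method signatures are invented for illustration (the names dirty and clean are borrowed loosely from RMI's distributed GC calls), time is passed in explicitly so the behaviour is deterministic, and references held locally by the server are ignored.

```python
from typing import Any, Dict, List


class LeasedObjectTable:
    """Server-side table of exported objects and the clients that reference them."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self._objects: Dict[int, Any] = {}
        self._leases: Dict[int, Dict[str, float]] = {}  # object id -> {client: expiry}

    def export(self, object_id: int, obj: Any) -> None:
        self._objects[object_id] = obj
        self._leases[object_id] = {}

    def dirty(self, object_id: int, client_id: str, now: float) -> None:
        """Client reports that it holds a reference (first use or renewal)."""
        if object_id not in self._objects:
            raise LookupError(f"object {object_id} is gone")
        self._leases[object_id][client_id] = now + self.lease_seconds

    def clean(self, object_id: int, client_id: str) -> None:
        """Client reports that its local reference count dropped to zero."""
        self._leases.get(object_id, {}).pop(client_id, None)

    def collect(self, now: float) -> List[int]:
        """Expire stale leases, then free objects that no client references."""
        freed = []
        for oid in list(self._objects):
            holders = self._leases[oid]
            for cid in [c for c, expiry in holders.items() if expiry <= now]:
                del holders[cid]  # client went silent: treat it as dereferenced
            if not holders:
                del self._objects[oid]
                del self._leases[oid]
                freed.append(oid)
        return freed

    def lookup(self, object_id: int) -> Any:
        """Stands in for a remote call; fails if the object was collected."""
        if object_id not in self._objects:
            raise LookupError(f"object {object_id} is gone")
        return self._objects[object_id]


table = LeasedObjectTable(lease_seconds=10.0)
table.export(1, "cart")
table.dirty(1, "clientA", now=0.0)
table.dirty(1, "clientB", now=0.0)

table.clean(1, "clientA")              # A drops its reference properly
freed_early = table.collect(now=5.0)   # B's lease is still live

table.dirty(1, "clientB", now=8.0)     # B renews; lease now runs to 18.0
freed_mid = table.collect(now=15.0)

# B crashes (or is partitioned) and never renews again.
freed_late = table.collect(now=20.0)
```

After the last collect, a partitioned clientB that comes back and calls table.lookup(1) gets a LookupError, which plays the role of the RemoteException mentioned above.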