Optimizing Distributed Objects

Influencing another researcher or company is about the best that can happen to a piece of academic research. The proposal in my paper on Safe Query Objects was implemented and shipped by db4objects as a key feature of their product within six months of the paper's publication.
They are an amazing company, and they even gave me a little research grant to do more work. Thanks!

More recently, my paper "Web Services versus Distributed Objects: A Case Study of Performance and Interface Design" involved some manual transformations that were hard to do. Eli Tilevich picked up the idea and implemented it as an extension of RMI. We started working together, and the design evolved into a full paradigm for optimizing remote object systems. We wrote it up as a nice paper called Explicit Batching for Distributed Objects.

The idea is that client proxies don't perform remote communication; instead they just record a batch of operations. All the method results are futures, because the calls haven't been made yet, but the code looks pretty much like an ordinary object-oriented program. When the client calls flush, the operations are performed and the futures are filled in with results. We evolved the design through two more iterations to handle batched operations on remote collections, exceptions, and chained batches (a series of batches that share common state). Each client can optimize its own server access without any custom server objects (as required by the Data Transfer Object pattern). Amazing!
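To make the record-then-flush idea concrete, here is a minimal sketch in Java. It is not the API from the paper: the names Batch, BatchFuture, FileInfo, and FileInfoProxy are hypothetical, and the "server" is just a local object, so the roundTrips counter only stands in for network traffic under the assumption that one flush costs one round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class BatchSketch {

    /** Hypothetical placeholder for a result that is only available after flush(). */
    static final class BatchFuture<T> {
        private T value;
        private boolean ready;

        void fill(T v) { value = v; ready = true; }

        T get() {
            if (!ready) throw new IllegalStateException("batch not flushed yet");
            return value;
        }
    }

    /** Hypothetical batch: records operations instead of executing them. */
    static final class Batch {
        private final List<Runnable> pending = new ArrayList<>();
        int roundTrips = 0; // assumption: one flush stands in for one round trip

        <T> BatchFuture<T> record(Supplier<T> op) {
            BatchFuture<T> future = new BatchFuture<>();
            pending.add(() -> future.fill(op.get()));
            return future;
        }

        void flush() {
            if (pending.isEmpty()) return;
            roundTrips++;
            for (Runnable op : pending) op.run(); // perform calls, fill futures
            pending.clear();
        }
    }

    /** Stand-in for a server-side object (local here, for illustration only). */
    static final class FileInfo {
        String name() { return "report.txt"; }
        long size() { return 1024L; }
    }

    /** Client proxy: each method records a call and returns a future. */
    static final class FileInfoProxy {
        private final Batch batch;
        private final FileInfo target;

        FileInfoProxy(Batch batch, FileInfo target) {
            this.batch = batch;
            this.target = target;
        }

        BatchFuture<String> name() { return batch.record(target::name); }
        BatchFuture<Long> size() { return batch.record(target::size); }
    }

    public static void main(String[] args) {
        Batch batch = new Batch();
        FileInfoProxy file = new FileInfoProxy(batch, new FileInfo());

        // Reads like ordinary object-oriented code, but nothing runs yet.
        BatchFuture<String> name = file.name();
        BatchFuture<Long> size = file.size();

        batch.flush(); // both calls are performed here

        System.out.println(name.get() + " " + size.get()
                + " bytes, round trips: " + batch.roundTrips);
    }
}
```

With plain RMI, the two calls in main would each be a separate remote invocation; in this sketch they are performed together at flush. Calling get() on a future before flush throws, which is one possible way to surface the "calls haven't been made yet" rule.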

The result is so obviously better than RMI, in terms of managing latency and optimizing communication, that it is hard to imagine why nobody has tried this before. I keep thinking somebody probably did (and there is some related work that is similar, but not complete). As far as we can tell, it's new.

It might actually make RMI able to compete with Web Services :-)

2 comments:

Christian Plesner Hansen said...

A really interesting paper!

One weakness in your approach seems to be that you can only "batch" straight-line code. The examples in the paper where you need chained batches are caused by the need to do simple iterative or conditional execution, which has to take place on the client. This is especially expensive in the second example in section 3.5 where you have to transfer an array of all file dates to the client so it can iterate over them, and then transfer an array of "delete" commands back in the second batch. (It also seems like there are some synchronization issues with the code, if another client accesses the file system in the meantime, but I don't know if that's a concern.)

It would be interesting to see how this paradigm could be extended if the current closure proposal was implemented. The proposed closure mechanism allows you to define custom but native-looking control structures. That could be used to implement a "remote-if" and "remote-for-each" statement that could be translated into conditional and iterative commands in a batch. Also, if more behavior could be packed into a batch it might make sense to make batches more like remote transactions.

William R Cook said...

Christian, thanks for the comments. You are right about this; my coauthor is working on conditionals, but it might be too messy in Java. We might have a better time in C# or Scala. I am not sure closures would work, if you are thinking of mobile code, since we are avoiding that. However, something like a monad might work.