Java Code Trips & Tips: June 2009

Working on a GWT project that requires fairly frequent RPC calls to the server to exchange data I was confronted with the following problem:

the data set itself was likely to be of fairly large size -- not massive (we are not talking tens of MBs) but not smaller either: to give a rough estimate, assume anywhere up to 500-600 kB;

in addition, the time taken to 'process' the full dataset (both on the server, but, more importantly on the client) was likely to be such that a user would be faced with frequent, annoying interruptions;

however, the actual ('incremental') changes to the data are very likely to be minimal: roughly, on a data set that may contain up to several '000s entries, probably less than 10 (and usually only 2-3) would be changed between round-trips.

I was therefore rather surprised to find that nobody seemed to have already implemented a fairly simple solution to the problem: a DiffSet<T> that essentially 'captures' the incremental changes between two Set<T> as a sequence of Commands (which I called DiffCommand).

If you are not familiar with the Command pattern, I suggest you have a good read of the Head First Design Patterns book.

I have thus created a few classes, that implement exactly this behaviour, allowing client code to 'Compute a diff' on two arbitrary Collections, one being the original dataset, the other the modified one.

'Executing the diff' on the original will re-generate the new, modified one.

Doing this on the same pair of collections would be so close to pointless to border insanity (one definition of which is expecting a different outcome from the same behaviour) -- the true value is when you have a Client / Server setup and a 'wire protocol' (such as GWT RPC) over which you need to efficiently transfer large datasets:

the Client requires the data set to get started -- the Server computes it, stores it somewhere (let's call this checkPoint) and sends it over the network to the Client:
```
       Set<DataRecord> checkPoint = server.getData();
```

the client stores its own local copy of checkPoint, and creates a new Collection (let's call it localCopy) that following a number of user interactions (or whatever), will be modified (records will be added, deleted or edited):
```
       Set<DataRecord> localCopy = new HashSet<DataRecord>(localCopy);
```

at some point (for example, after an auto-save timer expires) the client will compute the DiffSet between localCopy and checkPoint, and send the difference to the server:
```
      DiffSet<DataRecord> diff = new DiffSet<DataRecord>();
      diff.computeDiff(localCopy, checkPoint);
      server.update(diff, callback);
```

on the server side, we will have stored somewhere safe our checkPoint, which we will now update with the diff'ed data:


     public boolean update(DiffSet<DataRecord> diff) {
       Set<DataRecord> original = retrieveCheckPoint();
       diff.executeDiff(original);
       boolean success = saveCheckPoint(original);
       return success;
     }

finally, in the client's callback (see GWT's RPC mechanism and callbacks) update checkPoint in case of success, or alert the user if things went pear-shape:


     AsyncCallback callback = new AsyncCallback() {
       public void onFailure(Throwable t) {
         // display 'user error'
         GWT.log(t.getClass().getName()+" was thrown because of " +
            t.getLocalizedMessage(), t);
         view.alertUser("Server Error", "Failed to save data on server ");
       }

       public void onSuccess(Boolean result) {
         if (result)
           controller.updateCheckPoint();
       }
     };

There is a lot more info in the javadoc and you can see some typical usage in the associated unit tests: the code is hosted on bitbucket, and can be retrieved using Mercurial:

$ hg clone http://bitbucket.org/marco/diffset/

Java Code Trips & Tips

Thursday, June 11, 2009

A DiffSet generic collection

Blog Archive