Friday 29 August 2008

am loving Beyond REST and the PIMP protocol

I really enjoyed reading Joshua Schachter's post Beyond REST today - and particularly the comments. The basic idea is how to get a kinda publish/subscribe system to work on the web for processing real time updates to things like social sites at internet scale without introducing some really complex new protocol, instead reusing the lovely RESTful web endpoints.

I posted a comment in the thread, but it was quite big so I figured I'd post it here too :)

I'm really liking the PIMP protocol and like Sam's strawman of using caching headers to implement it...

My suggestion is to think of this instead as another form of caching. All we really want is a header that tells the server that we are interested when a particular resource has been updated and how to tell us. The server can then either understand that header and acknowledge in the response that it will notify me. Here is my strawman:
Request:
X-Cache-Callback: OK

Then if that resource is updated the service is expected to either HEAD the callback as a notification or POST the new contents of the resource, server's choice. You could later add semantics for merely updating the resource vs replacing it wholesale. I would also think about adding the ability for the server to specify a timeout after which you are free to poll again if you haven't heard anything on the assumption that sometimes the service may lose the state associated with your subscription.
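To make the strawman a bit more concrete, here is a rough Python sketch of the server-side bookkeeping. It assumes the request header carries the client's callback URL as its value (my reading of the strawman, not something spelled out above); the `Subscriptions` class and its method names are made up for illustration.

```python
from dataclasses import dataclass, field

CALLBACK_HEADER = "X-Cache-Callback"  # header name taken from the strawman


@dataclass
class Subscriptions:
    # resource path -> set of callback URLs
    # (assumption: the header value is the URL to call back)
    by_resource: dict = field(default_factory=dict)

    def handle_get(self, path, request_headers):
        """Return the extra response headers for a GET on `path`."""
        callback = request_headers.get(CALLBACK_HEADER)
        if not callback:
            return {}  # ordinary poller, nothing to acknowledge
        self.by_resource.setdefault(path, set()).add(callback)
        return {CALLBACK_HEADER: "OK"}  # acknowledge that we will notify

    def notifications_for(self, path, new_body=None):
        """List the (method, url, body) calls to make after `path` changes.

        No body means a bare HEAD ping; a body means POST the new contents.
        """
        method = "HEAD" if new_body is None else "POST"
        urls = sorted(self.by_resource.get(path, ()))
        return [(method, url, new_body) for url in urls]
```

A real server would match header names case-insensitively and would actually send the requests; this only shows who gets told what.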

Am thinking rather than returning OK the server returns the amount of time before the client has to re-issue the subscription to keep it alive. So the server can decide the maximum subscription time. Good PIMP servers (PIMPS :) might wanna make this quite long to reduce the polling overhead.
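Something like the following is what I have in mind, as a sketch only. It assumes the acknowledgement value is a plain number of seconds, and the six hour maximum and the function names are invented for the example.

```python
MAX_LEASE_SECONDS = 6 * 60 * 60  # hypothetical maximum chosen by the server


def ack_value(lease_seconds=MAX_LEASE_SECONDS):
    """Server side: the header value sent back instead of "OK"."""
    return str(lease_seconds)


def resubscribe_at(ack, now):
    """Client side: when to re-issue the subscription, in seconds.

    Returns None when the acknowledgement is not a positive number of
    seconds (for example a plain "OK", or no header at all), in which
    case the client just carries on polling.
    """
    try:
        lease = int(ack)
    except (TypeError, ValueError):
        return None
    if lease <= 0:
        return None
    # Renew a little early (after 90% of the lease) so it never lapses.
    return now + lease - lease // 10
```

So a client acknowledged with `"3600"` at time `1000` would renew at `4240`, and one that only hears `"OK"` treats the subscription as having no stated lifetime.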

I also love the simplicity of the HEAD or POST to differentiate a notification of change from a notification-with-the-data.
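On the receiving end that difference could be handled in a handful of lines. This is a sketch that assumes each callback URL maps to exactly one subscribed resource; `on_callback` and the `cache` dict are hypothetical names.

```python
def on_callback(method, body, cache, resource_url):
    """Client side: react to a callback for `resource_url`.

    HEAD -> we only know something changed, so drop our copy and refetch.
    POST -> the new contents came with the notification, so store them.
    """
    if method == "HEAD":
        cache.pop(resource_url, None)
        return "refetch"
    if method == "POST":
        cache[resource_url] = body
        return "updated"
    return "ignored"  # anything else is not part of the strawman
```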

I've long wanted a 'SUBSCRIBE' verb in HTTP for doing this kinda thing; but I think your cache-header approach is cleaner - as folks can either keep polling and/or subscribe for the update notification.

The nice thing too is that it allows easy migration to PIMP without introducing any overhead or new traffic: clients continue to poll as normal, but they advertise themselves as being PIMP aware. Then when one day the server becomes PIMP aware, the clients receive their notifications (and then hopefully they scale back their polling :) - otherwise they can stick to polling.
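The client side of that migration might look roughly like this. It assumes the server echoes the header back when it understands it, and that the header value on the request is the callback URL; the intervals and function names are illustrative only.

```python
CALLBACK_HEADER = "X-Cache-Callback"


def poll_request_headers(callback_url):
    """Headers a PIMP-aware client adds to every ordinary poll."""
    return {CALLBACK_HEADER: callback_url}


def next_poll_interval(response_headers, normal_interval, relaxed_interval):
    """Seconds to wait before the next poll.

    If the server acknowledged the subscription we can relax and mostly
    wait for callbacks; if it ignored the header we poll as before.
    """
    if CALLBACK_HEADER in response_headers:
        return max(normal_interval, relaxed_interval)
    return normal_interval
```

A server that has never heard of PIMP just ignores the unknown header, so nothing changes until the day it starts answering.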

Also webmasters can monitor their traffic looking for PIMP headers to know when it'll make sense to upgrade to PIMP. Not everyone is gonna need PIMP, and it'll be a no-brainer from looking at your logs to work out both when you've got a sufficient mass of PIMP enabled pollers and how much polling traffic upgrading to PIMP would save you.
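As a sketch of that log check: assume the access log has already been parsed into `(client, path, headers)` tuples (a made-up format, real logs would need the request header logged in the first place) and count how many polls came from PIMP-aware clients.

```python
from collections import Counter


def polling_summary(requests):
    """Summarise how much polling traffic comes from PIMP-aware clients.

    `requests` is an iterable of (client, path, headers) tuples, where
    `headers` is a dict of request headers (hypothetical log format).
    """
    counts = Counter()
    for _client, _path, headers in requests:
        kind = "pimp" if "X-Cache-Callback" in headers else "plain"
        counts[kind] += 1
    total = counts["pimp"] + counts["plain"]
    percent = 100 * counts["pimp"] // total if total else 0
    return {"total": total, "pimp_aware": counts["pimp"], "percent": percent}
```

The `pimp_aware` count is roughly the polling you could hope to replace with notifications after upgrading.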

I'm with Sam in thinking of this as another form of caching. In implementing PIMP some folks might be able to create update notifications internally in their system when resources change to push out change messages into some kinda queue for posting to the callback URL. This would involve significant work for many sites though.

However it'll surely be pretty trivial to just install a PIMP-enabled caching web proxy inside your data centre in front of your servers - that does the usual cache thing, but also detects these extra PIMP cache headers and does a background poll of resources (respecting your existing cache & time to live headers) to detect changes, both to update the cluster of front web caches (so non-PIMP pollers get more real time data) and to drive the pushing of updates out to PIMP subscribers.
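The change detection part of such a proxy could be as simple as comparing content hashes between background polls. A minimal sketch, assuming a `fetch(path)` function that returns the origin's current bytes (hypothetical, and it ignores the cache and time to live headers a real proxy would respect):

```python
import hashlib


def digest(body: bytes) -> str:
    """Fingerprint of a response body."""
    return hashlib.sha256(body).hexdigest()


def refresh(cache, fetch, paths):
    """Re-fetch `paths`, update `cache`, and return the paths that changed.

    `cache` maps path -> (digest, body). The first fetch of a path only
    fills the cache; it is not reported as a change.
    """
    changed = []
    for path in paths:
        body = fetch(path)
        new = digest(body)
        old = cache.get(path)
        cache[path] = (new, body)
        if old is not None and old[0] != new:
            changed.append(path)
    return changed
```

Every path that comes back from `refresh` is both a fresher cache entry for the plain pollers and a reason to notify the subscribers of that path.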

i.e. I can see this as a pretty easy upgrade to most web sites - folks just update their front end web proxies to a PIMP-enabled version and hey presto you can now support PIMP consumers. Am sure the web proxies could include an XMPP firehose too pretty easily for heavy hitters.

It should be pretty easy to hack the web proxies to do this I'd have thought? Even the problem that's been noted earlier in this thread - that the URL endpoint you're trying to push updates to might be slow, unresponsive or unavailable - is something the web proxies already have to deal with, in case a *local* server is borked.
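One way a proxy might cope with flaky callback endpoints is to count consecutive failures and quietly drop a subscriber after a few of them, leaving that client to fall back to polling. This is only a sketch: `send` stands in for whatever makes the HEAD or POST with a timeout, and the threshold of three is arbitrary.

```python
def push_updates(callbacks, send, failures, max_failures=3):
    """Try each callback once and return the ones still subscribed.

    `send(url)` is expected to raise OSError (which covers timeouts and
    connection errors) when the endpoint is slow or down. `failures`
    maps url -> consecutive failure count and is updated in place.
    """
    kept = []
    for url in callbacks:
        try:
            send(url)
        except OSError:
            failures[url] = failures.get(url, 0) + 1
            if failures[url] < max_failures:
                kept.append(url)  # give it another chance next time
        else:
            failures.pop(url, None)  # a success resets the count
            kept.append(url)
    return kept
```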

Anyway - it's a very interesting blog post, particularly the comments. Interesting stuff.