Longevity
Here's a talk I hope some people from the WITSML/PRODML community sit in on.
In the end analysis, success for architecture is measured by its ability to assimilate changes in mission, implementation, interconnection, and scope without the need for incompatible changes. Put succinctly, 20 years into an architecture, success is measured by the ability of systems implemented on Day One to interoperate unchanged with systems implemented on Day 20369.
How do you think WITSML has measured up in only five years or so?
The WITSML SIG has paid lots of lip service to interop but expended very little real effort on it. The attitude of vendors and customers alike has been, "We will upgrade the planet wholesale to each new rev." We are reaping what they've sown: I still haven't run into a 1.3.1 server in the wild, have you?
WIXP
I've formalized a spec for how to exchange WITSML data over HTTP:
WIXP. Consider it a draft, but it is how Wellstorm has implemented it. We expose the standard WITSML SOAP interfaces, but for integration work we use "plain old" HTTP. (We expose other services too, not defined in that spec.)
WITSML servers should make data available using this simple protocol, making WITSML service implementation
"so easy that mud logging shops with a one man programming staff can do it". It would be good for other servers to make their data available, and even updateable, using plain old HTTP. The protocol I outlined just requires that clients understand a Repository Description Document, an XML document describing the types and URLs of the WITSML objects the repository contains.
This HTTP protocol doesn't replace the WITSML SOAP API. It just makes the data available to lots of clients, and from lots of servers, that have no need for the query and update language defined by the WITSML API. If you are interested in making your WITSML data available to more clients, you are free to use the protocols in that spec, and the more we all do it the same way, the better. So push back to me with whatever requirements you have, because that spec has been driven by our own particular experiences.
Woefully Underspecified
Gary Masters of POSC
circulated an email today, beginning "I was trying to create an update template and I realized that the behavior is woefully underspecified."
The first problem is that the WITSML group has attempted to create a query and update language, but has done so in an ad hoc, unrigorous manner.
The second problem is that no amount of tightening up the query/update template spec is going to make WITSML widely interoperable. Just look at the rest of Gary's email, with its mind-numbing eight point plan to rationalize the update template. Forget the servers -- this is about three orders of magnitude too much complexity for clients to implement. Hell, half the WITSML client programs out there are still hung up on "Get all the logs for a well."
If you want interoperability, and I mean the kind of interop so easy that mud logging shops with a one man programming staff can do it, then we should focus on
getting/replacing/deleting documents -- not all this complex query and update behavior crudding up WITSML.
Most WITSML document types -- the non-growing ones, like well, wellbore, rig -- are so small, all you need are HTTP GET, PUT, and DELETE. Want to update a document? Download it with GET, modify it, and replace the whole darn thing with PUT.
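The download-modify-replace cycle can be sketched roughly as below. This is a minimal illustration, not Wellstorm's API: the URL, the `uid` value, and the helper names `set_child_text` and `build_put` are hypothetical, and XML namespaces are left out to keep it short. The request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def set_child_text(xml_text, tag, value):
    """Return the document with its first <tag> child set to value."""
    root = ET.fromstring(xml_text)
    child = root.find(tag)
    if child is None:
        # The element was absent, so add it.
        child = ET.SubElement(root, tag)
    child.text = value
    return ET.tostring(root, encoding="unicode")


def build_put(url, xml_text):
    """Build (but do not send) a PUT request that replaces the whole document."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


# Hypothetical document and URL, for illustration only.
original = "<well uid='w1'><name>Old Name</name></well>"
updated = set_child_text(original, "name", "New Name")
request = build_put("http://example.com/wells/w1", updated)
```

Passing `request` to `urllib.request.urlopen` would send it; the body the client received from GET goes back whole, with only the one element changed.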
Looking for a WITSML object? Start at the top: GET a list of wells, follow hyperlinks to a list of wellbores, and then follow hyperlinks to a list of objects in the wellbore, and locate the object you want. Every WITSML client program displays that tree in the left hand pane of a window! But to build that tree they make an awkward series of WITSML queries. The query template syntax is also woefully underspecified, but no matter, because 99% of all WITSML queries are "Find me all the [logs, trajectories, or whatever] in this wellbore."
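The walk from wells to wellbores to objects might look like the sketch below. Everything about the list documents is an assumption made for illustration: the `<list>` and `<item href=... name=...>` format, the URLs, and the `fetch` callable, which stands in for an HTTP GET.

```python
import xml.etree.ElementTree as ET

# Hypothetical list documents keyed by URL; a real client would GET each one.
DOCS = {
    "/wells": "<list><item href='/wells/w1/wellbores' name='Well 1'/></list>",
    "/wells/w1/wellbores":
        "<list><item href='/wells/w1/wellbores/b1/objects' name='Bore 1'/></list>",
    "/wells/w1/wellbores/b1/objects":
        "<list><item href='/wells/w1/wellbores/b1/logs/l1' name='Log 1'/></list>",
}


def links(fetch, url):
    """Return (name, href) pairs for every item in the list document at url."""
    root = ET.fromstring(fetch(url))
    return [(item.get("name"), item.get("href")) for item in root.findall("item")]


def find_object(fetch, start_url, names):
    """Follow one hyperlink per name, starting from the top-level list."""
    url = start_url
    for name in names:
        matches = [href for item_name, href in links(fetch, url) if item_name == name]
        if not matches:
            raise LookupError(f"no item named {name!r} at {url}")
        url = matches[0]
    return url


log_url = find_object(DOCS.__getitem__, "/wells", ["Well 1", "Bore 1", "Log 1"])
```

The same `links` call that locates one object also yields exactly what a client needs to populate each level of its tree pane.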
To make a document oriented strategy work perfectly, we need to address the growing objects: logs, mud logs, well logs, trajectories. A time log of surface measurements at 1 s intervals can get unmanageable quickly. It's impractical to GET a whole time log, add a line of data to it, and PUT it back.
For appending data the best approach is to permit POSTing data to the object. This is what
Wellstorm does. Want to add a
element to a log? Just POST it to the log's URL. You can also POST a realtime to a log and Wellstorm does the right thing. Apply the same pattern to or any other repeating element in a growing object.
What about retrieving ranges of data? Well, you don't need it. Of all the WITSML objects, only log objects typically grow so very large they need some management over the life of the well. Solution? Make smaller logs. Depth logs aren't a problem, but for time logs: Close off the log and start a new one in that wellbore each day. There are 86,400 seconds in a day. That's how many lines you'll get in the largest log file, in a day. That's a manageable size for downloading as a whole. Let client programs worry about selecting the portions of interest.
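As a small sketch of the one-log-per-day idea: the arithmetic below confirms the 86,400 figure, and `daily_log_name` shows one possible way to pick the log for a given reading. The naming scheme (base name plus UTC date) is purely an assumption for illustration.

```python
from datetime import datetime, timezone

# 24 hours x 60 minutes x 60 seconds: the most rows a 1 s time log gets in a day.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_name(base, timestamp):
    """Name the daily log that a reading taken at timestamp belongs to."""
    return f"{base}-{timestamp.astimezone(timezone.utc):%Y-%m-%d}"


# Two readings one second apart land in different logs across midnight UTC.
last_of_day = datetime(2006, 3, 1, 23, 59, 59, tzinfo=timezone.utc)
first_of_next = datetime(2006, 3, 2, 0, 0, 0, tzinfo=timezone.utc)
names = (daily_log_name("timelog", last_of_day), daily_log_name("timelog", first_of_next))
```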
Wellstorm already implements all of this. Naturally we expose the standardized and "woefully underspecified" WITSML interfaces. But we also publish the straight HTTP interfaces I've described here. When doing customer integrations, we use the HTTP interfaces. And we encourage all vendors to expose simple HTTP interfaces as we have. A little bit of standardization would help, but the spec for interchanging data this way would be fewer than ten pages.
The benefit would be that we will tremendously lower the barriers to exchanging WITSML data.
WITSML and Concurrency
Using SOAP as the packaging for WITSML messages adds nothing and gives up a lot, compared to using HTTP/REST. A customer integration revealed a concrete issue.
Our client wants to collaborate with a partner on the rig. Our client maintains
well
,
wellbore
, and
log
objects in Wellstorm. The partner would monitor the log data, do some analysis, and then modify a custom element in the wellbore object -- call it "targetHookLoad". Our client, or drill floor personnel, would react to that changed number.
In that particular case, no problem. The partner calls WITSML UpdateInStore on an element that our client never modifies.
This is a cool idea: using WITSML objects as a medium of communication. But consider what happens in a more general case.
Say more than one partner on the rig needs to modify the shared element.
Company A's analysis indicates they need to increase the target hook load by 10%. Company A calls GetFromStore to obtain the current value, adds 10% to it, then calls UpdateInStore to change the value.
Company B, meanwhile, in its own analysis, computes another target value. Between the time Company A reads the value and the time they update it, Company B modifies it. Company A's update is now based on out-of-date data, and it silently overwrites Company B's change.
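The sequence is easy to reproduce with a toy store. The dictionary below stands in for the server, and the comments map each step to the WITSML call it represents; the numbers are made up for illustration.

```python
# A plain dict stands in for the WITSML store (hypothetical values).
store = {"targetHookLoad": 100}

# Company A: GetFromStore
a_read = store["targetHookLoad"]

# Company B: UpdateInStore, arriving before Company A writes back
store["targetHookLoad"] = 120

# Company A: UpdateInStore, adding 10% to the value it read earlier
store["targetHookLoad"] = a_read * 110 // 100

final_value = store["targetHookLoad"]
```

The store ends up holding 110, computed from the stale 100. Company B's 120 is gone, and nothing told either party.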
In practice we'll have workarounds. Aware that the WITSML API has no provision for locking objects or concurrency management, we'll contrive new custom data elements for each service company's exclusive modification. Not a big deal by itself.
But compare this omission to a well designed application protocol like HTTP. Note the words "application protocol": HTTP is a protocol you can use to design a distributed application. The designers addressed dozens of issues in distributed application design, based on hard experience as the web emerged in the 1990s. Among the issues they addressed is concurrency management.
In HTTP, applications use
ETag,
Last-Modified,
If-Unmodified-Since, and
If-Match headers to ensure they never modify changed data. For example, when a client retrieves a resource, it notes the Last-Modified date/time header. When updating the resource using PUT, the client puts that date in the If-Unmodified-Since header; the server will not update the resource if it has changed since that time.
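Here is a minimal sketch of that exchange, using the hook load scenario from earlier. The `Resource` class is a hypothetical in-memory stand-in for a server, not a real HTTP implementation; only the header names and the 412 (Precondition Failed) status come from HTTP itself.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


class Resource:
    """In-memory stand-in for one HTTP resource (hypothetical, not a real server)."""

    def __init__(self, body, modified):
        self.body = body
        self.modified = modified  # timezone-aware UTC datetime, whole seconds

    def get(self):
        headers = {"Last-Modified": format_datetime(self.modified, usegmt=True)}
        return 200, headers, self.body

    def put(self, body, headers, now):
        condition = headers.get("If-Unmodified-Since")
        if condition is not None and self.modified > parsedate_to_datetime(condition):
            return 412  # Precondition Failed: changed since the client read it
        self.body, self.modified = body, now
        return 204  # No Content: update accepted


t0 = datetime(2006, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
hook_load = Resource("100", t0)

# Company A and Company B both read the current value.
_, a_headers, _ = hook_load.get()
_, b_headers, _ = hook_load.get()

# Company B writes first; nothing has changed since its read, so it succeeds.
b_status = hook_load.put(
    "120", {"If-Unmodified-Since": b_headers["Last-Modified"]}, t0 + timedelta(seconds=5)
)

# Company A's write carries the older date and is refused.
a_status = hook_load.put(
    "110", {"If-Unmodified-Since": a_headers["Last-Modified"]}, t0 + timedelta(seconds=6)
)
```

Company A gets a 412 instead of overwriting Company B, and can GET again and redo its calculation. HTTP dates have one-second resolution, so two writes within the same second would slip past this check; ETag comparison avoids that limitation.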
It's not that WITSML couldn't address each issue in distributed application design. The WITSML committee could add new concurrency management parameters to calls. But that would be reinventing the application protocol. There's already a great application protocol out there we can leverage: HTTP.
Finally, new content at wellstorm.com
Finally got the graphics team to emit the new art describing Wellstorm's
WITSML server, so there's a lot of new content up at wellstorm.com. For those who have been waiting!
WITSML not "Document Oriented"
Some more refining of a subtle problem with WITSML. The 1.30 API tightens up the specifications of how servers handle update requests on growing objects like logs. Good! The server's responsibilities are a little clearer now.
The problem is that the server has responsibilities. The server has to understand what a log object is. If you append some <data> elements and only some curves are present, the server must fill in empty curves with the null value. Probably the server must also update the start and end indexes as another side effect (still never explicitly stated in the API).
Even if we someday exhaustively specify server responsibilities, these complex responsibilities are bound to be implemented differently by each company, and incorrectly in many cases. It's an interoperability nightmare.
Contrast that to a document oriented approach: a client would retrieve the data it needs, modify it as necessary, and update the log object. No side effects: the client submits fully correct <data> elements, including nulls for the omitted curves; it may update the start and end indexes. In this scenario, the client's expectations are never disappointed! Any mistakes are the client's responsibility. A really good server would enforce integrity and prevent clients from making inconsistent changes. Yet a poorly implemented server that did not enforce that integrity would still work well with a well-behaved client. (And anyway, integrity requirements on one server may be quite different from those on another server; under a document-oriented approach, it is fine for integrity constraints to be server specific.)
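On the client side, that work is small. The sketch below is one way a client might do it; the curve mnemonics, the `-999.25` null value, and both helper names are assumptions for illustration, and a real client would take the null value from the log it is updating.

```python
def complete_row(curves, values, null_value="-999.25"):
    """Build one comma-separated data row, writing null_value for curves with no reading."""
    return ",".join(str(values.get(curve, null_value)) for curve in curves)


def widen_range(start, end, index):
    """Return the start and end indexes widened to include a newly added index."""
    return min(start, index), max(end, index)


# Hypothetical log with three curves; the new reading has no ROP value.
curves = ["DEPTH", "ROP", "HKLD"]
row = complete_row(curves, {"DEPTH": 1500.0, "HKLD": 210})

# The client also computes the new index range itself.
start_index, end_index = widen_range(1000.0, 1490.0, 1500.0)
```

The client would wrap `row` in a <data> element and submit it along with the updated indexes, so the server has nothing to infer.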
In case you're wondering about efficiency: the document oriented approach doesn't mean you have to send complete log objects back and forth just to update. You can still use an update protocol as we do now. It just shifts responsibility for performing complex transformations on data, to the guy who wants to do them -- the client.
Document orientation is a shift in outlook that would make servers less complex to build, and more widely available.
Versioning, Messages, and Models
There's a dangerous idea creeping into WITSML services. It smells a little like the problems I outlined in
Messages not Models: Is WITSML morphing into a data model?
A virtue of WITSML has been its document-oriented focus. But now the discussion at the last SIG concerned delivering WITSML documents to clients understanding only legacy versions of the schema -- in particular we expect a 1.20 client connecting to a 1.30 server to receive version 1.20 documents.
The creeping dangerous idea here is that there is a server that has abstracted the data and stored it in some version-independent model. And now there necessarily are some canonical mappings that need to be promulgated. And we are a hair's breadth away from defining a logical data model for well data.
A purely document oriented approach would regard the WITSML documents we send around as, well, documents. If you go to a web site and retrieve a PDF file, you don't ordinarily expect to be able to ask the server to convert it to MS Word format before delivering it. The document is just the bits as someone put them up there, and the server has no idea what they are.
We should think of WITSML documents the same way. If I insert a WITSML version 1.20
well
document, it's reasonable to say that anyone retrieving that document should get the version 1.20 document -- not the contents of that document mapped -- possibly with losses or errors -- to version 1.30. For the same reason, it was a mistake to require WITSML servers to perform units of measure transformations for a client.
That said, it is still possible, but not required, for web servers to perform media type transformations, using
content negotiation. If the server has several different formats in which it can deliver the content, it can supply these choices to the client. That would be an appropriate way for mutually consenting clients and servers to agree on the format of a returned document. Yet another idea from web architecture we should incorporate into WITSML (cf.
WITSML+REST).
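A rough sketch of how a server could pick among representations it already holds, based on the client's Accept header. The media type names with a `version` parameter are invented for illustration; nothing here is a registered type or part of any WITSML spec. The parser handles q-values but ignores wildcards such as `*/*`.

```python
def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, params, q = fields[0], [], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
            else:
                params.append(field)
        choices.append((q, ";".join([media_type] + params)))
    # sorted() is stable, so types with equal q keep the client's order.
    return [media for q, media in sorted(choices, key=lambda choice: -choice[0])]


def negotiate(accept_header, available):
    """Return the stored representation the client prefers, or None if none match."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return available[media_type]
    return None


# Hypothetical representations the server happens to hold for one document.
available = {
    "application/x-witsml+xml;version=1.2.0": "<well version='1.2.0'/>",
    "application/x-witsml+xml;version=1.3.0": "<well version='1.3.0'/>",
}
accept = "application/x-witsml+xml;version=1.3.0;q=0.5, application/x-witsml+xml;version=1.2.0"
chosen = negotiate(accept, available)
```

When `negotiate` returns `None`, a server would normally answer 406 Not Acceptable rather than guess. The server only offers what it has; it is under no obligation to synthesize a version nobody stored.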
Point: We should be thinking about documents, not models, as the resources we are manipulating on the server.
WITSML+REST
Last month I
promised some more depth on using WITSML RESTfully. I've
posted a draft of a RESTful WITSML protocol. If you're a WITSML implementor you should read that draft, and please leave comments about it on this page.
The protocol reproduces and enhances all the capability in
the present WITSML SOAP API.
From the conclusion of that draft:
Servers can expose WITSML services using the principles that made the Web the most successful distributed application in history. Doing so increases the value of WITSML because users can leverage existing web technologies, like browsing, bookmarking, linking, and forms. Furnishing WITSML data objects with URI enables a host of new functionality. For example, semantic web descriptions of resources require URI; it will now be possible for a company to inventory metadata about the WITSML data objects it owns.
Wellstorm WSP presently implements some, but not all, of this protocol, in order to support web clients -- but we're working on it. It's very easy to adapt existing server code to execute this protocol, since it conforms well to the semantics of the SOAP API. Let's hope more implementors will do so.