Monday, July 23, 2007

Another Ruby Scoping Shocker

Try this:
i = 10
5.times do |i|
end
i

What would you expect the value of i to be? Coming from a language like Java with its nested variable scoping, I would expect 10. But it is not so in Ruby.

The result is 4, the last value of the variable passed to the block.
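
The obvious workaround is just to give the block parameter a name that doesn't collide with the outer variable. A minimal sketch:

i = 10
5.times do |n|   # block parameter no longer shares a name with the outer variable
end
i                # => 10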

Friday, June 29, 2007

A Thought on Rails and Web Services

A very simple thought occurred to me the other day about Rails, REST, and Web services. The WS-* platforms had sidestepped web application frameworks when it came to the business of developing web services. At the most, you might tunnel service requests through a single controller before passing them on to whatever was processing the WS-* protocols. All that work you might have done in crafting your web application didn't mean much when it came to the Web service. In fact, you would have to start all over. It is not just a matter of work, of course, but of skill sets. WS-* spurned the lowly web developer. You gotta read 15 books and take 20 classes before you're up there with the big boys doing enterprise service stacks.

It seems to me that REST and Rails are an effort to grab mindshare about whether the web application framework should have something to do with the web service. I find it interesting that it took us this long to hit upon this simple proposition. There was nothing preventing earlier WAFs from attempting a similar grab at mindshare. REST on Struts could have been the hype five years ago. In fairness, the Java frameworks tend to specialize and to see having only a single purpose as a virtue. I don't know of much in Java open source that aims to be as pervasive as Rails.

Of course, Java had other reasons for fighting REST. Web services are a direct competitor to EJB as a means of enterprise interoperability. And drop-dead simple web services rout EJB for all but a few use cases. What is more, using frameworks, not conventions, is the Java way.

Java always took a particular attitude toward simplicity. In the Java world, simplicity was code generation. Java products often advertise themselves by how much code they can generate. In my world, simplicity is emphatically not code generation. Code generation is a sure sign that something has gone seriously wrong with my tool set. Generated code (until the day that code is generated by a bona fide AI) is always somehow a conceptual redundancy in an application. And redundancy is going to become a maintenance burden.

It is quite simply very exciting that we are now moving away from the idea of adopting standards for the sake of the tools we can fetch to help observe them, and that we are moving towards the idea that standards are wonderful because of how they allow us to participate in a service and software ecosystem that has been built around them. One was the vision of big software vendors eager to make large-scale sales. The other was the vision behind the W3C protocols.

Wednesday, May 30, 2007

ActiveRecord Validations Gotcha

Adding a validation to an ActiveRecord model over live data can cause a sudden, inexplicable, brain-shattering headache. Why?

If the new validation causes existing rows to be invalid, any attempt to update attributes on the object without correcting the offending attribute will fail. This will happen even if the HTML form that fronts the model doesn't give your user an opportunity to modify the attribute. And, boy, that can disorient your users. Any part of your application touching that row will become unwritable.
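
A minimal sketch of the failure mode, with a hypothetical User model: a presence validation on phone is added after rows without phone numbers already exist.

class User < ActiveRecord::Base
  validates_presence_of :phone  # added after the table already holds rows with no phone
end

user = User.find(1)                           # an old row with a nil phone
user.update_attributes(:name => "New Name")   # => false -- the save fails
user.errors.on(:phone)                        # => "can't be blank"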

ActiveRecord's tight policy makes you think hard about what you're doing, which might be a good thing. It's hard to see alternatives that maintain ActiveRecord's simplicity. If your application never modifies rows outside of user-generated CRUD and if each model object corresponds to a single form, then you should be okay. But plenty of good application designs cannot meet both those criteria.

The best alternative behavior I can think of would be a "partial" validation: a validation check would occur only on attributes that had been modified on the object -- the "dirty" attributes, so to speak -- letting other invalid attributes fly by. Of course, eventually an invalid row is going to kill you. After all, there is a reason it is invalid.

And while I prefer a noisy failure to a quiet failure, I'd rather not have my application lock up and my users bear the burden of my oversight. I can even imagine a validations mechanism that enforced only partial validation but also performed full validation, alerting the developers in some loud, cantankerous way if full validation failed where partial validation did not.

Depending on the size of your database, it might be a good idea to check database validity during a migration. You could iterate through all rows, calling ActiveRecord's #valid? on each, and raise an exception on a validation failure -- or report it, or enact some other policy. Consider adding a migration for each such validation. A validation that invalidates existing rows is really no different from a database migration. After all, in olden times, many of the constraints expressed in validations were expressed in the database schema itself.
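
A minimal sketch of such a migration, assuming a hypothetical User model and a table small enough to load with find(:all):

class CheckUserValidity < ActiveRecord::Migration
  def self.up
    User.find(:all).each do |user|
      # Fail loudly now, rather than letting invalid rows lock up the application later.
      unless user.valid?
        raise "Invalid user #{user.id}: #{user.errors.full_messages.join(', ')}"
      end
    end
  end

  def self.down
    # Nothing to undo; this migration only verifies data.
  end
end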

Monday, May 28, 2007

REST and Transactional Web Services

There's a long way to go before distributed programming becomes as safe and as simple as non-distributed programming. But if the Web is going to meet its potential of interoperable services locking together to create seamless, smart software, we need to get there. And here's something we need: distributed transactions.

One shortcoming we've found in Amazon Web Services has been this: we can send something to S3, to SQS, and then to our database, but while we can roll back our database, we cannot roll back Amazon. There is a danger that an application operating over these distributed services can be left in an inconsistent state, and that's just no fun for users. EJB was designed to solve such problems with distributed transactions, but the cost of that was, well, EJB.

How does REST clarify this picture? REST gives us a uniform, simple way to describe web services. This makes it easy to describe one Web service call to another Web service. In fact, it makes it easy to describe any RESTful Web service call to a Web service. Yes, the meta-web service! We need only supply a verb, a URL, and maybe a few headers. Signature-based authentication makes it easy to authorize a service to make a call on another without releasing your credentials. With this, I can imagine Web service filtering or Web service tunneling: calling one service through another.

Here is the Web service I would love to see Amazon develop next: a transaction service. There are many, many ways to do this, but here's just one proposal. A client can create a transaction resource with Amazon, describing all the service calls comprising the transaction. Then, Amazon can act as a coordinator for a 2PC (or 3PC) transaction protocol. Any service that can work within such a transaction, of course, has to implement the protocol.
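
To make the proposal concrete, here is a rough sketch of what creating such a transaction resource might look like. The coordinator endpoint, the XML vocabulary, and the calls being wrapped are all hypothetical:

require 'net/http'
require 'uri'

# Describe the calls comprising the transaction (hypothetical vocabulary).
transaction = <<-XML
<transaction>
  <call method="PUT" url="http://s3.amazonaws.com/my-bucket/letter.pdf"/>
  <call method="POST" url="http://queue.amazonaws.com/my-queue"/>
</transaction>
XML

# POST the description to a (hypothetical) coordinator to create the transaction resource.
uri = URI.parse("http://transactions.example.com/transactions")
response = Net::HTTP.start(uri.host, uri.port) do |http|
  http.post(uri.path, transaction, 'Content-Type' => 'application/xml')
end

transaction_url = response['Location']  # commit, abort, or poll via this resource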

This has some disadvantages. For one, the client wouldn't be able to change state based on the responses of intermediary service calls. For another, it's hard to see how the client could commit the transaction synchronously (and without a synchronous response, much of the ease of use of the Web is lost).

For the first problem, I can imagine another way of working this without tunneling. This involves splitting up the coordinator duties in 2PC. The transaction client can herself make the calls to the cohorts during the commit-request phase, then let the transaction service take over as coordinator during the commit phase.

As for the second problem, handling a long-running task is pretty easy for human clients: just give them a URL for their browsers to keep polling. BackgrounDRb works like this. For the machine-readable Web, a callback URL for submitting transaction status changes is easy enough. Another alternative is for the service to expose a REST transaction resource which its consumers can poll.

What more is needed?

(1) Some way to describe transactional status within REST resources.

(2) Some way for applications to implement transactional undo/redo across network latencies without locking too many resources. This part is difficult. Database transactions really aren't up for being kept open for long periods of time. We may need new tools to make this happen or at least a better understanding of the design and performance issues involved.

Saturday, May 26, 2007

The Web: Waiting for the Other Shoe to Drop

The philosophy behind REST design has always struck me as having this back-to-the-basics feel: Let's use HTTP verbs as they were originally intended. It's not the first time the community has looked across the convoluted territory of HTTP and felt a reformist impulse. After all, back in the late 90s, there was a lot of talk about how XHTML + CSS would help bring machine readability back to web pages by separating content from layout. The Semantic Web was supposed to further this by providing an ontology to this clean, machine-readable content. Even WS-* web services were supposed to bring about this magical world of seamlessly interoperable machines all plugging into each other.

REST brings new hope. We have a clean, uniform interface for clients to interact with one another. Tools like ActiveResource help reinforce the idea that the interface to a distributed resource can be just the same as the interface to a local resource. But which clients should interact with which services, and why clients ought to interact with them, remain questions answered only by human intervention. What if we could find a layer of abstraction to help chip away at this problem? Actually "solving" the problem might be the realm of AI fantasyland. But I think we can begin to chip away at it. And that's progress.
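
ActiveResource, for instance, lets a remote RESTful resource be addressed with the same idioms as a local ActiveRecord model. A rough sketch, with the site and the Person resource being hypothetical:

class Person < ActiveResource::Base
  self.site = "http://example.com/"
end

person = Person.find(1)   # GET  http://example.com/people/1.xml
person.name = "Ara"
person.save               # PUT  http://example.com/people/1.xml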

We've already had something basic like this in the past. For example, back in the day, using the Internet required finding a list of servers that implemented a particular protocol: gopher, IRC, etc. Interoperability occurred at the level of the protocol. Can we make interoperability happen at the level of semantic description of services with REST?

This is the thought that occurred to me reading Richardson and Ruby's excellent RESTful Web Services. As they point out:

Computers do not know what "modifyPlace" means or what data might be a good value for "latitude".

Well, why not? A computer can certainly know what a good value for an email address is. We take it for granted that a compiler knows the difference between an integer and a string. Why not latitude? We just need one more layer of abstraction above what we have today, before clients can consume services without having been programmed in advance.

It prompts the question: how distributed do we want programming to be? I can imagine the ontology itself being a Web service. If a client did not know what latitude was, it could just look it up. What would the service provide? I can imagine lots. An ontology, validations, algorithms, related services.

REST is real progress, because it simplifies things. Hopefully we'll be spending less time wiring together our Web service stacks and more time thinking about the big picture.

Tuesday, May 22, 2007

The Document Abstraction

There are two ways to go with an Office 2.0 application. One is to use the familiar document abstraction, where the user makes changes to the document within the browser and finally commits them with a save. This is the familiar Office workflow, with the document being kept in JavaScript memory. The alternative is to break the document abstraction somehow. Basecamp is a great example of this. You do not work with a single monolithic document. Instead, you make and commit small changes, one change at a time.

From the point of view of building a web application, the second way is the much friendlier way to go. Easier on the browser. Easier on the database. But it's not really feasible for applications competing against the word processor or the spreadsheet. I suspect that those using the document abstraction are eventually gonna run into problems with very large documents. After all, there are going to be costs to tunneling through HTTP and browsers. Those tunneling costs are likely to surface as performance issues. We all know how some browsers begin to choke on very large pages. Toss big pages together with lots of data kept in JavaScript and you have a recipe for trouble.

There is a technical opportunity here, I feel. The great thing about the browser is that it is a universal platform. The terrible thing about the browser is that it really wasn't designed to be an application platform. If a developer wants to build a document-centric Web application, she is still missing that sweet spot of tools and best practices to make things work out. Is Flex the answer? Is Java WebStart?

Tuesday, May 1, 2007

Unit Testing and Purity

Tim Lucas has a fine article on mocking and Rails testing which touches on some themes that I also hit on in an earlier post.

There is a tendency in Rails towards keeping functional tests pure -- towards testing each layer strictly in isolation from the layers below it.

Now, I find myself in a different position. I need to exercise my code as much as possible with the little time I have, so I like to get lots of testing bang for my buck. That means that the pragmatist in me wins over the purist who'd like to see each stratum tidily tested in its own appropriate testing layer which does not so much as touch the code stink of another layer, let alone the putrescent code fart-bomb that is the database.

Frankly, I regard it as one of the strengths of Rails that I'm again close to my database. Unlike my magnificent Tapestry+Struts+Spring+Hibernate architectures of old, I'm again within earshot of something that actually has implications for my users, no longer in that level of coding hell where I was testing Data Transfer Objects and testing the XML configuration of my DTOs and testing my database schema declaration so that it would not be altered while I was busy testing all those other things that I had to be testing to, ya know, save a record in a database -- all of course in perfect TDD abstraction from my database, database connection, web controller, and views, and pretty much in abstraction from the 6,000 things that can and will go wrong. But at least I know my DTO code is impeccable! A bullet-proof POJO! My business logic is flawless, portable. Oh, wait. I didn't write any business logic, which is properly abstracted away into a business logic container framework. Whatever. Come break me!

Testing purists say the solution to this is just more tests, and they are certainly right. The problem is that a small startup simply doesn't have the resources. All this code does come at a cost. Can we do things less purely but more efficiently?

What is crucial to a smallish application is code coverage, not purity of testing style. This is particularly true for uncompiled languages, where there is no such thing as compilation to give you even a smoke test on your code.

Five layers of well-segregated tests are just great, but one layer of impure tests is far, far better than any number of layers of pure tests where some layer somewhere has gone uncovered.

But this alternative approach of a single round-trip from user to database and back has costs of its own. You pay a price in test fragility. Fragility is a symptom of concerns that are improperly coupled.

Lucas has hit on this problem: Rails controller tests basically change each time you change your validations. Repairing them involves duplicating validation concerns within controller tests. Fixtures seemed originally intended to resolve this problem, but they don't lighten the load any; they just hide it away in another file, which can become a curse of its own.

He proposes using mocha to stub out ActiveRecord objects during controller testing. The problem is that views will make calls on all kinds of properties, and stubbing each property just recreates the same coupling problem that stubs were supposed to get around.
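
A rough sketch of the coupling problem, assuming mocha is loaded in the test helper (the User attributes here are just illustrative): every attribute the view happens to render has to appear in the stub, so the test changes whenever the view does.

def test_show
  # Every attribute the view touches must be enumerated here.
  user = stub(:id => 1, :name => "Peggy", :email => "peggy@example.com",
              :created_at => Time.now)
  User.stubs(:find).returns(user)

  get :show, :id => 1
  assert_response :success
end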

One solution is to turn rendering off, but it's just very, very useful to have something exercising the rendering code as a basic sanity check.

So my suggestion? What if each model object kept a class method for creating a single valid instance of itself? Something like Object.fixture? At least the responsibility then remains with the model object itself, close to the declaration of its validations. And the controller tests stay uncluttered. You can change properties on that single instance in your test itself, if that particular value is what you are testing. This way, the controller tests do not break. And if you add validations, changes only need to be made in one other method.
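
A minimal sketch of the idea -- the attribute values below are placeholders, and the method could just as well live in a plug-in rather than in each model:

class User < ActiveRecord::Base
  validates_presence_of :name, :email

  # One known-valid instance, declared next to the validations it must satisfy.
  def self.fixture(overrides = {})
    new({ :name => "Valid User", :email => "valid@example.com" }.merge(overrides))
  end
end

# In a controller test:
def test_signup
  post :signup, :user => User.fixture.attributes
  assert_equal 1, User.count
end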

A low-fi suggestion, to be sure.

Sunday, April 29, 2007

Insano-Pattern: Tuple Madness

Here at this blog I hope to be documenting all kinds of coding tricks sure to keep you indispensable to your employers. This one's for the dynamic languages only: tuple madness.

The pattern is this:

Never return anything from a method call that is not wrapped in an array.

Do:
  • Prefer returning many objects at once to creating a struct-like container for them.
  • Return objects of extremely heterogeneous types, sharing no ancestor class whatsoever.
  • Use well-known mathematical sequences as type-indicating indices to your array, placing -- for example -- Orders on all Fibonacci numbers and Users on all perfect squares, with perfect squares taking precedence. Point out that this allows rapid < O(n) array traversal.
  • Once, just once, return [[[]]] and parse it out as the true set-theoretic definition of the number 2.
Don't:
  • Document your return types in comments.
  • Return types in a consistent order. Use the handy-dandy shuffle function.
  • Return a consistent number of objects. Different code paths should sometimes leave four objects, sometimes five.
If someone criticizes your code, tell them that:
  • Your coding is LISPish, and that there is just nothing as beneficial to developer productivity as LISP and its powerfully productive list abstractions.
  • You are taking advantage of duck typing and that you work twice as quickly as you did in the days of static programming.
  • You are practicing defensive programming and making sure that no dreaded null pointer exception will ever be thrown again. You are doing this by enforcing a scrupulously consistent contract on each method's return type, and that you really deserve a raise for all the debugging costs you are saving the company. (Note that this should not prevent you from returning nil wrapped in an array).
  • Typing is just not Agile®. Finishing user story cards is all that counts.

Friday, April 27, 2007

Coghead Revisited -- A New Kind of OS?

Teaching the Whole Office to Program

Since my elementary school days of turtle programming and BASIC, there's been talk about the importance of teaching more people -- kids even -- how to program, making programming into a basic skill, so that anyone could use it to improve her productivity anytime she wanted to without hiring an IT consultant. And the focus -- in this great quest -- has often been on making programming languages a lot easier to understand, easier to read. And, as with all Big Ideas, there's even a sizable naysayer-and-backlash community.

Coghead is building tools to do this kind of thing for Web applications -- to liberate the world from programmers (I, as a programmer, am chomping at the bit to automate myself into irrelevance: I mean that completely sincerely) or to make us all into programmers, depending on your point of view. I saw a firm at Web 2.0 last week (apologies, their name escapes me) that was promoting English as a scripting language.

To be honest, I'm really not sure programming languages can get much simpler. I am very, very happy with the Ruby syntax. I think Ruby -- as a syntax -- is a thing of succinct, eloquent genius. So I don't think the next big gain in either developer productivity or the democratization of programming comes from improving on that syntax. I think it's gone about as far as it can go. The returns are diminishing.

So what if we changed tack? What if we tried something other than simplifying programming languages? What if we took Coghead's approach toward all of our programming? I know what you're thinking to yourself: groan, not the dumb RAD (Rapid Application Development) tools that were so big in the 80s and early 90s -- those clunky sets of wizards never amounted to much actual productivity gain. No, RAD wasn't so rad, because it was simply another interface to the familiar, forever-inaccessible machine code. What I'm wondering about instead is this: what if an entire OS were built in this Coghead style from the ground up?

What if I could create my own custom view in Outlook? In Photoshop? What if I didn't need to go spelunking in a 6,000 page Developer's Guide? What if I didn't need to pay a year's worth of college tuition for an SDK to do that?

That is, what if applications no longer had despotic control over their own frames? What if I could create a view in any application on my desktop? The OS allowed it, made provisions for it, so no application could fail to support it. And what if I could do this visually? And what if I could tell the OS that what I wanted was a list, and then I could tell the OS what I wanted it to be a list of? What if the entire OS took the creation of this kind of abstraction as one of its primary responsibilities?

So why hasn't this happened already?

The Problem: Declarative Programming and the Backwardness of Software Development

Try teaching someone to program for the very first time. What's the first conceptual stumbling block? What I've found is that it takes people a while to come to terms with the idea that they can't just tell the computer what to do, what they want, what their intention is; they actually have to tell the computer how to do it.

I firmly believe that a smart application could talk a person through about 80% of what programmers commonly do, since so much of what programmers do day-to-day is repetitive and near-automatable with a piece of software pitched at the right level of abstraction. Think about it this way: most all of us still program procedurally. But what we are doing is in essence something declarative. We tell our computers how to do things step-by-step, not what we want done.

After all, the history of programming is the history of increasing abstraction. Standard libraries increase abstraction. Third-party libraries increase abstraction. Good application design involves creating your own reusable abstractions. And these abstractions are declarative in nature: we tell our processor that we want it to multiply two floats. We do not tell it how to encode the floats. We simply need to rise to the level of declarative abstraction that non-practitioners are comfortable with.

But most programmers still program at the wrong level of generalization. Creating new algorithms is a very small subset of the day-to-day tasks of most of us. You find yourself most often using other people's algorithms, applying design patterns, and following common business processes -- procedurally doing what ought to be done declaratively.

Rails -- considered as a DSL for web applications -- actually goes a long way towards making Web application programming declarative. Even Java's newfangled annotations are moving in that direction. Those old RAD tools back in the day were basically just visual interfaces to declarative programming done cheaply by code generation.

There are of course the renowned plug-in architectures of Firefox and Eclipse. But those bad boys require some serious code-fu, and while they might liberate their respective platforms, they don't do anything to lower the bar on what it takes to be able to create your own software. If anything, they raise that bar. There's a long long road between Java 101 and OSGi development.

What has limited the power of these kinds of tools in the past is this: it's just been too hard to get all your applications to play ball together. There's just not much out there that I could tie my little visually declared list to. Web 2.0 is making interoperability compulsory, but only an operating system could make it a true precondition of software development. Lots of OSes have component models, but applications tend to support components as an afterthought. What if they simply had to develop for them or else they would have no persistent storage?

Application Rights versus User Rights

So I think there are other kinds of orthogonal concerns whose development would really make a proposal like this fly. Consider, for example, the broad issue of application rights versus user rights. Traditionally, applications have had rights to their territorial integrity (their frame). They have rights to their source code, if they want it (pretty hard to disassemble most stuff). And they have rights to keep your data however they want. What if we revoked those rights? What if we revoked the right to the user's data? That is, what if an OS were designed to prevent applications from serializing data unless it were serialized to an open specification (a specification it would have to provide to the OS)? What if we revoked the right to the source code? What if a popular OS took hold in which everything were scripted and nothing could be compiled down? Sure, companies might shy away from it, but if the platform were popular enough, they would simply have no choice but to develop for it. There's much less to fear from openness if everyone must be open, because if you're looking for intellectual property infringement, no one can hide infringement from you.

I'm not saying that all these ideas are feasible, but I hope they are worth thinking about. It's hard to imagine a speedy OS that didn't allow programmers to compile things down to machine code for some optimizations. On the other hand, as processing power continues to grow, we might see these kinds of optimizations less and less often over the next couple of years.

Wednesday, April 25, 2007

HTML 5 and Drag and Drop

Here's a good overview of the newly unveiled HTML 5 specification.

And I think they're missing something crucial.

Ray Ozzie has done some great work making what amounts to a clipboard hack for the Web. He is quite right: the Web needs a clipboard, particularly as the Web becomes an application platform. Ozzie's work is slick, elegant, and very useful.

But it's also too darned complicated, and that is no knock on Ray Ozzie. It took a lot of ingenuity to get to this point with the limited resources of HTML 4.

Really, drag and drop should be a first-class citizen of HTML. Why not? Forms and buttons are, and dragging and dropping is as important a GUI concept as forms and buttons. Support baked straight into HTML would make it simple on developers: your browser implements it once, and the rest of us down the Web food-chain never have to worry about it any more than we worry about our <b> tags.

How simple can it be? Very simple, I think. All we really need is a <drag> tag and a <drop> tag. Each <drag> tag can specify, via attributes, which MIME formats it will export (e.g. application/pdf, text/plain) and a callback function for fetching the data in each format. A <drop> tag's event handler can choose an available format when a drop is made and update its appearance.

And the Web will have, not only a clipboard, but also a desktop.

Friday, April 20, 2007

In Ruby, Not All Objects Are Created Equally

Try this:
irb(main):001:0> class << :a
irb(main):002:1> def foo; "bar"; end
irb(main):003:1> end
TypeError: no virtual class for Symbol
from (irb):1
But.. but.. but.. I thought adding instance-specific methods was a hallmark of Ruby metaprogramming?

I ran into this problem while trying to create "smart" symbols, symbols that I could hang methods off of to, say, change their textual representation when displayed to the user.
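
The workaround I ended up with was the boring one: wrap the symbol in an ordinary object and hang the behavior there. A rough sketch, with the class and method names being purely illustrative:

class SmartSymbol
  attr_reader :symbol

  def initialize(symbol, display_name)
    @symbol = symbol
    @display_name = display_name
  end

  def to_s
    @display_name
  end
end

status = SmartSymbol.new(:awaiting_payment, "Awaiting payment")
status.symbol   # => :awaiting_payment
status.to_s     # => "Awaiting payment"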

What other objects lack virtual classes? Well, Fixnum, for one. (Strings, interestingly, accept singleton methods just fine.) I'm guessing that this is true of any object that Ruby treats as an immediate value rather than as a reference.

At first, you might think this is disappointing, an imperfection in the otherwise glittering consistency of Ruby. But consider the alternative.

For one, Ruby would have to keep an object in memory representing the number 7 -- a virtual class shared by every token of 7. It would have to create such an object when you tried to access its virtual class. If you had def voyages_of_sinbad; 4 + 3; end, the method would have to return not just the result of the computation but the result of looking up a possible virtual class. Whether or not a virtual class for 7 had been created, Ruby would have to at least check. So this lookup would be a requirement of any numerical computation.

For another, what distinguishes literals is that they can be accessed directly without clients being passed a reference. In essence, literals are the objects we all share. Literals, as true first-class objects, pulverize encapsulation! (And I ask you to imagine for the moment an insano-pattern of using Fixnums like 7.secret_info to pass messages between objects!)

So be thankful that not all objects are created equally in the Ruby world.

It would be an interesting thought experiment to try to imagine a sensible language without literals.

Sunday, April 15, 2007

Reporting on Rails

Another (far more famous) Ara has written a fantastic, much-needed plug-in for Rails called MOle. MOle lets you gather reporting information on your Rails application in real-time. Reporting is really a cross-cutting concern, common to many applications. It's great to see tools becoming available for Rails in this direction. Coincidentally, we've been rolling our own framework at Postful for doing this sort of thing. What did we call it? Snitch. I'm not kidding. And Snitch is the name of the console application for MOle. We were planning to release our Snitch as a plug-in, but now seeing the fantastic work done on MOle, it makes more sense for us to contribute rather than compete.

What are the advantages of MOle over externalizing your reporting with something like, say, Google Analytics?
This is only going to be a fragment of the true list of advantages, since I've just begun dipping into the code, but at first glance what is obvious is this:
  • Gauging application performance by wrapping controllers

  • Capturing application-specific business data

  • Monitoring this data in real-time, rather than the typical reporting lag of external analytics tools

  • Monitoring particular code paths, like exceptions thrown, rather than just raw request headers
By themselves, these things make MOle a necessary complement to externalized analytics packages. I'm excited to see where development is going to take this project.

Saturday, April 14, 2007

How to Make Your ActionController Go Up In A Bang

Go on, try it. I dare you. I double-dare you:


class UserController < ActionController::Base
  def send
  end

  def request
  end

  def render
  end
end

These are all natural enough names for controller methods, but your controller will mysteriously cease to operate -- or operate in strange, strange ways -- if you use any of them.


Why? Because you are overriding methods crucial to the internals of the controller. In the first method, Ruby's Object#send. In the next two, ActionController methods.


How to save yourself time: I really hate silent failure or mysterious failure. But you can make the silent failures clear and noisy (oxymoronic, I know!). Try something like this:


module ActionController
  class Base
    def Base.method_added(sub)
      raise "Cannot override action 'send'" if sub == :send
      raise "Cannot override action 'request'" if sub == :request
      raise "Cannot override action 'response'" if sub == :response
      raise "Cannot override action 'render'" if sub == :render
      #...
    end
  end
end

Thursday, April 12, 2007

Ruby Scoping Shocker

Ruby head-scratching magic:


irb(main):001:0> if false
irb(main):002:1> x = true
irb(main):003:1> end
=> nil
irb(main):004:0> x
=> nil
But now:
irb(main):005:0> y
NameError: undefined local variable or method `y' for main:Object
from (irb):5
irb(main):006:0>

This is very unusual, and I'm not sure whether it is part of the Ruby specification or just the implementation. From what I gather, the declaration of 'x' is a side effect of parsing the conditional block. What other side effects of unexecuted code are there that we should know about?


The oddity does not carry over to the right side of assignment:


irb(main):001:0> if false
irb(main):002:1> x = y
irb(main):003:1> end
=> nil
irb(main):004:0> x
=> nil
irb(main):005:0> y
NameError: undefined local variable or method `y' for main:Object
from (irb):5
irb(main):006:0>

Why is the specification/implementation question important?


Because if we could rely on this behavior we could write more compact code, writing fewer variable declarations. For example:


if some_condition
  x = true
end
#..
do_something if x

rather than:


x = nil
if some_condition
  x = true
end
#..
do_something if x

That is, if we can count on this in all future versions and implementations of Ruby.

Wednesday, April 4, 2007

Why I Hate Test Fixtures (And What I Am Prepared To Do About It)

I have a few complaints against YAML test fixtures:
  • They break. Changing your database schema will often leave existing test fixtures invalid. I am extremely lazy, and it becomes a maintenance burden.
  • I hate switching between files in my editor. Already, I have to switch between my application code and my test code. Switching to a test fixture for something as small as a sample object gives me three files I have to switch between.
  • The test fixtures centralize concerns from all different test suites. For example, I have several users in my test fixture who are used I-don't-even-know-where among my test classes. It seems bizarre to think that while my tests should remain independent and modular, my test data should be entirely jumbled together.
  • I can't remember what is what within my test fixtures. I had users named Bill, Peggy, Joe, Quentin. Who are these people? I switched to the slightly more manageable valid_user, invalid_user, valid_user_sending_message_to_invalid_user, invalid_user_receiving_message_to_valid_user. Even if I try to reuse the same fixture objects as often as possible, bloat eventually happens.
  • If objects have dependencies, fixtures get tedious. Let us say your objects have dependencies on other objects: your order requires a user to be valid, and your user requires an account. If I have to make dummy fixtures for all of these, I've got to switch between five or six files.
In fairness, I have these positive things to say about externalized test fixtures:
  • Sometimes they really need to be externalized, as in when your test fixtures are entire documents, like PDFs or spreadsheets.
  • The YAML fixtures are a vast, vast improvement over using property files, XML, or writing a lot of object instantiation code within your test itself (all of which I used to do back in Java).
So what do I want instead? I want to create my fixtures within my tests in no more than a line or two with valid defaults:
def test_signup
  user = User.sample
  post :signup, { :user => user.attributes }
  assert_equal 1, User.count
end
I want to create required dependencies automatically:
def test_signup_creates_account
  user = User.sample
  post :signup, { :user => user.attributes }
  assert_equal 1, Account.count
end
I want attributes to be overridable:
def test_signup_requires_email
  user = User.sample(:email => nil)
  post :signup, { :user => user.attributes }
  assert_equal 0, User.count
end
I want even nested attributes to be overridable:
def test_signup_creates_account
  user = User.sample(:account => { :balance => 5.0 })
  post :signup, { :user => user.attributes }
  assert_equal 5.0, Account.find(1).balance
end
And I want to make minimal changes (none if possible) to my model classes to support this stuff.

Please take note that this kind of testing design pattern has cropped up both in integration testing and in RSpec.

ActiveRecord classes can introspect their associations and, with a plug-in, their validations. For 90% of cases, this should be all we need to create the object graph.
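
Here is a rough sketch of how such a plug-in might look. The SampleFixture module, the sample_defaults hook, and the attribute values are all hypothetical; every model in the graph would need to extend the module, and nested overrides like the account balance example above are left out:

module SampleFixture
  # Build a valid instance, walking belongs_to associations to satisfy dependencies.
  def sample(overrides = {})
    instance = new(sample_defaults.merge(overrides))
    reflect_on_all_associations(:belongs_to).each do |association|
      next if overrides.key?(association.name)
      instance.send("#{association.name}=", association.klass.sample)
    end
    instance
  end
end

class User < ActiveRecord::Base
  extend SampleFixture
  belongs_to :account

  # Per-model valid defaults, declared next to the validations they must satisfy.
  def self.sample_defaults
    { :name => "Sample User", :email => "sample@example.com" }
  end
end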