Self Conscious Development
Write code as if you care what others think about what you’ve written.
Friday, April 4th, 2008
Reginald Braithwaite says he’d love to hear stories about how programmers learned concepts from one language that made them better in another. This pretty neatly coincides with a post I’ve been meaning to make for months, so I might as well just get on with it and write something (because as CHart reminded me, I haven’t even posted for months).
Sometime around late 2004 or early 2005, I heard about Ruby on Rails for the first time. I’d never really programmed in any languages other than Java, C# and PHP, but I’d read posts by guys like Sam Ruby and Martin Fowler about how expressive and compact the Ruby language was. However, it wasn’t until Rails started getting some buzz that I really looked at any Ruby code and tried to decipher what it was doing. Rails put Ruby within a frame of reference I knew very well (web development), which made it easy to contrast the “Ruby Way” with the .NET/Java way I was used to.
The first thing that really caught my eye was the extensive usage of blocks, or anonymous methods. Coming from Java/C#, I had a hard time deciphering what was really going on when I saw something like this in Ruby code:
list.find_all{ |item| item.is_interesting }
It was pretty easy to see what the end result should be, but how does it actually work? All I knew was that a simple one-liner in Ruby seemed to balloon into this in Java:
List<Item> interesting = new ArrayList<Item>();
for (Item item : list) {
    if (item.isInteresting()) {
        interesting.add(item);
    }
}
Some time later, another developer introduced a pattern into the Java project I’m currently working on. It seemed to accomplish roughly the same thing as the Ruby example (conceptually, at least; there was still a lot of code in the Java version).
new Finder(list, new InterestingItemSpecification()).find();
Astute readers might recognize this as a variation on the Specification Pattern I’d written about almost a year ago. The point of this pattern is to let the developer specify how to filter a list of items, rather than iterating over the list manually. Never mind that doing this in Java takes as many lines as the standard for-loop example… What’s interesting here is the concept of telling the list what you want, rather than looping through it manually to take what you want.
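For readers who haven’t seen the earlier post, here is a minimal sketch of what a Finder and a Specification might look like in Java. The names Specification, Finder, Item and InterestingItemSpecification mirror the example above, but their bodies are assumptions, not the project’s actual code. Generics are used so the specification needs no casts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Specification: answers one yes/no question about one item.
interface Specification<T> {
    boolean isSatisfiedBy(T item);
}

// Hypothetical Finder: owns the loop, so callers only supply the condition.
class Finder<T> {
    private final List<T> items;
    private final Specification<T> specification;

    Finder(List<T> items, Specification<T> specification) {
        this.items = items;
        this.specification = specification;
    }

    List<T> find() {
        List<T> matches = new ArrayList<T>();
        for (T item : items) {
            if (specification.isSatisfiedBy(item)) {
                matches.add(item);
            }
        }
        return matches;
    }
}

// Hypothetical Item with a name and a single flag.
class Item {
    private final String name;
    private final boolean interesting;

    Item(String name, boolean interesting) {
        this.name = name;
        this.interesting = interesting;
    }

    String getName() { return name; }

    boolean isInteresting() { return interesting; }
}

// The only code that differs from one query to the next.
class InterestingItemSpecification implements Specification<Item> {
    public boolean isSatisfiedBy(Item item) {
        return item.isInteresting();
    }
}
```

The call site then reads new Finder<Item>(list, new InterestingItemSpecification()).find(), which matches the line above without the casts.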
I eventually created a sub-class of Java’s ArrayList that allowed it to be filtered directly, just like Ruby arrays and C#’s generic list class. Now the code ended up looking like this:
list.Where(new InterestingItemSpecification());
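A sketch of the ArrayList subclass idea follows. It assumes a hypothetical FilterableList class with a lowercase where method (Java naming) that takes the same kind of Specification. The method returns a new list and leaves the original unchanged, which allows calls to be chained.

```java
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical Specification, as in the Finder sketch.
interface Specification<T> {
    boolean isSatisfiedBy(T item);
}

// Hypothetical ArrayList subclass that can filter itself.
class FilterableList<T> extends ArrayList<T> {
    private static final long serialVersionUID = 1L;

    FilterableList() {
        super();
    }

    FilterableList(Collection<? extends T> items) {
        super(items);
    }

    // Returns a new list holding only the items the specification accepts;
    // this list is left unchanged.
    FilterableList<T> where(Specification<T> specification) {
        FilterableList<T> matches = new FilterableList<T>();
        for (T item : this) {
            if (specification.isSatisfiedBy(item)) {
                matches.add(item);
            }
        }
        return matches;
    }
}
```

Because where returns another FilterableList, calls such as list.where(a).where(b) chain naturally.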
Once I got this far, things really started to fall into place. I started to see duplication everywhere. Hundreds of methods (it’s a pretty large project) that selected slightly different things from the same lists, the only difference lying in a little if clause. I started deleting entire methods and replacing them with Specifications. Booyah. Then I started seeing other patterns.
Accumulation/Reduction/Folding:
public BigDecimal getTotal(){
BigDecimal total = BigDecimal.ZERO;
for(Item item : getItems()){
total = total.add(item.getSubTotal());
}
return total;
}
Mapping/Conversion
public List getConvertedList(){
List converted = new ArrayList();
for(Item item : items){
converted.add(item.getAnotherObject());
}
return converted;
}
Applying actions/commands to each item
public void calculate(){
for(Item item : items){
item.calculate();
}
}
For each of these common informal patterns I was able to create a formal method for accomplishing the same thing. The goal became to distill each method down to just the part that made it different from another method. The act of iterating a list is boring, boilerplate noise that just doesn’t have to be there. Here’s the end result:
Accumulate
list.reduce(new SumItemCommand());
Map
list.convert(new ItemToThingConverter());
Actions
list.forEach(new DoSomethingToItemCommand());
There’s still the overhead of creating a class for each action, command, converter, etc., but the main goal was reached. (I realize C# doesn’t have this problem thanks to delegates, but once again, it’s the concept that was important to my learning.)
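Here is one possible sketch of the three formal methods. FunctionalList, Accumulator, Converter and Command are hypothetical names. This version wraps a list instead of subclassing ArrayList. The reason is that since Java 8, ArrayList already has its own forEach, and adding an overload that takes a second single-method interface would make lambda calls ambiguous. Unlike the list.reduce(...) call above, this reduce also takes an explicit starting value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method interfaces; on Java 8+ each can be a lambda
// instead of a dedicated class.
interface Accumulator<T, R> {
    R accumulate(R runningTotal, T item);
}

interface Converter<T, R> {
    R convert(T item);
}

interface Command<T> {
    void execute(T item);
}

// Hypothetical wrapper that owns the iteration for all three patterns.
class FunctionalList<T> {
    private final List<T> items;

    FunctionalList(List<T> items) {
        this.items = new ArrayList<T>(items);
    }

    // Accumulate: thread a running value through every item.
    <R> R reduce(R seed, Accumulator<T, R> accumulator) {
        R result = seed;
        for (T item : items) {
            result = accumulator.accumulate(result, item);
        }
        return result;
    }

    // Map: build a new list with one converted value per item, in order.
    <R> List<R> convert(Converter<T, R> converter) {
        List<R> converted = new ArrayList<R>();
        for (T item : items) {
            converted.add(converter.convert(item));
        }
        return converted;
    }

    // Actions: apply a side-effecting command to every item.
    void forEach(Command<T> command) {
        for (T item : items) {
            command.execute(item);
        }
    }
}
```

With this in place, getTotal() becomes roughly items.reduce(BigDecimal.ZERO, (total, item) -> total.add(item.getSubTotal())).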
I eventually started to get really good at seeing these patterns in code, even though a method might combine several of the above concepts. It really is amazing how many different ways a method can be written, but how easy it becomes to distill it down to accumulation, conversion, filtering, and just basic actions once you’ve had this “revelation.”
Over the last few months I started seeing some other, more specific examples of the above patterns. Summing was just a version of accumulation that acted on numbers. SelectMany (which I stole from C# 3.0) was simply accumulating into a list. By the time I was about to implement GroupBy, I just stopped. Whoa. I was well on my way to implementing SQL on in-memory objects. Maybe I should just stop this madness and write a SQL query to get what I want in the first place.
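Under the same caveats, the idea that summing and SelectMany are both forms of accumulation, and that GroupBy is the next step toward SQL, can be sketched with a single fold. This version uses java.util.function (Java 8 or later) in place of hand-written command classes. The Folds class and its method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper class; every operation below is built on fold().
final class Folds {
    private Folds() {}

    // The general accumulation: thread a running value through the list.
    static <T, R> R fold(List<T> items, R seed, BiFunction<R, T, R> step) {
        R result = seed;
        for (T item : items) {
            result = step.apply(result, item);
        }
        return result;
    }

    // Summing is accumulation specialised to numbers.
    static int sum(List<Integer> numbers) {
        return fold(numbers, 0, (total, n) -> total + n);
    }

    // SelectMany is accumulation into a list: each item yields a list,
    // and all of them are appended to one result.
    static <T, R> List<R> selectMany(List<T> items, Function<T, List<R>> selector) {
        List<R> seed = new ArrayList<R>();
        return fold(items, seed, (acc, item) -> {
            acc.addAll(selector.apply(item));
            return acc;
        });
    }

    // GroupBy is accumulation into a map keyed by a derived value,
    // which is where this starts to look like SQL's GROUP BY.
    static <T, K> Map<K, List<T>> groupBy(List<T> items, Function<T, K> key) {
        Map<K, List<T>> seed = new LinkedHashMap<K, List<T>>();
        return fold(items, seed, (acc, item) -> {
            acc.computeIfAbsent(key.apply(item), k -> new ArrayList<T>()).add(item);
            return acc;
        });
    }
}
```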
It’s amazing when I think back on it, but simply being exposed to another language (Ruby) because the code looked so pretty caused me to learn the hows and whys of basic functional programming techniques. I also gained a new respect for SQL, as I completely stumbled upon most of its basic concepts in my quest to remove needless duplication from Java code. It’s funny to think that Lisp has been around for ages, yet most programmers either are never taught the basic building blocks of functional programming (I wasn’t) or have since forgotten them. The sad part is, it’s all just basic fucking math.
Here’s a bit of a cheeky question…
You know all those "conversations" we nerds have about scalability and performance, where we endlessly debate where to put business logic and whether scaling the database is easier than scaling the application servers? Well, how come we never end up talking about how to make arguably the most costly (in terms of both time and $$$) operation of our applications perform better?
The costly operation I’m talking about is the journey our markup makes from the web server to the browser. It’s funny, because we’ll architect fantastic applications, and then shove absolutely bloated junk markup across the vast, unreliable Internet without a second thought. That shit costs money too… (I’m talking about bandwidth). And it’s code that’s visible to the world.
Ayende has tried to explain why he doesn’t like ASP.NET Webforms many times, but based on the comments that pop up on his posts I’m not sure if he’s successfully getting his point across. I’ll try to help him out in this instance, as I think the same way about not just Webforms, but most other view technologies. This will take more than one post, however, so hopefully I can convince myself to increase my stunning post frequency of the past year in order to properly delve into this issue.
First off, let’s take a paragraph or two for a brief refresher on HTTP, the protocol that drives the Web as we know it. This will be quick, and I guarantee it will be dirty…
HTTP is based on a request/response model between a client and a server. The client is assumed to be a web browser in this instance (but can be anything really), and the server is a central location (IP address, domain, URI etc) on the Internet that responds to requests made by the client(s). Responses are generally sent back as HTML documents, but can also be XML, JSON or anything else, really. Each response tells the client what format it is sending via the Content-Type response header. There are many other response headers that provide clues to each client as to what it should do with the body of the response.
When a client makes a request to an endpoint, it specifies a verb that tells the server what the client wants it to do. The four verbs that matter most here are GET, POST, PUT and DELETE.
The modern web generally uses just the first two of these verbs (GET and POST) to get things done, partly because HTML forms can only submit GET and POST requests. The latest version of Rails fakes the PUT and DELETE verbs to more closely match the intended spirit of HTTP. One thing you may notice is that GET, POST, PUT and DELETE look an awful lot like READ, CREATE, UPDATE and DELETE, but that’s a "coincidence" for another post.
The way this stuff all gets mashed together to create a usable application on the web is only slightly complicated at the lowest level. In a common use case, a user makes a GET request (through a browser) to a URI that returns an HTML response. The browser then displays the HTML to the user. If the HTML response contains a FORM element, well that’s an invitation to the user to change the state of data on the server in some way (maybe by adding a new post to a blog via a few text boxes). When the user clicks the submit button, a POST request is sent to the server that contains all the text the user entered in the HTML textboxes. Once the server receives the request, it’s up to the application that drives it to figure out what to do with the data sent by the client.
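To make the request/response cycle concrete, here is a small sketch built on the HTTP server that ships with the JDK (com.sun.net.httpserver) and the standard HttpURLConnection client. The /posts resource, the title form field and the TinyBlogServer class are invented for illustration. A GET returns an HTML form. A POST carries the form data in its body and receives a plain-text echo. In both cases the Content-Type header tells the client which kind of response it got.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo server with a single /posts resource.
final class TinyBlogServer {
    private TinyBlogServer() {}

    // Starts a server on a free local port (port 0 = pick any).
    static HttpServer start() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/posts", exchange -> {
                String method = exchange.getRequestMethod();
                String contentType = "text/plain; charset=utf-8";
                byte[] body;
                int status;
                if ("GET".equals(method)) {
                    // GET: send an HTML form inviting the user to POST back.
                    contentType = "text/html; charset=utf-8";
                    body = "<form method=\"post\" action=\"/posts\"><input name=\"title\"></form>"
                            .getBytes(StandardCharsets.UTF_8);
                    status = 200;
                } else if ("POST".equals(method)) {
                    // POST: the submitted form fields arrive in the request body.
                    body = readAll(exchange.getRequestBody()).getBytes(StandardCharsets.UTF_8);
                    status = 201;
                } else {
                    body = "method not allowed".getBytes(StandardCharsets.UTF_8);
                    status = 405;
                }
                // Content-Type tells the client how to interpret the body.
                exchange.getResponseHeaders().set("Content-Type", contentType);
                exchange.sendResponseHeaders(status, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal client: returns "status|content-type|body" for one request.
    static String request(int port, String method, String formBody) {
        try {
            URL url = URI.create("http://127.0.0.1:" + port + "/posts").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setRequestMethod(method);
            if (formBody != null) {
                conn.setDoOutput(true);
                conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(formBody.getBytes(StandardCharsets.UTF_8));
                }
            }
            int status = conn.getResponseCode();
            try (InputStream in = conn.getInputStream()) {
                return status + "|" + conn.getContentType() + "|" + readAll(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A real framework layers routing, parsing and views on top of this, but underneath it is still just verbs, headers and bodies.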
I hope I haven’t lost everyone yet, because I swear there’s some sort of profound punchline to be found here.
Now, I’m sure we can all agree at this point that HTTP is pretty simple. Clients make requests using a verb, with or without accompanying data, and the server responds in whatever way it deems appropriate. The issue that Ayende and I have with Webforms (and Struts, and other view frameworks) is that they take something simple and try to turn it into something different. In the case of Webforms, Microsoft has tried to build an event-driven, stateful paradigm out of something that is resource-driven and stateless at its core.
The result is that Webforms has become a layer of indirection that sits on top of HTTP. Indirection in and of itself is not bad, as any developer who uses an ORM to abstract the database will tell you. The problem is that I think Webforms has strayed further from the underlying model than it should.
Witness the ASP.NET page lifecycle.
Webforms is an attempt to make web programming look like desktop programming. As a guy who learned about web programming via ASP.NET, I found it was pretty intuitive. The problem came when I ran into leaks in the abstraction that I couldn’t deal with without the knowledge of what is really going on under the hood in the HTTP pipeline.
Now the first problem with Webforms is not that it’s an abstraction, or even that it’s a leaky one (they all are). The problem is that what Webforms attempts to abstract away is actually simpler than the abstraction!
The second "problem" with Webforms is that not very many people know about the first problem. I know I didn’t, until I saw how Rails, MonoRail, and other frameworks work with the underlying model of the Web while still being terribly simple to understand and develop on. Making it easier to program for the Web is a laudable goal; I’m just not so sure that abstracting the underlying technology until it’s unrecognizable is the way to go about it.
People are talking about Microsoft’s Entity framework and how it does not currently allow persistence ignorant domain objects.
I’ve been torn about this issue for a while now. On the one hand, an O/R mapper that is persistence ignorant essentially has to support XML mapping files. The downside of this approach is that each entity’s properties are duplicated (so they have to be managed in multiple places), the files themselves have to be edited and maintained, and you can’t see the mapping information all in one place. Often, though, this price is worth paying.
On the other hand, using attributes to specify mapping information leads to less "code" to manage, and the advantage of having your domain class and mapping information all in one location. The price is that your domain objects have to know about the persistence framework.
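The trade-off is easier to see with a toy example. The sketch below uses hypothetical @Table and @Column annotations (not Hibernate’s, JPA’s or the Entity Framework’s) and a tiny reflection-based SelectBuilder. Customer now depends on the mapping annotations, which is the coupling described above. With an XML mapping file, the same table and column names would live in a separate file.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mapping annotations; RUNTIME retention so reflection can see them.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Table {
    String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    String value();
}

// Domain class and mapping information in one place.
@Table("customers")
class Customer {
    @Column("customer_id")
    private long id;

    @Column("full_name")
    private String name;

    // Not annotated, so the toy mapper below ignores it.
    private String displayNameCache;
}

// Hypothetical toy "mapper" that reads the annotations via reflection.
final class SelectBuilder {
    private SelectBuilder() {}

    static String selectAll(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        if (table == null) {
            throw new IllegalArgumentException(type.getName() + " has no @Table mapping");
        }
        List<String> columns = new ArrayList<String>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                columns.add(column.value());
            }
        }
        // getDeclaredFields() promises no particular order, so sort for stable output.
        Collections.sort(columns);
        return "SELECT " + String.join(", ", columns) + " FROM " + table.value();
    }
}
```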
The one thing I’ve observed recently is that most of the Java developers I’ve talked to who’ve used Hibernate in the past are excited and relieved that the latest versions support annotations (attributes in .NET) for specifying mapping information. Most of them seem to dislike mapping via XML files, and feel that the price of using annotations is worth it.
It’s too bad for Microsoft that NHibernate already supports both methods, so they’ll have to as well if they want to keep up.
Several years ago, while I was still working at Kanga, I wrote a timesheet application that was used by the company to track employee hours. Based on ASP.NET and MySQL, the original intent was to have it running on Mono. That never really panned out, but the app was in use for at least a year and a half while I was there. As far as I knew, it was still in use up until the very end of the company.
The code languished on my hard drive for the last two years. During that period, I’d get an average of one or two inquiries each month via email from various souls across teh Intarweb who were interested in taking it for a spin. Unfortunately, the code didn’t even compile anymore and I didn’t really feel like getting it to a working state. That all changed this past week, for whatever reason.
I spent a few nights whipping the codebase into a somewhat decent state. It now compiles, and it has an updated SQL script to get the database up and running. It seems to work, so I thought I’d upload it to Google Code for anybody who’s interested. The code has a staggering test coverage of 0%, but everything worked 2 years ago, so what the heck. It’s no longer something I’m terribly proud of, but it works, and I might commit to making it kick some sort of ass over the next few months.
The first language I really learned to program in was Java. The first language I actually delivered a product with was C#. It wasn’t hard to move from one to the other, since in most respects the two languages are nearly identical. There are some significant differences, though. The most striking one, in my mind, is that C# has received so many nice little features that just make the code cleaner. You could write Java-style code in C# if you wanted to, but that would just be plain silly. Here’s a sample of what I’m talking about:
Java and C# both utilize a finally concept that lets the programmer clean up resources. However, C# takes this a step further with the using statement. Here’s the Java way…
Timer timer = new Timer();
timer.start();
try {
    doStuffWeWantToTime();
} finally {
    timer.stop();
}
…And now the C# way…
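// Assumes a Timer class that implements IDisposable and whose Dispose() stops it
using (Timer timer = new Timer()) {
    timer.Start();
    doStuffWeWantToTime();
}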
The using statement implicitly calls the Timer’s Dispose method when control leaves the block, even if an exception is thrown. The compiler generates essentially the same code as the try/finally block, so it’s all just syntactic sugar. But sugar is so sweet.
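Java later closed much of this gap. Java 7 added try-with-resources, which calls close() on any AutoCloseable when the block exits. Here is a sketch with a hypothetical Timer class (unrelated to java.util.Timer) whose close() stops the clock.

```java
// Hypothetical Timer (unrelated to java.util.Timer): start() begins timing
// and close() stops it, so it can be used as a try-with-resources resource.
class Timer implements AutoCloseable {
    private long startedAt;
    private long elapsedNanos = -1; // -1 until the timer has been stopped

    Timer start() {
        startedAt = System.nanoTime();
        return this;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }

    // Overrides AutoCloseable.close() without "throws Exception", so callers
    // are not forced to catch anything.
    @Override
    public void close() {
        elapsedNanos = System.nanoTime() - startedAt;
    }
}

final class TimingDemo {
    private TimingDemo() {}

    // Java's counterpart to C#'s using block: close() runs when the block
    // exits, whether normally or by an exception.
    static Timer time(Runnable doStuffWeWantToTime) {
        Timer timer = new Timer();
        try (Timer running = timer.start()) {
            doStuffWeWantToTime.run();
        }
        return timer;
    }
}
```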
Next up, we have the new generic collections in C# 2.0 (the System.Collections.Generic namespace). On my current project (Java), we implemented a class called the Finder, which takes a collection and a specification. It uses the specification object(s) to filter the collection like so:
public List getProductsForSale(){
    return new Finder(getProducts(),
        new ProductsOnSaleSpecification()).find();
}

public class ProductsOnSaleSpecification implements Specification {
    public boolean isSatisfiedBy(Object obj){
        Product p = (Product)obj;
        return p.isOnSale();
    }
}
The Finder abstracts out the looping, while the ProductsOnSaleSpecification tells the Finder which products we’re interested in. It’s pretty sweet, that is, until I realized that this is actually built into C#’s generic collection classes:
public List<Product> GetProductsForSale(){
    // IsProductOnSale is converted to a Predicate<Product> delegate
    return Products.FindAll(IsProductOnSale);
}

private bool IsProductOnSale(Product p){
    return p.IsOnSale;
}
It’s worth noting that the C# collection classes offer more than just FindAll… You can also call Exists, Find (for a single item), ConvertAll, FindIndex, FindLast, ForEach, RemoveAll and TrueForAll in much the same way. You can also pass in an anonymous method instead of a named one, thanks to .NET’s support for delegates. I’ve written about this in more detail before.
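On the Java side, Java 8 eventually added close equivalents of most of these methods through streams and Collection.removeIf. The mapping below is a sketch. Catalog and Product are hypothetical classes, and each method is labeled with the C# List<T> method it roughly corresponds to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical Product, mirroring the example above.
class Product {
    private final String name;
    private final boolean onSale;

    Product(String name, boolean onSale) {
        this.name = name;
        this.onSale = onSale;
    }

    String getName() { return name; }

    boolean isOnSale() { return onSale; }
}

// Hypothetical Catalog; each method notes its rough C# List<T> counterpart.
final class Catalog {
    private final List<Product> products;

    Catalog(List<Product> products) {
        this.products = new ArrayList<Product>(products); // mutable copy for removeIf
    }

    // FindAll
    List<Product> getProductsForSale() {
        return products.stream().filter(Product::isOnSale).collect(Collectors.toList());
    }

    // Exists
    boolean anyOnSale() {
        return products.stream().anyMatch(Product::isOnSale);
    }

    // TrueForAll
    boolean allOnSale() {
        return products.stream().allMatch(Product::isOnSale);
    }

    // Find (first match, if any)
    Optional<Product> firstOnSale() {
        return products.stream().filter(Product::isOnSale).findFirst();
    }

    // ConvertAll
    List<String> names() {
        return products.stream().map(Product::getName).collect(Collectors.toList());
    }

    // RemoveAll: returns how many products were removed
    int removeSaleItems() {
        int before = products.size();
        products.removeIf(Product::isOnSale);
        return before - products.size();
    }
}
```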
A basic language feature in C# is the Property, which lets you expose an object’s internal state more cleanly than Java’s convention of getters and setters. Again, this is just syntactic sugar, but it’s nice to be able to tell at a glance whether you’re operating directly on an object’s state. Here’s the Java code and its C# equivalent.
// Java
public class Dude {
    private int age;

    public int getAge(){
        return age;
    }

    public void setAge(int age){
        this.age = age;
    }
}

// C#
public class Dude {
    private int age;

    public int Age {
        get { return age; }
        set { age = value; }
    }
}
My next C# feature isn’t really a feature, and it’s perhaps the most contentious of my points. The feature is the lack of checked exceptions in C#. My current project has very few points in the code where we actually handle exceptions (the Facades, and various points in the UI, but almost nowhere in the Domain). Yet we’re consistently forced by the Java compiler to stick throws ValidationException on almost all of our methods. It quickly just becomes unwanted noise in the code base.
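One common Java workaround, if you own the exception class, is to make it unchecked by extending RuntimeException. That is roughly how C# treats every exception. The sketch below uses hypothetical ValidationException, Order and OrderFacade classes. It illustrates handling the exception only at the facade and does not describe what the project above actually did.

```java
// Hypothetical ValidationException made unchecked by extending
// RuntimeException, so domain methods need no "throws" clause.
class ValidationException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    ValidationException(String message) {
        super(message);
    }
}

// Hypothetical domain class: throws freely, declares nothing.
class Order {
    private int quantity;

    void setQuantity(int quantity) {
        if (quantity <= 0) {
            throw new ValidationException("quantity must be positive");
        }
        this.quantity = quantity;
    }

    int getQuantity() {
        return quantity;
    }
}

// Hypothetical facade: the one place that actually handles the exception.
final class OrderFacade {
    private OrderFacade() {}

    static String updateQuantity(Order order, int quantity) {
        try {
            order.setQuantity(quantity);
            return "ok";
        } catch (ValidationException e) {
            return "error: " + e.getMessage();
        }
    }
}
```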
That’s about it for my little comparison. I still have a few more points I could make, specifically around delegates and events in C#, and how stupid it is that Java’s enhanced for loop only avoids a cast when the collection is generic, but I’ll save those for another post.
Since everyone else does it, I figured I’d start linking as well:
This one’s for Ted, Joe, Vlad, Alan and anyone else who thinks functional programming is the shit.
This is neat stuff, because it means we’ll get most of the cool features of functional languages in a statically typed, IDE-friendly language. One of the most common complaints about Ruby that I’ve heard, both on our project and online, is that there is no good IDE support (meaning refactoring and code navigation, in this context). If this sort of stuff is coming to C#, you can be damn sure that similar features will make it into Java.
I did a little bit of refactoring the other day that I’m pretty happy with, so I thought I’d share.
We started out with a method in SystemProfile whose signature is shown below:
List createPriorityAllocationGroups(
ProfileGroup profileGroup,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification)
It is called within a loop for every ProfileGroup contained in the SystemProfile like so:
private List getProfileAllocationGroupsForProfileGroups(
List profileGroups,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification){
List profileAllocationGroups = new ArrayList();
for (Iterator iter = profileGroups.iterator(); iter.hasNext();) {
ProfileGroup profileGroup = (ProfileGroup) iter.next();
profileAllocationGroups.addAll(
createPriorityAllocationGroups(
profileGroup,
allAllocationChildren,
sourceForMeasurementPoints,
sourceForOwnerTypeClassification));
}
sortProfileAllocationGroups(profileAllocationGroups);
return profileAllocationGroups;
}
Having to pass in four parameters to a method is a little smelly, but not too bad. The bad part happened within the method, which had grown quite complicated over a long period of time. The purpose of the method is to simply assemble a list of new PriorityAllocationGroup objects, so it should have looked something like this:
PriorityAllocationGroup allocationGroup = createNewPriorityAllocationGroup(group);
allocationGroup.addAllocationChildren(getPriorityAllocationChildrenForGroup(group));
allocationGroup.addMeasurementPoints(getMeasurementPointsForGroup(group));
return allocationGroup;
Instead, we had an implementation that was about 75 lines long and so convoluted that it took my pair and me almost half a day to figure out how to implement a feature for our story.
Which sucks.
The very first refactoring I identified was to move the method into ProfileGroup. Many of the questions being asked within the method were specific to this object, so keeping it in SystemProfile no longer made sense. This involved moving a lot of other helper methods into ProfileGroup as well. Nothing actually got any simpler, but at least we were now just asking ProfileGroup to assemble its ProfileAllocationGroups instead of stuffing it all in SystemProfile.
So originally we had this:
class SystemProfile{
List getProfileAllocationGroupsForProfileGroups(
List profileGroups,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification);
List createPriorityAllocationGroups(
ProfileGroup profileGroup,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification);
}
Which became this:
class SystemProfile{
List getProfileAllocationGroupsForProfileGroups(
List profileGroups,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification);
}
class ProfileGroup{
List createPriorityAllocationGroups(
SystemProfile systemProfile,
List allAllocationChildren,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification);
}
The next refactoring involved removing another smell (at least to me): passing Lists in as parameters. Obviously there are sometimes reasons for this, but it’s usually better to be able to call a method like getAllAllocationChildren() at will instead of passing the list all over the place. As it turned out, the allAllocationChildren parameter originated from a method within SystemProfile. Since we were already passing the SystemProfile in as a parameter, we could eliminate this parameter, which made the method signature look like this:
class ProfileGroup{
List createPriorityAllocationGroups(
SystemProfile systemProfile,
FacilityMonthlyMeasurements sourceForMeasurementPoints,
FacilityMonthlyMeasurements sourceForOwnerTypeClassification);
}
One of the patterns I saw after this cleanup was that the four parameters originally passed in were in turn being passed to other methods. Over and over again. Eventually the lightbulb went on in my head as I realized that by passing these parameters all over the place, we were really signifying that we were operating inside a specific context. So the next refactoring was to combine the sourceForMeasurementPoints, sourceForOwnerTypeClassification, and the SystemProfile into a ProfileAllocationGroupContext object. Once this was done, it was a simple matter to change the signature of all the helper methods to take the context object as well.
class ProfileGroup{
List createPriorityAllocationGroups(
ProfileAllocationGroupContext context);
}
It’s now the job of the SystemProfile to create the ProfileAllocationGroupContext, which is then passed on to every ProfileGroup when we ask for its ProfileAllocationGroups. Whew!
This seemed OK, but there was still some stuff that didn’t quite fit. There were lots of calls like this in ProfileGroup.createPriorityAllocationGroups:
context.getMeasurementPointsToBeAllocated(this);
This was a fantastic example of how code sometimes "speaks" to you while you refactor it. I had originally thought that putting this code in ProfileGroup was the right thing to do, but as I worked through all the above refactorings it became obvious that it wasn’t. Here’s the method in SystemProfile that kicked everything off, as it stood at this point in the refactoring:
List getProfileAllocationGroupsForProfileGroups(
List profileGroups,
ProfileAllocationGroupContext context){
List profileAllocationGroups = new ArrayList();
for (Iterator iter = profileGroups.iterator(); iter.hasNext();) {
ProfileGroup profileGroup = (ProfileGroup) iter.next();
profileAllocationGroups.addAll(profileGroup.createPriorityAllocationGroups(context));
}
sortProfileAllocationGroups(profileAllocationGroups);
return profileAllocationGroups;
}
The code was telling me that instead of
profileGroup.createPriorityAllocationGroups(context)
we should actually be doing
context.createPriorityAllocationGroups(profileGroup)
This makes sense if we say that, for the given context, we want to assemble a list of PriorityAllocationGroups. By moving the method to the context object we’re actually making it do double duty as an assembler or builder, but that can easily be factored out later. So in the end, my object model looked like this:
class SystemProfile{
List getProfileAllocationGroupsForProfileGroups(
ProfileAllocationGroupContext context);
//plus a bunch of state and other methods
}
class ProfileGroup{
//original state based stuff
}
class ProfileAllocationGroupContext{
List createPriorityAllocationGroups(ProfileGroup profileGroup);
}
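Here is a heavily simplified, runnable sketch of that final shape. Every type is a hypothetical stand-in. The measurement source is reduced to a map from group name to measurement point IDs. The owner-type source and the SystemProfile reference inside the context are left out. Sorting by a priority field stands in for sortProfileAllocationGroups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for the real classes.
class ProfileGroup {
    final String name;
    final int priority;

    ProfileGroup(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

class PriorityAllocationGroup {
    final String groupName;
    final int priority;
    final List<String> measurementPoints;

    PriorityAllocationGroup(String groupName, int priority, List<String> measurementPoints) {
        this.groupName = groupName;
        this.priority = priority;
        this.measurementPoints = measurementPoints;
    }
}

// The parameter object: values that used to be passed around separately
// now travel together, and the context does the assembling.
class ProfileAllocationGroupContext {
    private final Map<String, List<String>> sourceForMeasurementPoints;

    ProfileAllocationGroupContext(Map<String, List<String>> sourceForMeasurementPoints) {
        this.sourceForMeasurementPoints = sourceForMeasurementPoints;
    }

    List<PriorityAllocationGroup> createPriorityAllocationGroups(ProfileGroup profileGroup) {
        List<String> points = sourceForMeasurementPoints.getOrDefault(
                profileGroup.name, Collections.<String>emptyList());
        return Collections.singletonList(
                new PriorityAllocationGroup(profileGroup.name, profileGroup.priority, points));
    }
}

class SystemProfile {
    private final List<ProfileGroup> profileGroups;

    SystemProfile(List<ProfileGroup> profileGroups) {
        this.profileGroups = new ArrayList<ProfileGroup>(profileGroups);
    }

    // SystemProfile only loops; the context decides what each group yields.
    List<PriorityAllocationGroup> getProfileAllocationGroupsForProfileGroups(
            ProfileAllocationGroupContext context) {
        List<PriorityAllocationGroup> result = new ArrayList<PriorityAllocationGroup>();
        for (ProfileGroup profileGroup : profileGroups) {
            result.addAll(context.createPriorityAllocationGroups(profileGroup));
        }
        // Stand-in for sortProfileAllocationGroups: order by priority.
        result.sort(Comparator.comparingInt(group -> group.priority));
        return result;
    }
}
```

Each class now has one job. SystemProfile iterates its groups, ProfileGroup holds state, and the context knows how to turn a group into allocation groups.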