karldmoore development

A development oriented blog, containing anything I've been working on, reading or thinking about.

Posts Tagged ‘Improvement’

Documentation: Update It Or Lose It

Posted by karldmoore on February 17, 2009

Over time I’m frequently restructuring and refactoring the code I work on to improve the design and simplify many of the balls of mud I find along the way. If I break some of that code, my unit tests shout at me, and if they don’t, my continuous integration environment shouts at me when I’ve broken one of the integration tests. One area of the code that doesn’t receive the same attention, however, is the documentation.

I love a well documented API as much as the next developer. I am also fighting a constant battle to improve my JavaDoc skills, trying to write to the same quality that I see in many other projects. But is excellent documentation really enough? It can’t just be excellent when it’s written; it needs to be constantly maintained to this same level over time.

It’s a pretty common occurrence that I can look at a section of code and find documentation that has nothing to do with the method it belongs to. The intention-revealing name sounds nothing like the documentation which describes what it’s supposed to be doing. Somewhere in the mists of time, the intention of the method and the documentation parted company; it is documentation no more, it is ex-documentation. The problem here, however, is that there’s nothing to catch this issue. If an extra argument is added, Eclipse might display a warning telling me it’s missing from the documentation, but the actual content of the documentation can be completely wrong and I am none the wiser.
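As a hypothetical illustration (the class and method names here are invented, not taken from any real project), this is exactly the kind of drift that nothing catches: the JavaDoc still describes behaviour the method no longer has, and the code compiles without a single warning.

```java
/** Hypothetical example of documentation drift; all names are invented. */
public class UserLookup {

    /**
     * Returns the user's display name, or null if the username is empty.
     *
     * (This JavaDoc is deliberately stale: the method was later changed to
     * return a default name instead of null, and nobody updated the comment.
     * The compiler and the IDE are both perfectly happy with it.)
     */
    public static String displayName(String username) {
        if (username == null || username.isEmpty()) {
            return "anonymous"; // no longer returns null as documented
        }
        return username.toLowerCase();
    }

    public static void main(String[] args) {
        // A caller trusting the JavaDoc would check for null and be wrong.
        System.out.println(displayName(""));
        System.out.println(displayName("JSmith"));
    }
}
```

No tool flags this; only a human reading the comment against the code will spot it.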

Although this incorrect documentation may seem fairly harmless to some, the reason it was supposed to be there in the first place is to give the user of the API information on the method contract and the expected outcomes. If this information is completely wrong, however, it introduces confusion and also potential misuse of the API. After much wasted time and several new grey hairs, I recently filed a couple of JIRA reports about popular open source frameworks which did exactly this. After my code just didn’t work and the unit tests failed, it took me a number of minutes to realise that the JavaDoc description of expected events just didn’t reflect what was actually going on in the code.

One approach to solving this problem is to include the documentation in the frequent code reviews. Code that doesn’t have documentation or has incorrect documentation can’t be considered complete, and thus doesn’t pass the review. Developers need to make sure, however, that they treat the documentation with the same respect that they give to their code and their tests. When dealing with documentation I’d much sooner see a method with no documentation than something that is completely wrong. I do want a good level of documentation in the code I work on, but there is a very simple rule: update it or lose it.

Posted in Development, Opinion, Refactoring | Tagged: , , , , , , , , , , | 1 Comment »

Embedded Database Performance: Handle With Care

Posted by karldmoore on February 10, 2009

Embedded databases are proving pretty popular, with many teams finding their usage extremely appealing. Many embedded databases have impressive features: easy to integrate into the software, low administration requirements, very low footprint and generally extremely fast. It is for this last reason that some people look to these databases, but it is also for this reason that they need to be handled with care.

Just like OpenOffice, a previous project I worked on used an embedded version of HSQLDB. We had started to have several performance problems, so we engaged in some profiling to find potential areas for improvement. After staring at several screens full of SQL debug, one thing became apparent: our system was producing a huge number of SQL statements. This was accompanied by the realisation that it was also producing a huge number of exactly the same SQL statements.

As the embedded database was extremely fast, there had never been any performance problems before. In fact, the database was not the bottleneck in many of the application operations. As the application grew more complex, however, performance problems began to arise quite quickly. These problems only got worse when the numbers started to increase. Complexity + Scaling != Success. The extremely fast embedded database was now starting to be a real problem; some serious changes needed to be made.

Many of the database performance problems were resolved quite quickly by simply re-learning the most basic best practices; just because the embedded database was quick did not mean these could be ignored:

  • Connection pooling can improve performance
  • Statement caching can improve performance
  • Batching statements can improve performance
  • Performing SQL n + 1 selects is a bad idea regardless of how quick the database is
  • Caching common SQL results can improve performance
  • The database needs to be profiled to see exactly what SQL is getting executed against it
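The result-caching point above can be sketched in a few lines. This is only an illustration, not a real API: the Function stands in for an actual JDBC round trip, and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal sketch of caching common SQL results. The queryRunner Function
// stands in for an expensive database round trip; names are invented.
public class QueryCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> queryRunner;

    public QueryCache(Function<String, String> queryRunner) {
        this.queryRunner = queryRunner;
    }

    // Executes the SQL once; identical repeat queries are served from memory.
    public String query(String sql) {
        return cache.computeIfAbsent(sql, queryRunner);
    }

    public static void main(String[] args) {
        final int[] roundTrips = {0};
        QueryCache cache = new QueryCache(sql -> {
            roundTrips[0]++; // simulate hitting the database
            return "result of: " + sql;
        });
        cache.query("select role_name from user_roles");
        cache.query("select role_name from user_roles"); // cache hit
        System.out.println("database round trips: " + roundTrips[0]); // 1, not 2
    }
}
```

A real implementation would also need an invalidation strategy; a cache that is never refreshed just trades a performance problem for a staleness one.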

It may be possible to ignore these issues early in the project, but as the complexity increases and the application is expected to scale, they begin to be exposed. Embedded database performance can be impressive, but make sure you handle it with care.

Posted in Development, Opinion, Refactoring, SQL | Tagged: , , , , , , , , , | 2 Comments »

Playing The Blame Game: A Better Way?

Posted by karldmoore on February 10, 2009

Success! We’ve just released a new version of our software; this calls for a celebration. All the team march down to the local bar for a well-earned beer, or an orange juice for the designated driver. But what’s that annoying noise in my jacket pocket? My mobile phone’s ringing, and it’s the boss. Some customers have just upgraded and now they are reporting serious problems with their systems; looks like we’d better get ready to play the blame game.

For those people who have never played the blame game before, we need to set out the rules. The rules are actually quite simple; if in doubt, seek the advice of a seasoned player. The first rule of the blame game is: it’s never your fault. The second rule of the blame game is: it’s never your fault. The third rule of the blame game is: it’s always someone else’s fault. Now we’ve established the rules, we need some tactics to work out whose fault it actually is.

The first candidate for blame always has to be the last person to leave the company. Anything that’s wrong can easily be blamed on them; after all, they can’t defend themselves, so they make an ideal target. The next candidate can be chosen from any of the temporary staff you currently employ, the shorter the contract the better. If you aren’t going to have to see these people for much longer, this should work out quite nicely. The next candidate is anyone that’s working for the company in another location (if it’s another country this works even better). You might have to work with these people again, but you’re probably not going to see them on a daily basis, so it shouldn’t be too awkward. Now this is where your choices start to become a little tricky: you’re down to the people that you work with every day.

You have to keep this simple and align yourselves with the strongest members of the team; after all, this is survival of the fittest. Fit yourself nicely into the crowd and look for anyone that doesn’t quite fit; this is real back-to-basics playground behaviour. Pick off the weak and vulnerable first, as they shouldn’t put up much resistance, then work your way up the power hierarchy until you find your patsy. So there you have it: those are the basic tactics for the blame game. You might have to tweak them as appropriate within your organisation, but the general approach should apply.

The most important decision you face when presented with the blame game is: do you really want to play? If you decide that it’s really not for you, the best way to deal with it is to simply throw out all of the rules, rip the board in half, stamp your feet as hard as you can and shout out loud, “I refuse to play”. It is often said that “we work in a blame-free culture, until something goes wrong”. Are you really interested in finding someone to blame when something goes wrong? Will it really help the customer solve their problem any sooner if you’ve found the person to carry the can for the failure? In the time it takes to sit there and rack your brains for potential candidates to take the blame, you could already be investigating the cause of the problems and attempting to fix them. In his recent post Sean Landis said:

Successful Agile development presupposes that team members will all act like adults. That’s a euphemism for being competent and professional. Agile teams are expected to accept a high level of responsibility and accountability. When they don’t, things can fall apart really fast.

Regardless of how bulletproof a process claims to be, there are always going to be issues that weren’t found prior to the release. If issues are always going to occur, should teams really be allowed to turn on each other in response to them? Will this really lead to a better working relationship? The most important thing is that problems are dealt with promptly and professionally when they arise. Instead of trying to assign blame, people should be responsible for their own actions; they need to learn from the experience, and measures should be put in place (if possible) to prevent it happening again. Next time your team is out celebrating your latest release and your mobile phone starts to ring, do you really want to start playing the blame game?

Posted in Development, Opinion | Tagged: , , , , , , , , | Leave a Comment »

SQL n + 1 Selects Explained

Posted by karldmoore on February 5, 2009

The SQL n + 1 selects problem is extremely common, but I have often found that many people have either never heard of it or simply don’t understand it. It is actually very easy to introduce a problem like this into your code, but it’s also very easy to resolve. Problems like this are best explained with an example; so imagine we have a table called users and another called user_roles. These tables are set up with a one-to-many relationship, meaning that one user (e.g. jsmith) can have many roles (e.g. Administrator, Auditor, Developer). Many people might implement something like this:

public Iterable<User> allUsers() {
    final String selectUsers = "select users.username, users.email, " +
        "users.last_password_change from users";
    return getJdbcTemplate().query(selectUsers, new Object[] {},
                new ParameterizedRowMapper<User>() {
        public User mapRow(ResultSet resultSet, int rowNumber) throws SQLException {
            String username = resultSet.getString("username");
            String email = resultSet.getString("email");
            Date lastPasswordChange = resultSet.getDate("last_password_change");
            User user = new DefaultUser(username, email, lastPasswordChange);
            addRolesToUser(user);
            return user;
        }
    });
}

private void addRolesToUser(final User user) {
    final String selectUserRoles = "select role_name from user_roles where username = ?";
    getJdbcTemplate().query(selectUserRoles, new Object[] { user.getPrincipalName() },
                new RowCallbackHandler() {
        public void processRow(ResultSet resultSet) throws SQLException {
            String rolename = resultSet.getString("role_name");
            user.addRole(rolename);
        }
    });
}

Reviewing the code, we can see one query is executed to retrieve the users; the problem here is that for each user another SQL statement needs to be executed to retrieve the roles. If the first query retrieved one user, this would require one additional query to retrieve the roles. If the first query retrieved a hundred users, this would require one hundred additional queries to retrieve the roles. The pattern will always be the same: one query for the users and n queries dependent on the number of users found, thus n + 1. Although this solution is functional, it does result in many unnecessary SQL statements being executed.

select users.username, users.email, users.last_password_change from users;
select role_name from user_roles where username = ?;
select role_name from user_roles where username = ?;
select role_name from user_roles where username = ?; 
...

Shared resources are typically the bottleneck in most applications, so expensive or unnecessary SQL should be avoided if possible. As the application attempts to scale, this bottleneck can become extremely problematic and severely inhibit application performance. Fortunately there is a simple solution to this problem: introducing a join into the query.

public Iterable<User> allUsers() {
    final String selectUsers =
        "select users.username, users.email, users.last_password_change, user_roles.role_name "
            + "from users left join user_roles on (users.username = user_roles.username)";
    final Map<String, User> users = new HashMap<String, User>();
    getJdbcTemplate().query(selectUsers, new Object[] {}, new RowCallbackHandler() {
        public void processRow(ResultSet resultSet) throws SQLException {
            String username = resultSet.getString("username");
            if (!users.containsKey(username)) {
                String email = resultSet.getString("email");
                Date lastPasswordChange = resultSet.getDate("last_password_change");
                User user = new DefaultUser(username, email, lastPasswordChange);
                users.put(username, user);
            }

            String rolename = resultSet.getString("role_name");
            if (!StringUtil.isNull(rolename)) {
                User user = users.get(username);
                user.addRole(rolename);
            }
        }
    });
    return users.values();
}

Although the code and SQL statement are slightly more complex than the original example, it results in far fewer SQL statements being executed. Instead of the n + 1 statements executed in the first example, one statement is executed that fetches all the required data. This typically results in much improved performance, and as the numbers scale the improvement can become much more apparent.

select users.username, users.email, users.last_password_change, user_roles.role_name
    from users left join user_roles on (users.username = user_roles.username);

As with all performance optimizations, the most important thing is to measure the effect of the improvement. Performance optimizations aren’t always predictable, so by taking measurements before and after the change you can know for certain whether you have actually improved the performance (or made it worse). A SQL join may be the most appropriate way of solving a problem such as this, but there are other alternatives, such as caching the data instead. Although the SQL n + 1 selects problem is extremely common, it is not always well understood and is often still found within code. It is very easy to introduce a problem like this into your code, but it’s also very easy to resolve. Next time you are viewing your debug output, see if you can spot SQL n + 1 selects.
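A minimal before/after harness can be as simple as the sketch below. The workload here is simulated and the names are invented; in practice you would time the real data access code against production-like data volumes, and run it several times to smooth out noise.

```java
// A minimal before/after measurement sketch; timings are wall-clock and the
// workload is simulated, so treat the printed numbers as illustrative only.
public class Measure {

    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long before = timeMillis(() -> simulateWork(50)); // e.g. the n + 1 version
        long after = timeMillis(() -> simulateWork(5));   // e.g. the join version
        System.out.println("before: " + before + "ms, after: " + after + "ms");
    }

    private static void simulateWork(int millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```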

References

Database access using Spring JdbcTemplate
Preventing the n + 1 select problem when using Hibernate

Posted in Development, Refactoring, Spring, SQL | Tagged: , , , , , , , , , | 5 Comments »

Frequent Code Reviews The Key To Success?

Posted by karldmoore on January 27, 2009

Reviews are a widely used technique to analyse code for the presence of defects and potential improvements. Many successful teams continually review code to try to ensure a high level of quality and to constantly improve a developer’s ability to write good code. The arguments for and against code reviews have been made on many occasions, but one common unknown factor for many teams is the frequency at which they should take place.

Infrequent Reviews

Many teams operate on a feature-complete code review. At the end of the cycle of work, several developers sit down together with the author, critiquing the delivered work. What often transpires here is that, due to the huge mass of code delivered, a thorough review is rendered all but impossible. The plethora of code makes it difficult to find a starting point, and thus potentially problematic code isn’t allocated the time it necessarily deserves. Any recommendations that come out of this kind of review may never see the light of day given the pressing work schedule (if the code was complete, why are the changes necessary?). Most importantly for the team, reviewing code so infrequently can lead to demoralisation of its members.

Code reviews may pick apart the author’s code; code that they have potentially sweated and cried over (OK, maybe not) and worked hard to complete. Does it really make sense to save up all the potential criticism and deliver it in one skull-crushing blow? Even if the criticism is perfectly valid, nobody likes to feel good about something only for it to be completely torn apart. Developers complain when customers only let them know what they actually want when they see what they don’t; is the same really acceptable for the code? If infrequent code reviews aren’t the answer, how often can code be successfully reviewed?

Email Reviews

Some teams never let the author of the code commit it to the repository. Instead, they email the code (typically in patch form) to the code owner or one of the principal reviewers. The idea behind this approach is that the patches are always reviewed prior to being committed, and the developers can work productively, focusing only on the task in hand. This process has the ability to ensure that every change is scrutinised and verified to maintain the high quality standards that are set down. Having had to work with a system like this in the past, I can honestly say that it was fraught with problems and was one of the most frustrating I’ve ever worked with (I was one of the principal reviewers).

The person applying the patch quickly becomes a bottleneck in the process; how long can people really wait for the patch to be applied? What should the patch reviewer actually do with the patch: make the recommended changes themselves, or send an email back to the original author to apply them? If multiple people are working on the code base, the chances of conflicts increase. If the patches are applied in the wrong order or people’s timing is just plain unlucky, the person applying the patch has a multitude of problems to deal with. The version control system is full of only one person’s name, making it incredibly difficult to track down the original owner of a change. Lastly, should the worst happen and the build fail… guess whose change it was that broke it?

Frequent Reviews

People like to know what they are doing with their work and how well they are doing it. If infrequent code reviews lead to a big-bang delivery approach, then frequent code reviews take the inverse approach of small nudges in the right direction. By applying these frequent reviews, developers can deal with smaller suggestions and recommendations early in their development. Instead of developers feeling they have completed something only to be told it’s all wrong, they can be guided along the process to ensure they arrive at the right answer. The most difficult question here is: how frequent is frequent? This is a very developer-specific metric.

Some developers require frequent attention, and when I say frequent I mean every few hours (or more!). As developers become more experienced, this frequency typically reduces until the reviewed eventually becomes the reviewer. Depending on the type of project, however, it’s still quite common for very experienced developers to like frequent code reviews. If frequent reviews become very frequent reviews, you might have unwittingly found yourself participating in quasi pair programming.

Pair Programming

Pair programming is not only a great way to develop but also a way to implicitly review code. A second developer sitting at the keyboard provides instantaneous reviewing of code. The second developer not only reviews the code, but also looks for potential problems and improvements to the evolving code. The second developer isn’t always necessarily the reviewer, and roles can switch between either developer during the exercise.

Having someone take over the keyboard encourages the development to be of higher quality, and a second set of eyes prompts the coders to produce good code (obviously given a good pairing of developers). This instant review and feedback can actually reduce the number of bugs introduced into the system. As several developers produced the code, it also means that the team is never reliant on a single developer to address a given area of code. Pair programming is an excellent way of developing and reviewing code, but it’s not without its problems, with some people finding the feedback comes just too often.

Summary

By reducing the time between code reviews, teams can provide better guidance about the eventual quality of code and prevent storing up potential problems. The less frequent the code reviews, the more problems occur (especially with less experienced staff). Developers want to feel like they are doing a good job, and as such they need small pieces of constructive criticism often, instead of lots of criticism delivered all at once. Reviewing code is best addressed frequently, providing quicker feedback and reducing the amount of rework involved. As a developer’s ability increases, these issues typically subside and they become active in reviewing other people’s code.

If infrequent code reviews are problematic, are frequent code reviews the key to success?

Posted in Development, Opinion | Tagged: , , , , , , , , , , , , | 3 Comments »