karldmoore development

A development oriented blog, containing anything I've been working on, reading or thinking about.

Archive for the ‘Refactoring’ Category

Documentation: Update It Or Lose It

Posted by karldmoore on February 17, 2009

Over time I’m frequently restructuring and refactoring the code I work on to improve the design and simplify many of the balls of mud I find along the way. If I break some of that code, my unit tests shout at me; and if they don’t, my continuous integration environment shouts at me when I’ve broken one of the integration tests. One area of the code that doesn’t receive the same attention, however, is the documentation.

I love a well-documented API as much as the next developer. I’m also fighting a constant battle to improve my JavaDoc skills, trying to write to the same quality that I see in many other projects. But is excellent documentation really enough? It can’t just be excellent when it’s written; it needs to be maintained to that same level over time.

It’s a pretty common occurrence that I can look at a section of code and find documentation that has nothing to do with the method it belongs to. The intention-revealing name sounds nothing like the documentation that describes what the method is supposed to be doing. Somewhere in the mists of time, the intention of the method and the documentation parted company; it is documentation no more, it is ex-documentation. The problem is that there’s nothing to catch this issue. If an extra argument is added, Eclipse might display a warning telling me it’s missing from the documentation, but the actual content of the documentation can be completely wrong and I am none the wiser.
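A contrived sketch of what this drift looks like in practice (the class and method here are invented for illustration): the JavaDoc still describes the method’s original contract, while the code now does the opposite.

```java
import java.util.Set;

public class UserDirectory {

    /**
     * Returns {@code true} if the username is free to register.
     */
    public static boolean isUsernameAvailable(Set<String> taken, String username) {
        // Somewhere along the way the check was inverted, but the JavaDoc
        // was never updated: the method now returns true when the name is
        // already TAKEN, the exact opposite of what it documents.
        return taken.contains(username);
    }
}
```

Nothing in the build will flag this; the signature still matches the documentation, only the meaning has parted company.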

Although this incorrect documentation may seem fairly harmless to some, the reason it was there in the first place was to give the user of the API information on the method contract and the expected outcomes. If this information is completely wrong, it introduces confusion and potential misuse of the API. After much wasted time and several new grey hairs, I recently filed a couple of JIRA reports about popular open source frameworks which did exactly this. After my code just didn’t work and the unit tests failed, it took me a number of minutes to realise that the JavaDoc description of expected events just didn’t reflect what was actually going on in the code.

One approach to solving this problem is to include the documentation in frequent code reviews. Code that has no documentation, or has incorrect documentation, can’t be considered complete and thus doesn’t pass the review. Developers need to make sure, however, that they treat the documentation with the same respect that they give to their code and their tests. I’d much sooner see a method with no documentation than one with documentation that is completely wrong. I do want a good level of documentation in the code I work on, but there is a very simple rule: update it or lose it.


Posted in Development, Opinion, Refactoring | 1 Comment »

Embedded Database Performance: Handle With Care

Posted by karldmoore on February 10, 2009

Embedded databases are proving pretty popular, with many teams finding their usage extremely appealing. Many embedded databases have impressive features: they are easy to integrate into the software, require little administration, have a very low footprint and are generally extremely fast. It is for this last reason that some people look to these databases, but it is also for this reason that they need to be handled with care.

Just like OpenOffice, a previous project I worked on used an embedded version of HSQLDB. We had started to have several performance problems, so we engaged in some profiling to find potential areas for improvement. After staring at several screens full of SQL debug, one thing became apparent: our system was producing a huge number of SQL statements. This was accompanied by the realisation that many of them were exactly the same statement.

As the embedded database was extremely fast, there had never been any performance problems before. In fact, the database was not the bottleneck in many of the application operations. As the application grew more complex, however, performance problems began to arise quite quickly. These problems only got worse when the numbers started to increase. Complexity + Scaling != Success. The extremely fast embedded database was now starting to be a real problem; some serious changes needed to be made.

Many of the database performance problems were resolved quite quickly by simply re-learning the most basic best practices; just because the embedded database was quick did not mean these could be ignored:

  • Connection pooling can improve performance
  • Statement caching can improve performance
  • Batching statements can improve performance
  • Performing SQL n + 1 selects is a bad idea regardless of how quick the database is
  • Caching common SQL results can improve performance
  • The database needs to be profiled to see exactly what SQL is getting executed against it
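To illustrate the caching point above, a cache of common SQL results can start as nothing more than a memoising wrapper around the lookup. This is a hypothetical sketch, not specific to HSQLDB, and a real cache would also need an invalidation strategy and a size limit:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Memoises a read-mostly lookup (e.g. roles for a username) so that
// repeated, identical queries never reach the database. Deliberately
// naive: no invalidation, no size limit, not thread-safe.
public class RoleCache {

    private final Map<String, List<String>> cache = new HashMap<>();
    private final Function<String, List<String>> loader;
    private int databaseHits;

    public RoleCache(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    public List<String> rolesFor(String username) {
        return cache.computeIfAbsent(username, key -> {
            databaseHits++; // only counts actual trips to the database
            return loader.apply(key);
        });
    }

    public int databaseHits() {
        return databaseHits;
    }
}
```

Profiling first, as the last point says, is what tells you which queries are repeated often enough to be worth caching at all.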

It may be possible to ignore these issues early in a project, but as the complexity increases and the application is expected to scale, they begin to be exposed. Embedded database performance can be impressive, but make sure you handle it with care.

Posted in Development, Opinion, Refactoring, SQL | 2 Comments »

SQL n + 1 Selects Explained

Posted by karldmoore on February 5, 2009

The SQL n + 1 selects problem is extremely common, but I have often found that many people have either never heard of it or simply don’t understand it. It is very easy to introduce a problem like this into your code, but it’s also very easy to resolve. Problems like this are best explained with an example, so imagine we have a table called users and another called user_roles. These tables are set up with a one-to-many relationship, meaning that one user (e.g. jsmith) can have many roles (e.g. Administrator, Auditor, Developer). Many people might implement something like this:

public Iterable<User> allUsers() {
    final String selectUsers = "select users.username, users.email, " +
        "users.last_password_change from users";
    return getJdbcTemplate().query(selectUsers, new Object[] {},
                new ParameterizedRowMapper<User>() {
        public User mapRow(ResultSet resultSet, int rowNumber) throws SQLException {
            String username = resultSet.getString("username");
            String email = resultSet.getString("email");
            Date lastPasswordChange = resultSet.getDate("last_password_change");
            User user = new DefaultUser(username, email, lastPasswordChange);
            addRolesToUser(user);
            return user;
        }
    });
}

private void addRolesToUser(final User user) {
    final String selectUserRoles = "select role_name from user_roles where username = ?";
    getJdbcTemplate().query(selectUserRoles, new Object[] { user.getPrincipalName() },
                new RowCallbackHandler() {
        public void processRow(ResultSet resultSet) throws SQLException {
            String rolename = resultSet.getString("role_name");
            user.addRole(rolename);
        }
    });
}

Reviewing the code, we can see that one query is executed to retrieve the users; the problem is that for each user another SQL statement then has to be executed to retrieve that user’s roles. If the first query retrieved one user, this would require one additional query to retrieve the roles. If the first query retrieved a hundred users, this would require one hundred additional queries. The pattern will always be the same: one query for the users and n queries dependent on the number of users found, thus n + 1. Although this solution is functional, it results in many unnecessary SQL statements being executed.

select users.username, users.email, users.last_password_change from users;
select role_name from user_roles where username = ?;
select role_name from user_roles where username = ?;
select role_name from user_roles where username = ?; 
...

Shared resources are typically the bottleneck in most applications, so expensive or unnecessary SQL should be avoided where possible. As the application attempts to scale, this bottleneck can become extremely problematic and severely inhibit performance. Fortunately there is a simple solution to this problem: introducing a join into the query.

public Iterable<User> allUsers() {
    final String selectUsers =
        "select users.username, users.email, users.last_password_change, user_roles.role_name "
            + "from users left join user_roles on (users.username = user_roles.username)";
    final Map<String, User> users = new HashMap<String, User>();
    getJdbcTemplate().query(selectUsers, new Object[] {}, new RowCallbackHandler() {
        public void processRow(ResultSet resultSet) throws SQLException {
            String username = resultSet.getString("username");
            if (!users.containsKey(username)) {
                String email = resultSet.getString("email");
                Date lastPasswordChange = resultSet.getDate("last_password_change");
                User user = new DefaultUser(username, email, lastPasswordChange);
                users.put(username, user);
            }

            String rolename = resultSet.getString("role_name");
            if (!StringUtil.isNull(rolename)) {
                User user = users.get(username);
                user.addRole(rolename);
            }
        }
    });
    return users.values();
}

Although the code and SQL statement are slightly more complex than in the original example, far fewer SQL statements are executed. Instead of the n + 1 statements of the first example, one statement is executed that fetches all the required data. This typically results in much improved performance, and as the numbers scale the improvement can become much more apparent.

select users.username, users.email, users.last_password_change, user_roles.role_name
    from users left join user_roles on (users.username = user_roles.username);

As with all performance optimisations, the most important thing is to measure the effect of the change. Performance optimisations aren’t always predictable, so by taking measurements before and after the change you know whether you have actually improved the performance (or made it worse). A SQL join may be the most appropriate way of solving a problem such as this, but there are alternatives, such as caching the data instead. Although the SQL n + 1 selects problem is extremely common, it is not always well understood and is often still found within code. It is very easy to introduce a problem like this into your code, but it’s also very easy to resolve. Next time you are viewing your debug output, see if you can spot SQL n + 1 selects.
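The before-and-after measurement can be as simple as a wall-clock harness around the data-access call. This is a rough sketch with invented names; a serious benchmark would also add JVM warm-up, multiple runs and some statistical treatment of the results:

```java
// Times a workload over a fixed number of iterations, in milliseconds.
// Run it once against the n + 1 version and once against the join
// version, on the same data, and compare the two numbers.
public class QueryTimer {

    public static long timeMillis(Runnable workload, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            workload.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```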

References

Database access using Spring JdbcTemplate
Preventing the n + 1 select problem when using Hibernate

Posted in Development, Refactoring, Spring, SQL | 5 Comments »

Tests Are Still Code

Posted by karldmoore on January 14, 2009

Whilst performing code reviews recently, one of my major tasks was reviewing the tests accompanying the code. One of the most surprising discoveries was the lack of attention paid to the tests, and how regularly this was becoming an issue throughout the test code.

Test code is just as likely to contain bugs as the code it is supposed to be testing. After all, if tests are just code, why wouldn’t they contain bugs? The possibility of them containing bugs could actually be higher, as test code doesn’t have tests of its own and in many teams isn’t treated with the same respect as the rest of the code. Tests must therefore be kept as simple and easy to understand as possible, and the same tools and techniques used in the rest of the code should be employed when writing them.

PMD and FindBugs are great tools to identify potential problems within code, but I have only ever seen one project that applies them to the test code. PMD/CPD is a great tool to identify duplicate code blocks, but I have never seen a project apply it to test code. If duplicate code isn’t acceptable, why should it be tolerated in test code?

@Test
public void updateFirstName() {
    ...
    User updated = getUserById(expected.getId());
    assertNotNull(updated);
    assertEquals(expected.getFirstName(), updated.getFirstName());
    assertEquals(expected.getMiddleName(), updated.getMiddleName());
    assertEquals(expected.getLastName(), updated.getLastName());
}

@Test
public void updateMiddleName() {
    ...
    User updated = getUserById(expected.getId());
    assertNotNull(updated);
    assertEquals(expected.getFirstName(), updated.getFirstName());
    assertEquals(expected.getMiddleName(), updated.getMiddleName());
    assertEquals(expected.getLastName(), updated.getLastName());
}
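Duplication like the above is exactly what PMD/CPD would flag, and it refactors the same way production code does: extract the repeated assertions into one intention-revealing helper. A sketch, using a stand-in User record rather than the real class from the tests:

```java
public class UserAssertions {

    // Stand-in for the User class used in the tests above.
    public record User(String firstName, String middleName, String lastName) {}

    // One helper replaces the four duplicated assertions in every test.
    public static void assertNamesEqual(User expected, User actual) {
        if (actual == null) {
            throw new AssertionError("updated user was not found");
        }
        assertFieldEquals("first name", expected.firstName(), actual.firstName());
        assertFieldEquals("middle name", expected.middleName(), actual.middleName());
        assertFieldEquals("last name", expected.lastName(), actual.lastName());
    }

    private static void assertFieldEquals(String field, String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(field + ": expected <" + expected + "> but was <" + actual + ">");
        }
    }
}
```

In a real suite the helper would simply delegate to JUnit’s assertEquals; the point is that each test body shrinks to a single line that states its intent.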

Over time, tests will inevitably suffer from the same problems as the rest of the code. Tests become bloated, complex, duplicated and hard to understand if they are not maintained and refactored regularly, just like the rest of the code. The potential problem with refactoring tests is that you are refactoring something which does not itself have tests to verify your changes; but testing tests is a controversial issue, and the problem is probably better solved by conducting thorough code reviews.

Code reviews seem to be an underrated technique for test code. Having spoken to many other developers, I found the majority of them had never participated in any test code reviews, with only a small number having used the approach on several occasions. What makes this even stranger is the fact that many of these developers do conduct code reviews on a semi-regular basis. This is somewhat anecdotal evidence, and it would be very interesting to see some real numbers on the subject. How many teams conduct thorough code reviews of tests and treat them with the same importance as the rest of the code?

Code without documentation is considered by many not to be code complete, yet tests without documentation are a pretty common occurrence. Tests should be simple, which means documentation is not necessarily required, but does that same argument apply to the rest of the code? Tests can have intention-revealing names which are sufficient to describe the intended purpose of the test, and projects like TestDox can take this and turn it into readable documentation. But is this documentation sufficient? If the test fails, does it provide enough of a clue as to what was actually being tested?

In the past I have found test documentation to be extremely useful when writing integration tests. This documentation described the tests not in developer terms but in a use case form. By writing a simple parser, it was possible to generate documentation from the tests which allowed the QA team to understand which flows had been executed. More importantly, it also allowed them to quickly understand what an individual test was doing when it failed. This documentation was written as an experiment, but it proved extremely useful to both developers and QA when something went wrong.

When writing tests, teams need to ensure they don’t neglect this code. Tests should be maintained, refactored, reviewed and measured to ensure the same quality as they would in the rest of their code. Tests are still code.

Posted in Refactoring, Testing | Leave a Comment »

RefactorMyCode.com: Will It Work?

Posted by karldmoore on November 7, 2008

I recently stumbled across a new site called RefactorMyCode.com. The strapline reads:

“Have you got a method that you’re dying to simplify? A class that’s growing out of control? Or an algorithm that you just can’t get working? Submit it to the new, awesome web app Refactor My Code, and have it refactored.”

RefactorMyCode.com

That doesn’t all sound like refactoring, but I’m always interested in new resources so I thought I’d have a look.

Posted in Opinion, Refactoring | 4 Comments »