Mockito, Hadoop and HBase

Today I started writing my first MapReduce job that runs against HBase. I have been trying hard to work in a test-driven manner, and this seemed like both a good candidate for TDD and a hard one. The MR job is fairly simple: read two columns from the table, do some basic processing (much like the famous word count example), and then write the results back to a new table. Perhaps I am speaking too soon, but I have set up a basic unit test for the job's mapper, and thanks to Mockito this proved far easier than I anticipated.

@Override
public void map(ImmutableBytesWritable key, Result value, Context context)
                                  throws IOException, InterruptedException {

    // Read the two columns this job needs from the current row.
    int frequency = Bytes.toInt(value.getValue(HBaseSchemaConstants.FREQUENCY_BYTES));
    String trail = Bytes.toString(value.getValue(HBaseSchemaConstants.TRAILSTRING_BYTES));

    // A trail is a colon-separated list of landmarks, e.g. "1:2:3".
    String[] landmarks = trail.split(":");

    // Emit each adjacent pair of landmarks together with the row's frequency.
    for (int i = 0; i < landmarks.length-1; i++) {
        String pair = landmarks[i] + ":" + landmarks[i+1];
        context.write(new Text(pair), new IntWritable(frequency));
    }
}
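The pairing loop is the only real logic in this mapper, so one option (my own suggestion, not something the job above does) is to pull it out into a plain static method that can be tested with no Hadoop or HBase classes at all. The names TrailPairs and adjacentPairs below are hypothetical, and the method returns a list instead of writing to a Context.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the mapper's pairing loop with the Hadoop types removed.
final class TrailPairs {
    private TrailPairs() {}

    /** Returns every adjacent pair of landmarks in a colon-separated trail, in order. */
    static List<String> adjacentPairs(String trail) {
        String[] landmarks = trail.split(":");
        List<String> pairs = new ArrayList<>();
        // Stop one short of the end: the last landmark has no successor.
        for (int i = 0; i < landmarks.length - 1; i++) {
            pairs.add(landmarks[i] + ":" + landmarks[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [1:2, 2:3, 3:4, 4:5, 5:3, 3:2, 2:1]
        System.out.println(adjacentPairs("1:2:3:4:5:3:2:1"));
    }
}
```

A trail of eight landmarks gives seven pairs, so a test of the real mapper should expect seven calls to context.write().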

So what's the problem here? Well, there are two: getting input into the mapper and verifying its output.

Input

The input is passed in through a Result object, which has no setters, so I can’t pre-populate it with dummy data. This is where Mockito comes in: we create a mock of the Result object and stub the getValue calls on it.

Result result = mock(Result.class);
byte[] frequencyColumnId = Bytes.toBytes("frequency");
byte[] frequency = Bytes.toBytes(12);
byte[] trailColumnId = Bytes.toBytes("trailstring");
byte[] trail = Bytes.toBytes("1:2:3:4:5:3:2:1");
when(result.getValue(frequencyColumnId)).thenReturn(frequency);
when(result.getValue(trailColumnId)).thenReturn(trail);

The first line creates the mock itself. The next four lines encode the values I will use in the test as byte arrays. The final two lines stub the getValue() calls made by the map method under test. They translate as “when getValue is called with an argument equal to frequencyColumnId, return the bytes for (int) 12” and “when getValue is called with an argument equal to trailColumnId, return the bytes for (String) 1:2:3:4:5:3:2:1”. This gets known values into my map method.
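One detail worth spelling out: the stubs only fire if the byte array the mapper passes in is considered equal to the one used in when(). Java arrays do not override equals(), so two arrays holding the same bytes are only equal by content, not by reference. As far as I know, Mockito's default argument matching compares arrays element by element, which is why this works, provided the column names in the test encode to the same bytes as the HBaseSchemaConstants values (an assumption on my part about those constants). The plain-JDK sketch below shows the difference; ByteArrayEquality is a hypothetical name, and String.getBytes stands in for HBase's Bytes.toBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: reference equality vs content equality for byte arrays.
final class ByteArrayEquality {
    private ByteArrayEquality() {}

    // Stand-in for Bytes.toBytes(String); assumes UTF-8 encoding.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Arrays inherit Object.equals, so this is true only for the same instance.
    static boolean sameReference(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Compares length and every element.
    static boolean sameContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] inTest = encode("frequency");
        byte[] inMapper = encode("frequency");
        System.out.println(sameReference(inTest, inMapper)); // false
        System.out.println(sameContent(inTest, inMapper));   // true
    }
}
```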

Output

Now for the problem of the output. In my mapper I make multiple calls to context.write(), which I need to verify. Once again I can use Mockito, but this time in a slightly different way. The really cool feature of Mockito here is that it records every call made on the mock, so I can check how many times each method was called with given arguments.

Context context = mock(Context.class);
/**
main body of test
**/
verify(context, times(1)).write(new Text("1:2"), new IntWritable(12));
verify(context, times(1)).write(new Text("2:3"), new IntWritable(12));
verify(context, times(1)).write(new Text("3:4"), new IntWritable(12));
verify(context, times(1)).write(new Text("4:5"), new IntWritable(12));
verify(context, times(1)).write(new Text("5:3"), new IntWritable(12));
verify(context, times(1)).write(new Text("3:2"), new IntWritable(12));
verify(context, times(1)).write(new Text("2:1"), new IntWritable(12));

In this block of code we call verify(nameOfMockObject, times(numberOfTimesCalled)).methodWeAreInterestedIn(expectedArgs). We can call verify as many times as we have expected calls on the mock, in this case context. Since times(1) is Mockito's default, verify(context).write(...) would check the same thing.
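For intuition about what verify(..., times(n)) is checking, here is a hand-rolled stand-in written in plain Java. It is not how Mockito is implemented, just a sketch of the idea of recording calls and counting them afterwards. RecordingContext, timesWritten and runMapLogic are hypothetical names, and plain String and int replace Text and IntWritable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-written fake that counts write() calls per (key, value) pair.
final class RecordingContext {
    private final Map<String, Integer> callCounts = new HashMap<>();

    // Record one call; identical arguments bump the same counter.
    void write(String key, int value) {
        callCounts.merge(key + " -> " + value, 1, Integer::sum);
    }

    // Rough analogue of verify(context, times(n)): how often was this exact call made?
    int timesWritten(String key, int value) {
        return callCounts.getOrDefault(key + " -> " + value, 0);
    }

    // The mapper's pairing loop, driving the fake instead of a Hadoop Context.
    static RecordingContext runMapLogic(String trail, int frequency) {
        RecordingContext context = new RecordingContext();
        String[] landmarks = trail.split(":");
        for (int i = 0; i < landmarks.length - 1; i++) {
            context.write(landmarks[i] + ":" + landmarks[i + 1], frequency);
        }
        return context;
    }

    public static void main(String[] args) {
        RecordingContext context = runMapLogic("1:2:3:4:5:3:2:1", 12);
        System.out.println(context.timesWritten("1:2", 12)); // 1
        System.out.println(context.timesWritten("5:3", 12)); // 1
        System.out.println(context.timesWritten("9:9", 12)); // 0, never written
    }
}
```

Writing a fake like this for every collaborator gets tedious quickly, which is exactly the work Mockito takes off your hands.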

I'll add more once I have done my reduce method too, but so far it's looking like a cinch!

