Mockito Hadoop and HBase
Today I started writing my first MapReduce job that runs against HBase. I have been trying hard to work in a test-driven manner, and this job seemed both a good candidate for TDD and a hard one. The MR job is fairly simple: read two columns from the table, do some basic processing (much like the famous word count example), and then write the results back to a new table. Perhaps I am speaking too soon, but I have set up a basic unit test for the job's mapper, and thanks to Mockito this proved far easier than I anticipated.
@Override
public void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
    int frequency = Bytes.toInt(value.getValue(HBaseSchemaConstants.FREQUENCY_BYTES));
    String trail = Bytes.toString(value.getValue(HBaseSchemaConstants.TRAILSTRING_BYTES));
    String[] landmarks = trail.split(":");
    // Emit every adjacent pair of landmarks, each with the trail's frequency.
    for (int i = 0; i < landmarks.length - 1; i++) {
        String pair = landmarks[i] + ":" + landmarks[i + 1];
        context.write(new Text(pair), new IntWritable(frequency));
    }
}
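The pair-building loop itself has nothing to do with HBase. As a minimal sketch (the class TrailPairs and method landmarkPairs are hypothetical names, not part of the original job), that logic could be pulled out into a plain static method and tested with no mocks at all; map() would then only decode the two columns and call context.write() for each returned pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the loop from map() with the Hadoop types
// (Result, Text, IntWritable, Context) stripped out, so it is plain Java.
class TrailPairs {

    // Splits a trail such as "1:2:3" on ':' and returns every adjacent
    // pair of landmarks in order, e.g. ["1:2", "2:3"].
    // A trail with fewer than two landmarks yields an empty list.
    static List<String> landmarkPairs(String trail) {
        String[] landmarks = trail.split(":");
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < landmarks.length - 1; i++) {
            pairs.add(landmarks[i] + ":" + landmarks[i + 1]);
        }
        return pairs;
    }
}
```

For example, landmarkPairs("1:2:3:4:5:3:2:1") returns [1:2, 2:3, 3:4, 4:5, 5:3, 3:2, 2:1]: eight landmarks give seven pairs, which is a handy number to keep in mind when writing the expected output of the mapper test.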
So what's the problem here? There are two: getting input into the mapper, and verifying what it writes out.
Input
The input is passed in as a Result object, which has no setters, so I can't pre-populate it with dummy data. This is where Mockito comes in: we create a mock of the Result object and stub the getValue() calls on it.
Result result = mock(Result.class);
byte[] frequencyColumnId = Bytes.toBytes("frequency");
byte[] frequency = Bytes.toBytes(12);
byte[] trailColumnId = Bytes.toBytes("trailstring");
byte[] trail = Bytes.toBytes("1:2:3:4:5:3:2:1");
when(result.getValue(frequencyColumnId)).thenReturn(frequency);
when(result.getValue(trailColumnId)).thenReturn(trail);
The first line creates the mock itself. The next four lines just encode the values I will use in the test as byte arrays. The final two lines stub the getValue() calls that the map method under test will make. They read as “when getValue is called with an argument equal to frequencyColumnId, return (int) 12” and “when getValue is called with an argument equal to trailColumnId, return (String) 1:2:3:4:5:3:2:1”. This is how the mock values get into my map method.
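One detail makes these stubs work: map() passes HBaseSchemaConstants.FREQUENCY_BYTES, not the exact array the test built, so the stub only fires because Mockito compares array arguments element by element (assuming that constant holds the bytes of "frequency"). A plain HashMap keyed on byte[] would not behave that way, since arrays use identity equals() and hashCode(). The sketch below is a hypothetical hand-rolled fake, FakeResult, written only to show what the two when(...).thenReturn(...) lines amount to; it is not part of HBase or Mockito. Its helpers mirror the encodings of HBase's Bytes.toBytes (UTF-8 strings, 4-byte big-endian ints).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stubbed Result: it offers only the
// single-argument getValue(byte[]) that map() calls.
class FakeResult {
    // Keys are wrapped in ByteBuffer, whose equals/hashCode compare the
    // remaining bytes by content, so a different array with the same bytes
    // finds the same entry (a raw byte[] key would compare by identity).
    private final Map<ByteBuffer, byte[]> values = new HashMap<>();

    // Equivalent of when(result.getValue(column)).thenReturn(value).
    void stub(byte[] column, byte[] value) {
        // Copy the key so later changes to the caller's array cannot corrupt the map.
        values.put(ByteBuffer.wrap(column.clone()), value);
    }

    // Returns the stubbed value, or null if the column was never stubbed.
    byte[] getValue(byte[] column) {
        return values.get(ByteBuffer.wrap(column));
    }

    // 4-byte big-endian int, like Bytes.toBytes(int).
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // UTF-8 string, like Bytes.toBytes(String).
    static byte[] stringBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Each call to stringBytes("frequency") creates a fresh array, yet getValue still finds the stubbed entry, which is the same content-based matching the Mockito stubs rely on.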
Output
Now for the output. My mapper makes multiple calls to context.write(), which I need to verify. Once again Mockito helps, this time in a slightly different way. The really cool feature of Mockito here is that a mock records every call made on it, so afterwards you can check how many times each method was called with given arguments.
Context context = mock(Context.class);
/** main body of test **/
verify(context, times(1)).write(new Text("1:2"), new IntWritable(12));
verify(context, times(1)).write(new Text("2:3"), new IntWritable(12));
verify(context, times(1)).write(new Text("3:4"), new IntWritable(12));
verify(context, times(1)).write(new Text("4:5"), new IntWritable(12));
verify(context, times(1)).write(new Text("5:3"), new IntWritable(12));
verify(context, times(1)).write(new Text("3:2"), new IntWritable(12));
verify(context, times(1)).write(new Text("2:1"), new IntWritable(12));
In this block we call verify(nameOfMockObject, times(numberOfTimesCalled)).methodWeAreInterestedIn(expectedArgs). We call verify once for each call we expect on the mock, in this case context.
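Each verify(context, times(1)) line checks only the call it names; it does not fail if map() makes extra writes that nobody listed. Mockito's verifyNoMoreInteractions(context) covers that gap. To make the recording idea concrete without Hadoop or Mockito on the classpath, here is a minimal plain-Java sketch under stated assumptions: RecordingContext and TrailMapperLogic are hypothetical names, String and int stand in for Text and IntWritable, and this is not how Mockito is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the mapper's Context: it only remembers each
// write(key, value) call, the way a mock records its invocations.
class RecordingContext {
    private final Map<String, Integer> calls = new HashMap<>();
    private int totalCalls = 0;

    void write(String key, int value) {
        calls.merge(key + "=" + value, 1, Integer::sum);
        totalCalls++;
    }

    // Analogue of verify(context, times(n)).write(key, value):
    // how many times this exact (key, value) pair was written.
    int timesCalled(String key, int value) {
        return calls.getOrDefault(key + "=" + value, 0);
    }

    // Comparing this with the number of expected writes plays the role of
    // verifyNoMoreInteractions(context).
    int totalCalls() {
        return totalCalls;
    }
}

// Hypothetical plain-Java port of the body of map(): one write per adjacent pair.
class TrailMapperLogic {
    static void map(int frequency, String trail, RecordingContext context) {
        String[] landmarks = trail.split(":");
        for (int i = 0; i < landmarks.length - 1; i++) {
            context.write(landmarks[i] + ":" + landmarks[i + 1], frequency);
        }
    }
}
```

Running TrailMapperLogic.map(12, "1:2:3:4:5:3:2:1", ctx) records seven writes, each seen once, including 5:3. A test that checked only six of them would still pass unless it also compared totalCalls() against the expected count.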
I'll add more once I have written my reduce method too, but so far it's looking like a cinch!
About this entry
You’re currently reading “Mockito Hadoop and HBase,” an entry on random()
- Published: 26.01.11 / 10pm
- Category: Hadoop, learning Java, programming