<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title />
	
	<link>http://jamelcato.com</link>
	<description>The Personal Site of Jamel Cato</description>
	<pubDate>Tue, 18 Nov 2008 03:26:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/jamelcato" type="application/rss+xml" /><feedburner:browserFriendly></feedburner:browserFriendly><item>
		<title>R.I.P. Michael Crichton</title>
		<link>http://jamelcato.com/goodbye-michael-crichton/</link>
		<comments>http://jamelcato.com/goodbye-michael-crichton/#comments</comments>
		<pubDate>Tue, 11 Nov 2008 03:20:03 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Books]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Michael Crichton]]></category>

		<category><![CDATA[Pleasure Reading]]></category>

		<guid isPermaLink="false">http://jamelcato.com/?p=58</guid>
		<description><![CDATA[The high spirits I felt at the election of Barack Obama were tempered upon hearing of the death of Michael Crichton, one of my all-time favorite authors.

Though Crichton is best known for Jurassic Park and the television show ER, I personally enjoyed Travels, his nonfiction memoirs, more than any of his fiction. What a life [...]]]></description>
			<content:encoded><![CDATA[<p>The high spirits I felt at the election of Barack Obama were tempered upon hearing of the death of Michael Crichton, one of my all-time favorite authors.</p>
<p><span id="more-58"></span></p>
<p>Though Crichton is best known for <em>Jurassic Park</em> and the television show <em>ER</em>, I personally enjoyed <em>Travels</em>, his nonfiction memoirs, more than any of his fiction. What a life he led. The best chapter was “The Girl Who Seduced Everybody”, which left me laughing out loud, something none of his novels, as good as they were, ever did.</p>
<p>There are three things about Crichton’s writings that have always fascinated me:</p>
<p>First, no one researched their subject matter more thoroughly than Crichton. In fact, he was one of the few novelists whose novels routinely included academic-style research citations on the back pages. After reading a Crichton novel you feel like an expert on the subject. It’s interesting that such material would be so popular.</p>
<p>Second, despite the fact that nearly all of his novels had a storyline that was firmly science fiction, Crichton never suffered the professional misfortune of being labeled a science fiction author. The commercial success of his books meant they were considered “suspense thrillers that contained science fiction elements.” I’ve always found that mildly preposterous given his storylines included aliens (Sphere), time travel (Timeline), sentient robots (Prey), talking apes who live in ancient lost cities (Congo) and, most famous of all, genetically engineered dinosaurs (Jurassic Park). He obviously had a smart agent.</p>
<p>Some critics have argued that because so many of Crichton’s early books were made into movies, his later works became more like film scripts than novels. I have to say I mostly agree with this criticism, especially in the case of <em>Timeline</em> (which I loved anyway). More precisely my opinion is that many of his later works were undeniably written in a manner that made them amenable to screen adaptations. But hey, when you sell the movie rights to your books before you even write them (as Crichton did with every book after Jurassic Park), that tends to happen. The same criticism can be made of Tom Clancy and John Grisham.</p>
<p>Crichton, a medical doctor with multiple Harvard degrees, was enormously intelligent. Though he generally didn’t base characters on himself, all of his books except <em>Disclosure</em> featured a character that was extraordinarily knowledgeable. And that brings me to the third thing I admire about Crichton: Unlike other New York Times chart-toppers, he wrote novels for the thinking person.</p>
<p>His talent will be missed.</p>
<p>First the <em>Easy Rawlins</em> Series ends, now Michael Crichton is gone. The life of my mind has two gaping voids.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/goodbye-michael-crichton/feed/</wfw:commentRss>
		</item>
		<item>
		<title>An Easy Way to Master INDEX/MATCH Formulas</title>
		<link>http://jamelcato.com/an-easy-way-to-master-indexmatch-formulas-in-excel/</link>
		<comments>http://jamelcato.com/an-easy-way-to-master-indexmatch-formulas-in-excel/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 21:15:47 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Data Analysis]]></category>

		<category><![CDATA[Excel]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Advanced Excel Techniques]]></category>

		<category><![CDATA[INDEX()]]></category>

		<category><![CDATA[INDEX/MATCH]]></category>

		<category><![CDATA[MATCH()]]></category>

		<guid isPermaLink="false">http://jamelcato.com/an-easy-way-to-master-indexmatch-formulas/</guid>
		<description><![CDATA[At least once a month I use an INDEX/MATCH formula to match and merge patient data from multiple Excel files. I wrote this post because when I first sought to learn the technique I found the other tutorials on the web either lacking or hard-to-follow.

If you’re reading this, chances are you have strong Excel skills [...]]]></description>
			<content:encoded><![CDATA[<p>At least once a month I use an INDEX/MATCH formula to match and merge patient data from multiple Excel files. I wrote this post because when I first sought to learn the technique I found the other tutorials on the web either lacking or hard-to-follow.</p>
<p><span id="more-32"></span></p>
<p>If you’re reading this, chances are you have strong Excel skills and already know what INDEX/MATCH formulas do. For the rest of you, here’s a short introduction:</p>
<p>INDEX/MATCH formulas, created by combining Excel’s built-in INDEX function and its built-in MATCH function into a single compound formula, are ideal when you need to:</p>
<ul>
<li>Merge data from one Excel list into another Excel list by matching records from the two lists; or</li>
<li>Use a common field from two Excel lists to look up a second (or third or fourth) field by matching records from the two lists.</li>
</ul>
<p>For instance, suppose you had two Excel worksheets for the same group of customers. The first worksheet contains columns for Customer ID and Email Address. The second worksheet contains columns for Customer ID, Phone Number and Age. With Customer ID as the common column, you could use an INDEX/MATCH formula to add each customer’s phone number and age to the email worksheet.</p>
<p>For SQL experts, you can think of INDEX/MATCH formulas as a way to use Excel to do inner joins.</p>
<blockquote><p><em><span style="text-decoration: underline;">Quick Sidebar</span></em></p>
<p>At this point, someone is undoubtedly thinking: I could do the same thing faster in Microsoft Access with a lookup query in Design View or in Crystal Reports with the link tab of the Database Expert. You are probably correct, but this post is intended for everyday users who only have or know Microsoft Excel or situations where setting up an Access DB or a new Crystal Report is just not warranted. But I digress.</p></blockquote>
<p>A standard INDEX/MATCH formula is written like this:</p>
<p align="center"><code>Index( value_array, Match( lookup_value, lookup_array, match_type ), column_number )</code></p>
<p>The MATCH portion returns a <em>position</em> in a list. The INDEX portion returns a <em>value</em> in a cell. Combining them lets you look up a value in a cell based on the position of an item in a list. (What the formula actually does is use a MATCH function as the second argument of an INDEX function.)</p>
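<p>For readers who think more easily in code, here is a minimal Python sketch of the same idea. The function names, the sample customer IDs and the phone numbers are all made up for illustration, and the sketch covers only the exact-match case. It is not how Excel is implemented internally.</p>

```python
# Illustrative stand-ins for the two Excel functions (exact match only).

def match(lookup_value, lookup_array):
    """Return the 1-based position of lookup_value, like MATCH(..., 0)."""
    for position, value in enumerate(lookup_array, start=1):
        if value == lookup_value:
            return position
    return None  # Excel would show #N/A here


def index(value_array, position):
    """Return the value at a 1-based position, like INDEX(array, position)."""
    if position is None:
        return None
    return value_array[position - 1]


# Hypothetical sample data standing in for two columns of a worksheet.
customer_ids = ["C001", "C002", "C003"]
phone_numbers = ["555-0101", "555-0102", "555-0103"]

# Same spirit as =INDEX(phone_numbers, MATCH("C002", customer_ids, 0))
phone = index(phone_numbers, match("C002", customer_ids))
print(phone)
```

<p>The inner call runs first and hands a position to the outer call, which is the same nesting the Excel formula uses.</p>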
<p><span style="text-decoration: underline;">Here’s the Trick</span></p>
<p>Instead of trying to digest all of the above, just rewrite the formula in the following way and replace the double-bracketed portions with your actual data or cell references.</p>
<p align="center"><code> =INDEX([[find this kind of value]],MATCH([[for this cell within the **first** list]], [[with a match within this **second** list]],0))</code></p>
<p>A few parting notes that might be additionally helpful:</p>
<ul>
<li>The MATCH portion of the formula is processed before the INDEX portion.</li>
<li>If you plan to use AutoFill to copy the formula down a column, ensure that the lookup array is either a named range or an absolute reference to a range.</li>
<li>MATCH will accept an entire column as the lookup array, but specifying an exact cell range is usually faster and avoids matching stray values further down the column.</li>
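<li>The 0 at the end of the MATCH portion is optional and one of three possible choices (1, 0, -1). 0 means find an exact match. 1 means find the largest value that is less than or equal to the lookup value, and requires the lookup array to be sorted in ascending order. -1 means find the smallest value that is greater than or equal to the lookup value, and requires the lookup array to be sorted in descending order. If you omit this argument, it defaults to 1, which is rarely what you want when matching records by ID, so include the 0.</li>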
</ul>
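<p>The three match types in the last note can be sketched in Python as follows. This is a simplified illustration rather than Excel's actual algorithm: it scans the list from top to bottom and simply assumes the list is sorted the way each match type requires. The function name and the sample numbers are hypothetical.</p>

```python
def match(lookup_value, lookup_array, match_type=1):
    """Return a 1-based position or None, following the three match types."""
    if match_type == 0:
        # Exact match: first item equal to lookup_value, any sort order.
        for position, value in enumerate(lookup_array, start=1):
            if value == lookup_value:
                return position
        return None
    if match_type == 1:
        # Assumes ascending order: largest value that is <= lookup_value.
        best = None
        for position, value in enumerate(lookup_array, start=1):
            if value <= lookup_value:
                best = position
        return best
    if match_type == -1:
        # Assumes descending order: smallest value that is >= lookup_value.
        best = None
        for position, value in enumerate(lookup_array, start=1):
            if value >= lookup_value:
                best = position
        return best
    raise ValueError("match_type must be 1, 0 or -1")


ascending = [10, 20, 30, 40]
descending = [40, 30, 20, 10]

print(match(30, ascending, 0))    # exact match
print(match(25, ascending, 1))    # falls back to the position of 20
print(match(25, descending, -1))  # falls back to the position of 30
```

<p>Note how a lookup value of 25 still returns a position with match types 1 and -1. When you are merging records by ID, that kind of near miss is a silent error, which is why the exact-match 0 is the safer habit.</p>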
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/an-easy-way-to-master-indexmatch-formulas-in-excel/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A Scathing Review of The Happening</title>
		<link>http://jamelcato.com/a-scathing-review-of-the-happening/</link>
		<comments>http://jamelcato.com/a-scathing-review-of-the-happening/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 02:21:25 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Movies]]></category>

		<category><![CDATA[M. Night]]></category>

		<category><![CDATA[M. Night Shyamalan]]></category>

		<category><![CDATA[Movie Review]]></category>

		<category><![CDATA[The Happening]]></category>

		<guid isPermaLink="false">http://jamelcato.com/a-scathing-review-of-the-happening/</guid>
		<description><![CDATA[I’m not a professional movie critic. But if I were, both my thumbs—and my toes—would be pointing down when I reviewed The Happening, the latest thriller from filmmaker M. Night Shyamalan. There are two main reasons why.

The first reason is that nothing happens in the movie. That’s not a spoiler or a clever play on [...]]]></description>
			<content:encoded><![CDATA[<p>I’m not a professional movie critic. But if I were, both my thumbs—and my toes—would be pointing down when I reviewed <em>The Happening</em>, the latest thriller from filmmaker M. Night Shyamalan. There are two main reasons why.</p>
<p><span id="more-31"></span></p>
<p>The first reason is that nothing happens in the movie. That’s not a spoiler or a clever play on words. Sadly, it’s a plot synopsis. If you can believe it, <em>The Happening</em> is an apocalyptic thriller without an apocalypse. Sure, there’s a cataclysmic “event” in the beginning. But then no chaos, excitement or real suspense ensues. Even though humanity is facing possible extinction, everyone stays calm, rational and orderly. In addition to being wildly unrealistic, such a storyline is boring.</p>
<p>If the “event” in the movie happened in real life, there would be widespread panic. Hazmat crews would be everywhere. Cell phone networks would crash. Children would be ripped from their mothers’ fingertips in the inevitable chaos. But nothing remotely like this happens in the movie. Actually, after the first 10 minutes, not much of anything happens, making the film’s title the ultimate oxymoron.</p>
<p>As if a dreary plot were not harm enough, the special effects, dialogue and editing are film-school amateurish. As many other reviewers have complained, and as I can confirm, there are indeed two scenes in the film where the microphone grip accidentally falls into the frame.</p>
<p>The second reason I’m so disappointed with <em>The Happening</em> is that, like many of his former fans, I really wanted to see Shyamalan redeem himself after the dreadful <em>Lady in the Water</em>. As bad as that movie was (and it was very, very bad), at least it came with the entertaining distraction that was the behind-the-scenes political battle between Shyamalan and the Disney executives who didn’t want to lose their jobs by distributing such a dud. The infighting culminated in a notorious scene at the Four Seasons Hotel in Philadelphia where an angry Shyamalan threw down his napkin and stormed away from a dinner meeting with “the suits” who, after previewing the final cut, had flown across the country to personally beg him to rewrite the script. (Historical Note: The film eventually came out without a re-write and the executives indeed lost their jobs when it flopped.)</p>
<p>If I met M. Night Shyamalan today, I would ask him one question: Where did we go wrong? <em>The Sixth Sense</em> is one of the 10 best films of all time. <em>Unbreakable</em> is an underrated classic. He wouldn’t admit it, but the answer is <em>The Village</em>, his third major release. Shyamalan was stung by the heavy criticism of the movie and, in my opinion, started to believe the experts who said that his films’ hallmark surprise endings had inevitably turned into an Achilles’ heel because it was impossible to surprise an audience who came to the theater expecting a surprise. Both <em>Signs</em> and <em>Lady in the Water</em>, neither of which included a twist ending, proved that the experts were wrong. Twist endings are not Shyamalan’s problem. But I digress.</p>
<p>Back to <em>The Happening</em>. It sucks and nothing happens. Wait, did I say that already?</p>
<p>If after reading this review you still spend good gas money to see this film, you will wish that a certain wind blows by the home of everyone who played a part in green-lighting it. You will go from wondering if Shyamalan has lost his touch to being sure of it. Your date will be pissed, not just because you saw a bad movie, but also because the bad movie lacked enough content for interesting dinner conversation.</p>
<p>Here’s a happening for you: On my way out of the theater I thought I saw Bruce Willis crouched on one knee talking to a little boy. As I passed by, it sounded like the boy whispered, “I see dead movies.”</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/a-scathing-review-of-the-happening/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The End of the Easy Rawlins Series</title>
		<link>http://jamelcato.com/the-end-of-the-easy-rawlins-series/</link>
		<comments>http://jamelcato.com/the-end-of-the-easy-rawlins-series/#comments</comments>
		<pubDate>Mon, 02 Jun 2008 02:59:53 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Books]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Pleasure Reading]]></category>

		<category><![CDATA[Blonde Faith]]></category>

		<category><![CDATA[Easy Rawlins]]></category>

		<category><![CDATA[Easy Rawlins Mystery Series]]></category>

		<category><![CDATA[Walter Mosley]]></category>

		<guid isPermaLink="false">http://jamelcato.com/the-end-of-the-easy-rawlins-series/</guid>
		<description><![CDATA[I recently finished reading Blonde Faith by Walter Mosley, the 11th and final novel in the acclaimed Easy Rawlins mystery series (the one that President Clinton made famous when he declared it his all-time favorite). Like the 10 installments before it, it was excellent. Instead of reviewing this particular book, I want to commemorate the [...]]]></description>
			<content:encoded><![CDATA[<p>I recently finished reading <em>Blonde Faith</em> by Walter Mosley, the 11th and final novel in the acclaimed Easy Rawlins mystery series (the one that President Clinton made famous when he declared it his all-time favorite). Like the 10 installments before it, it was excellent. Instead of reviewing this particular book, I want to commemorate the end of the series by sharing a few personal thoughts on the collection as a whole. I’ll try to avoid spoilers.</p>
<p><span id="more-30"></span></p>
<p>Although I really, really hate to see the series come to an end, it’s high time that it does. No matter how enthralling and well-written, any mystery series centered on the detective capers of a middle-aged grandfather (which is what Easy has become by the last book) is bound to become a tough sell sooner or later. I’m sure Mosley was more cognizant of that than anyone, which is why <em>Blonde Faith</em> ended with such finality.</p>
<p>Each book in the series has a color in its title. I had read four of the books and hadn’t consciously noticed this pattern until it came up in a magazine article I read.</p>
<p><em>A Little Yellow Dog</em> was my favorite book in the series, followed closely by <em>Cinnamon Kiss</em>.</p>
<p>Jackson Blue was my favorite secondary character. Every time his character made an appearance I was reminded that a beautiful mind and an ugly soul can coexist in one person.</p>
<p>I understand about Easy and Bonnie Shay, I do.</p>
<p>Mosley’s depictions of postwar Los Angeles are so vivid and historically accurate it’s like time-traveling.</p>
<p>I discovered the series when a good friend of mine insisted I read <em>Black Betty</em>. I loved it just like she said I would and can’t thank her enough for the introduction.</p>
<p>Most people don’t know that Walter Mosley is biracial (his father is black and his mother is Jewish). I only mention that fact to say that Mosley’s striking ability, and choice, to chronicle the joys and pains of the black experience in America is even more interesting than I originally thought.</p>
<p>Of the several themes that Mosley wove throughout the series, the most fascinating is this: Your good deeds today will not absolve you of the evil you did yesterday. No matter how much Christmas Black loves and cares for Easter Dawn, they can’t ever be a normal family because of what he once did on the other side of the world. No matter how grand Jewelle’s real estate empire becomes, she can’t forget what she did to Mofass to get it started. No matter how much of a safe haven Laselle Latour’s female-only boarding house provides for women in L.A., it will never make up for the bordello that she ran back in Houston.</p>
<p>The only character who was exempt from the gravity of this moral black hole is Raymond “Mouse” Alexander, the diminutive killer who was Easy’s right-hand man. Mouse would shoot you through the eye for breakfast and go dancing for dinner. As Easy discovered on many occasions, having Mouse as a best friend was like keeping a full-grown tiger for a pet.</p>
<p>I will miss having another Easy Rawlins mystery to look forward to. Mosley is among the best writers of his generation and the life of my mind has been enriched by these stories. One of my favorite passages from the series is this:</p>
<blockquote><p><em> Proof is a funny thing. For policeman and for lawyers it depends on tangible evidence: fingerprints, eyewitnesses, irrefutable logic, or self-incrimination. But for me evidence is like morning mist over a complex terrain. You see the landscape and then it’s gone.</em></p></blockquote>
<p>And then it’s gone.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/the-end-of-the-easy-rawlins-series/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Crystal Reports 2008 is Here</title>
		<link>http://jamelcato.com/crystal-reports-2008-is-here/</link>
		<comments>http://jamelcato.com/crystal-reports-2008-is-here/#comments</comments>
		<pubDate>Thu, 20 Dec 2007 01:31:11 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Crystal Reports]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Reporting Tools]]></category>

		<category><![CDATA[Crystal Reports 2008]]></category>

		<guid isPermaLink="false">http://jamelcato.com/crystal-reports-2008-is-here/</guid>
		<description><![CDATA[In between watching trailers for I Am Legend and raking leaves, I’ve been using Crystal Reports XI a lot lately. I mean really using it—OLAP cubes, Custom Functions, posting questions on the Business Objects message board—the whole nine yards. Somebody at work heard that I was pretty good with it and the next thing I [...]]]></description>
			<content:encoded><![CDATA[<p>In between watching trailers for <em>I Am Legend</em> and raking leaves, I’ve been using Crystal Reports XI a lot lately. I mean <em>really</em> using it—OLAP cubes, Custom Functions, posting questions on the Business Objects message board—the whole nine yards. Somebody at work heard that I was pretty good with it and the next thing I know I have 40 complex reports to develop.</p>
<p><span id="more-28"></span></p>
<p>I suppose that explains why I was on a mailing list announcing the official release of <a href="http://www.businessobjects.com/product/catalog/crystalreports/?intcmp=corphp_cr2008box">Crystal Reports 2008</a> this past October. (I’ve waited until now to post about the release because the Crystal Reports people have a history of releasing new editions where something important is either missing or wrong. UPDATE: Sure enough, the License released in October had a major error and had to be re-issued.) While it would have been more interesting to be on a mailing list that announced that SAP was about to buy the company behind Crystal Reports for $6 billion just a few days later, I take what I can get.</p>
<p>Ken Hamady at <em>The Crystal Reports Underground</em> has <a href="http://kenhamady.com/cru/archives/105">a good writeup</a> about all the new bells and whistles (and the ones that are gone), so I won’t retread that ground.</p>
<p>I remember buying Crystal Reports at the Georgia Tech bookstore when the company was still called Crystal Services. And then it became Seagate. And then Crystal Decisions. And then Business Objects.</p>
<p>All that turnover in the corner office might be why certain basic features consistently get overlooked—like the ability to save a report in an earlier format. Who knows, maybe after each buyout the new engineers can’t reach the old engineers on their new yachts.</p>
<p>Happy Reporting.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/crystal-reports-2008-is-here/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The Problem with SIF</title>
		<link>http://jamelcato.com/rfc-stole-my-sif-and-ran-off-with-my-xml/</link>
		<comments>http://jamelcato.com/rfc-stole-my-sif-and-ran-off-with-my-xml/#comments</comments>
		<pubDate>Tue, 03 Jul 2007 14:12:05 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Educational Technology]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[School Interoperability Framework]]></category>

		<category><![CDATA[SIF]]></category>

		<guid isPermaLink="false">http://jamelcato.com/rfc-stole-my-sif-and-ran-off-with-my-xml/</guid>
		<description><![CDATA[Lately I&#8217;ve been doing some work with SIF data for a client. For those who don&#8217;t know (which seems to be just about everybody, including me up until earlier this year), SIF stands for School Interoperability Framework. It&#8217;s a new standard for exchanging educational data. You can learn the details here, or if you can&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been doing some work with SIF data for a client. For those who don&#8217;t know (which seems to be just about everybody, including me up until earlier this year), SIF stands for <em>School Interoperability Framework</em>. It&#8217;s a new standard for exchanging educational data. You can learn the details <a href="http://en.wikipedia.org/wiki/SIF">here</a>, or if you can&#8217;t sleep, <a href="http://specification.sifinfo.org/Implementation/2.0r1/">here</a>.</p>
<p><span id="more-10"></span></p>
<p>The idea behind SIF is timely and necessary. And the protocol is well designed. But the whole initiative is badly in need of a good PR consultant. In my opinion, SIF is the poster child for Standards Gone Wild. The technical lexicon surrounding a SIF implementation is denser than London fog. I&#8217;m aware that technical people often like the esoteric acronyms and terminology that we think separates us from mere mortals. But in this case, when you&#8217;re trying to convince a whole industry to change a fundamental practice, much of it may be counterproductive.</p>
<p>First off, the term &#8220;School Interoperability Framework&#8221; by itself is enough to intimidate many people. Upon hearing it, non-IT people run for the hills.</p>
<p>If anybody had asked me, I would&#8217;ve suggested a simpler, more descriptive name that could readily be turned into a catchy acronym. I think something like the <strong><em>Open Protocol for Education Networks</em></strong> (OPEN) or more simply, the <strong><em>School Data Standard</em> </strong>(SDS) would have worked better from a marketing standpoint.</p>
<p>If you want the whole world to adopt your standard, then make it adoption-friendly.</p>
<p>Okay, I better get back to writing that non-normative RFC 4122 XML schema for our vertical zone integration server.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/rfc-stole-my-sif-and-ran-off-with-my-xml/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Missing Data Techniques for Dummies</title>
		<link>http://jamelcato.com/missing-data-techniques-for-dummies/</link>
		<comments>http://jamelcato.com/missing-data-techniques-for-dummies/#comments</comments>
		<pubDate>Tue, 03 Jul 2007 13:56:46 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Data Analysis]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Dealing with missing data]]></category>

		<category><![CDATA[Mean Imputation]]></category>

		<category><![CDATA[Missing Data]]></category>

		<category><![CDATA[Missing Data Techniques]]></category>

		<category><![CDATA[Multiple Imputation]]></category>

		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://jamelcato.com/missing-data-techniques-for-dummies/</guid>
		<description><![CDATA[This is not another article explaining various missing data techniques. This is a post about how to use them without getting an advanced degree in statistics or charming that nice young Data Analyst into doing it for you.

If you Google &#8220;missing data&#8221; you will be barraged with complicated statistical techniques. That&#8217;s par for the course, [...]]]></description>
			<content:encoded><![CDATA[<p>This is not another article explaining various missing data techniques. This is a post about how to use them without getting an advanced degree in statistics or charming that nice young Data Analyst into doing it for you.</p>
<p><span id="more-9"></span></p>
<p>If you Google &#8220;missing data&#8221; you will be barraged with complicated statistical techniques. That&#8217;s par for the course, but none of these sites ever seem to tell you how to actually implement these techniques in the real world with real data. So I&#8217;ll give it a try.</p>
<p><span style="text-decoration: underline;"><br />
<strong>Not that it&#8217;s best, but this is the approach that I use:</strong></span></p>
<ul>
<li>If less than 5% of data points are missing, I use plain old <em>Listwise Deletion</em>.</li>
<li>If between 5% and 10% of data points are missing, I use <em>Mean Imputation</em>. Yes, I know it artificially shrinks the variance and understates the standard error. But since such a small amount of data is missing, I can live with it.</li>
<li>If more than 10% of data points are missing and I&#8217;m confident the missing data are MAR or MCAR, then I use <em>Multiple Imputation</em>.</li>
<li>If more than 10% of data points are missing and I believe the missing data are NMAR, then I pull out the big guns and use <em>Heckman Selection Modeling</em>.</li>
<li>If the missing variable is Race, I simply assign the record to the &#8220;Other&#8221; category and forget about it.</li>
<li>If the data are longitudinal and it&#8217;s not the first wave, then I just use the mean of the subject&#8217;s previous observations and forget about it.</li>
</ul>
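<p>The first four rules above can be written down as a small decision function. This is only a sketch in Python: the function name and return strings are made up for illustration, and sending exactly 10% to the heavier techniques is an assumption, since the rules only speak of less than and more than 10%. The Race and longitudinal special cases are left out.</p>

```python
def choose_technique(fraction_missing, mechanism="MCAR"):
    """Pick a missing-data technique.

    fraction_missing: share of data points missing, between 0 and 1.
    mechanism: "MCAR", "MAR" or "NMAR".
    """
    if fraction_missing < 0.05:
        return "listwise deletion"
    if fraction_missing < 0.10:
        return "mean imputation"
    # Assumption: exactly 10% is treated the same as more than 10%.
    if mechanism in ("MAR", "MCAR"):
        return "multiple imputation"
    if mechanism == "NMAR":
        return "Heckman selection modeling"
    raise ValueError("mechanism must be MCAR, MAR or NMAR")


print(choose_technique(0.03))
print(choose_technique(0.08))
print(choose_technique(0.25, "MAR"))
print(choose_technique(0.25, "NMAR"))
```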
<p><span style="text-decoration: underline;"><br />
<strong>And here&#8217;s how I go about it:</strong></span></p>
<p>For Listwise Deletion, I import the data into an Excel worksheet, then use AutoFilter to select the records or variables with blanks and delete them.</p>
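<p>Outside of Excel, the same step is a one-line filter. Here is a minimal Python sketch. The survey records are invented for illustration, and treating both None and the empty string as blank is an assumption about how the data was imported.</p>

```python
def is_blank(value):
    """Treat None and the empty string as missing (an assumption)."""
    return value is None or value == ""


def listwise_delete(records):
    """Keep only the records in which every field has a value."""
    return [
        record
        for record in records
        if not any(is_blank(value) for value in record.values())
    ]


# Hypothetical sample data.
survey = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 41, "income": ""},
    {"id": 4, "age": 29, "income": 61000},
]

complete = listwise_delete(survey)
print([record["id"] for record in complete])
```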
<p>For Mean Imputation, I use Excel&#8217;s AVERAGE function to calculate the mean and then use an IF statement to insert that mean into all missing records. If your dataset has tens of thousands of rows, this can be excruciatingly slow in Excel, so you should know that in those cases it can be done faster in SAS with PROC MEANS.</p>
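<p>The AVERAGE-plus-IF step looks like this as a Python sketch. The function name and the sample ages are hypothetical, and missing values are assumed to be stored as None.</p>

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed ones."""
    observed = [value for value in values if value is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = mean(observed)
    return [fill if value is None else value for value in values]


# Hypothetical sample data.
ages = [30, None, 50, 40, None]
print(impute_mean(ages))
```

<p>Every gap gets the same number, which is why the spread of the filled-in column is smaller than the spread of the real data would have been.</p>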
<p>For Multiple Imputation, I use SAS&#8217;s PROC MI to run the imputations and PROC MIANALYZE to calculate the summary statistics on the regressors. I always run five sets of imputations because studies by very smart people have shown this is enough. Excel gurus know that the summary statistics can be done in Excel by searching for the variable <em>_imputation_</em> (including the underscores) and separating each imputation into a separate worksheet. <em>_imputation_</em> is the index variable that PROC MI adds to the output dataset to number each imputation.</p>
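<p>If you do combine the imputations by hand, the standard combining rules (usually called Rubin&#8217;s rules) are short enough to sketch in Python. This shows the arithmetic for a single coefficient and is not a substitute for the SAS procedures. The function name and the five estimates and variances are made up for illustration.</p>

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine one coefficient across m imputed datasets.

    estimates: the coefficient from each imputed dataset.
    variances: its squared standard error from each imputed dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from five imputations.
coefficients = [2.1, 1.9, 2.0, 2.2, 1.8]
squared_errors = [0.04, 0.05, 0.04, 0.06, 0.05]

pooled, total = pool_estimates(coefficients, squared_errors)
print(pooled, total ** 0.5)  # pooled coefficient and its standard error
```

<p>The between-imputation term is what plain mean imputation throws away: it widens the standard error to reflect how much the answer moves from one set of filled-in values to the next.</p>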
<p>For Heckman Selection Modeling, I use Stata&#8217;s HECKMAN command, because it lets you choose between maximum likelihood and two-stage estimation. I always use the two-stage option, although there&#8217;s probably some complicated rule somewhere in the manual explaining when to use which. SAS lovers should know that SAS (version 9 and later) can do Heckman estimation with PROC QLIM, but keep in mind it&#8217;s limited to the maximum likelihood version of the model. Either way, people who don&#8217;t know about these tools will be dazzled with your skills. And it sounds fabulous when the footnotes of your survey say something like, &#8220;Missing values were imputed using two-stage Heckman Correction Estimation.&#8221;</p>
<p><span style="text-decoration: underline;"><br />
<strong>For what it&#8217;s worth:</strong></span></p>
<p>If somebody asks me why I use Multiple Imputation over alternate methods, I just say, &#8220;If it&#8217;s good enough for the Census Bureau, then it&#8217;s good enough for me.&#8221; Then I walk away and go to Starbucks.</p>
<p>If somebody asks me why I use Heckman Selection Modeling, I just say, &#8220;If it&#8217;s good enough for the Nobel Prize Committee, then it&#8217;s good enough for me.&#8221; Then I turn my iPod back on and spin back towards my computer screen.</p>
<p>I&#8217;ve heard that Stata&#8217;s ICE command is better than SAS&#8217;s PROC MI, but I&#8217;m so accustomed to using SAS and Excel for this that I&#8217;ve never tried it.</p>
<p>I know I won&#8217;t win any goodwill points from the American Statistical Association for saying this, but unless your missing data could affect something really important like, say, nuclear missile targeting, I wouldn&#8217;t lose sleep over the technique you choose because <strong><em>every one of them</em> </strong>has legitimate weaknesses and the probability that no one really cares is 99.99%.</p>
<p>Jamel Cato<br />
The Blue Collar Data Analyst<br />
2007</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/missing-data-techniques-for-dummies/feed/</wfw:commentRss>
		</item>
		<item>
		<title>LibraryThing - My New Thing</title>
		<link>http://jamelcato.com/librarything-my-new-thing/</link>
		<comments>http://jamelcato.com/librarything-my-new-thing/#comments</comments>
		<pubDate>Tue, 26 Jun 2007 19:42:59 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[LibraryThing]]></category>

		<category><![CDATA[Life on the Web]]></category>

		<category><![CDATA[Bibliophiles]]></category>

		<category><![CDATA[GoodReads.com]]></category>

		<category><![CDATA[Shelfari]]></category>

		<category><![CDATA[Social Networks for Book Lovers]]></category>

		<guid isPermaLink="false">http://jamelcato.com/librarything-my-new-thing/</guid>
		<description><![CDATA[A few days ago I discovered LibraryThing, a social networking site for booklovers. Think Myspace for bibliophiles. What a good idea.  I don&#8217;t know why I didn&#8217;t think of it. Envy aside, I&#8217;m surprised—amazed really—that the site has been around for two years and this is the first time I&#8217;ve come across it. Google [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I discovered <a href="http://www.librarything.com">LibraryThing</a>, a social networking site for booklovers. Think Myspace for bibliophiles. What a good idea.  I don&#8217;t know why I didn&#8217;t think of it. Envy aside, I&#8217;m surprised—amazed really—that the site has been around for two years and this is the first time I&#8217;ve come across it. Google has some explaining to do.</p>
<p><span id="more-6"></span></p>
<p>Here are the things I like:</p>
<ul>
<li>You can sign up with just a username and password. No email required. No forms to fill out.</li>
<li>It&#8217;s easy to find people with similar literary interests.</li>
<li>The excellent search and browsing features.</li>
<li>The best use of tagging on the Internet.</li>
<li>The recommendation engine is way better than Amazon&#8217;s version.</li>
<li>The Zeitgeist page, which features every imaginable statistic about the LT community. As a data analyst, I have to love this. Some of the statistics are remarkable, like the guy with over 14,000 books in his library. One guy. 14,000 books.</li>
</ul>
<p>Here are the things I don&#8217;t like:</p>
<ul>
<li>It can be a lot of work, at least initially. Part of the reason this post doesn&#8217;t link to my LT profile is because I haven&#8217;t finished inputting the hundreds of books I own.</li>
<li>The site has a lot of features, but there&#8217;s no Help section. And the FAQ page is almost impossible to find.</li>
<li>While I&#8217;m enamored with the idea of networking with fans of my favorite authors/books, I&#8217;m not sure I want to let the whole world know what&#8217;s in my personal library in order to do that. I realize I could take the Spider-Man approach and hide my identity behind a clever username like <em>Not-Jamel-007</em>, but then I can&#8217;t link my LT profile to any site that includes my real name, like this blog.</li>
</ul>
<p>Notwithstanding these few minor annoyances, I have to say that LibraryThing is my favorite site right now.</p>
<p>But it&#8217;s not a panacea. Over 200,000 members and only 5 of them thought <em>A Little Yellow Dog</em> was worthy of five stars. Go figure.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/librarything-my-new-thing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How to do Percentile Ranking in Oracle</title>
		<link>http://jamelcato.com/how-to-do-percentile-ranking-in-oracle/</link>
		<comments>http://jamelcato.com/how-to-do-percentile-ranking-in-oracle/#comments</comments>
		<pubDate>Tue, 26 Jun 2007 15:58:32 +0000</pubDate>
		<dc:creator>jamel</dc:creator>
		
		<category><![CDATA[Data Analysis]]></category>

		<category><![CDATA[Jamel Cato]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[CUME_DIST]]></category>

		<category><![CDATA[Oracle script]]></category>

		<category><![CDATA[PERCENT_RANK]]></category>

		<guid isPermaLink="false">http://jamelcato.com/how-to-do-percentile-ranking-in-oracle/</guid>
		<description><![CDATA[Recently I had to provide a script to convert a dataset of raw assessment scores into an Oracle table with the scores ordered by percentile rank. This is a common request so I figured a short, non-technical post on percentile ranking might be helpful to a lot of people.
There are three main ways to calculate [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I had to provide a script to convert a dataset of raw assessment scores into an Oracle table with the scores ordered by percentile rank. This is a common request so I figured a short, non-technical post on percentile ranking might be helpful to a lot of people.</p>
<p>There are three main ways to calculate percentile ranks in Oracle 9i or later:</p>
<p>(a) Calculate them manually with Joins and Sub-queries;<br />
(b) Use the CUME_DIST function and format the results as a percentage;<br />
(c) Use the PERCENT_RANK function.</p>
<p>In this post I&#8217;m going to focus on options (b) and (c) because the only people who would reinvent the wheel and use (a) are SQL programmers who don&#8217;t know (b) and (c) already exist.</p>
<p><span id="more-5"></span></p>
<p>CUME_DIST and PERCENT_RANK are built-in Oracle functions that let you rank a value based on its relative standing within a set of values. For example, if you hear someone say that a 1600 SAT score was in the 99th percentile (meaning 99% of all the other scores in that administration of the test were lower), a ranking formula is what tells you so.</p>
<p>The two functions take different approaches to determining a percentile rank and you should understand the basic difference.</p>
<p>CUME_DIST determines a percentile rank by calculating the ratio of the number of rows whose value is less than or equal to the current row&#8217;s value to the total number of rows in the partition.</p>
<p>PERCENT_RANK determines a percentile rank by setting the lowest value (that is, the first row in the sort order) equal to 0 and assigning all the remaining rows with this formula:</p>
<p align="center">(n-1)/(m-1) where n is the rank of the row in a partition of m records (tied rows share the same rank).</p>
<p>There are other differences between the two functions but a detailed review is way beyond the promised scope of this post. However, you will do well to remember four points:</p>
<ul>
<li>The two functions are similar, but (contrary to what many books lead you to believe) they are not identical. That&#8217;s why they return different answers when you use them side-by-side in the same query.</li>
<li>CUME_DIST returns a <em><strong>position</strong></em> of a row and PERCENT_RANK returns a <em><strong>rank</strong></em> of a row.</li>
<li>CUME_DIST never returns 0 (its results are greater than 0 and no more than 1), while PERCENT_RANK always assigns 0 to the first row.</li>
<li>In most cases where your goal is to rank records by percentile, PERCENT_RANK is the function you want to use.</li>
</ul>
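<p>To make the difference concrete, here&#8217;s a small Python sketch (not Oracle code) that applies the two formulas described above to the same list of values. The function names and the five scores are made up for illustration, and the sketch assumes tied values share the lowest rank, which is how I understand Oracle to treat them.</p>

```python
from bisect import bisect_left, bisect_right

def cume_dist(values):
    """Share of rows whose value is less than or equal to each row's value."""
    ordered = sorted(values)
    total = len(ordered)
    return [bisect_right(ordered, v) / total for v in values]

def percent_rank(values):
    """(rank - 1) / (rows - 1), where tied rows share the lowest rank."""
    ordered = sorted(values)
    total = len(ordered)
    if total == 1:
        return [0.0]  # a lone row gets 0
    # bisect_left counts the rows sorting strictly before v, which is rank - 1.
    return [bisect_left(ordered, v) / (total - 1) for v in values]

scores = [55, 70, 70, 85, 100]  # hypothetical assessment scores
cd = cume_dist(scores)
pr = percent_rank(scores)
print(cd)  # [0.2, 0.6, 0.6, 0.8, 1.0]
print(pr)  # [0.0, 0.25, 0.25, 0.75, 1.0]
```

<p>Same data, two different answers: the lowest score gets 0.2 from the CUME_DIST formula but 0 from the PERCENT_RANK formula.</p>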
<p>Both functions can be used in two forms: aggregate or analytic. Use the aggregate form when you want to find the percentile rank of <em><strong>one particular record</strong></em> (real or hypothetical) according to some criteria you specify. Use the analytic form when you want to find the percentile ranks of <em><strong>a group of records</strong></em> in the database. You can spot the aggregate form because the SELECT statement will contain a WITHIN GROUP clause. The analytic form will use an OVER clause instead, usually with a PARTITION BY inside it.</p>
<p>The aggregate form uses this syntax:</p>
<p><code>PERCENT_RANK (expression) WITHIN GROUP<br />
(ORDER BY order_by_clause [ASC|DESC] [NULLS FIRST|LAST] );</code></p>
<p>Here&#8217;s a simple example:</p>
<p><code>SELECT PERCENT_RANK (100000000, 1000000) WITHIN GROUP (ORDER BY total_gross, star_salary) "Percentile Rank" FROM movies WHERE movie_year IN ('2006');</code></p>
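<p>The above SQL statement will return the percentile rank that a movie grossing $100 million and paying its starring actor $1 million would have among the 2006 movies in a table called Movies. The movie doesn&#8217;t have to exist in the table; Oracle ranks it as if it had been inserted.</p>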
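<p>To see what the aggregate form is doing, here&#8217;s a rough Python sketch (not Oracle code). It assumes, as I read the Oracle documentation, that the result is the number of rows sorting ahead of the hypothetical row divided by the number of rows already in the group. The function name and the four movie rows are made up.</p>

```python
from bisect import bisect_left

def hypothetical_percent_rank(rows, candidate):
    """Percent rank the candidate would get if it were inserted into rows."""
    ordered = sorted(rows)  # tuples sort by total_gross, then star_salary
    # Rows sorting strictly before the candidate, over the existing row count.
    return bisect_left(ordered, candidate) / len(ordered)

# Hypothetical (total_gross, star_salary) pairs standing in for 2006 movies.
movies_2006 = [
    (40_000_000, 500_000),
    (75_000_000, 2_000_000),
    (100_000_000, 750_000),
    (250_000_000, 20_000_000),
]
rank = hypothetical_percent_rank(movies_2006, (100_000_000, 1_000_000))
print(rank)  # 0.75
```

<p>Three of the four rows sort ahead of the candidate (the $100 million movie with the smaller salary loses the tie-break on the second column), so the candidate lands at 0.75.</p>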
<p>The analytic form uses this syntax:</p>
<p><code>PERCENT_RANK () OVER<br />
([PARTITION BY query_partition_clause] ORDER BY order_by_clause);</code></p>
<p>Here&#8217;s an example of that:</p>
<p><code>SELECT movie_name, movie_year, movie_type, total_gross,<br />
PERCENT_RANK () OVER (PARTITION BY movie_type<br />
ORDER BY total_gross DESC) "Percentile Rank"<br />
FROM movies<br />
WHERE movie_year IN ('2006');</code></p>
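<p>The above SQL statement will return a table listing the percentile rankings of all movies released in 2006 according to their total box office gross, with each movie ranked against the other movies of the same movie_type. Because the ORDER BY is descending, the top-grossing movie in each type gets 0; drop the DESC if you want a higher gross to mean a higher percentile.</p>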
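<p>Here&#8217;s a rough Python sketch (not Oracle code) of what the PARTITION BY and the descending ORDER BY do together: each movie is ranked only against movies of the same type, highest gross first. The function name and the five rows are made up for illustration.</p>

```python
from bisect import bisect_left
from collections import defaultdict

def percent_rank_by_type(movies):
    """Percent rank of each gross within its movie_type, highest gross first."""
    by_type = defaultdict(list)
    for name, movie_type, gross in movies:
        by_type[movie_type].append(-gross)  # negate so an ascending sort acts like DESC
    for grosses in by_type.values():
        grosses.sort()
    ranks = {}
    for name, movie_type, gross in movies:
        ordered = by_type[movie_type]
        rows = len(ordered)
        # (rank - 1) / (rows - 1); a partition with a single row gets 0.
        ranks[name] = 0.0 if rows == 1 else bisect_left(ordered, -gross) / (rows - 1)
    return ranks

movies = [  # hypothetical rows: (movie_name, movie_type, total_gross)
    ("A", "Action", 300),
    ("B", "Action", 200),
    ("C", "Action", 100),
    ("D", "Comedy", 50),
    ("E", "Comedy", 80),
]
ranks = percent_rank_by_type(movies)
print(ranks)  # {'A': 0.0, 'B': 0.5, 'C': 1.0, 'D': 1.0, 'E': 0.0}
```

<p>Movie E grossed less than every action movie but still gets 0, because it is the top earner within its own partition.</p>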
<p>As a closing note, remember that when using the aggregate form the number and datatypes of the expressions inside the first set of parentheses must match the number and datatypes of the expressions inside the second set of parentheses.</p>
<p>Now go rank some data.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamelcato.com/how-to-do-percentile-ranking-in-oracle/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
