Measuring the Duffy Effect

Yes, yes, it has all been a lot of fun playing with the Duffy soundboard. Had I known it was going to blow up like that, I would have made a nicer webpage. Alas. Since it started, though, one comment by David Akin has stuck with me: was Duffy's video pitch actually an effective fundraiser?

Let's use some open data to test a hypothesis.

What is known:

  • In 2009 the Conservative Party issued a fundraising email pitch using the Mike Duffy videos.
  • The campaign included videos for at least 864 unique first names. More names may have been recorded, and that unknown could affect the validity of the results below.
  • The first names of all donors to all parties from 2007 to 2011.

Hypothesis:

  • If the Duffy video campaign was effective, then starting in 2009 we should see donations to the CPC from people whose first name is one of the 864 rise relative to donations from everyone else, compared with previous years.
  • Donations to the other federal parties should show no such "Duffy effect".
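One way to make the hypothesis concrete is to compare the CPC's Duffy ratio before and after 2009 against the same shift for another party. That framing is a standard difference-in-differences comparison, swapped in here for illustration; it is a sketch under that assumption, not the exact analysis in this post. The function names and the toy ratios are hypothetical.

```python
def duffy_ratio(duffy_amount: float, other_amount: float) -> float:
    """Dollars from Duffy-list names per dollar from everyone else."""
    return duffy_amount / other_amount


def duffy_effect(ratios_by_year: dict[int, float], campaign_year: int = 2009) -> float:
    """Mean ratio from campaign_year onward minus mean ratio before it.

    A positive value means Duffy-list names gave relatively more after the campaign.
    """
    before = [r for y, r in ratios_by_year.items() if y < campaign_year]
    after = [r for y, r in ratios_by_year.items() if y >= campaign_year]
    return sum(after) / len(after) - sum(before) / len(before)


def difference_in_differences(cpc: dict[int, float], control: dict[int, float]) -> float:
    """CPC shift minus a control party's shift; near zero means no "Duffy effect"."""
    return duffy_effect(cpc) - duffy_effect(control)


# Toy ratios, invented for illustration only (not real donation data).
toy_cpc = {2007: 3.5, 2008: 3.5, 2009: 4.0, 2010: 4.0}
toy_control = {2007: 2.0, 2008: 2.0, 2009: 2.0, 2010: 2.0}
print(difference_in_differences(toy_cpc, toy_control))  # 0.5
```

If the campaign worked, the CPC's shift should be clearly positive while the control party's stays near zero.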

Great hypothesis? Maybe. Now, to the data.

Elections Canada makes every single donation above $200 available online, and I've been collecting them since 2007. For each donation Elections Canada provides the date, amount, name, city, province and postal code. From 2007 to 2011 inclusive that adds up to 658,075 donation records (I'm missing the NDP 2011 results; the todo list never gets there).

Here's a sample of the raw contribution data. I've masked people's last names and postal codes just because.

mysql> select party, substring_index(name,' ',1) first, amount, city, prov, '-withheld-' as postal, donationdate from rawcontribution order by rand() limit 10;
+---------+------------+---------+----------------+------+------------+--------------+
| party   | first      | amount  | city           | prov | postal     | donationdate |
+---------+------------+---------+----------------+------+------------+--------------+
| cpc     |            | 35.00   |                |      | -withheld- | 2005-04-18   |
| liberal |            | 1000.00 |                |      | -withheld- | 2004-11-30   |
| ndp     | Anne-Marie | 35.00   | CALGARY        | AB   | -withheld- | 2007-07-15   |
| ndp     | Joshua     | 349.00  | AURELE         | ON   | -withheld- | 2012-04-24   |
| cpc     |            | 40.00   |                |      | -withheld- | 2005-05-17   |
| cpc     |            | 50.00   |                |      | -withheld- | 2005-10-11   |
| cpc     | E          | 100.00  | CALGARY        | AB   | -withheld- | 2008-06-19   |
| liberal |            | 210.00  |                |      | -withheld- | 2006-11-22   |
| cpc     | André      | 20.00   | QUEBEC         | QC   | -withheld- | 2010-09-17   |
| cpc     | Lois       | 100.00  | WEST VANCOUVER | BC   | -withheld- | 2010-05-19   |
+---------+------------+---------+----------------+------+------------+--------------+

I will spare you the "SQL". The short story: I reduced this data to the total donations to each party, by year, by first name, then checked each first name against the "Duffy list". That gives the table below, where each row is one first name's total for one party in one year (the name itself isn't shown). If the name was on the "Duffy list", the last column is 1; otherwise it is 0:

mysql> select * from donationsbyduffyname order by rand() limit 10;
+------+---------+----------+-------+-----------+
| year | party   | amount   | count | duffyname |
+------+---------+----------+-------+-----------+
| 2008 | cpc     |     1915 |    17 |         0 |
| 2007 | ndp     |      240 |    12 |         0 |
| 2011 | cpc     | 20698.34 |   105 |         1 |
| 2010 | ndp     |     1940 |    37 |         1 |
| 2011 | cpc     |     5175 |    64 |         1 |
| 2012 | ndp     |      349 |     1 |         0 |
| 2009 | green   |      885 |    36 |         1 |
| 2011 | liberal |     1100 |     3 |         0 |
| 2009 | ndp     |      270 |     5 |         0 |
| 2008 | cpc     |      250 |     1 |         0 |
+------+---------+----------+-------+-----------+
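For anyone who does want the reduction, here is a minimal sketch of it in Python rather than SQL, assuming each raw record carries a party, a full name, an amount and a YYYY-MM-DD date. The rows and the short name set are invented stand-ins for the real data and the 864-name Duffy list, and unlike the table above this version keeps the first name in each row.

```python
from collections import defaultdict

# Hypothetical raw rows: (party, full name, amount, donation date "YYYY-MM-DD").
# Column layout follows the rawcontribution sample; the values are invented.
RAW = [
    ("cpc", "Sue Example", 100.00, "2009-03-02"),
    ("cpc", "Sue Other", 250.00, "2009-11-20"),
    ("cpc", "Zebulon Example", 50.00, "2009-06-01"),
    ("ndp", "Sue Example", 35.00, "2008-07-15"),
    ("liberal", "", 1000.00, "2004-11-30"),  # blank names do occur in the raw data
]

# Hypothetical stand-in for the 864-name Duffy list.
DUFFY_NAMES = {"Sue", "Klaus", "Sandy"}


def first_name(full_name: str) -> str:
    """Text before the first space, like MySQL's substring_index(name, ' ', 1)."""
    return full_name.split(" ", 1)[0]


def donations_by_duffy_name(rows, duffy_names):
    """Total amount and count per (year, party, first name), flagged 1/0 for the Duffy list."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [amount, count]
    for party, name, amount, date in rows:
        key = (int(date[:4]), party, first_name(name))
        totals[key][0] += amount
        totals[key][1] += 1
    return [
        (year, party, first, amount, count, int(first in duffy_names))
        for (year, party, first), (amount, count) in sorted(totals.items())
    ]


for row in donations_by_duffy_name(RAW, DUFFY_NAMES):
    print(row)
```

The two "Sue" CPC donations in 2009 collapse into one row with a count of 2, which is the same shape as the rows in the table above.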

You will note that the data covers all parties as well as the "Duffy name" status. If my sitting-on-the-couch statistics thinking is right, that gives a reasonable control group for testing the efficacy of the Duffy campaign. If it worked, donations to the CPC from Klaus, or Sandy, or Sue should spike relative to non-Duffy names, but remain stable for the other parties.

The data is available in this Google doc if you can't see the embedded information below.

I'll explain the 2008 Conservative Party data in case I haven't been clear up to this point. In 2008 the CPC:

  • received $2,748,529 from donors whose first name is not on the "Duffy video" list, across 18,534 donations.
  • received $9,936,884 from donors whose first name is on the list, across 64,131 donations.
  • donors on the list gave 3.62 times as much money as donors not on it (the "Duffy Ratio").
  • donors on the list made 3.46 times as many donations (of any amount) as donors not on it.

It is worth noting that in each year, the CPC raised 78% of its money from people with names on the Duffy list. (Hold that thought.)
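The 2008 figures above can be checked with a few lines of arithmetic. This sketch uses only the four numbers quoted in the list, so it covers 2008 alone.

```python
# 2008 Conservative Party figures quoted in the list above.
duffy_amount, duffy_count = 9_936_884, 64_131
other_amount, other_count = 2_748_529, 18_534

amount_ratio = duffy_amount / other_amount  # the "Duffy Ratio"
count_ratio = duffy_count / other_count     # ratio of donation counts
duffy_share = duffy_amount / (duffy_amount + other_amount)  # share of all CPC money

print(f"amount ratio {amount_ratio:.2f}")  # 3.62
print(f"count ratio  {count_ratio:.2f}")   # 3.46
print(f"Duffy share  {duffy_share:.0%}")   # 78%
```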

The next chart plots the "DuffyAmountRatio" by year for each party. 
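The DuffyAmountRatio is Duffy-name dollars divided by non-Duffy-name dollars for each party and year. Here is a minimal sketch of that step, assuming rows shaped like the donationsbyduffyname table (year, party, amount, count, duffyname, with duffyname equal to 0 or 1). The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def duffy_amount_ratio(rows):
    """Map (year, party) -> Duffy-name dollars / non-Duffy-name dollars.

    rows: iterable of (year, party, amount, count, duffyname) tuples.
    Groups missing either side are skipped, since the ratio is undefined there.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # [non-Duffy total, Duffy total]
    for year, party, amount, _count, duffyname in rows:
        sums[(year, party)][duffyname] += amount  # duffyname indexes 0 or 1
    return {
        key: duffy / other
        for key, (other, duffy) in sorted(sums.items())
        if other > 0 and duffy > 0
    }


# Toy rows invented for illustration; not real totals.
toy_rows = [
    (2008, "cpc", 300.0, 3, 1),
    (2008, "cpc", 100.0, 1, 0),
    (2008, "ndp", 50.0, 2, 1),
    (2008, "ndp", 100.0, 4, 0),
    (2009, "green", 40.0, 1, 1),  # no non-Duffy row, so this group is skipped
]
for (year, party), ratio in duffy_amount_ratio(toy_rows).items():
    print(year, party, round(ratio, 2))
```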

And for reference, the total donations to the CPC for the same period:

Duffy Got Game?

Was it worth Senator Duffy's time to record all those videos? At a glance, it looks like no. For the CPC, the ratio of money from donors on the "Duffy list" to money from everyone else stays stable at about 3.6-to-1 across all five years. Whether or not a donor got a Duffygram, it doesn't appear to have affected the likelihood they would donate in 2009 or in the years after.

It's interesting that Duffy recorded 864 videos and that those names just happen to account for 78% of the CPC's donations every year. Given that the video list I compiled is incomplete, I wouldn't be surprised if the full set of videos hits the 80% aggregate donation mark exactly.

The 80/20 rule feels like a natural guess: recording a video for every donor's name would be a lot more work for diminishing returns. Here is the count of unique first names of CPC donors, just for kicks:

+------+------------+
| year | firstNames |
+------+------------+
| 2007 |       2585 |
| 2008 |       3382 |
| 2009 |       2860 |
| 2010 |       2867 |
| 2011 |       3654 |
+------+------------+
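To test the 80/20 guess, one could ask how many first names it takes to reach 80% of a year's donations. Here is a minimal sketch, assuming a hypothetical mapping from first name to total dollars for one party and year. Sorting names from largest to smallest total and taking them until the running sum reaches the target gives the smallest possible set, because the k largest totals always add up to at least as much as any other k names.

```python
def names_to_cover(totals_by_name: dict[str, float], target_share: float = 0.8) -> list[str]:
    """Smallest set of first names whose donations reach target_share of the total.

    Greedy: take names from largest total to smallest until the running sum
    reaches the target.
    """
    grand_total = sum(totals_by_name.values())
    chosen, running = [], 0.0
    for name, amount in sorted(totals_by_name.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target_share * grand_total:
            break
        chosen.append(name)
        running += amount
    return chosen


# Toy totals, invented for illustration only.
toy_totals = {"Sue": 60.0, "Klaus": 25.0, "Sandy": 10.0, "Zebulon": 5.0}
print(names_to_cover(toy_totals))  # ['Sue', 'Klaus']
```

Run against the real per-name CPC totals, the length of the result could be compared with the 864 names on the Duffy list.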

Conclusion

I don't really have one. The data I scraped out is not perfect. My analysis probably has a gaping hole in it.

But on the off chance I haven't screwed this up, it looks like the Duffygrams didn't bring in any more money than the CPC would have brought in anyway.

Data is fun.