Omni test for statistical significance

In survey research, our datasets nearly always comprise variables with mixed measurement levels – in particular, nominal, ordinal and continuous, or in R-speak, unordered factors, ordered factors and numeric variables. Sometimes it is useful to be able to do blanket tests of one set of variables (possibly of mixed level) against another without having to worry about which test to use.

For this we have developed an omni function which can do binary tests of significance between pairs of variables, either of which can be any of the three aforementioned levels. We have also generalised the function to include other kinds of variables such as lat/lon for GIS applications, and to distinguish between integer and continuous variables, but the version I am posting below sticks to just those three levels. Certainly one can argue about which tests are applicable in which precise case, but at least the principle might be interesting to my dear readeRs.

I will write another post soon about using this function in order to display heatmaps of significance levels.

The function returns the p value, together with attributes for the sample size and test used. It is also convenient to be able to test whether the two variables are literally the same variable. You can do this by providing your variables with an attribute "varnames". So if attr(x,"varnames") is the same as attr(y,"varnames") then the function returns 1 (instead of 0, which would be the result if you hadn’t provided those attributes).
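For instance (the variable name and attribute values here are just illustrative):

```r
# Give each variable a "varnames" attribute so the test function can
# recognise when it is being asked to compare a variable with itself.
x <- mtcars$mpg
attr(x, "varnames") <- "mpg"
y <- x
attr(y, "varnames") <- "mpg"
identical(attr(x, "varnames"), attr(y, "varnames"))  # TRUE, so no test is run
```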


#some helper functions


xc=function(stri,sepp=" ") (strsplit(stri, sepp)[[1]]) #so you can type xc("red blue green") instead of c("red","blue","green")

#now comes the main function
#note: the function header was lost from this post; the name and signature
#below are my reconstruction.
omnitest=function(v1,v2,level1,level2,...){
  p=1 #default: returned when no test is possible or the variables are the same
  attr(p,"N")=sum(!is.na(v1) & !is.na(v2))
  if(length(unique(v1))<2 | length(unique(v2))<2) return(p)

  havevarnames=!is.null(attr(v1,"varnames")) & !is.null(attr(v2,"varnames"))
  notsame=TRUE
  if(havevarnames) notsame=attr(v1,"varnames")!=attr(v2,"varnames")
  if(!havevarnames) warning(paste("If you don't provide varnames I can't be sure the two variables are not identical:",attr(v1,"label"),attr(v2,"label")))

  if((notsame | !havevarnames) &
     min(length(which(table(v1)!=0)),length(which(table(v2)!=0)))>1){
    if(level1=="str") level1="nom"
    if(level2=="str") level2="nom"

    if(level1 %in% xc("nom geo") & level2 %in% xc("nom geo")){
      pp=try(chisq.test(v1,v2,...),silent=TRUE)
      if(!inherits(pp,"try-error")){p=pp$p.value;attr(p,"method")="Chi-squared test"}

    } else if((level1=="ord" & level2 %in% xc("nom geo")) |
              (level1 %in% xc("nom geo") & level2=="ord")){
      pp=if(level1=="ord") kruskal.test(as.numeric(v1),factor(v2)) else
         kruskal.test(as.numeric(v2),factor(v1))
      p=pp$p.value;attr(p,"method")="Kruskal test"

    } else if((level1=="ord" & level2=="ord") | (level1=="ord" & level2=="con") |
              (level1=="con" & level2=="ord")){
      pp=try(cor.test(as.numeric(v1),as.numeric(v2),method="spearman",...),silent=TRUE)
      if(!inherits(pp,"try-error")){
        p=pp$p.value;attr(p,"method")="Spearman rho";attr(p,"estimate")=pp$estimate
      } else cat("not enough finite observations for Spearman")

    } else if((level1=="con" & level2 %in% xc("nom geo")) |
              (level1 %in% xc("nom geo") & level2=="con")){
      pp=if(level1=="con") anova(lm(as.numeric(v1)~factor(v2))) else
         anova(lm(as.numeric(v2)~factor(v1)))
      p=pp$"Pr(>F)"[1];attr(p,"estimate")=pp$"F value"[1];attr(p,"method")="ANOVA F"

    ##TODO think if these are the best tests
    } else if(level1=="con" & level2=="con"){
      pp=cor.test(as.numeric(v1),as.numeric(v2),...)
      p=pp$p.value;attr(p,"method")="Pearson correlation";attr(p,"estimate")=pp$estimate
    }
  } #could put stuff here for single-var analysis
  p
}


## now let's try this out on a mixed dataset. Load mtcars and convert some vars to ordinal and nominal.
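The demonstration code has gone missing from this post, so here is a minimal sketch of the idea, calling the component tests directly so that the snippet stands alone (the choice of which mtcars variables to recode is mine):

```r
# Make a mixed-level copy of mtcars.
d <- mtcars
d$cyl  <- ordered(d$cyl)   # ordinal
d$gear <- ordered(d$gear)  # ordinal
d$am   <- factor(d$am)     # nominal
d$vs   <- factor(d$vs)     # nominal

# The tests an omni function would dispatch to for each pairing:
chisq.test(d$am, d$vs)$p.value                                    # nom vs nom
kruskal.test(as.numeric(d$cyl), d$am)$p.value                     # ord vs nom
cor.test(as.numeric(d$gear), d$mpg, method = "spearman")$p.value  # ord vs con
cor.test(d$mpg, d$disp)$p.value                                   # con vs con
```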




Building a custom database of country time-series data using Quandl

Encouraged by this post I had another look at Quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.


This graphic is just a quick example with only two indicators of what I am aiming to be able to do.

The process on Quandl at the moment is a bit fiddly:

  • there is no search function in the API
  • the country codes used are different from agency to agency

So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.

  • edit the csv spreadsheet of the 2- and 3-letter ISO country codes, plus the actual names. Also, WHO for some reason uses some other codes which I had to paste in by hand. If you find your sources are also using yet other codes, you can add them to the spreadsheet. Put an x in the “enabled” column to mark the countries you want to use.
  • search manually at Quandl for interesting queries and add them to the other csv spreadsheet, replacing the country code with %s, again putting an x in the “enabled” column for the queries you want, adding a human-readable title in the “title” column if you want, and putting “alpha2” or “alpha3” etc. in the country_sign column to mark which kind of country code is being used.
  • run the script below.
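The script itself survives below only in fragments, so here is a sketch of its core trick: substituting each enabled country code into each enabled query with sprintf. The column names and example contents are invented for illustration; the real script reads the two csv files, keeps the rows marked enabled, and calls Quandl on each resulting code:

```r
# Toy versions of the two spreadsheets (in the real workflow these come from
# read.csv on the downloadable files, filtered to rows with an x in "enabled").
queries <- data.frame(query        = "WORLDBANK/%s_SP_DYN_LE00_IN",  # made-up example
                      country_sign = "alpha3", stringsAsFactors = FALSE)
codesE  <- data.frame(name   = c("Bosnia and Herzegovina", "Croatia"),
                      alpha2 = c("BA", "HR"),
                      alpha3 = c("BIH", "HRV"), stringsAsFactors = FALSE)

for(qq in 1:nrow(queries)){
  for(cc in 1:nrow(codesE)){
    # Substitute the right kind of country code into the query template:
    code <- sprintf(queries$query[qq], codesE[cc, queries$country_sign[qq]])
    print(code)  # the real script would call Quandl(code) here and collect the results
  }
}
```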











for(qq in 1:nrow(queries)){

for(cc in 1:nrow(codesE)){
#(the body of this download loop did not survive the move to wordpress)
}
}

rr$Value=ifelse(!is.na(rr$Value),rr$Value,rr$Percent) #you might have to do something like this if your queries are returning data in columns with some other label than Value

#then try a graphic for demonstration purposes

ggplot(data=rr,aes(x=Year,y=Value,group=Country,colour=Country))+geom_point(size=3)+geom_line()+facet_grid(Indicator~.,scales="free")+ theme(strip.text.y = element_text(size = 13, hjust=0,angle = 0))+theme(axis.text.x=element_text(angle=90))

And voila.

I wanted to put the spreadsheets as a google spreadsheet but it seems RGoogleDocs is not working for R 3.0.

Changing figure options mid-chunk (in a loop) using the pander package.

I wrote already about changing figure options mid-chunk in reproducible research. This can be important  e.g. if you are looping through a dataset to produce a graphic for each variable but the figure width or height need to depend on properties of the variables, e.g. if you are producing histograms and want the figures to be a bit wider when there are more bins.

That previous post was about knitr, but at the moment I am using the pander package more than knitr because it makes some things simpler. Changing figure options is a case in point.

Here is the output:

Varying widths for graphs in a loop using the pander package

Results for: mpg

Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption

Results for: cyl

Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption


And here is the code:

Varying widths for graphs in a loop using the pander package
<% for (varn in names(mtcars[,1:2]))   { %><%=
pandoc.header.return(paste("Results for: ",varn),3)
fac=(100*log(length(unique(mtcars[[varn]]))))#calculate some factor to ensure somewhat wider graphs for more bins

%><% evals.option("width",50+fac) %><%= #have to break out of the BRCODES to change the width option for the next chunk
hist(mtcars[[varn]],main=varn) #the plot call was lost from this post; a histogram like this fits the text
%>
</br>Anything you type here will be inside the same paragraph as the figure and so works like a pseudocaption<%
#   coord_flip()
%><% } %>

Oh, and to make it all happen:
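The call that belonged here got lost somewhere between R and wordpress; with pander it would be something along these lines (file names are placeholders):

```r
library(pander)
# Brew the template above into markdown (and, via pandoc, into html):
Pandoc.brew(file = "graph-widths.brew", output = "graph-widths.md", convert = "html")
```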




Haiti: Request for Qualifications for research teams to conduct an impact evaluation of the Integrated Neighborhood Approach (INA)

Very proud and happy to see that this idea, which we developed while I was in Haiti with the IFRC, is nearing fruition:
3ie will be issuing a Request for Qualifications for research teams to conduct an impact evaluation of the Integrated Neighborhood Approach (INA) which aims to build resilient urban communities which are safer, healthier and living in an improved habitat.

knitr: Changing chunk options like fig.height programmatically, mid-chunk

Knitr is a great tool for doing reproducible research.
You can produce all kinds of output inside a single knitr chunk, e.g. you can write a loop to produce lots of figures or tables. The only catch is if you want your figures to have differing captions, heights, etc (and usually you do). The standard way is to write a separate chunk for each figure and set the options in the chunk header. So you can’t produce several differing figures from inside one chunk.
Or can you?

This works for me, based on hints from Yihui on github.


opts_knit$set(progress = F, verbose = F)
opts_chunk$set(comment=NA, warning=FALSE,message=FALSE,fig.width=6, echo=F)
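The chunk defining the helper itself was also partly eaten; the following is my reconstruction of a `kexpand` along the lines described below, using Rnw chunk syntax (the argument names and exact template are assumptions):

```r
library(knitr)
# Expand a mini chunk template with the requested figure height and caption;
# .q (the plot object) is found in the calling environment, as discussed below.
kexpand <- function(ht, cap) {
  cat(knit(text = knit_expand(text =
    sprintf("<<%s, fig.height=%s, fig.cap='%s'>>=\n.q\n@", cap, ht, cap))))
}
# inside a chunk, something like:
# .q <- qplot(mpg, wt, data = mtcars); kexpand(4, "Miles per gallon vs weight")
```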



Warning: wordpress is eating some of my <<>>. Make sure your chunks are formed with the usual chunk syntax.

So one key thing is to set progress and verbose to F otherwise they destroy the output. Then the little function kexpand expands an inline template which is typed as text as part of the function. Then you can define your plot as .q and your caption as cap, and your heights etc. You could adapt the function to control other options. Strangely, .q doesn’t have to be an argument for the function, you can just set it in the current environment and it gets picked up by the function anyway. Don’t know if this is good practice or why it works but it does.

I just posted this same trick in response to a question on stackoverflow. Let’s see if it gets accepted.

Update: added argument for figure height.

Wild, red goose-herrings: Should we demand a more empirical approach to assessing the impact of humanitarian interventions?

Here is part of an interesting email exchange with colleagues from the world of humanitarian relief: do humanitarians have a responsibility to drop their resistance to theoretical underpinning and empirical assessment of resilience programs?

I would agree in general that we need to have the courage to theorize about and empirically evaluate humanitarian initiatives. But we need to break out these impact questions in the way in which e.g. Realistic Evaluation suggests, i.e. “what works for whom in what contexts”. Because in e.g. medical research, you can hope for an overall result that drug X works for Y, and then you can do additional research for more specific cases. But in the development and humanitarian world it seems like there is no such thing. Good applications of a program or principle like “resilience”, even were it to be theoretically well defined and have a tested set of indicators, will *always* involve detailed knowledge of the specific context and have to be constructed *for* that context and not just adapted to it. Even programs like cash grants for education, which seem to be the best candidates for matching the medical model of “this just works and basta”, give mixed and confusing results in meta-analysis. Even though the donors you mention would, understandably, dearly love to have “it just works” interventions. I think the best we can hope for is conclusions like “tools out of this intervention toolbox *can* produce good results under certain circumstances and when measured with some of the instruments from this instruments toolbox”. Kind of a distributed model of evidence.

What is exciting is that one of the conditions of successful construction of relief and development interventions in specific cases is, I believe, actually the empirical spirit, a genuine interest in learning about impact (and especially in learning about “what we are doing wrong”, “what are we blind to”, “what are the ‘beneficiaries’ thinking but not saying”).

So: Yes, more empiricism is needed, but the empiricist inspiration is most useful as daring curiosity and thirst for learning at the lowest levels of intervention, i.e. individual program components, where it can make the difference between success and failure, than as a chase for wild, red goose-herrings at the World Bank level, aka “programs that just work”.

Rstudio, not vim, for reports

I just wasted a dozen hours scattered over several days testing out vim instead of r-studio for when I am writing longer reports with a lot of text in markdown as well as code. My motivation was this:

  • I wanted my source file and the R console side-by-side vertically to give more room to the source code without squashing the console
  • I wanted proper code folding of my markdown text, getting as close as possible to the holy grail of outliners, which is still, incredibly, Microsoft Word’s outline view.

So I went through all the hassle of installing vim (and gvim for a more graphical approach) on my ubuntu box. I got R integration working OK in gvim, and tried voom for additional outlining support, but it didn’t work properly for me. I tried various vim plug-ins to get syntax highlighting and folding to work. I managed to get ftplugin to work OK on .md files but not on .Rmd files, which is what I need for R. Likewise, I couldn’t get R integration to work for .md files.

More importantly though, I just don’t see that these kinds of clunky folding solutions come anywhere near the outline dream of Word (or freeplane for that matter). You can’t promote, demote sections, move up and down, etc. Not worth changing my whole workflow away from Rstudio.

Then I took a minute to read the documentation and changelog for Rstudio and realised that the newer versions have support for sections in code which work outside code chunks, and are sort-of compatible with markdown. So at least I can expand and collapse all markdown sections, which is pretty good. Plus, I realised that you can indeed move the console to its own vertical pane (perhaps you could always do this and I just missed it).

So here I am back with Rstudio and pretty happy all round.


The five-year plan is dead, long live the five-year plan!

Interesting discussion taking place on the XCEval mailing list. Deborah Rugg, UNEG Chair and OIOS IED Director, posted selective highlights of final resolution 67/226, adopted by the General Assembly on 21.12.12, on results-based management (RBM) and evaluation.

Bob Williams pointed out the irony that this doesn’t seem to take into account a recent, major evaluation of RBM (summary) which didn’t trash RBM as such, but noted some serious forces in the system which almost doom it to fail, e.g. not relaxing other, already strenuous reporting requirements.

Personally my heart sank the most at the glassy-eyed Stalinism of “clear and robust results frameworks that demonstrate complete results chains that establish expected results at the output, outcome and impact levels”, not qualified by any mention of flexibility, revision, adaptation or learning.

Here is the summary of the Resolution:

F. Results-based management

164. Affirms the importance of results-based management as an essential element of accountability that can contribute to improved development outcomes and the achievement of the Millennium Development Goals and the internationally agreed development goals;

167. Recognizes progress in improving transparency, and calls for further efforts to ensure coherence and complementarity in the oversight functions, audit and evaluations across the United Nations development system;

170. Requests the United Nations development system to promote the development of clear and robust results frameworks that demonstrate complete results chains that establish expected results at the output, outcome and impact levels and include measurable indicators with baselines, milestones and targets for monitoring, and in this regard requests the United Nations funds and programmes, and encourages the specialized agencies, to consult Member States during the production of results frameworks of their respective strategic plans, and report annually on implementation from 2014;

G. Evaluation of operational activities for development

173. Emphasizes the importance for organizations of the United Nations development system of having independent, credible and useful evaluation functions, with sufficient resources, and promoting a culture of evaluation that ensures the active use of evaluation findings and recommendations in policy development and improving the functioning of the organizations;

174. Calls upon members of the United Nations development system to further increase institutional and organizational capacity for the evaluation of operational activities for development and to increase training and skills-upgrading in results-based management, monitoring and evaluation methods, as well as to ensure the effective utilization of findings, recommendations and lessons learned in programming and operational decision-making, and requests the funds and programmes and the specialized agencies to develop evaluation plans that are aligned with new strategic plans and are an integrated part of monitoring systems;

176. Reaffirms the need to strengthen independent and impartial system-wide evaluation of operational activities for development;

177. Notes, in this regard, the findings and recommendations of the independent review commissioned by the Secretary-General in response to General Assembly resolution 64/289 on a comprehensive review of the existing institutional framework for the system-wide evaluation of operational activities for development of the United Nations system, and in this regard reaffirms that further strengthening of system-wide evaluation within the United Nations development system should be based on utilizing and enhancing existing mechanisms;

178. Encourages the enhanced coordination and exchange of experience among the United Nations entities engaged in system-wide evaluation of operational activities for development, namely, the Joint Inspection Unit, the United Nations Evaluation Group, the Office for the Coordination of Humanitarian Affairs, the Office of Internal Oversight Services and the Department of Economic and Social Affairs;

179. Notes that the Joint Inspection Unit is the only entity within the United Nations system with a specific mandate for independent system-wide evaluation, and acknowledges the reforms initiated by the Unit;

180. Also notes the development of the norms and standards for evaluation by the United Nations Evaluation Group as a professional network, and encourages the use of these norms and standards in the evaluation functions of United Nations funds, programmes and specialized agencies, as well as in system-wide evaluations of operational activities for development;

181. Requests the Secretary-General to establish an interim coordination mechanism for system-wide evaluation of operational activities for development of the United Nations system composed of the Joint Inspection Unit, the United Nations Evaluation Group, the Department of Economic and Social Affairs, the Office for the Coordination of Humanitarian Affairs and the Office of Internal Oversight Services, and also requests the Secretary-General, through the interim coordination mechanism, to develop a policy for independent system-wide evaluation of operational activities for development of the United Nations system, including submitting a proposal for pilot system-wide evaluations, for discussion at the operational activities segment of the Economic and Social Council in 2013;

RStudio and TeXworks working great together

Just now writing a reproducible report in R using RStudio on Ubuntu. So the source is a .Rnw file and I am compiling it with knitr. For the narrative part of the report it is a shame that RStudio doesn’t have autocomplete for latex styles, headings etc. But I just realised that it is possible to have the same .Rnw file open in RStudio and TeXworks at the same time. As soon as you save changes in one they are instantly updated in the other. So you can do your statistics in RStudio and just flip to the TeXworks window to do the narrative sections. Cool!
Update: the same thing works with TeXStudio, which is also available for Ubuntu and has a useable outline view. So even better.

Behaviour change through fun

This is a nice video but like any other intervention, it raises the question of how sustainable the changes are.
Are interventions which work because they are fun less likely to be sustainable than other interventions?

Multi-stage sampling together with hierarchical/ mixed effects models: which packages?

Dear R experts,
I sent this question to the r-help list but didn’t get much response, probably because it is more of a stats question. But as this blog is syndicated on r-bloggers I thought I would try it again here on this blog. If I am barking up the wrong tree, feel free to flame.

When I have to analyze educational datasets with samples of children from samples of schools and which include sampling weights, I use the survey package e.g. to calculate means and confidence intervals or to do a linear model. But this kind of design (e.g. children nested inside schools) also as I understand it requires looking at the mixed effects. But this isn’t possible using the survey package. Perhaps I am better advised to use nlme – I guess I could use the sample weights as predictors in nlme regressions but I don’t think that is correct.

It seems that this kind of design (in fact any stratified survey sample which includes nested levels) needs analysing from both perspectives – (survey weights and mixed effects) at once – but the packages of choice for each of these perspectives, survey and nlme, each don’t seem to have slots for the other perspective.

If someone could put me on the right track I could be more specific with reproducible examples etc

Best Wishes
Steve Powell

Horizon plots with ggplot (not)

The Timely Portfolio blog via R-bloggers has recently published some interesting entries about the value of horizon plots for visual comparison of a number of time series.

Very nice it looks too. You can read more about them here. The trick to understanding them is to imagine that each row was originally a line chart of six times the height. First colour the area between the origin and the line so that dark reds are very negative and dark blues very positive. Then, for each row, slice the chart into six horizontal slices and lay them on top of one another. That way you save a lot of vertical space so comparisons are easier. Dark red still means very negative, etc. The vertical scale is the same for each chart.

This one was done using the latticeExtra package in R. I couldn’t figure out how to do them in ggplot. It was trivially easy to do a normal line chart and add a coloured background:

m$variable=gsub('\\.',' ',m$variable)
ggplot(m,aes(date,0,fill=value))+geom_tile(aes(height=max(m$value)-min(m$value)))+geom_line(aes(x=date,y=value))+facet_grid(variable~.)+ scale_fill_gradient2(low="red",high="blue")+ylab("value")+ opts(strip.text.y=theme_text(angle=0, hjust=1))

That’s it.

Not as sophisticated, but actually the colours help quite nicely for comparison without the complexity of the horizon approach. It seems less exciting – but then perhaps the horizon plot overstates its case a bit. And it is trivial to understand, which the horizon plot isn’t.

What do you think, is the horizon plot worth the extra effort?
Is there an easy way to do horizon plots in ggplot?


Heatmap tables with ggplot2, sort-of

I wrote before about heatmap tables as a better way of producing frequency or other tables, with a solution which works nicely in latex.

It is possible to do them much more easily in ggplot2, like this

ggfluctuation(Pm,type="heatmap")+geom_text(aes(label=Pm$value),colour="white")+ opts(axis.text.x=theme_text(size = 15),axis.text.y=theme_text(size = 15))

Note that ggfluctuation will also take a table as input, but in this case P isn’t a table, it is a matrix, so we have to melt it using the reshape package.
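The melt step itself looks like this (with the reshape package of the day; `P` here is an example matrix of counts, since the original matrix isn't shown):

```r
library(reshape)
P  <- table(mtcars$cyl, mtcars$gear)  # any matrix of counts will do
Pm <- melt(as.matrix(P))              # long format with a "value" column for ggfluctuation
```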

Here is the output from the code above:

However, doing the marginal totals would be a bit of a faff like this.

Notice that this is statistically quite a different animal – unlike the previous version, the colours just divide the range of values. They are not indications of any kind of significant deviation from expected values. So they are less useful to the careful reader but on the other hand need no explanation.

Note also that ggfluctuation produces by default a different output

which is better in many ways. But it looks like a graphic, not a table, and the point of heatmap tables is you can slip them in where your reader expects a table and you don’t have to do so much explaining.



Annual Meeting of INURED in Port-au-Prince


Had the pleasure and honour to attend part of the Annual meeting of INURED, an inter-university research network in Haiti. Listened to a very interesting presentation about a national study of violence against children (a difficult enough study in any country) which applied questionnaires with a representative sample but also had a significant ethnographic component.

Using R for classification in small-N studies

Rick Davies just wrote an interesting post which combined thoughts on QCA (and multi-valued QCA, or mvQCA) and classification trees with thoughts on INUS causation.

The question was something like: how can we look at a small-to-medium set of cases (like a dozen or a hundred countries or development programs) and tease out which factors are associated with some outcome. In Rick’s example, he looked at some African countries to see which characteristics are associated with a higher percentage of women in parliament.

Over at another blog, I wrote a little post to show an easy way for evaluators to do classification trees using the open-source statistics software R rather than the Rapid Miner and BigML tools which Rick used. The problem I address at the end is how we can be sure that parts of the resulting models are not spurious.
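For readers who want to try the idea without leaving R: a classification tree takes only a couple of lines with the rpart package (shipped with R as a recommended package). This is not the code from the linked post; iris stands in here for the countries dataset:

```r
library(rpart)
# Grow a classification tree and inspect it.
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)    # the splits the tree found
printcp(fit)  # cross-validated error by tree size, one guard against spurious splits
```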

Impact, Outcome and INUS causes

Interesting discussion on the Outcome Mapping mailing list – here is something I just posted. This part of the discussion was about Outcome Mapping’s focus on the contribution of an intervention to a development result, rather than on deciding whether a result can be (entirely) attributed to an intervention.

I agree that both OM (and by extension, it seems Outcome Harvesting) as well as Impact Evaluation are both involved in making causal claims. And perhaps OM is just a bit more realistic about what kind of causal claims can be realistically made.

The recent DfID/Stern report on “Broadening the range of designs and methods for impact evaluations” (p 41) nicely goes back to Mackie (1974) and INUS causes – an insufficient but necessary part of a condition that is itself unnecessary but sufficient for the occurrence of the event – which is, it seems, often what we really mean when we talk about causes in real life.

So we can say some intervention is a contributory cause for some result if the intervention was a necessary part of a whole context full of other things which were happening; and this whole package of things was enough to make the result happen, but it might have happened other ways too. I guess this is roughly what OM means when it talks about contribution? If so, good. And if Impact Evaluation thinks it is really dealing with necessary & sufficient causation between intervention and result it is living in totally cloud-cuckoo land.

It isn’t really enough to just use the words “impact” or “outcome” with a certain kind of tension in our voices and hope that people understand which kind of cause we are thinking about. In my view the Stern report goes a long way to unpacking some of these things and is a worthwhile read. And the very title of that report seems to suggest that we should feel free to use the word “impact” for what OM claims, because it is (perhaps unfortunately, seeing as how we don’t have agreement on these things at the moment) more important to spell out each time what kind of claim we are making rather than just hope that our use of certain code-words will be understood by all – they aren’t.

Postwar Developments

Download the file: butollo_festschrift_powell

Book chapter just out in Begegnung, Dialog und Integration Festschrift zur Emeritierung von Prof. Dr. Willi Butollo
From the introduction:
In the middle of the 1992-5 war, Willi Butollo visited Bosnia & Herzegovina (B&H) in the effort to support local psychotherapists and counsellors who were working there.  This was the beginning of a relationship between Willi’s Lehrstuhl in Munich and the various Departments of Psychology in former Yugoslavia, a relationship which blossomed for the best part of ten years across several countries. The fruit of that intervention and that relationship are still with us.
In this contribution I would like firstly to give a brief overview of some of these cooperation and research activities. In the second part of this contribution I will then present one interesting but previously unpublished result from the research conducted during that decade. This comes from a strand of investigation which we had begun in order to complement the more classical approach to research on post-traumatic stress disorder (PTSD) which we had adopted in the published papers, which is more focused on problems associated with the post-war situation and which puts those published results in a different perspective.  The concept in question we named “ethno-political distress”. It covers aspects of the post-war situation which we believe to make a contribution to mental health and distress but which have not to date received sufficient attention.
So the title of this contribution, “postwar developments”, refers both to the development of clinical psychology in the countries of former Yugoslavia to which we tried to contribute, and to some problematic aspects of post-war development at the individual level.


Auto-send pdfs from zotero to your kindle and convert for easy reading

Lately I have been using my Kindle Touch for reading work-related pdfs in peace and quiet at home. So I faced three problems:

  1. most scientific pdfs are very hard to read on the small screen
  2. I use zotero and I want these pdfs to be included in my zotero literature list
  3. I want it all to happen wirelessly and automatically.

Here is my solution. It needs a Dropbox account, a gmail account and an IFTTT account. And a kindle account.

  • I add the pdf to zotero and add a unique code (in my case "xk", but anything unusual would do) to the filename of the pdf in zotero so the name might be e.g. "2345xk.pdf"
  • I added this line to my crontab on ubuntu (type crontab -e in a terminal)
    */6 * * * * find /home/myname/.mozilla/firefox/c42asdfeo9r.default/zotero/storage -name '*xk*' -exec ln {} /path/to/my/Dropbox/Public/convert2kindle/ \;
  • What this does is, every 6 minutes, hardlink all the pdfs in the zotero storage folder with xk in the title into a subfolder of the public folder in my dropbox. I guess there is a similar way of doing this in Windows.
  • At IFTTT, I made a task which, when it finds a new pdf file in that Dropbox subfolder, sends an email to Amazon with Convert in the title and the pdf as an attachment. This triggers Amazon to convert the pdf and send it to the Kindle. The Convert option is really pretty good and makes the unreadable readable, if not beautiful.

All I need now is a script to re-import the highlights and notes I make on each pdf on the Kindle back into Zotero. Now that sounds hard.

Simple network diagrams in R

Why study networks?

Development and aid projects these days are more and more often focussing on supporting networks, so tools to analyse networks are always welcome.

In this post I am going to present a very easy-to-use package for the stats program R which makes nice-looking graphs of these kinds of networks.

In a recent project for a client, one of the outcomes is to improve how a bunch of different local and regional organisations work together. The local organisations in particular are associated with one of three ethnicities, and one project goal is to encourage these organisations to work with one another as peers.

One tool we used to look at this is the old friend of the educational psychologist, the sociogram. We made a numbered list of about 80 relevant local and regional organisations. Then we sent this list to each of the local organisations and asked them to list the five with which they communicated the most, the five with which they cooperated the most, and the five which they think make the biggest contribution to solving their collective problems.
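The excerpt ends before the package is named, but to give a flavour of the approach, here is a minimal sketch of how such nomination data can be drawn as a sociogram in R using the igraph package; the organisation names and edges here are invented, and igraph is just one of several packages that can do this:

```r
# Invented nomination data: each row means "organisation X nominated Y"
edges <- data.frame(
  from = c("OrgA", "OrgA", "OrgB", "OrgC", "OrgC"),
  to   = c("OrgB", "OrgC", "OrgA", "OrgA", "OrgB")
)

library(igraph)  # one easy option for plotting sociograms
g <- graph_from_data_frame(edges, directed = TRUE)
plot(g, vertex.color = "lightblue", edge.arrow.size = 0.5)
```

With real data one would build the edge list from the questionnaire responses, perhaps colouring vertices by ethnicity or region.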



Progress Markers, Boundary Partners and Item-Response Theory

There has recently been a discussion thread on the Outcome Mapping mailing list about progress markers and boundary partners. Briefly, a development project plans progress for/with the key people with whom it directly cooperates, called "boundary partners", in terms of their progress towards achieving key outcomes called "progress markers", which are grouped into "expect to see", "like to see" and "love to see" items. Often these items are visualised as steps up a ladder.

One contributor pointed out that one problem with these metaphors is that a boundary partner can reach a higher point before they reach a lower point. Also, some progress markers which are probabilistic or repetitive in nature might be met once, then not met e.g. the following month, then met again, e.g. the month after. So for example, our boundary partners might start off reaching all the "expect to see" goals like coming to meetings, then stop coming to meetings, then start again, while concurrently (perhaps surprisingly) achieving some of the higher progress markers.

These examples seem to break the ladder metaphor for how we visualise the progress markers. And in fact something very similar applies to the levels of a logical framework or results framework.

My feeling is that the ladder metaphor for progress markers is unhelpful not because it is a ladder but because it implies that every boundary partner (or group of boundary partners) is always at one particular point on the ladder at any one time. So it cannot be improved by other spatial metaphors (journeys etc.) with the same implication.

I would like to suggest one kind of model which does fit this situation pretty well, and that is the model of students succeeding or failing at a series of more or less difficult exam questions, for example a reading test. So for example suppose a class of children learning to read are given a list of (progressively more difficult) words to read aloud. Most likely, the weaker students will get some of the easier ones right and not many of the more difficult words, whereas the stronger students will get all of the easier ones right and some of the more difficult ones; but there will always be exceptions and surprises. So the likelihood of whether student X gets question Y right depends on both the student (who we can assess as being on a scale from weak to strong) and the questions (which we judge to be on a scale from easy to difficult).

From this perspective, the progress markers can indeed be put on a scale from expect to see / like to see /  love to see, according to how likely it is that an average boundary partner has achieved or is achieving them at some point in time; and we can say that the progress of the boundary partners towards the outcome challenge is some kind of combination of how well they are doing at achieving each of the progress markers  – just as we would judge a child's reading ability by combining their score on a whole bunch of reading words, perhaps giving a higher score for harder words.

One advantage of this approach is that the progress of boundary partners can still, after all, be expressed in terms of progress up a ladder, e.g. we could say that boundary partner X is around the level of progress marker Y, but we understand this to mean in detail that they are pretty likely to be achieving Y, and might be achieving some progress markers higher up the ladder, and should pretty certainly be achieving the progress markers below; the position on the ladder is a summary of how well they are doing or have been doing at all of these progress markers.

Of course in some situations, partner X might not (yet) be in a position to attempt some high-up marker Y1. Or they might be far beyond marker Y2 which they long ago finished with. But we can still ask ourselves how likely they would be to achieve each one if they were in fact faced with it.

This kind of test situation is the focus of both Rasch models and Item-Response theory, two related approaches in educational science. They both have well developed mathematical ways of dealing with these situations. And they also give us additional food for thought, for example that we don't need to decide in advance which are the expect, like and love to see markers because we could just wait to see how easy our partners find each of them to achieve. So from a bunch of test data (the answers by the students to the questions) we can use Rasch theory afterwards to work out not only where each student goes on a scale from weak to strong but also where each question goes on a scale from easy to hard. Also, IRT opens up the possibility that a bunch of progress markers cannot be put on one single dimension or ladder of difficulty, but that two or more are necessary – so for example some boundary partners might be doing well on progress markers connected with democratisation, but not so well on those which have more to do with communication skills.
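The Rasch idea sketched above can be illustrated in a few lines of R; the abilities and difficulties here are invented numbers purely for illustration:

```r
# Rasch model: the probability that a partner achieves a marker is a
# logistic function of (partner ability - marker difficulty).
rasch <- function(ability, difficulty) 1 / (1 + exp(-(ability - difficulty)))

difficulty <- c(expect = -2, like = 0, love = 2)  # easy ... hard markers
ability    <- c(weak = -1, strong = 1)            # two boundary partners

# Probability that each partner achieves each marker:
round(outer(ability, difficulty, rasch), 2)
#        expect like love
# weak     0.73 0.27 0.05
# strong   0.95 0.73 0.27
```

Note that even the weak partner will probably achieve the "expect to see" marker, while surprises, such as the weak partner achieving a "love to see" marker, remain possible, just unlikely.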

Would be interested to hear if this fits with the way you look at progress markers, or indeed with any other way of conceptualising progress in terms of achieving progressively more difficult targets.

How to mindmap your zotero items


I love zotero as a reference manager. And I love mindmaps. In particular I love docear, which is built on the best-of-breed opensource mindmapper Freeplane. I usually mindmap at the start of the process of developing a project or an article, but sometimes in the later stages too. Docear also builds in jabref, which is another great reference manager. This gives me the following cool functionality:

  • Drag a document reference from the jabref window in docear to add the reference to your map
  • If the reference contains a pdf file which you have annotated, you can automatically import all your annotations (and the document’s bookmarks) as child nodes. This is very useful! You can double-click on the parent node and the document opens; in some cases you can click on the child node and go straight to your comment in the pdf.

But it has taken me about a week to work out how to manage this process in detail. My requirements were these:

  • I use both academic references and pdfs but also (often large numbers of) plain unpublished pdfs and other documents related to projects I am working on. I need to be able to reorganise them from time to time into different subfolders and also import any annotations I have made in them.
  • If I reorganise my pdfs in different folders, the links in docear mustn’t break.
  • I want to have access to the same bunch of references both from zotero and from docear.
  • I am on ubuntu linux.

How I did it:

  1. Installed Foxit reader under Wine as my pdf reader. This is great for annotating, and also supports opening the pdf at the correct page when you double-click on an imported annotation in docear. Unfortunately, docear can’t import highlights made in foxit.
  2. Agreed with myself that my primary way of making notes on a doc is with pdf annotations.
  3. Agreed with myself that in the case of unpublished project pdfs, I won’t keep separate copies in the filesystem at all. I just import them straight into zotero e.g. by dragging and dropping, and delete the originals. I use folders and subfolders in zotero to keep them organised. Zotero doesn’t change the location of the actual file when you do this, so links in docear stay pure.
  4. Installed autozotbib which instantly saves an updated bibtex export of the Zotero database as soon as you change it. (This will be slow with a big database though). Then I set it to export straight to the docear.bib database which is displayed inside docear. This is the trick which keeps Zotero and docear in step. It does mean there is no point in editing the jabref database inside docear because any such changes will be overwritten.
  5. (This was the hardest part) Ignored all the other promising potential building blocks like the “incoming” folder in docear, zotfile, auto linking of pdfs with matching bibtex keys in jabref, etc. etc. These were the red herrings!

Fix for Google search results not showing the actual URL

You might have been irritated by the following scenario: you want the actual URL of some google search result, e.g. in order to send it to someone else or to reference it. However google search results don't show the actual link; they show a long Google redirect URL instead.

If your browser is set to open pdfs separately and not in the browser, it is very difficult to solve this problem with pdf links. One way to avoid this is to install the firefox extension. Then the links will show up correctly.

Linus on having a vision or not

Right now I am doing a short consultancy helping IFRC with a Learning Conference in Haiti. So we are having some interesting discussions on what a learning organisation is: how important is it to have a plan, or is it enough to just have a vision and work out how to get there as you go along?

Well it seems Linus Torvalds has neither a plan nor a vision …

Linus is the inventor of Linux, the operating system (well, kernel actually) at the heart of over 90% of the world's supercomputers, most of the computers that serve up the internet, and Android, the world's number one smartphone OS. Found these two great quotes:

One of the main reasons I think Linux came to be successful in the first place was that I never had very lofty goals. The goalposts for me were always a few weeks out – never some kind of "one day, this will change the world". It was much more pedestrian than that, and I actually think that's the only way to make real progress: one small step at a time, not looking too far ahead to see the details. People like to idolize the "ideas" and "inspiration", but in the end, almost anybody can have an idea. Getting things actually done is where people stumble.

I’ve never been a visionary – the thing I tend to worry about is actual technical issues, and my goal has always been to just make sure the technical side of Linux (and other projects I’ve been involved in) have been as solid as possible.

So, is an OS just a special case? Or can you succeed at anything just by loving the nitty-gritty?

Joy of colour

I felt I had to share this gorgeously simple trick for finding the colour name you need in R.


col.wheel <- function(str, cex=0.75) {
   cols <- colors()[grep(str, colors())]
   pie(rep(1, length(cols)), labels=cols, col=cols, cex=cex)
}
#then just do:
#col.wheel("blue")
#or whatever.

Kudos to

Criteria for assessing the evaluability of Theories of Change

We quite often ask if a ToC is “evaluable”; but what does that mean?

On his blog Rick Davies suggests:
  • Understandable
  • Verifiable
  • Testable
  • Explained
  • Complete
  • Inclusive
  • Justifiable
  • Plausible
  • Owned
  • Embedded

Book chapter just out

Powell, Steve, Joakim Molander, and Ivona Čelebičić. ‘Assessment of Outcome Mapping as a Tool for Evaluating and Monitoring Support to Civil Society Organisations’. In Governance by Evaluation for Sustainable Development: Institutional Capacities and Learning, edited by Michal Sedlacko and André Martinuzzi. Edward Elgar Publishing Limited, 2011.

Did poverty cause the England riots? Guardian graphic fails to correct for population density


Great idea in principle. The yellow dots show where the accused live, and the redder districts are the poorest. But I guess these are also the districts with the highest population density, so the graphic doesn’t mean anything on its own. Pity.

Freiwilligendienste und ihre Wirkung – vom Nutzen des Engagements – Aus Politik und Zeitgeschichte (APuZ 48/2011)

Freiwilligendienste und ihre Wirkung – vom Nutzen des Engagements


“Voluntary Service and its impact – the point of getting involved.” Well done Joern!

Using lyx to automate production of project proposals, expressions of interest, etc

At proMENTE social research we use lyx and zotero to produce our project proposals, expressions of interest, etc.
I just did some more work on this and thought I would document some useful steps and tricks.

Overall approach. Basically we have a single lyx file with all the material we need – information about the organisation, methods we use etc., combined with various bibliographies, project lists, and CVs included using pdfpages. Then we just hide what we don’t need for a particular proposal by using branches in lyx.
Using unicode bibtex files. We export our bibliographies from Zotero, which produces unicode by default. We need this for all our special characters, especially čćžšđ etc. Lyx is not supposed to work with unicode characters in its bibtex files, but if you just set the encoding to utf8 in the document settings, it seems to always work, except for a few special characters like long hyphens, which we can delete in Zotero. It also doesn’t work with Russian characters.
Using bibliography for project lists. We decided to use bibliographies to list our projects too. So we use zotero to store information on projects just as if they were publications.
Sectioned bibliographies. We need to break up our publication list into reports, book chapters etc. So we just select “sectioned bibliography” in the document settings in lyx and then link to a series of bibtex files further down the document. These are all set to show “all references”. This means in the main text at the start of the document we can cite our own publications and projects.
Including CV pdfs. This is a nice benefit of lyx – we can include our CVs very simply using insert/file/external material/pdfpages. The only trick is to make sure the links are relative as we use dropbox to sync between different computers.
Fancy page headers. We use \usepackage{fancyhdr} and \usepackage{graphicx} in the document preamble to customise the appearance of headers and include a little logo on every page.

Fancy section headings. We use \usepackage{xcolor} and \usepackage{titlesec} in the document preamble to customise the appearance of section headers.

Removing title page. We decided not to use the title page at all, so \renewcommand{\maketitle}{} gets rid of it.
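Collected together, the preamble snippets above look something like the following; the logo filename, header position, colour and sizes are placeholders, not our actual settings:

```latex
% Fancy page headers with a small logo on every page
\usepackage{graphicx}
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead[R]{\includegraphics[height=1cm]{logo.png}}

% Coloured section headings
\usepackage{xcolor}
\usepackage{titlesec}
\titleformat*{\section}{\Large\bfseries\color{blue!40!black}}

% Suppress the title page
\renewcommand{\maketitle}{}
```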


I have attached an example output pdf, without the CVs.


Illogical frameworks, composite results, and logframe bloat.

Logframe bloat causes headaches, tiredness and indigestion in M&E staff. This post is about one cause of it.

Sometimes I have to help organisations with their existing results frameworks or logframes and I am struck by how often I see, below a composite result, what one could call “redundant results”: subsidiary results which are supposed to contribute to the higher, composite result but which are in fact just parts of its definition.

For example, suppose: “Result 1: increased awareness of HIV amongst schoolchildren and their parents”, which we could call a composite result, has below it two subsidiary results: “Result 1.1: increased awareness of HIV amongst schoolchildren” and “Result 1.2: increased awareness of HIV amongst parents”. 

Does this kind of combination of results strike anyone else as weird? 1.1 and 1.2 are redundant, aren't they? Can't we just strike the two lower-level results? Leaving them in is a major cause of logframe bloat.

Manuals like the USAID TIPS series correctly teach that results have to be logically independent of their parent result; they can be measured independently from it, and collectively they can causally contribute to it. Whereas in the example, 1.1 and 1.2 together are the very same thing as 1; taken together, they logically imply it and are implied by it. If A and B are the same thing as C, they cannot cause it. It can't be part of our theory of change to achieve C by doing C.

There are many occasions where we just have to have composite results; when we are addressing changes in related but different groups of stakeholders, or geographical areas, minority and majority groups, etc. Or where we are aiming for a bunch or series of related but different products or achievements or regulations adopted or whatever. That's OK. But then isn't it always a mistake to add a redundant level of superfluous results below such a composite result?

Sure, one can (correctly) respond “oh this just goes to show the fatal flaws in this kind of sequential, deterministic program design” but that doesn't help the tens of thousands of organisations who are bound by contract to exactly these kinds of designs, and are going to be bound by similar contracts for the foreseeable future too.

This leads to a bunch of other issues.

First, unfortunately, there are many different ways in which subsidiary results can fail to be logically independent of the parent result. Here are some possibilities.

  1. Exact overlap: subsidiary results overlap exactly with the parent. In this case we can just delete the redundant lower layer as discussed above.

  2. The parent can include content which is not covered in the subsidiary results. So if we change Result 1 to “increased awareness of HIV amongst educational stakeholders”, 1.1 and 1.2 are part of the definition of 1, but we have left out teachers, principals, the education authorities etc. Now, do we assume that the subsidiary results (or any other results elsewhere in the framework) can make a causal contribution to this additional coverage in the parent?

    1. If no, we just have superfluous content in the parent which we are not programming to change and which should just be deleted (and then the subsidiary level is entirely redundant and we can delete that too).

    2. If yes, for example because we believe the children and their parents will influence the other stakeholders, we have got a bit of a mess and we should redesign this part of the framework.

  3. The subsidiary results can include content which is not covered in the parent. So if we change Result 1.1 to “increased awareness of STDs amongst school children”, 1.1 is still part of the definition of 1, but it also includes material which is not covered in 1. Now, do we assume that this additional content in the subsidiary results can make a causal contribution to the parent (or to any other results in the framework)? Again, there are two cases:

    1. If no, we just have superfluous content in the subsidiary results which does not lead to higher-level change and which should probably just be deleted (and then the subsidiary level is entirely redundant and we can delete that too).

    2. If yes, for example because we really believe the children's knowledge of other STDs will influence or reinforce their or other stakeholders' awareness of HIV, again we have got a bit of a mess and we should redesign this part of the framework.

  4. Of course various combinations of case B and case C are possible too.

Second, what are we going to do about the indicators? Composite results need composite indicators. But donors want commitments to simple targets for simple indicators. So for a composite result we have to both define a set of subsidiary indicators for each of the dimensions of the result and specify how to combine them into a single indicator.

I will look at composite indicators in a subsequent post.

Copyright © socialdatablog
Project evaluation. Reproducible research.
