Posts
Five minutes goes by fast!
Just posted the recording of my lightning talk at the 2017 meeting of the Medical Library Association, where I talked about Visualizing PubMed. You get five minutes and three slides (though one could stretch that by using animations). Still can’t believe how fast it went by…
If you want to hear somebody talking as fast as they reasonably can over a slide deck, feel free to check it out!
– ❉ –
We've Got Updates...
PubVenn has been (finally) updated to the newest version of Ben Frederickson’s excellent venn.js.
Note to anyone using this version on a Bootstrap site: you’ll run into a CSS class conflict for the “label” class on the diagram’s text elements. You can fiddle with LESS or namespacing, or just include a local rule scoped to “text.label”.
– ❉ –
The Littlest Webservice
As (hopefully) demonstrated by Visualizing PubMed, there is almost no end to the cool things you can do using the E-Utilities API to PubMed (and other NCBI databases). However, there are some limitations that one needs to keep in mind when developing interactive tools. Chief among these is the admonition that one should make no more than three requests per second to the API.
This becomes an issue in use cases such as PubMed by Year, where we would seem to need lots of individual searches. For example, if we need counts for “myexamplesearch” from 1945 to the present in order to compare them to baseline, the most direct solution would be to send multiple requests:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=myexamplesearch+AND+1945[DP]
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=myexamplesearch+AND+1946[DP]
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=myexamplesearch+AND+1947[DP]
…etc
That works, but it also comes to 72 requests per graph. If we’re adhering to the rules and waiting 1/3 of a second between each request, it’s going to take us 24 seconds to get our counts. What can we do instead?
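To make the arithmetic concrete, here’s a minimal sketch of the brute-force approach. The helper name and year range are mine, not part of PubMed by Year’s actual code; the actual fetch-and-sleep loop is left commented out so the sketch doesn’t hammer the API.

```python
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def year_urls(term, start, end):
    # Build one esearch URL per publication year (hypothetical helper)
    return ["%s?db=pubmed&term=%s+AND+%d[DP]" % (EUTILS, term, year)
            for year in range(start, end + 1)]

urls = year_urls("myexamplesearch", 1945, 2016)

# Fetching politely, one request every third of a second, would look like:
# for url in urls:
#     urllib.request.urlopen(url)
#     time.sleep(1.0 / 3)
# ...which is where the ~24 seconds per graph comes from.
```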
Close observers will note that there is a “Download CSV” link underneath the “Results by Year” widget, which shows up for any PubMed search with 500 or more hits. Sadly, there is no way to get this link directly through the API, but we can scrape it off the PubMed results page and then urlfetch the CSV. Due to the way PubMed pages are served, there’s no static URL to grab off the bat, but we can scrape the value of “blobid” from the page. Once we have that, we can combine it with our original search like so:
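The scraping step itself can be as simple as a regular expression over the results page. Note that the markup in this sketch is a stand-in of my own devising; the real PubMed page source may carry the blobid in a different attribute or spot.

```python
import re

def extract_blobid(html):
    # Pull the value attribute that follows a "blobid" mention.
    # The pattern is illustrative; adjust it to the real page markup.
    match = re.search(r'blobid[^>]*value="([^"]+)"', html)
    return match.group(1) if match else None

# Hypothetical fragment standing in for a real results page:
sample = '<input type="hidden" name="p$blobid" value="0f1a2b3c">'
print(extract_blobid(sample))  # -> 0f1a2b3c
```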
from google.appengine.api import urlfetch

csvstem = "https://www.ncbi.nlm.nih.gov/pubmed?p$l=Email&Mode=download&dlid=timeline&filename=timeline.csv"
try:
    csvurl = csvstem + "&term=" + searchstr + "&bbid=" + blobId + "&p$debugoutput=off"
    result = urlfetch.fetch(csvurl, validate_certificate=True)
except urlfetch.Error:
    result = None
With the resulting CSV, we now have counts for all possible years (currently back to 1809, for the smattering of older PubMed Central material) in two requests instead of dozens. Even better, we can use a little Python to build a small webservice that serves those counts up as easily consumable JSON:
https://med-by-year.appspot.com/search?q=death
(see the full code here)
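The CSV-to-JSON step at the heart of that webservice can be sketched in a few lines. The two-column year,count layout and the function name are assumptions for illustration; see the full code linked above for what the service actually does.

```python
import csv
import io
import json

def counts_to_json(csv_text):
    # Map year -> count, skipping any header or malformed rows
    # (assumed layout: one "year,count" pair per line).
    counts = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 2 and row[0].isdigit():
            counts[row[0]] = int(row[1])
    return json.dumps(counts)

sample = "year,count\n2015,120\n2016,98\n"
print(counts_to_json(sample))  # -> {"2015": 120, "2016": 98}
```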
– ❉ –
Slides from Code4Lib Southeast 2017
I just put the slides from my presentation at Code4LibSE 2017 up on Athenaeum (the UGA IR): http://hdl.handle.net/10724/36962.
Of course, there were a lot of other cool presentations as well; you can see them linked from the conference schedule.
– ❉ –
Going to Atlanta
Assuming that the rest of Atlanta’s transportation infrastructure doesn’t catch fire between here and then, I’ll be headed west to Code4LibSE 2017 to talk about Visualizing PubMed.
– ❉ –