What is the average time between papers published in the astronomy community?

I’ve been interested for a while in mining ADS (NASA’s Astrophysics Data System, an online repository of bibliographic records). Using the ADS developer API, it is quite simple to download bibliographical records as JSON data and do some analysis on a sample of astronomical publications.

The full, detailed analysis (with some caveats) is available here. The main plot derived from the reduced dataset tracks the number of months between successive first-author papers (i.e. between the first and the second, the second and the third, etc.) to test the hypothesis that the rate of publishing papers increases as the author becomes more experienced and entrenched.

 

On average, the lag between first-author papers decreases steadily from approximately a year and a half (18 months) between the first and the second paper, flattening to approximately 7-8 months by the tenth paper published.
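As a rough illustration of how such lags could be computed (a minimal sketch, not the code used for the analysis): assume each author's first-author papers are available as a list of publication dates. The function names `months_between` and `mean_lags`, and the input layout, are hypothetical.

```python
from datetime import date
from statistics import mean


def months_between(earlier, later):
    """Whole calendar months separating two dates (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def mean_lags(papers_by_author, max_papers=10):
    """Average lag in months between paper k and paper k+1, across authors.

    papers_by_author maps an author name to the publication dates of
    their first-author papers (hypothetical layout).
    """
    lags = {}
    for dates in papers_by_author.values():
        ordered = sorted(dates)[:max_papers]
        # Pair each paper with the next one: (1st, 2nd), (2nd, 3rd), ...
        for k, (a, b) in enumerate(zip(ordered, ordered[1:]), start=1):
            lags.setdefault(k, []).append(months_between(a, b))
    return {k: mean(values) for k, values in sorted(lags.items())}


sample = {
    "Author A": [date(2000, 1, 1), date(2001, 7, 1), date(2002, 1, 1)],
    "Author B": [date(2005, 3, 1), date(2006, 3, 1)],
}
result = mean_lags(sample)
```

Here `result[1]` is the mean lag between the first and second paper, `result[2]` between the second and third, and so on. Authors with fewer papers simply stop contributing to the later lags.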

AstroTRENDS: Not so weaselly after all


Substitute damn every time you’re inclined to write very; your editor will delete it and the writing will be just as it should be. — Mark Twain

Damn right!


In my last post, I showed a plot of the number of abstracts that contained weasel words, as tracked by AstroTRENDS.

I interpreted this trend as a steady change in the style and “audacity” of astronomy papers, and I believed that a possible cause was hedging. (See Writing without conviction? Hedging in science research articles.) Note that I was not making a statement about the quality of the research, but merely about an interesting trend I had not seen mentioned elsewhere.

Perhaps I should have used more weasel words in my post! I acted upon Ben Weiner’s suggestion: use a set of non-weasel words as a control to verify whether the trend was due to an increase in verbosity instead. I tracked this set of keywords (a mix of adverbs and adjectives that I deemed to be neutral):

Fast OR Slow OR Large OR Small OR Before OR After OR High OR Low OR Many OR Few OR More OR Less OR Inside OR Outside OR Recently OR Just

This is available as “Non-weasel keywords” in the AstroTRENDS drop downs.

And here’s the plot!

The two appear to be tracking each other pretty well. It seems to me to be a strong indication of the correctness of Ben’s guess that verbosity is the main driver here. However, simple keyword search may still not be telling the whole story (e.g. because certain keywords “saturate” as the abstracts get longer, appearing more than once), so a better approach could be to study a small sample of abstracts through the years.

 

Weasels (green) vs. non-weasels (yellow)
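The “saturation” caveat can be made concrete with a small sketch. A keyword search only says whether an abstract matches at least once, while the total number of occurrences can keep growing as abstracts get longer. The function below counts both on a list of abstract strings; the name `keyword_stats` and the sample abstracts are made up for illustration, and this is not how AstroTRENDS itself works.

```python
import re


def keyword_stats(abstracts, keywords):
    """Return (abstracts with at least one keyword, total keyword occurrences)."""
    # Whole-word, case-insensitive match for any of the keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    matching = 0
    occurrences = 0
    for text in abstracts:
        hits = len(pattern.findall(text))
        occurrences += hits
        if hits:
            matching += 1
    return matching, occurrences


sample_abstracts = [
    "We find a very large, very clearly detected signal.",
    "No hedging here.",
    "This might be relevant.",
]
matching, occurrences = keyword_stats(sample_abstracts, ["very", "clearly", "might"])
```

If the second number grows faster than the first over the years, the per-abstract counts are saturating and the keyword search is underestimating the change.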

Turns out that there’s a comprehensive ADS API, described on GitHub here (via Michael Kurtz), so with a bit of rejiggering I will be able to let AstroTRENDS do free-form queries, and do a bit of abstract munging myself.

AstroTRENDS: Weasel words

Credit: Cliff

I added a bunch of new keywords to AstroTRENDS, mostly suggested by friends and people in the community who had read my Facebook post.

A thought I had yesterday is the following: has the astronomical literature become more speculative, and perhaps less committed to audacious claims, in recent times? It is difficult to test this hypothesis by merely querying ADS for abstract keywords. It would certainly be better served by a natural-language processing analysis of the full text, although this is just my uninformed speculation.

A much simpler way is to search for the so-called “weasel words” (such a funny way of describing them from a non-native speaker POV!). Matthew Might (a CS professor from the University of Utah) has a really interesting article about the different abuses of language that are common among technical writers, and he created some automated tools for detecting them. It’s a great read. (There’s even an emacs minor mode called writegood based on his recommendations, which I will be testing for sure). Although I don’t necessarily agree with a strict adherence to all of his points, there are certainly some great pieces of advice there.

Taking his post as a reference, I added a new “weasel words” pseudo-keyword to AstroTRENDS. The “weasel words” keyword shows the result of an ADS query of refereed abstracts containing the following boolean expression:

Could OR Possibly OR Might OR Maybe OR Perhaps OR Quite OR Fairly OR Various OR Very OR Several OR Exceedingly OR Vastly OR Interestingly OR Surprisingly OR Remarkably OR Clearly OR Significantly OR Substantially OR Relatively OR Completely OR Extremely

We can easily disagree on whether using these words in an abstract constitutes “weaseling”, or has any sort of nefarious purpose (I certainly pepper my writing with more than my fair share of those). It is still an interesting exercise to verify whether usage of those words has increased over time. The following plot shows the fraction of articles containing those words (i.e. number of articles containing the words normalized by the total article count) each year.

[Chart: fraction of refereed abstracts containing weasel words, per year]
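The normalization described above is just a per-year division. Here is a minimal sketch, assuming the yearly counts are already available as two dictionaries keyed by year. The function name and the numbers in the example are invented for illustration, not real ADS counts.

```python
def yearly_fraction(keyword_counts, total_counts):
    """Fraction of articles matching a keyword, for each year with articles."""
    fractions = {}
    for year, total in sorted(total_counts.items()):
        if total == 0:
            continue  # no articles that year, so the fraction is undefined
        fractions[year] = keyword_counts.get(year, 0) / total
    return fractions


# Invented numbers, only to show the shape of the input and output.
fractions = yearly_fraction(
    {1990: 50, 2000: 150},
    {1990: 200, 2000: 300, 2010: 0},
)
```

Dividing by the total article count removes the overall growth of the literature, so what is left is the change in how often the words show up.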

 

Keeping all the caveats above in mind, there is a definite upward, pretty linear-by-eye trend going on. I’m not sure whether it has to do with simple evolution of language and style, less boastful writing, an accident of fate/bug on my part, or some other factor.

This is of course a super-shallow analysis that would require far more insight than what I offered in this post, but it’s still intriguing. I tried to altavista whether this is well-known, but have come up empty-handed so far. Any ideas?

You can play with the interactive plot itself by clicking this link.

UPDATE: Ben Weiner made a really good point on the Facebook astronomer group. He suggests that an alternative explanation could simply be that abstracts have become, on average, more verbose with time, which would explain the higher frequency of fluffy adjectives and adverbs. This could be checked with a control set of non-weasel words… which I will definitely try.


How did this post do with writegood-mode? Pretty nicely… but I got a grade of “11” on Hemingway, with about 9 out of 24 sentences being hard to read. Oh well.
Weasel image credit: Cliff

AstroTRENDS: A new tool to track astronomy topics in the literature

A screenshot of AstroTRENDS, showing three random keywords: Dark Energy, Spitzer, and White Dwarf. White Dwarfs are the “old reliable” of the group.

Inspired by this post by my good friend Augusto Carballido, I created a new web app called AstroTRENDS. It’s like Google Trends, for astronomy!

AstroTRENDS shows how popular specific astronomical topics are in the literature throughout the years. For instance, you could track the popularity of Dark Energy vs. Dark Matter; or the rise of exoplanetary-themed papers since the discovery of the first exoplanets in 1992. As an example, check out this post I wrote about whether the astronomical community has settled on the “extrasolar planet” or “exoplanet” moniker.

You can normalize keywords with respect to one another, or the total article count, to track relative trends in popularity (say, the growth of “Transits” papers compared to “Radial Velocity” papers). Finally, you can click on a specific point to see all the papers containing the keyword from that year (maybe that spike in a keyword is connected to a discovery, a new theory or the launch of a satellite?).

How does it work? I crawled ADS for a small number of keywords that I thought were interesting (but you can ask me for more!), and counted how many refereed articles were published containing that keyword in the abstract for each year between 1970 and 2013. Keywords containing multiple words are enclosed in quotes, to specify that all words must be in the abstract.
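For the curious, a year-by-year crawl like this could be set up along the following lines. This is only a sketch under assumptions, not the code behind AstroTRENDS: the endpoint URL, the `abs:` field, and the `year:` and `property:refereed` filters follow the query syntax of the current public ADS API as I understand it, and may differ from what was available when the crawl was done. The helper names are hypothetical. The code only builds the queries and URLs; actually sending them would need an API token passed in an authorization header.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the public ADS API.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def quote_keyword(keyword):
    """Wrap multi-word keywords in double quotes; leave single words alone."""
    return f'"{keyword}"' if " " in keyword else keyword


def yearly_queries(keyword, first_year=1970, last_year=2013):
    """One query string per year, restricted to refereed articles."""
    term = quote_keyword(keyword)
    # Field and filter names are assumptions about the ADS query syntax.
    return {
        year: f"abs:{term} year:{year} property:refereed"
        for year in range(first_year, last_year + 1)
    }


def search_url(query):
    """URL for one query; a single row is requested since only the hit count matters."""
    return ADS_SEARCH_URL + "?" + urlencode({"q": query, "rows": 1})


queries = yearly_queries("Dark Energy", 1998, 2000)
url_1999 = search_url(queries[1999])
```

Each response would then be reduced to a single number, the count of matching articles for that keyword and year, which is all the plots need.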

Play and have fun with it, and if you find an interesting trend, you can share it with others by copying and pasting the address from the “Share” box. (Feel free to send it to me, too!)

Open AstroTRENDS

Extrasolar planet vs. Exoplanet: 300 words about a trifling choice

After pausing for a bit  when, at the prodding of a friend, I couldn’t remember whether I used “modeling” or “modelling” in my writing, I thought about another choice I face often. I often find myself using the terms “exoplanet” and “extrasolar planet” interchangeably to denote any planet outside the Solar System. I definitely use “extrasolar planet” more often during talks, even though it is a mouthful, and “exoplanet” in writing — especially in communicating with colleagues, where the meaning of the word does not need explaining, versus communicating with the general public. 

Let’s first get this out of the way: the two terms are synonyms. That said…

A cursory Google search of “extrasolar planets” (~380,000 results) vs. “exoplanet” (~830,000 results) reveals a definite 2:1 preponderance of the latter term.  A search of Greg’s oklo.org blog reveals a very similar ratio of blog posts using the two terms. Wikipedia prefers exoplanet, while Encyclopedia Britannica goes for extrasolar planet.

What has the scientific community at large settled on? I used a tool (only for private use, for now) that does simple ADS queries across years to track the popularity of keywords across article abstracts. This chart was the result:

Exoplanet vs. Extrasolar Planet: the community has spoken.

The chart shows the number of articles published (on a log scale), per year, containing the keyword in the abstract. [ref]Any “holes” in the curve are due to 0 papers being published that year.[/ref]

The first mention of “extrasolar planet” appears to be in 1971, from this Icarus paper (Photometric Color Indices of Extrasolar Planets), while the word exoplanet appears in ADS in 1992. [ref]Note that these queries are open to non-refereed (e.g. proceedings) sources as well – I am using default ADS settings.[/ref]

Interestingly, it appears that the fortunes of the two keywords rapidly reversed: around 2003, the usage of “extrasolar planet” started flattening out, while “exoplanet” continued its meteoric rise. Around 2007, “exoplanet” caught up and quickly started surpassing “extrasolar planet”. (The most cited paper in 2007 was Dan Fabrycky’s paper Shrinking Binary and Planetary Orbits by Kozai Cycles with Tidal Friction, by the way… which used “extrasolar planet”).

In 2013, “exoplanet” beat “extrasolar planet” more than 4:1! Papers containing both words made up about 0.8% of all indexed astronomy papers (down from a high of ~1% in 2011).

So, if in doubt, go with the majority and use exoplanet. (Or not!)