Since the advent of the so-called 'data papers', large, systematically curated, and fully accessible databases have been published across different fields. Most popularly, WorldClim, the open-access repository of continental climatic information, has been cited in ~20,000 publications since 2005, while its updated version (WorldClim 2) has already been cited by ~5,000 publications since 2017 (according to Google Scholar).
Data papers can be defined as academic publications "describing a particular dataset or a group of datasets, published in the form of a peer-reviewed article", with a focus on cataloguing rather than analyzing the provided information. Arguably, the increase in this type of publication can be traced back to 2006. A cool discussion on this topic can be found in Schöpfel et al. (2019).
Because of their nature (e.g., confidentiality), health-related datasets are usually undisclosed and difficult to obtain. Moreover, because they are often politically charged, they are sometimes actively manipulated or made inaccessible (Makoni 2021, Torres et al. 2020). Thus, particularly for infectious disease modeling, open-access, readily processable databases can help to understand disease dynamics in real time. Specifically for ecological niche models (ENMs), which rely on geographical coordinates as one of their raw components, these databases can work as preliminary information jewels to start different kinds of analyses.
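Since ENMs start from coordinates, a first pass over any of these databases is usually a basic coordinate check. The snippet below is only a minimal sketch of that idea: the `longitude` and `latitude` field names, the `clean_occurrences` helper, and the four-decimal rounding used to spot duplicates are all my own assumptions, not something defined by the databases themselves.

```python
from typing import Dict, Iterable, List


def clean_occurrences(records: Iterable[Dict], decimals: int = 4) -> List[Dict]:
    """Keep records with parseable, in-range, non-duplicated coordinates.

    Assumes hypothetical "longitude"/"latitude" keys in decimal degrees;
    real databases name and format these fields differently.
    """
    seen = set()
    cleaned = []
    for rec in records:
        try:
            lon = float(rec["longitude"])
            lat = float(rec["latitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        # NaN also fails this range check, so it is dropped here as well.
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            continue
        key = (round(lon, decimals), round(lat, decimals))
        if key in seen:
            continue  # duplicate locality at the chosen precision
        seen.add(key)
        cleaned.append({**rec, "longitude": lon, "latitude": lat})
    return cleaned


raw = [
    {"species": "Aedes aegypti", "longitude": "-80.19", "latitude": "25.76"},
    {"species": "Aedes aegypti", "longitude": "-80.19", "latitude": "25.76"},  # duplicate
    {"species": "Aedes aegypti", "longitude": "200", "latitude": "10"},  # out of range
    {"species": "Aedes aegypti", "longitude": "", "latitude": "10"},  # missing value
]
print(clean_occurrences(raw))
```

Rounding before comparing is a deliberate simplification: it treats points that agree to four decimals as the same locality, which may be too coarse or too fine depending on the resolution of the environmental layers you plan to use.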
In this post, I will be sharing several databases that I have found across different studies, with the idea of having a resource for developing tutorials, exercises, or even research based on valid published datasets. Although this post is framed in the context of 'data papers', some of the shared publications combine a data paper with different levels of analysis. Let's get into it:
| Identification | Author | Year | Publication (direct link to the database) | Description |
| --- | --- | --- | --- | --- |
| 1 | Pigott et al. | 2014 | Occurrences of human cases of cutaneous and visceral leishmaniasis | |
| 2 | Kraemer et al. | 2015 | Point/polygon locations of Ae. aegypti and Ae. albopictus detections (1960-2014) | |
| 3 | Messina et al. | 2014 | Point/polygon locations of human cases of dengue (1960-2012) | |
| 4 | Limmathurotsakul | 2016 | Detections of B. pseudomallei in human, animal, water and soil sources (1910-2014) | |
| 5 | Wardeh et al. | 2015 | Geographical distribution of pathogen-host associations (1950-2012) | |
| 6 | Van de Vuurst et al. | 2022 | Occurrences of Desmodus rotundus across the Americas (~1900-2020) | |
| 7 | Stephens et al. | 2017 | Database of parasites of wild ungulates, carnivores and primates | |
| 8 | Ceccarelli et al. | 2018 | Georeferenced points of triatomine vectors from the Americas (1904-2017) | |
| 9 | Brown et al. | 2017 | Geolocations of Trypanosoma cruzi in humans, vectors and alternative hosts (2003-2015) | |
| 10 | Pfeffer et al. | 2018 | R package including a lot of information on malaria and its vectors. Part of the Malaria Atlas Project (MAP) | |
| 11 | Moyes et al. | 2017 | Data associated with insecticide resistance mechanisms in Aedes mosquitoes | |
| 12 | Gibson | | Associations of helminth parasites with their hosts and localities (1988-2003) | |
An interesting example of how to leverage the power of this information can be seen in the publication of Olson et al. (2021), "Global patterns of aegyptism without arbovirus", where the authors calculate the difference between areas with Ae. aegypti and areas with dengue to recognize locations where the vector is present but dengue is absent. This study was only possible because of the practical implementation of the results of Kraemer et al. 2015 and Messina et al. 2019.
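To make the "vector minus disease" idea concrete, here is a toy sketch on two tiny grids. It is not the authors' actual workflow: the `vector_minus_disease` function, the 0-1 suitability values, the use of `None` for no-data cells, and the 0.5 threshold are all assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def vector_minus_disease(
    vector: Grid, disease: Grid, threshold: float = 0.5
) -> Tuple[Grid, List[List[bool]]]:
    """Cell-wise difference between two same-shaped grids.

    Returns the difference grid and a boolean grid flagging cells where
    the vector value exceeds the disease value by at least `threshold`.
    Cells with no data (None) in either grid are never flagged.
    """
    same_rows = len(vector) == len(disease)
    if not same_rows or any(len(v) != len(d) for v, d in zip(vector, disease)):
        raise ValueError("grids must have the same shape")

    diff: Grid = []
    flagged: List[List[bool]] = []
    for v_row, d_row in zip(vector, disease):
        diff_row: List[Optional[float]] = []
        flag_row: List[bool] = []
        for v, d in zip(v_row, d_row):
            if v is None or d is None:
                diff_row.append(None)  # no data in at least one layer
                flag_row.append(False)
            else:
                delta = v - d
                diff_row.append(delta)
                flag_row.append(delta >= threshold)
        diff.append(diff_row)
        flagged.append(flag_row)
    return diff, flagged


# Hypothetical suitability values, for illustration only.
aegypti = [[0.75, 0.25], [1.0, None]]
dengue = [[0.25, 0.25], [1.0, 0.5]]
difference, vector_only = vector_minus_disease(aegypti, dengue)
print(difference)
print(vector_only)
```

With real data, the two grids would come from rasters on the same extent and resolution, and the threshold would need a justification of its own.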
I will keep updating this list, so in the future it will hold far more databases to work with and build models on.