Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas

To build this dataset, two researchers independently selected the articles following the instructions described in the section “Information about the databases used as sources” below; a third investigator then reviewed the selection to resolve any discrepancies between the results, followed by a three-step data-cleaning process. We extracted the following information from each article: Original code, Sample type, DTU, TcI DTU / genotype, Coordinates (sexagesimal degrees system), Latitude and longitude (decimal degrees system), Country, Continental division, Upper-division (state / province / department / region), Belong to Amazon basin (yes / no), Lower division (department / municipality / community), Local division (municipality / community / village), Date of isolation, Year of isolation / detection, Species of the host, Common name, Source sample, Order of the host, Tribe (only Triatominae), Genus of the host, Cycle (transmission cycle), Genetic marker (for genotyping), Method of identification (of the parasite) and Genes examined. Articles without complete and clear information on sample collection, host / vector species, and methods were excluded from the database. When an article specified only the place name where the samples were collected, the coordinates were obtained manually from the web page https://www.gps-coordinates.net. The coordinate system used was WGS84. For the DTU distribution, we created and edited the maps with QGIS 3.16 Hannover (https://www.qgis.org/es/site/) and produced the figures with R version 3.6.3 and the library “ggplot2”.
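Because the database stores each location both in sexagesimal degrees and in decimal degrees, the relation between the two fields can be illustrated with a short conversion sketch. The function name `dms_to_decimal` and the accepted input format (for example `4°35'56"S`) are assumptions made for illustration; the articles do not state how the conversion was performed.

```python
import re

# Hypothetical pattern: degrees, optional minutes, optional seconds, hemisphere.
# The exact notation used in the database is an assumption.
_DMS = re.compile(
    r"""^\s*(?P<deg>\d{1,3})\s*°\s*
        (?:(?P<min>\d{1,2})\s*['′]\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*(?:"|″)\s*)?
        (?P<hem>[NSEW])\s*$""",
    re.VERBOSE,
)


def dms_to_decimal(text: str) -> float:
    """Convert a sexagesimal coordinate string to signed decimal degrees."""
    m = _DMS.match(text)
    if m is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees = int(m["deg"])
    minutes = int(m["min"] or 0)
    seconds = float(m["sec"] or 0.0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    hemisphere = m["hem"]
    # Latitudes are bounded by 90 degrees, longitudes by 180 degrees.
    limit = 90 if hemisphere in "NS" else 180
    if value > limit:
        raise ValueError(f"coordinate out of range: {text!r}")
    # South and West are negative in the decimal degrees system.
    return -value if hemisphere in "SW" else value


print(dms_to_decimal("10°30'N"))  # 10.5
```

Keeping both representations in the database lets a reader check one against the other; a mismatch between the stored decimal value and the converted sexagesimal value would flag a transcription error.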

Inclusion and exclusion criteria

We considered articles that provided complete clinical information (identification method, sample type, and species identified) and complete geographical information. Articles in three languages were considered (Spanish, English, and Portuguese). Information was searched for in the abstract and the full text. We excluded articles for which the full (.pdf) version was unavailable, articles with incomplete information (such as missing coordinates, sample source, or vector / host from which the parasite was recovered), and articles whose reported techniques did not allow correct identification of the parasite.

Information about the databases used as sources

To construct the database, we ran a PubMed Advanced Search combining the terms “DTU” and “Trypanosoma cruzi” with the Boolean operator “AND”. The search was done without establishing a time frame. We downloaded the result file and manually curated it to discard articles unrelated to our interests (i.e., pharmacological studies, studies including other trypanosomatids such as Leishmania spp., and studies related to other hemipteran species). After reading the articles and refining the selection with the criteria mentioned above, we built a database by country for cleaning. The retained articles were then collected in a metadata database. Furthermore, three more independent cleaning rounds were carried out to check that the articles complied with the required parameters. Finally, the database fields were standardized so that their content was all in the same format.
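The search described above was run through the PubMed web interface. As a minimal sketch of how the same Boolean query could be expressed programmatically, the example below builds the query string and a request URL for the NCBI E-utilities `esearch` endpoint. The helper names `build_query` and `esearch_url` and the `retmax` value are assumptions for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (programmatic use is an assumption here;
# the source describes a search through the PubMed web interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(terms):
    """Quote each term and join the terms with the Boolean operator AND."""
    return " AND ".join(f'"{term}"' for term in terms)


def esearch_url(terms, retmax=100):
    """Return an esearch URL for the query; no date filter is applied."""
    params = {
        "db": "pubmed",
        "term": build_query(terms),
        "retmax": retmax,  # hypothetical page size
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(build_query(["DTU", "Trypanosoma cruzi"]))
print(esearch_url(["DTU", "Trypanosoma cruzi"]))
```

Leaving out any date parameter mirrors the decision to search without a time frame.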

Database fields information

Original code

Refers to the code of the samples assigned by the authors of each article.

Sample type

Refers to the type of sample from which the parasite was isolated. We considered the following categories: a) Blood, b) Complete Insect, c) Faeces, d) Food, e) Gut, f) Heart, g) Rectal Ampoule, h) Serum, i) Strain, j) Tissues and k) Xenodiagnosis.


DTU

Refers to the Trypanosoma cruzi DTU assigned to each sample. The categories used were: a) TcI, b) TcII, c) TcIII, d) TcIV, e) TcV, f) TcVI, g) Tcbat, h) TcII or TcV, i) TcII or TcVI, j) TcII to TcVI, k) TcIII or TcIV, l) TcIII to TcVI, m) TcIV or TcVI, n) TcIV to TcVI and o) Unknown.
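Several DTU categories are ambiguous (“X or Y”, “X to Y”). One way to work with them is to expand each label into the set of DTUs it may represent. The sketch below assumes that “X to Y” means any DTU from X through Y in the order TcI to TcVI, and that “Unknown” maps to an empty set; both interpretations, and the function name `possible_dtus`, are assumptions rather than definitions taken from the database.

```python
# Canonical order of the six main DTUs; Tcbat is handled separately.
ORDERED = ["TcI", "TcII", "TcIII", "TcIV", "TcV", "TcVI"]


def possible_dtus(label: str) -> frozenset:
    """Expand a DTU category label into the set of DTUs it may denote."""
    label = " ".join(label.split())  # normalize stray whitespace
    if label == "Unknown":
        return frozenset()
    if label == "Tcbat":
        return frozenset({"Tcbat"})
    if " to " in label:
        # Assumed meaning: an inclusive range in the canonical order.
        low, high = label.split(" to ")
        i, j = ORDERED.index(low), ORDERED.index(high)
        if i >= j:
            raise ValueError(f"invalid DTU range: {label!r}")
        return frozenset(ORDERED[i : j + 1])
    parts = label.split(" or ")
    for part in parts:
        if part not in ORDERED:
            raise ValueError(f"unknown DTU: {part!r}")
    return frozenset(parts)


print(sorted(possible_dtus("TcIV or TcVI")))  # ['TcIV', 'TcVI']
```

With this representation, a query such as “all samples that could be TcV” becomes a simple membership test on the expanded set.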

TcI Genotype

Refers to the genotype of TcI samples, categorized as follows: a) Sylv (sylvatic), b) Dom (domestic), c) TcIDom / TcISylv and d) Unknown.

Source sample

Refers to the organism from which the sample was isolated. We considered the following categories: a) Food, b) Humans, c) Reservoir (non-human animals), and d) Vector.


The host information is divided into six fields: a) species of the host (complete scientific name), b) common name, c) order of the host, d) tribe (only for Triatominae), e) genus of the host (genus only) and f) cycle (the transmission cycle of the host: Domestic / Sylvatic / Peridomestic / NA (no data)).

Genetic marker

Refers to the nature of the marker: Nuclear, Mitochondrial, Antigen or NA (no data).

Method of identification

To simplify this field, we grouped the tests / methods / techniques into the following categories: a) Blotting, b) Electrophoretic, c) PCR-based, d) Real-time PCR, e) Sequencing and f) Serologic. Each category includes subcategories described in Table 2.
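The categorical fields described so far behave as controlled vocabularies, so a record can be checked against them mechanically. The following is a minimal sketch of such a check; the dictionary keys, the exact spelling of each stored value (for example “Reservoir” without its parenthetical gloss), and the function name `invalid_fields` are assumptions, not the authors' actual standardization procedure.

```python
# Controlled vocabularies transcribed from the field descriptions.
# The exact strings stored in the database are an assumption.
VOCAB = {
    "Sample type": {
        "Blood", "Complete Insect", "Faeces", "Food", "Gut", "Heart",
        "Rectal Ampoule", "Serum", "Strain", "Tissues", "Xenodiagnosis",
    },
    "Source sample": {"Food", "Humans", "Reservoir", "Vector"},
    "Genetic marker": {"Nuclear", "Mitochondrial", "Antigen", "NA"},
    "Method of identification": {
        "Blotting", "Electrophoretic", "PCR-based", "Real-time PCR",
        "Sequencing", "Serologic",
    },
}


def invalid_fields(record: dict) -> list:
    """Return the names of fields whose value is missing or not allowed."""
    problems = []
    for field, allowed in VOCAB.items():
        if record.get(field) not in allowed:
            problems.append(field)
    return problems


record = {
    "Sample type": "Blood",
    "Source sample": "Humans",
    "Genetic marker": "Nuclear",
    "Method of identification": "PCR-based",
}
print(invalid_fields(record))  # []
```

The comparison is deliberately case-sensitive, so a value such as “blood” is reported as invalid; this surfaces formatting inconsistencies rather than silently accepting them.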

Genes examined

Refers to the genes used in each study for parasite identification and genotyping (Supplementary Figure 2).

Geographical location

The database has nine geographical fields: a) Coordinates (in the sexagesimal degrees system), b) Latitude, c) Longitude, d) Country (where the samples were collected), e) Continental division (South or North America), f) Upper-division (state / province / department / region), g) Belong to the Amazon basin (whether the division is in the Amazon basin), h) Lower division (department / province / municipality / community) and i) Local division (municipality / community / village).


The date information is recorded in two fields: a) Date of isolation (date of sample collection) and b) Year of isolation / detection (year in which the parasite was detected).

