Most datasets from the Microdata Library, unit-level data obtained from sample surveys, censuses, and administrative systems; and
A subset of datasets from the Energy data portal.

But how does DDH ensure that it has the latest data and metadata from these different sources? This is where harvesting comes into play.

What is data harvesting?

Data harvesting is a process that copies datasets and their metadata between two or more data catalogs, a critical step in making data useful. It’s similar to the techniques that search engines use to look for, catalog, and index content from different websites to make it searchable in a single location.

Figure 1 depicts harvesting, conceptually. Application programming interfaces (APIs) act as lines of communication between different databases.

Figure 1 shows how the process of harvesting synchronizes large numbers of datasets between DDH and other data catalogs.

The most important aspect of harvesting, however, is that it happens automatically, in the background, typically on a daily basis, so the data is kept in sync.

Figure 2 illustrates a dataset harvested from WB Finances, as it appears both in DDH and in its original location. A user can follow the links in DDH to find the original dataset record on its original website.

Figure 2 shows the harvested DDH version on the left, and the source version on the right.

Benefits of data harvesting

Although harvesting involves many technical elements, such as metadata, APIs, and program scripts, you don’t have to be a technical person to benefit from it. Broad benefits for all users include but are not limited to:

Users looking for data can search on the DDH and find information on several different catalogs, without having to search each catalog separately, or even be aware that those catalogs exist.
The DDH harvests the metadata and exposes it to a larger audience, so the data that a researcher or data scientist adds to other data catalogs will be exposed to many more users than it would be otherwise.
One of the most practical and valuable aspects of the way the DDH Development Group works is that open-source software is used to tackle day-to-day tasks and streamline workflows, while also following best practices in software development so our open-source tools can be used by the public. You can review the code for one of the DDH harvesters here.
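To make the idea of harvesting more concrete, here is a minimal sketch in Python of one synchronization pass between a source catalog and a central catalog. It is not the DDH harvester: the names Catalog and harvest, the record fields, and the key format are all hypothetical, and in-memory dictionaries stand in for the API calls a real harvester would make.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a data catalog."""
    name: str
    records: dict = field(default_factory=dict)  # record id -> metadata dict


def harvest(source: Catalog, target: Catalog) -> dict:
    """Copy new or changed metadata from source into target.

    Returns counts of what happened, so a scheduled job could log them.
    """
    summary = {"added": 0, "updated": 0, "unchanged": 0}
    for record_id, metadata in source.records.items():
        # Prefix the key with the source name so ids from different
        # catalogs cannot collide in the target (an assumed convention).
        key = f"{source.name}:{record_id}"
        # Copy the metadata and keep a pointer back to the original record.
        harvested = dict(metadata, source_catalog=source.name, source_id=record_id)
        if key not in target.records:
            summary["added"] += 1
            target.records[key] = harvested
        elif target.records[key] != harvested:
            summary["updated"] += 1
            target.records[key] = harvested
        else:
            summary["unchanged"] += 1
    return summary


if __name__ == "__main__":
    finances = Catalog("finances", {"ds-1": {"title": "Loans"}})
    central = Catalog("central")
    print(harvest(finances, central))  # first pass copies the record
    print(harvest(finances, central))  # second pass finds nothing to change
```

Running a pass like this on a schedule is what keeps the two catalogs in sync. The stored source_catalog and source_id fields are what would let a user follow a link back to the original record.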