Work package 3. Collections digitisation

WP 3 Leader: Assoc. Prof. Borislav Guéorguiev

This WP focuses on digitising a substantial part (c. 50%) of the collection lots and publishing the resulting digital information in a database. We envisage that about 25% of the database entries will be accompanied by digital images of specimens. Over the 4-year duration of the project, we plan to digitise 1,000,000 collection lots: 500,000 lots from the collections of IBER-BAS and 500,000 lots from the collections of NMNHS-BAS. The activities planned in WP3 include: (i) setting up a digitisation plan and technical design to secure a smooth digitisation workflow, as well as management and coordination of the digitisation process; (ii) organising, conducting and monitoring digitisation activities, including the basic work of digitising collections and preparing databases, as well as quality control and verification of digitisation; and (iii) data management, including pre-publication processing of the digital records and images so that they can be published and made freely accessible.
To meet the needs of DiSSCo-BG associated with mass digitisation, significant investments, managed in WP2, will be made in technical areas such as imaging and e-infrastructure services (e.g. resources for storage, computing and networking; tools for data management, security and access).
The web-based Collection Information System (CIS) to be developed will be used for storage of, and access to, the comprehensive information on the IBER-BAS and NMNHS-BAS collections. This web-published and freely accessible database will consist of two components: a Digital Collection Catalogue (DCC), containing the information on the collection entries as presented in the registers, and a Digital Image Library (DIL), containing digital images. The two modules will be able to communicate with each other constantly. CIS will be built using open source software, e.g. PostgreSQL-like software can be used as an integrated database environment and PostGIS-like software for spatial data. Database nomenclature will follow the basic standards of Darwin Core and of GBIF to allow for data interoperability. The CIS will support a number of functionalities, such as:
  • block export of all information (e.g. for backups)
  • export of selected data based on filters (e.g. for specific queries by researchers)
  • producing a map with the geographic location of each individual unit
  • ability to display data on the IBER-BAS and NMNHS-BAS websites
  • block import from other, already existing but different software databases
  • ability to automatically share information with other databases, such as the pan-European DiSSCo RI portal
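The filtered export listed above could look roughly like the following minimal sketch. It assumes that records are available as dictionaries keyed by Darwin Core term names; the column names are genuine Darwin Core terms, but the function name `export_filtered`, the sample lots and their catalogue numbers are hypothetical and only serve as illustration. It is not the CIS implementation.

```python
import csv
import io

# Column names are Darwin Core terms; which terms the CIS will expose is an assumption.
DWC_FIELDS = [
    "institutionCode",
    "catalogNumber",
    "scientificName",
    "country",
    "decimalLatitude",
    "decimalLongitude",
]

# Invented sample lots, for illustration only.
SAMPLE_LOTS = [
    {"institutionCode": "NMNHS-BAS", "catalogNumber": "ENT-0001",
     "scientificName": "Carabus coriaceus", "country": "Bulgaria",
     "decimalLatitude": 42.7, "decimalLongitude": 23.3},
    {"institutionCode": "IBER-BAS", "catalogNumber": "SOM-0001",
     "scientificName": "Haberlea rhodopensis", "country": "Bulgaria",
     "decimalLatitude": 41.9, "decimalLongitude": 24.7},
]


def export_filtered(records, **filters):
    """Return CSV text for the records whose fields equal every filter value."""
    selected = [
        record for record in records
        if all(record.get(field) == value for field, value in filters.items())
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DWC_FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


if __name__ == "__main__":
    # Export only the lots held by one institution.
    print(export_filtered(SAMPLE_LOTS, institutionCode="IBER-BAS"))
```

Calling `export_filtered(SAMPLE_LOTS)` without filters returns every record, which corresponds to the block export used for backups.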
To improve and facilitate the whole process and the accessibility of the data, a mobile (Android-based) version will also be developed. Considerable attention will be paid to CIS security as well as to the allocation of data access rights. The public profile will provide basic information about the individual lots (= collection units presented in both the collection registers and the database as a single record). Access levels, in descending order, will be: Database Manager, Administrators, Moderators and Regular Users. CIS will be hosted on a virtual server, which provides significantly more flexibility in operation as well as better information security. CIS will be archived as two back-up copies on a weekly basis, one using the storage capacity of the Institute of Information and Communication Technologies – Bulgarian Academy of Sciences and one on external storage.
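The descending access levels described above can be modelled as an ordered enumeration, as in the following minimal sketch. Only the four level names and their order come from the text; the actions and the minimum level required for each of them are hypothetical and would be defined in the digitisation governance documents.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Access levels from the text; a higher value means more privileges."""
    REGULAR_USER = 1
    MODERATOR = 2
    ADMINISTRATOR = 3
    DATABASE_MANAGER = 4


# Hypothetical actions and the lowest level allowed to perform them.
REQUIRED_LEVEL = {
    "view_basic_lot_info": AccessLevel.REGULAR_USER,
    "edit_record": AccessLevel.MODERATOR,
    "bulk_export": AccessLevel.ADMINISTRATOR,
    "manage_users": AccessLevel.DATABASE_MANAGER,
}


def is_allowed(level, action):
    """Return True if the given access level may perform the action."""
    return level >= REQUIRED_LEVEL[action]


if __name__ == "__main__":
    for level in sorted(AccessLevel, reverse=True):
        allowed = [action for action in REQUIRED_LEVEL if is_allowed(level, action)]
        print(level.name, allowed)
```

Using an ordered type keeps the rule simple: every level inherits all permissions of the levels below it.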
Objectives
  • Coordination and supervision of digitisation process.
  • Coordination of consultations with contact persons of the pan-European DiSSCo RI in order to unify standards.
  • Building up digitisation management framework – core digital activities; resources for digitisation; digitisation strategy; digitisation plan; governance for digitisation activities.
  • Implementing web-based database interoperability software for data and metadata capture, datasets interaction and data aggregation.
  • Primary work on digitisation of biological collections – digital conversion of biological objects and populating databases.
  • Quality control and verification of digitisation records – digitisation process monitoring.
  • Managing data and sharing data, including metadata capture – data management process.
  • Ensuring an open source data portal solution for open data publishing.
Description of work
Task 3.1. General management and coordination of digitisation process
Task Leader: Assoc. Prof. Borislav Guéorguiev
This task is scheduled for the entire duration of the project. It includes implementation of the following activities:
  • preparation of the digitisation plan: a) time priorities; b) identifying what parts and which collections; c) assigning specific responsibilities; d) identifying formats, standards, technologies, storage, and maintenance; e) defining rules for sharing data, metadata access and intellectual property; f) distributing resources – funding, facilities, equipment and expertise (months 1, 2)
  • organisation of digitisation governance (months 2, 3). Governance for digitisation consists of principles, policies, procedures, roles and responsibilities associated with digitisation activities
  • regular monthly meetings of the WPB3 and the technical experts of the relevant work package to coordinate activities related to the digitisation process (two-day meetings in each of months 1, 2 and 3, and one-day meetings in months 4–48)
  • regular meetings of WPB3 with the EB (one-day meeting monthly in the first year and every 2nd month afterwards)
  • coordination of the contacts with DiSSCo RI entities, such as the DiSSCo Scientific Advisory Board and the DiSSCo Technical Advisory Board, in order to synchronise the progress of the national digitisation workflow with that of the pan-European DiSSCo RI (regular on-line meetings and physical meetings in the frames of DiSSCo RI and related projects)
  • reviewing digitisation activities (months 12, 18, 24, 30, 36, 42). Digitisation activities should be reviewed to: a) assess performance against the digitisation plan; b) check whether the organisation of digitisation activities may be improved; c) check whether the management of the digital assets is effective
Task 3.2. Digitisation of collection specimens.
Task Leaders: Assoc. Prof. Rostilav Bekchiev, Valeri Georgiev and Yuriy Kornilev.
This task is scheduled for months 1-47. It covers the activities of the basic stages of the digitisation process. Adopting standards for digitisation is a central component of the quality regime and essential for producing consistent digital assets. This task also includes the basic process of digitisation, which should be carried out at the department and (or) collection level.
Activities
  • Allocation and arrangement of the digitisation equipment (hardware, software and facilities needed to create, manage and share digital assets) by departments and collections (months 1-6, 13-17, 25-28, 37-38). The timetable of this activity is in line with the implementation of Tasks 2.3 and 2.4 of WP2.
  • Defining digitisation standards: digital formats; digital files; types of images required; branding (embedding institutions' logos in images); object management (adding QR codes to herbarium sheets and labels of animal specimens); metadata standards (months 1, 2). This activity also includes considering and adapting recommended practices by adopting widely used standards endorsed by TDWG and GBIF.
  • Organisation and implementation of the primary stages of digitisation and database preparation by departments and collections (months 3–47).
  • Quality control of digitisation workflow process and data digitisation by departments and collections (monitoring; verification) (months 4–47).
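The verification step in the last activity could be partly automated with record-level checks such as those in the following minimal sketch. It assumes records keyed by Darwin Core term names; the set of required fields and the function name `validate_record` are hypothetical choices for illustration, and the actual rules would follow the standards adopted in months 1 and 2.

```python
# Assumption: these Darwin Core terms are treated as mandatory for every lot.
REQUIRED_FIELDS = ("institutionCode", "catalogNumber", "scientificName")


def validate_record(record):
    """Return a list of problems found in a record; an empty list means it passed."""
    problems = []

    # Mandatory fields must be present and not blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    # Coordinates are optional, but must come as a pair and lie in valid ranges.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if (lat is None) != (lon is None):
        problems.append("incomplete coordinates")
    elif lat is not None:
        if not -90 <= lat <= 90:
            problems.append("decimalLatitude out of range")
        if not -180 <= lon <= 180:
            problems.append("decimalLongitude out of range")

    return problems


if __name__ == "__main__":
    # Invented example records.
    good = {"institutionCode": "NMNHS-BAS", "catalogNumber": "ENT-0001",
            "scientificName": "Carabus coriaceus",
            "decimalLatitude": 42.7, "decimalLongitude": 23.3}
    bad = {"institutionCode": "IBER-BAS", "catalogNumber": "SOM-0002",
           "scientificName": " ", "decimalLatitude": 142.0, "decimalLongitude": 24.7}
    print(validate_record(good))
    print(validate_record(bad))
```

Records that return a non-empty list can be routed back to the responsible department for manual verification, while the counts per problem type feed the monitoring reports.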
Task 3.3. Data management
Task leaders: Assoc. Prof. Georgi Popgeorgiev and Valeri Georgiev
This task is scheduled for the entire duration of the project and covers the intermediate and final activities of the digitisation process. Data management is an administrative process that includes acquiring, validating, storing, protecting and processing data to ensure the accessibility, reliability and timeliness of the data for its users. It includes:
  • Developing database design and architecture (months 1, 2)
  • Preparing data and metadata capture tools (months 2, 3)
  • Preparing data storage (local; archive; operational) and protecting datasets (months 3, 4)
  • Acquiring, organizing and validating datasets (months 4–48)
  • Merging and unification of different databases, if available (months 4-9)
  • Preparing data sharing tool (months 6, 7)
  • Preparing tools for exploiting the digital assets (months 7, 8)
  • Making collection datasets accessible on the Web for the final user audiences (months 46-48)
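The merging and unification of existing databases listed above usually comes down to mapping each legacy field name onto a common vocabulary and removing duplicates. The following minimal sketch illustrates the idea under stated assumptions: the target names are Darwin Core terms, whereas the legacy source names, their field names, the sample rows and the deduplication key (institution code plus catalogue number) are all hypothetical.

```python
# Hypothetical legacy databases and their field names, mapped to Darwin Core terms.
FIELD_MAPS = {
    "legacy_herbarium": {"sheet_no": "catalogNumber", "taxon": "scientificName"},
    "legacy_zoology": {"inv_number": "catalogNumber", "species": "scientificName"},
}


def to_darwin_core(row, source, institution_code):
    """Rename the fields of one legacy row to Darwin Core terms."""
    mapping = FIELD_MAPS[source]
    record = {dwc: row[legacy] for legacy, dwc in mapping.items() if legacy in row}
    record["institutionCode"] = institution_code
    return record


def merge_sources(batches):
    """Merge (source, institution_code, rows) batches into one deduplicated list.

    Records sharing institutionCode and catalogNumber are treated as the same
    lot; the first occurrence is kept.
    """
    merged = {}
    for source, institution_code, rows in batches:
        for row in rows:
            record = to_darwin_core(row, source, institution_code)
            key = (record["institutionCode"], record.get("catalogNumber"))
            merged.setdefault(key, record)
    return list(merged.values())


# Invented sample rows; the herbarium batch contains one duplicate entry.
HERBARIUM_ROWS = [
    {"sheet_no": "H-1", "taxon": "Haberlea rhodopensis"},
    {"sheet_no": "H-1", "taxon": "Haberlea rhodopensis"},
]
ZOOLOGY_ROWS = [{"inv_number": "Z-7", "species": "Carabus coriaceus"}]

if __name__ == "__main__":
    unified = merge_sources([
        ("legacy_herbarium", "IBER-BAS", HERBARIUM_ROWS),
        ("legacy_zoology", "NMNHS-BAS", ZOOLOGY_ROWS),
    ])
    for record in unified:
        print(record)
```

Keeping the mappings in a single table makes it straightforward to add a further legacy database without touching the merging logic.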
WP 3-related deliverables
  • D3.01: Digitisation management framework and digitisation plan (report) (month 2)
  • D3.02: Digitisation governance model (report) (month 3)
  • D3.03: Deployment of modular interoperability software (report) (month 4)
  • D3.04: Managing datasets platform (report) (month 9)
  • D3.05-D3.07: Annual reports on digitisation progress (months 13, 25, 37)
  • D3.08: Final report on digitisation (month 48)
WP 3-related milestones
  • M3.1: Meeting of the WPB3 with the EB and parties concerned – approval of digitisation management framework and digitisation plan (month 2)
  • M3.2: Meeting of the WPB3 and parties concerned – approval of digitisation governance (month 3)
  • M3.3: Meeting of the WPB3 and parties concerned – approval of the system to manage metadata and of the data storage systems; approval of the data portal structure for free access to the DiSSCo-BG datasets (month 3)
  • M3.4: Meeting of the WPB3 and parties concerned – approval of data sharing tools and exploitation tools; approval of managing datasets platform (month 9)
  • M3.5: Meeting of the WPB3 and parties concerned – acceptance of digitisation and monitoring results (month 47)