SBL Proposal Accepted for Digital Humanities section: Introducing LODLIBs

Got confirmation of acceptance of a second paper this morning. Thank you to the session chairs (Garrick Allen and Paul Dilley) and the review committee for the opportunity to present this research.

Title: Introducing Linked Open Data Living Informational Books

Abstract: In a recent article, Claire Clivaz surveys the rise of VREs (Virtual Research Environments) that allow for scientific hypothesis-driven, iterative, and collaborative research in the Humanities. In this presentation, we propose a new kind of VRE, the Linked Open Data Living Informational Book or LODLIB, essentially a scientific hypothesis-driven iterative digital codex. LODLIBs follow the structure of scientific articles (introduction, materials and methods, results, discussion), leverage international Linked Open Data standards (unique and interconnected DOIs), rely on non-commercial Open Science repositories, include internal data dictionaries and lexicographical resources, embed datasets and code within the digital book, invite global open peer-review and collaboration, and allow for cycles of continuous improvement characteristic of agile software and systems development. Essentially, the LODLIB reimagines the codex as human- and machine-readable software, bringing together research and publishing, the Sciences and the Humanities. The LODLIB format inverts the power- and economic relationships between academic authors and publishers, opens academic discourse to the global public, allows for rich analytics about readership and citations, and has the potential to make monographs and compilations go viral in online environments. The conclusion will relate the story of the presenter’s prototyping of the LODLIB format to propose and realize a new, scientific solution to Q and the Synoptic Problem.

Subjects: Computer-Assisted Research | Historical Criticism | Lexicography

The First Gospel (LODLIB v1.38 release notes)

This week’s edition puts us at nearly 720 pages and 300,000 words. This is the week when our research really started to integrate with RStudio. We spent quite a bit of time troubleshooting Greek Unicode and UTF-8 encoding issues in RStudio on our main Windows machine and getting the Windows Subsystem for Linux up and running so we can move back and forth between RStudio in both environments. Rather than build Unicode code points throughout our scripts, we decided to front-load this work.

Thus our Code Repository debuts with two major scripts: one that transliterates all Greek Unicode characters into ASCII English letter equivalents; and another that loads both Greek and English UTF-8 .txt files, then quickly and cleanly parses six vectors for use in deep Computational Linguistics analysis (whole, lemma, and morphology for both languages). With the in-book datasets and code, experts and novices in Gospel Computational Linguistics can start to evaluate and build on our research. Our Data Visualizations section (freshly reformatted to tabloid layout) also features a new section that builds on this: Top Ten Words tables and graphs for the Harnack, Roth, and CENP datasets.
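For readers who want a feel for what such scripts involve, here is a minimal sketch in Python. It is not the code from the book's Code Repository, which is developed in RStudio. The letter table, the `form|lemma|morph` token layout, and the function names `transliterate`, `parse_tagged`, and `top_words` are all assumptions made for illustration, and the transliteration is deliberately simplified (it drops breathings and accents rather than encoding them).

```python
import unicodedata
from collections import Counter

# Illustrative base-letter table (an assumption, not the book's scheme).
GREEK_TO_ASCII = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "e", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate(text):
    """Strip diacritics, lowercase, and map Greek letters to ASCII."""
    # NFD splits precomposed characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents, breathings, iota subscript).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters missing from the table (spaces, punctuation) pass through unchanged.
    return "".join(GREEK_TO_ASCII.get(ch, ch) for ch in base.lower())


def parse_tagged(text):
    """Split whitespace-separated 'form|lemma|morph' tokens into three lists.

    The token layout is hypothetical; the book's datasets may be laid out differently.
    """
    whole, lemma, morph = [], [], []
    for token in text.split():
        form, lem, tag = token.split("|")
        whole.append(form)
        lemma.append(lem)
        morph.append(tag)
    return whole, lemma, morph


def top_words(tokens, n=10):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)
```

Running `parse_tagged` once on a Greek file and once on an English file would yield the six vectors described above, and `top_words` applied to a lemma vector gives the raw counts behind a Top Ten Words table.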


The First Gospel (LODLIB v1.35 release notes)

This week’s edition puts us at almost 680 pages and over 280,000 words. Major highlights:

  • A new section on the history of scholarship on Computational Linguistics and the Synoptic Problem. Ever wonder why we couldn’t solve the Synoptic Problem before? Faulty understanding and modeling of the problem and only using a fraction of the relevant datasets!
  • New additions and numerous corrections to our statistical proofs. What happens when you bring together statistics about GMarc’s abundance of triple tradition passages with statistics about its lack of Markan and Lukan passages? Hint: if this were judo or MMA, this would be the submission hold that ends the match against defenders of the early orthodox hypothesis that GMarc is derived from Luke.
  • A new Lk2 clean vocal stratum training dataset for Natural Language Processing and Computational Linguistics. Ever wonder what the redactor of Late Luke (Lk2) sounds like unfiltered, without synoptic noise? Are any of the coders out there eager to have lemmatized and morphologically tagged datasets to test our hypotheses? Here ya go!