Got confirmation this morning that a second paper has been accepted. Thank you to the session chairs (Garrick Allen and Paul Dilley) and the review committee for the opportunity to present this research.
Title: Introducing Linked Open Data Living Informational Books
Abstract: In a recent article, Claire Clivaz surveys the rise of VREs (Virtual Research Environments) that allow for scientific hypothesis-driven, iterative, and collaborative research in the Humanities. In this presentation, we propose a new kind of VRE, the Linked Open Data Living Informational Book or LODLIB: in effect, a scientific hypothesis-driven, iterative digital codex. LODLIBs follow the structure of scientific articles (introduction, materials and methods, results, discussion), leverage international Linked Open Data standards (unique and interconnected DOIs), rely on non-commercial Open Science repositories, include internal data dictionaries and lexicographical resources, embed datasets and code within the digital book, invite global open peer review and collaboration, and allow for cycles of continuous improvement characteristic of agile software and systems development. Essentially, the LODLIB reimagines the codex as human- and machine-readable software, bringing together research and publishing, the Sciences and the Humanities. The LODLIB format inverts the power and economic relationships between academic authors and publishers, opens academic discourse to the global public, allows for rich analytics about readership and citations, and has the potential to make monographs and compilations go viral in online environments. The conclusion will relate how the presenter prototyped the LODLIB format to propose and realize a new, scientific solution to Q and the Synoptic Problem.
Subjects: Computer-Assisted Research | Historical Criticism | Lexicography
This week’s edition puts us over 730 pages and 303,000 words. The main addition this week is the Lk2-CINP dataset. CINP stands for “Clear and Implicitly Not Present.” Like Lk2-CENP, this dataset records the redactor of Late Luke (Lk2) speaking freely without noise from prior gospel strata and is roughly the same size, representing about 20% of the total word count of Lk2. While we may make additions to or subtractions from this dataset in future editions, depending on our restoration and signal transmission tracing work, we are confident that overall this dataset is a high-fidelity representation of the Lk2 vocal stratum and thus ideal for modeling and training. We have already started incorporating the Lk2-CINP dataset into our Computational Linguistics analysis and visualizations, which now also include Acts and the Gospel of John for comparison.
This week’s edition puts us at nearly 720 pages and 300,000 words. This is the week our research really started to integrate with RStudio. We spent quite a bit of time troubleshooting Greek Unicode and UTF-8 encoding issues in RStudio on our main Windows machine, and getting the Windows Subsystem for Linux (WSL) up and running so we can move back and forth between RStudio in both environments. Rather than handle Unicode code points throughout our scripts, we decided to front-load this work.
Thus our Code Repository debuts with two major scripts: one that transliterates all Greek Unicode characters into ASCII (Latin-letter) equivalents, and another that loads both Greek and English UTF-8 .txt files, then quickly and cleanly parses them into six vectors for deep Computational Linguistics analysis (whole, lemma, and morphology for both languages). With the in-book datasets and code, experts and novices in Gospel Computational Linguistics alike can start to evaluate and build on our research. Our Data Visualizations section (freshly reformatted to a tabloid layout) also features a new subsection that builds on this: Top Ten Words tables and graphs for the Harnack, Roth, and CENP datasets.
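For readers curious how Greek-to-ASCII transliteration of the kind described above can work, here is a minimal Python sketch (the book’s own scripts are built around RStudio, so this is not that code). It decomposes each character with Unicode NFD, drops the combining marks (accents, breathings, iota subscripts), and maps the remaining base letters through a letter table. The `GREEK_TO_ASCII` table and the `transliterate` function are hypothetical: the scheme is a simplified assumption, and it is lossy (η and ε both become “e”, ω and ο both become “o”, rough breathing is not rendered as “h”).

```python
import unicodedata

# Hypothetical, simplified letter map (an assumption, not the book's scheme).
# It is lossy: eta/epsilon both become "e" and omega/omicron both become "o".
GREEK_TO_ASCII = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "e", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate(text: str) -> str:
    """Transliterate polytonic Greek into ASCII letters, dropping all diacritics."""
    # NFD splits precomposed letters (e.g. "ἐ") into a base letter plus
    # combining marks, so accents, breathings, and iota subscripts can be removed.
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    out = []
    for ch in base_only:
        latin = GREEK_TO_ASCII.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, punctuation, and non-Greek text pass through
        elif ch.isupper():
            out.append(latin.capitalize())  # keep capitals, e.g. "Χ" -> "Ch"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Ἐν ἀρχῇ ἦν ὁ λόγος"))  # En arche en o logos
```

Stripping diacritics before the lookup keeps the table small (24 letters plus final sigma) instead of listing every accented variant; a scheme that must stay reversible would need separate handling for η, ω, and breathings.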
This week’s edition puts us over 700 pages and 293,000 words. Notable highlights:
- Identification of an additional 20 signature features showing statistically significant variance between Lk1/GMarc and Lk2 that will be used in future proofs of the Schwegler hypothesis and our five hypotheses. These now include several features with disproportionately high frequencies in Lk1/GMarc compared to Lk2, not just vice versa. Many of these newly listed features are morphologically nuanced bigrams, trigrams, and quadrigrams we’ve been identifying over the past several editions of our LODLIB in DD 1.2.
- Forked three sections (Computational Linguistics and the Synoptic [Signals] Problem; Data Visualizations; Excursus on Related Topics) out of other areas into their own standalone sections.
- Hundreds more “clear” vocal signal tags are now assigned across any and all strata throughout the entire reconstruction in anticipation of the future compilation of NLP training datasets for each vocal stratum.
- Dozens of new entries to the Data Dictionary, adding further clarification and disambiguation of the Qn, Lk1, and Lk2 vocal strata.
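One standard way to test whether an n-gram’s frequency differs significantly between two strata, such as Lk1/GMarc and Lk2, is Dunning’s log-likelihood statistic (G²), widely used in corpus linguistics. The Python sketch below is a swapped-in illustration of that technique, not necessarily the method behind the features listed in DD 1.2. The function names are hypothetical, and tokens are assumed to be plain strings (for morphologically nuanced features they could be combined lemma-plus-tag strings in whatever format a dataset uses).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a, b, total_a, total_b):
    """Dunning's G2 for a feature seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare_ngrams(tokens_a, tokens_b, n=2, threshold=3.84):
    """List n-grams whose frequency differs significantly between two strata.

    3.84 is the chi-square critical value for p < 0.05 at 1 degree of freedom.
    Returns (ngram, count_a, count_b, G2, over-represented stratum) tuples,
    sorted by G2, highest first.
    """
    counts_a, counts_b = Counter(ngrams(tokens_a, n)), Counter(ngrams(tokens_b, n))
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each stratum needs at least n tokens")
    results = []
    for gram in counts_a.keys() | counts_b.keys():
        a, b = counts_a[gram], counts_b[gram]
        g2 = log_likelihood(a, b, total_a, total_b)
        if g2 >= threshold:
            # Compare relative (not raw) frequencies, since strata differ in size
            favours = "A" if a / total_a > b / total_b else "B"
            results.append((gram, a, b, round(g2, 2), favours))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

Passing `n=3` or `n=4` covers trigrams and quadrigrams. Because the last field records which stratum over-uses each feature, the same run surfaces features that are disproportionately frequent in either direction. With very low counts, or when testing thousands of n-grams at once, a stricter threshold or a multiple-comparison correction would be prudent.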
This week’s edition puts us at 685 pages and almost 288,000 words. Notable highlights:
- Signals reconstruction and tagging completed for a large chunk of Lk1/GMarc chapter 12, along with minor corrections throughout other chapters
- Dozens of new entries to the Data Dictionary, adding further clarification and disambiguation of the Qn, Lk1, and Lk2 vocal strata
- Hundreds of “clear” vocal signal tags assigned across any and all strata throughout the entire reconstruction in anticipation of the future compilation of NLP training datasets for each vocal stratum
- New restorations made to unattested earlier chapters (e.g., QnLk1 7.31-32) in light of the above
This week’s edition puts us at almost 680 pages and over 280,000 words. Major highlights:
- A new section on the history of scholarship on Computational Linguistics and the Synoptic Problem. Ever wonder why we couldn’t solve the Synoptic Problem before? Faulty understanding and modeling of the problem, and using only a fraction of the relevant datasets!
- New additions and numerous corrections to our statistical proofs. What happens when you bring together statistics about GMarc’s abundance of triple tradition passages with statistics about its lack of Markan and Lukan passages? Hint: if this were judo or MMA, this would be the submission hold that ends the match against defenders of the early orthodox hypothesis that GMarc is derived from Luke.
- A new Lk2 clean vocal stratum training dataset for Natural Language Processing and Computational Linguistics. Ever wonder what the redactor of Late Luke (Lk2) sounds like, unfiltered and without synoptic noise? Are any of the coders out there eager for lemmatized and morphologically tagged datasets to test our hypotheses? Here ya go!
This week’s edition puts us at almost 650 pages and over 270,000 words. We’ve made lots of new additions to the Comparative Restoration (especially for chapter 12) and to the Data Dictionary. We’ve also made some significant corrections to previous chapters as we continue to follow a cycle of continuous improvement, simultaneously tracing the transmission and syntheses of vocal signals across time and clarifying discrete vocal strata from specific moments in time. I’ve also been enjoying reviewing the scholarly literature in Computational Linguistics on authorship attribution and recognition. I’m figuring out how to adapt other scholars’ methods and develop new ones customized specifically for ancient Greek texts and the Synoptic Problem. We should have some important scientific findings to announce in the next month or two.
Our first edition of the new year puts us over 610 pages and over 265,000 words. The big addition for this version is a twofold digital edition of Harnack’s reconstruction of the Gospel of Marcion in our in-book Dataset and Code Repository. The first is untagged Greek text for human readers, and the second is lemmatized with full morphological tagging for deep Computational Linguistics analysis. We welcome and encourage other scholars to use this dataset to evaluate our hypotheses and come to their own conclusions about whether the Gospel of Marcion is in fact the third gospel stratum, composed partly of early Mark and mostly of the first Gospel (Qn).
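For anyone who wants to start working with a lemmatized, morphologically tagged edition like the one described above, here is a minimal Python sketch that loads tagged text into parallel word, lemma, and morphology lists and tallies the most frequent items (the basis of a Top Ten Words table). The tab-separated layout (one token per line: word, lemma, morphology tag) and the function names are hypothetical assumptions; the actual file format of the Harnack dataset is not described here. Each line is normalized to Unicode NFC so that precomposed and decomposed spellings of the same accented form are counted together, a common source of Greek encoding trouble.

```python
import unicodedata
from collections import Counter


def parse_tagged(lines):
    """Split tagged lines into parallel word, lemma, and morphology lists.

    Assumes a hypothetical layout: one token per line, word<TAB>lemma<TAB>morph.
    """
    words, lemmas, morphs = [], [], []
    for line in lines:
        # NFC makes precomposed and decomposed accented forms compare equal
        fields = unicodedata.normalize("NFC", line.rstrip("\r\n")).split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        word, lemma, morph = fields
        words.append(word)
        lemmas.append(lemma)
        morphs.append(morph)
    return words, lemmas, morphs


def top_n(tokens, n=10):
    """Return the n most frequent tokens with their counts, most frequent first."""
    return Counter(tokens).most_common(n)
```

To read a file, pass an open handle, e.g. `with open("marcion_tagged.txt", encoding="utf-8") as f: words, lemmas, morphs = parse_tagged(f)` (the filename is hypothetical), then call `top_n(lemmas)` for a ten-item frequency list.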
This evening’s edition brings us to 580 pages of detailed and ever-growing evidence proving my five hypotheses for uncovering and reconstructing the first and third gospel strata. Besides reorganizing the table of contents and chapter order for clarity, we’ve added lots of new content:
- an in-book Dataset and Code Repository section, which debuts here with a digital edition of Harnack’s critical reconstruction of Marcion’s Gospel
- lots of footnotes on the history of scholarship on Marcion’s Gospel
- a new section, “Half of a Love Letter to Advocates of the Marcionite Hypothesis”
- a new excursus calling for a new Quest for the Historical Marcion and critiquing scholars’ failure to set Marcion squarely and thoroughly within his Roman historical setting; in particular, they have almost entirely ignored the major role that Pliny the Younger (the first Roman official on record to execute Christians) probably played in Marcion’s life and thinking as his local governor in Pontus
We’ve now reached 550 pages and 250,000 words, up from 530 pages and 225,000 words in the last version.
We’ve also reclassified the “Linguistic-Syntactical Vocal Strata Profiles” as an embedded Data Dictionary with distinct headings that are now increasingly cross-referenced from footnotes. Hundreds of new entries are included; many of these entries add significant further evidence clarifying the distinct voices (vocal strata) of Qn, Lk1 and Lk2 within Luke.