This week’s version begins data normalization for the study of the Gospel of Marcion. It accompanies the freshly revised datasets prepared for the fourth round of review of a short data paper and related datasets that we have submitted to the Journal of Open Humanities Data, whose Editor-in-Chief is Barbara McGillivray at the Alan Turing Institute at Cambridge. The peer-review process has been wonderful and indeed transformative for my thinking and methodology.
The normalization of GMarc data (transforming past messy/noisy reconstructions into standardized data) will, mark my words, prove the tipping point in the transformation of the scholarly study of the canonical and non-canonical gospel strata into legitimate Data Science. Alongside our new normalization standards and normalized datasets of public domain reconstructions, we are also releasing a slew of data visualizations illustrating the contents and relationships of all past GMarc reconstruction datasets. These visualizations clearly reinforce our scientific hypotheses and proofs that GMarc was in fact the third gospel stratum, based on two sources (the first gospel stratum, Qn, and an early version of Mark).
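As a rough illustration of what normalizing a noisy reconstruction can involve, here is a minimal Python sketch. It is not our published normalization standard: the function name normalize_reading, the set of editorial marks, and the choice to lowercase are all assumptions made for the example.

```python
import re
import unicodedata

# ASSUMPTION: these characters stand in for editorial sigla; a real edition
# would need its own list, drawn from that edition's conventions.
EDITORIAL_MARKS = re.compile(r"[\[\]\(\)<>{}†*]")

def normalize_reading(text):
    """Return a standardized form of one reconstructed reading (hypothetical rules)."""
    # Compose characters so that accented letters compare equal across sources.
    text = unicodedata.normalize("NFC", text)
    # Drop editorial brackets and marks, keeping the words they enclose.
    text = EDITORIAL_MARKS.sub("", text)
    # Collapse runs of whitespace left behind by the removal.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

example = normalize_reading("  [Καὶ]   εἶπεν ")
```

Applying the same function to every edition is what makes later word counts and comparisons between reconstructions meaningful.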
The age of hagiographical controlling bias and assumptions in Gospel Studies is over. The age of Gospel Data Science is upon us. Scholars can either get on board or get out of the way, but no matter what you do, you can’t stop this.
This week’s version puts us over 400,000 words. In concert with the peer review of our Harnack 1924 datasets for the Journal of Open Humanities Data, we have compiled datasets for other closely related, public domain reconstructions of Marcion’s Gospel. Today’s release features Zahn’s 1892 reconstruction, the second major reconstruction in the history of scholarship. Zahn’s edition totals 10571 words, far fewer than Hahn’s 14400, yet far more than Harnack’s 4207 4338. The disparity between these reconstructions exemplifies how much the results of reconstruction are determined by a priori assumptions and methodologies. We anticipate adding granular word counts by passage and tradition type (single, double, triple) for the editions of Hahn and Zahn in the Data Dictionary (DD 1.6) of next week’s LODLIB update.
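The planned word counts by passage and tradition type could be tallied along the following lines. This is only a sketch: the row layout (reference, tradition type, text) and the filler text are hypothetical and do not reflect the actual dataset schema or any edition’s wording.

```python
from collections import defaultdict

# PLACEHOLDER rows: (passage reference, tradition type, reconstructed text).
# The text is filler, not taken from Hahn, Zahn, or Harnack.
rows = [
    ("3:1", "triple", "alpha beta gamma"),
    ("6:20", "double", "delta epsilon"),
    ("15:8", "single", "zeta eta theta iota"),
    ("6:21", "double", "kappa"),
]

def word_counts_by_passage(rows):
    """Map each passage reference to its whitespace-delimited word count."""
    return {ref: len(text.split()) for ref, _tradition, text in rows}

def word_counts_by_type(rows):
    """Sum word counts for each tradition type (single, double, triple)."""
    totals = defaultdict(int)
    for _ref, tradition, text in rows:
        totals[tradition] += len(text.split())
    return dict(totals)

by_passage = word_counts_by_passage(rows)
by_type = word_counts_by_type(rows)
```

Counting on whitespace is the simplest possible definition of a word; a different tokenization rule would shift the totals, which is one reason edition-level counts need a stated method.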
Lots of progress was made in today’s upload. We’d specifically like to call attention to an expansion of our statistical proofs, especially in conversation with Daniel Smith’s 2019 chapter in BZNW 235, which focuses on a statistical analysis of GMarc. To make this easier for readers to access, we present here the bulk of the content from the page in our LODLIB that details our findings, building on Smith’s verse counts but nuancing them and challenging his starting goal (“On Not Dispensing with Any of Q”) and ultimate conclusions.
Table: Smith Verse Count: GMarc Attested as a Percentage of Lk2 (columns: GMarc Verses Attested; GMarc Attested / Lk2)
Even without questioning or changing any of the traditional contents considered secure for Q, according to Smith’s verse count approach, Q verses are the best attested of any tradition type. That is a highly significant finding on its own.
But what happens if we adjust our method to account separately for the 83 verses considered but doubted or rejected within CEQ?
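The measure itself is a simple ratio: attested GMarc verses divided by the Lk2 verses of the same tradition type. The Python sketch below shows that arithmetic and how the doubted or rejected verses can be kept as their own category. Every count in it is a placeholder, not Smith’s figure or ours, except the total of 83 taken from the question above; the category names are likewise assumptions.

```python
def attested_share(attested, total):
    """Return attested verses as a percentage of the Lk2 total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * attested / total

# PLACEHOLDER counts as (attested in GMarc, total in Lk2); not real findings.
# Only the 83 total for the doubted/rejected group comes from the text.
placeholder_counts = {
    "Q (secure)": (30, 40),
    "Q (doubted or rejected in CEQ)": (20, 83),
    "triple tradition": (50, 100),
}

shares = {
    name: round(attested_share(attested, total), 1)
    for name, (attested, total) in placeholder_counts.items()
}
```

Keeping the doubted verses in a separate row means their attestation rate neither inflates nor dilutes the rate for the secure Q material.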
Identification of an additional 20 signature features showing statistically significant variance between Lk1/GMarc and Lk2 that will be used in future proofs of the Schwegler hypothesis and our five hypotheses. These now include several features with disproportionately high frequencies in Lk1/GMarc compared to Lk2, not just vice versa. Many of these newly listed features are morphologically nuanced bigrams, trigrams, and quadgrams we’ve been identifying over the past several editions of our LODLIB in DD 1.2.
Split three topics (Computational Linguistics and the Synoptic [Signals] Problem; Data Visualizations; Excursus on Related Topics) out of other areas into their own sections.
Hundreds more “clear” vocal signal tags are now assigned across any and all strata throughout the entire reconstruction in anticipation of the future compilation of NLP training datasets for each vocal stratum.
Dozens of new entries to the Data Dictionary, adding further clarification and disambiguation of the Qn, Lk1, and Lk2 vocal strata.
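For readers curious how an n-gram signature feature can be compared across two strata, here is a minimal Python sketch. It uses a simplified two-term log-likelihood score that is common in corpus comparison; this is a stand-in chosen for illustration, not necessarily the test behind the features listed above. The token lists are placeholders and the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def log_likelihood(count_a, count_b, total_a, total_b):
    """Simplified two-corpus log-likelihood for one feature.

    The score is 0 when both corpora use the feature at the same rate
    and grows as the rates diverge.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both corpora must contain at least one n-gram")
    pooled_rate = (count_a + count_b) / (total_a + total_b)
    score = 0.0
    for observed, total in ((count_a, total_a), (count_b, total_b)):
        expected = total * pooled_rate
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score

# PLACEHOLDER token lists standing in for two vocal strata.
stratum_a = "x y x y x y z".split()
stratum_b = "x z x z y z z".split()

bigrams_a = Counter(ngrams(stratum_a, 2))
bigrams_b = Counter(ngrams(stratum_b, 2))
feature = ("x", "y")
score = log_likelihood(
    bigrams_a[feature], bigrams_b[feature],
    sum(bigrams_a.values()), sum(bigrams_b.values()),
)
```

Scores are conventionally read against the chi-square distribution with one degree of freedom, where 3.84 corresponds to p < 0.05, though very small counts like these placeholders make any such reading unreliable.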