This week’s LODLIB version now contains the author’s accepted version of our data paper and the related transformed, normalized datasets based on Roth’s 2015 reconstruction of the Gospel of Marcion (GMarc). As we note in the paper, Roth’s is the most widely accepted reconstruction in scholarship today. Sincere thanks go to Dr. Roth and to Tanja Cowall at Brill’s copyright office for securing an agreement for the distribution of these datasets under a CC-BY-NC-ND license. We would also like to thank the editor-in-chief and three anonymous reviewers at JOHD for their excellent and constructive feedback on this work. While the data paper is still in press, we are happy to go ahead and share the DOIs that have already been minted for the paper (https://doi.org/10.5334/johd.57) and datasets (https://doi.org/10.7910/DVN/BYPOOR). [The latter DOI was randomly (providentially?) generated, by the way! We have no control over what specific DOI is generated when we upload datasets to the Harvard JOHD Dataverse.]
This week’s edition puts us over 730 pages and 303,000 words. The main addition this week is the Lk2-CINP dataset. CINP stands for “Clear and Implicitly Not Present.” Like Lk2-CENP, this dataset records the redactor of Late Luke (Lk2) speaking freely without noise from prior gospel strata and is roughly the same size, representing about 20% of the total word count of Lk2. While we may add to or subtract from this dataset in future editions, depending on our restoration and signal transmission tracing work, we are confident that overall this dataset is a high-fidelity representation of the Lk2 vocal stratum and thus ideal for modeling and training. We have already started incorporating the Lk2-CINP dataset into our Computational Linguistics analysis and visualizations, which now also include Acts and the Gospel of John for comparison.
This week’s edition puts us at nearly 720 pages and 300,000 words. This is the week where our research really started to integrate with RStudio. We spent quite a bit of time troubleshooting Greek Unicode and UTF-8 encoding issues in RStudio on our main Windows machine and getting the Windows Subsystem for Linux up and running so we can move back and forth between RStudio in both environments. Rather than work Unicode code points into each of our scripts, we decided to front-load this work.
Thus our Code Repository debuts with two major scripts: one that transliterates all Greek Unicode characters into ASCII English letter equivalents; and another that loads both Greek and English UTF-8 txt files, then quickly and cleanly parses six vectors for use in deep Computational Linguistics analysis (whole, lemma, and morphology for both languages). With the in-book datasets and code, experts and novices in Gospel Computational Linguistics can start to evaluate and build on our research. Our Data Visualizations section (freshly reformatted to tabloid layout) also features a new section that builds on this: Top Ten Words tables and graphs for the Harnack, Roth, and CENP datasets.
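For readers who want a feel for what the transliteration and Top Ten Words steps involve, here is a minimal sketch in Python. It is not the R code from our repository: the letter mapping, the function names `transliterate` and `top_words`, and the choice to drop breathings and accents are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from base Greek letters to ASCII equivalents.
# The scheme used in the LODLIB scripts may differ.
GREEK_TO_ASCII = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "e", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate(text):
    """Lowercase Greek text, strip diacritics, and map letters to ASCII."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents, breathings, and iota subscripts
        # Characters without a mapping (spaces, punctuation) pass through.
        out.append(GREEK_TO_ASCII.get(ch, ch))
    return "".join(out)


def top_words(text, n=10):
    """Return the n most frequent whitespace-separated words, transliterated."""
    return Counter(transliterate(text).split()).most_common(n)


print(transliterate("ἐν ἀρχῇ ἦν ὁ λόγος"))
print(top_words("καὶ ὁ λόγος καὶ"))
```

Note that this sketch splits on whitespace only, so punctuation stays attached to words, and it discards rough breathings rather than rendering them as "h". Both would need attention before using it on a real dataset.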
This week’s edition puts us over 700 pages and 293,000 words. Notable highlights:
- Identification of an additional 20 signature features showing statistically significant variance between Lk1/GMarc and Lk2 that will be used in future proofs of the Schwegler hypothesis and our five hypotheses. These now include several features with disproportionately high frequencies in Lk1/GMarc compared to Lk2, not just vice versa. Many of these newly listed features are morphologically nuanced bigrams, trigrams, and quadgrams we’ve been identifying over the past several editions of our LODLIB in DD 1.2.
- Split three topics (Computational Linguistics and the Synoptic[Signals] Problem; Data Visualizations; Excursus on Related Topics) out of other areas into their own sections.
- Hundreds more “clear” vocal signal tags are now assigned across any and all strata throughout the entire reconstruction in anticipation of the future compilation of NLP training datasets for each vocal stratum.
- Dozens of new entries to the Data Dictionary, adding further clarification and disambiguation of the Qn, Lk1, and Lk2 vocal strata.
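For coders curious about how n-gram features like the ones mentioned above can be compared between two strata, here is a minimal Python sketch. It is an illustration under stated assumptions, not our actual method: the function names are hypothetical, the input is assumed to be a plain list of tokens (words, lemmas, or morphology tags) per stratum, and it only reports rates per 1,000 n-grams without any significance testing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rate_per_thousand(tokens, n):
    """Map each n-gram to its frequency per 1,000 n-grams in the token list."""
    grams = ngrams(tokens, n)
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: 1000 * count / total for gram, count in counts.items()}


def compare(tokens_a, tokens_b, n):
    """List (n-gram, rate in A, rate in B), largest rate difference first."""
    rates_a = rate_per_thousand(tokens_a, n)
    rates_b = rate_per_thousand(tokens_b, n)
    rows = [
        (gram, rates_a.get(gram, 0.0), rates_b.get(gram, 0.0))
        for gram in set(rates_a) | set(rates_b)
    ]
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows


# Toy token lists for illustration only; these are not drawn from our datasets.
stratum_a = ["x", "y", "x", "y", "x"]
stratum_b = ["x", "x", "x"]
for gram, rate_a, rate_b in compare(stratum_a, stratum_b, 2):
    print(gram, rate_a, rate_b)
```

Normalizing to a rate matters because the strata differ in length; raw counts alone would favor the longer text. A real analysis would then apply a significance test to the candidates this kind of ranking surfaces.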
This week’s edition puts us at 685 pages and almost 288,000 words. Notable highlights:
- Signals reconstruction and tagging completed for a large chunk of Lk1/GMarc chapter 12, as well as more minor corrections made throughout other chapters
- Dozens of new entries to the Data Dictionary, adding further clarification and disambiguation of the Qn, Lk1, and Lk2 vocal strata
- Hundreds of “clear” vocal signal tags assigned across any and all strata throughout the entire reconstruction in anticipation of the future compilation of NLP training datasets for each vocal stratum
- New restorations made to unattested earlier chapters (e.g., QnLk1 7.31-32) in light of the above
This week’s edition puts us at almost 680 pages and over 280,000 words. Major highlights:
- A new section on the history of scholarship on Computational Linguistics and the Synoptic Problem. Ever wonder why we couldn’t solve the Synoptic Problem before? Faulty understanding and modeling of the problem and only using a fraction of the relevant datasets!
- New additions and numerous corrections to our statistical proofs. What happens when you bring together statistics about GMarc’s abundance of triple tradition passages with statistics about its lack of Markan and Lukan passages? Hint: if this were judo or MMA, this would be the submission hold that ends the match against defenders of the early orthodox hypothesis that GMarc is derived from Luke.
- A new Lk2 clean vocal stratum training dataset for Natural Language Processing and Computational Linguistics. Ever wonder what the redactor of Late Luke (Lk2) unfiltered without synoptic noise sounds like? Any of the coders out there eager to have lemmatized and morphologically tagged datasets to test our hypotheses? Here ya go!
This evening’s edition brings us to 580 pages of detailed and ever-growing evidence proving my five hypotheses to uncover and reconstruct the first and third gospel strata. Besides reorganizing the table of contents and chapter order to be cleaner, we’ve added lots of new content:
- an in-book Dataset and Code Repository section, which debuts here with a digital edition of Harnack’s critical reconstruction of Marcion’s Gospel
- lots of footnotes on the history of scholarship of Marcion’s Gospel
- a new section, “Half of a Love Letter to Advocates of the Marcionite Hypothesis”
- a new excursus calling for a new Quest for the Historical Marcion and critiquing the failure of scholars to set Marcion squarely and thoroughly within his Roman historical setting, almost entirely ignoring the major role that Pliny the Younger (the first Roman official on record to execute Christians) probably played in Marcion’s life and thinking as his local governor in Pontus
We’ve now reached 550 pages and 250,000 words, up from 530 pages and 225,000 words in the last version.
We’ve also reclassified the “Linguistic-Syntactical Vocal Strata Profiles” as an embedded Data Dictionary with distinct headings that are now increasingly cross-referenced from footnotes. Hundreds of new entries are included; many of these entries add significant further evidence clarifying the distinct voices (vocal strata) of Qn, Lk1 and Lk2 within Luke.
Today we release v1.30, containing new statistical proofs related to my discovery of the First Gospel (Qn) as an actual, historical text whose vocal stratum data can be proven and restored using modern data science methods. This goes together with my scientific reconstruction of Marcion’s Gospel as the third gospel stratum. We are now at 530 pages and almost 225,000 words, up from 500 pages and 210,000 words in our last version.
The main set of new proofs is the “Statistical Analysis of GMarc and Single, Double, and Triple Traditions.” By carefully comparing attestations and word counts of these different tradition types in GMarc and Lk2, we show clearly that GMarc has a consistent, systematic lack of single traditions compared to double and especially triple traditions. These patterns are too consistently evident across an inconsistently attested text to be explained logically as the product of Marcion’s editorial work or of random or even deliberate patterns of early orthodox attestation or suppression. The only scientifically sound explanation of the consistent favoring of double and triple traditions over single traditions in GMarc is that Lk2 was a revised and expanded version of GMarc. The payoff of this detailed analysis comes in the following tables: