My most recent endeavor is to make a stellar Bible API. This post explains some of the pains I took to handle the different kinds of searches the API supports.
When searching the Bible, there are many different inputs a user may provide. For example, a user may type:
- Jesus John
- Jesus John Peter
- Jesus John 1
- Jesus 1 John 2
- Jesus 1 John 2 sin
- Jesus 1 John 2:1-10
- John 1
- John 1:1
- John 1:1-10
- John 1:1-10 love
- the love
And on and on the list goes. So basically, we’re looking at the following kinds of input:
- Common word
- Bible book
- Leading number on a Bible book (the ‘1’ in 1 John)
- Chapter:Verse combo
- Chapter:Verse-Verse range combo
My Bible database stores verses in a single table, which contains the version, book, book order, chapter, verse, and verse text. Rather than having multiple tables where a version has many books, each book has many chapters, and each chapter has many verses, I decided to go with a single table. I mainly did this for speed, as the extra lookups are unnecessary. Additionally, the verses table would have to carry the chapter, book, and version foreign keys anyway, so it’s kind of pointless to separate it all out.
That said, let’s take a look at how I’m parsing the data. The code builds two different searches: one for passages, and one that treats all of the input as keywords. Here’s the code, and we’ll dissect it below.
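To make the single-table layout concrete before getting into the parser, here is a minimal sketch using SQLite. The table name, column names, and sample rows are my assumptions for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE verses (
        version    TEXT    NOT NULL,
        book       TEXT    NOT NULL,
        book_order INTEGER NOT NULL,
        chapter    INTEGER NOT NULL,
        verse      INTEGER NOT NULL,
        verse_text TEXT    NOT NULL
    )
    """
)

# A few sample rows, with the verse text shortened.
conn.executemany(
    "INSERT INTO verses VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("KJV", "John", 43, 1, 1, "In the beginning was the Word..."),
        ("KJV", "John", 43, 1, 2, "The same was in the beginning with God."),
        ("KJV", "1 John", 62, 2, 1, "My little children..."),
    ],
)

# A passage lookup such as "John 1:1-10" is a single-table query with no joins.
rows = conn.execute(
    "SELECT verse, verse_text FROM verses "
    "WHERE version = ? AND book = ? AND chapter = ? AND verse BETWEEN ? AND ? "
    "ORDER BY verse",
    ("KJV", "John", 1, 1, 10),
).fetchall()
```

Everything a passage search needs lives in one row, so the lookup never has to walk from version to book to chapter.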
So let’s break it down:
Here we’re just defining the variables and splitting up q, which was the search criteria in the
Next, we pull out the common words and add a notice for the user.
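As a rough sketch of that step, the common-word removal could look something like the following. The word list and the wording of the notice are assumptions, and strip_common_words is a hypothetical helper name.

```python
# Assumed stop-word list; the real list is not shown in this post.
COMMON_WORDS = {"the", "a", "an", "and", "of", "in", "to"}


def strip_common_words(q):
    """Split q on whitespace and separate common words from real search terms."""
    terms, common_words_used = [], []
    for word in q.split():
        if word.lower() in COMMON_WORDS:
            common_words_used.append(word)
        else:
            terms.append(word)
    return terms, common_words_used


terms, ignored = strip_common_words("the love of John")

# Build a notice only when something was actually dropped.
notice = f"Ignored common words: {', '.join(ignored)}" if ignored else None
```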
First we make sure that they provided a search term after removing common words.
As a last check before the real parsing, we see whether there is only one term, it’s not a number, and its length is 3 or more. If so, we find any books that it may match, and we mark the term as something we’re going to search for in the keyword search. The else is where the real meat begins.
In the for loop, we check whether any word is non-numeric and shorter than 3 characters. I throw out any searches for these super short words or partial words. I also add them to common_words_used, for checking against.
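The short-word filter described above might be sketched like this. The function name and the min_length parameter are hypothetical; only new_terms and common_words_used come from my code.

```python
def drop_short_words(terms, min_length=3):
    """Keep numbers and words of at least min_length; set aside the rest."""
    new_terms, common_words_used = [], []
    for term in terms:
        # Numbers are always kept: they may be chapters or book prefixes.
        if not term.isdigit() and len(term) < min_length:
            common_words_used.append(term)
        else:
            new_terms.append(term)
    return new_terms, common_words_used
```

For example, drop_short_words(["Jesus", "1", "Jo", "John", "2"]) keeps "Jesus", "1", "John", and "2", and sets "Jo" aside.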
In this chunk, I first assign the term as a keyword, for the keyword search.
Next, I determine if it’s a number. This means that the term could be a chapter or a leading number on a Bible book. In that case, we need to look at the previous and next terms. I compute the previous and next indexes, and then look up the previous term. As Python’s list wraps negative index lookups (list[-1] is valid and returns the last item in the list), I make sure the previous index is greater than or equal to 0 first. Once I find the previous term, I look up all book names from that term. If there are any, I assign the current term as the chapter number.
If I haven’t found a book yet, then I check on the next term. I find book names based on the next term, but I also include the current term, which would be a leading number on a book. If there are any books (which should only ever return 1 or 0), then I assign it to book.
For the next term, I have to wrap the lookup in try/except statements for when the index is out of bounds. Alternatively, I could get the length of new_terms and compare against it to make sure I’m within bounds.
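Here is a simplified sketch of the numeric-term handling described above. BOOKS is a tiny sample list, and find_books and resolve_number are hypothetical helpers; the substring matching in find_books is an assumption about how the book lookup behaves.

```python
BOOKS = ["John", "1 John", "2 John", "3 John", "Jude"]  # tiny sample list


def find_books(name):
    """Case-insensitive substring match against book names (assumed behaviour)."""
    return [b for b in BOOKS if name.lower() in b.lower()]


def resolve_number(new_terms, i):
    """Decide whether the numeric term at index i is a chapter or a book prefix."""
    term = new_terms[i]
    book, chapter = None, None

    # Guard the previous index: new_terms[-1] would silently wrap to the last item.
    prev_index = i - 1
    if prev_index >= 0 and find_books(new_terms[prev_index]):
        chapter = int(term)

    if chapter is None:
        try:
            next_term = new_terms[i + 1]
        except IndexError:  # the number was the last term
            next_term = None
        if next_term is not None:
            # Include the current term as a leading number, e.g. "1 John".
            matches = find_books(f"{term} {next_term}")
            if matches:
                book = matches[0]
    return book, chapter
```

With the terms ["Jesus", "1", "John", "2"], index 1 resolves to the book "1 John" and index 3 resolves to chapter 2.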
If the term was not numeric, then it was a keyword, common word, Bible book, or a chapter:verse(-verse range) combo. I already removed the common/short words, so it will either be a Bible book, a keyword, or a chapter+.
First, we get all the books based on the term. Second, we run a regular expression on the term, in the expected chapter:verse-verse2 format, where only chapter is required. If there’s a result from the regular expression, then we assign the appropriate variables (chapter, verse, and verse2, if it’s a range search).
If there were no results from the regex, and the length of the books is 0, then it must be a keyword. Finally if there are any books, we add them to our book array.
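The regular expression step could look something like this. The exact pattern is an assumption; it only needs to accept chapter, chapter:verse, and chapter:verse-verse2, with just the chapter required. PASSAGE_RE and parse_reference are hypothetical names.

```python
import re

# chapter, optionally followed by :verse, optionally followed by -verse2
PASSAGE_RE = re.compile(r"^(\d+)(?::(\d+)(?:-(\d+))?)?$")


def parse_reference(term):
    """Return (chapter, verse, verse2) as ints or None, or None if no match."""
    match = PASSAGE_RE.match(term)
    if match is None:
        return None
    chapter, verse, verse2 = (
        int(group) if group is not None else None for group in match.groups()
    )
    return chapter, verse, verse2
```

So "1:1-10" parses to (1, 1, 10), "1:1" to (1, 1, None), and a plain word such as "love" returns None and falls through to the book and keyword checks.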
For the last piece of the puzzle, we finalize building the criteria arguments and then do the actual lookups.
First we assign the given book. Next we assign the chapter. Last, we assign verse if there is no verse 2 or verse 2 is before verse 1. If that’s not the case, then we assign a range search.
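A minimal sketch of how those criteria could be assembled is below. The dictionary keys are loosely modelled on Django-style field lookups and are assumptions, as is the build_passage_args name.

```python
def build_passage_args(book=None, chapter=None, verse=None, verse2=None):
    """Build lookup criteria for a passage search from the parsed pieces."""
    args = {}
    if book is not None:
        args["book"] = book
    if chapter is not None:
        args["chapter"] = chapter
    if verse is not None:
        if verse2 is None or verse2 < verse:
            # No usable end verse: search for the single verse.
            args["verse"] = verse
        else:
            # Otherwise search the inclusive verse range.
            args["verse__range"] = (verse, verse2)
    return args
```

For "John 1:1-10" this yields book, chapter, and a verse range of (1, 10); for a backwards range such as 5-2 it falls back to verse 5 alone.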
For the passage search, we never do a passage search if there’s only one term given. So if you search for “John”, then you’ll only get keyword searches, but you will get the books with John in the title listed (John, 1 John, 2 John, 3 John). If you did a search for “1 John”, we won’t do a passage search without a chapter. We need to have several arguments, so args must be greater than 1, and we should have an array of all the books we want to look in, so that must be greater than 1 as well. We include this even if we have an exact book, which will override the IN statement.
For the keyword search, we first check that we don’t have any errors and that there’s at least one keyword argument. If so, then we find all the keyword passages based on these.
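As an illustration only, here is the keyword search run against an in-memory list instead of the database. Requiring every keyword to appear in the verse text is an assumption, and VERSES and keyword_search are hypothetical names with a few sample rows.

```python
# Sample rows as (book, chapter, verse, verse_text), with text shortened.
VERSES = [
    ("John", 3, 16, "For God so loved the world"),
    ("John", 11, 35, "Jesus wept."),
    ("1 John", 4, 8, "He that loveth not knoweth not God; for God is love."),
]


def keyword_search(verses, keywords, errors=()):
    """Return verses containing every keyword, or [] on errors or no keywords."""
    if errors or not keywords:
        return []
    lowered = [k.lower() for k in keywords]
    return [v for v in verses if all(k in v[3].lower() for k in lowered)]
```

Searching for ["Jesus"] returns only John 11:35, while an empty keyword list or any recorded error skips the search entirely.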
And that’s pretty much it. Overall, it ended up being quite a bit of work to figure out the exact intricacies of how a Bible search may work, but it ended up being worth it. I can guarantee there are some crazy searches that won’t work with my parsing, but it solves 95% of my use-cases. With search finished, the API is almost finished now.
Questions? E-mail me: this domain AT gmail DOT com