Search in Results

 

There are three types of queries you can use to search in your results, that is, within and across the articles or documents that QUOSA has retrieved: Boolean, Regular Expression, and Left Truncation. The detailed syntax for each query type is described later in this document.

 

Which query type is best for my task?

 

Boolean: use it for basic and advanced keyword and phrase searching. It supports logic operators (AND, OR, NOT), wildcards, proximity limits, parentheses to group query terms, subscripts and superscripts, and more. For multiple-word queries, Boolean assumes an OR between terms unless you specify otherwise. Boolean search is the fastest way to search through multiple documents.

 

For example, "statin extracted"~5 will find all documents in which the words "statin" and "extracted" occur within 5 words of each other, in either order.

 

Regular Expression: unlike Boolean, which searches on whole words only, Regular Expression searches through the text character by character. It can be very powerful, but it may be the least familiar to you.

 

Use it to search for a specific character string, such as an amino acid sequence that may appear in an article as part of a longer sequence and would therefore be missed by a Boolean query. You can also search on symbols: for example, FcRN-/- can be found by Regular Expression, but not by Boolean. You can also use it to find all articles where more than, say, 500 people are enrolled in a clinical trial.

A regular expression, often called a pattern, is an expression that describes a set of strings. Patterns are usually used to give a concise description of a set without listing all of its elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern "H(ä|ae?)ndel" (alternatively, the pattern is said to match each of the three strings).
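The Handel pattern can be checked with a short Python sketch. Python's re module is used here only as a stand-in; QUOSA's own regular expression engine may differ in details, and the list of names (including the non-matching "Hendel") is made up for illustration.

```python
import re

# Pattern from the text: either "ä", or "a" optionally followed by "e".
pattern = re.compile(r"H(ä|ae?)ndel")

# Hypothetical candidate spellings; "Hendel" is included as a non-match.
names = ["Handel", "Händel", "Haendel", "Hendel"]

# fullmatch requires the whole string to fit the pattern.
matches = [name for name in names if pattern.fullmatch(name)]
print(matches)
```

One pattern stands in for the three spellings, which is the "concise description of a set" idea described above.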

 

Left Truncation: use it when you know the end of the word that you seek, but the beginning of the word is unknown, ambiguous, or can simply vary.

 

For example, to find beta blockers such as "propranolol" and "atenolol," the following query can be used: *olol

 

How do I construct my query?

1.    Boolean

A Boolean query is made up of terms and operators. A term can be either a single term or a phrase.

A single term is a single word, such as "protein" or "acid"

A phrase is a group of words surrounded by double quotes, such as "heat shock"

Multiple terms can be combined with Boolean operators to form complex queries. The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a document if either of the terms exists in it.

The terms in a query are not case-sensitive.

1.1.           Operators                                                                       

Terms in Boolean queries can be combined using logic operators (OR, AND, "+", NOT, and "-").

Note: The OR, AND, and NOT operators must be entered in CAPS.

OR (or the || symbol)

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a document if either of the terms exists in it. The symbol || (two bars) can be used in place of the word OR.

For example, to search for documents that contain either "heat shock" or just "heat," specify the search expression as follows:

"heat shock" heat

 

or

 

"heat shock" OR heat

 

AND (or the && symbol)

 

The AND operator finds documents where both terms exist anywhere in the text. The symbol && (two ampersands) can be used in place of the word AND.

For example, to search for documents that contain "heat shock" and "heat protein," specify the search expression as follows:

"heat shock" AND "heat protein"

 

+ (plus sign)

The "+" operator (known as the required operator) requires that the term after the plus sign exist somewhere in a document.

For example, to search for documents that must contain "heat" and may contain "shock," specify the search expression as follows:

+heat shock

 

NOT (or the ! symbol)

 

The NOT operator excludes documents that contain the term after NOT. The symbol “!” (exclamation point) can be used in place of the word NOT.

To search for documents that contain "heat shock" but not "heat protein," specify the search expression as follows:

"heat shock" NOT "heat protein"

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

NOT "heat shock"

 

As a workaround, QUOSA adds the single word exclusionpattern to every document it indexes. As a result, to find all documents that do not contain the phrase "heat shock," you can use this query:

           

exclusionpattern NOT "heat shock"
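Why the workaround helps can be sketched with Python sets: NOT behaves like a set difference, so it needs a left-hand set to subtract from, and a word present in every document supplies the set of all documents. The toy corpus below is hypothetical and this is not QUOSA's implementation.

```python
# Hypothetical toy corpus: document id -> text.
docs = {
    1: "heat shock response",
    2: "protein folding",
    3: "heat transfer in metals",
}

# Assumption: every indexed document also carries the marker word
# "exclusionpattern", so that term matches the whole collection.
all_docs = set(docs)

# Documents containing the phrase "heat shock".
heat_shock = {doc_id for doc_id, text in docs.items() if "heat shock" in text}

# exclusionpattern NOT "heat shock"
result = all_docs - heat_shock
print(sorted(result))
```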

 

- (minus sign)

The "-" (minus sign) or prohibit operator excludes documents that contain the term after the minus symbol.

For example, to search for documents that contain "heat shock" but not "heat protein," specify the search expression as follows:

"heat shock" - "heat protein"

1.2.           Grouping

Parentheses can be used to group terms to form sub-queries, which can be very useful to control Boolean logic in a query.

For example, to search for documents that contain "protein" together with either "heat" or "shock," specify the search expression as follows:

(heat OR shock) AND protein
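How the operators and grouping combine can be sketched with Python set operations. The corpus and the containing helper are hypothetical, and whole-word lookup is a simplification; this is an illustration of the logic, not QUOSA's implementation.

```python
# Hypothetical toy corpus: document id -> text.
docs = {
    1: "heat shock protein expression",
    2: "heat transfer in metals",
    3: "septic shock treatment",
    4: "protein folding",
}

def containing(term):
    """Return the ids of documents that contain `term` as a whole word."""
    return {doc_id for doc_id, text in docs.items()
            if term in text.lower().split()}

heat = containing("heat")
shock = containing("shock")
protein = containing("protein")

or_result = heat | shock              # heat OR shock (same as: heat shock)
and_result = heat & protein           # heat AND protein
not_result = heat - protein           # heat NOT protein
grouped = (heat | shock) & protein    # (heat OR shock) AND protein

print(sorted(or_result), sorted(and_result), sorted(not_result), sorted(grouped))
```

The parentheses matter because the union is computed first and only then intersected with the protein set.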

1.3.           Wildcard Characters

There are two wildcard characters that can be used in Boolean queries. They are as follows:

* (asterisk symbol)

 

An asterisk (*) may be used to specify zero or more alphanumeric characters. For example, searching for the term h*s would find results that contain words such as “his,” “homes,” and “herbaceous.”

 

? (question mark symbol)

 

The question mark (?) may be used to represent a single alphanumeric character in a search expression. For example, searching for the term “ho?se” would find results that contain words such as “house” and “horse.”

 

Note: You cannot use * or ? as the first character of any term in a Boolean search expression. The following two queries, for example, cannot be used in Boolean search:

*ice

capsule AND *activity

Please see ‘Left Truncation’ query type below if you want to use this approach.
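The two wildcards behave much like shell-style patterns, so Python's fnmatch module gives a rough feel for them. This is only an analogy: fnmatch lets * and ? stand for any character, not just alphanumeric ones, and the word list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical list of indexed words.
words = ["his", "homes", "herbaceous", "house", "horse", "hose", "hoarse"]

# * stands for zero or more characters.
star = [w for w in words if fnmatchcase(w, "h*s")]

# ? stands for exactly one character.
question = [w for w in words if fnmatchcase(w, "ho?se")]

print(star)
print(question)
```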

 

1.4.           Fuzzy Searches

A fuzzy search can be used to find words similar in spelling. To create a fuzzy search, add the "~" (tilde) symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam," specify the fuzzy search as follows:

roam~

This search will find terms such as “foam” and “roams.”
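One common way to define "similar in spelling" is edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below assumes that measure and a threshold of 1; the text does not state which measure or threshold QUOSA uses, and the candidate list is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

# Hypothetical candidate words from an index.
candidates = ["foam", "roams", "room", "protein"]

# Assumed threshold: at most one edit away from "roam".
close = [w for w in candidates if edit_distance("roam", w) <= 1]
print(close)
```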

1.5.           Proximity Searches

A proximity search can be used to find words that are within a specific distance of each other. To create a proximity search, enclose the words in double quotes and add the "~" (tilde) symbol followed by the maximum distance in words. For example, to search for the words "heat" and "shock" within 10 words of each other in a document, specify the search as follows:

"heat shock"~10
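The idea behind a proximity search can be sketched in Python as a comparison of word positions. The within helper is hypothetical, and measuring distance as the difference between word positions is an assumption; QUOSA may count distance differently.

```python
import re

def within(text, first, second, max_distance):
    """True if `first` and `second` occur at most `max_distance`
    word positions apart, in either order."""
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance
               for i in first_pos for j in second_pos)

sentence = "Heat stress induces a strong shock response"
print(within(sentence, "heat", "shock", 10))  # positions 0 and 5
print(within(sentence, "heat", "shock", 3))
```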

 

1.6.           Searching for an expression with subscript or superscript (supported only in the PC version)

 

Documents containing expressions with subscripts or superscripts can be found in the following way:

If the keyword of interest contains a subscript, surround the subscript text with [SB] tags in the search query, as in [SB]xxx[SB].

If the keyword of interest contains a superscript, surround the superscript text with [SP] tags in the search query, as in [SP]xxx[SP].

 

For example, you need to enter

 

[SP]14[SP]C to search for ¹⁴C

10[SP]4[SP] to search for 10⁴

CO[SB]2[SB] to search for CO₂

p27[SP]kip1[SP] to search for p27ᵏⁱᵖ¹

 

Subscripts and superscripts are most reliably searchable in HTML versions of full-text articles, rather than in PDFs.
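If you build such queries programmatically, the tagging rule is easy to capture in a small helper. The tagged function below is hypothetical (it is not part of QUOSA); it only wraps the subscript or superscript text in the [SB] or [SP] tags described above.

```python
def tagged(text, kind):
    """Wrap `text` in [SB] tags (kind="sub") or [SP] tags (kind="sup")."""
    tag = {"sub": "SB", "sup": "SP"}[kind]
    return f"[{tag}]{text}[{tag}]"

queries = [
    tagged("14", "sup") + "C",       # carbon-14
    "10" + tagged("4", "sup"),       # ten to the fourth power
    "CO" + tagged("2", "sub"),       # carbon dioxide
    "p27" + tagged("kip1", "sup"),
]
print(queries)
```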

 

1.7.           Escaping Special Characters

Boolean search supports escaping special characters that are part of the search syntax. The current list of special characters includes the following:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, use the “\” (backslash symbol) before the character. For example, to search for (1+1):2, specify the search expression as follows:

\(1\+1\)\:2
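Escaping by hand is error-prone, so a small helper can do it. The escape_query function below is a hypothetical sketch, not a QUOSA API. It assumes that each & and | character is escaped individually, since the list above names them only as the pairs && and ||.

```python
# Characters from the special-character list above.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text):
    """Prefix every special character in `text` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

escaped = escape_query("(1+1):2")
print(escaped)
```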

 

 


2.    Regular Expressions

Regular expressions are made up of normal characters and metacharacters.

Normal characters include upper- and lowercase letters and digits. In QUOSA, regular expressions are case-insensitive.

Metacharacters are symbols (such as the dollar sign) that have special meanings (described below).

In the simplest case, a regular expression looks like a standard search string. For example, the regular expression “testing” contains no metacharacters. It will match “testing,” “123testing,” and “Testing,” but it will not match “sting.”

The following metacharacters can be used with regular expressions:

.

 

Matches any single character. For example, the regular expression r.t would match the strings rat, rut, rot, but not root or r t

^

 

Matches the beginning of a word. For example, the regular expression ^the would match the word "therefore" and the word "the" in the string "in the event," but would not match "otherwise."

$

 

Matches the end of a word. For example, the regular expression weasel$ would match the word weasel but not the word weasels. 

*

 

Matches zero or more occurrences of the character immediately preceding. For example, the regular expression .* means match any number of any characters. 

+

 

Matches one or more occurrences of the character or regular expression immediately preceding. For example, the regular expression 9+ matches 9, 99, 999.

?

 

Matches zero or one occurrence of the character or regular expression immediately preceding. For example, the regular expression colou?r matches color and colour.

\

 

This is the quoting character that is used to treat the character that follows as an ordinary character. For example, \$ is used to match the dollar-sign character ($) rather than the end of a word. Similarly, the expression \. is used to match the period character rather than any single character. 

[ ] 
[c1-c2]
[^c1-c2]

 

Matches any one of the characters between the brackets. For example, the regular expression r[aou]t matches rat, rot, and rut, but not ret. Ranges of characters can be specified by using a hyphen. For example, the regular expression [0-9] means match any digit. Multiple ranges can be specified as well. The regular expression [A-Za-z] means match any upper- or lowercase letter. To match any character except those in the range, the complement range, use the caret as the first character after the opening bracket. For example, the expression [^269A-Z] will match any characters except 2, 6, 9, and uppercase letters. 

( )

 

Treats the expression between the left and right parentheses as a group. Use with the quantity modifiers (*, +, ?, {}) and with |.

|

 

Matches either of two alternatives. For example, t(ry|op) matches try and top but not toy.

{i}
{i,j}

 

Matches a specific number of instances or instances within a range of the preceding character. For example, the expression A[0-9]{3} will match "A" followed by exactly three digits (that is, it will match A123 but not A1234). The expression [0-9]{4,6} matches any sequence of 4, 5, or 6 digits.
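Several of the metacharacter examples above can be tried out in Python. The sketch uses re.fullmatch on single words with re.IGNORECASE to approximate the word-level, case-insensitive matching described here; Python's re module is a stand-in and may not behave exactly like QUOSA's engine, and the matches_word helper is hypothetical.

```python
import re

def matches_word(pattern, word):
    """True if the whole `word` fits `pattern`, ignoring case."""
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None

results = {
    ("r[aou]t", "rat"): matches_word(r"r[aou]t", "rat"),
    ("r[aou]t", "ret"): matches_word(r"r[aou]t", "ret"),
    ("t(ry|op)", "top"): matches_word(r"t(ry|op)", "top"),
    ("t(ry|op)", "toy"): matches_word(r"t(ry|op)", "toy"),
    ("9+", "999"): matches_word(r"9+", "999"),
    ("A[0-9]{3}", "A123"): matches_word(r"A[0-9]{3}", "A123"),
    ("A[0-9]{3}", "A1234"): matches_word(r"A[0-9]{3}", "A1234"),
    ("[0-9]{4,6}", "12345"): matches_word(r"[0-9]{4,6}", "12345"),
    ("[0-9]{4,6}", "123"): matches_word(r"[0-9]{4,6}", "123"),
}

for (pattern, word), ok in results.items():
    print(f"{pattern!r} vs {word!r}: {ok}")
```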

 

To match multiple-word phrases, separate each word with a single space. For example, the regular expression th.* .*s f.n.? will match “this is fine” and “that was fun,” but not “the cat was found.”

 

Examples:

 

The simplest metacharacter is the dot. It matches any one character (excluding the new-line character). Consider a file named test.txt consisting of the following lines:

he is a rat
he is in a rut
the food is Rotten
I like root beer

The regular expression r.t matches an r followed by any character followed by a t. It will match rat and rut. It will also match the Rot in Rotten because regular expressions in QUOSA are case-insensitive.
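The r.t example can be reproduced in Python. The re.IGNORECASE flag is used to mirror the case-insensitive matching described for QUOSA; Python's re module is only a stand-in for QUOSA's engine.

```python
import re

# The four lines of the test.txt example.
lines = [
    "he is a rat",
    "he is in a rut",
    "the food is Rotten",
    "I like root beer",
]

# An r, then any one character, then a t, ignoring case.
pattern = re.compile(r"r.t", re.IGNORECASE)

found = [pattern.findall(line) for line in lines]
for line, hits in zip(lines, found):
    print(f"{line!r}: {hits}")
```

The last line yields no match because "root" has two characters between the r and the t.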

To match characters at the beginning of a word, use the circumflex character "^" (sometimes called a caret). For example, to find the words in test.txt that begin with the string "he," you might first think of using the simple expression he. However, this would also match the "he" inside "the" in the third line. The regular expression ^he, however, would match "he" only at the beginning of a word.
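The difference between matching "he" anywhere and matching it only at the start of a word can be sketched in Python. Note the assumption: in Python's re module, ^ anchors to the start of the string or line rather than the start of a word, so the sketch uses the word boundary \b to imitate the behavior described here.

```python
import re

text = "he is a rat\nhe is in a rut\nthe food is Rotten\nI like root beer"

# Words containing "he" anywhere.
anywhere = re.findall(r"\w*he\w*", text, re.IGNORECASE)

# Words that begin with "he" (\b marks a word boundary in Python).
word_start = re.findall(r"\bhe\w*", text, re.IGNORECASE)

print(anywhere)
print(word_start)
```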

Sometimes it’s easier to indicate something that should not be matched rather than all the cases that should be matched. When the circumflex is the first character between square brackets, it means to match any character that is not in the range. For example, to match he when it is not preceded by t or s, the following regular expression can be used:

[^st]he

Character ranges can be specified between the square brackets. For example, the regular expression [A-Z] matches any letter in the alphabet, upper- or lowercase, because regular expressions in QUOSA are case-insensitive. For the same reason, the regular expression [a-z] is equivalent. The regular expression [A-Z][A-Z]* matches a letter followed by zero or more letters. You can use the + metacharacter to do the same thing; that is, the regular expression [A-Z]+ means the same thing as [A-Z][A-Z]*.

To specify the number of occurrences matched, use braces. For example, to match all instances of 100 and 1000 but not 10 or 10000, use the following:

 

10{2,3}

 

This regular expression matches the digit 1 followed by either two or three 0's. A useful variation is to omit the second number. For example, the regular expression 0{3,} will match three or more successive 0's.         
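The brace quantifiers can be tried in Python as follows. The word boundaries (\b) in the first pattern are an assumption added to imitate whole-word matching; without them, Python would also find 1000 inside 10000.

```python
import re

text = "10 100 1000 10000"

# A 1 followed by two or three 0s, as a whole word.
matches = re.findall(r"\b10{2,3}\b", text)

# Three or more successive 0s, anywhere in the text.
three_or_more = re.findall(r"0{3,}", text)

print(matches)
print(three_or_more)
```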


3.    Left Truncation

You can perform searches to find words that end in certain letters. This type of search is called left truncation, meaning part of the word to the left is ignored when searching for words with a common ending. An asterisk is used as part of this search.

 

For example, to find all words ending in "olol" in a set of articles, enter

 

*olol

 

as the left truncation search term. QUOSA will find words such as "propranolol," "atenolol," and so on.
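In effect, a left truncation query is a suffix test on each word. A minimal Python sketch, using a made-up sentence and plain string methods rather than anything QUOSA-specific:

```python
import re

# Hypothetical sentence standing in for article text.
text = "Propranolol and atenolol were compared with metoprolol and placebo."

# Split into lowercase words, then keep those ending in the query suffix.
words = re.findall(r"\w+", text.lower())
suffix = "*olol".lstrip("*")   # the part after the leading asterisk
hits = [w for w in words if w.endswith(suffix)]

print(hits)
```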