JavaDoc Search Specification
This document specifies the behaviour of the JavaDoc search feature for JDK 22.
Overview
The JavaDoc Search feature was introduced in JDK 9 with JEP 225. The initial JEP did not include a fine-grained specification of the search algorithm, and the algorithm has evolved considerably since the initial implementation. The purpose of this document is to provide an up-to-date specification of the search algorithm in documentation generated by the JavaDoc standard doclet.
Definitions
In this document, the term entity is used to describe an artifact in documented code that is discoverable through the JavaDoc Search feature. This includes program elements and other items defined in documentation comments.
The term signature is used to describe the exact text used to represent an entity in JavaDoc Search.
The terms separator, identifier, name, simple name, qualified name, and fully qualified name are used as defined in the Java Language Specification sections 3.8, 3.11, 6.2, and 6.7. This implies the use of Unicode for the encoding and processing of entity signatures.
The terms letter, uppercase letter, lowercase letter, digit, and white space refer to the "Letter", "Uppercase_Letter", "Lowercase_Letter", "Decimal_Number", and "Space_Separator" general categories of the Unicode standard.
The term camel-case is used to describe mixed-case identifiers that make use of uppercase letters to mark word boundaries within the identifier.
The term query string is used to describe the characters entered in the search input box by the user.
Examples in the following sections refer to or are taken from the standard Java SE class libraries.
Categories of Searchable Entities
The following sections list the kinds of program elements and other entities covered by JavaDoc Search and the format of their signatures.
Modules
The signature of a named module is the module name.
Example signature:
java.base
Packages
If a package is in a named module, the signature of the package is the name of the module, followed by '/', followed by the fully qualified name of the package; if a package is not in a named module, the signature is the fully qualified name of the package.
Example signature:
java.base/java.util.concurrent
Types
The signature of a class or interface type is its fully qualified type name.
Example signatures:
java.lang.Object
java.util.Map.Entry
Members
The signature of a member is the fully qualified name of its containing type, followed by '.', followed by the simple name of the member, followed by a list of parameter types if the member is a constructor or method. The list of parameter types is '(', followed by the simple names of formal parameter types of the constructor or method separated by ', ', followed by ')'.
Example signatures:
java.lang.Object.wait(long, int)
java.lang.String.String(String)
java.lang.Byte.MAX_VALUE
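The package and member signature formats described above can be illustrated with a few lines of Java. This is a minimal sketch and not part of the standard doclet: the class `Signatures` and its method names are hypothetical, and the inputs are assumed to be names that have already been resolved.

```java
import java.util.List;

/** Illustrative helper; the class and method names are hypothetical. */
final class Signatures {
    private Signatures() {}

    /** Package signature: "module/package" in a named module, else the package name alone. */
    static String ofPackage(String moduleName, String packageName) {
        return (moduleName == null || moduleName.isEmpty())
                ? packageName
                : moduleName + "/" + packageName;
    }

    /** Field signature: fully qualified containing type, '.', simple member name. */
    static String ofField(String containingType, String simpleName) {
        return containingType + "." + simpleName;
    }

    /** Constructor or method signature: appends "(T1, T2)" built from simple parameter type names. */
    static String ofExecutable(String containingType, String simpleName, List<String> paramTypes) {
        return containingType + "." + simpleName + "(" + String.join(", ", paramTypes) + ")";
    }
}
```

For example, `Signatures.ofExecutable("java.lang.Object", "wait", List.of("long", "int"))` yields `java.lang.Object.wait(long, int)`, which is the first example signature listed above.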
JavaDoc Tags
Various JavaDoc tags can be used to create searchable entities.
- The {@index} tag is used to create generic searchable entities.
- The {@systemProperty} tag is used to document system properties.
- The @spec tag is used to refer to external specifications.
The signature for each of these tags is a string provided in the tag.
Example signatures and tags:
- Java Collections Framework for {@index "Java Collections Framework"}
- jar for {@index jar jar tool}
- user.timezone for {@systemProperty user.timezone}
- Java Native Interface Specification for @spec jni/index.html Java Native Interface Specification
Search Rules
The following sections describe the rules used to search entity signatures for a given query string.
Although the purpose of this document is not to describe the implementation of the Search feature, an understanding of the sequence of actions is helpful in understanding the rules applied. A search involves the following steps:
- The query string is parsed and compiled into a pattern.
- The pattern is matched against the signatures of all searchable entities.
- Entities with matching signatures are filtered and scored according to the rules described in the sections below.
- Entities with a score exceeding a certain threshold are presented to the user ordered by their score.
The scoring mechanism in step 3 is subtractive: it starts with a high score for all matching entities and diminishes the score to rank entities lower or exclude them from the results.
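The sequence of steps above can be sketched in Java as follows. This is an illustration of the subtractive approach only, not the actual implementation: the class name, the starting score, the penalties and the threshold are all assumptions, and the pattern is simplified to a case-insensitive literal match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: names, penalties and threshold are assumptions. */
final class SearchPipelineSketch {
    static final int START_SCORE = 100; // hypothetical starting score
    static final int THRESHOLD = 50;    // hypothetical cut-off

    /** Steps 1 to 3 for one signature; returns Integer.MIN_VALUE if there is no match. */
    static int score(String query, String signature) {
        String q = query.strip();
        if (q.isEmpty()) return Integer.MIN_VALUE;
        // Step 1: compile the query string into a pattern (here a case-insensitive literal).
        Pattern pattern = Pattern.compile(Pattern.quote(q), Pattern.CASE_INSENSITIVE);
        // Step 2: match the pattern against the signature.
        Matcher m = pattern.matcher(signature);
        if (!m.find()) return Integer.MIN_VALUE;
        // Step 3: start high and subtract penalties.
        int score = START_SCORE;
        score -= 2 * (signature.length() - m.end()); // unmatched text after the match
        if (!m.group().equals(q)) score -= 10;       // capitalization differs from the query
        return score;
    }

    /** Step 4: keep signatures above the threshold, ordered by descending score. */
    static List<String> search(String query, List<String> signatures) {
        List<String> hits = new ArrayList<>();
        for (String s : signatures) {
            if (score(query, s) > THRESHOLD) hits.add(s);
        }
        hits.sort(Comparator.comparingInt((String s) -> score(query, s)).reversed());
        return hits;
    }
}
```

With these assumed penalties, searching for `java.lang.ref` among `java.lang.reflect`, `java.util` and `java.lang.ref` returns `java.lang.ref` first and `java.lang.reflect` second, and drops `java.util`.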
Case Sensitivity
The search pattern is matched against signatures in a case-insensitive manner. If the query string contains uppercase letters, signatures with matching capitalization are scored higher than ones with non-matching capitalization. Additionally, query strings containing uppercase letters cause the rules in the Camel-Case Matches section to be applied.
Query String | Matches |
---|---|
Object | type java.lang.Object |
object | type java.lang.Object |
obJECT | type java.lang.Object |
MAX_VALUE | member java.lang.Byte.MAX_VALUE |
max_value | member java.lang.Byte.MAX_VALUE |
max_VALUE | member java.lang.Byte.MAX_VALUE |
Word Boundaries
Word boundaries play an important role in determining the score of matching entities. The following are considered word boundaries in entity signatures:
- The beginning or end of an identifier
- One or more letters delimited by non-letter characters
- One or more digits delimited by non-digit characters
- An uppercase letter within a string of mixed-case letters and digits
Left Word Boundaries
The beginning of a match in an entity's signature must be a left word boundary, or a separator preceding a left word boundary, in order for the entity to be included in the search results.
Query String | Matches | Does not match |
---|---|---|
base | module java.base | type java.sql.DatabaseMetaData |
.util | package java.util | type javax.swing.SwingUtilities |
map | types java.util.Map, java.util.HashMap | type javax.swing.text.Keymap |
.map | type java.util.Map | types java.util.HashMap, javax.swing.text.Keymap |
val | member java.lang.Byte.MAX_VALUE | type java.nio.InvalidMarkException |
32 | type java.util.zip.Adler32 | @spec tag for RFC 1323 |
Matches may be scored differently depending on the type of the left word boundary they begin at. For example, a match starting at the beginning of an identifier may be scored higher than one starting in the middle of a camel-case identifier.
Example:
- Query string set matches types java.util.Set and java.util.HashSet, but the former is ranked higher than the latter.
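The left word boundary rule can be checked mechanically. The following Java sketch is one possible reading of the rules in this section, not the actual implementation: the class and method names are hypothetical, it works on `char` values rather than Unicode code points, and it treats an uppercase letter as a boundary when it follows a lowercase letter or precedes one.

```java
/** Illustrative only; names are hypothetical and the rules are a simplified reading. */
final class LeftBoundarySketch {
    /** True if the character at index i of the signature starts a word. */
    static boolean isLeftWordBoundary(String s, int i) {
        if (i < 0 || i >= s.length()) return false;
        char c = s.charAt(i);
        if (!Character.isLetterOrDigit(c)) return false;
        if (i == 0) return true;
        char prev = s.charAt(i - 1);
        if (Character.isLetter(c) && !Character.isLetter(prev)) return true; // letters after a non-letter
        if (Character.isDigit(c) && !Character.isDigit(prev)) return true;   // digits after a non-digit
        if (!Character.isUpperCase(c)) return false;
        // Uppercase letter inside a mixed-case identifier (camel-case).
        boolean nextIsLower = i + 1 < s.length() && Character.isLowerCase(s.charAt(i + 1));
        return Character.isLowerCase(prev) || nextIsLower;
    }

    /** True if the query occurs (ignoring case) at a left word boundary or at a separator preceding one. */
    static boolean matches(String query, String signature) {
        if (query.isEmpty()) return false;
        for (int i = 0; i + query.length() <= signature.length(); i++) {
            if (!signature.regionMatches(true, i, query, 0, query.length())) continue;
            if (isLeftWordBoundary(signature, i)) return true;
            boolean separator = !Character.isLetterOrDigit(signature.charAt(i));
            if (separator && isLeftWordBoundary(signature, i + 1)) return true;
        }
        return false;
    }
}
```

Under these assumptions, `matches("map", "java.util.HashMap")` is true because `M` follows a lowercase letter, while `matches("map", "javax.swing.text.Keymap")` is false because the only occurrence starts in the middle of a lowercase word.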
Right Word Boundaries
The end of the query string is not required to match a right word boundary in an entity's signature in order for the entity to be included in the search results.
Query String | Matches |
---|---|
Obj | type java.lang.Object |
j.l.o | type java.lang.Object |
However, matches that include a right word boundary are scored higher than matches which do not (and therefore only match part of an identifier, name or word).
Example:
- Query string java.lang.ref matches both packages java.lang.ref and java.lang.reflect, but the former is ranked higher than the latter because it matches the whole signature.
Camel-Case Matches
Since uppercase letters followed by lowercase letters or digits are also considered word boundaries, the rules for left and right word boundaries also apply to camel-case signatures.
In addition, when searching for camel-case signatures, some or all of the lowercase letters or digits between the uppercase characters can be omitted from the query string.
Query String | Matches |
---|---|
FileInStr | type java.io.FileInputStream |
FIS | type java.io.FileInputStream |
j.io.FileInpS | type java.io.FileInputStream |
FileInStr(FiD | member java.io.FileInputStream.FileInputStream(FileDescriptor) |
FInpS(FD | member java.io.FileInputStream.FileInputStream(FileDescriptor) |
FINPS(FD | no match as not all uppercase letters match a camel-case name |
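One way to picture camel-case matching is as a translation of the query string into a regular expression. The Java sketch below is an assumption about how such a translation could look, not the actual implementation: the class and method names are hypothetical, the left word boundary check is omitted, and only the camel-case rule is covered (the plain case-insensitive match from the Case Sensitivity section would be tried separately). Uppercase query letters must match exactly and may be preceded by skipped lowercase letters or digits; other characters match case-insensitively, and the remainder of an identifier may be skipped before a non-alphanumeric character.

```java
import java.util.regex.Pattern;

/** Illustrative only; names are hypothetical and left word boundaries are not checked. */
final class CamelCaseSketch {
    /** Translates a query string into a camel-case regular expression. */
    static Pattern compile(String query) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            String literal = Pattern.quote(String.valueOf(c));
            if (Character.isUpperCase(c)) {
                // Lowercase letters and digits before the next uppercase letter may be omitted.
                if (i > 0) regex.append("[\\p{Ll}\\p{Nd}]*");
                regex.append(literal); // uppercase letters must match exactly
            } else {
                // The rest of an identifier may be omitted before a separator such as '.' or '('.
                if (i > 0 && !Character.isLetterOrDigit(c)) regex.append("[\\p{L}\\p{Nd}_$]*");
                regex.append("(?iu:").append(literal).append(")");
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String query, String signature) {
        return compile(query).matcher(signature).find();
    }
}
```

With this translation, `FIS` and `j.io.FileInpS` both match `java.io.FileInputStream`, whereas `FINPS(FD` matches nothing because `N` and `P` have no corresponding uppercase letters in the signature.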
White Space and Multiple Search Terms
White space in the query string is significant if it occurs between non-space characters. Regions of non-space characters in the query string separated by white space are considered search terms. The following rules apply for query strings with multiple search terms:
- All search terms must match the signature in the order in which they occur in the query string, although not necessarily in a contiguous region.
- The rules for simple query strings are applied to each individual search term. In particular, the match for each search term must be located on a left word boundary in the signature.
The number of white space characters between search terms is not significant. Leading and trailing white space in the query string is never significant. A query string must contain at least one non-space character to trigger a search.
Query String | Matches |
---|---|
string append long | methods java.lang.StringBuffer.append(long) and java.lang.StringBuilder.append(long) |
obj eq o o | methods java.util.Objects.equals(Object, Object) and java.util.Objects.deepEquals(Object, Object) |
java frame | type java.awt.Frame and search tag Java Collections Framework |
ob ject | no match as ject does not match a left word boundary in Java SE |
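The rules for multiple search terms can be sketched as a left-to-right scan. The Java code below is a simplified illustration, not the actual implementation: the names are hypothetical, terms are matched as case-insensitive literals only (no camel-case abbreviation and no leading separators), and the word boundary test is a reduced version of the rules in the Word Boundaries section.

```java
/** Illustrative only; names are hypothetical and matching is simplified to literals. */
final class MultiTermSketch {
    /** Reduced left word boundary test: start of a letter run, digit run, or camel-case word. */
    static boolean startsWord(String s, int i) {
        char c = s.charAt(i);
        if (!Character.isLetterOrDigit(c)) return false;
        if (i == 0) return true;
        char prev = s.charAt(i - 1);
        return (Character.isLetter(c) && !Character.isLetter(prev))
                || (Character.isDigit(c) && !Character.isDigit(prev))
                || (Character.isUpperCase(c) && Character.isLowerCase(prev));
    }

    /** True if every search term matches in order, each at a left word boundary. */
    static boolean matches(String query, String signature) {
        int from = 0;
        boolean sawTerm = false;
        // Terms are separated by one or more Space_Separator characters.
        for (String term : query.split("\\p{Zs}+")) {
            if (term.isEmpty()) continue; // produced by leading white space
            sawTerm = true;
            int hit = -1;
            for (int i = from; i + term.length() <= signature.length(); i++) {
                if (signature.regionMatches(true, i, term, 0, term.length())
                        && startsWord(signature, i)) {
                    hit = i;
                    break;
                }
            }
            if (hit < 0) return false;
            from = hit + term.length(); // later terms must match after this one
        }
        return sawTerm; // a query with only white space triggers no search
    }
}
```

Taking the earliest possible match for each term is sufficient here because every term is a fixed-length literal, so an earlier match never leaves less room for the terms that follow.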
Core Signature Regions
For some kinds of program elements, additional filtering rules are applied depending on the location of the match within the signature. This is done to prioritize matches in parts of the signature that are particular to the program element over matches in less significant areas.
The part of a signature considered most significant for a program element is called the core region. The table below lists the program elements for which this mechanism is applied, and the core regions of their signatures.
Program Element | Core signature region |
---|---|
Package | Fully qualified package name |
Type | Simple type name |
Member | Simple member name |
With the exceptions documented below, a match is excluded from the search results unless it covers at least part of the core signature region. If the query string has multiple search terms, at least one of them must match a part of the core signature region.
Query String | Matches |
---|---|
java.base | module java.base but not any packages contained therein |
java.lang | package java.lang and its subpackages but not types contained therein |
java.util.Map | type java.util.Map but not type java.util.Map.Entry |
While formal parameter types of executable members are not part of the core signature region, omitting the member name and starting the query string with ( bypasses core region filtering to allow searching for executable members with specific parameter types.
Query String | Matches |
---|---|
int | type java.lang.Integer but not method java.lang.String.valueOf(int) |
(int | methods and constructors with int as first parameter type |
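The core region filtering described in this section amounts to an interval overlap test. The Java sketch below shows one way to express it and is not the actual implementation: the class, enum and method names are hypothetical, the core region is derived from the signature text alone, and the bypass for query strings starting with ( is not modelled.

```java
/** Illustrative only; names are hypothetical and the "(" bypass is not modelled. */
final class CoreRegionSketch {
    enum Kind { PACKAGE, TYPE, MEMBER }

    /** Returns {start, end} (end exclusive) of the core region within the signature. */
    static int[] coreRegion(Kind kind, String signature) {
        switch (kind) {
            case PACKAGE: {
                // Fully qualified package name: everything after "module/", if present.
                int slash = signature.indexOf('/');
                return new int[] { slash + 1, signature.length() };
            }
            case TYPE: {
                // Simple type name: everything after the last '.'.
                return new int[] { signature.lastIndexOf('.') + 1, signature.length() };
            }
            default: {
                // Simple member name: between the last '.' before '(' and the '(' itself.
                int paren = signature.indexOf('(');
                int end = paren < 0 ? signature.length() : paren;
                int start = signature.lastIndexOf('.', end - 1) + 1;
                return new int[] { start, end };
            }
        }
    }

    /** True if the match [matchStart, matchEnd) covers at least part of the core region. */
    static boolean overlapsCore(Kind kind, String signature, int matchStart, int matchEnd) {
        int[] core = coreRegion(kind, signature);
        return matchStart < core[1] && matchEnd > core[0];
    }
}
```

For the query string java.util.Map, the match covers indices 0 to 13 of both `java.util.Map` and `java.util.Map.Entry`. It overlaps the core region `Map` of the first signature but not the core region `Entry` of the second, so only the first is kept.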
Listing Child Elements
Any query string that matches a program element can be turned into a query string that matches all its child elements by appending the separator character used in the signatures of child elements. This bypasses core region filtering for contained elements to include them in the search results.
Query String | Matches |
---|---|
j.b | module java.base but not the packages contained therein |
j.b/ | packages contained in module java.base |
java.lang | package java.lang and its subpackages, but not types contained therein |
java.lang. | types in and subpackages of package java.lang |
system | type java.lang.System but not its members |
system. | members of type java.lang.System |
Supported Browsers
The search feature is supported in the following browsers.
Browser | Version | Platform |
---|---|---|
Apple Safari | tbd | macOS |
Google Chrome | tbd | All supported OSs |
Microsoft Edge | tbd | Windows OSs |
Mozilla Firefox | tbd | All supported OSs |