In non-declined languages such as English, the parts of speech of our words are usually just 'adjective' and 'noun'. A language "declines" its noun phrases if the words in a noun phrase take different forms depending on the function of the noun phrase in a sentence. In German, for example, adjectives take suffixes that depend on the gender of the noun being modified and on the function of the noun phrase in the sentence (subject, direct object, etc.). In a declined language, it might be desirable to use separate parts of speech for the separate declined adjective and noun forms.
NounPhraseWithVocab : NounPhraseProd
A language that depends on upper/lower case as a marker of match strength will need to override this to consider the case-fold flag as significant in determining match exactness. In addition, a language that uses additional string comparator flags to indicate better (rather than worse) matches will have to override this to require the presence of those flags.
Language modules might need to override this to supplement the filtering with their own rules. This generic base version considers truncation: an untruncated match is stronger than a truncated match. Non-English languages might want to consider other lexical factors in the match strength, such as whether we matched the exact accented characters or approximated them with unaccented equivalents. This information will, of course, need to be coordinated with the dictionary's string comparator and reflected in the comparator match flags, since it's the comparator match flags that we're looking at here.
The main purpose of this routine is to eliminate unwanted redundancy from the dictionary matches. In particular, the dictionary might have multiple matches for a given word at a given object, due to truncation, upper/lower case folding, accent removal, and so on. In general, we want to keep only the single strongest match from the dictionary for a given word matching a given object.
The meaning of "stronger" and "exact" matches is language-dependent, so we abstract these with the separate methods dictMatchIsExact() and dictMatchIsStronger().
Keep in mind that the raw dictionary match list has alternating entries: object, comparator flags, object, comparator flags, etc. The return list should be in the same format.
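The filtering described above can be sketched as follows. This is a language-neutral illustration in Python, not the library's own implementation. The flag names and bit values (CASE_FOLD, TRUNCATED) are hypothetical stand-ins for whatever the dictionary's string comparator actually reports, and the two helper functions model only the generic truncation rule.

```python
# Hypothetical comparator flag bits (assumed for illustration only;
# the real values are defined by the dictionary's string comparator).
CASE_FOLD = 0x1
TRUNCATED = 0x2


def dict_match_is_exact(flags):
    """Generic rule: a match is exact when it was not truncated."""
    return (flags & TRUNCATED) == 0


def dict_match_is_stronger(flags1, flags2):
    """Generic rule: an untruncated match beats a truncated one."""
    return (flags1 & TRUNCATED) == 0 and (flags2 & TRUNCATED) != 0


def filter_dict_matches(matches):
    """Drop each inexact entry that has a stronger entry for the same object.

    'matches' alternates object, flags, object, flags, ...; the result
    uses the same alternating layout.
    """
    result = []
    for i in range(0, len(matches), 2):
        obj, flags = matches[i], matches[i + 1]
        superseded = False
        # Only inexact matches can be superseded by a stronger one.
        if not dict_match_is_exact(flags):
            for j in range(0, len(matches), 2):
                if (j != i and matches[j] == obj
                        and dict_match_is_stronger(matches[j + 1], flags)):
                    superseded = True
                    break
        if not superseded:
            result.extend([obj, flags])
    return result


raw = ["lamp", TRUNCATED, "lamp", 0, "lantern", TRUNCATED]
filtered = filter_dict_matches(raw)
```

Here the truncated match for "lamp" is dropped because an untruncated match exists for the same object, while the truncated match for "lantern" survives because nothing stronger competes with it. This sketch does not merge entries of equal strength. A language module with its own rules would replace the two helper functions rather than the loop.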
For each adjusted token, the list must have two entries: the first is a string giving the token text, and the second is the property giving the part of speech for the token.
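As a rough illustration of that layout, the following Python sketch flattens (text, part of speech) pairs into the two-entries-per-token list. The NOUN and ADJECTIVE constants are hypothetical placeholders for the part-of-speech properties, and the helper name is invented for this example.

```python
# Hypothetical placeholders for the part-of-speech properties.
NOUN = "noun"
ADJECTIVE = "adjective"


def flatten_adjusted_tokens(pairs):
    """Turn (text, part_of_speech) pairs into a flat list with
    two entries per token: the text, then the part of speech."""
    flat = []
    for text, part_of_speech in pairs:
        flat.extend([text, part_of_speech])
    return flat


tokens = flatten_adjusted_tokens([("brass", ADJECTIVE), ("lamp", NOUN)])
```

The resulting list always has an even length, with token text at the even (zero-based) positions and the matching part of speech immediately after each one.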