Odd and missing highlighting occurs in preview of items captured by Compliance Accelerator or Discovery Accelerator searches.

Article ID: 1000014341

Description

Workaround

The following workarounds exist to address some of the odd and missing highlighting:

  1. For odd highlighting:
    1. Ensure the following characters are not present in the criteria: % &
    2. Ensure all hot words or hot phrases that have ending double quotes also have beginning double quotes (for example, Bond Fund" should be "Bond Fund").
    3. Remove any spaces separating double quoted criteria on the same line (for example, "NASD" "review*" should be "NASD""review*").
  2. For missing highlighting, ensure no spaces exist between double quoted criteria on the same line when one or more of the criteria has a wild card (for example, "NASD" +"review*" should be "NASD"+"review*").

The use of % and & in the criteria is not supported as these characters are considered punctuation and are ignored during the Enterprise Vault indexing operations.  As they are ignored during index operations, a search would not find items with these characters.
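The spacing workarounds above can be sketched in a few lines of Python. This is only an illustration for tidying a list of criteria outside the product: the function name `close_quote_gaps` is hypothetical and is not part of CA, DA or Enterprise Vault, and the sketch assumes that no quoted criterion begins with a space.

```python
import re

# Hypothetical helper, not part of CA, DA or Enterprise Vault.
# Matches: closing quote, one or more spaces or tabs, an optional "+",
# then the opening quote of the next criterion.
_GAP = re.compile(r'"[ \t]+(\+?)"')

def close_quote_gaps(criteria_line: str) -> str:
    """Remove spaces between adjacent double quoted criteria on one line."""
    return _GAP.sub(r'"\1"', criteria_line)

print(close_quote_gaps('"NASD" "review*"'))   # "NASD""review*"
print(close_quote_gaps('"NASD" +"review*"'))  # "NASD"+"review*"
```

The corrected line would still need to be entered through the CA Client; the sketch only shows what the corrected text looks like.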

Cause

When CA or DA searches are run, they find items by pattern matching against configured criteria.  Most often, those criteria include words or phrases, which CA calls hot words or hot phrases.  During review of the search hits, the CA or DA Client displays an HTML rendering of the original message so that the criteria (that is, CA's hot words or hot phrases, or DA's search terms) found within the message can be highlighted in the preview.  Odd highlighting occurs when things other than the criteria are highlighted, such as (but not limited to) individual spaces, periods, parentheses, square brackets and curly braces.  Such odd highlighting typically affects punctuation and other special characters that are neither letters nor numbers.

Enterprise Vault provides a base level of compatibility for languages that use the Unicode character sets.  As such, EV provides best effort support for languages worldwide.  The best effort can result in items being found in searches where they would not be expected, as well as highlighting of items other than the CA hot words or DA search terms.  In addition, EV does not index special characters, such as (but not limited to) individual spaces, periods, parentheses, square brackets and curly braces.  These characters are replaced by a period to denote a placeholder for them in the index.

Under certain conditions, some criteria may not be highlighted.  Such conditions include, but may not be limited to, criteria using one or more wild card characters (for example, ? or *) to obtain variants of the core word (for example, using the criteria of review* to obtain variants of the word review, such as reviewed, reviewer, reviewing).
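To illustrate what a wild card criterion expands to, the following sketch uses Python's shell-style matching as a stand-in. It does not reproduce how Enterprise Vault evaluates wild cards; the helper name `matches` and the word list are made up for the example.

```python
from fnmatch import fnmatchcase

def matches(term: str, word: str) -> bool:
    """Case-insensitive match where * is any run of characters and ? is one character."""
    return fnmatchcase(word.lower(), term.lower())

words = ["review", "reviewed", "reviewer", "reviewing", "preview"]

print([w for w in words if matches("review*", w)])
# ['review', 'reviewed', 'reviewer', 'reviewing']
print([w for w in words if matches("review??", w)])
# ['reviewed', 'reviewer']
```

The point for highlighting is that the text to be marked in the preview (for example, reviewing) is not the literal criterion (review*), so the highlighting has to locate each variant rather than the term as typed.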

The odd highlighting has been found to occur with any of the following items in the search criteria.  Note that other conditions may exist that have not yet been identified.
- Symbol for the percent sign, also known as percentage -- %
- Symbol for ampersand, also known as and per se -- &
- Apostrophes formatted using smart quotes, also known as curly quotes or typographer's quotes. For example: Won’t  (with the smart quote apostrophe)  vs.  Won't  (with the straight quote apostrophe). In some cases, an apostrophe may appear correct but may not actually be an apostrophe; it may actually be an accent or a symbol. For example, the word "won't" looks normal within a Review Set, but when the View Source is examined (right-click in the Review Pane | View Source), the same word displays as won’t.
- Hot word or hot phrase that has the ending double quotes but no beginning double quotes (for example, Bond Fund").
- Having multiple criteria on a single line, with each criterion surrounded by double quotes and separated by a space (for example, "NASD" "review*").
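The conditions in the list above lend themselves to a simple automated check. The following is a minimal sketch, assuming the hot words and hot phrases are available as plain strings; `find_issues` is a hypothetical name and the checks cover only the conditions listed in this article.

```python
import re
from typing import List

# Curly single and double quotes (smart quotes).
SMART_QUOTES = "\u2018\u2019\u201c\u201d"

def find_issues(criterion: str) -> List[str]:
    """Return the known odd-highlighting conditions found in one criterion."""
    issues = []
    if "%" in criterion or "&" in criterion:
        issues.append("contains % or &")
    if any(ch in criterion for ch in SMART_QUOTES):
        issues.append("contains smart quotes")
    if criterion.count('"') % 2 != 0:
        issues.append("unbalanced double quotes")
    if re.search(r'"[ \t]+"', criterion):
        issues.append("space between double-quoted criteria")
    return issues

for text in ['"Bond Fund"', 'Bond Fund"', '"NASD" "review*"', "Won\u2019t"]:
    print(repr(text), find_issues(text))
```

An empty list means none of the listed conditions were detected; it does not rule out conditions that have not yet been identified.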

The missing highlighting has been found to occur with any of the following items in the search hits.  Note that other conditions may exist that have not yet been identified.

- Boolean operators are included with multiple criteria on a single line, with each criterion surrounded by double quotes and separated by a space (for example, "NASD" +"review*").
- Double-byte characters, such as Japanese kanji, in words or phrases (either hot words or hot phrases in CA, or words or phrases in DA).  The CA and DA highlighting considers each double-byte character individually and does not group the characters together as is done with English words.  This can cause the individual characters of, for example, a CA hot word to be highlighted when they are found in other words.  Also, double-byte text does not contain spaces between words in the equivalent of a phrase.  This prevents proper hit highlighting from occurring.
- Messages are formatted in HTML and there are HTML formatting changes (HTML tags such as font size or font type changes or a carriage return) within a word or phrase found by the search criteria.  CA and DA perform highlighting in an HTML rendering of the item: they locate a match for the search term in that rendering and place the highlighting commands before and after it so that just the term is highlighted.  If the item was created in HTML and has a format change within the search term, EV's indexing engine does not include the HTML format change in the indexed data, so CA and DA searches can still find the item.  The highlighting processing, however, works on the HTML rendering, where the format change interrupts the term, so it cannot find the match.

Known examples are:
1. The word business is the search term, but an email contains the word business with the letter b in a different font from the rest of the word.  The highlighting cannot match the term because of the HTML formatting commands that change the font between the letter b and the letters usiness.
2. The phrase don't respond is the search term, but an email contains the word don't with the apostrophe written as HTML code, so the phrase seen by the highlighting processing is don’t respond.  As this does not match don't respond, the phrase is not highlighted.
3. The phrase contains a carriage return between words in the phrase.  For example, the search term carriage return is found in an email where the word carriage is on one line and the word return is on the next line, with the HTML tag for the carriage return between the two words.
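The three examples can be reproduced with a deliberately naive highlighter. This sketch is not the CA or DA implementation, whose internals are not described here; it only models the idea of finding the literal term in the HTML rendering and wrapping it. The markup strings, the `<mark>` tag and the `&#8217;` entity are illustrative assumptions.

```python
import re

def naive_highlight(rendering: str, term: str) -> str:
    """Wrap literal, case-insensitive matches of term in <mark> tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", rendering)

plain = "<p>Our business is growing.</p>"
split_by_font = '<p>Our <font size="5">b</font>usiness is growing.</p>'
entity = "<p>Please don&#8217;t respond.</p>"
line_break = "<p>carriage<br>return</p>"

print(naive_highlight(plain, "business"))
# <p>Our <mark>business</mark> is growing.</p>

# In each case below the markup is returned unchanged: the literal term
# does not appear in the rendering, so nothing is highlighted.
print(naive_highlight(split_by_font, "business") == split_by_font)    # True
print(naive_highlight(entity, "don't respond") == entity)             # True
print(naive_highlight(line_break, "carriage return") == line_break)   # True
```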

To check for hot words or hot phrases that may have missing double quotes or space separated double quoted criteria, use the ImportExport utility in the following steps to export the CA Customer information.  Once exported, carefully review the sections with the hot words and hot phrases to locate and correct any of the above causes of the odd highlighting.

  1. Log on to the SQL Server with an account that has at least db_reader permission on the CA Configuration and Customer databases, such as the Vault Service Account (VSA).
  2. Launch the SQL Server Management Studio application.
  3. Expand the Databases.
  4. Right-click the CA Configuration database and select the New Query option.
  5. Run the following SQL query against the CA Configuration database to obtain a listing of all CA Customers and their CustomerID:
    • SELECT CustomerID, Name FROM tblCustomer;
  6. Note the CustomerID that is associated with the CA Customer having the odd highlighting issue.
  7. Close SQL Server Management Studio and log off the SQL Server.
  8. Log on to the CA server as the VSA.
  9. Open a Command Prompt.
  10. Change to the CA installation folder (default location is C:\Program Files\Enterprise Vault Business Accelerator).
  11. Run the following command, replacing {CUSTOMERID.EN_US} with the CustomerID noted in Step 6 and replacing {File Path} with the path and name of the output file:
    • ImportExport -C:{CUSTOMERID.EN_US} -F:{File Path}.xml -L:{File Path}.log
      • For example: ImportExport -C:2 -F:C:\CustomerID2Export.xml -L:C:\ImportExportLog.log
  12. Let the command run to completion.
  13. Close the Command Prompt when the command is completed.
  14. Open the output XML file in an editor such as Notepad, WordPad or Microsoft Word, or in Internet Explorer.
  15. Scroll down the output file until the hot words and hot phrases section is reached.
  16. Carefully review each line for any of the known causes of the odd highlighting (for example, % or & or end double quotes with no beginning double quotes or double quoted criteria with a space separator between other double quoted criteria).
  17. Access the Hot Words and Hot Phrases through the CA Client.
  18. Remove or correct the causes found from the list of all hot words or hot phrases and from any hot word sets that may contain them.
  19. Save any changes implemented.
  20. Run a new search (or searches) to obtain new hits to confirm the odd highlighting has been resolved.

Notes:
1) The above procedure is also very helpful in finding any hot phrases where smart quotes exist.  See the Article 'When using Enterprise Vault (EV) Compliance Accelerator (CA) or Discovery Accelerator (DA) searches, results are invalid or inconsistent' in the Related Articles section for more information.
2) For information on ways to export hot words and hot phrases, refer to the Article 'How to export hot words and hot phrases configured in Compliance Accelerator' in the Related Articles section.

Regarding odd highlighting when using symbols as part of the search terms, the HTML rendering of such symbols may not exactly match the symbol that was used in the search criteria. For example, hits found when searching for the word Testing& could show odd highlighting. A review of an HTML Export of the hits may not show the search term Testing&, but instead will show the HTML interpretation of the search term, similar to:

testing&

testing#

testing$
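The mismatch between a symbol as typed and its HTML form can be seen with Python's standard `html` module. This is only an illustration of HTML escaping in general; the exact form that CA or DA writes into an HTML Export is not confirmed here.

```python
import html

term = "Testing&"
rendered = html.escape(term)  # how the term is commonly written in HTML source

print(rendered)                         # Testing&amp;
print(rendered == term)                 # False
print(html.unescape(rendered) == term)  # True
```

The escaped and typed forms are equal only after the HTML code is decoded, which is why a comparison made against the raw HTML source can behave unexpectedly for terms that contain symbols.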

To check for HTML formatting changes within search terms causing no highlighting, perform the following steps:

  1. Find any message with no highlighting.  This lack of highlighting can be within the Subject or Content of the item or within the HTML rendering of any attachment.
  2. Note the DiscoveredItemID of that message in the upper right of the preview panel.
  3. Export that item using the HTML output option.
  4. Open the HTML output file of the item in a text editor.
  5. Look through the file for any words that have formatting changes within the word.
  6. Once any such word is found, look through the search terms to see if the word is present.  If such a term is present, the cause for no highlighting of that word is the HTML format change.
  7. As needed, repeat Steps 5 and 6 for any other search terms that may be present.
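Steps 5 and 6 above can be partly automated. The following minimal sketch, using only the Python standard library, lists search terms that appear in the visible text of an HTML export but not in its raw source, which suggests that markup interrupts the term. The names `TextOnly`, `visible_text` and `terms_broken_by_markup` are hypothetical, and the sample markup is made up.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect only the text content of an HTML document, dropping all tags."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def visible_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

def terms_broken_by_markup(markup, terms):
    """Return terms present in the visible text but absent from the raw HTML."""
    raw = markup.lower()
    text = visible_text(markup).lower()
    return [t for t in terms if t.lower() in text and t.lower() not in raw]

sample = '<p>Our <font size="5">b</font>usiness plan</p>'
print(terms_broken_by_markup(sample, ["business", "plan", "missing"]))
# ['business']
```

To use it on a real export, read the HTML output file from Step 3 into a string and pass it in with the list of search terms. The sketch detects inline format changes inside a word; it does not detect phrases split by a line break tag or apostrophes written as HTML code, which still need the manual review described above.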

Resolution

The behavior caused by symbols is expected, as symbols may have different meanings in different languages. Certain symbols are also used as wildcards in searches. For example, an asterisk (*) denotes a multi-character wildcard and a question mark (?) denotes a single-character wildcard. Therefore, including symbols as part of the search terms is not recommended.
 

Issue/Introduction

Odd and missing highlighting can occur in the preview of items captured by Enterprise Vault (EV) Compliance Accelerator (CA), Discovery Accelerator (DA), or Veritas Alta Surveillance / Veritas Advanced Surveillance (VAS) searches.

Additional Information

ETrack: 3477465 ETrack: 3477479 ETrack: 3979037