Thursday, January 26, 2017

How to Troubleshoot and Fine-Tune ServiceNow Instance Performance




The perceived performance of your ServiceNow instance is made up of these components.
  • Application server response: Time for the application server to process a request and render the resulting page
  • Network latency and throughput: Time for the network to pass your request to the server and the response back
  • Browser rendering and parsing: Time for your browser to render the HTML and parse/execute JavaScript
  • Instance cache: The amount of system resources available for processing
This document will outline basic troubleshooting steps to try to isolate the source of slow response times.

1. Poor Network Performance: A fairly common scenario. Users may experience poor network performance, especially when there are multiple locations, in an on-premise installation, where users must connect through a VPN, or where additional devices such as firewalls or proxies come into play.

Symptoms
  • One or more users will report intermittent degraded performance
  • If multiple users report the issue, it may only occur at specific times or locations
  • If a single user reports the issue, it may only happen at specific locations

Verifying The Issue 
  • Log out, completely clear the cache, delete all cookies, and try logging in again
  • Log in from a different browser
  • Log in from a different machine on the same network
If the issue persists, it may be an issue with the network. To verify:





1.      Navigate to Transaction Logs (All User)



2.      Find all transactions for a particular user's session
3.      Find all transactions where the Client response time or Network time is excessive (>5000ms)
4.      If there are excessively high Network times, flag those transactions and look for additional log entries around the same time
5.      If additional log entries are found, this trend likely points to a network issue
6.      If there are excessively high Client response times, compare each one to the sum of the other “time” fields (excluding Browser Time)
7.      If the sum of the other time fields does not closely match the Client response time, it is likely a network issue.
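
The comparison in steps 6 and 7 can be sketched in a few lines of Python, assuming the transaction log rows have been exported and loaded as dictionaries. The field names (client_response_time, response_time, browser_time) and the 25% tolerance for "does not closely match" are illustrative assumptions, not ServiceNow column names or an official rule.

```python
# Hypothetical field names; adjust them to match the columns in your export.
EXCLUDED_FIELDS = {"client_response_time", "browser_time"}
THRESHOLD_MS = 5000  # "excessive" threshold used in the steps above


def unexplained_ms(row):
    """Return the client response time not accounted for by the other time fields."""
    accounted = sum(
        value for name, value in row.items()
        if name.endswith("_time") and name not in EXCLUDED_FIELDS
    )
    return row["client_response_time"] - accounted


def looks_like_network_issue(row, tolerance=0.25):
    """Flag a slow transaction whose time fields do not add up.

    The 25% tolerance is an arbitrary assumption for "does not closely match".
    """
    total = row["client_response_time"]
    if total <= THRESHOLD_MS:
        return False
    return unexplained_ms(row) > tolerance * total


row = {"client_response_time": 9000, "response_time": 1200, "browser_time": 800}
print(unexplained_ms(row))            # 7800
print(looks_like_network_issue(row))  # True
```

A large unexplained remainder only suggests time spent outside the server and the browser. It still needs to be confirmed against the trends listed below.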
Once you have determined you have a network issue it is necessary to look for trends to isolate the problem.
  • Do affected users have the same location or network?
  • Is the issue isolated to a specific time of day?
  • Do any users access ServiceNow through a proxy?
  • Is the issue intermittent or consistent?

Potential Root Causes
  • Overloaded or malfunctioning networking equipment – This would almost certainly affect other applications as well.
  • Overloaded proxy – If users access ServiceNow through a proxy, is the machine correctly configured and will it support the current load?
  • Insufficient bandwidth.
  • Poor wireless signal.


Fixes
  • Unfortunately, for networking issues there is not much that can be done from within ServiceNow. Infrastructure may need to be upgraded or reconfigured to support the load.


2. Mis-Configured ACL: Usually occurring for non-admin users or those with specific roles, mis-configured ACLs can impact forms, but especially lists, where ACLs run multiple times.

Symptoms
  • One or more users will report excessively long load times for lists or forms.
Verifying The Issue
  1. Navigate to Transaction Logs (All User).
  2. Find all transactions for a particular user's session.
  3. Find all transactions where the ACL time is excessive (>100ms).
  4. For ACLs, even a fraction of a second is a very long time.
  5. Search for any other transactions for the same URL.
  6. If the same URL has a trend of long ACL times there may be a mis-configured ACL.
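
As a rough illustration of steps 3 to 6, the following Python sketch groups exported transactions by URL and reports the URLs that repeatedly exceed the ACL threshold. The keys url and acl_time, the sample URLs, and the idea that three slow hits make a "trend" are all assumptions made for the example.

```python
from collections import defaultdict

ACL_THRESHOLD_MS = 100  # threshold from the steps above


def urls_with_slow_acl_trend(transactions, min_hits=3):
    """Return URLs whose ACL time exceeded the threshold at least `min_hits` times.

    `min_hits` is an arbitrary assumption for what counts as a "trend".
    """
    slow_counts = defaultdict(int)
    for txn in transactions:
        if txn["acl_time"] > ACL_THRESHOLD_MS:
            slow_counts[txn["url"]] += 1
    return sorted(url for url, hits in slow_counts.items() if hits >= min_hits)


# Hypothetical sample rows for illustration only.
transactions = [
    {"url": "/incident_list.do", "acl_time": 450},
    {"url": "/incident_list.do", "acl_time": 380},
    {"url": "/incident_list.do", "acl_time": 510},
    {"url": "/incident.do", "acl_time": 20},
    {"url": "/task_list.do", "acl_time": 140},
]
print(urls_with_slow_acl_trend(transactions))  # ['/incident_list.do']
```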

Validate the ACL is mis-configured
  1. Based on the URL you should be able to find the affected table. Usually it will be in the name; however, if it is not a standard List or Form page (for instance, a UI Page), you may need to look through the contents of the page to see the source table(s).
  2. View the ACLs for the table.
  3. Look for any ACLs with Scripted Conditions.
  4. Verify the Script does not have nested for loops or any other violations of best practice.

Potential Root Causes
  • Poorly written Scripted ACLs
  • Excessive number of records being accessed

Fixes
  • Scripts should be re-written to be as efficient as possible.
  • Lists and queries should be properly limited where necessary.
  • Additional filters can be added to lists to cut down on the number of records returned.
  • Potentially re-structuring tables or columns to simplify the script.

3. Mis-Configured Business Rule: A very common scenario is when Business Rules are not written correctly, or several poorly written rules stack up to create a poor user experience. This most commonly happens when a user creates or updates a record, but it can occur any time a Business Rule runs, with the exception of asynchronous rules.

Symptoms
  • One or more users will report excessively long wait times saving or loading a record.

Verifying The Issue
  1. Navigate to Transaction Logs (All User).
  2. Find all transactions for a particular user's session.
  3. Find all transactions where the Business rule time is excessive (>2000ms).
  4. Find all transactions where the Business rule count is excessive (>10).
  5. Search for any other transactions for the same URL.
  6. If the same URL has a trend of long Business rule times or counts there may be an issue with the BRs.

Validate incorrect Business Rules
  1. Open the Business Rules for the particular table.
  2. Look for Business Rules which match the conditions reported by the user (when saving, when creating).
  3. Look for all After or Before rules (Async rules will not apply since they do not stop the user session).
  4. Verify there is not an excessive number of rules.
  5. While having many Business Rules is not bad or uncommon, the efficiency of those rules needs to increase with the number. Otherwise, many rules that are somewhat slow can add up to a long wait time for end users.
  6. Verify the content of the Business Rules.
  7. As with ACLs, you should avoid making an excessive number of queries or nested queries in a BR.








Potential Root Causes
  • Multiple poorly written Business rules
  • Example: 10 or more rules that each take 250ms add up to at least 2.5 seconds for the user. This is only part of the process, so the overall time may end up being excessive.
  • Single poorly written Business rule
  • A single rule taking more than 1000ms will directly impact the user's experience
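
The arithmetic behind these root causes can be written out as a small Python sketch. The rule records (name, when, ms) are a made-up structure for illustration. It assumes, as described above, that only Before and After rules block the user, while Async rules do not.

```python
SINGLE_RULE_LIMIT_MS = 1000  # per-rule figure quoted above


def blocking_time_ms(rules):
    """Total time the user waits; async rules are excluded because they do not block."""
    return sum(r["ms"] for r in rules if r["when"] != "async")


def slow_rules(rules):
    """Names of blocking rules that individually exceed the per-rule limit."""
    return [
        r["name"] for r in rules
        if r["when"] != "async" and r["ms"] > SINGLE_RULE_LIMIT_MS
    ]


# Ten hypothetical Before rules at 250 ms each, plus one slow Async rule.
rules = [{"name": f"rule {i}", "when": "before", "ms": 250} for i in range(10)]
rules.append({"name": "notify", "when": "async", "ms": 4000})

print(blocking_time_ms(rules))  # 2500
print(slow_rules(rules))        # []
```

No single rule crosses the 1000ms limit here, yet the user still waits 2.5 seconds, which is why the rule count matters as much as the worst individual rule.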

Fixes
  • Scripts should be re-written to be as efficient as possible.
  • Minimizing the number of GlideRecord queries and number of results returned.
  • Potentially moving some scripts to asynchronous where possible

4. Under-performing Node: A very uncommon scenario, but also one that only ServiceNow can fix. The fix will usually involve ServiceNow restarting and possibly decommissioning the node.

Symptoms
  • Multiple users will report excessive wait times performing any action in the system
  • Issue will usually be intermittent
  • Issue will occur more frequently at peak times

Verifying The Issue
  1. Navigate to Transaction Logs (All User).
  2. Find all transactions where the Response Time is excessive (>5000ms).
  3. Add the System ID column
  4. Verify that no other cause (Network, BR, ACL) is affecting the performance.
  5. This may manifest as a longer Session wait time, Semaphore wait time, or SQL time
  6. You may also try to eliminate transactions that are expected to take longer, such as those with larger outputs (Output length)
  7. You may also eliminate those requests which affect multiple System IDs
  8. If there is a node (System ID) which is constantly under-performing and where the same requests execute quickly on other nodes it may be isolated to that node.
  9. You may open a HI server ticket to confirm the node is not functioning correctly
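
Steps 7 and 8 amount to comparing the same request across nodes. A minimal Python sketch of that comparison follows, assuming exported rows with the hypothetical keys url, system_id and response_time. The node names and URLs are invented for the example.

```python
from collections import defaultdict
from statistics import median


def median_response_by_node(transactions):
    """Map url -> {system_id: median response time in ms}."""
    samples = defaultdict(lambda: defaultdict(list))
    for txn in transactions:
        samples[txn["url"]][txn["system_id"]].append(txn["response_time"])
    return {
        url: {node: median(times) for node, times in nodes.items()}
        for url, nodes in samples.items()
    }


def suspect_nodes(transactions, threshold_ms=5000):
    """Nodes that are slow for a URL that at least one other node serves quickly."""
    suspects = set()
    for nodes in median_response_by_node(transactions).values():
        slow = {n for n, ms in nodes.items() if ms > threshold_ms}
        fast = set(nodes) - slow
        if slow and fast:  # slow on every node means the request itself is slow
            suspects |= slow
    return sorted(suspects)


transactions = [
    {"url": "/incident.do", "system_id": "node_a", "response_time": 900},
    {"url": "/incident.do", "system_id": "node_a", "response_time": 1100},
    {"url": "/incident.do", "system_id": "node_b", "response_time": 7200},
    {"url": "/incident.do", "system_id": "node_b", "response_time": 6800},
    {"url": "/report.do", "system_id": "node_a", "response_time": 9000},
    {"url": "/report.do", "system_id": "node_b", "response_time": 9500},
]
print(suspect_nodes(transactions))  # ['node_b']
```

The second URL is slow on both nodes, so it is treated as an expensive request, not as evidence against either node.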

Potential Root Causes
  • If the issue consistently happens even after restarting the node, the hardware may be defective
  • Many other causes are unknown especially since we do not have access to the back-end

Fixes
  • Ask ServiceNow to restart the node via HI ticket or Support

5. Browser Performance: A very common scenario, especially for environments with older browsers (specifically Internet Explorer). Poor performance is almost exclusively limited to IE, but there are scenarios where Chrome and Firefox will not work correctly.

Symptoms
  • A subset of users will report a performance issue, especially when loading forms or custom pages
  • Issue will be consistent and easily reproduced
  • Issue will only occur on a certain browser or machine

Verifying The Issue
Before verifying, check that the user is actually using a browser on their desktop or laptop and that the issue is not related to a mobile device.
  1. Navigate to Transaction Logs (All User).
  2. Add the User Agent column.
  3. Find all the transactions for the session of a particular user who is reporting the issue.
  4. OR find all transactions where the Response Time is excessive (>5000ms).
  5. Eliminate any transactions which can be explained by other issues.
  6. Try to correlate any poor performance to a particular User Agent.
  7. Look up the user's User Agent string to determine which browser they are using, for example at UserAgentString.com.

Validate the browser is not running in “Compatibility Mode”
  1. For Internet Explorer, ask the user to verify their version by going to “About Internet Explorer”.
  2. If the user reports a version which does NOT match the User Agent reported in the transaction log, they may have compatibility mode on, which should be disabled.
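
As a complementary check (not the procedure described above), the User Agent string itself often reveals compatibility mode. Internet Explorer in compatibility view typically reports an older MSIE version while still carrying the Trident token of the real rendering engine. The Python sketch below compares the two. The Trident-to-version mapping reflects commonly documented values and should be treated as an assumption to verify for your environment.

```python
import re

# Trident engine token -> actual Internet Explorer version (assumed mapping).
TRIDENT_TO_IE = {"4.0": 8, "5.0": 9, "6.0": 10, "7.0": 11}


def ie_compatibility_mode(user_agent):
    """Return (reported, actual) IE versions if they differ, else None."""
    msie = re.search(r"MSIE (\d+)\.", user_agent)
    trident = re.search(r"Trident/(\d+\.\d+)", user_agent)
    if not msie or not trident:
        return None  # not IE, or a User Agent without both tokens
    reported = int(msie.group(1))
    actual = TRIDENT_TO_IE.get(trident.group(1))
    if actual is None or reported == actual:
        return None
    return reported, actual


ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0)"
print(ie_compatibility_mode(ua))  # (7, 11)
```

A result such as (7, 11) would mean the browser identifies itself as IE 7 while actually being IE 11, which matches the mismatch described in step 2.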
Validate the Browser is supported
  1. Older versions of Internet Explorer (<10) generally do not perform well enough to have a good User Experience in ServiceNow
  2. Versions of Internet Explorer (<9) may not work at all

Validate extensions are not interfering
  1. Ask the user to disable extensions OR run an instance of the browser in private browsing mode (this usually disables all extensions)
  2. If the issue does not persist with extensions disabled, review the extensions to see if any may be causing an issue
  3. Disable extensions one at a time to verify.

Validate no errors are being reported
  1. Ask the user to verify there are no errors in the JavaScript console. Most browsers have a developer console which will output all JavaScript errors.
  2. Believe it or not, some errors are expected, but an excessive number, especially those which report missing resources, could be affecting the user experience.

Potential Root Causes
  • Unsupported Browser.
  • Browser running in compatibility mode.
  • Browser extension impacting performance.
  • GPO pushing browser setting which impacts performance.
  • GPO pushing system setting which prevents caching.
  • Cache disabled through browser setting.
  • Browser unable to reach resources – this may also manifest as the UI appearing strange.
Fixes
  • Upgrading to the latest browser version.
  • Disabling Compatibility Mode.
  • Removing or resolving extension or browser configuration.
  • Verifying network configuration (firewall or proxy) allows the user's machine to retrieve correct resources.

6. Excessive and Incorrect Client Side Scripts: Another common scenario is for the user to see poor performance due to too much form automation on the client side. Offloading some work to Business rules or re-architecting it for performance will usually resolve these.

Symptoms
  • One or more users will report an issue with forms taking a long time to load.
  • Users may report a jittery or unresponsive form when changing values.
Verifying The Issue
  1. Navigate to Transaction Logs (All User).
  2. Find all transactions for the table or form where the issue is reported which have an excessive (>2000ms) Client script or UI Policy time.
  3. Note that these values only include scripts run when loading.
  4. If script time is consistently long there may be an issue with the number or efficiency of the script.

Validate the script is impacting performance
  1. For UI Policies, Client Scripts, or both, disable all scripts.
  2. Enable each script in turn, testing the form after each one.
  3. Take note of the load time. If a single script increases the load time dramatically, there may be an issue with the script.
  4. If each script or policy increases the load time slightly, resulting in a long load time overall, there may be an issue with the quantity of scripts.
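
The measurements from this procedure can be turned into a per-script cost with simple subtraction. The Python sketch below assumes you noted a baseline load time with everything disabled, then one load time after enabling each script while leaving the earlier ones on. The script names and timings are invented for illustration.

```python
def per_script_cost(baseline_ms, measurements):
    """Return the extra load time attributed to each script.

    `measurements` is a list of (script_name, load_ms) pairs recorded after
    enabling each script in turn, leaving the earlier ones enabled.
    """
    costs = {}
    previous = baseline_ms
    for name, load_ms in measurements:
        costs[name] = load_ms - previous
        previous = load_ms
    return costs


# Hypothetical measurements in milliseconds.
costs = per_script_cost(
    800,
    [("Set defaults", 900), ("Lookup caller", 2600), ("Hide fields", 2700)],
)
print(costs)  # {'Set defaults': 100, 'Lookup caller': 1700, 'Hide fields': 100}
```

One large jump points at a single script, while many small increments point at the overall quantity. Load times vary between runs, so repeating each measurement a few times is advisable.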

Validate the form layout
  1. Validate there is not an excessive number of fields, tabs, related lists or embedded lists on the form.
  2. Validate there are not custom Formatters present.
  3. If there is an excessive number of controls on a form, it will directly impact the load time for the user.

Potential Root Causes
  • Too many Scripts or Policies.
  • Poorly written Scripts.
  • Excessive number of AJAX transactions in scripts.
  • Some AJAX is OK, but while those transactions are occurring synchronously the user is unable to do anything.
  • Synchronous AJAX should only be used when the result is NECESSARY for the user to proceed.
  • Too many fields present on form.
  • Too many lists or embedded lists.

Fixes
  • Re-writing Scripts to be more efficient.
  • Refactoring or combining UI Policies.
  • Changing synchronous AJAX to asynchronous where possible.
  • Changing some form automation to run as Business rules where possible.
  • Removing fields or creating additional views to only show necessary fields.
  • Removing unnecessary fields or lists.

7. Incorrect User Expectation: A perfectly valid expectation is that every page will respond quickly, but not every page in ServiceNow is equal. Sometimes 5, 10 or even 30 seconds is reasonable when the system is crunching 1000s of rows to display in a report.

Symptoms
  • User reports performance issue for particular pages.

Verifying The Issue
  1. Navigate to Transaction Logs (All User).
  2. Find all the transactions for the user's session.
  3. Find any excessively long transactions (>5000ms).
  4. Take note of the URL.
  5. If the user is loading home.do or a report it may be valid to expect wait times greater than 5 seconds.

Potential Root Causes
  • User is loading a Report or Dashboard.
  • User is loading too many records on a list. 
Fixes

  • Explain to the user the wait times are valid.
  • Reconfigure the report to have additional conditions.
  • Reset the user preference to display fewer rows.
  • Change the user's Home page if possible.