by Willam van Weelden, COTP
I recently received the following email from a concerned RoboHelp user:
Our [help system] documents are available on the internet if you search for them. Is there a way to keep them private so they don’t show up on Google?
The issue here is that the RoboHelp user created content in RoboHelp and then published the output files to a public web server. Using IconLogic as an example, if IconLogic published its internal policies and procedures to the public-facing IconLogic website, employees could easily access the content by typing something like the following example URL into any web browser: www.iconlogic.com/policies.htm. Of course, anyone outside IconLogic could access the content through the same URL just as easily. And over time Google will index the pages, so content from the private Help System will appear in Google search results.
If you don't want your Help System content to appear in Google search results, there are a few options to consider:
- Use a robots.txt file to ask Google not to index your content.
- Protect your content on the server side.
- Use RoboHelp Server, which includes access control.
Server-side protection is the ideal method. If you set up access controls on your server, only authorized users will be able to access the content; search engines and anonymous users won't be able to view it. This protection has to be configured on the web server itself, because the RoboHelp output doesn't contain any option to secure access. The options available depend on the web server you are using. For Apache, you can find the options here. You will need to work with your web hosting company or IT team to determine the best course of action.
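To make the idea of server-side access control concrete, here is a minimal sketch in Python of HTTP Basic authentication in front of a folder of published output. It is an illustration only, not how Apache or any particular host implements it. The user name, password, realm text, `output` folder name, and port are all assumptions made up for this example, and Basic authentication should only be used over HTTPS. In practice you would rely on your web server's built-in access controls rather than a script like this.

```python
import base64
import hmac
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Hypothetical credentials for illustration only; never hard-code real ones.
USERNAME = "helpuser"
PASSWORD = "change-me"


def is_authorized(auth_header):
    """Return True if the Authorization header holds the expected Basic credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        supplied = base64.b64decode(auth_header[len("Basic "):])
    except ValueError:
        # Malformed base64 is treated as a failed login.
        return False
    expected = f"{USERNAME}:{PASSWORD}".encode("utf-8")
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(supplied, expected)


class ProtectedHelpHandler(SimpleHTTPRequestHandler):
    """Serves static files, but only to requests with valid credentials."""

    def _allowed(self):
        if is_authorized(self.headers.get("Authorization")):
            return True
        # Ask the browser to prompt for a user name and password.
        self.send_response(401)
        self.send_header("WWW-Authenticate", 'Basic realm="Help System"')
        self.end_headers()
        return False

    def do_GET(self):
        if self._allowed():
            super().do_GET()

    def do_HEAD(self):
        if self._allowed():
            super().do_HEAD()


def run_server(directory="output", port=8000):
    """Serve the (assumed) RoboHelp output folder on localhost until interrupted."""
    handler = partial(ProtectedHelpHandler, directory=directory)
    HTTPServer(("127.0.0.1", port), handler).serve_forever()
```

Calling `run_server()` starts the sketch locally. A crawler or anonymous visitor receives a 401 response instead of the Help System pages, which is the behavior that server-side protection is meant to provide.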
An alternative is to use RoboHelp Server, which includes access control, although its options are somewhat limited.
A final, quick alternative is to use a robots.txt file to tell Google not to index your content. This makes the content harder to discover. Keep in mind that search engines honor robots.txt only as a courtesy. It does not secure your content in any way: anyone who knows the URL can still open the pages, and some crawlers may ignore the directive and index your site anyway.
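As a rough sketch of how this works, the Python snippet below embeds a small robots.txt and uses the standard library's `urllib.robotparser` to check how a well-behaved crawler would interpret it. The `/help/` folder and the `www.example.com` URLs are assumptions chosen for illustration; substitute the folder where your own output is published. The robots.txt file itself must sit at the root of your website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site root; "/help/" is an assumed
# folder name for the published RoboHelp output.
ROBOTS_TXT = """\
User-agent: *
Disallow: /help/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if a crawler that obeys robots_txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The Help System folder is off limits to compliant crawlers.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot",
                           "https://www.example.com/help/policies.htm"))  # prints False
    # The rest of the site is unaffected.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot",
                           "https://www.example.com/index.htm"))  # prints True
```

Note that this check only models crawlers that choose to obey the file. Nothing in it stops a person or a non-compliant crawler from requesting the pages directly.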
Which method works best depends on your goal. If your goal is to restrict the content to authorized users, your only recourse is to limit access on the server side.
If you're looking to learn Adobe RoboHelp, consider my highly interactive, live online beginner class or my advanced content reuse class.
Willam van Weelden is a Certified Online Training Professional (COTP), veteran Help Author, RoboHelp consultant, co-author of IconLogic's "Adobe RoboHelp: The Essentials" workbook, and technical writer based in the Netherlands. He is an Adobe Community Professional, ranking him among the world's leading experts on RoboHelp. Willam’s specialties are HTML5 and RoboHelp automation. Apart from RoboHelp, Willam also has experience with other technical communication applications such as Adobe Captivate and Adobe FrameMaker.