Organizations often struggle with timely access to internal information, especially when employees need quick answers to domain-specific questions. In many companies, Go-to-Market (GTM) teams—including sales, marketing, and customer success—frequently rely on the product team to clarify product details, technical specifications, or feature-related queries. However, due to the product team's workload and competing priorities, responses can be delayed, slowing down the GTM teams' workflows and reducing their overall efficiency.
The goal is to build an intelligent, chat-based AI assistant that is trained on internal company datasets such as product documentation, wikis, knowledge bases, and training manuals. This AI chatbot should enable employees to instantly access accurate information and get their queries resolved in real time without human intervention.
In addition, the company wants to provide a similar experience to external users—such as customers, prospects, and partners—by training the AI model on publicly available website data (e.g., FAQs, product pages, support documentation). The chatbot will be exposed over a public URL, allowing users to ask questions and receive immediate, reliable answers, thus improving customer engagement and reducing support ticket volume.
To address the challenge of enabling real-time, accurate responses to user queries based on internal company datasets, we implemented a multi-layered AI solution that prioritized data security, scalability, and precision.
We began by selecting a high-performing open-source large language model (LLM) as the base for our solution. Before fine-tuning, we performed a comprehensive data preprocessing step that included anonymization and the removal of Personally Identifiable Information (PII) from all internal documents. This was crucial not only for ensuring compliance with data privacy regulations but also for maintaining the confidentiality of sensitive organizational information.
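For illustration, the snippet below sketches what a simple redaction pass might look like. The patterns and placeholder labels are our own simplification; the actual tooling is not described in this write-up, and a production pipeline would typically go well beyond a handful of regular expressions.

```python
# Minimal anonymization sketch (illustrative only; not the project's actual pipeline).
# It masks common PII patterns before documents enter the fine-tuning corpus.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def anonymize(text: str) -> str:
    """Replace matched PII with a labeled placeholder such as '<EMAIL>'."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(anonymize("Reach out to jane.doe@example.com or +1 (555) 010-2233 for access."))
# -> "Reach out to <EMAIL> or <PHONE> for access."
```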
The fine-tuning process was then carried out using the cleaned dataset, which consisted of product documentation, internal wikis, playbooks, training material, and other proprietary knowledge sources. By customizing the model with this internal context, we enabled it to understand domain-specific terminology and respond accurately to company-specific queries.
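As a rough illustration of this step, the sketch below shows parameter-efficient fine-tuning with LoRA using the Hugging Face transformers and peft libraries. The base model name, data file, and hyperparameters are placeholders; the write-up does not disclose the exact model or training setup used.

```python
# Hypothetical LoRA fine-tuning sketch; model name and paths are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # placeholder; actual base model not disclosed

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the base model with low-rank adapters so only a small set of weights is trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# "train.jsonl" is assumed to hold the cleaned, PII-free internal content as {"text": ...} rows.
dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(lambda row: tokenizer(row["text"], truncation=True, max_length=1024),
                      remove_columns=dataset.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```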
Given the extensive volume of internal documentation, it was not feasible to fit the entire knowledge base into the model's prompt context window. To address this, we implemented a Retrieval-Augmented Generation (RAG) architecture on top of the fine-tuned model.
RAG allowed the system to dynamically retrieve the most relevant snippets from a pre-indexed document store in real time, based on the user's query. These snippets were then added to the prompt alongside the user query as context before being passed to the model for inference. This approach significantly enhanced the accuracy and relevance of the responses, especially for complex or nuanced questions.
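Conceptually, the prompt-assembly step looks like the sketch below. The retrieve_top_k helper is a stand-in for the vector search described next; its name and signature are hypothetical.

```python
# Illustrative RAG prompt assembly; retrieve_top_k is a hypothetical retriever stand-in.
def build_prompt(question: str, snippets: list[str]) -> str:
    """Combine retrieved document snippets with the user question as grounding context."""
    context = "\n\n".join(f"[Doc {i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

question = "How does feature X handle SSO?"            # illustrative query
snippets = retrieve_top_k(question, k=5)                # hypothetical vector search call
prompt = build_prompt(question, snippets)               # passed to the model for inference
```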
To meet the client's stringent security and compliance requirements, the entire AI model deployment was hosted within their private cloud infrastructure. Specifically, the model was deployed on a dedicated GPU-enabled instance within the client's secure cloud environment, ensuring that sensitive data and model inferences remained entirely within their controlled perimeter.
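The exact serving stack is not detailed here, but a minimal self-hosted endpoint along these lines, running on the GPU instance inside the client's VPC, illustrates the deployment pattern; FastAPI and the model path are our assumptions.

```python
# Sketch of a self-hosted inference endpoint; the real serving stack is not named
# in this write-up, and the model path is a placeholder.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Placeholder path to the merged fine-tuned weights, loaded onto the dedicated GPU.
generator = pipeline("text-generation", model="/models/infoly-chat", device=0)

class Query(BaseModel):
    prompt: str

@app.post("/generate")
def generate(q: Query):
    # Inference stays inside the private cloud; no calls leave the controlled perimeter.
    out = generator(q.prompt, max_new_tokens=512, do_sample=False, return_full_text=False)
    return {"answer": out[0]["generated_text"]}
```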
The document embeddings used for retrieval in the RAG system were stored in a vector database built on Amazon OpenSearch Service, which was already part of the client's AWS environment. This ensured that all vector search operations occurred within the same secure cloud infrastructure and no data was transmitted to external servers or third-party platforms. The use of OpenSearch also allowed for seamless scalability, high availability, and low-latency retrievals.
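A simplified sketch of the indexing and k-NN retrieval flow against OpenSearch is shown below. The index name, field names, endpoint, and embedding model are illustrative assumptions rather than the project's actual configuration.

```python
# Sketch of vector indexing and k-NN retrieval on Amazon OpenSearch Service.
# Index name, fields, endpoint, and embedding model are illustrative assumptions.
from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model (dim 384)
client = OpenSearch(hosts=[{"host": "vpc-endpoint.example", "port": 443}], use_ssl=True)

# One-time index setup with the k-NN plugin enabled.
client.indices.create(index="docs", body={
    "settings": {"index": {"knn": True}},
    "mappings": {"properties": {
        "text": {"type": "text"},
        "embedding": {"type": "knn_vector", "dimension": 384},
    }},
})

# Index a document chunk together with its embedding.
chunk = "Feature X supports SAML-based single sign-on."
client.index(index="docs", body={"text": chunk, "embedding": embedder.encode(chunk).tolist()})

# Retrieve the k nearest chunks for a user query.
query_vec = embedder.encode("How does feature X handle SSO?").tolist()
hits = client.search(index="docs", body={
    "size": 5,
    "query": {"knn": {"embedding": {"vector": query_vec, "k": 5}}},
})["hits"]["hits"]
snippets = [h["_source"]["text"] for h in hits]
```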
To further improve the quality of the retrieved documents and ensure that the most relevant information was presented to the AI model, we integrated Cohere's Re-Ranker via Amazon Bedrock. The re-ranker scored and sorted the retrieved chunks by semantic relevance to the user query. This re-ranking step played a critical role in handling complex queries with high accuracy, ensuring that the model received the most pertinent information for response generation.
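The sketch below shows how such a re-ranking call might look through the Bedrock runtime API. The model identifier and the request/response fields follow Cohere's Rerank format and should be treated as assumptions; the exact contract is defined by the Bedrock model reference.

```python
# Hedged sketch of re-ranking retrieved chunks with Cohere Rerank on Amazon Bedrock.
# Model ID and request/response field names are assumptions based on Cohere's Rerank format.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def rerank(query: str, snippets: list[str], top_n: int = 3) -> list[str]:
    response = bedrock.invoke_model(
        modelId="cohere.rerank-v3-5:0",  # assumed model identifier
        body=json.dumps({"query": query, "documents": snippets,
                         "top_n": top_n, "api_version": 2}),
    )
    results = json.loads(response["body"].read())["results"]
    # Each result carries the index of the original snippet and a relevance score.
    return [snippets[r["index"]] for r in sorted(results, key=lambda r: -r["relevance_score"])]
```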
As with any enterprise-grade AI deployment, building the Infoly AI Chatbot involved navigating several technical, operational, and infrastructural challenges. Below is a breakdown of the key challenges we encountered and the strategies we used to overcome them:
Ensuring high accuracy in responses was critical to the chatbot's adoption and effectiveness, especially for internal GTM teams relying on precise, domain-specific information.
Achieving high throughput with minimal latency was a key challenge, especially given the real-time expectations of internal users.
Making the AI chatbot easily accessible to both internal and external users required seamless integration with various platforms and user touchpoints.
Keeping the chatbot updated with the latest internal and external information was vital for maintaining trust and usefulness.
One of the most critical risks in deploying LLMs is hallucination—where the model generates information that is not grounded in actual source data.
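One illustrative guardrail (not necessarily the exact measure used in this project) is to withhold generation entirely when the re-ranker finds no sufficiently relevant source material, so the model never answers without grounded context.

```python
# Illustrative hallucination guardrail: answer only when grounded context is available.
# The threshold, build_prompt helper (sketched earlier), and generate_answer call are
# assumptions for the sake of the example.
FALLBACK = "I couldn't find this in the documentation. Please contact the product team."

def answer_with_grounding(question: str, scored_chunks: list[tuple[str, float]],
                          min_relevance: float = 0.3) -> str:
    """scored_chunks: (text, relevance_score) pairs from the re-ranker, highest first."""
    grounded = [text for text, score in scored_chunks if score >= min_relevance]
    if not grounded:
        return FALLBACK  # no sufficiently relevant source material; don't let the model guess
    prompt = build_prompt(question, grounded)   # prompt-assembly helper sketched earlier
    return generate_answer(prompt)              # hypothetical call to the deployed model endpoint
```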
As a result of these measures, the system achieved over 98% accuracy on evaluation queries, with hallucination rates reduced to negligible levels.