Wednesday, November 05, 2025

What is dimensionality in data embedding

I don't want to bombard you with all the technical details of data chunking and embedding in the RAG-system learning path or while developing an AI application.

But the "dimensionality" of your data, whether a word or a sentence, is very important for understanding how your LLM is smart enough to treat and contextualize the subsequent conversation or interaction with AI models.

A simple analogy for understanding this is describing a person: "about me", location, or anything else.

Example: evaluating a candidate for a job.

The candidate's name, age, sex, location and skillset all become "features of your data".

Consider them a single array, or a list in Python terminology.

A single array or list is a flat set of values and will not let you fully validate the candidate's job-readiness profile.

So bring more features/attributes to the table and correlate across multiple parameters, like height, weight, skillset comparison, ability to handle a project or task, contribution to the project/team, mindset and so on, to evaluate thoroughly.

So multiple arrays, i.e. more dimensions of the candidate, give the user the full picture.
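The idea above can be sketched in plain Python: each candidate is a vector of numeric features, and comparing candidates across all dimensions at once, rather than by a single static value, is exactly what embedding similarity does. The feature values and names below are invented purely for illustration.

```python
import math

# Each candidate is a vector of numeric features
# (e.g. years of experience, skill score, project contributions).
# These values are made up for illustration only.
alice = [5.0, 8.5, 12.0]
bob   = [4.0, 9.0, 10.0]

def cosine_similarity(a, b):
    """Compare two feature vectors across all of their dimensions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(f"Similarity: {cosine_similarity(alice, bob):.3f}")
```

A single feature could never tell you this; the similarity score only becomes meaningful once every dimension contributes.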


Tuesday, September 30, 2025

LLM application with Microsoft Phi3 LLM.

 To give a kick start to learning and understanding LLMs and LLM applications using Ollama, here are the steps I followed to gain confidence and keep moving.

Ollama is an LLM orchestration tool for running LLMs on your local machine.

1. Download Ollama for Ubuntu Linux. This will install Ollama and a CLI to pull and run LLMs.

 curl -fsSL https://ollama.com/install.sh | sh

You can get the code from GitHub here:

https://github.com/CodethinkerSP/ai/tree/master/Simple-RAG

In a terminal, type:

ollama serve

ollama pull phi3:latest

ollama run phi3:latest

Then follow these steps in Visual Studio Code or any of your favourite editors. I personally prefer Jupyter via the VS Code extension.

Steps followed on my local machine

  1. Installed Ollama and pulled the phi3:latest LLM
  2. Ollama CLI: ollama serve, then ollama run phi3:latest
  3. Python code
    1. Load the text file
    2. Chunk and embed the data
    3. Store the embeddings in ChromaDB
    4. Hit the Ollama API endpoint --> localhost:11434/api/generate with the required JSON payload
from sentence_transformers import SentenceTransformer
import chromadb

# data loading
with open("output.txt", 'r', encoding='utf-8') as f:
    dataset = f.readlines()

VECTOR_DB = []
EMBEDDING_MODEL = 'all-MiniLM-L6-v2'
LANGUAGE_MODEL = 'phi3:latest'

# initialize vector db
chroma_client = chromadb.PersistentClient(path="./chroma_db3")
# create collection
collection = chroma_client.get_or_create_collection(name="mydataset")
# embedding model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

for i, data in enumerate(dataset):
    data = data.strip()
    if data:
        embedding = model.encode(data).tolist()
        # store in vector db (in-memory list and the Chroma collection)
        VECTOR_DB.append((data, embedding))
        collection.add(documents=[data], embeddings=[embedding], ids=[str(i)])

print(f"Inserted {len(VECTOR_DB)} records into the vector database.")
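Step 4 above, hitting the Ollama endpoint at localhost:11434/api/generate, can be sketched like this. The payload shape (model, prompt, stream) follows Ollama's generate API; the helper names and the prompt template are my own, and the server must be running via `ollama serve` before the request is made.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, context_chunks, model="phi3:latest"):
    """Assemble the JSON payload for Ollama's /api/generate endpoint,
    prepending the retrieved chunks to the user's question."""
    context = "\n".join(context_chunks)
    return {
        "model": model,
        "prompt": f"Answer using this context:\n{context}\n\nQuestion: {prompt}",
        "stream": False,
    }

def ask_ollama(prompt, context_chunks):
    """POST the payload to the local Ollama server and return its answer."""
    data = json.dumps(build_payload(prompt, context_chunks)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running locally):
# print(ask_ollama("What is dimensionality?", ["Dimensionality is ..."]))
```

With stream set to False, Ollama returns a single JSON object whose "response" field holds the generated text.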


Saturday, August 16, 2025

Movie - I, Robot scenes come to life here


Have you watched the movie I, Robot? Scenes from the movie are now becoming reality.

Robots are participating in running races, football and boxing. Soon we will see many more things done by robots.

Hope there won't be a competition between humans and robots for existence.

Watch here.

 https://www.bbc.com/news/videos/cvg3mv3rz60o

Wednesday, August 13, 2025

Adon - AI

AdonAI !

வணக்கம் (Hello), AI!
Hello, AI,
ஏய் (Hey)!

    An excerpt from the wise King Solomon:
    சூரியனுக்குக் கீழே நூதனமானது ஒன்றுமில்லை. (And there is no new thing under the sun.)

        Is there any thing whereof it may be said, See, this is new? it hath been already of old time, which was before us.

    I am just trying to "recollect" the knowledge which we have all forgotten (so I am human), what happened in the past.

    I am going to scribble on the topics of AI: what I really understood and was able to do.

Monday, November 11, 2024

Fixing the Security vulnerabilities found on base docker image

Expected/desired solution :

Fixing the security vulnerabilities found in a base docker image without moving to a newer version of the base docker image, because of application stability and the dependencies of its packages.

Here is the solution :

Easy way

  • Use the docker inspect command to see the environment variables of the current version, and look up its latest patch version; if it is not available, search the official Linux distro sites.
  • Find the patch version of the docker image and its checksum on Docker Hub.
  • Create the Dockerfile and import the base image by checksum (digest) or by patch version.
  • Build your docker image from your Dockerfile.
  • Scan the docker image you created in the above step using docker security vulnerability scanners like Scout, Trivy, Sysdig and Snyk.
  • Count and compare the vulnerabilities in your current docker image and in the previous version of the docker image.
  • Finally, wrap your application packages (WAR, JAR, NodeJS or Python) and check the application stability.


If the easy way doesn't work, then there is a "hard way", which I will update here soon.

Saturday, June 18, 2022

Installing Azure DevOps Server on local

 If you want to learn and explore, but cost and a credit card are barriers to creating Azure DevOps Services, then here is an option for you.

Install Azure DevOps Server on your local desktop or laptop.

Download the trial version (90 days free, cool : ) 💪) from here

https://docs.microsoft.com/en-us/azure/devops/server/download/azuredevopsserver?view=azure-devops

In the setup wizard, use the SQL Server Express edition. Once installed, you need to configure your custom agent on your machine.

Now it's time to start exploring the Azure DevOps services.


Azure DevOps Server Local installation - Custom Agent configuration


And the custom agent on your machine is connected to the local Azure DevOps server.




Monday, August 02, 2021

function.json location in Visual Studio Code.

When you are working with Azure Functions, you may need to bind the input and output, and also apply logging and Application Insights to your Azure Function, through function.json. This file is easily available to you if you are working on or developing the Azure Function through the Portal. But you have to search for it when developing through Visual Studio or Visual Studio Code: you can find the function.json file under the project folder's output. There you can modify the attributes required for your Azure Function.

Sunday, February 14, 2021

NuGet Packages Cache Locations

The NuGet cache is maintained per project and user context. Often it needs to be cleared to see the latest package additions and their functionality. NuGet manages its caches at the HTTP, global, temp and plugin levels. You can find all these cache locations here:
  • HTTP cache: C:\Users\UserId\AppData\Local\NuGet\v3-cache
  • Global packages cache: C:\Users\UserId\.nuget\packages\
  • Temp cache: C:\Users\UserId\AppData\Local\Temp\NuGetScratch
  • Plugins cache: C:\Users\UserId\AppData\Local\NuGet\plugins-cache

After clearing these, restore the NuGet packages in your project and rebuild. Now you will see the latest packages in your project.

Tuesday, October 13, 2020

Publish an Azure SQL Database

 As part of my learning and preparation for the exam AZ-303 - Microsoft Azure Architect Technologies, this is a sub-topic under "Implement and manage data platform".

Publish an Azure SQL Database: 


The word "publish" is used because of the nature of the Azure SQL Database (a service). Essentially, publishing means updating your Azure SQL database's schema and data.

Before publishing (updating) the Azure SQL database, we should know the steps for creating and deploying it.

Publishing the Azure SQL database supports the DevOps cycle within your project.

Different approaches to create and deploy Azure SQL:

  1. DACPAC (Data-Tier Application Package - Reference)
        It focuses on capturing and deploying the schema, including updating an existing database. Best fit for DevOps: developers can easily deploy the package from development to production with version control.

  2. SQL Script
        To run a SQL script against Azure SQL, you need to create firewall rules so that the Azure Pipeline agent can execute the script.

  3. ARM (Azure Resource Manager based PowerShell)
  4. Classic PowerShell (Windows based)
  5. Azure Pipeline based on YAML (executing the SQL scripts)

Sunday, October 11, 2020

Implement Azure SQL Databases - AZ 303 exam notes.

As part of my exam preparation, there is a focused skill topic "Implement and manage data platforms" for the exam AZ-303 - Azure Architect Technologies.

To learn the Azure SQL related material, I created my own Azure SQL database and checked which features need to be learned.

Implement Azure SQL databases 

  1. configure Azure SQL database settings 
  2. implement Azure SQL Database managed instances 
  3. configure HA for an Azure SQL database 
  4.  publish an Azure SQL database
To deep dive into it, I recapped basic SQL-related technical terms like core, DTU, TDE, data backup, and copying data between SQL Server instances and databases.

Cores and DTUs play a very important role in Azure SQL Database when choosing the correct service tier for the business requirements.

A DTU is a combination of CPU, memory, I/O reads/writes and storage metrics, defined by an OLTP workload benchmark. So every read/write operation, the data stored on the storage medium, and the core power used to process database operations all count toward the SQL Database cost.

On the configuration side, I was initially not sure, but after creating the Azure SQL database it was easy to learn what configuration is needed to run and protect your SQL database in Azure.

You can see all the features in your database dashboard.




Saturday, September 12, 2020

Design Identity and Security - AZ 304 Exam notes

Design Identity and Security (25-30%)


Design authentication

  •  recommend a solution for single sign-on
            See the simple yet fantastic explanation from Mr. Swaroop Krishnamurthy on YouTube on how to set up Single Sign-On (AD, ADFS and Pass-through Authentication): https://www.youtube.com/watch?v=PyeAC85Gm7w, and also Azure AD Pass-through Authentication
  •  recommend a solution for authentication
  •  recommend a solution for Conditional Access, including multi-factor authentication
  •  recommend a solution for network access authentication
  •  recommend a solution for a hybrid identity including Azure AD Connect and Azure AD Connect Health
  •  recommend a solution for user self-service
  •  recommend and implement a solution for B2B integration


Design authorization

  •  choose an authorization approach
  •  recommend a hierarchical structure that includes management groups, subscriptions and resource groups
  •  recommend an access management solution including RBAC policies, access reviews, role assignments, physical access, Privileged Identity Management (PIM), Azure AD Identity Protection, Just In Time (JIT) access

Design governance

  •  recommend a strategy for tagging
  •  recommend a solution for using Azure Policy
  •  recommend a solution for using Azure Blueprint

Design security for applications

  •  recommend a solution that includes KeyVault
    o What can be stored in KeyVault
    o KeyVault operations
    o KeyVault regions
  •  recommend a solution that includes Azure AD Managed Identities
  •  recommend a solution for integrating applications into Azure AD

Monday, September 07, 2020

Database Service Tier sizing - AZ 304 exam

 AZ - 304 Microsoft Azure Architect (beta)

[Its my own notes for preparing for the exam. You are advised to verify technical correctness on your own]

This is how I am trying to find the relevant Microsoft documentation for self-study and exam preparation. I am already an AZ-203 certified developer and well versed in Azure Storage, so I decided to update my skills with the advanced topics in "Design Data Storage", which is part of the exam's skills-measured topics.

Here is the sub topic "recommend database service tier sizing".

 Understanding the available database service tiers in Azure:

  1. General purpose service tier
  2. Business critical
  3. Hyperscale (only available for Azure SQL Database, not for Azure SQL Managed Instance)

You must understand the features of Azure SQL Database, Azure SQL Managed Instance, and the vCore and DTU purchasing models.


The General Purpose service tier is the recommended one (redundancy and high availability) most of the time, as it truly gives you control over the compute and storage aspects. You can scale up and scale down the storage and compute based on your usage.

The reason why Azure SQL and Azure SQL Managed Instance need separate storage is the split between a stateless and a stateful compute layer.


The stateless layer is managed by Azure Service Fabric, which checks the health of the compute nodes and performs failover in Azure.

Whenever the database engine or operating system is upgraded, some part of underlying infrastructure fails, or if some critical issue is detected in the sqlservr.exe process, Azure Service Fabric will move the stateless process to another stateless compute node.


Azure SQL Managed Instance has separate compute and storage options, whereas Azure SQL Database exposes only the service features without any specific compute or storage.

Hyper scale Service tier features

  • Supports up to 100 TB database size (other service tiers support 1 to 4 TB).
  • Rapid scale-up (read-only hot-standby databases) and scale-down when usage is lower, as it decouples compute and storage.
  • Fast database restores within minutes rather than hours or even a day.
  • Higher performance due to higher log throughput and faster commits, irrespective of data volumes.



Reference

https://docs.microsoft.com/en-in/azure/azure-sql/database/service-tier-general-purpose


Sunday, August 16, 2020

Copying SharePoint List Items using PnP

 This simple PowerShell script enables you to copy items from one list to another where the lists have lookup columns.
If the master list item and the child list's PID column share the same value, it will start copying the items.

Check the GitHub code 


Saturday, August 15, 2020

Connecting remote desktop from Linux

 If you are new to Linux and want to connect to a remote desktop, then the application "Remmina" is your best candidate. It lets you connect to remote machines through SSH, RDP and VNC.

When you connect for the first time, you might get the error "You requested an H264 GX mode for server [snip], but your libfreerdp does not support H264. Please check Color Depth settings."

This means the default Remmina client built into your machine needs to be updated.

Follow these commands:

  1. sudo apt-add-repository ppa:remmina-ppa-team/remmina-next
  2. sudo apt update
  3. sudo apt install remmina remmina-plugin-rdp remmina-plugin-secret
  4. sudo killall remmina
Now you will have the latest Remmina client installed on your machine, which will accept the H264 color depth.


Wednesday, July 15, 2020

Linux command to delete write protected folder

This simple post explains a simple yet powerful command for removing a write-protected folder in Linux.
Usually you can remove a folder and its files with:

rm -r directoryname

The -r parameter is the recursive option: it loops through all the files in the folder and applies the rm command to each of them.
But you might still receive an error starting with "rm: descend into write-protected directory".
The directory is write-protected, so you have to use the rm command with administrative privileges:

sudo rm -r directory

Saturday, June 20, 2020

SharePoint Framework Extension (SPFx) - Recap

Here is my quick recap on SharePoint Framework Extension (SPFx).

  1. Supports open-source development for creating pages and web parts for SharePoint.
  2. Client-side rendering (better performance; the overhead on SharePoint servers is offloaded).
  3. Mobile optimised (responsive rendering through the underlying technologies used in page or web part development).
  4. Supports widely used tool-chain development approaches (Yeoman, npm, gulp, TypeScript etc.).
  5. No more JavaScript injection or the iframe model (SharePoint Add-ins rely heavily on these, which are often considered a security threat to SharePoint data).
  6. SPFx enforces the SharePoint permission model to run and deploy.




Friday, May 22, 2020

TypeScript watch compiler parameter

Compiling every change to your TypeScript code is not only a time-consuming activity but also puts huge pressure on your machine, especially with a large source tree: every time it compiles, it must propagate the updates to all of its dependencies.
To resolve this, use the watch parameter of the tsc compiler.
Syntax for using the watch parameter with the tsc compiler:

tsc --watch YourTSFile.ts

Now your terminal will be in active mode and will continuously watch for changes to your source code; behind the scenes it will compile the code into JS output.
You have to keep your terminal in active mode; it shouldn't be stopped.

Thursday, May 21, 2020

Upgrading latest NodeJS in ubuntu

In this post, I am listing the steps I followed on my Ubuntu 18.04 machine to upgrade to the latest NodeJS for Angular development.
On my machine, the default node version shown is v8.
To develop Angular applications, you must have the latest versions of NodeJS (> 10.13.0) and NPM (6.11.0) installed.

To upgrade to the latest NodeJS in Ubuntu 18.04, follow these steps:

sudo npm cache clean -f
sudo npm install -g n
sudo n stable

If you want the latest version rather than the stable one, use this command:
sudo n latest

After the above commands succeed, you can run node -v to verify the latest version. Now I have the latest versions of npm and NodeJS.

After this, Angular CLI installed successfully and I was able to create an Angular application.

Monday, May 11, 2020

ASP.NET Core application deploy to Azure

Deploying an ASP.NET application from Visual Studio to Azure is very simple and straightforward: all you need to do is click Next in each GUI wizard once you are logged into your Azure subscription.
But that is not the case if you are developing the web or MVC application using Visual Studio Code.
To complete this, you must have local git installed on your machine, with an internet connection.

Steps
Go to your App Service "Overview" page.

Copy the Git clone URL.
On your local machine, navigate to your project folder in a command prompt or terminal.
In the project folder (in the command prompt), type: git init
Now type:
git remote add azure https://Murugesan@murugesan.scm.azurewebsites.net:443/murugesan.git
Once that is successful, type:
git commit -m "First push"
git push azure master
Now you will be able to see your ASP.NET Core application in Azure.




Saturday, March 28, 2020

Apache Kafka - 28-March-2020

Today, 28th March 2020 - what I have learnt today.

My handwritten notes are available here; if you are interested, please check them.

On my lockdown days here at home, today I started to learn more about Apache Kafka.

Here are my notes to recollect all I have memorised and understood.

Kafka is a third-generation messaging system, often called a streaming platform. (What is a messaging system? Passing messages between computers or distributed applications in an asynchronous, secure and scalable fashion.)

It allows you to store and process streams of data/messages as they occur.

Kafka stores the messages it receives from Producers in Topics.

Producers can be any application that sends messages to a Kafka cluster.

Consumers are client applications which consume the data/messages from the Kafka cluster.

Streams: processors; a client library for processing the data or messages in Kafka.

Connectors: a framework for building connectors, which are responsible for moving or transporting data or messages into and out of Kafka from other systems.

Topic: unique within a Kafka system; a logical container to hold or store the messages from the producers.

Partition: Kafka breaks a topic into multiple partitions and stores them across a host of computers in order to provide fault tolerance and high availability.

Suppose a Kafka topic holds huge data: the data will be split into partitions, each message will be stored in a cell, and each cell gets a sequential id. This id is called the offset in Kafka.

Messages are identified and retrieved through the combination of topic name, partition id and offset.
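The topic/partition/offset addressing described above can be illustrated with a small pure-Python sketch. This is only an in-memory toy to show the concepts, not a real Kafka client, and all names in it are my own.

```python
# A toy in-memory model of Kafka's topic/partition/offset addressing.
# Illustration only; a real broker also handles replication, retention, etc.

class ToyTopic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, message):
        """Route a message to a partition by key; its index is its offset."""
        partition_id = hash(key) % len(self.partitions)
        self.partitions[partition_id].append(message)
        offset = len(self.partitions[partition_id]) - 1
        return partition_id, offset

    def consume(self, partition_id, offset):
        """Retrieve a message by (partition, offset) within this topic."""
        return self.partitions[partition_id][offset]

topic = ToyTopic("orders", num_partitions=3)
pid, off = topic.produce("customer-42", "order placed")
print(topic.consume(pid, off))
```

Note that messages with the same key land in the same partition, and each new message in a partition simply gets the next offset, which is why (topic, partition, offset) uniquely identifies a message.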