Geo GraphRAG Tutorial: Neo4j & Free Gemini API

Xuebin Wei
20 hours ago
4 min read

Imagine asking an AI, "What are people saying about AI and machine learning in London, UK, within a radius of 100 kilometers?" and getting a highly specific, grounded answer pulling exact usernames, locations, and hashtags.

That is exactly what we are going to build today. In this tutorial, we are taking traditional RAG (Retrieval-Augmented Generation) to the next level by building a Geo-Augmented GraphRAG

https://www.youtube.com/watch?v=JX5BOb-nQrY

Geo GraphRAG Tutorial: Neo4j & Free Gemini API

Prerequisites & Resources

Before we dive in, make sure you have access to the code and understand the foundational data structure we are working with:

Source Code: Access the Jupyter Notebook on GitHub

data-analysis-with-generative-ai/GraphRAG_Social_Media_Neo4j.ipynb at main · lbsocial/data-analysis-with-generative-ai

Previous Tutorial 1: Building a Social Media Knowledge Graph with Python and Neo4j

Social Media ETL Tutorial: Python & Neo4j Knowledge Graph

Previous Tutorial 2: Neo4j Tutorial: Cypher & Generative AI Dashboard

Neo4j Tutorial: Master Cypher & Build AI Dashboards (No Code Required)

1. Installation & Securing Your Free Gemini API Key

First, you need to set up your environment and grab a free Gemini API key from Google AI Studio. As shown in the video, this free tier is incredibly generous and allows us to run the entire pipeline at no cost.

To start, install the necessary Python dependencies:

pip install neo4j google-genai python-dotenv -q

Then, initialize your connections to Neo4j and the Gemini API:

from neo4j import GraphDatabase
from google import genai
import os

# Connect to Gemini Free API
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Connect to Neo4j
NEO4J_URI = "neo4j+s://your-database.databases.neo4j.io"
NEO4J_AUTH = ("neo4j", "your-password")
driver = GraphDatabase.driver(NEO4J_URI, auth=NEO4J_AUTH)

driver.verify_connectivity()
print("✅ Connected to Neo4j and Gemini!")

2. The Social Media Graph Structure

To understand GraphRAG, we first need to understand our data structure.

Diagram showing flow: User posts a Tweet, which is located at a Place and tagged with a Hashtag. Blue, orange, green, purple nodes. — Our foundational Neo4j graph model connects Users, Tweets, Places, and Hashtags.

A traditional RAG system retrieves different embeddings separately. But with a GraphRAG, every dot is connected! A User posts a Tweet located at a Place and tagged with Hashtags.

3. Generating 3072-Dimensional Vector Embeddings

The first step in our pipeline is converting our text into multi-dimensional numbers (vectors) so the computer can understand semantic similarities. We use Gemini's gemini-embedding-001 model to convert every tweet's text into a 3072-dimensional vector.

Graphical diagram illustrates "Step 3: Generate Vector Embeddings." It shows a process with two orange circles, arrows, and a blue hexagon labeled "Gemini Embedding Model." — Converting tweet text into multi-dimensional vector arrays using the Gemini Embedding Model.

def get_embeddings(texts: list[str]) -> list[list[float]]:
    """Get Gemini embeddings for a batch of texts."""
    response = client.models.embed_content(
        model="models/gemini-embedding-001",
        contents=texts,
    )
    return [emb.values for emb in response.embeddings]

Next, we tell Neo4j to create an Approximate Nearest Neighbor (ANN) search index so it can quickly compare these embeddings:

# Create the Neo4j Vector Index for fast semantic search
with driver.session(database="neo4j") as session:
    session.run("""
        CREATE VECTOR INDEX tweet_embeddings IF NOT EXISTS
        FOR (t:Tweet) ON (t.embedding)
        OPTIONS {indexConfig: {
          `vector.dimensions`: 3072,
          `vector.similarity_function`: 'cosine'
        }}
    """)

4. Geo GraphRAG Retrieval: Expanding the Context

If you ask a traditional RAG system, "Which cities generate the most discussions about AI?", it often fails because the raw tweet text might not explicitly mention the city.

Diagram illustrating a tweet about AI in NYC linked to User, Place, and Hashtag. Dark blue background, highlighting context expansion. — GraphRAG traverses the graph to pull in connected User, Place, and Hashtag nodes to enrich the prompt context.

GraphRAG solves this. After finding similar tweets via vector search, we traverse the graph to collect rich context, such as the author, location, and hashtags.

def graph_rag_search(question: str, k: int = 5) -> list[dict]:
    q_embedding = get_embeddings([question])[0]
    
    cypher = """
        // 1. Vector Search
        CALL db.index.vector.queryNodes('tweet_embeddings', $k, $embedding)
        YIELD node AS tweet, score
        
        // 2. Graph Traversal: enrich with relationships
        MATCH (author:User)-[:POSTED]->(tweet)
        OPTIONAL MATCH (tweet)-[:LOCATED_AT]->(place:Place)
        OPTIONAL MATCH (tweet)-[:TAGGED_WITH]->(hashtag:Hashtag)
        
        RETURN tweet.text AS text, author.username AS author, 
               place.name AS location, collect(DISTINCT hashtag.name) AS hashtags, score
        ORDER BY score DESC
    """
    with driver.session(database="neo4j") as session:
        result = session.run(cypher, k=k, embedding=q_embedding)
        return [dict(record) for record in result]

5. Cypher-Augmented Generation (CAG)

Sometimes users ask analytical questions like, "How many tweets were posted from each city?" Vector search isn't good at counting. Instead, we can train Gemini (gemini-2.5-flash) to write native Neo4j Cypher queries for us.

Flowchart of Cypher-Augmented Generation workflow with steps from user query to final answer about NYC follower count, highlighting Neo4j and LLM Translator. — Using an LLM as a translator to convert natural language into executable database queries.

CYPHER_SYSTEM_PROMPT = """You are a Neo4j Cypher expert. Given the user's question, generate a Cypher query to answer it.
The graph schema is:
- (:User)-[:POSTED]->(:Tweet)-[:LOCATED_AT]->(:Place)
- (:Tweet)-[:TAGGED_WITH]->(:Hashtag)
Return ONLY raw Cypher."""

def cypher_rag_answer(question: str) -> str:
    # 1. Generate Cypher using Gemini
    cypher_response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=question,
        config=genai.types.GenerateContentConfig(system_instruction=CYPHER_SYSTEM_PROMPT, temperature=0.0)
    )
    
    # 2. Execute query against Neo4j...
    # 3. Summarize raw results with Gemini to present to the user!

6. Geo GraphRAG in Action: Geospatial Queries & Point-in-Radius

Finally, we reach the climax of our system: Geo-Augmented GraphRAG. Every Place node has a point() property storing coordinates. We can use Neo4j's native point.distance() function to filter tweets before running our semantic vector search.

Diagram showing geospatial queries with a neon green circle around a center point (Lat/Lon). Tweets within a 50km radius are highlighted. — Applying a geospatial radius filter before executing the semantic vector search.

def geo_graph_rag_search(question: str, city_name: str, radius_km: int = 100, k: int = 5) -> list[dict]:
    q_embedding = get_embeddings([question])[0]
    
    cypher = """
        // 1. Get city center
        MATCH (p:Place {name: $city}) WITH p.location AS center
        
        // 2. Vector search (broad net)
        CALL db.index.vector.queryNodes('tweet_embeddings', $k_broad, $embedding)
        YIELD node AS tweet, score
        
        // 3. Geo filter — keep only tweets within radius
        WHERE point.distance(tweet.location, center) < $radius_m
        
        // 4. Graph traversal
        MATCH (author:User)-[:POSTED]->(tweet)
        OPTIONAL MATCH (tweet)-[:LOCATED_AT]->(place:Place)
        
        RETURN tweet.text AS text, author.username AS author, 
               place.name AS location, score
        LIMIT $k
    """
    with driver.session(database="neo4j") as session:
        result = session.run(cypher, city=city_name, radius_m=radius_km * 1000, 
                             k_broad=k * 10, k=k, embedding=q_embedding)
        return [dict(record) for record in result]

Conclusion

By combining the structural power of Neo4j with the reasoning capabilities of Google's Gemini Free API, we built a complete GraphRAG pipeline from a social media knowledge graph. From semantic vector indexing to complex geospatial radius filtering, GraphRAG systems bridge the gap between simple chatbots and true analytical assistants.

If you found this tutorial helpful, be sure to check out the full code on GitHub and try running it with your own search terms. Let us know in the comments how you plan to use GraphRAG in your own projects!

Geo GraphRAG Tutorial: Neo4j & Free Gemini API

Prerequisites & Resources

1. Installation & Securing Your Free Gemini API Key

2. The Social Media Graph Structure

3. Generating 3072-Dimensional Vector Embeddings

4. Geo GraphRAG Retrieval: Expanding the Context

5. Cypher-Augmented Generation (CAG)

6. Geo GraphRAG in Action: Geospatial Queries & Point-in-Radius

Conclusion

Recent Posts

Comments