How to work properly

 


Docker & Containerization Guide

What is a Container?

A container is an isolated process running on your system. The isolation comes from kernel namespaces and control groups (cgroups), both long-standing Linux features. Docker simplifies container creation and management.

Key Characteristics:

  • A runnable instance of a Docker image
  • Runs on local machines, VMs, or cloud platforms
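  • Portable: the same image runs on any host with a compatible container runtime (natively on Linux; through a lightweight Linux VM on macOS and Windows)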
  • Isolated: runs its own software, binaries, and configs independently of others

What is a Container Image?

A container image is a packaged, read-only filesystem containing everything needed to run an application:

  • Dependencies
  • Configuration files
  • Scripts
  • Binaries
  • Environment variables
  • Default execution commands
  • Metadata

When a container starts, it uses this image as its base filesystem.


How to Containerize an Application

To containerize an app:

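  1. Create a Dockerfile in your project root directory (the file has no extension).
  2. Add instructions to the Dockerfile: a base image (FROM), the files to copy (COPY), build steps (RUN), and the startup command (CMD).
  3. Build the Docker image with docker build.
  4. Run the container with docker run. Two common flags:

    • -d: detached mode (runs in the background)
    • -p HOST:CONTAINER: maps a host port to a container port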
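
The steps above can be sketched as follows. This is a minimal illustration, not a definitive setup: it assumes a hypothetical Python app with an app.py entry point and a requirements.txt, listening on port 5000, and my-app is a placeholder image name. The build and run commands are wrapped in a function so the snippet can be sourced without starting anything.

```shell
# Steps 1-2: write a minimal Dockerfile (no file extension) in the project root.
# Assumption: a Python app with app.py and requirements.txt; adjust for your stack.
cat > Dockerfile <<'EOF'
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
EOF

# Steps 3-4: build the image, then start a container from it.
# "my-app" is a placeholder image name.
build_and_run() {
  docker build -t my-app . &&
  docker run -d -p 5000:5000 my-app
}
```

Calling build_and_run from the project root builds the image and starts the container in the background, with host port 5000 forwarded to port 5000 inside the container.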

To view running containers, use Docker Desktop or run docker ps in a terminal.


Persisting Data with Volumes

By default, container data is ephemeral: anything written inside a container is lost when that container is removed. To persist data:

  • Create a Docker volume
  • Mount it during container launch

This ensures data survives container restarts and removal.
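
A minimal sketch of those two steps, under stated assumptions: app-data (the volume), my-app (the image) and /app/data (the path inside the container) are all placeholder names.

```shell
# Create a named volume and mount it when launching the container.
# Assumptions: "app-data", "my-app" and /app/data are placeholder names.
run_with_volume() {
  docker volume create app-data &&
  docker run -d -p 5000:5000 \
    --mount type=volume,src=app-data,target=/app/data \
    my-app
}
```

Files the app writes under /app/data then live in the volume rather than in the container's own filesystem. docker volume ls lists existing volumes, and docker volume rm app-data deletes the volume together with its data.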

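Note: Docker Desktop is free for personal use, education, and small businesses, but larger companies need a paid subscription (check Docker's current licensing terms). Docker Engine on Linux is free and open source.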


Recommended Development Setup

Virtual Environment (Python)

  1. Create a working directory
  2. Create a virtual environment with python3 -m venv
  3. Activate it by sourcing the environment's activate script
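
The three steps above might look like this on Linux or macOS. The directory name my-project and the environment name .venv are assumptions; any names work.

```shell
# 1. Create a working directory ("my-project" is a placeholder name).
mkdir -p my-project
cd my-project

# 2. Create a virtual environment in .venv (a common, but arbitrary, name).
python3 -m venv .venv

# 3. Activate it (Linux/macOS; on Windows PowerShell use .venv\Scripts\Activate.ps1).
. .venv/bin/activate

# The interpreter on PATH now comes from the virtual environment.
command -v python
```

Running deactivate returns the shell to the system interpreter.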

Version Control with Git

  1. Initialize your repo with git init
  2. Make your first commit
  3. Use VS Code's built-in Git integration (the Source Control view) for easier Git management
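
As a sketch of steps 1 and 2 from the command line: the directory name, user name, email address and README content below are placeholders, not values prescribed by this guide.

```shell
# Start from a fresh directory ("my-repo" is a placeholder name).
mkdir -p my-repo
cd my-repo

# 1. Initialize the repository.
git init

# Identify yourself for this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

# 2. Make the first commit.
echo "# my-repo" > README.md
git add README.md
git commit -m "Initial commit"
```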

Python Version Management with Pyenv

  • Use pyenv to manage multiple Python versions
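
A small helper along these lines can install a version and pin it for one project. It assumes pyenv is already installed and initialised in your shell; the function name pin_python and the version number in the usage line are illustrative only.

```shell
# Install a Python version with pyenv and pin it for the current project.
# Assumption: pyenv is already installed and initialised in your shell.
# Usage: pin_python 3.12.4   (the version number is only an example)
pin_python() {
  pyenv install --skip-existing "$1" &&
  pyenv local "$1"   # writes the version to .python-version in this directory
}
```

pyenv versions lists every installed version and marks the one that is active in the current directory.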

Suggested Project Structure


VS Code Best Practices

  • Explorer: Navigate project files
  • Search: Quickly find code
  • Source Control: Manage Git changes
  • Recommended Extensions:
    • Docker
    • Python
    • Pylance
    • PowerShell
    • Dataiku DSS

Tech Stack Overview

Component         Tools
Database          Neo4j, MySQL, Amazon RDS
Frontend          React.js, Vue.js (with TypeScript)
Backend           Flask API, Django
CI/CD & Git       GitHub, GitLab
Process Manager   Supervisor, Circus
Data Pipelines    Apache Airflow
Cloud             AWS EC2, S3, Docker
Auth              Auth0
Payments          Stripe
Analytics         Google Analytics

Store credentials and API keys in .env or .secrets files, and keep those files out of version control.
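
One way to create such a file and load it into a shell session is sketched below. The variable names and values are hypothetical placeholders, and this approach only works when every line of the file is a plain KEY=value assignment that the shell can parse.

```shell
# Write placeholder credentials to .env (the variable names are hypothetical).
cat > .env <<'EOF'
STRIPE_API_KEY=replace-me
AUTH0_CLIENT_SECRET=replace-me
EOF

# Restrict the file to the current user.
chmod 600 .env

# Export every variable defined in .env into the current shell session.
set -a
. ./.env
set +a
```

Programs started from this shell then see the values as ordinary environment variables.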


Graph Database: Neo4j & Amazon Neptune

Neo4j (Local via Docker)

docker-compose.yml:
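
A minimal Compose file for a local Neo4j instance might look like the sketch below, written out from the shell. The image tag, the password, the volume name neo4j-data and the ./import mount are assumptions to adapt, not settings this guide prescribes.

```shell
# Write a minimal docker-compose.yml for Neo4j (all values are placeholders).
cat > docker-compose.yml <<'EOF'
services:
  neo4j:
    image: neo4j:5
    ports:
      - "7474:7474"   # HTTP (Neo4j Browser)
      - "7687:7687"   # Bolt protocol
    environment:
      NEO4J_AUTH: neo4j/change-me-please
    volumes:
      - neo4j-data:/data
      - ./import:/var/lib/neo4j/import
volumes:
  neo4j-data:
EOF

# Start the service in the background.
start_neo4j() {
  docker compose up -d
}
```

After start_neo4j, the Neo4j Browser should be reachable at http://localhost:7474.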

Import data inside container:
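
One possible approach, sketched here under assumptions, is to run cypher-shell inside the container with a LOAD CSV statement. It assumes a Compose service named neo4j, the host directory ./import mounted at /var/lib/neo4j/import, the placeholder password change-me-please, and a hypothetical people.csv with a name column.

```shell
# Load a CSV file into Neo4j from inside the running container.
# Assumptions: a Compose service named "neo4j", ./import mounted at
# /var/lib/neo4j/import, and a hypothetical people.csv with a "name" column.
import_people() {
  docker compose exec neo4j cypher-shell -u neo4j -p change-me-please \
    "LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
     CREATE (:Person {name: row.name});"
}
```

For very large datasets, Neo4j's offline bulk import tooling is usually a better fit than LOAD CSV.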


Amazon Neptune (Managed Graph DB)

  • Supports Apache TinkerPop Gremlin, openCypher, and SPARQL
  • Query the HTTPS endpoints (port 8182 by default) with curl; connections must use SSL/TLS
  • Use Bulk Loader for high-volume ingestion
  • Connect via Lambda, use SageMaker for Neptune ML
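
As a sketch of querying an endpoint with curl, the helper below posts an openCypher query over HTTPS. The function name is hypothetical, and it assumes the default port 8182, IAM database authentication disabled, and a caller that can reach the cluster's VPC. With IAM authentication enabled, requests must be signed with Signature Version 4 instead.

```shell
# Send an openCypher query to a Neptune cluster over HTTPS.
# Usage: neptune_query CLUSTER_ENDPOINT 'MATCH (n) RETURN n LIMIT 5'
# Assumptions: default port 8182, IAM database authentication disabled,
# and network access to the cluster's VPC.
neptune_query() {
  curl -sS "https://$1:8182/openCypher" --data-urlencode "query=$2"
}
```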

Neptune ML supports:

  • Node classification & regression
  • Edge classification & regression
  • Link prediction

Docker Commands Cheat Sheet

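Run a container:
    docker run -d -p HOST_PORT:CONTAINER_PORT IMAGE

Enter a container:
    docker exec -it CONTAINER sh

Stop and remove container:
    docker stop CONTAINER
    docker rm CONTAINER

Pull image:
    docker pull IMAGE

List running containers:
    docker ps

List all images:
    docker images

System cleanup:
    docker system prune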


⚙️ Git Workflow Summary

  • Open a Merge Request when your work is done
  • Always run the tests before merging into master or main
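
A branch-based version of this workflow could be sketched with two small helpers. The function names and the feature/ branch prefix are illustrative conventions, not requirements, and publish_feature assumes a remote named origin is configured.

```shell
# Start work on a new branch. Usage: start_feature add-login
start_feature() {
  git checkout -b "feature/$1"
}

# Publish the current branch so a Merge Request can be opened from it.
# Assumption: a remote named "origin" is configured.
publish_feature() {
  git push -u origin HEAD
}
```

After publish_feature, the Merge Request itself is opened in the GitLab or GitHub web interface.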

.gitignore Examples:
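
A starting point for the stack described in this guide, written from the shell. The entries are common conventions and an assumption about what your project contains; trim or extend them as needed.

```shell
# Write a basic .gitignore covering secrets, Python, Node and OS files.
cat > .gitignore <<'EOF'
# Secrets
.env
.secrets

# Python
.venv/
__pycache__/
*.pyc

# Node (frontend)
node_modules/

# OS files
.DS_Store
EOF
```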