Docker & Containerization Guide
What is a Container?
A container is an isolated process running on your system, made possible through kernel namespaces—a long-standing Linux feature. Docker simplifies container creation and management.
Key Characteristics:
- A runnable instance of a Docker image
- Runs on local machines, VMs, or cloud platforms
- Portable across operating systems
- Isolated: runs its own software, binaries, and configs independently of other containers (see the sketch below)
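For example, a quick sketch using the public `nginx` image: two containers started from the same image run as separate, isolated processes, each with its own filesystem.

```bash
# Start two containers from the same image
docker run -d --name web1 nginx
docker run -d --name web2 nginx

# A file created inside web1 does not exist inside web2
docker exec web1 touch /tmp/only-in-web1
docker exec web2 ls /tmp
```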
What is a Container Image?
A container image is a packaged, read-only filesystem containing everything needed to run an application:
- Dependencies
- Configuration files
- Scripts
- Binaries
- Environment variables
- Default execution commands
- Metadata
When a container starts, it uses this image as its base filesystem.
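You can see that packaged configuration for yourself; for example, inspecting a public image shows its environment variables, default command, exposed ports, and layer history:

```bash
# Dump the image's baked-in configuration and metadata as JSON
docker image inspect nginx

# List the layers (one per build instruction) that make up the image
docker image history nginx
```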
How to Containerize an Application
To containerize an app:
- Create a `Dockerfile` in your project root directory (no file extension).
- Add instructions in the `Dockerfile`:

```dockerfile
FROM        # Base image
WORKDIR     # Working directory inside the container
COPY        # Copy local files to the image
RUN         # Execute commands during image build
CMD         # Default command to run on container start
EXPOSE      # Ports to expose
```

- Build the Docker image:

```bash
docker build -t getting-started .
```

- Run the container:

```bash
docker run -dp 3000:3000 getting-started
```

  - `-d`: detached mode (runs in the background)
  - `-p`: maps host port to container port
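Because the container runs detached, its output is not attached to your terminal; a quick way to check it is `docker logs` (the container ID comes from `docker ps`):

```bash
# Follow the container's output; Ctrl+C stops following, not the container
docker logs -f <container_id>
```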
To view running containers, use Docker Desktop or:
```bash
docker ps
```
Persisting Data with Volumes
By default, container data is ephemeral. To persist data:
- Create a Docker volume
- Mount it during container launch
This ensures data survives restarts and deletions.
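A minimal sketch, assuming the `getting-started` app from earlier writes its data to `/etc/todos` inside the container:

```bash
# Create a named volume and mount it at the path the app writes to
docker volume create todo-db
docker run -dp 3000:3000 --mount type=volume,src=todo-db,target=/etc/todos getting-started
```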
Note: Docker Desktop requires a paid subscription for larger commercial organizations; Docker Engine on Linux is free and open source.
Recommended Development Setup
Virtual Environment (Python)
- Create a working directory
- Create a virtual environment:
```bash
python -m venv env_name
```

- Activate it (Windows PowerShell shown; on Linux/macOS use `source env_name/bin/activate`):

```bash
.\env_name\Scripts\activate
```
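With the environment active, dependencies install into it and can be pinned for later reuse (Flask here is just an example package):

```bash
pip install flask               # installs into the active virtual environment
pip freeze > requirements.txt   # pin versions for reproducible installs
```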
Version Control with Git
- Initialize your repo:
```bash
git init
```

- Make your first commit (see the sketch after this list)
- Use VS Code Git extension for easier Git management
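A minimal first commit might look like this (the staged file names are placeholders):

```bash
git add README.md .gitignore     # stage your initial files
git commit -m "Initial commit"
```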
Python Version Management with Pyenv
- Use `pyenv` to manage multiple Python versions
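A typical flow looks like this (the version number is just an example):

```bash
pyenv install 3.11.4   # install a specific interpreter
pyenv local 3.11.4     # pin it for this project (writes .python-version)
pyenv versions         # list all interpreters pyenv manages
```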
Suggested Project Structure
```
my-saas-app/
│
├── backend/                 # Flask app
│   ├── app/                 # Core logic
│   │   ├── __init__.py
│   │   ├── routes.py
│   │   ├── models.py
│   │   ├── services.py
│   │   └── utils.py
│   ├── config.py
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/                # React app
│   ├── public/
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   ├── App.js
│   │   └── index.js
│   ├── package.json
│   └── Dockerfile
│
├── db/
│   ├── neo4j/
│   └── mysql/
│
├── docker-compose.yml
└── README.md
```
VS Code Best Practices
- Explorer: Navigate project files
- Search: Quickly find code
- Source Control: Manage Git changes
- Recommended Plugins:
- Docker
- Python
- Pylance
- PowerShell
- Dataiku DSS
Tech Stack Overview
Component | Tools |
---|---|
Database | Neo4j, MySQL, Amazon RDS |
Frontend | React.js, Vue.js (with TypeScript) |
Backend | Flask API, Django |
CI/CD & Git | GitHub, GitLab |
Process Manager | Supervisor, Circus |
Data Pipelines | Apache Airflow |
Cloud | AWS EC2, S3, Docker |
Auth | Auth0 |
Payments | Stripe |
Analytics | Google Analytics |
Use `.env` or `.secrets` files for storing credentials and API keys.
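For example, an `.env` file containing lines such as `STRIPE_API_KEY=...` or `AUTH0_CLIENT_SECRET=...` (placeholder names) can be injected into a container at run time instead of being baked into the image:

```bash
# Pass every variable defined in .env into the container's environment
docker run -d --env-file .env getting-started
```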
Graph Database: Neo4j & Amazon Neptune
Neo4j (Local via Docker)
docker-compose.yml:
```yaml
version: "3.8"
services:
  neo4j:
    image: neo4j:5.8.0
    container_name: neo4j-graph
    ports:
      - 7474:7474
      - 7687:7687
    environment:
      - NEO4J_AUTH=none
      - NEO4J_dbms_memory_heap_max__size=4g
    volumes:
      - ./data/:/var/lib/neo4j/import/
      - ./data/:/var/lib/neo4j/data/
```
Import data inside container:
```bash
bin/neo4j-admin database import full \
  --nodes="import/NODES.csv" \
  --relationships="import/RELATIONSHIPS.csv" \
  --delimiter="," \
  --array-delimiter="\t" \
  --skip-bad-relationships=true \
  --overwrite-destination \
  --multiline-fields=true \
  neo4j
```
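After the import finishes, a quick sanity check from the host (the container name matches the compose file above, and auth is disabled there):

```bash
# Count the imported nodes via cypher-shell inside the running container
docker exec -it neo4j-graph cypher-shell "MATCH (n) RETURN count(n);"
```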
Amazon Neptune (Managed Graph DB)
- Supports Apache TinkerPop Gremlin and openCypher
- Query the HTTPS endpoints with curl over SSL (see the sketch after this list)
- Use Bulk Loader for high-volume ingestion
- Connect via Lambda, use SageMaker for Neptune ML
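A rough sketch of hitting a Neptune cluster with curl (the endpoint name is a placeholder; 8182 is Neptune's default port):

```bash
# Check cluster status
curl https://your-neptune-endpoint:8182/status

# Run an openCypher query over HTTPS
curl https://your-neptune-endpoint:8182/openCypher \
  -d "query=MATCH (n) RETURN count(n)"
```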
Neptune ML supports:
- Node classification & regression
- Edge classification & regression
- Link prediction
Docker Commands Cheat Sheet
Run a container:
```bash
docker run -d -p 8080:80 nginx
```
Enter a container:
```bash
docker exec -it <container_id> /bin/bash
```
Stop and remove container:
```bash
docker stop <id>
docker rm <id>
```
Pull image:
```bash
docker pull hello-world
```
List running containers:
```bash
docker ps
```
List all images:
```bash
docker images -a
```
System cleanup:
```bash
docker system prune
```
⚙️ Git Workflow Summary
```bash
git clone <repo_url>             # Clone repo
git checkout -b feature-xyz      # Create and switch to new branch
git add .                        # Stage changes
git commit -m "Message"          # Commit changes
git push -u origin feature-xyz
```
- Open a merge request when done
- Always test before merging into `master` or `main`
.gitignore Examples:
```
venv/
*.pyc
__pycache__/
.env
.secrets
node_modules/
```