Initial Push
This commit is contained in:
77
.dockerignore
Normal file
@@ -0,0 +1,77 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
venv/
env/
ENV/
.venv/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Git
.git/
.gitignore

# Logs
*.log
logs/

# Database files (will be mounted as volume)
*.db
*.sqlite
*.sqlite3

# Temporary files
*.tmp
*.temp
.cache/

# Documentation
README.md
*.md

# Docker
Dockerfile*
docker-compose*.yml
.dockerignore

# Scripts
setup.sh
scripts/

# Examples
examples/

# Old files
*_old.py
*.bak
76
.gitignore
vendored
@@ -20,6 +20,7 @@ parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg

@@ -49,7 +50,6 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo

@@ -72,7 +72,6 @@ instance/
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook

@@ -83,9 +82,7 @@ profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.

@@ -94,30 +91,7 @@ ipython_config.py
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff

@@ -154,41 +128,9 @@ dmypy.json
# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
# Price Tracker specific
price_tracker.db
*.log
config-local.json
.vscode/
.idea/
208
DOCKER.md
Normal file
@@ -0,0 +1,208 @@
# Price Tracker - Docker Deployment

This guide covers how to build, deploy, and run the Price Tracker application using Docker.

## Quick Start with Docker

### 1. Build the Image

```bash
# Build with default tag
./build.sh

# Build with specific tag
./build.sh v1.0.0

# Build and tag for your registry
./build.sh latest your-registry.com
```

### 2. Run with Docker Compose (Recommended)

```bash
# Start the application
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the application
docker-compose down
```

### 3. Manual Docker Run

```bash
# Create directories for persistence
mkdir -p data logs

# Run the container
docker run -d \
  --name price-tracker \
  --restart unless-stopped \
  -p 5000:5000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/logs:/app/logs \
  -v $(pwd)/config.json:/app/config.json:ro \
  -e FLASK_ENV=production \
  price-tracker:latest
```

## Registry Deployment

### Push to Registry

```bash
# Tag for your registry
docker tag price-tracker:latest your-registry.com/price-tracker:latest

# Push to registry
docker push your-registry.com/price-tracker:latest
```

### Deploy from Registry

```bash
# Deploy using script
./deploy.sh latest your-registry.com

# Or manually
docker pull your-registry.com/price-tracker:latest
docker run -d \
  --name price-tracker \
  --restart unless-stopped \
  -p 5000:5000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/logs:/app/logs \
  -v $(pwd)/config.json:/app/config.json:ro \
  -e FLASK_ENV=production \
  your-registry.com/price-tracker:latest
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `FLASK_HOST` | `0.0.0.0` | Host to bind the Flask server |
| `FLASK_PORT` | `5000` | Port to bind the Flask server |
| `FLASK_ENV` | `production` | Flask environment (production/development) |
| `PYTHONUNBUFFERED` | `1` | Enable unbuffered Python output |

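Inside the container, the application is expected to pick these variables up roughly as in the following sketch (the variable names and defaults come from the table; the exact code in `src/web_ui.py` may differ):

```python
import os

def server_settings():
    """Resolve Flask bind settings from the environment, falling back
    to the documented defaults when a variable is unset."""
    host = os.environ.get("FLASK_HOST", "0.0.0.0")
    port = int(os.environ.get("FLASK_PORT", "5000"))
    debug = os.environ.get("FLASK_ENV", "production") == "development"
    return host, port, debug
```

Passing `-e FLASK_PORT=8080` to `docker run` would therefore move the server to port 8080 without rebuilding the image.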
## Volumes

| Container Path | Description |
|----------------|-------------|
| `/app/data` | Database and persistent data |
| `/app/logs` | Application logs |
| `/app/config.json` | Configuration file (read-only) |

## Health Check

The container includes a health check that verifies the application is responding on port 5000.

```bash
# Check container health
docker ps

# View health check logs
docker inspect price-tracker | grep -A 10 Health
```

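The check itself is just an HTTP GET against the root URL (the Dockerfile uses `curl -f`). A stdlib-only equivalent, handy if `curl` is ever dropped from the image, might look like this sketch:

```python
import urllib.request
import urllib.error

def is_healthy(url="http://localhost:5000/", timeout=5):
    """Return True if the app answers the URL with a non-error status,
    mirroring what `curl -f` treats as success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False
```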
## Monitoring

### View Logs

```bash
# Real-time logs
docker logs -f price-tracker

# Last 100 lines
docker logs --tail 100 price-tracker

# With docker-compose
docker-compose logs -f
```

### Container Stats

```bash
# Resource usage
docker stats price-tracker

# Container information
docker inspect price-tracker
```

## Troubleshooting

### Container Won't Start

1. Check logs: `docker logs price-tracker`
2. Verify config file exists and is valid JSON
3. Ensure data and logs directories exist with correct permissions

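Step 2 can be automated with a few lines of stdlib Python run against the `config.json` you mount into the container (an illustrative helper, not part of the project):

```python
import json
import sys

def validate_config(path="config.json"):
    """Return the parsed config, or exit with a readable message if the
    file is missing or not valid JSON."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        sys.exit(f"{path} not found - mount it into the container")
    except json.JSONDecodeError as exc:
        sys.exit(f"{path} is not valid JSON: {exc}")
```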
### Application Not Accessible

1. Verify port mapping: `docker ps`
2. Check firewall settings
3. Verify container is healthy: `docker ps` (should show "healthy")

### Database Issues

1. Check if data directory is properly mounted
2. Verify database file permissions
3. Check logs for database errors

## Production Considerations

### Security

- Run container as non-root user (already configured)
- Use read-only config file mount
- Consider running behind a reverse proxy (nginx, traefik)
- Set up proper firewall rules

### Performance

- Allocate sufficient memory for scraping operations
- Consider scaling with multiple instances behind a load balancer
- Monitor resource usage and adjust limits as needed

### Backup

```bash
# Backup data directory
tar -czf price-tracker-backup-$(date +%Y%m%d).tar.gz data/

# Restore backup
tar -xzf price-tracker-backup-YYYYMMDD.tar.gz
```

## Development

### Local Development with Docker

```bash
# Build development image
docker build -t price-tracker:dev .

# Run with development settings
docker run -it --rm \
  -p 5000:5000 \
  -v $(pwd):/app \
  -e FLASK_ENV=development \
  price-tracker:dev
```

### Debugging

```bash
# Run container with bash shell
docker run -it --rm \
  -v $(pwd):/app \
  price-tracker:latest \
  /bin/bash

# Execute commands in running container
docker exec -it price-tracker /bin/bash
```
48
Dockerfile
Normal file
@@ -0,0 +1,48 @@
# Use Python 3.11 slim image for smaller size
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    FLASK_APP=main.py \
    FLASK_ENV=production

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user for security
RUN useradd --create-home --shell /bin/bash tracker && \
    chown -R tracker:tracker /app

# Copy application code
COPY . .

# Create necessary directories
RUN mkdir -p /app/logs && \
    mkdir -p /app/data && \
    chown -R tracker:tracker /app

# Switch to non-root user
USER tracker

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/ || exit 1

# Run the application
CMD ["python", "main.py"]
206
README.md
@@ -1 +1,205 @@
# price-tracker
# Price Tracker 🛒

A comprehensive web scraper for tracking product prices across multiple e-commerce sites. Built with Python, Beautiful Soup, and Flask.

## Features ✨

- **Multi-site Price Tracking**: Monitor prices across Amazon, eBay, Walmart, and more
- **Beautiful Web UI**: Clean, responsive interface for managing products and viewing price history
- **Price Alerts**: Get notified when products reach your target price
- **Historical Data**: View price trends with interactive charts
- **Automated Scraping**: Schedule regular price checks
- **Multiple Notifications**: Email and webhook notifications
- **Robust Scraping**: Built-in retry logic, rotating user agents, and rate limiting

## Quick Start 🚀

1. **Clone and Setup**:
```bash
git clone <your-repo-url>
cd price-tracker
chmod +x setup.sh
./setup.sh
```

2. **Start the Web UI**:
```bash
source venv/bin/activate
python main.py --mode web
```

3. **Visit**: http://localhost:5000

## Usage 📋

### Web Interface

The web interface provides:
- **Dashboard**: Overview of all tracked products with current prices
- **Add Products**: Easy form to add new products with URLs from multiple sites
- **Product Details**: Detailed view with price history charts and statistics
- **Settings**: Configuration management and system health checks

### Command Line

```bash
# Start web UI
python main.py --mode web

# Run scraping once
python main.py --mode scrape

# Add sample products for testing
python examples/add_sample_products.py

# Scheduled scraping (for cron jobs)
python scripts/scheduled_scraping.py
```

### Scheduled Scraping

Add to your crontab for automatic price checks:
```bash
# Every 6 hours
0 */6 * * * cd /path/to/price-tracker && source venv/bin/activate && python scripts/scheduled_scraping.py

# Daily at 8 AM
0 8 * * * cd /path/to/price-tracker && source venv/bin/activate && python scripts/scheduled_scraping.py
```

## Configuration ⚙️

Edit `config.json` to customize:

### Scraping Settings
```json
{
  "scraping": {
    "delay_between_requests": 2,
    "max_concurrent_requests": 5,
    "timeout": 30,
    "retry_attempts": 3
  }
}
```

### Email Notifications
```json
{
  "notifications": {
    "email": {
      "enabled": true,
      "smtp_server": "smtp.gmail.com",
      "smtp_port": 587,
      "sender_email": "your-email@gmail.com",
      "sender_password": "your-app-password",
      "recipient_email": "alerts@yourdomain.com"
    }
  }
}
```

### Adding New Sites

Add new e-commerce sites by extending the sites configuration:

```json
{
  "sites": {
    "your_site": {
      "enabled": true,
      "base_url": "https://www.yoursite.com",
      "selectors": {
        "price": [".price", ".cost"],
        "title": [".product-title"],
        "availability": [".stock-status"]
      }
    }
  }
}
```

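A scraper typically walks such a selector list in order and keeps the first hit, which is why each field takes a list. A minimal stdlib sketch of that fallback logic (illustrative only; the real implementation in `src/scraper.py` uses Beautiful Soup and may differ):

```python
from typing import Callable, Optional

def first_match(select: Callable[[str], Optional[str]],
                selectors: list) -> Optional[str]:
    """Try each CSS selector in order and return the first non-empty text.

    `select` stands in for a parsed page, e.g. a wrapper around
    BeautifulSoup's soup.select_one(sel).get_text().
    """
    for sel in selectors:
        text = select(sel)
        if text and text.strip():
            return text.strip()
    return None
```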
## Architecture 🏗️

- **`main.py`**: Application entry point
- **`src/config.py`**: Configuration management
- **`src/database.py`**: SQLite database operations
- **`src/scraper.py`**: Core scraping logic with Beautiful Soup
- **`src/scraper_manager.py`**: Scraping coordination and task management
- **`src/notification.py`**: Email and webhook notifications
- **`src/web_ui.py`**: Flask web interface
- **`templates/`**: HTML templates with Bootstrap styling

## Features in Detail 🔍

### Smart Price Extraction
- Multiple CSS selectors per site for robust price detection
- Handles various price formats and currencies
- Availability detection (in stock/out of stock)
- Automatic retry with exponential backoff

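"Various price formats" in practice means stripping currency symbols and thousands separators before comparing numbers. A sketch of that normalization (a simplification of whatever `src/scraper.py` actually does):

```python
import re
from typing import Optional

def parse_price(text: str) -> Optional[float]:
    """Extract a numeric price from strings like '£1,299.99' or '$25'.

    Returns None when no number is present. Assumes '.' is the decimal
    separator, which holds for the UK/US sites configured here.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group().replace(",", ""))
```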
### Data Storage
- SQLite database for price history
- Product management with URLs and target prices
- Price statistics and trend analysis

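The price-history storage can be pictured roughly as below; this is an illustrative schema, the authoritative one lives in `src/database.py` and will differ in detail:

```python
import sqlite3

def demo_price_history():
    """Record a few price points for one product and compute the min/max,
    the kind of statistics shown on the product detail page."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE price_history (
        product_id INTEGER,
        site TEXT,
        price REAL,
        checked_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    rows = [(1, "amazon", 219.99), (1, "amazon", 199.00), (1, "ebay", 205.50)]
    con.executemany(
        "INSERT INTO price_history (product_id, site, price) VALUES (?, ?, ?)",
        rows)
    lo, hi = con.execute(
        "SELECT MIN(price), MAX(price) FROM price_history WHERE product_id = 1"
    ).fetchone()
    con.close()
    return lo, hi
```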
### Web Interface
- Responsive design with Bootstrap 5
- Interactive price charts with Plotly
- Real-time scraping from the UI
- Product comparison and best price highlighting

### Notifications
- Email alerts when target prices are reached
- Webhook integration for custom notifications
- Rich HTML email templates
- Test notification functionality

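Composing such an alert with the stdlib might look like this sketch (names are illustrative; `src/notification.py` is the authoritative version, and actually sending the message would need the SMTP settings from `config.json`):

```python
from email.message import EmailMessage

def build_alert(product, price, target, sender, recipient):
    """Compose a price-drop alert; sending via smtplib is left out."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: {product} is now £{price:.2f}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"{product} dropped to £{price:.2f}, "
        f"at or below your target of £{target:.2f}."
    )
    return msg
```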
## Tips for Best Results 📈

1. **Respectful Scraping**: The tool includes delays and rate limiting to be respectful to websites
2. **URL Selection**: Use direct product page URLs, not search results or category pages
3. **Target Prices**: Set realistic target prices based on historical data
4. **Multiple Sites**: Track the same product on multiple sites for best deals
5. **Regular Updates**: Run scraping regularly but not too frequently (every few hours is good)

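Tip 3 can be made concrete with a small heuristic, such as anchoring the target just below the lower quartile of the observed history. This is purely illustrative, not something the project ships:

```python
import statistics

def suggest_target(history, margin=0.02):
    """Suggest a target price slightly below the historical lower quartile.

    A heuristic for illustration only - pick whatever rule suits you.
    """
    q1 = statistics.quantiles(history, n=4)[0]  # 25th percentile
    return round(q1 * (1 - margin), 2)
```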
## Troubleshooting 🔧

### Common Issues

1. **No prices found**: Check if the CSS selectors are correct for the site
2. **403/429 errors**: Sites may be blocking requests - try different user agents or increase delays
3. **Database errors**: Ensure the database file is writable
4. **Email not working**: Verify SMTP settings and app passwords for Gmail

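"Increase delays" for 403/429 responses is usually implemented as retry with exponential backoff, which the feature list above advertises. A generic sketch (illustrative, not the project's exact code):

```python
import time

def with_backoff(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` until it succeeds, doubling the wait after each
    failure; re-raise the last error when attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so the behaviour can be tested without real waiting.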
### Adding Debug Information

Enable debug logging by modifying the logging level in `main.py`:
```python
logging.basicConfig(level=logging.DEBUG)
```

## Legal and Ethical Considerations ⚖️

- Respect robots.txt files
- Don't overload servers with too many requests
- Use for personal/educational purposes
- Check terms of service for each site
- Be mindful of rate limits

## Contributing 🤝

Feel free to contribute by:
- Adding support for new e-commerce sites
- Improving CSS selectors for existing sites
- Adding new notification methods
- Enhancing the web UI
- Fixing bugs and improving performance

## License 📄

This project is for educational purposes. Please review the terms of service of websites you scrape and use responsibly.

---

**Happy price tracking! 🛍️**

37
build.sh
Executable file
@@ -0,0 +1,37 @@
#!/bin/bash

# Build script for Price Tracker Docker container

set -e

# Configuration
IMAGE_NAME="price-tracker"
TAG="${1:-latest}"
REGISTRY="${2:-your-registry.com}"  # Replace with your actual registry

echo "Building Price Tracker Docker image..."

# Build the Docker image
docker build -t "${IMAGE_NAME}:${TAG}" .

# Tag for registry if provided
if [ "$REGISTRY" != "your-registry.com" ]; then
    docker tag "${IMAGE_NAME}:${TAG}" "${REGISTRY}/${IMAGE_NAME}:${TAG}"
    echo "Tagged image as ${REGISTRY}/${IMAGE_NAME}:${TAG}"
fi

echo "Build completed successfully!"
echo "Image: ${IMAGE_NAME}:${TAG}"

# Display image info
docker images | grep "${IMAGE_NAME}"

echo ""
echo "To run locally:"
echo "  docker run -p 5000:5000 ${IMAGE_NAME}:${TAG}"
echo ""
echo "To push to registry:"
echo "  docker push ${REGISTRY}/${IMAGE_NAME}:${TAG}"
echo ""
echo "To run with docker-compose:"
echo "  docker-compose up -d"
113
config.json
Normal file
@@ -0,0 +1,113 @@
{
  "database": {
    "path": "price_tracker.db"
  },
  "scraping": {
    "delay_between_requests": 2,
    "max_concurrent_requests": 1,
    "timeout": 30,
    "retry_attempts": 3,
    "user_agents": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    ]
  },
  "notifications": {
    "email": {
      "enabled": false,
      "smtp_server": "smtp.gmail.com",
      "smtp_port": 587,
      "sender_email": "",
      "sender_password": "",
      "recipient_email": ""
    },
    "webhook": {
      "enabled": false,
      "url": ""
    }
  },
  "sites": {
    "jjfoodservice": {
      "enabled": true,
      "base_url": "https://www.jjfoodservice.com",
      "selectors": {
        "price": [
          ".price",
          ".product-price",
          "[data-testid='price']",
          ".price-value",
          ".current-price",
          ".product-card-price"
        ],
        "title": [
          "h1",
          ".product-title",
          ".product-name",
          "[data-testid='product-title']",
          ".product-card-title"
        ],
        "availability": [
          ".stock-status",
          ".availability",
          "[data-testid='availability']",
          ".product-availability"
        ]
      }
    },
    "atoz_catering": {
      "enabled": true,
      "base_url": "https://www.atoz-catering.co.uk",
      "selectors": {
        "price": [
          ".price",
          ".product-price",
          ".delivery-price",
          ".collection-price",
          "span:contains('£')",
          ".price-value"
        ],
        "title": [
          "h1",
          ".product-title",
          ".product-name",
          "a[href*='/products/product/']",
          ".product-link"
        ],
        "availability": [
          ".stock-status",
          ".availability",
          ".add-to-basket",
          "button:contains('Add To Basket')",
          ".out-of-stock"
        ]
      }
    },
    "amazon_uk": {
      "enabled": true,
      "base_url": "https://www.amazon.co.uk",
      "selectors": {
        "price": [
          ".a-price-whole",
          ".a-price .a-offscreen",
          "#priceblock_dealprice",
          "#priceblock_ourprice",
          ".a-price-range",
          ".a-price.a-text-price.a-size-medium.apexPriceToPay",
          ".a-price-current"
        ],
        "title": [
          "#productTitle",
          ".product-title",
          "h1.a-size-large"
        ],
        "availability": [
          "#availability span",
          ".a-size-medium.a-color-success",
          ".a-size-medium.a-color-state",
          "#availability .a-declarative"
        ]
      }
    }
  }
}
50
deploy.sh
Executable file
@@ -0,0 +1,50 @@
#!/bin/bash

# Deployment script for Price Tracker

set -e

# Configuration
IMAGE_NAME="price-tracker"
TAG="${1:-latest}"
REGISTRY="${2:-your-registry.com}"  # Replace with your actual registry
CONTAINER_NAME="price-tracker"

echo "Deploying Price Tracker..."

# Pull latest image if using registry
if [ "$REGISTRY" != "your-registry.com" ]; then
    echo "Pulling latest image from registry..."
    docker pull "${REGISTRY}/${IMAGE_NAME}:${TAG}"
fi

# Stop and remove existing container if it exists
if docker ps -a | grep -q "${CONTAINER_NAME}"; then
    echo "Stopping existing container..."
    docker stop "${CONTAINER_NAME}" || true
    docker rm "${CONTAINER_NAME}" || true
fi

# Create data and logs directories if they don't exist
mkdir -p ./data ./logs

# Run the container
echo "Starting new container..."
docker run -d \
    --name "${CONTAINER_NAME}" \
    --restart unless-stopped \
    -p 5000:5000 \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/logs:/app/logs" \
    -v "$(pwd)/config.json:/app/config.json:ro" \
    -e FLASK_ENV=production \
    "${REGISTRY}/${IMAGE_NAME}:${TAG}"

echo "Container started successfully!"
echo "Access the application at: http://localhost:5000"
echo ""
echo "To view logs:"
echo "  docker logs -f ${CONTAINER_NAME}"
echo ""
echo "To stop the container:"
echo "  docker stop ${CONTAINER_NAME}"
34
docker-compose.yml
Normal file
@@ -0,0 +1,34 @@
version: '3.8'

services:
  price-tracker:
    build: .
    container_name: price-tracker
    restart: unless-stopped
    ports:
      - "5000:5000"
    environment:
      - FLASK_ENV=production
      - PYTHONUNBUFFERED=1
    volumes:
      # Mount database and logs for persistence
      - ./data:/app/data
      - ./logs:/app/logs
      # Mount config for easy updates
      - ./config.json:/app/config.json:ro
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - price-tracker-network

networks:
  price-tracker-network:
    driver: bridge

volumes:
  price-tracker-data:
  price-tracker-logs:
85
examples/add_sample_products.py
Normal file
@@ -0,0 +1,85 @@
"""
Example script to add sample products for testing
"""

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.database import DatabaseManager
from src.config import Config

def add_sample_products():
    """Add some sample products for testing."""
    config = Config()
    db_manager = DatabaseManager(config.database_path)

    # Sample products with real URLs (for demonstration)
    sample_products = [
        {
            'name': 'AirPods Pro (2nd Generation)',
            'description': 'Apple AirPods Pro with Active Noise Cancellation',
            'target_price': 200.00,
            'urls': {
                'amazon': 'https://www.amazon.com/dp/B0BDHWDR12',
                'walmart': 'https://www.walmart.com/ip/AirPods-Pro-2nd-generation/1952646965'
            }
        },
        {
            'name': 'Sony WH-1000XM4 Headphones',
            'description': 'Wireless Noise Canceling Over-Ear Headphones',
            'target_price': 250.00,
            'urls': {
                'amazon': 'https://www.amazon.com/dp/B0863TXGM3',
                'ebay': 'https://www.ebay.com/itm/Sony-WH-1000XM4-Wireless-Headphones/324298765234'
            }
        },
        {
            'name': 'iPad Air (5th Generation)',
            'description': '10.9-inch iPad Air with M1 Chip, 64GB',
            'target_price': 500.00,
            'urls': {
                'amazon': 'https://www.amazon.com/dp/B09V3HN1KC',
                'walmart': 'https://www.walmart.com/ip/iPad-Air-5th-Gen/612825603'
            }
        },
        {
            'name': 'Nintendo Switch OLED',
            'description': 'Nintendo Switch OLED Model Gaming Console',
            'target_price': 300.00,
            'urls': {
                'amazon': 'https://www.amazon.com/dp/B098RKWHHZ',
                'walmart': 'https://www.walmart.com/ip/Nintendo-Switch-OLED/910582148'
            }
        },
        {
            'name': 'Samsung 55" 4K Smart TV',
            'description': 'Samsung 55-inch Crystal UHD 4K Smart TV',
            'target_price': 400.00,
            'urls': {
                'amazon': 'https://www.amazon.com/dp/B08T6F5H1Y',
                'walmart': 'https://www.walmart.com/ip/Samsung-55-Class-4K-Crystal-UHD/485926403'
            }
        }
    ]

    print("Adding sample products...")

    for product_data in sample_products:
        try:
            product_id = db_manager.add_product(
                name=product_data['name'],
                description=product_data['description'],
                target_price=product_data['target_price'],
                urls=product_data['urls']
            )
            print(f"✓ Added: {product_data['name']} (ID: {product_id})")
        except Exception as e:
            print(f"✗ Failed to add {product_data['name']}: {e}")

    print("\nSample products added successfully!")
    print("You can now run the web UI with: python main.py --mode web")
    print("Or start scraping with: python main.py --mode scrape")

if __name__ == "__main__":
    add_sample_products()
99
examples/add_uk_catering_products.py
Normal file
@@ -0,0 +1,99 @@
"""
Example script to add UK catering sample products for testing
"""

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.database import DatabaseManager
from src.config import Config

def add_uk_catering_products():
    """Add some sample UK catering products for testing."""
    config = Config()
    db_manager = DatabaseManager(config.database_path)

    # Sample UK catering products with example URLs
    # Note: These are example URLs - you'll need to replace with real product URLs
    sample_products = [
        {
            'name': 'McCain Straight Cut Oven Chips 2.5kg',
            'description': 'Frozen straight cut oven chips for catering use',
            'target_price': 4.50,
            'urls': {
                'jjfoodservice': 'https://www.jjfoodservice.com/products/mccain-straight-cut-oven-chips',
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/mccain-straight-cut-oven-chips-25kg'
            }
        },
        {
            'name': 'Heinz Baked Beans 6x2.62kg',
            'description': 'Catering size baked beans in tomato sauce',
            'target_price': 25.00,
            'urls': {
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/heinz-baked-beans--6x262kg',
                'jjfoodservice': 'https://www.jjfoodservice.com/products/heinz-baked-beans-catering'
            }
        },
        {
            'name': 'Chef Select Chicken Breast Fillets 2kg',
            'description': 'Fresh chicken breast fillets for professional kitchens',
            'target_price': 12.00,
            'urls': {
                'jjfoodservice': 'https://www.jjfoodservice.com/products/chicken-breast-fillets-2kg',
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/chicken-breast-fillets-2kg'
            }
        },
        {
            'name': 'Whole Milk 2 Litre Bottles (Case of 6)',
            'description': 'Fresh whole milk in 2L bottles for catering',
            'target_price': 8.00,
            'urls': {
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/cotteswold-whole-milk-1x2lt-blue',
                'jjfoodservice': 'https://www.jjfoodservice.com/products/whole-milk-2l-case'
            }
        },
        {
            'name': 'Vegetable Oil 20L Container',
            'description': 'Catering vegetable oil for deep frying and cooking',
            'target_price': 35.00,
            'urls': {
                'jjfoodservice': 'https://www.jjfoodservice.com/products/vegetable-oil-20l',
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/vegetable-oil-20l-container'
            }
        },
        {
            'name': 'Plain Flour 16kg Sack',
            'description': 'Professional baking flour for commercial use',
            'target_price': 18.00,
            'urls': {
                'atoz_catering': 'https://www.atoz-catering.co.uk/products/product/plain-flour-16kg-sack',
                'jjfoodservice': 'https://www.jjfoodservice.com/products/plain-flour-16kg'
            }
        }
    ]

    print("Adding UK catering sample products...")

    for product_data in sample_products:
        try:
            product_id = db_manager.add_product(
                name=product_data['name'],
                description=product_data['description'],
                target_price=product_data['target_price'],
                urls=product_data['urls']
            )
            print(f"✓ Added: {product_data['name']} (ID: {product_id})")
        except Exception as e:
            print(f"✗ Failed to add {product_data['name']}: {e}")

    print("\nUK catering sample products added successfully!")
    print("Note: The URLs in this example are placeholders.")
    print("You'll need to replace them with real product URLs from:")
    print("- JJ Food Service: https://www.jjfoodservice.com/")
    print("- A to Z Catering: https://www.atoz-catering.co.uk/")
    print("\nYou can now run the web UI with: python main.py --mode web")
    print("Or start scraping with: python main.py --mode scrape")

if __name__ == "__main__":
    add_uk_catering_products()
115
main.py
Normal file
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
"""
Price Tracker - Web Scraper for Product Price Monitoring
Tracks product prices across multiple e-commerce sites
"""

import asyncio
import logging
from datetime import datetime
from typing import List, Dict, Optional
import argparse

from src.scraper_manager import ScraperManager
from src.database import DatabaseManager
from src.config import Config
from src.notification import NotificationManager
from src.web_ui import create_app

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('price_tracker.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)


async def run_scraper():
    """Run the price scraping process."""
    try:
        config = Config()
        db_manager = DatabaseManager(config.database_path)
        scraper_manager = ScraperManager(config)
        notification_manager = NotificationManager(config)

        logger.info("Starting price tracking session")

        # Load products from database
        products = db_manager.get_all_products()
        if not products:
            logger.warning("No products found in database. Add products first.")
            return

        # Scrape prices for all products
        results = await scraper_manager.scrape_all_products(products)

        # Process results and save to database
        price_alerts = []
        for product_id, site_prices in results.items():
            for site_name, price_data in site_prices.items():
                if price_data['success']:
                    # Save price to database
                    db_manager.save_price_history(
                        product_id=product_id,
                        site_name=site_name,
                        price=price_data['price'],
                        currency=price_data.get('currency', 'GBP'),
                        availability=price_data.get('availability', True),
                        timestamp=datetime.now()
                    )

                    # Check for price alerts (skip products without a target price)
                    product = db_manager.get_product(product_id)
                    if product and product['target_price'] and price_data['price'] <= product['target_price']:
                        price_alerts.append({
                            'product': product,
                            'site': site_name,
                            'current_price': price_data['price'],
                            'target_price': product['target_price']
                        })

        # Send notifications for price alerts
        if price_alerts:
            await notification_manager.send_price_alerts(price_alerts)

        logger.info(f"Scraping completed. Found {len(price_alerts)} price alerts.")

    except Exception as e:
        logger.error(f"Error during scraping: {e}")
        raise


def run_web_ui():
    """Run the web UI for managing products and viewing price history."""
    import os

    # Use environment variables for configuration
    host = os.environ.get('FLASK_HOST', '0.0.0.0')
    port = int(os.environ.get('FLASK_PORT', 5000))
    debug = os.environ.get('FLASK_ENV', 'production').lower() != 'production'

    app = create_app()
    logger.info(f"Starting Price Tracker web server on {host}:{port}")
    app.run(host=host, port=port, debug=debug)


def main():
    parser = argparse.ArgumentParser(description='Price Tracker')
    parser.add_argument('--mode', choices=['scrape', 'web'], default='web',
                        help='Run mode: scrape prices or start web UI')
    parser.add_argument('--config', help='Path to config file')

    args = parser.parse_args()

    if args.mode == 'scrape':
        asyncio.run(run_scraper())
    else:
        run_web_ui()


if __name__ == "__main__":
    main()
15
requirements.txt
Normal file
@@ -0,0 +1,15 @@
beautifulsoup4==4.12.3
requests==2.31.0
aiohttp==3.9.1
flask==3.0.0
flask-wtf==1.2.1
wtforms==3.1.1
python-dotenv==1.0.0
lxml==5.1.0
fake-useragent==1.4.0
email-validator==2.1.0
jinja2==3.1.2
plotly==5.17.0
pandas==2.1.4
numpy==1.26.2
python-dateutil==2.8.2
102
scripts/scheduled_scraping.py
Normal file
@@ -0,0 +1,102 @@
"""
|
||||
Scheduled price scraping script for cron jobs
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import asyncio
|
||||
import logging
|
||||
from datetime import datetime
|
||||
|
||||
# Add the parent directory to sys.path to import our modules
|
||||
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
from src.config import Config
|
||||
from src.database import DatabaseManager
|
||||
from src.scraper_manager import ScraperManager
|
||||
from src.notification import NotificationManager
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format='%(asctime)s - %(levelname)s - %(message)s',
|
||||
handlers=[
|
||||
logging.FileHandler('scheduled_scraping.log'),
|
||||
logging.StreamHandler()
|
||||
]
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
async def run_scheduled_scraping():
|
||||
"""Run the scheduled price scraping."""
|
||||
try:
|
||||
logger.info("=== Starting scheduled price scraping ===")
|
||||
|
||||
# Initialize components
|
||||
config = Config()
|
||||
db_manager = DatabaseManager(config.database_path)
|
||||
scraper_manager = ScraperManager(config)
|
||||
notification_manager = NotificationManager(config)
|
||||
|
||||
# Get all products
|
||||
products = db_manager.get_all_products()
|
||||
if not products:
|
||||
logger.warning("No products found in database")
|
||||
return
|
||||
|
||||
logger.info(f"Found {len(products)} products to scrape")
|
||||
|
||||
# Scrape all products
|
||||
results = await scraper_manager.scrape_all_products(products)
|
||||
|
||||
# Process results
|
||||
total_success = 0
|
||||
total_failed = 0
|
||||
price_alerts = []
|
||||
|
||||
for product_id, site_results in results.items():
|
||||
product = db_manager.get_product(product_id)
|
||||
|
||||
for site_name, result in site_results.items():
|
||||
if result['success']:
|
||||
total_success += 1
|
||||
|
||||
# Save to database
|
||||
db_manager.save_price_history(
|
||||
product_id=product_id,
|
||||
site_name=site_name,
|
||||
price=result['price'],
|
||||
currency=result.get('currency', 'USD'),
|
||||
availability=result.get('availability', True),
|
||||
timestamp=datetime.now()
|
||||
)
|
||||
|
||||
# Check for price alerts
|
||||
if product and product['target_price'] and result['price'] <= product['target_price']:
|
||||
price_alerts.append({
|
||||
'product': product,
|
||||
'site': site_name,
|
||||
'current_price': result['price'],
|
||||
'target_price': product['target_price']
|
||||
})
|
||||
|
||||
logger.info(f"Price alert: {product['name']} on {site_name} - ${result['price']:.2f}")
|
||||
else:
|
||||
total_failed += 1
|
||||
logger.error(f"Failed to scrape {product['name']} on {site_name}: {result.get('error', 'Unknown error')}")
|
||||
|
||||
# Send notifications for price alerts
|
||||
if price_alerts:
|
||||
await notification_manager.send_price_alerts(price_alerts)
|
||||
logger.info(f"Sent notifications for {len(price_alerts)} price alerts")
|
||||
|
||||
logger.info(f"Scraping completed: {total_success} successful, {total_failed} failed")
|
||||
logger.info(f"Found {len(price_alerts)} price alerts")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error during scheduled scraping: {e}", exc_info=True)
|
||||
raise
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(run_scheduled_scraping())
|
||||
66
setup.sh
Executable file
@@ -0,0 +1,66 @@
#!/bin/bash

# Price Tracker Setup Script
# This script helps set up the price tracker environment

echo "🛒 Price Tracker Setup"
echo "====================="

# Check if Python 3.8+ is installed
python_version=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
echo "Python version: $python_version"

if python3 -c 'import sys; exit(0 if sys.version_info >= (3, 8) else 1)'; then
    echo "✓ Python version is suitable"
else
    echo "✗ Python 3.8+ is required"
    exit 1
fi

# Create virtual environment
echo ""
echo "📦 Creating virtual environment..."
python3 -m venv venv

# Activate virtual environment
echo "🔧 Activating virtual environment..."
source venv/bin/activate

# Install requirements
echo "📥 Installing requirements..."
pip install --upgrade pip
pip install -r requirements.txt

# Create initial database
echo ""
echo "🗄️ Initializing database..."
python3 -c "
from src.database import DatabaseManager
from src.config import Config
config = Config()
db = DatabaseManager(config.database_path)
print('Database initialized successfully!')
"

# Ask if user wants to add sample products
echo ""
read -p "Would you like to add sample products for testing? (y/n): " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    echo "🏪 Adding sample products..."
    python3 examples/add_sample_products.py
fi

echo ""
echo "🎉 Setup complete!"
echo ""
echo "Next steps:"
echo "1. Activate the virtual environment: source venv/bin/activate"
echo "2. Configure settings in config.json if needed"
echo "3. Start the web UI: python main.py --mode web"
echo "4. Or run scraping: python main.py --mode scrape"
echo ""
echo "Web UI will be available at: http://localhost:5000"
echo ""
echo "For scheduled scraping, add this to your crontab:"
echo "0 */6 * * * cd $(pwd) && source venv/bin/activate && python scripts/scheduled_scraping.py"
7
src/__init__.py
Normal file
@@ -0,0 +1,7 @@
"""
|
||||
Price Tracker - Web scraper for monitoring product prices across multiple sites
|
||||
"""
|
||||
|
||||
__version__ = "1.0.0"
|
||||
__author__ = "Price Tracker Team"
|
||||
__description__ = "A comprehensive price tracking system using Beautiful Soup"
|
||||
86
src/config.py
Normal file
@@ -0,0 +1,86 @@
"""
|
||||
Configuration management for the price tracker
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
from typing import Dict, Any, Optional
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class Config:
|
||||
"""Configuration manager for the price tracker application."""
|
||||
|
||||
def __init__(self, config_path: Optional[str] = None):
|
||||
self.config_path = config_path or "config.json"
|
||||
self._config = self._load_config()
|
||||
|
||||
def _load_config(self) -> Dict[str, Any]:
|
||||
"""Load configuration from JSON file."""
|
||||
config_file = Path(self.config_path)
|
||||
if not config_file.exists():
|
||||
raise FileNotFoundError(f"Config file not found: {self.config_path}")
|
||||
|
||||
with open(config_file, 'r') as f:
|
||||
return json.load(f)
|
||||
|
||||
@property
|
||||
def database_path(self) -> str:
|
||||
"""Get database file path."""
|
||||
return self._config.get('database', {}).get('path', 'price_tracker.db')
|
||||
|
||||
@property
|
||||
def scraping_config(self) -> Dict[str, Any]:
|
||||
"""Get scraping configuration."""
|
||||
return self._config.get('scraping', {})
|
||||
|
||||
@property
|
||||
def delay_between_requests(self) -> float:
|
||||
"""Get delay between requests in seconds."""
|
||||
return self.scraping_config.get('delay_between_requests', 2)
|
||||
|
||||
@property
|
||||
def max_concurrent_requests(self) -> int:
|
||||
"""Get maximum concurrent requests."""
|
||||
return self.scraping_config.get('max_concurrent_requests', 5)
|
||||
|
||||
@property
|
||||
def timeout(self) -> int:
|
||||
"""Get request timeout in seconds."""
|
||||
return self.scraping_config.get('timeout', 30)
|
||||
|
||||
@property
|
||||
def retry_attempts(self) -> int:
|
||||
"""Get number of retry attempts."""
|
||||
return self.scraping_config.get('retry_attempts', 3)
|
||||
|
||||
@property
|
||||
def user_agents(self) -> list:
|
||||
"""Get list of user agents."""
|
||||
return self.scraping_config.get('user_agents', [
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
|
||||
])
|
||||
|
||||
@property
|
||||
def notification_config(self) -> Dict[str, Any]:
|
||||
"""Get notification configuration."""
|
||||
return self._config.get('notifications', {})
|
||||
|
||||
@property
|
||||
def sites_config(self) -> Dict[str, Any]:
|
||||
"""Get sites configuration."""
|
||||
return self._config.get('sites', {})
|
||||
|
||||
def get_site_config(self, site_name: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get configuration for a specific site."""
|
||||
return self.sites_config.get(site_name)
|
||||
|
||||
def is_site_enabled(self, site_name: str) -> bool:
|
||||
"""Check if a site is enabled."""
|
||||
site_config = self.get_site_config(site_name)
|
||||
return site_config.get('enabled', False) if site_config else False
|
||||
|
||||
def get_enabled_sites(self) -> list:
|
||||
"""Get list of enabled sites."""
|
||||
return [site for site, config in self.sites_config.items()
|
||||
if config.get('enabled', False)]
|
||||
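The accessors in `Config` imply a `config.json` shape along these lines. This is a sketch only: the keys match the `get()` calls above, but every value (and the choice of site names, taken from the sample products) is an invented example, not a file from the repo:

```json
{
    "database": {"path": "price_tracker.db"},
    "scraping": {
        "delay_between_requests": 2,
        "max_concurrent_requests": 5,
        "timeout": 30,
        "retry_attempts": 3,
        "user_agents": ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"]
    },
    "notifications": {
        "email": {"enabled": false},
        "webhook": {"enabled": false}
    },
    "sites": {
        "jjfoodservice": {"enabled": true},
        "atoz_catering": {"enabled": true}
    }
}
```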
228
src/database.py
Normal file
@@ -0,0 +1,228 @@
"""
|
||||
Database management for price tracking
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
from datetime import datetime, timedelta
|
||||
from typing import List, Dict, Any, Optional
|
||||
import json
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class DatabaseManager:
|
||||
"""Manages SQLite database operations for price tracking."""
|
||||
|
||||
def __init__(self, db_path: str):
|
||||
self.db_path = db_path
|
||||
self._init_database()
|
||||
|
||||
def _init_database(self):
|
||||
"""Initialize database tables."""
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.execute('''
|
||||
CREATE TABLE IF NOT EXISTS products (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
name TEXT NOT NULL,
|
||||
description TEXT,
|
||||
target_price REAL,
|
||||
urls TEXT NOT NULL, -- JSON string of site URLs
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
active BOOLEAN DEFAULT 1
|
||||
)
|
||||
''')
|
||||
|
||||
conn.execute('''
|
||||
CREATE TABLE IF NOT EXISTS price_history (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
product_id INTEGER NOT NULL,
|
||||
site_name TEXT NOT NULL,
|
||||
price REAL NOT NULL,
|
||||
currency TEXT DEFAULT 'GBP',
|
||||
availability BOOLEAN DEFAULT 1,
|
||||
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
FOREIGN KEY (product_id) REFERENCES products (id)
|
||||
)
|
||||
''')
|
||||
|
||||
conn.execute('''
|
||||
CREATE TABLE IF NOT EXISTS price_alerts (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
product_id INTEGER NOT NULL,
|
||||
site_name TEXT NOT NULL,
|
||||
alert_price REAL NOT NULL,
|
||||
triggered_at TIMESTAMP,
|
||||
notified BOOLEAN DEFAULT 0,
|
||||
FOREIGN KEY (product_id) REFERENCES products (id)
|
||||
)
|
||||
''')
|
||||
|
||||
conn.execute('''
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_product_id
|
||||
ON price_history (product_id)
|
||||
''')
|
||||
|
||||
conn.execute('''
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_timestamp
|
||||
ON price_history (timestamp)
|
||||
''')
|
||||
|
||||
def add_product(self, name: str, urls: Dict[str, str],
|
||||
description: str = None, target_price: float = None) -> int:
|
||||
"""Add a new product to track."""
|
||||
urls_json = json.dumps(urls)
|
||||
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.execute('''
|
||||
INSERT INTO products (name, description, target_price, urls)
|
||||
VALUES (?, ?, ?, ?)
|
||||
''', (name, description, target_price, urls_json))
|
||||
|
||||
product_id = cursor.lastrowid
|
||||
logger.info(f"Added product: {name} (ID: {product_id})")
|
||||
return product_id
|
||||
|
||||
def get_product(self, product_id: int) -> Optional[Dict[str, Any]]:
|
||||
"""Get product by ID."""
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.execute('''
|
||||
SELECT * FROM products WHERE id = ? AND active = 1
|
||||
''', (product_id,))
|
||||
|
||||
row = cursor.fetchone()
|
||||
if row:
|
||||
product = dict(row)
|
||||
product['urls'] = json.loads(product['urls'])
|
||||
return product
|
||||
return None
|
||||
|
||||
def get_all_products(self) -> List[Dict[str, Any]]:
|
||||
"""Get all active products."""
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.execute('''
|
||||
SELECT * FROM products WHERE active = 1 ORDER BY name
|
||||
''')
|
||||
|
||||
products = []
|
||||
for row in cursor.fetchall():
|
||||
product = dict(row)
|
||||
product['urls'] = json.loads(product['urls'])
|
||||
products.append(product)
|
||||
|
||||
return products
|
||||
|
||||
def update_product(self, product_id: int, **kwargs):
|
||||
"""Update product information."""
|
||||
allowed_fields = ['name', 'description', 'target_price', 'urls']
|
||||
updates = []
|
||||
values = []
|
||||
|
||||
for field, value in kwargs.items():
|
||||
if field in allowed_fields:
|
||||
if field == 'urls':
|
||||
value = json.dumps(value)
|
||||
updates.append(f"{field} = ?")
|
||||
values.append(value)
|
||||
|
||||
if not updates:
|
||||
return
|
||||
|
||||
updates.append("updated_at = ?")
|
||||
values.append(datetime.now())
|
||||
values.append(product_id)
|
||||
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.execute(f'''
|
||||
UPDATE products SET {', '.join(updates)} WHERE id = ?
|
||||
''', values)
|
||||
|
||||
def deactivate_product(self, product_id: int):
|
||||
"""Deactivate a product (soft delete)."""
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.execute('''
|
||||
UPDATE products SET active = 0, updated_at = ? WHERE id = ?
|
||||
''', (datetime.now(), product_id))
|
||||
|
||||
def save_price_history(self, product_id: int, site_name: str, price: float,
|
||||
currency: str = 'GBP', availability: bool = True,
|
||||
timestamp: datetime = None):
|
||||
"""Save price history entry."""
|
||||
if timestamp is None:
|
||||
timestamp = datetime.now()
|
||||
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.execute('''
|
||||
INSERT INTO price_history
|
||||
(product_id, site_name, price, currency, availability, timestamp)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
''', (product_id, site_name, price, currency, availability, timestamp))
|
||||
|
||||
def get_price_history(self, product_id: int, days: int = 30) -> List[Dict[str, Any]]:
|
||||
"""Get price history for a product."""
|
||||
start_date = datetime.now() - timedelta(days=days)
|
||||
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.execute('''
|
||||
SELECT * FROM price_history
|
||||
WHERE product_id = ? AND timestamp >= ?
|
||||
ORDER BY timestamp DESC
|
||||
''', (product_id, start_date))
|
||||
|
||||
return [dict(row) for row in cursor.fetchall()]
|
||||
|
||||
def get_latest_prices(self, product_id: int) -> Dict[str, Dict[str, Any]]:
|
||||
"""Get latest price for each site for a product."""
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.execute('''
|
||||
SELECT DISTINCT site_name,
|
||||
FIRST_VALUE(price) OVER (PARTITION BY site_name ORDER BY timestamp DESC) as price,
|
||||
FIRST_VALUE(currency) OVER (PARTITION BY site_name ORDER BY timestamp DESC) as currency,
|
||||
FIRST_VALUE(availability) OVER (PARTITION BY site_name ORDER BY timestamp DESC) as availability,
|
||||
FIRST_VALUE(timestamp) OVER (PARTITION BY site_name ORDER BY timestamp DESC) as timestamp
|
||||
FROM price_history
|
||||
WHERE product_id = ?
|
||||
''', (product_id,))
|
||||
|
||||
result = {}
|
||||
for row in cursor.fetchall():
|
||||
result[row['site_name']] = {
|
||||
'price': row['price'],
|
||||
'currency': row['currency'],
|
||||
'availability': bool(row['availability']),
|
||||
'timestamp': row['timestamp']
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
def get_price_statistics(self, product_id: int, days: int = 30) -> Dict[str, Any]:
|
||||
"""Get price statistics for a product."""
|
||||
start_date = datetime.now() - timedelta(days=days)
|
||||
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.execute('''
|
||||
SELECT site_name,
|
||||
MIN(price) as min_price,
|
||||
MAX(price) as max_price,
|
||||
AVG(price) as avg_price,
|
||||
COUNT(*) as data_points
|
||||
FROM price_history
|
||||
WHERE product_id = ? AND timestamp >= ?
|
||||
GROUP BY site_name
|
||||
''', (product_id, start_date))
|
||||
|
||||
stats = {}
|
||||
for row in cursor.fetchall():
|
||||
stats[row[0]] = {
|
||||
'min_price': row[1],
|
||||
'max_price': row[2],
|
||||
'avg_price': round(row[3], 2),
|
||||
'data_points': row[4]
|
||||
}
|
||||
|
||||
return stats
|
||||
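The `get_latest_prices()` query above leans on SQLite window functions (available since SQLite 3.25): `FIRST_VALUE` over a timestamp-descending window picks the newest row per site, and `DISTINCT` collapses the duplicated window results to one row per site. A minimal standalone sketch of the same pattern, with a trimmed schema and invented prices:

```python
import sqlite3

# In-memory copy of a trimmed price_history schema; data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    "product_id INTEGER, site_name TEXT, price REAL, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?, ?)",
    [
        (1, 'jjfoodservice', 4.80, '2024-01-01'),
        (1, 'jjfoodservice', 4.40, '2024-01-02'),  # newest JJ Food Service price
        (1, 'atoz_catering', 4.60, '2024-01-01'),
        (1, 'atoz_catering', 4.70, '2024-01-02'),  # newest A to Z price
    ],
)

# FIRST_VALUE over a timestamp-descending window yields the newest price per site;
# DISTINCT deduplicates the per-row window output down to one row per site.
cursor = conn.execute("""
    SELECT DISTINCT site_name,
           FIRST_VALUE(price) OVER (
               PARTITION BY site_name ORDER BY timestamp DESC) AS price
    FROM price_history
    WHERE product_id = ?
""", (1,))
latest = dict(cursor.fetchall())
print(sorted(latest.items()))  # [('atoz_catering', 4.7), ('jjfoodservice', 4.4)]
```

The same result could be had with a `GROUP BY site_name` plus a correlated `MAX(timestamp)` subquery; the window form keeps it to a single scan.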
192
src/notification.py
Normal file
@@ -0,0 +1,192 @@
"""
|
||||
Notification system for price alerts
|
||||
"""
|
||||
|
||||
import smtplib
|
||||
import logging
|
||||
import aiohttp
|
||||
from email.mime.text import MIMEText
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
from typing import List, Dict, Any
|
||||
from datetime import datetime
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class NotificationManager:
|
||||
"""Manages notifications for price alerts."""
|
||||
|
||||
def __init__(self, config):
|
||||
self.config = config
|
||||
self.notification_config = config.notification_config
|
||||
|
||||
async def send_price_alerts(self, alerts: List[Dict[str, Any]]):
|
||||
"""Send notifications for price alerts."""
|
||||
if not alerts:
|
||||
return
|
||||
|
||||
# Send email notifications
|
||||
if self.notification_config.get('email', {}).get('enabled', False):
|
||||
await self._send_email_alerts(alerts)
|
||||
|
||||
# Send webhook notifications
|
||||
if self.notification_config.get('webhook', {}).get('enabled', False):
|
||||
await self._send_webhook_alerts(alerts)
|
||||
|
||||
async def _send_email_alerts(self, alerts: List[Dict[str, Any]]):
|
||||
"""Send email notifications for price alerts."""
|
||||
email_config = self.notification_config.get('email', {})
|
||||
|
||||
try:
|
||||
# Create email content
|
||||
subject = f"Price Alert: {len(alerts)} product(s) at target price!"
|
||||
body = self._create_email_body(alerts)
|
||||
|
||||
# Create message
|
||||
msg = MIMEMultipart()
|
||||
msg['From'] = email_config.get('sender_email')
|
||||
msg['To'] = email_config.get('recipient_email')
|
||||
msg['Subject'] = subject
|
||||
|
||||
msg.attach(MIMEText(body, 'html'))
|
||||
|
||||
# Send email
|
||||
server = smtplib.SMTP(email_config.get('smtp_server'), email_config.get('smtp_port'))
|
||||
server.starttls()
|
||||
server.login(email_config.get('sender_email'), email_config.get('sender_password'))
|
||||
|
||||
text = msg.as_string()
|
||||
server.sendmail(email_config.get('sender_email'),
|
||||
email_config.get('recipient_email'), text)
|
||||
server.quit()
|
||||
|
||||
logger.info(f"Email alert sent for {len(alerts)} products")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to send email alert: {e}")
|
||||
|
||||
async def _send_webhook_alerts(self, alerts: List[Dict[str, Any]]):
|
||||
"""Send webhook notifications for price alerts."""
|
||||
webhook_config = self.notification_config.get('webhook', {})
|
||||
webhook_url = webhook_config.get('url')
|
||||
|
||||
if not webhook_url:
|
||||
return
|
||||
|
||||
try:
|
||||
payload = {
|
||||
'timestamp': datetime.now().isoformat(),
|
||||
'alert_count': len(alerts),
|
||||
'alerts': []
|
||||
}
|
||||
|
||||
for alert in alerts:
|
||||
payload['alerts'].append({
|
||||
'product_name': alert['product']['name'],
|
||||
'site': alert['site'],
|
||||
'current_price': alert['current_price'],
|
||||
'target_price': alert['target_price'],
|
||||
'savings': alert['target_price'] - alert['current_price']
|
||||
})
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.post(webhook_url, json=payload) as response:
|
||||
if response.status == 200:
|
||||
logger.info(f"Webhook alert sent for {len(alerts)} products")
|
||||
else:
|
||||
logger.error(f"Webhook failed with status {response.status}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to send webhook alert: {e}")
|
||||
|
||||
def _create_email_body(self, alerts: List[Dict[str, Any]]) -> str:
|
||||
"""Create HTML email body for price alerts."""
|
||||
html = """
|
||||
<html>
|
||||
<head>
|
||||
<style>
|
||||
body { font-family: Arial, sans-serif; margin: 20px; }
|
||||
.header { background-color: #4CAF50; color: white; padding: 20px; text-align: center; }
|
||||
.alert { border: 1px solid #ddd; margin: 10px 0; padding: 15px; background-color: #f9f9f9; }
|
||||
.product-name { font-size: 18px; font-weight: bold; color: #333; }
|
||||
.price-info { margin: 10px 0; }
|
||||
.current-price { color: #4CAF50; font-weight: bold; font-size: 16px; }
|
||||
.target-price { color: #666; }
|
||||
.savings { color: #FF5722; font-weight: bold; }
|
||||
.site { background-color: #2196F3; color: white; padding: 5px 10px; border-radius: 3px; font-size: 12px; }
|
||||
.footer { margin-top: 30px; font-size: 12px; color: #666; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="header">
|
||||
<h1>🎉 Price Alert!</h1>
|
||||
<p>Great news! We found products at your target price!</p>
|
||||
</div>
|
||||
"""
|
||||
|
||||
for alert in alerts:
|
||||
product = alert['product']
|
||||
savings = alert['target_price'] - alert['current_price']
|
||||
|
||||
html += f"""
|
||||
<div class="alert">
|
||||
<div class="product-name">{product['name']}</div>
|
||||
<div class="price-info">
|
||||
<span class="site">{alert['site'].upper()}</span>
|
||||
<br><br>
|
||||
<span class="current-price">Current Price: £{alert['current_price']:.2f}</span><br>
|
||||
<span class="target-price">Your Target: £{alert['target_price']:.2f}</span><br>
|
||||
<span class="savings">You Save: £{savings:.2f}</span>
|
||||
</div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
html += """
|
||||
<div class="footer">
|
||||
<p>This is an automated price alert from your Price Tracker system.</p>
|
||||
<p>Happy shopping! 🛒</p>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
"""
|
||||
|
||||
return html
|
||||
|
||||
async def send_test_notification(self) -> Dict[str, Any]:
|
||||
"""Send a test notification to verify configuration."""
|
||||
test_result = {
|
||||
'email': {'enabled': False, 'success': False, 'error': None},
|
||||
'webhook': {'enabled': False, 'success': False, 'error': None}
|
||||
}
|
||||
|
||||
# Test email
|
||||
if self.notification_config.get('email', {}).get('enabled', False):
|
||||
test_result['email']['enabled'] = True
|
||||
try:
|
||||
test_alerts = [{
|
||||
'product': {'name': 'Test Product'},
|
||||
'site': 'test-site',
|
||||
'current_price': 19.99,
|
||||
'target_price': 25.00
|
||||
}]
|
||||
await self._send_email_alerts(test_alerts)
|
||||
test_result['email']['success'] = True
|
||||
except Exception as e:
|
||||
test_result['email']['error'] = str(e)
|
||||
|
||||
# Test webhook
|
||||
if self.notification_config.get('webhook', {}).get('enabled', False):
|
||||
test_result['webhook']['enabled'] = True
|
||||
try:
|
||||
test_alerts = [{
|
||||
'product': {'name': 'Test Product'},
|
||||
'site': 'test-site',
|
||||
'current_price': 19.99,
|
||||
'target_price': 25.00
|
||||
}]
|
||||
await self._send_webhook_alerts(test_alerts)
|
||||
test_result['webhook']['success'] = True
|
||||
except Exception as e:
|
||||
test_result['webhook']['error'] = str(e)
|
||||
|
||||
return test_result
|
||||
334
src/scraper.py
Normal file
@@ -0,0 +1,334 @@
"""
|
||||
Web scraping functionality for price tracking
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import logging
|
||||
import random
|
||||
import re
|
||||
from typing import Dict, List, Optional, Any, Tuple
|
||||
from urllib.parse import urljoin, urlparse
|
||||
from bs4 import BeautifulSoup
|
||||
from fake_useragent import UserAgent
|
||||
|
||||
from .config import Config
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class PriceScraper:
|
||||
"""Base class for price scraping functionality."""
|
||||
|
||||
def __init__(self, config: Config):
|
||||
self.config = config
|
||||
self.ua = UserAgent()
|
||||
self.session = None
|
||||
|
||||
async def __aenter__(self):
|
||||
"""Async context manager entry."""
|
||||
connector = aiohttp.TCPConnector(limit=self.config.max_concurrent_requests)
|
||||
timeout = aiohttp.ClientTimeout(total=self.config.timeout)
|
||||
self.session = aiohttp.ClientSession(
|
||||
connector=connector,
|
||||
timeout=timeout,
|
||||
headers={'User-Agent': self.ua.random}
|
||||
)
|
||||
return self
|
||||
|
||||
async def __aexit__(self, exc_type, exc_val, exc_tb):
|
||||
"""Async context manager exit."""
|
||||
if self.session:
|
||||
await self.session.close()
|
||||
|
||||
def _get_headers(self, url: str = None) -> Dict[str, str]:
|
||||
"""Get request headers with random user agent and site-specific headers."""
|
||||
user_agents = self.config.user_agents
|
||||
if user_agents:
|
||||
user_agent = random.choice(user_agents)
|
||||
else:
|
||||
user_agent = self.ua.random
|
||||
|
||||
headers = {
|
||||
'User-Agent': user_agent,
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
|
||||
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
|
||||
'Accept-Encoding': 'gzip, deflate, br',
|
||||
'DNT': '1',
|
||||
'Connection': 'keep-alive',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'Sec-Fetch-Dest': 'document',
|
||||
'Sec-Fetch-Mode': 'navigate',
|
||||
'Sec-Fetch-Site': 'none',
|
||||
}
|
||||
|
||||
# Add site-specific headers
|
||||
if url:
|
||||
if 'amazon.co.uk' in url:
|
||||
headers.update({
|
||||
'Referer': 'https://www.amazon.co.uk/',
|
||||
})
|
||||
elif 'jjfoodservice.com' in url:
|
||||
headers.update({
|
||||
'Referer': 'https://www.jjfoodservice.com/',
|
||||
})
|
||||
elif 'atoz-catering.co.uk' in url:
|
||||
headers.update({
|
||||
'Referer': 'https://www.atoz-catering.co.uk/',
|
||||
})
|
||||
|
||||
return headers
|
||||
|
||||
async def _fetch_page(self, url: str) -> Optional[str]:
|
||||
"""Fetch a web page with retry logic and anti-bot measures."""
|
||||
base_delay = random.uniform(1, 3) # Random delay between 1-3 seconds
|
||||
|
||||
for attempt in range(self.config.retry_attempts):
|
||||
try:
|
||||
# Add delay before each request (except first)
|
||||
if attempt > 0:
|
||||
delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
|
||||
await asyncio.sleep(delay)
|
||||
|
||||
headers = self._get_headers(url)
|
||||
|
||||
async with self.session.get(url, headers=headers) as response:
|
||||
if response.status == 200:
|
||||
return await response.text()
|
||||
elif response.status == 403:
|
||||
logger.warning(f"Access denied (403) for {url} - may be blocked by anti-bot measures")
|
||||
# For 403 errors, wait longer before retry
|
||||
if attempt < self.config.retry_attempts - 1:
|
||||
await asyncio.sleep(random.uniform(5, 10))
|
||||
elif response.status == 429:
|
||||
logger.warning(f"Rate limited (429) for {url}")
|
||||
# For rate limiting, wait even longer
|
||||
if attempt < self.config.retry_attempts - 1:
|
||||
await asyncio.sleep(random.uniform(10, 20))
|
||||
else:
|
||||
logger.warning(f"HTTP {response.status} for {url}")
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
|
||||
if attempt < self.config.retry_attempts - 1:
|
||||
await asyncio.sleep(base_delay * (2 ** attempt))
|
||||
|
||||
logger.error(f"Failed to fetch {url} after {self.config.retry_attempts} attempts")
|
||||
return None
|
||||
|
||||
def _extract_price(self, soup: BeautifulSoup, selectors: List[str]) -> Optional[float]:
|
||||
"""Extract price from HTML using CSS selectors."""
|
||||
for selector in selectors:
|
||||
try:
|
||||
elements = soup.select(selector)
|
||||
for element in elements:
|
||||
price_text = element.get_text(strip=True)
|
||||
price = self._parse_price(price_text)
|
||||
if price is not None:
|
||||
return price
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with selector {selector}: {e}")
|
||||
continue
|
||||
|
||||
return None
|
||||
|
||||
def _parse_price(self, price_text: str) -> Optional[float]:
|
||||
"""Parse price from text string."""
|
||||
if not price_text:
|
||||
return None
|
||||
|
||||
# Remove common currency symbols and clean text
|
||||
price_text = re.sub(r'[^\d.,]+', '', price_text)
|
||||
price_text = price_text.replace(',', '')
|
||||
|
||||
# Try to extract price as float
|
||||
try:
|
||||
return float(price_text)
|
||||
except (ValueError, TypeError):
|
||||
# Try to find price pattern
|
||||
price_match = re.search(r'(\d+\.?\d*)', price_text)
|
||||
if price_match:
|
||||
return float(price_match.group(1))
|
||||
|
||||
return None
|
||||
|
||||
def _extract_text(self, soup: BeautifulSoup, selectors: List[str]) -> Optional[str]:
|
||||
"""Extract text from HTML using CSS selectors."""
|
||||
for selector in selectors:
|
||||
try:
|
||||
element = soup.select_one(selector)
|
||||
if element:
|
||||
return element.get_text(strip=True)
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with selector {selector}: {e}")
|
||||
continue
|
||||
|
||||
return None
|
||||
|
||||
def _detect_site(self, url: str) -> Optional[str]:
|
||||
"""Detect which site this URL belongs to."""
|
||||
domain = urlparse(url).netloc.lower()
|
||||
|
||||
if 'amazon' in domain:
|
||||
return 'amazon'
|
||||
elif 'ebay' in domain:
|
||||
return 'ebay'
|
||||
elif 'walmart' in domain:
|
||||
return 'walmart'
|
||||
# Add more site detection logic here
|
||||
|
||||
return None
|
||||
|
||||
async def scrape_product_price(self, url: str, site_name: str = None) -> Dict[str, Any]:
|
||||
"""Scrape price for a single product from a URL."""
|
||||
result = {
|
||||
'success': False,
|
||||
'price': None,
|
||||
'currency': 'GBP',
|
||||
'title': None,
|
||||
'availability': None,
|
||||
'url': url,
|
||||
'error': None
|
||||
}
|
||||
|
||||
try:
|
||||
# Auto-detect site if not provided
|
||||
if not site_name:
|
||||
site_name = self._detect_site(url)
|
||||
if not site_name:
|
||||
result['error'] = "Could not detect site from URL"
|
||||
return result
|
||||
|
||||
# Get site configuration
|
||||
site_config = self.config.get_site_config(site_name)
|
||||
if not site_config:
|
||||
result['error'] = f"No configuration found for site: {site_name}"
|
||||
return result
|
||||
|
||||
if not self.config.is_site_enabled(site_name):
|
||||
result['error'] = f"Site {site_name} is disabled"
|
||||
return result
|
||||
|
||||
# Fetch page content
|
||||
html_content = await self._fetch_page(url)
|
||||
if not html_content:
|
||||
result['error'] = "Failed to fetch page content"
|
||||
return result
|
||||
|
||||
# Parse HTML
|
||||
soup = BeautifulSoup(html_content, 'html.parser')
|
||||
|
||||
# Extract price
|
||||
price_selectors = site_config.get('selectors', {}).get('price', [])
|
||||
price = self._extract_price(soup, price_selectors)
|
||||
|
||||
if price is None:
|
||||
result['error'] = "Could not extract price from page"
|
||||
return result
|
||||
|
||||
# Extract additional information
|
||||
title_selectors = site_config.get('selectors', {}).get('title', [])
|
||||
title = self._extract_text(soup, title_selectors)
|
||||
|
||||
availability_selectors = site_config.get('selectors', {}).get('availability', [])
|
||||
availability_text = self._extract_text(soup, availability_selectors)
|
||||
availability = self._parse_availability(availability_text)
|
||||
|
||||
result.update({
|
||||
'success': True,
|
||||
'price': price,
|
||||
'title': title,
|
||||
'availability': availability
|
||||
})
|
||||
|
||||
logger.info(f"Successfully scraped {site_name}: ${price}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error scraping {url}: {e}")
|
||||
result['error'] = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def _parse_availability(self, availability_text: str) -> bool:
|
||||
"""Parse availability from text."""
|
||||
if not availability_text:
|
||||
return True # Assume available if no info
|
||||
|
||||
availability_text = availability_text.lower()
|
||||
|
||||
# Common out of stock indicators
|
||||
out_of_stock_indicators = [
|
||||
'out of stock', 'unavailable', 'sold out', 'not available',
|
||||
'temporarily out of stock', 'currently unavailable'
|
||||
]
|
||||
|
||||
for indicator in out_of_stock_indicators:
|
||||
if indicator in availability_text:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
class ScraperManager:
|
||||
"""Manages multiple price scrapers and coordinates scraping tasks."""
|
||||
|
||||
def __init__(self, config: Config):
|
||||
self.config = config
|
||||
self.semaphore = asyncio.Semaphore(config.max_concurrent_requests)
|
||||
|
||||
async def scrape_product(self, product: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
|
||||
"""Scrape prices for a single product across all configured sites."""
|
||||
product_id = product['id']
|
||||
urls = product['urls']
|
||||
|
||||
results = {}
|
||||
|
||||
async with PriceScraper(self.config) as scraper:
|
||||
tasks = []
|
||||
|
||||
for site_name, url in urls.items():
|
||||
if self.config.is_site_enabled(site_name):
|
||||
task = self._scrape_with_semaphore(scraper, url, site_name)
|
||||
tasks.append((site_name, task))
|
||||
|
||||
# Add delay between requests
|
||||
await asyncio.sleep(self.config.delay_between_requests)
|
||||
|
||||
# Wait for all tasks to complete
|
||||
for site_name, task in tasks:
|
||||
try:
|
||||
result = await task
|
||||
results[site_name] = result
|
||||
except Exception as e:
|
||||
logger.error(f"Error scraping {site_name} for product {product_id}: {e}")
|
||||
results[site_name] = {
|
||||
'success': False,
|
||||
'error': str(e)
|
||||
}
|
||||
|
||||
return results
|
||||
|
||||
async def _scrape_with_semaphore(self, scraper: PriceScraper, url: str, site_name: str):
|
||||
"""Scrape with semaphore to limit concurrent requests."""
|
||||
async with self.semaphore:
|
||||
return await scraper.scrape_product_price(url, site_name)
|
||||
|
||||
async def scrape_all_products(self, products: List[Dict[str, Any]]) -> Dict[int, Dict[str, Dict[str, Any]]]:
|
||||
"""Scrape prices for all products."""
|
||||
results = {}
|
||||
|
||||
for product in products:
|
||||
try:
|
||||
product_id = product['id']
|
||||
logger.info(f"Scraping product: {product['name']} (ID: {product_id})")
|
||||
|
||||
product_results = await self.scrape_product(product)
|
||||
results[product_id] = product_results
|
||||
|
||||
# Add delay between products
|
||||
await asyncio.sleep(self.config.delay_between_requests)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error scraping product {product.get('id', 'unknown')}: {e}")
|
||||
|
||||
return results
|
||||
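The `_parse_price` helper above cleans a scraped price string before converting it to a float. A standalone sketch of the same parsing logic (a hypothetical free function, not part of the committed class):

```python
import re
from typing import Optional

def parse_price(price_text: str) -> Optional[float]:
    """Strip currency symbols/thousands separators and parse a float price."""
    if not price_text:
        return None
    cleaned = re.sub(r'[^\d.,]+', '', price_text)   # keep only digits, dots, commas
    cleaned = cleaned.replace(',', '')              # drop thousands separators
    try:
        return float(cleaned)
    except (ValueError, TypeError):
        match = re.search(r'(\d+\.?\d*)', cleaned)  # fall back to the first number
        return float(match.group(1)) if match else None

print(parse_price("£1,299.99"))  # parses to 1299.99
```

Note the fallback regex only fires when the cleaned string is not directly parseable, e.g. when multiple dots survive the cleanup.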
139
src/scraper_manager.py
Normal file
@@ -0,0 +1,139 @@
"""
Scraper manager for coordinating price scraping tasks
"""

import asyncio
import logging
from typing import Dict, List, Any
from .scraper import ScraperManager as BaseScraper
from .uk_scraper import UKCateringScraper

logger = logging.getLogger(__name__)


class ScraperManager(BaseScraper):
    """Enhanced scraper manager with additional coordination features."""

    def __init__(self, config):
        super().__init__(config)
        self.active_tasks = {}

    async def scrape_product_by_id(self, product_id: int, product_data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        """Scrape a specific product by ID with task tracking."""
        if product_id in self.active_tasks:
            logger.info(f"Product {product_id} is already being scraped")
            return await self.active_tasks[product_id]

        # Create and track the scraping task
        task = asyncio.create_task(self.scrape_product(product_data))
        self.active_tasks[product_id] = task

        try:
            result = await task
            return result
        finally:
            # Clean up completed task
            if product_id in self.active_tasks:
                del self.active_tasks[product_id]

    async def cancel_product_scraping(self, product_id: int) -> bool:
        """Cancel scraping for a specific product."""
        if product_id in self.active_tasks:
            task = self.active_tasks[product_id]
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
            del self.active_tasks[product_id]
            logger.info(f"Cancelled scraping for product {product_id}")
            return True
        return False

    def get_active_scraping_tasks(self) -> List[int]:
        """Get list of product IDs currently being scraped."""
        return list(self.active_tasks.keys())

    async def health_check(self) -> Dict[str, Any]:
        """Perform a health check on the scraping system."""
        health_status = {
            'status': 'healthy',
            'active_tasks': len(self.active_tasks),
            'enabled_sites': len(self.config.get_enabled_sites()),
            'site_checks': {}
        }

        # Test each enabled site with a simple request
        enabled_sites = self.config.get_enabled_sites()

        for site_name in enabled_sites:
            site_config = self.config.get_site_config(site_name)
            base_url = site_config.get('base_url', '')

            try:
                from .scraper import PriceScraper
                async with PriceScraper(self.config) as scraper:
                    html_content = await scraper._fetch_page(base_url)
                    if html_content:
                        health_status['site_checks'][site_name] = 'accessible'
                    else:
                        health_status['site_checks'][site_name] = 'inaccessible'
            except Exception as e:
                health_status['site_checks'][site_name] = f'error: {str(e)}'

        # Determine overall health
        failed_sites = [site for site, status in health_status['site_checks'].items()
                        if status != 'accessible']

        if len(failed_sites) == len(enabled_sites):
            health_status['status'] = 'unhealthy'
        elif failed_sites:
            health_status['status'] = 'degraded'

        return health_status

    async def scrape_product(self, product: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        """Scrape prices for a single product across all configured sites."""
        product_id = product['id']
        urls = product['urls']

        results = {}

        # Determine which scraper to use based on the sites
        uk_catering_sites = {'jjfoodservice', 'atoz_catering', 'amazon_uk'}
        has_uk_sites = any(site in uk_catering_sites for site in urls.keys())

        if has_uk_sites:
            # Use UK catering scraper
            async with UKCateringScraper(self.config) as scraper:
                tasks = []

                for site_name, url in urls.items():
                    if self.config.is_site_enabled(site_name):
                        task = self._scrape_with_semaphore_uk(scraper, url, site_name)
                        tasks.append((site_name, task))

                        # Add delay between requests
                        await asyncio.sleep(self.config.delay_between_requests)

                # Wait for all tasks to complete
                for site_name, task in tasks:
                    try:
                        result = await task
                        results[site_name] = result
                    except Exception as e:
                        logger.error(f"Error scraping {site_name} for product {product_id}: {e}")
                        results[site_name] = {
                            'success': False,
                            'error': str(e)
                        }
        else:
            # Use standard scraper for other sites
            results = await super().scrape_product(product)

        return results

    async def _scrape_with_semaphore_uk(self, scraper: UKCateringScraper, url: str, site_name: str):
        """Scrape with semaphore using UK scraper."""
        async with self.semaphore:
            return await scraper.scrape_product_price(url, site_name)
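The task-tracking pattern in `scrape_product_by_id` above — a second request for a product already being scraped awaits the in-flight task instead of starting a duplicate scrape — can be sketched in isolation. This is a minimal illustration with hypothetical names (`TaskTracker`, `demo`), not code from the commit:

```python
import asyncio

class TaskTracker:
    """Deduplicate concurrent requests for the same key onto one task."""
    def __init__(self):
        self.active = {}

    async def run(self, key, coro_factory):
        if key in self.active:              # already in flight: await the same task
            return await self.active[key]
        task = asyncio.create_task(coro_factory())
        self.active[key] = task
        try:
            return await task
        finally:
            self.active.pop(key, None)      # clean up once finished

async def demo():
    calls = 0
    async def work():
        nonlocal calls
        calls += 1                          # counts how often the work actually runs
        await asyncio.sleep(0.01)
        return "done"
    tracker = TaskTracker()
    results = await asyncio.gather(tracker.run(1, work), tracker.run(1, work))
    return calls, results

print(asyncio.run(demo()))  # the second request reuses the first task
```

Because `asyncio.create_task` schedules without yielding, the key is registered before the first coroutine suspends, so the concurrent second call always sees it.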
332
src/uk_scraper.py
Normal file
@@ -0,0 +1,332 @@
"""
Specialized scrapers for UK catering supply sites
"""

import re
import logging
from typing import Dict, Any, Optional
from bs4 import BeautifulSoup
from .scraper import PriceScraper

logger = logging.getLogger(__name__)


class UKCateringScraper(PriceScraper):
    """Specialized scraper for UK catering supply websites."""

    def _parse_uk_price(self, price_text: str) -> Optional[float]:
        """Parse UK price format with £ symbol."""
        if not price_text:
            return None

        # Remove common text and normalize
        price_text = price_text.lower()
        price_text = re.sub(r'delivery:|collection:|was:|now:|offer:|from:', '', price_text)

        # Find price with £ symbol
        price_match = re.search(r'£(\d+\.?\d*)', price_text)
        if price_match:
            try:
                return float(price_match.group(1))
            except ValueError:
                pass

        # Try without £ symbol but with decimal
        price_match = re.search(r'(\d+\.\d{2})', price_text)
        if price_match:
            try:
                return float(price_match.group(1))
            except ValueError:
                pass

        return None

    def _extract_jjfoodservice_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from JJ Food Service."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # Try multiple selectors for price
        price_selectors = [
            '.price',
            '.product-price',
            '[data-testid="price"]',
            '.price-value',
            '.current-price',
            '.product-card-price',
            'span:contains("£")',
            '.cost'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        logger.info(f"Successfully scraped jjfoodservice: £{price}")
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with JJ Food Service price selector {selector}: {e}")

        # Try to extract title
        title_selectors = [
            'h1',
            '.product-title',
            '.product-name',
            '[data-testid="product-title"]',
            '.product-card-title',
            'title'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    result['title'] = element.get_text(strip=True)
                    break
            except Exception as e:
                logger.debug(f"Error with JJ Food Service title selector {selector}: {e}")

        # Check availability
        availability_indicators = [
            'out of stock',
            'unavailable',
            'not available',
            'temporarily unavailable'
        ]

        page_text = soup.get_text().lower()
        for indicator in availability_indicators:
            if indicator in page_text:
                result['availability'] = False
                break

        return result

    def _extract_atoz_catering_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from A to Z Catering."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # A to Z Catering specific selectors
        price_selectors = [
            '.price',
            '.product-price',
            '.delivery-price',
            '.collection-price',
            'span:contains("£")',
            '.price-value',
            '.cost',
            '.selling-price'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    # Skip if it contains "delivery" or "collection" but no price
                    if ('delivery' in price_text.lower() or 'collection' in price_text.lower()) and '£' not in price_text:
                        continue

                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        logger.info(f"Successfully scraped atoz_catering: £{price}")
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with A to Z price selector {selector}: {e}")

        # Extract title
        title_selectors = [
            'h1',
            '.product-title',
            '.product-name',
            'a[href*="/products/product/"]',
            '.product-link',
            'title'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    result['title'] = element.get_text(strip=True)
                    break
            except Exception as e:
                logger.debug(f"Error with A to Z title selector {selector}: {e}")

        # Check availability - A to Z specific indicators
        availability_indicators = [
            'out of stock',
            'unavailable',
            'not available',
            'temporarily unavailable',
            'contact us for availability'
        ]

        page_text = soup.get_text().lower()
        for indicator in availability_indicators:
            if indicator in page_text:
                result['availability'] = False
                break

        # Check if "Add to Basket" button is present (indicates availability)
        add_to_basket = soup.select_one('.add-to-basket, button:contains("Add To Basket")')
        if not add_to_basket and result['availability']:
            # If no add to basket button and no explicit availability info, assume unavailable
            out_of_stock_indicators = soup.select('.out-of-stock, .unavailable')
            if out_of_stock_indicators:
                result['availability'] = False

        return result

    def _extract_amazon_uk_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from Amazon UK."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # Amazon UK price selectors
        price_selectors = [
            '.a-price-whole',
            '.a-price .a-offscreen',
            '#priceblock_dealprice',
            '#priceblock_ourprice',
            '.a-price-range',
            '.a-price.a-text-price.a-size-medium.apexPriceToPay',
            '.a-price-current',
            'span.a-price.a-text-price.a-size-medium'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with Amazon UK price selector {selector}: {e}")

        # Extract title
        title_selectors = [
            '#productTitle',
            '.product-title',
            'h1.a-size-large',
            'h1'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    result['title'] = element.get_text(strip=True)
                    break
            except Exception as e:
                logger.debug(f"Error with Amazon UK title selector {selector}: {e}")

        # Check availability
        availability_selectors = [
            '#availability span',
            '.a-size-medium.a-color-success',
            '.a-size-medium.a-color-state',
            '#availability .a-declarative'
        ]

        for selector in availability_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    availability_text = element.get_text().lower()
                    if any(phrase in availability_text for phrase in ['out of stock', 'unavailable', 'not available']):
                        result['availability'] = False
                        break
            except Exception as e:
                logger.debug(f"Error with Amazon UK availability selector {selector}: {e}")

        return result

    async def scrape_product(self, product_data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        """Scrape prices for a product from all configured sites."""
        results = {}
        urls = product_data.get('urls', {})

        for site_name, url in urls.items():
            try:
                # Only process sites we support
                if site_name not in ['jjfoodservice', 'atoz_catering', 'amazon_uk']:
                    logger.warning(f"Skipping unsupported site: {site_name}")
                    continue

                html_content = await self._fetch_page(url)
                if not html_content:
                    results[site_name] = {
                        'success': False,
                        'error': 'Failed to fetch page',
                        'price': None,
                        'currency': 'GBP'
                    }
                    continue

                soup = BeautifulSoup(html_content, 'html.parser')

                # Route to appropriate extraction method
                if site_name == 'jjfoodservice':
                    extracted_data = self._extract_jjfoodservice_data(soup)
                elif site_name == 'atoz_catering':
                    extracted_data = self._extract_atoz_catering_data(soup)
                elif site_name == 'amazon_uk':
                    extracted_data = self._extract_amazon_uk_data(soup)
                else:
                    # Fallback to generic extraction
                    extracted_data = self._extract_generic_data(soup, site_name)

                if extracted_data['price'] is not None:
                    results[site_name] = {
                        'success': True,
                        'price': extracted_data['price'],
                        'currency': extracted_data['currency'],
                        'title': extracted_data.get('title'),
                        'availability': extracted_data.get('availability', True)
                    }
                else:
                    results[site_name] = {
                        'success': False,
                        'error': 'Could not extract price',
                        'price': None,
                        'currency': 'GBP'
                    }

            except Exception as e:
                logger.error(f"Error scraping {site_name}: {e}")
                results[site_name] = {
                    'success': False,
                    'error': str(e),
                    'price': None,
                    'currency': 'GBP'
                }

        return results
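The `_parse_uk_price` method above prefers a £-prefixed amount and strips label words like "Delivery:" before matching. A self-contained sketch of that logic (a hypothetical free function mirroring the method, not the committed code):

```python
import re
from typing import Optional

def parse_uk_price(price_text: str) -> Optional[float]:
    """Parse a UK-format price, preferring an explicit £-prefixed amount."""
    if not price_text:
        return None
    text = price_text.lower()
    # Drop label words such as "Delivery:" / "Collection:" before matching
    text = re.sub(r'delivery:|collection:|was:|now:|offer:|from:', '', text)
    match = re.search(r'£(\d+\.?\d*)', text)    # £-prefixed amount first
    if match:
        return float(match.group(1))
    match = re.search(r'(\d+\.\d{2})', text)    # bare two-decimal amount as fallback
    return float(match.group(1)) if match else None
```

On a string like `"Delivery:£4.50 Collection:£3.99"` this returns the first £ amount found after label stripping, which is why the class-level code for A to Z Catering adds its own delivery/collection handling around it.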
515
src/uk_scraper_old.py
Normal file
@@ -0,0 +1,515 @@
"""
Specialized scrapers for UK catering supply sites
"""

import re
import logging
from typing import Dict, Any, Optional
from bs4 import BeautifulSoup
from .scraper import PriceScraper

logger = logging.getLogger(__name__)


class UKCateringScraper(PriceScraper):
    """Specialized scraper for UK catering supply websites."""

    def _parse_uk_price(self, price_text: str) -> Optional[float]:
        """Parse UK price format with £ symbol."""
        if not price_text:
            return None

        # Remove common text and normalize
        price_text = price_text.lower()
        price_text = re.sub(r'delivery:|collection:|was:|now:|offer:|from:', '', price_text)

        # Find price with £ symbol
        price_match = re.search(r'£(\d+\.?\d*)', price_text)
        if price_match:
            try:
                return float(price_match.group(1))
            except ValueError:
                pass

        # Try without £ symbol but with decimal
        price_match = re.search(r'(\d+\.\d{2})', price_text)
        if price_match:
            try:
                return float(price_match.group(1))
            except ValueError:
                pass

        return None

    def _extract_jjfoodservice_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from JJ Food Service."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # Try multiple selectors for price
        price_selectors = [
            '.price',
            '.product-price',
            '[data-testid="price"]',
            '.price-value',
            '.current-price',
            '.product-card-price',
            'span:contains("£")',
            '.cost'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with JJ Food Service price selector {selector}: {e}")

        # Try to extract title
        title_selectors = [
            'h1',
            '.product-title',
            '.product-name',
            '[data-testid="product-title"]',
            '.product-card-title',
            'title'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    result['title'] = element.get_text(strip=True)
                    break
            except Exception as e:
                logger.debug(f"Error with JJ Food Service title selector {selector}: {e}")

        # Check availability
        availability_indicators = [
            'out of stock',
            'unavailable',
            'not available',
            'sold out'
        ]

        page_text = soup.get_text().lower()
        for indicator in availability_indicators:
            if indicator in page_text:
                result['availability'] = False
                break

        return result

    def _extract_atoz_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from A to Z Catering."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # A to Z Catering shows prices like "Delivery:£X.XX Collection:£Y.YY"
        # We'll prioritize the lower price (usually collection)

        price_text = soup.get_text()

        # Look for delivery and collection prices
        delivery_match = re.search(r'delivery:?\s*£(\d+\.?\d*)', price_text, re.IGNORECASE)
        collection_match = re.search(r'collection:?\s*£(\d+\.?\d*)', price_text, re.IGNORECASE)

        prices = []
        if delivery_match:
            try:
                prices.append(float(delivery_match.group(1)))
            except ValueError:
                pass

        if collection_match:
            try:
                prices.append(float(collection_match.group(1)))
            except ValueError:
                pass

        # If we found prices, use the lowest one
        if prices:
            result['price'] = min(prices)
        else:
            # Fallback to general price extraction
            price_selectors = [
                '.price',
                '.product-price',
                'span:contains("£")',
                '.price-value'
            ]

            for selector in price_selectors:
                try:
                    elements = soup.select(selector)
                    for element in elements:
                        price_text = element.get_text(strip=True)
                        price = self._parse_uk_price(price_text)
                        if price is not None:
                            result['price'] = price
                            break
                    if result['price'] is not None:
                        break
                except Exception as e:
                    logger.debug(f"Error with A to Z price selector {selector}: {e}")

        # Extract title - A to Z often has product names in links
        title_selectors = [
            'h1',
            '.product-title',
            '.product-name',
            'a[href*="/products/product/"]',
            '.product-link',
            'title'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    title = element.get_text(strip=True)
                    # Clean up the title
                    if len(title) > 5 and 'A to Z' not in title:
                        result['title'] = title
                        break
            except Exception as e:
                logger.debug(f"Error with A to Z title selector {selector}: {e}")

        # Check availability - look for "Add To Basket" button
        add_to_basket = soup.find(text=re.compile('Add To Basket', re.IGNORECASE))
        if not add_to_basket:
            # Also check for out of stock indicators
            out_of_stock_indicators = [
                'out of stock',
                'unavailable',
                'not available',
                'sold out'
            ]

            page_text = soup.get_text().lower()
            for indicator in out_of_stock_indicators:
                if indicator in page_text:
                    result['availability'] = False
                    break

        return result

    def _extract_amazon_uk_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from Amazon UK."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # Amazon UK price selectors
        price_selectors = [
            '.a-price-whole',
            '.a-price .a-offscreen',
            '.a-price-current .a-offscreen',
            '#priceblock_dealprice',
            '#priceblock_ourprice',
            '.a-price-range',
            '.a-price.a-text-price.a-size-medium.apexPriceToPay .a-offscreen'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with Amazon UK price selector {selector}: {e}")

        # Extract title
        title_selectors = [
            '#productTitle',
            '.product-title',
            'h1.a-size-large'
        ]

        for selector in title_selectors:
            try:
                element = soup.select_one(selector)
                if element:
                    result['title'] = element.get_text(strip=True)
                    break
            except Exception as e:
                logger.debug(f"Error with Amazon UK title selector {selector}: {e}")

        # Check availability
        availability_text = soup.get_text().lower()
        if any(phrase in availability_text for phrase in ['out of stock', 'currently unavailable', 'not available']):
            result['availability'] = False

        return result

    def _extract_tesco_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
        """Extract data specifically from Tesco."""
        result = {
            'price': None,
            'title': None,
            'availability': True,
            'currency': 'GBP'
        }

        # Tesco price selectors
        price_selectors = [
            '.price-control-wrapper .value',
            '.price-per-sellable-unit .value',
            '.price-per-quantity-weight .value',
            '[data-testid="price-current-value"]',
            '.price-current',
            '.product-price .price'
        ]

        for selector in price_selectors:
            try:
                elements = soup.select(selector)
                for element in elements:
                    price_text = element.get_text(strip=True)
                    price = self._parse_uk_price(price_text)
                    if price is not None:
                        result['price'] = price
                        break
                if result['price'] is not None:
                    break
            except Exception as e:
                logger.debug(f"Error with Tesco price selector {selector}: {e}")
|
||||
|
||||
# Extract title
|
||||
title_selectors = [
|
||||
'h1[data-testid="product-title"]',
|
||||
'.product-details-tile h1',
|
||||
'.product-title',
|
||||
'h1.product-name'
|
||||
]
|
||||
|
||||
for selector in title_selectors:
|
||||
try:
|
||||
element = soup.select_one(selector)
|
||||
if element:
|
||||
result['title'] = element.get_text(strip=True)
|
||||
break
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with Tesco title selector {selector}: {e}")
|
||||
|
||||
return result
|
||||
|
||||
def _extract_sainsburys_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
|
||||
"""Extract data specifically from Sainsburys."""
|
||||
result = {
|
||||
'price': None,
|
||||
'title': None,
|
||||
'availability': True,
|
||||
'currency': 'GBP'
|
||||
}
|
||||
|
||||
# Sainsburys price selectors
|
||||
price_selectors = [
|
||||
'.pd__cost__current-price',
|
||||
'.pd__cost .pd__cost__retail-price',
|
||||
'.pricing__now-price',
|
||||
'.product-price__current',
|
||||
'[data-testid="pd-retail-price"]',
|
||||
'.price-per-unit'
|
||||
]
|
||||
|
||||
for selector in price_selectors:
|
||||
try:
|
||||
elements = soup.select(selector)
|
||||
for element in elements:
|
||||
price_text = element.get_text(strip=True)
|
||||
price = self._parse_uk_price(price_text)
|
||||
if price is not None:
|
||||
result['price'] = price
|
||||
break
|
||||
if result['price'] is not None:
|
||||
break
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with Sainsburys price selector {selector}: {e}")
|
||||
|
||||
# Extract title
|
||||
title_selectors = [
|
||||
'.pd__header h1',
|
||||
'h1[data-testid="pd-product-name"]',
|
||||
'.product-name',
|
||||
'.pd__product-name'
|
||||
]
|
||||
|
||||
for selector in title_selectors:
|
||||
try:
|
||||
element = soup.select_one(selector)
|
||||
if element:
|
||||
result['title'] = element.get_text(strip=True)
|
||||
break
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with Sainsburys title selector {selector}: {e}")
|
||||
|
||||
return result
|
||||
|
||||
def _extract_booker_data(self, soup: BeautifulSoup) -> Dict[str, Any]:
|
||||
"""Extract data specifically from Booker."""
|
||||
result = {
|
||||
'price': None,
|
||||
'title': None,
|
||||
'availability': True,
|
||||
'currency': 'GBP'
|
||||
}
|
||||
|
||||
# Booker price selectors
|
||||
price_selectors = [
|
||||
'.price',
|
||||
'.product-price',
|
||||
'.price-current',
|
||||
'.selling-price',
|
||||
'[data-testid="price"]',
|
||||
'.product-tile-price'
|
||||
]
|
||||
|
||||
for selector in price_selectors:
|
||||
try:
|
||||
elements = soup.select(selector)
|
||||
for element in elements:
|
||||
price_text = element.get_text(strip=True)
|
||||
price = self._parse_uk_price(price_text)
|
||||
if price is not None:
|
||||
result['price'] = price
|
||||
break
|
||||
if result['price'] is not None:
|
||||
break
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with Booker price selector {selector}: {e}")
|
||||
|
||||
# Extract title
|
||||
title_selectors = [
|
||||
'h1',
|
||||
'.product-title',
|
||||
'.product-name',
|
||||
'.product-description h1',
|
||||
'[data-testid="product-title"]'
|
||||
]
|
||||
|
||||
for selector in title_selectors:
|
||||
try:
|
||||
element = soup.select_one(selector)
|
||||
if element:
|
||||
result['title'] = element.get_text(strip=True)
|
||||
break
|
||||
except Exception as e:
|
||||
logger.debug(f"Error with Booker title selector {selector}: {e}")
|
||||
|
||||
return result
|
||||
|
||||
async def scrape_product_price(self, url: str, site_name: str = None) -> Dict[str, Any]:
|
||||
"""Enhanced scraping for UK catering sites."""
|
||||
result = {
|
||||
'success': False,
|
||||
'price': None,
|
||||
'currency': 'GBP',
|
||||
'title': None,
|
||||
'availability': None,
|
||||
'url': url,
|
||||
'error': None
|
||||
}
|
||||
|
||||
try:
|
||||
# Auto-detect site if not provided
|
||||
if not site_name:
|
||||
site_name = self._detect_site(url)
|
||||
if not site_name:
|
||||
result['error'] = "Could not detect site from URL"
|
||||
return result
|
||||
|
||||
# Check if site is enabled
|
||||
if not self.config.is_site_enabled(site_name):
|
||||
result['error'] = f"Site {site_name} is disabled"
|
||||
return result
|
||||
|
||||
# Fetch page content
|
||||
html_content = await self._fetch_page(url)
|
||||
if not html_content:
|
||||
result['error'] = "Failed to fetch page content"
|
||||
return result
|
||||
|
||||
# Parse HTML
|
||||
soup = BeautifulSoup(html_content, 'html.parser')
|
||||
|
||||
# Use specialized extraction based on site
|
||||
if site_name == 'jjfoodservice':
|
||||
extracted_data = self._extract_jjfoodservice_data(soup)
|
||||
elif site_name == 'atoz_catering':
|
||||
extracted_data = self._extract_atoz_data(soup)
|
||||
elif site_name == 'amazon_uk':
|
||||
extracted_data = self._extract_amazon_uk_data(soup)
|
||||
elif site_name == 'tesco':
|
||||
extracted_data = self._extract_tesco_data(soup)
|
||||
elif site_name == 'sainsburys':
|
||||
extracted_data = self._extract_sainsburys_data(soup)
|
||||
elif site_name == 'booker':
|
||||
extracted_data = self._extract_booker_data(soup)
|
||||
else:
|
||||
# Fall back to general extraction
|
||||
return await super().scrape_product_price(url, site_name)
|
||||
|
||||
if extracted_data['price'] is None:
|
||||
result['error'] = "Could not extract price from page"
|
||||
return result
|
||||
|
||||
result.update({
|
||||
'success': True,
|
||||
'price': extracted_data['price'],
|
||||
'currency': extracted_data.get('currency', 'GBP'),
|
||||
'title': extracted_data.get('title'),
|
||||
'availability': extracted_data.get('availability', True)
|
||||
})
|
||||
|
||||
logger.info(f"Successfully scraped {site_name}: £{extracted_data['price']}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error scraping {url}: {e}")
|
||||
result['error'] = str(e)
|
||||
|
||||
return result
|
||||
|
||||
def _detect_site(self, url: str) -> Optional[str]:
|
||||
"""Detect which UK catering site this URL belongs to."""
|
||||
url_lower = url.lower()
|
||||
|
||||
if 'jjfoodservice.com' in url_lower:
|
||||
return 'jjfoodservice'
|
||||
elif 'atoz-catering.co.uk' in url_lower:
|
||||
return 'atoz_catering'
|
||||
elif 'amazon.co.uk' in url_lower:
|
||||
return 'amazon_uk'
|
||||
elif 'tesco.com' in url_lower:
|
||||
return 'tesco'
|
||||
elif 'sainsburys.co.uk' in url_lower:
|
||||
return 'sainsburys'
|
||||
elif 'booker.co.uk' in url_lower:
|
||||
return 'booker'
|
||||
|
||||
# Fall back to parent detection for other sites
|
||||
return super()._detect_site(url)
|
||||
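Every extractor above funnels raw selector text through `self._parse_uk_price`, which is defined elsewhere in the scraper and is not shown in this commit. Purely as an illustration of the kind of parsing it would need to do for UK retailers (pound amounts, pence suffixes, thousands separators), a hypothetical standalone version might look like this — the name and behaviour here are assumptions, not the project's actual implementation:

```python
import re
from typing import Optional


def parse_uk_price(price_text: str) -> Optional[float]:
    """Hypothetical sketch of UK price parsing; the real _parse_uk_price
    lives elsewhere in the scraper and may differ."""
    text = price_text.strip().lower().replace(',', '')
    # "£1.99", "£ 12"
    match = re.search(r'£\s*(\d+(?:\.\d{1,2})?)', text)
    if match:
        return float(match.group(1))
    # "99p" -> 0.99
    match = re.search(r'(\d+(?:\.\d{1,2})?)\s*p\b', text)
    if match:
        return float(match.group(1)) / 100.0
    # Bare number as a last resort
    match = re.search(r'\d+(?:\.\d{1,2})?', text)
    return float(match.group(0)) if match else None


print(parse_uk_price('£1.99'))   # 1.99
print(parse_uk_price('99p'))     # 0.99
```

Returning `None` on failure is what the callers above rely on: each selector loop only accepts a value when `price is not None`.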
118
src/utils.py
Normal file
@@ -0,0 +1,118 @@
"""
Utility functions for the price tracker
"""

import logging
from typing import Dict, Any, List
from datetime import datetime, timedelta

logger = logging.getLogger(__name__)


def format_price(price: float, currency: str = 'GBP') -> str:
    """Format a price with the appropriate currency symbol."""
    if currency == 'GBP':
        return f"£{price:.2f}"
    elif currency == 'USD':
        return f"${price:.2f}"
    elif currency == 'EUR':
        return f"€{price:.2f}"
    else:
        return f"{price:.2f} {currency}"


def calculate_price_change(old_price: float, new_price: float) -> Dict[str, Any]:
    """Calculate the price change, its percentage, and its direction."""
    if old_price == 0:
        return {
            'change': 0.0,
            'percentage': 0.0,
            'direction': 'stable'
        }

    change = new_price - old_price
    percentage = (change / old_price) * 100

    if percentage > 0.1:
        direction = 'up'
    elif percentage < -0.1:
        direction = 'down'
    else:
        direction = 'stable'

    return {
        'change': change,
        'percentage': percentage,
        'direction': direction
    }


def is_site_accessible(site_name: str, last_success: datetime = None) -> bool:
    """Check whether a site is likely accessible based on recent success."""
    if not last_success:
        return True  # Assume accessible if no data

    # Consider the site inaccessible if there has been no success in the last 24 hours
    return (datetime.now() - last_success) < timedelta(hours=24)


def get_retry_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Calculate an exponential backoff delay with jitter."""
    import random
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)  # Add up to 10% jitter
    return delay + jitter


def clean_product_name(name: str) -> str:
    """Clean and normalize a product name."""
    import re
    # Collapse runs of whitespace and trim
    name = re.sub(r'\s+', ' ', name.strip())
    # Remove special characters that might cause issues
    name = re.sub(r'[^\w\s\-\(\)&]', '', name)
    return name


def is_valid_price(price: float) -> bool:
    """Check that a price is positive and reasonable."""
    return 0 < price < 10000  # £10,000 is a sensible upper bound for catering supplies


def get_price_alert_message(product_name: str, site_name: str, current_price: float,
                            target_price: float, currency: str = 'GBP') -> str:
    """Generate a price alert message."""
    current_formatted = format_price(current_price, currency)
    target_formatted = format_price(target_price, currency)

    return (f"Price Alert: {product_name} is now {current_formatted} on {site_name}, "
            f"which is at or below your target price of {target_formatted}!")


def group_results_by_status(results: Dict[str, Dict[str, Any]]) -> Dict[str, List]:
    """Group scraping results by success/failure status."""
    grouped = {
        'successful': [],
        'failed': [],
        'blocked': []
    }

    for site_name, result in results.items():
        error_text = str(result.get('error', ''))
        if result.get('success'):
            grouped['successful'].append({
                'site': site_name,
                'price': result.get('price'),
                'currency': result.get('currency', 'GBP')
            })
        elif 'blocked' in error_text.lower() or '403' in error_text:
            grouped['blocked'].append({
                'site': site_name,
                'error': result.get('error')
            })
        else:
            grouped['failed'].append({
                'site': site_name,
                'error': result.get('error')
            })

    return grouped
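To see the backoff curve `get_retry_delay` produces, the sketch below copies its logic verbatim so it runs standalone without the package (the duplication is purely for illustration):

```python
import random


def get_retry_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    # Same logic as src/utils.py: exponential growth capped at max_delay,
    # plus up to 10% random jitter to avoid thundering-herd retries.
    delay = min(base_delay * (2 ** attempt), max_delay)
    return delay + random.uniform(0, delay * 0.1)


# Delays grow 1, 2, 4, 8... seconds (plus jitter) and cap near 60s
for attempt in range(8):
    print(f"attempt {attempt}: ~{get_retry_delay(attempt):.2f}s")
```

The cap matters: without `max_delay`, attempt 20 would already mean a 12-day wait, whereas here every retry stays within roughly a minute.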
271
src/web_ui.py
Normal file
@@ -0,0 +1,271 @@
"""
Web UI for the price tracker application
"""

from flask import Flask, render_template, request, jsonify, redirect, url_for, flash, send_from_directory
from flask_wtf import FlaskForm
from wtforms import StringField, FloatField, TextAreaField, SubmitField, URLField
from wtforms.validators import DataRequired, NumberRange, URL, Optional
import json
import asyncio
from datetime import datetime, timedelta
import plotly
import plotly.graph_objs as go
import pandas as pd
import os

from .database import DatabaseManager
from .config import Config
from .scraper_manager import ScraperManager
from .notification import NotificationManager
from .utils import format_price, group_results_by_status


def create_app():
    """Create the Flask application."""
    # Get the project root directory (parent of src)
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    template_dir = os.path.join(project_root, 'templates')

    app = Flask(__name__, template_folder=template_dir)
    app.config['SECRET_KEY'] = 'your-secret-key-change-this'  # Replace before deploying

    # Initialize components
    config = Config()
    db_manager = DatabaseManager(config.database_path)
    scraper_manager = ScraperManager(config)
    notification_manager = NotificationManager(config)

    class ProductForm(FlaskForm):
        name = StringField('Product Name', validators=[DataRequired()])
        description = TextAreaField('Description')
        target_price = FloatField('Target Price (£)', validators=[Optional(), NumberRange(min=0)])
        jjfoodservice_url = URLField('JJ Food Service URL', validators=[Optional(), URL()])
        atoz_catering_url = URLField('A to Z Catering URL', validators=[Optional(), URL()])
        amazon_uk_url = URLField('Amazon UK URL', validators=[Optional(), URL()])
        submit = SubmitField('Add Product')

    @app.route('/')
    def index():
        """Home page showing all products."""
        products = db_manager.get_all_products()

        # Get the latest prices for each product
        for product in products:
            latest_prices = db_manager.get_latest_prices(product['id'])
            product['latest_prices'] = latest_prices

            # Find the best current price
            if latest_prices:
                best_price = min(latest_prices.values(), key=lambda x: x['price'])
                product['best_price'] = best_price
            else:
                product['best_price'] = None

        return render_template('index.html', products=products)

    @app.route('/add_product', methods=['GET', 'POST'])
    def add_product():
        """Add a new product to track."""
        form = ProductForm()

        if form.validate_on_submit():
            urls = {}
            if form.jjfoodservice_url.data:
                urls['jjfoodservice'] = form.jjfoodservice_url.data
            if form.atoz_catering_url.data:
                urls['atoz_catering'] = form.atoz_catering_url.data
            if form.amazon_uk_url.data:
                urls['amazon_uk'] = form.amazon_uk_url.data

            if not urls:
                flash('Please provide at least one URL to track.', 'error')
                return render_template('add_product.html', form=form)

            try:
                product_id = db_manager.add_product(
                    name=form.name.data,
                    description=form.description.data,
                    target_price=form.target_price.data,
                    urls=urls
                )
                flash(f'Product "{form.name.data}" added successfully!', 'success')
                return redirect(url_for('product_detail', product_id=product_id))
            except Exception as e:
                flash(f'Error adding product: {e}', 'error')

        return render_template('add_product.html', form=form)

    @app.route('/product/<int:product_id>')
    def product_detail(product_id):
        """Show detailed information for a product."""
        product = db_manager.get_product(product_id)
        if not product:
            flash('Product not found.', 'error')
            return redirect(url_for('index'))

        # Get price history
        price_history = db_manager.get_price_history(product_id, days=30)
        latest_prices = db_manager.get_latest_prices(product_id)
        price_stats = db_manager.get_price_statistics(product_id, days=30)

        # Create the price chart
        chart_json = create_price_chart(price_history, product['name'])

        return render_template('product_detail.html',
                               product=product,
                               price_history=price_history,
                               latest_prices=latest_prices,
                               price_stats=price_stats,
                               chart_json=chart_json)

    @app.route('/scrape/<int:product_id>', methods=['POST'])
    def scrape_product(product_id):
        """Manually trigger scraping for a specific product."""
        product = db_manager.get_product(product_id)
        if not product:
            return jsonify({'error': 'Product not found'}), 404

        try:
            # Run scraping in a new event loop (Flask views are synchronous)
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

            results = loop.run_until_complete(scraper_manager.scrape_product(product))

            # Save results to the database
            for site_name, result in results.items():
                if result['success']:
                    db_manager.save_price_history(
                        product_id=product_id,
                        site_name=site_name,
                        price=result['price'],
                        availability=result.get('availability', True),
                        timestamp=datetime.now()
                    )

            loop.close()

            return jsonify({
                'success': True,
                'results': results,
                'message': 'Scraping completed successfully'
            })

        except Exception as e:
            return jsonify({'error': str(e)}), 500

    @app.route('/scrape_all', methods=['POST'])
    def scrape_all_products():
        """Trigger scraping for all products."""
        try:
            products = db_manager.get_all_products()

            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

            results = loop.run_until_complete(scraper_manager.scrape_all_products(products))

            # Save results to the database
            total_updated = 0
            for product_id, site_results in results.items():
                for site_name, result in site_results.items():
                    if result['success']:
                        db_manager.save_price_history(
                            product_id=product_id,
                            site_name=site_name,
                            price=result['price'],
                            availability=result.get('availability', True),
                            timestamp=datetime.now()
                        )
                        total_updated += 1

            loop.close()

            return jsonify({
                'success': True,
                'total_updated': total_updated,
                'message': f'Updated prices for {total_updated} product-site combinations'
            })

        except Exception as e:
            return jsonify({'error': str(e)}), 500

    @app.route('/api/products')
    def api_products():
        """API endpoint returning all products."""
        products = db_manager.get_all_products()
        return jsonify(products)

    @app.route('/api/product/<int:product_id>/prices')
    def api_product_prices(product_id):
        """API endpoint returning price history for a product."""
        days = request.args.get('days', 30, type=int)
        price_history = db_manager.get_price_history(product_id, days)
        return jsonify(price_history)

    @app.route('/settings')
    def settings():
        """Settings page."""
        return render_template('settings.html', config=config)

    @app.route('/test_notifications', methods=['POST'])
    def test_notifications():
        """Test the notification system."""
        try:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

            result = loop.run_until_complete(notification_manager.send_test_notification())
            loop.close()

            return jsonify(result)
        except Exception as e:
            return jsonify({'error': str(e)}), 500

    @app.route('/favicon.ico')
    def favicon():
        """Serve the favicon."""
        return send_from_directory(os.path.join(app.root_path, 'static'),
                                   'favicon.ico', mimetype='image/vnd.microsoft.icon')

    def create_price_chart(price_history, product_name):
        """Create a price history chart using Plotly."""
        if not price_history:
            return json.dumps({})

        # Convert to a DataFrame for easier manipulation
        df = pd.DataFrame(price_history)
        df['timestamp'] = pd.to_datetime(df['timestamp'])

        # Create a trace for each site
        traces = []
        sites = df['site_name'].unique()

        colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

        for i, site in enumerate(sites):
            site_data = df[df['site_name'] == site].sort_values('timestamp')

            trace = go.Scatter(
                x=site_data['timestamp'],
                y=site_data['price'],
                mode='lines+markers',
                name=site.title(),
                line=dict(color=colors[i % len(colors)], width=2),
                marker=dict(size=6)
            )
            traces.append(trace)

        layout = go.Layout(
            title=f'Price History - {product_name}',
            xaxis=dict(title='Date'),
            yaxis=dict(title='Price (£)'),  # prices are stored in GBP
            hovermode='closest',
            margin=dict(l=50, r=50, t=50, b=50)
        )

        fig = go.Figure(data=traces, layout=layout)
        return json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)

    return app
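The routes in `web_ui.py` drive async scraper calls from synchronous Flask views by hand-building an event loop. Since Python 3.7, `asyncio.run` wraps that create/run/close sequence in a single call; here is a minimal sketch of the same pattern (the `fake_scrape` coroutine is a stand-in for `scraper_manager.scrape_product`, not part of the project):

```python
import asyncio


async def fake_scrape(url: str) -> dict:
    # Stand-in for scraper_manager.scrape_product; a real call would await HTTP I/O
    await asyncio.sleep(0)
    return {'success': True, 'price': 1.99, 'url': url}


# asyncio.run creates a fresh loop, runs the coroutine, and closes the loop,
# replacing the new_event_loop / set_event_loop / close boilerplate
result = asyncio.run(fake_scrape('https://example.com/product'))
print(result['price'])
```

One caveat: `asyncio.run` raises if a loop is already running in the current thread, so it only fits contexts like these synchronous Flask views where no loop exists yet.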
2
static/favicon.ico
Normal file
@@ -0,0 +1,2 @@
# Simple placeholder favicon
# This prevents 404 errors in the browser logs
184
templates/add_product.html
Normal file
@@ -0,0 +1,184 @@
{% extends "base.html" %}

{% block title %}Add Product - Price Tracker{% endblock %}

{% block content %}
<div class="row justify-content-center">
    <div class="col-lg-8">
        <div class="card">
            <div class="card-header">
                <h2 class="mb-0">
                    <i class="fas fa-plus-circle me-2 text-primary"></i>Add New Product
                </h2>
            </div>
            <div class="card-body">
                <form method="POST">
                    {{ form.hidden_tag() }}

                    <div class="row">
                        <div class="col-md-8 mb-3">
                            {{ form.name.label(class="form-label fw-bold") }}
                            {{ form.name(class="form-control form-control-lg") }}
                            {% if form.name.errors %}
                                <div class="text-danger small mt-1">
                                    {% for error in form.name.errors %}
                                        <div>{{ error }}</div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                        </div>
                        <div class="col-md-4 mb-3">
                            {{ form.target_price.label(class="form-label fw-bold") }}
                            <div class="input-group">
                                <span class="input-group-text">£</span>
                                {{ form.target_price(class="form-control form-control-lg") }}
                            </div>
                            {% if form.target_price.errors %}
                                <div class="text-danger small mt-1">
                                    {% for error in form.target_price.errors %}
                                        <div>{{ error }}</div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                            <small class="text-muted">Optional: alert when the price drops below this</small>
                        </div>
                    </div>

                    <div class="mb-3">
                        {{ form.description.label(class="form-label fw-bold") }}
                        {{ form.description(class="form-control", rows="3") }}
                        {% if form.description.errors %}
                            <div class="text-danger small mt-1">
                                {% for error in form.description.errors %}
                                    <div>{{ error }}</div>
                                {% endfor %}
                            </div>
                        {% endif %}
                        <small class="text-muted">Optional: brief description of the product</small>
                    </div>

                    <hr class="my-4">

                    <h4 class="mb-3">
                        <i class="fas fa-link me-2 text-info"></i>Product URLs
                    </h4>
                    <p class="text-muted mb-4">Add URLs from the sites you want to track. At least one URL is required.</p>

                    <div class="row">
                        <div class="col-md-12 mb-3">
                            {{ form.jjfoodservice_url.label(class="form-label fw-bold") }}
                            <div class="input-group">
                                <span class="input-group-text jjfoodservice">
                                    <i class="fas fa-utensils"></i> JJ Food Service
                                </span>
                                {{ form.jjfoodservice_url(class="form-control", placeholder="https://www.jjfoodservice.com/...") }}
                            </div>
                            {% if form.jjfoodservice_url.errors %}
                                <div class="text-danger small mt-1">
                                    {% for error in form.jjfoodservice_url.errors %}
                                        <div>{{ error }}</div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                        </div>

                        <div class="col-md-12 mb-3">
                            {{ form.atoz_catering_url.label(class="form-label fw-bold") }}
                            <div class="input-group">
                                <span class="input-group-text atoz_catering">
                                    <i class="fas fa-store"></i> A to Z Catering
                                </span>
                                {{ form.atoz_catering_url(class="form-control", placeholder="https://www.atoz-catering.co.uk/...") }}
                            </div>
                            {% if form.atoz_catering_url.errors %}
                                <div class="text-danger small mt-1">
                                    {% for error in form.atoz_catering_url.errors %}
                                        <div>{{ error }}</div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                        </div>

                        <div class="col-md-12 mb-3">
                            {{ form.amazon_uk_url.label(class="form-label fw-bold") }}
                            <div class="input-group">
                                <span class="input-group-text amazon_uk">
                                    <i class="fab fa-amazon"></i> Amazon UK
                                </span>
                                {{ form.amazon_uk_url(class="form-control", placeholder="https://www.amazon.co.uk/...") }}
                            </div>
                            {% if form.amazon_uk_url.errors %}
                                <div class="text-danger small mt-1">
                                    {% for error in form.amazon_uk_url.errors %}
                                        <div>{{ error }}</div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                        </div>
                    </div>

                    <div class="alert alert-info">
                        <i class="fas fa-info-circle me-2"></i>
                        <strong>Tips:</strong>
                        <ul class="mb-0 mt-2">
                            <li>Make sure URLs point to the specific product page</li>
                            <li>Test URLs in your browser first to ensure they work</li>
                            <li>Some sites may block automated requests - we'll handle this gracefully</li>
                            <li>For best results, use direct product page URLs</li>
                        </ul>
                    </div>

                    <div class="d-grid gap-2 d-md-flex justify-content-md-end">
                        <a href="{{ url_for('index') }}" class="btn btn-outline-secondary me-md-2">
                            <i class="fas fa-arrow-left me-1"></i>Cancel
                        </a>
                        {{ form.submit(class="btn btn-primary btn-lg") }}
                    </div>
                </form>
            </div>
        </div>
    </div>
</div>

<div class="row mt-4">
    <div class="col-12">
        <div class="card">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-question-circle me-2"></i>How to Find Product URLs
                </h5>
            </div>
            <div class="card-body">
                <div class="row">
                    <div class="col-md-6">
                        <h6 class="fw-bold">JJ Food Service</h6>
                        <p class="small text-muted">
                            Navigate to the specific product page on JJ Food Service and copy the URL.
                            Make sure you're logged in for accurate pricing.
                        </p>

                        <h6 class="fw-bold">A to Z Catering</h6>
                        <p class="small text-muted">
                            Go to the product page on A to Z Catering and copy the URL.
                            URLs typically contain "/products/product/" followed by the product name.
                        </p>
                    </div>
                    <div class="col-md-6">
                        <h6 class="fw-bold">Amazon UK</h6>
                        <p class="small text-muted">
                            Navigate to the product page on Amazon.co.uk and copy the URL.
                            The URL should contain "/dp/" followed by the product identifier.
                        </p>

                        <h6 class="fw-bold text-muted">Note</h6>
                        <p class="small text-muted">
                            We focus on UK catering supply websites that work well with automated price tracking.
                            This provides reliable price monitoring for your business needs.
                        </p>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>
{% endblock %}
225
templates/base.html
Normal file
@@ -0,0 +1,225 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>{% block title %}Price Tracker{% endblock %}</title>
|
||||
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
|
||||
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css" rel="stylesheet">
|
||||
<style>
|
||||
body {
|
||||
background-color: #f8f9fa;
|
||||
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
||||
}
|
||||
.navbar {
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
box-shadow: 0 2px 4px rgba(0,0,0,.1);
|
||||
}
|
||||
.navbar-brand {
|
||||
font-weight: bold;
|
||||
color: white !important;
|
||||
}
|
||||
.card {
|
||||
border: none;
|
||||
border-radius: 15px;
|
||||
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
|
||||
transition: transform 0.2s;
|
||||
}
|
||||
.card:hover {
|
||||
transform: translateY(-2px);
|
||||
}
|
||||
.price-badge {
|
||||
font-size: 1.2em;
|
||||
font-weight: bold;
|
||||
}
|
||||
.price-best {
|
||||
background: linear-gradient(135deg, #4CAF50, #45a049);
|
||||
}
|
||||
.price-high {
|
||||
background: linear-gradient(135deg, #f44336, #d32f2f);
|
||||
}
|
||||
.price-medium {
|
||||
background: linear-gradient(135deg, #ff9800, #f57c00);
|
||||
}
|
||||
.btn-primary {
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
border: none;
|
||||
border-radius: 25px;
|
||||
padding: 10px 25px;
|
||||
}
|
||||
.btn-success {
|
||||
background: linear-gradient(135deg, #4CAF50, #45a049);
|
||||
border: none;
|
||||
border-radius: 25px;
|
||||
}
|
||||
.btn-danger {
|
||||
background: linear-gradient(135deg, #f44336, #d32f2f);
|
||||
border: none;
|
||||
border-radius: 25px;
|
||||
}
|
||||
.alert {
|
||||
border-radius: 15px;
|
||||
border: none;
|
||||
}
|
||||
.site-badge {
|
||||
font-size: 0.8em;
|
||||
padding: 0.3em 0.6em;
|
||||
border-radius: 15px;
|
||||
margin-right: 5px;
|
||||
}
|
||||
.jjfoodservice { background-color: #e74c3c; color: white; }
|
||||
.atoz_catering { background-color: #3498db; color: white; }
|
||||
.amazon_uk { background-color: #ff9900; color: white; }
|
||||
.ebay { background-color: #0064d2; color: white; }
|
||||
.walmart { background-color: #0071ce; color: white; }
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.card {
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
}
|
||||
</style>
|
||||
</head>
<body>
    <nav class="navbar navbar-expand-lg navbar-dark">
        <div class="container">
            <a class="navbar-brand" href="{{ url_for('index') }}">
                <i class="fas fa-chart-line me-2"></i>Price Tracker
            </a>
            <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav">
                <span class="navbar-toggler-icon"></span>
            </button>
            <div class="collapse navbar-collapse" id="navbarNav">
                <ul class="navbar-nav me-auto">
                    <li class="nav-item">
                        <a class="nav-link" href="{{ url_for('index') }}">
                            <i class="fas fa-home me-1"></i>Dashboard
                        </a>
                    </li>
                    <li class="nav-item">
                        <a class="nav-link" href="{{ url_for('add_product') }}">
                            <i class="fas fa-plus me-1"></i>Add Product
                        </a>
                    </li>
                    <li class="nav-item">
                        <a class="nav-link" href="{{ url_for('settings') }}">
                            <i class="fas fa-cog me-1"></i>Settings
                        </a>
                    </li>
                </ul>
                <div class="navbar-nav">
                    <button class="btn btn-outline-light btn-sm" onclick="scrapeAll()">
                        <i class="fas fa-sync-alt me-1"></i>Scrape All
                    </button>
                </div>
            </div>
        </div>
    </nav>

    <div class="container mt-4">
        {% with messages = get_flashed_messages(with_categories=true) %}
            {% if messages %}
                {% for category, message in messages %}
                    <div class="alert alert-{{ 'danger' if category == 'error' else category }} alert-dismissible fade show" role="alert">
                        <i class="fas fa-{{ 'exclamation-triangle' if category == 'error' else 'check-circle' if category == 'success' else 'info-circle' }} me-2"></i>
                        {{ message }}
                        <button type="button" class="btn-close" data-bs-dismiss="alert"></button>
                    </div>
                {% endfor %}
            {% endif %}
        {% endwith %}

        {% block content %}{% endblock %}
    </div>

    <footer class="bg-dark text-light py-4 mt-5">
        <div class="container text-center">
            <p>&copy; 2025 Price Tracker. Built with Beautiful Soup &amp; Flask.</p>
        </div>
    </footer>

    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
    <script>
        // Disable the button and show a spinner while one product is scraped;
        // reload on success so the new prices render.
        function scrapeProduct(productId) {
            const btn = document.querySelector(`[onclick="scrapeProduct(${productId})"]`);
            const originalText = btn.innerHTML;
            btn.innerHTML = '<i class="fas fa-spinner fa-spin me-1"></i>Scraping...';
            btn.disabled = true;

            fetch(`/scrape/${productId}`, { method: 'POST' })
                .then(response => response.json())
                .then(data => {
                    if (data.success) {
                        location.reload();
                    } else {
                        alert('Error: ' + (data.error || 'Unknown error'));
                    }
                })
                .catch(error => {
                    alert('Error: ' + error.message);
                })
                .finally(() => {
                    btn.innerHTML = originalText;
                    btn.disabled = false;
                });
        }

        // Same flow as scrapeProduct(), but for every tracked product at once.
        function scrapeAll() {
            const btn = document.querySelector('[onclick="scrapeAll()"]');
            const originalText = btn.innerHTML;
            btn.innerHTML = '<i class="fas fa-spinner fa-spin me-1"></i>Scraping...';
            btn.disabled = true;

            fetch('/scrape_all', { method: 'POST' })
                .then(response => response.json())
                .then(data => {
                    if (data.success) {
                        alert(`Success! Updated ${data.total_updated} price entries.`);
                        location.reload();
                    } else {
                        alert('Error: ' + (data.error || 'Unknown error'));
                    }
                })
                .catch(error => {
                    alert('Error: ' + error.message);
                })
                .finally(() => {
                    btn.innerHTML = originalText;
                    btn.disabled = false;
                });
        }

        // Ask the backend to send a test email/webhook and summarise the results.
        function testNotifications() {
            const btn = document.querySelector('[onclick="testNotifications()"]');
            const originalText = btn.innerHTML;
            btn.innerHTML = '<i class="fas fa-spinner fa-spin me-1"></i>Testing...';
            btn.disabled = true;

            fetch('/test_notifications', { method: 'POST' })
                .then(response => response.json())
                .then(data => {
                    let message = 'Notification Test Results:\n';
                    if (data.email.enabled) {
                        message += `Email: ${data.email.success ? 'Success' : 'Failed - ' + data.email.error}\n`;
                    }
                    if (data.webhook.enabled) {
                        message += `Webhook: ${data.webhook.success ? 'Success' : 'Failed - ' + data.webhook.error}\n`;
                    }
                    if (!data.email.enabled && !data.webhook.enabled) {
                        message += 'No notifications are enabled.';
                    }
                    alert(message);
                })
                .catch(error => {
                    alert('Error: ' + error.message);
                })
                .finally(() => {
                    btn.innerHTML = originalText;
                    btn.disabled = false;
                });
        }
    </script>
    {% block scripts %}{% endblock %}
</body>
</html>
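The JavaScript above posts to `/scrape/<id>`, `/scrape_all`, and `/test_notifications` and branches on the JSON it gets back. The Flask routes themselves are not part of this commit, so as a hedged sketch, here are the response shapes the template's `fetch()` calls appear to expect (function names and error strings here are hypothetical, inferred from the client code):

```python
def scrape_response(updated, error=None):
    """Shape consumed by scrapeProduct()/scrapeAll(): a success flag plus
    either a count of updated price entries or an error message."""
    if error is None:
        return {"success": True, "total_updated": updated}
    return {"success": False, "error": error}

def notification_test_response(email_ok, webhook_ok):
    """Shape consumed by testNotifications(): one block per channel with
    enabled/success flags and an error string on failure."""
    return {
        "email": {"enabled": True, "success": email_ok,
                  "error": None if email_ok else "SMTP authentication failed"},
        "webhook": {"enabled": True, "success": webhook_ok,
                    "error": None if webhook_ok else "HTTP 404 from webhook URL"},
    }
```

A real view would `return jsonify(scrape_response(...))`; the point here is only the contract between template and backend.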
184 templates/index.html Normal file
@@ -0,0 +1,184 @@
{% extends "base.html" %}

{% block title %}Dashboard - Price Tracker{% endblock %}

{% block content %}
<div class="d-flex justify-content-between align-items-center mb-4">
    <h1 class="display-4">
        <i class="fas fa-tachometer-alt me-3 text-primary"></i>Dashboard
    </h1>
    <a href="{{ url_for('add_product') }}" class="btn btn-primary btn-lg">
        <i class="fas fa-plus me-2"></i>Add Product
    </a>
</div>

{% if not products %}
    <div class="text-center py-5">
        <div class="card mx-auto" style="max-width: 500px;">
            <div class="card-body">
                <i class="fas fa-shopping-cart fa-4x text-muted mb-3"></i>
                <h3 class="text-muted">No Products Yet</h3>
                <p class="text-muted">Start tracking prices by adding your first product!</p>
                <a href="{{ url_for('add_product') }}" class="btn btn-primary btn-lg">
                    <i class="fas fa-plus me-2"></i>Add Your First Product
                </a>
            </div>
        </div>
    </div>
{% else %}
    <div class="row">
        {% for product in products %}
        <div class="col-lg-6 col-xl-4 mb-4">
            <div class="card h-100">
                <div class="card-body">
                    <div class="d-flex justify-content-between align-items-start mb-3">
                        <h5 class="card-title fw-bold">{{ product.name }}</h5>
                        {% if product.target_price %}
                        <span class="badge bg-info">
                            Target: £{{ "%.2f"|format(product.target_price) }}
                        </span>
                        {% endif %}
                    </div>

                    {% if product.description %}
                    <p class="card-text text-muted small">{{ product.description[:100] }}{% if product.description|length > 100 %}...{% endif %}</p>
                    {% endif %}

                    <!-- Sites being tracked -->
                    <div class="mb-3">
                        <small class="text-muted">Tracking on:</small><br>
                        {% for site_name in product.urls.keys() %}
                        <span class="site-badge {{ site_name }}">{{ site_name.title() }}</span>
                        {% endfor %}
                    </div>

                    <!-- Current prices -->
                    {% if product.latest_prices %}
                    <div class="row g-2 mb-3">
                        {% for site_name, price_data in product.latest_prices.items() %}
                        <div class="col-12">
                            <div class="d-flex justify-content-between align-items-center p-2 bg-light rounded">
                                <span class="site-badge {{ site_name }} small">{{ site_name.title() }}</span>
                                <div class="text-end">
                                    <span class="fw-bold">£{{ "%.2f"|format(price_data.price) }}</span>
                                    <br><small class="text-muted">{{ price_data.timestamp[:10] }}</small>
                                </div>
                            </div>
                        </div>
                        {% endfor %}
                    </div>

                    <!-- Best price highlight -->
                    {% if product.best_price %}
                    <div class="alert alert-success py-2 mb-3">
                        <i class="fas fa-trophy me-2"></i>
                        <strong>Best Price: £{{ "%.2f"|format(product.best_price.price) }}</strong>
                        {% if product.target_price and product.best_price.price <= product.target_price %}
                        <span class="badge bg-danger ms-2">
                            <i class="fas fa-bell me-1"></i>Target Reached!
                        </span>
                        {% endif %}
                    </div>
                    {% endif %}
                    {% else %}
                    <div class="alert alert-warning py-2 mb-3">
                        <i class="fas fa-exclamation-triangle me-2"></i>
                        No price data yet. Click "Scrape Now" to get prices.
                    </div>
                    {% endif %}

                    <!-- Action buttons -->
                    <div class="d-grid gap-2">
                        <div class="btn-group" role="group">
                            <a href="{{ url_for('product_detail', product_id=product.id) }}" class="btn btn-outline-primary">
                                <i class="fas fa-chart-line me-1"></i>Details
                            </a>
                            <button class="btn btn-success" onclick="scrapeProduct({{ product.id }})">
                                <i class="fas fa-sync-alt me-1"></i>Scrape Now
                            </button>
                        </div>
                    </div>
                </div>

                <!-- Card footer with date added -->
                <div class="card-footer bg-transparent">
                    <small class="text-muted">
                        <i class="fas fa-clock me-1"></i>
                        Added: {{ product.created_at[:10] }}
                    </small>
                </div>
            </div>
        </div>
        {% endfor %}
    </div>

    <!-- Summary stats -->
    <div class="row mt-4">
        <div class="col-md-3">
            <div class="card text-center">
                <div class="card-body">
                    <i class="fas fa-shopping-bag fa-2x text-primary mb-2"></i>
                    <h4 class="fw-bold">{{ products|length }}</h4>
                    <p class="text-muted mb-0">Products Tracked</p>
                </div>
            </div>
        </div>
        <div class="col-md-3">
            <div class="card text-center">
                <div class="card-body">
                    <i class="fas fa-store fa-2x text-success mb-2"></i>
                    <h4 class="fw-bold">
                        {# In Jinja2, {% set %} inside a for loop does not persist
                           across iterations, so accumulate via a namespace. #}
                        {% set urls_ns = namespace(total=0) %}
                        {% for product in products %}
                            {% set urls_ns.total = urls_ns.total + product.urls|length %}
                        {% endfor %}
                        {{ urls_ns.total }}
                    </h4>
                    <p class="text-muted mb-0">Total URLs</p>
                </div>
            </div>
        </div>
        <div class="col-md-3">
            <div class="card text-center">
                <div class="card-body">
                    <i class="fas fa-bell fa-2x text-warning mb-2"></i>
                    <h4 class="fw-bold">
                        {% set alerts = [] %}
                        {% for product in products %}
                            {% if product.target_price and product.best_price and product.best_price.price <= product.target_price %}
                                {% set _ = alerts.append(1) %}
                            {% endif %}
                        {% endfor %}
                        {{ alerts|length }}
                    </h4>
                    <p class="text-muted mb-0">Price Alerts</p>
                </div>
            </div>
        </div>
        <div class="col-md-3">
            <div class="card text-center">
                <div class="card-body">
                    <i class="fas fa-chart-bar fa-2x text-info mb-2"></i>
                    <h4 class="fw-bold">
                        {# Same namespace trick as above: a plain {% set %} would
                           reset on every loop iteration. #}
                        {% set savings_ns = namespace(total=0) %}
                        {% for product in products %}
                            {% if product.target_price and product.best_price %}
                                {% set savings = product.target_price - product.best_price.price %}
                                {% if savings > 0 %}
                                    {% set savings_ns.total = savings_ns.total + savings %}
                                {% endif %}
                            {% endif %}
                        {% endfor %}
                        £{{ "%.0f"|format(savings_ns.total) }}
                    </h4>
                    <p class="text-muted mb-0">Potential Savings</p>
                </div>
            </div>
        </div>
    </div>
{% endif %}
{% endblock %}
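The dashboard's summary cards fold the product list down to four numbers (products tracked, total URLs, price alerts, potential savings). As a sketch, the same logic in plain Python is easy to unit-test server-side; the dict shape below (keys `urls`, `target_price`, `best_price`) mirrors the fields the template reads and is illustrative only:

```python
def summary_stats(products):
    """Mirror of the summary-card logic in templates/index.html."""
    total_urls = sum(len(p["urls"]) for p in products)
    # An alert fires when the best price is at or below the target.
    alerts = sum(
        1 for p in products
        if p.get("target_price") and p.get("best_price")
        and p["best_price"]["price"] <= p["target_price"]
    )
    # Only positive (target - best) differences count toward savings.
    potential_savings = sum(
        max(p["target_price"] - p["best_price"]["price"], 0)
        for p in products
        if p.get("target_price") and p.get("best_price")
    )
    return {"products": len(products), "urls": total_urls,
            "alerts": alerts, "savings": potential_savings}
```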
234 templates/product_detail.html Normal file
@@ -0,0 +1,234 @@
{% extends "base.html" %}

{% block title %}{{ product.name }} - Price Tracker{% endblock %}

{% block content %}
<div class="d-flex justify-content-between align-items-center mb-4">
    <div>
        <h1 class="display-5">{{ product.name }}</h1>
        {% if product.description %}
        <p class="text-muted">{{ product.description }}</p>
        {% endif %}
    </div>
    <div>
        <button class="btn btn-success me-2" onclick="scrapeProduct({{ product.id }})">
            <i class="fas fa-sync-alt me-1"></i>Scrape Now
        </button>
        <a href="{{ url_for('index') }}" class="btn btn-outline-secondary">
            <i class="fas fa-arrow-left me-1"></i>Back to Dashboard
        </a>
    </div>
</div>

<div class="row">
    <!-- Price overview -->
    <div class="col-lg-4">
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-tags me-2"></i>Current Prices
                </h5>
            </div>
            <div class="card-body">
                {% if latest_prices %}
                {% set price_list = latest_prices.values() | list %}
                {% set min_price = price_list | min(attribute='price') %}
                {% set max_price = price_list | max(attribute='price') %}

                {% for site_name, price_data in latest_prices.items() %}
                <div class="d-flex justify-content-between align-items-center mb-3 p-3 rounded
                    {% if price_data.price == min_price.price %}bg-success bg-opacity-10{% endif %}">
                    <div>
                        <span class="site-badge {{ site_name }}">{{ site_name.title() }}</span>
                        {% if not price_data.availability %}
                        <span class="badge bg-warning ms-2">Out of Stock</span>
                        {% endif %}
                        {% if price_data.price == min_price.price %}
                        <span class="badge bg-success ms-2">
                            <i class="fas fa-trophy"></i> Best Price
                        </span>
                        {% endif %}
                    </div>
                    <div class="text-end">
                        <div class="h5 mb-0">£{{ "%.2f"|format(price_data.price) }}</div>
                        <small class="text-muted">{{ price_data.timestamp[:10] }}</small>
                    </div>
                </div>
                {% endfor %}

                {% if product.target_price %}
                <hr>
                <div class="d-flex justify-content-between align-items-center">
                    <span class="fw-bold">Target Price:</span>
                    <span class="h5 mb-0 text-info">£{{ "%.2f"|format(product.target_price) }}</span>
                </div>

                {% if min_price.price <= product.target_price %}
                <div class="alert alert-success mt-3 py-2">
                    <i class="fas fa-bell me-2"></i>
                    <strong>Target Reached!</strong> The best price is at or below your target.
                </div>
                {% else %}
                <div class="alert alert-info mt-3 py-2">
                    <i class="fas fa-info-circle me-2"></i>
                    You could save <strong>£{{ "%.2f"|format(min_price.price - product.target_price) }}</strong>
                    if the price drops to your target.
                </div>
                {% endif %}
                {% endif %}

                {% else %}
                <p class="text-muted text-center py-4">
                    <i class="fas fa-exclamation-triangle fa-2x mb-3"></i><br>
                    No price data available yet.<br>
                    <button class="btn btn-primary mt-2" onclick="scrapeProduct({{ product.id }})">
                        Get Prices Now
                    </button>
                </p>
                {% endif %}
            </div>
        </div>

        <!-- Product URLs -->
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-link me-2"></i>Tracked URLs
                </h5>
            </div>
            <div class="card-body">
                {% for site_name, url in product.urls.items() %}
                <div class="mb-2">
                    <span class="site-badge {{ site_name }}">{{ site_name.title() }}</span>
                    <a href="{{ url }}" target="_blank" rel="noopener" class="btn btn-sm btn-outline-primary ms-2">
                        <i class="fas fa-external-link-alt"></i> View
                    </a>
                </div>
                {% endfor %}
            </div>
        </div>
    </div>

    <!-- Price chart -->
    <div class="col-lg-8">
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-chart-line me-2"></i>Price History (Last 30 Days)
                </h5>
            </div>
            <div class="card-body">
                {% if price_history %}
                <div id="priceChart" style="height: 400px;"></div>
                {% else %}
                <p class="text-muted text-center py-5">
                    <i class="fas fa-chart-line fa-3x mb-3"></i><br>
                    No price history available yet. Price data will appear here after scraping.
                </p>
                {% endif %}
            </div>
        </div>

        <!-- Price statistics -->
        {% if price_stats %}
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-calculator me-2"></i>Price Statistics (Last 30 Days)
                </h5>
            </div>
            <div class="card-body">
                <div class="row">
                    {% for site_name, stats in price_stats.items() %}
                    <div class="col-md-6 mb-3">
                        <div class="card border-0 bg-light">
                            <div class="card-body">
                                <h6 class="card-title">
                                    <span class="site-badge {{ site_name }}">{{ site_name.title() }}</span>
                                </h6>
                                <div class="row">
                                    <div class="col-6">
                                        <small class="text-muted">Min Price</small>
                                        <div class="fw-bold text-success">£{{ "%.2f"|format(stats.min_price) }}</div>
                                    </div>
                                    <div class="col-6">
                                        <small class="text-muted">Max Price</small>
                                        <div class="fw-bold text-danger">£{{ "%.2f"|format(stats.max_price) }}</div>
                                    </div>
                                    <div class="col-6 mt-2">
                                        <small class="text-muted">Avg Price</small>
                                        <div class="fw-bold">£{{ "%.2f"|format(stats.avg_price) }}</div>
                                    </div>
                                    <div class="col-6 mt-2">
                                        <small class="text-muted">Data Points</small>
                                        <div class="fw-bold">{{ stats.data_points }}</div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                    {% endfor %}
                </div>
            </div>
        </div>
        {% endif %}

        <!-- Recent price history -->
        {% if price_history %}
        <div class="card">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-history me-2"></i>Recent Price Updates
                </h5>
            </div>
            <div class="card-body">
                <div class="table-responsive">
                    <table class="table table-hover">
                        <thead>
                            <tr>
                                <th>Site</th>
                                <th>Price</th>
                                <th>Available</th>
                                <th>Date</th>
                            </tr>
                        </thead>
                        <tbody>
                            {% for entry in price_history[:20] %}
                            <tr>
                                <td>
                                    <span class="site-badge {{ entry.site_name }}">{{ entry.site_name.title() }}</span>
                                </td>
                                <td class="fw-bold">£{{ "%.2f"|format(entry.price) }}</td>
                                <td>
                                    {% if entry.availability %}
                                    <span class="badge bg-success">Available</span>
                                    {% else %}
                                    <span class="badge bg-warning">Out of Stock</span>
                                    {% endif %}
                                </td>
                                <td class="text-muted">{{ entry.timestamp[:16] }}</td>
                            </tr>
                            {% endfor %}
                        </tbody>
                    </table>
                </div>
                {% if price_history|length > 20 %}
                <p class="text-muted text-center mt-3">
                    Showing the 20 most recent entries of {{ price_history|length }} total.
                </p>
                {% endif %}
            </div>
        </div>
        {% endif %}
    </div>
</div>
{% endblock %}

{% block scripts %}
{% if chart_json %}
<script>
    var chartData = {{ chart_json|safe }};
    Plotly.newPlot('priceChart', chartData.data, chartData.layout, {responsive: true});
</script>
{% endif %}
{% endblock %}
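The scripts block above hands `chart_json` straight to `Plotly.newPlot()`, which only needs a `{"data": [...], "layout": {...}}` object. As a hedged sketch of how the view behind this template might build that string (the actual view is not in this commit), plain `json.dumps` over one trace per site is enough; the history-row shape (`site_name`, `price`, `timestamp`) follows the fields the template already renders:

```python
import json

def build_chart_json(price_history):
    """Group history rows into one line trace per site and serialise a
    Plotly figure dict; no server-side plotly dependency is needed."""
    traces = {}
    for entry in sorted(price_history, key=lambda e: e["timestamp"]):
        trace = traces.setdefault(entry["site_name"], {
            "x": [], "y": [], "mode": "lines+markers",
            "name": entry["site_name"].title(),
        })
        trace["x"].append(entry["timestamp"])
        trace["y"].append(entry["price"])
    figure = {
        "data": list(traces.values()),
        "layout": {"yaxis": {"title": "Price (£)", "tickprefix": "£"},
                   "margin": {"t": 20}},
    }
    return json.dumps(figure)
```

The view would pass the result to the template as `chart_json`, where `|safe` stops Jinja from HTML-escaping the JSON.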
217 templates/settings.html Normal file
@@ -0,0 +1,217 @@
{% extends "base.html" %}

{% block title %}Settings - Price Tracker{% endblock %}

{% block content %}
<div class="row">
    <div class="col-lg-8">
        <h1 class="display-5 mb-4">
            <i class="fas fa-cog me-3 text-primary"></i>Settings
        </h1>

        <!-- Scraping settings -->
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-spider me-2"></i>Scraping Configuration
                </h5>
            </div>
            <div class="card-body">
                <div class="row">
                    <div class="col-md-6">
                        <h6>Request Settings</h6>
                        <ul class="list-unstyled">
                            <li><strong>Delay between requests:</strong> {{ config.delay_between_requests }}s</li>
                            <li><strong>Max concurrent requests:</strong> {{ config.max_concurrent_requests }}</li>
                            <li><strong>Request timeout:</strong> {{ config.timeout }}s</li>
                            <li><strong>Retry attempts:</strong> {{ config.retry_attempts }}</li>
                        </ul>
                    </div>
                    <div class="col-md-6">
                        <h6>User Agents</h6>
                        <p class="text-muted small">{{ config.user_agents|length }} user agents configured</p>
                        <details>
                            <summary class="text-primary" style="cursor: pointer;">View user agents</summary>
                            <div class="mt-2">
                                {% for ua in config.user_agents %}
                                <div class="small text-muted mb-1">{{ ua[:80] }}...</div>
                                {% endfor %}
                            </div>
                        </details>
                    </div>
                </div>
            </div>
        </div>

        <!-- Site configuration -->
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-store me-2"></i>Supported Sites
                </h5>
            </div>
            <div class="card-body">
                <div class="row">
                    {% for site_name, site_config in config.sites_config.items() %}
                    <div class="col-md-4 mb-3">
                        <div class="card border-0 bg-light">
                            <div class="card-body">
                                <h6 class="card-title">
                                    <span class="site-badge {{ site_name }}">{{ site_name.title() }}</span>
                                    {% if site_config.enabled %}
                                    <span class="badge bg-success ms-2">Enabled</span>
                                    {% else %}
                                    <span class="badge bg-secondary ms-2">Disabled</span>
                                    {% endif %}
                                </h6>
                                <p class="card-text small text-muted">
                                    <strong>Base URL:</strong> {{ site_config.base_url }}<br>
                                    <strong>Price selectors:</strong> {{ site_config.selectors.price|length }}<br>
                                    <strong>Title selectors:</strong> {{ site_config.selectors.title|length }}
                                </p>
                            </div>
                        </div>
                    </div>
                    {% endfor %}
                </div>
            </div>
        </div>

        <!-- Notification settings -->
        <div class="card mb-4">
            <div class="card-header d-flex justify-content-between align-items-center">
                <h5 class="mb-0">
                    <i class="fas fa-bell me-2"></i>Notification Settings
                </h5>
                <button class="btn btn-sm btn-outline-primary" onclick="testNotifications()">
                    <i class="fas fa-vial me-1"></i>Test Notifications
                </button>
            </div>
            <div class="card-body">
                <div class="row">
                    <div class="col-md-6">
                        <h6>
                            <i class="fas fa-envelope me-2"></i>Email Notifications
                            {% if config.notification_config.email.enabled %}
                            <span class="badge bg-success">Enabled</span>
                            {% else %}
                            <span class="badge bg-secondary">Disabled</span>
                            {% endif %}
                        </h6>
                        {% if config.notification_config.email.enabled %}
                        <ul class="list-unstyled small">
                            <li><strong>SMTP Server:</strong> {{ config.notification_config.email.smtp_server }}</li>
                            <li><strong>Port:</strong> {{ config.notification_config.email.smtp_port }}</li>
                            <li><strong>Sender:</strong> {{ config.notification_config.email.sender_email }}</li>
                            <li><strong>Recipient:</strong> {{ config.notification_config.email.recipient_email }}</li>
                        </ul>
                        {% else %}
                        <p class="text-muted small">Email notifications are disabled. Configure them in config.json to enable.</p>
                        {% endif %}
                    </div>
                    <div class="col-md-6">
                        <h6>
                            <i class="fas fa-plug me-2"></i>Webhook Notifications
                            {% if config.notification_config.webhook.enabled %}
                            <span class="badge bg-success">Enabled</span>
                            {% else %}
                            <span class="badge bg-secondary">Disabled</span>
                            {% endif %}
                        </h6>
                        {% if config.notification_config.webhook.enabled %}
                        <p class="small">
                            <strong>Webhook URL:</strong><br>
                            <code>{{ config.notification_config.webhook.url }}</code>
                        </p>
                        {% else %}
                        <p class="text-muted small">Webhook notifications are disabled. Configure them in config.json to enable.</p>
                        {% endif %}
                    </div>
                </div>
            </div>
        </div>

        <!-- Database information -->
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-database me-2"></i>Database Information
                </h5>
            </div>
            <div class="card-body">
                <p><strong>Database Path:</strong> <code>{{ config.database_path }}</code></p>
                <p class="text-muted small">
                    The SQLite database stores all product information and price history.
                </p>
            </div>
        </div>
    </div>

    <!-- Quick actions -->
    <div class="col-lg-4">
        <div class="card mb-4">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-tools me-2"></i>Quick Actions
                </h5>
            </div>
            <div class="card-body">
                <div class="d-grid gap-2">
                    <button class="btn btn-primary" onclick="scrapeAll()">
                        <i class="fas fa-sync-alt me-2"></i>Scrape All Products
                    </button>
                    <button class="btn btn-info" onclick="testNotifications()">
                        <i class="fas fa-bell me-2"></i>Test Notifications
                    </button>
                    <button class="btn btn-secondary" onclick="checkSystemHealth()">
                        <i class="fas fa-heartbeat me-2"></i>System Health Check
                    </button>
                </div>
            </div>
        </div>

        <!-- Configuration help -->
        <div class="card">
            <div class="card-header">
                <h5 class="mb-0">
                    <i class="fas fa-question-circle me-2"></i>Configuration Help
                </h5>
            </div>
            <div class="card-body">
                <h6>Configuration File</h6>
                <p class="small text-muted">
                    Settings are stored in <code>config.json</code>.
                    Edit this file to customize scraping behavior, add new sites, or configure notifications.
                </p>

                <h6>Adding New Sites</h6>
                <p class="small text-muted">
                    To add support for a new e-commerce site, add a section to the "sites"
                    configuration with CSS selectors for price, title, and availability.
                </p>

                <h6>Email Setup</h6>
                <p class="small text-muted">
                    For Gmail, use <code>smtp.gmail.com:587</code> with an app password
                    (Google no longer supports "Less secure app access").
                </p>

                <h6>Webhooks</h6>
                <p class="small text-muted">
                    Webhook notifications send JSON payloads to the URL you specify.
                    Useful for integrating with Slack, Discord, or custom applications.
                </p>
            </div>
        </div>
    </div>
</div>
{% endblock %}

{% block scripts %}
<script>
    // Placeholder: a real implementation would call a health endpoint
    // on the backend and report the result.
    function checkSystemHealth() {
        alert('System health check functionality would be implemented here.');
    }
</script>
{% endblock %}
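The settings page reads fields such as `delay_between_requests`, `sites_config`, and `notification_config` off a config object loaded from `config.json`. The file itself is not in this commit, so the sketch below is an assumption: key names are inferred from the template's accessors, and every value (selectors, URLs, addresses) is a placeholder, not the project's real configuration.

```json
{
  "delay_between_requests": 2,
  "max_concurrent_requests": 3,
  "timeout": 15,
  "retry_attempts": 3,
  "user_agents": ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."],
  "database_path": "data/price_tracker.db",
  "sites": {
    "jjfoodservice": {
      "enabled": true,
      "base_url": "https://www.jjfoodservice.com",
      "selectors": {
        "price": [".product-price", "span.price"],
        "title": ["h1.product-title"],
        "availability": [".stock-status"]
      }
    }
  },
  "notifications": {
    "email": {
      "enabled": false,
      "smtp_server": "smtp.gmail.com",
      "smtp_port": 587,
      "sender_email": "you@example.com",
      "recipient_email": "you@example.com"
    },
    "webhook": {
      "enabled": false,
      "url": "https://hooks.example.com/price-alerts"
    }
  }
}
```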