price-tracker/SCRAPER_ARCHITECTURE.md
Oli Passey 5726183115 scrape fix
2025-06-27 17:25:56 +01:00

# Price Tracker - Scraper Architecture
## Current Structure
### 1. **`scraper.py` - Base Scraper Class**
- **Purpose**: Foundation class for all price scraping
- **Handles**: Generic e-commerce sites (Amazon.com, eBay, Walmart, etc.)
- **Key Features**:
  - Base `PriceScraper` class with HTTP session management
  - Anti-bot measures (headers, delays, retries)
  - Generic price extraction methods
  - Site detection logic
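
The delay-and-retry behaviour listed above could look roughly like the following. This is a minimal sketch, not the code in `scraper.py`: the function name `fetch_with_retries`, the `DEFAULT_HEADERS` values, and the backoff schedule are all assumptions made for illustration. The actual HTTP call is passed in as a callable so the retry logic can be exercised without network access.

```python
import random
import time
from typing import Callable, Dict, Optional

# Hypothetical header set; the headers the real scraper sends are not shown here.
DEFAULT_HEADERS: Dict[str, str] = {
    "User-Agent": "Mozilla/5.0 (compatible; price-tracker-sketch)",
    "Accept-Language": "en-GB,en;q=0.9",
}


def fetch_with_retries(
    fetch: Callable[[str, Dict[str, str]], str],
    url: str,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url, headers) up to `retries` times, pausing between attempts.

    Returns the response body, or None if every attempt raised OSError
    (which covers connection errors, timeouts, and urllib's URLError).
    """
    for attempt in range(retries):
        try:
            return fetch(url, DEFAULT_HEADERS)
        except OSError:
            if attempt < retries - 1:
                # Assumed policy: exponential backoff plus a little random jitter.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return None
```

Injecting `fetch` and `sleep` keeps the sketch independent of any particular HTTP library; a real session object's `get` method could be wrapped to fit the same signature.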
### 2. **`uk_scraper.py` - UK Catering Specialist**
- **Purpose**: Specialized scraper for UK catering supply websites
- **Handles**: JJ Food Service, A to Z Catering, Amazon UK
- **Key Features**:
  - Inherits from `PriceScraper` base class
  - UK currency handling (£ symbol)
  - Delivery vs Collection price prioritization
  - Special pricing detection (offers, strikethrough, was/now pricing)
  - Site-specific CSS selectors (e.g., `.my-price.price-offer` for A to Z)
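
As an illustration of the currency handling and delivery/collection prioritization, here is a small sketch. The helper names `parse_gbp` and `pick_price` are hypothetical, and the preference order (delivery first, then collection) is an assumption; this document does not state which price the real `UKCateringScraper` prefers.

```python
import re
from typing import Dict, Optional, Sequence

# Matches a pound sign followed by digits, optional thousands commas, optional pence.
_GBP_PATTERN = re.compile(r"£\s*(\d[\d,]*(?:\.\d{1,2})?)")


def parse_gbp(text: str) -> Optional[float]:
    """Return the first £ amount found in `text`, or None if there is none."""
    match = _GBP_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def pick_price(
    prices: Dict[str, Optional[float]],
    preference: Sequence[str] = ("delivery", "collection"),  # assumed order
) -> Optional[float]:
    """Return the first available price following the preference order."""
    for kind in preference:
        value = prices.get(kind)
        if value is not None:
            return value
    return None
```

Passing `preference=("collection", "delivery")` would flip the priority without touching the parsing code.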
### 3. **`scraper_manager.py` - Orchestration Layer**
- **Purpose**: Routes scraping tasks to appropriate scrapers
- **Logic**:
  - Detects UK catering sites → uses `UKCateringScraper`
  - Detects other sites → uses base `PriceScraper`
  - Manages concurrent requests and error handling
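
The routing rule can be sketched as follows. The two classes here are empty stand-ins for the real ones in `scraper.py` and `uk_scraper.py`, and `scraper_class_for` and `UK_CATERING_SITES` are hypothetical names; only the site keys come from the mapping in this document.

```python
class PriceScraper:
    """Stand-in for the base class in scraper.py (real implementation not shown)."""


class UKCateringScraper(PriceScraper):
    """Stand-in for the UK specialist in uk_scraper.py."""


# Site keys that this document assigns to the UK scraper.
UK_CATERING_SITES = frozenset({"jjfoodservice", "atoz_catering", "amazon_uk"})


def scraper_class_for(site_key: str) -> type:
    """UK catering keys go to the specialist; everything else uses the base class."""
    if site_key in UK_CATERING_SITES:
        return UKCateringScraper
    return PriceScraper
```

Keeping the decision in a single function means that adding a new UK site only requires a new entry in the set.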
## Site Mapping
### UK Catering Sites (UKCateringScraper):
- `jjfoodservice` → JJ Food Service
- `atoz_catering` → A to Z Catering
- `amazon_uk` → Amazon UK
### International Sites (PriceScraper):
- `amazon` → Amazon.com
- `ebay` → eBay
- `walmart` → Walmart
- *(Future sites can be added here)*
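
One way to derive these site keys from a product URL is a hostname lookup. This is a sketch under assumptions: the domain names in `DOMAIN_TO_SITE` and the function name `detect_site` are illustrative guesses, not taken from the project's code.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed domains for each site key; adjust to match the real configuration.
DOMAIN_TO_SITE = {
    "jjfoodservice.com": "jjfoodservice",
    "atoz-catering.co.uk": "atoz_catering",
    "amazon.co.uk": "amazon_uk",
    "amazon.com": "amazon",
    "ebay.com": "ebay",
    "walmart.com": "walmart",
}


def detect_site(url: str) -> Optional[str]:
    """Map a URL to a site key by exact or subdomain match on the hostname."""
    host = (urlparse(url).hostname or "").lower()
    for domain, site_key in DOMAIN_TO_SITE.items():
        if host == domain or host.endswith("." + domain):
            return site_key
    return None
```

Matching on the full domain suffix, rather than on a substring such as `amazon`, keeps `amazon.co.uk` and `amazon.com` from being confused.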
## Key Benefits of Current Structure
- **Separation of Concerns**: UK-specific logic is isolated
- **Extensibility**: Easy to add new UK sites or international sites
- **Maintainability**: Changes to UK logic don't affect international scraping
- **Specialization**: UK scraper handles currency, delivery pricing, special offers
## Recommendations
### ✅ **KEEP CURRENT STRUCTURE** - It's well-designed!
The separation between `scraper.py` and `uk_scraper.py` is **good architecture** because:
1. **UK catering sites have unique requirements** (delivery vs collection, £ pricing, special offers)
2. **International sites have different patterns** (USD pricing, different site structures)
3. **Each scraper is easy to maintain and extend** independently of the other
### Minor Improvements Made:
1. **Enhanced site detection** in base scraper
2. **Added helper methods** to determine scraper routing
3. **Improved scraper manager** logic for clarity
4. **Fixed A to Z pricing** with `.my-price.price-offer` selector
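
The selector `.my-price.price-offer` matches only elements that carry both classes. The sketch below shows that matching rule using the standard library's `html.parser`. The class `ClassTextExtractor` and the function `extract_offer_prices` are hypothetical, and the project itself may well use a dedicated HTML library with CSS selector support instead.

```python
from html.parser import HTMLParser
from typing import Iterable, List

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list includes all required classes."""

    def __init__(self, required_classes: Iterable[str]) -> None:
        super().__init__()
        self.required = set(required_classes)
        self.matches: List[str] = []
        self._depth = 0  # greater than 0 while inside a matching element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_offer_prices(html: str) -> List[str]:
    """Return the text of elements matching `.my-price.price-offer`."""
    parser = ClassTextExtractor(["my-price", "price-offer"])
    parser.feed(html)
    parser.close()
    return parser.matches
```

An element with only `my-price` is skipped, which is the point of the combined selector: the regular price is ignored when an offer price is present.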
## Final File Structure
```
src/
├── scraper.py          # Base scraper (international sites)
├── uk_scraper.py       # UK catering specialist
├── scraper_manager.py  # Orchestration layer
├── config.py           # Configuration management
├── database.py         # Data persistence
└── web_ui.py           # Flask web interface
```
This structure supports both current UK catering needs and future expansion to international e-commerce sites.