Difference between revisions of "DigI:TI1.1"
From its-wiki.no
Revisions by Josef.Noll (→1.1.- Whitelist; →1.2 Blacklist of already known Content Delivery Network (CDN) addresses)
Revision as of 11:13, 21 September 2018
Digital Inclusion (DigI)
T1.1 Low-cost infrastructure
Task Title | Low-cost infrastructure for InfoInternet |
---|---|
WP | DigI:WP-I1 |
Lead partner | Basic Internet Foundation |
Leader | |
Contributors | BasicInternet |
Contents
- 1 T1.1 Low-cost infrastructure
- 2 Deliverables in T1.1 Low-cost infrastructure
- 3 Equipment supplier
- 4 Content filtering
- 4.1 Methods
- 4.2 PROS & CONS
- 4.2.1 1.1.- Whitelist
- 4.2.2 1.2 Blacklist of already known Content Delivery Network (CDN) addresses
- 4.2.3 1.3.- RouterOS L7 filter
- 4.2.4 2.1.- Web crawler
- 4.2.5 3.1. Commercial proxy/firewall to filter by Content-Type
- 4.2.6 3.2.- Open-Source proxy/firewall to filter by Content-Type
- 4.2.7 4.1.- Traffic pattern based connection filtering
- 5 New ideas
- 6 Cost calculation
Objective
This task will establish the architecture for low cost access, including:
- cost calculation for TZ and Congo
Deliverables in T1.1 Low-cost infrastructure
Equipment supplier
see DigI:TI1.2 for pilot installations
Content filtering
by Iñaki Garitano, Wed Sep 12
Basic understanding of the InfoInternet standard:
- Text & pictures: allowed
- Streaming, games, high-bandwidth content: not allowed (to be filtered)
The way to filter is known from the security industry, among others from Palo Alto Networks. However, their solution focuses on security, not on low-cost provision of information.
Required:
- Roadmap to reach the InfoInternet standard
- Today: whitelist, blacklist, content metadata
- Tomorrow: automatic analysis (either real-time or offline)
- Final InfoInternet standard: Public Database supporting local filtering
Methods
- Decentralized = each Mikrotik has to do something.
- Centralized = all traffic (at least the unauthenticated one) goes through the Basic Internet core.
Methods, ordered by centralized/decentralized filtering plus difficulty/time to implement:
1.- Decentralized filtering
- 1.1.- Whitelist
- 1.2.- Blacklist of already known Content Delivery Network (CDN) addresses (Akamai, Cloudflare, CloudFront, Wowza, IBM Cloud Video, Livestream, DaCast, etc.)
- 1.3.- RouterOS L7 filter
2.- Semi centralized/decentralized - some actions have to be done in the core while others in the Mikrotiks
- 2.1.- Web crawler to analyze requested web pages and populate the blacklists
3.- Centralized filtering
- 3.1.- Commercial proxy/firewall to filter by Content-Type
- 3.2.- Open-Source proxy/firewall to filter by Content-Type
4.- Needs more research, because it could possibly be done decentralized
- 4.1.- Traffic pattern based connection filtering
PROS & CONS
1.- Decentralized filtering
- Cons:
- New configuration updates have to be pushed to the Mikrotik devices.
- Some performance overhead may occur. It would be interesting to measure it.
- Comment: the overhead is very small; we can easily handle 60 GByte/day on an RB960 (RDB952) - only when traffic is tagged and measured
2.- Semi centralized/decentralized - some actions have to be done in the core while others in the Mikrotiks
- Cons:
- Core infrastructure needs to be prepared. (Comment: the core infrastructure is in place. The whitelist is centrally located (owncloud) and distributed to the LNCC.)
- New configuration updates have to be pushed to the Mikrotik devices.
- Some performance overhead may occur. It would be interesting to measure it.
3.- Centralized filtering
- Cons:
- All traffic needs to go through a centralized device.
- Comment: applicable to some traffic only, not to all traffic, as the backbone traffic is the main cost (and because of other topics such as virus filtering, international traffic, ...)
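The 60 GByte/day figure quoted in the comment on decentralized filtering can be put into perspective by converting it into an average bit rate. This is a minimal sketch, assuming decimal units (1 GByte = 10^9 bytes) and traffic spread evenly over the day; the source states neither.

```python
# Convert a daily traffic volume into the average sustained bit rate.
# Assumption: 1 GByte = 10**9 bytes (decimal units); the source does not say.

SECONDS_PER_DAY = 24 * 60 * 60


def average_mbit_per_s(gbyte_per_day: float) -> float:
    """Return the mean throughput in Mbit/s for a given daily volume."""
    bits_per_day = gbyte_per_day * 10**9 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    print(f"{average_mbit_per_s(60):.2f} Mbit/s")
```

Under these assumptions, 60 GByte/day corresponds to an average of roughly 5.6 Mbit/s. Peak rates on a real link will be higher than this average.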
1.1.- Whitelist
Whitelisting is the closed-world approach, which is easier to manage.
Pros:
- The easiest one to implement.
- Removes most of the traffic.
Cons:
- The most restrictive one.
- Requires analyzing the content of each web page.
- Dynamically generated web pages such as Facebook have to be blocked because it is not possible to analyze their content beforehand.
- Comment: not completely right, as Facebook uses video servers, which can be blocked.
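The closed-world idea can be sketched as a host name check: anything not on the list is rejected. This is only an illustration. The entries in `WHITELIST` are hypothetical, and matching subdomains of a listed domain is an assumption, not something the source specifies.

```python
# Hypothetical whitelist entries; the real list is maintained centrally.
WHITELIST = frozenset({"wikipedia.org", "example.org"})


def is_whitelisted(hostname: str, whitelist=WHITELIST) -> bool:
    """Allow a host if it equals a whitelisted domain or is a subdomain of one."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the whitelist.
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


if __name__ == "__main__":
    for host in ("en.wikipedia.org", "video.example.com"):
        print(host, "allowed" if is_whitelisted(host) else "blocked")
```

Matching on whole labels means that `notwikipedia.org` is not mistaken for `wikipedia.org`.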
1.2 Blacklist of already known Content Delivery Network (CDN) addresses
Blacklisting is the open-world approach. A potential starting point is to take the top 500 web pages (national, international, ...) and analyse them in depth: which web pages do they call, on all levels below? This should give us an overview of about 90% (?) of the traffic. Strategy: measure upcoming new web sites and their traffic, and analyse a site once its traffic exceeds xxx MB.
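The measuring strategy could look roughly like the sketch below. The threshold is left as a parameter, since the source leaves the value open; the record format, a list of `(site, megabytes)` pairs, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def sites_to_analyse(records, known_sites, threshold_mb):
    """Return unknown sites whose total traffic exceeds threshold_mb, largest first.

    records: iterable of (site, megabytes) measurements.
    known_sites: sites that were already analysed (e.g. the top 500).
    """
    totals = defaultdict(float)
    for site, megabytes in records:
        if site not in known_sites:
            totals[site] += megabytes
    flagged = [(site, mb) for site, mb in totals.items() if mb > threshold_mb]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    measurements = [("a.example", 40), ("b.example", 10), ("a.example", 70)]
    # The threshold of 100 MB is a placeholder, not a value from the source.
    print(sites_to_analyse(measurements, known_sites=set(), threshold_mb=100))
```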
Pros:
- The second easiest one to implement.
- Reduces the traffic of well-known CDNs.
Cons:
- Video/audio content delivered through unknown CDNs or other addresses is not filtered.
- Requires analyzing and updating the CDN addresses all the time.
- Hard to catch new CDNs.
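An address blacklist comes down to testing whether a destination address falls inside a listed prefix. The sketch below uses Python's standard `ipaddress` module. The prefixes are placeholders taken from the documentation ranges, not real CDN address blocks, and the names `CDN_BLACKLIST` and `is_blacklisted` are made up for this example.

```python
import ipaddress

# Placeholder prefixes from the documentation ranges (RFC 5737), NOT real CDN
# address blocks; real prefixes would have to be collected and kept up to date.
CDN_BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blacklisted(address: str, networks=CDN_BLACKLIST) -> bool:
    """Return True if the destination address lies inside a blacklisted prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


if __name__ == "__main__":
    for addr in ("192.0.2.17", "203.0.113.5"):
        print(addr, "blocked" if is_blacklisted(addr) else "allowed")
```

The check itself is cheap. The maintenance effort named in the cons lies in keeping the prefix list current.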
1.3.- RouterOS L7 filter
Pros:
- Many interesting filters are already created.
- https://wiki.mikrotik.com/wiki/Manual:IP/Firewall/L7
- http://l7-filter.sourceforge.net/protocols
- Easy to implement.
- Could be implemented together with 1.1 and 1.2 solutions.
Cons:
- Only unencrypted HTTP can be matched. NOT HTTPS.
- Not 100% reliable.
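The HTTP-only limitation can be illustrated with a small Python sketch of what a layer-7 pattern match does: it searches the first bytes of a connection for a regular expression. The 2 KB window and the pattern are assumptions for illustration. This is not RouterOS configuration.

```python
import re

# Assumed inspection window: only the start of the connection is examined.
INSPECT_BYTES = 2048

# Illustrative pattern: an HTTP response header announcing video or audio.
MEDIA_PATTERN = re.compile(rb"content-type:\s*(video|audio)/", re.IGNORECASE)


def matches_media(stream_start: bytes) -> bool:
    """Return True if the first bytes of a connection look like media over HTTP."""
    return MEDIA_PATTERN.search(stream_start[:INSPECT_BYTES]) is not None


if __name__ == "__main__":
    plain = b"HTTP/1.1 200 OK\r\nContent-Type: video/mp4\r\n\r\n"
    print(matches_media(plain))
```

With HTTPS the headers are encrypted, so the same pattern finds nothing in the byte stream. That is the reason for the first con above.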
2.1.- Web crawler
to analyze requested web pages and populate the blacklists
Pros:
- Could be combined with 1.1, 1.2 and 1.3.
Cons:
- Dynamically generated web pages, such as login-based pages, cannot be partially filtered.
- Not 100% reliable.
- Requires many resources to analyze web pages.
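One building block of such a crawler is extracting the hosts that a page pulls content from, so that they can be considered for the blacklists. The sketch below uses only the Python standard library and works on HTML that has already been fetched. The class and function names are illustrative, and a full crawler such as Apache Nutch does far more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceHostParser(HTMLParser):
    """Collect the host names a page references via src/href attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative links against the page URL.
                host = urlparse(urljoin(self.base_url, value)).hostname
                if host:
                    self.hosts.add(host)


def referenced_hosts(html: str, base_url: str) -> set:
    """Return all hosts referenced by the given HTML document."""
    parser = ResourceHostParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.hosts


if __name__ == "__main__":
    page = '<img src="/logo.png"><video src="https://cdn.example.net/v.mp4"></video>'
    print(sorted(referenced_hosts(page, "https://news.example.org/index.html")))
```

Content that a page loads through scripts after rendering is not visible to this kind of static analysis, which is one reason the approach is not 100% reliable.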
3.1. Commercial proxy/firewall to filter by Content-Type
Pros:
- Easy to implement.
- Able to filter even HTTPS connections.
Cons:
- All traffic needs to be centralized.
- Price.
- Requires a man-in-the-middle setup for HTTPS connections.
- Even a paid solution will most probably not block 100% of the undesired traffic.
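Filtering by Content-Type, whether in a commercial or an open-source proxy, amounts to a decision on the MIME type of each response. The sketch below follows the basic InfoInternet rule that text and pictures are allowed. Blocking every other main type is an assumption made here, and a real policy would need to be finer-grained (for example for scripts and fonts).

```python
ALLOWED_MAIN_TYPES = {"text", "image"}  # "text & pictures: allowed"


def is_allowed(content_type_header: str) -> bool:
    """Decide from a Content-Type header whether a response may pass."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    main_type = media_type.split("/", 1)[0]
    return main_type in ALLOWED_MAIN_TYPES


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8", "image/png", "video/mp4"):
        print(header, "->", "allow" if is_allowed(header) else "block")
```

A proxy can only read this header on HTTPS connections if it terminates the TLS session, hence the man-in-the-middle requirement.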
3.2.- Open-Source proxy/firewall to filter by Content-Type
Examples:
- L7-filter - http://l7-filter.clearos.com/
- nDPI - https://www.ntop.org/products/deep-packet-inspection/ndpi/
- OpenDPI - https://github.com/thomasbhatia/OpenDPI
Pros:
- Cheap.
Cons:
- All traffic needs to be centralized.
- Requires a man-in-the-middle setup for HTTPS connections.
4.1.- Traffic pattern based connection filtering
Research topic
- Pros:
- Works for both HTTP and HTTPS.
- Cons:
- Traffic patterns need to be generated for different content types, bandwidths, etc.
- Final implementation on the Mikrotiks needs to be analyzed.
- If that is not possible, all traffic would need to be centralized.
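As a starting point for the research, a pattern-based filter could look only at per-connection counters, which are available for encrypted traffic as well. The sketch below flags long-lived connections with a high sustained download rate. Both thresholds are placeholders, not measured values, and the heuristic is an assumption, not an evaluated classifier.

```python
def looks_like_streaming(bytes_down: int,
                         duration_s: float,
                         min_duration_s: float = 30.0,
                         min_rate_kbit_s: float = 500.0) -> bool:
    """Flag long-lived flows with a high sustained download rate.

    bytes_down: bytes received from the server on this connection.
    duration_s: how long the connection has been active.
    Both thresholds are placeholders, not measured values.
    """
    if duration_s < min_duration_s:
        return False
    rate_kbit_s = bytes_down * 8 / 1000 / duration_s
    return rate_kbit_s >= min_rate_kbit_s


if __name__ == "__main__":
    print(looks_like_streaming(bytes_down=15_000_000, duration_s=60))
    print(looks_like_streaming(bytes_down=200_000, duration_s=60))
```

Large downloads of allowed content would be flagged too, so real traffic patterns for each content type would still have to be generated, as noted in the cons.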
=================================
IMPLEMENTATION PLAN:
1.1.- Whitelist
- Done.
1.2.- Blacklist of already known Content Delivery Network (CDN) addresses (Akamai, Cloudflare, CloudFront, Wowza, IBM Cloud Video, Livestream, DaCast, etc.)
- Done.
1.3.- RouterOS L7 filter
- I would need a student to try different filters and check how they perform.
- Filter updating scripts would need to be written.
- The performance impact on the Mikrotiks would have to be measured.
2.1.- Web crawler to analyze requested web pages and populate the blacklists
- Different crawlers such as Apache Nutch have to be analyzed.
- Scripts to collect DNS requests for later analysis have to be developed.
3.1.- Commercial proxy/firewall to filter by Content-Type
- Topology needs to be changed to centralize all traffic, or at least the unauthenticated traffic.
- Device needs to be configured.
3.2.- Open-Source proxy/firewall to filter by Content-Type
- Different proxy/firewall solutions have to be analyzed to select those performing well.
- Topology needs to be changed to centralize all traffic, or at least the unauthenticated traffic.
- Device needs to be configured.
4.1.- Traffic pattern based connection filtering
- This will require a bachelor or master thesis to analyze traffic patterns and create a lightweight content-based filter.
- Analyze whether it is possible to implement the content filter on the Mikrotiks.
Josef, I would like to discuss all these ideas further with you. In the meantime, at Mondragon we will continue with the multi-language voucher platform development and with virtually duplicating the infrastructure.
Best regards,
--
Iñaki Garitano
Data Analysis and Cybersecurity
Electronics and Computing Department
Mondragon University - Faculty of Engineering
Goiru, 2; 20500 Arrasate - Mondragón (Gipuzkoa), Spain
Tel.: +(34) 647503682 / +(34) 943794700 + Ext. 8119
www.mondragon.edu
www.garitano.info / www.garitano.eu
New ideas
QR code for voucher access, alternative: SMS
- generated from http://blog.qr4.nl/QR-Code-WiFi.aspx
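Wi-Fi QR codes of this kind usually encode a short text string in the widely used `WIFI:` format. The sketch below builds only that string. It assumes the voucher code is used as the Wi-Fi password, which the source does not state, and the SSID is a made-up example. Rendering the string as an image would need a separate QR library.

```python
def wifi_qr_payload(ssid: str, password: str, auth: str = "WPA") -> str:
    """Build the text that a Wi-Fi QR code encodes."""
    def escape(value: str) -> str:
        # Backslash-escape the characters that have a meaning in the format.
        for char in '\\;,:"':
            value = value.replace(char, "\\" + char)
        return value

    return f"WIFI:T:{auth};S:{escape(ssid)};P:{escape(password)};;"


if __name__ == "__main__":
    # Hypothetical SSID and voucher code, for illustration only.
    print(wifi_qr_payload("BasicInternet", "voucher-1234"))
```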
Cost calculation
Cost calculations, using TZ as an example (owncloud, confidential): https://owncloud.unik.no/index.php/apps/files/ajax/download.php?dir=%2F1-Projects%2FBasicInternet%2FTechnology%2FCost-Infrastructure&files=Infra_cost_Template_Tz.xlsx