DigI:TI1.1

From its-wiki.no


T1.1 Low-cost infrastructure

Task Title Low-cost infrastructure for InfoInternet
WP DigI:WP-I1
Lead partner Basic Internet Foundation
Leader
Contributors BasicInternet

Objective

This task will establish the architecture for low cost access, including:

  • cost calculation for TZ and Congo


Deliverables in T1.1 Low-cost infrastructure



Equipment supplier

See DigI:TI1.2 for pilot installations.


Content filtering

by Iñaki Garitano (12Sep2018)
Basic understanding of the InfoInternet standard:

  • Text & pictures: allowed
  • Streaming, games, high-bandwidth content: blocked

The filtering techniques are known from the security industry, e.g. Palo Alto Networks. However, their solutions focus on security, not on low-cost provision of information.

Required:

  • Roadmap to reach the InfoInternet standard
  • Today: whitelist, blacklist, content metadata
  • Tomorrow: automatic analysis (either real-time or off-line)
  • Final InfoInternet standard: Public Database supporting local filtering

Methods

  • Decentralized = each Mikrotik has to do something.
  • Centralized = all traffic (at least the unauthenticated one) goes through the Basic Internet core.

Methods, ordered by centralized/decentralized filtering plus difficulty/time to implement:

1.- Decentralized filtering

  • 1.1.- Whitelist
  • 1.2.- Blacklist of already known Content Delivery Network (CDN) addresses (Akamai, Cloudflare, CloudFront, Wowza, IBM Cloud Video, Livestream, DaCast, etc.)
  • 1.3.- RouterOS L7 filter

2.- Semi-centralized/decentralized - some actions are done in the core, others on the Mikrotiks

  • 2.1.- Web crawler to analyze requested web pages and populate the blacklists

3.- Centralized filtering

  • 3.1.- Commercial proxy/firewall to filter by Content-Type
  • 3.2.- Open-Source proxy/firewall to filter by Content-Type

4.- Needs more research, since it could possibly be done decentralized

  • 4.1.- Traffic pattern based connection filtering

PROS & CONS

1.- Decentralized filtering

  • Cons:
  • Mikrotik devices have to be populated by new configuration updates.
  • Some performance overhead may occur; it would be interesting to measure it.
  • In practice the overhead is very small: an RB960 easily handles ~70 Mbit/s with a 25-rule firewall (around 60 GByte/day; the RB952 handles up to 900 Mbit/s), but only when traffic is merely tagged and measured.

2.- Semi-centralized/decentralized - some actions are done in the core, others on the Mikrotiks

  • Cons:
  • Core infrastructure needs to be prepared. (Status: the core infrastructure is in place; the whitelist is centrally located on ownCloud and populated to the LNCC.)
  • Mikrotik devices have to be populated by new configuration updates.
  • Some performance overhead may occur; it would be interesting to measure it.

3.- Centralized filtering

  • Cons:
  • All traffic needs to go through a centralized device.
  • Only applicable to some traffic, not all: centralizing everything is unsuitable, as backbone traffic is the main cost (besides other topics such as virus filtering and international traffic).

1.1.- Whitelist

Whitelisting is the closed-world approach, which is easier to manage.

Pros:

  • The easiest one to implement.
  • Reduces most of the traffic.

Cons:

  • The most restrictive one.
  • Requires analyzing the content of each web page.
  • Dynamically generated web pages such as Facebook have to be blocked, because it is not possible to analyze their content beforehand.
  • (Not completely right: Facebook uses dedicated video servers, which can be blocked on their own.)
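As a sketch, the whitelist check can be illustrated in Python (the domain names are hypothetical; on the routers this is realized with Mikrotik address lists, not Python):

```python
# Sketch of decentralized whitelist filtering. The WHITELIST entries
# are illustrative placeholders, not the real InfoInternet whitelist.
WHITELIST = {"wikipedia.org", "who.int", "basicinternet.org"}

def is_allowed(host: str) -> bool:
    """Allow a host if it or any parent domain is whitelisted."""
    parts = host.lower().rstrip(".").split(".")
    # Check the host itself and every parent domain,
    # e.g. en.wikipedia.org -> wikipedia.org
    for i in range(len(parts) - 1):
        if ".".join(parts[i:]) in WHITELIST:
            return True
    return False

print(is_allowed("en.wikipedia.org"))       # True
print(is_allowed("video.example-cdn.com"))  # False
```

Matching parent domains keeps the list short: one entry covers all subdomains of a whitelisted site.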

1.2 Blacklist of already known Content Delivery Network (CDN) addresses

Blacklisting is the open-world approach. A potential starting point is to take the top 500 web pages (national, international, ...) and analyse them in depth: which web pages do they call ("all levels below")? This should give an overview of roughly 90%(?) of the traffic. Strategy: measure newly appearing web sites and their traffic, and if the traffic exceeds xxx MB, analyse them. This might require tagging the "known" web pages and then measuring the traffic of the "not-known" ones.
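The measurement strategy above could be sketched as follows (the domain names and the 100 MB threshold are illustrative assumptions, not project decisions):

```python
# Tag traffic to "known" (already analysed) domains, accumulate bytes
# per unknown domain, and flag any unknown domain whose traffic
# exceeds a threshold for in-depth analysis.
from collections import defaultdict

KNOWN = {"wikipedia.org", "akamai.net"}          # assumed examples
THRESHOLD_BYTES = 100 * 1024 * 1024              # assumed: 100 MB

usage = defaultdict(int)

def record(domain: str, nbytes: int) -> None:
    """Count traffic only for domains not yet analysed."""
    if domain not in KNOWN:
        usage[domain] += nbytes

def domains_to_analyse():
    """Unknown domains whose traffic crossed the threshold."""
    return sorted(d for d, b in usage.items() if b >= THRESHOLD_BYTES)

record("wikipedia.org", 500_000_000)        # known: ignored
record("new-video-cdn.example", 150_000_000)
record("smallsite.example", 2_000_000)
print(domains_to_analyse())  # ['new-video-cdn.example']
```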

Pros:

  • The second easiest one to implement.
  • Reduces traffic from well-known CDNs.

Cons:

  • Video/audio content delivered through unknown CDNs or other addresses is not filtered.
  • Requires continuously analyzing and updating the CDN addresses.
  • Hard to catch new CDNs.

Conclusion: worth a trial in India (or in the Kinderdorf).

  • Both black- and whitelisting will increase the number of addresses, which will at some point hit the capacity limit of the Mikrotik equipment.
  • 20,000 addresses is okay for the RB952.

1.3.- RouterOS L7 filter

Pros:

Cons:

  • Only unencrypted HTTP can be matched, not HTTPS; for example, filtering YouTube does not work.
  • Not 100% reliable.

Conclusions:

  • As more and more traffic moves to HTTPS, L7 filtering is not advisable.

2.1.- Web crawler

A crawler analyzes the requested web pages and populates the blacklists.

Pros:

  • Could be combined with 1.1, 1.2 and 1.3.

Cons:

  • Dynamically generated web pages, such as login-based pages, cannot be partially filtered.
  • Not 100% reliable.
  • Requires many resources to analyze web pages.

Conclusion:

  • establish a cloud infrastructure for the filtering.
  • for the "login/https" pages, use the blacklist approach
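As an illustration of the crawler step, the sketch below parses a (stubbed) page and collects the external hosts it references, which could then feed the blacklist; actual fetching, e.g. with Apache Nutch, is left out, and the page content and host names are made up:

```python
# Extract the external hosts a page pulls resources from
# (scripts, media, iframes) using only the standard library.
from html.parser import HTMLParser
from urllib.parse import urlparse

class HostCollector(HTMLParser):
    # Tag -> attribute holding the resource URL.
    TAGS = {"script": "src", "img": "src", "video": "src",
            "iframe": "src", "source": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        attr = self.TAGS.get(tag)
        if attr:
            for name, value in attrs:
                if name == attr and value:
                    host = urlparse(value).netloc
                    if host:
                        self.hosts.add(host)

# Stub page standing in for a fetched document.
page = """<html><body>
<img src="https://images.example.org/logo.png">
<script src="https://cdn.example-video.net/player.js"></script>
</body></html>"""

collector = HostCollector()
collector.feed(page)
print(sorted(collector.hosts))
# ['cdn.example-video.net', 'images.example.org']
```

Hosts that serve video or other high-bandwidth content would be candidates for the blacklist (1.2).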

3.1. Commercial proxy/firewall to filter by Content-Type

A typical example of such an approach is the security filtering by Palo Alto Networks.

Pros:

  • Easy to implement (you buy a device that performs the filtering).
  • Able to filter even HTTPS connections.

Cons:

  • All traffic needs to be centralized.
  • Price.
  • Requires a man-in-the-middle setup for HTTPS connections.
  • Even a paid solution will most probably not block 100% of the undesired traffic.

3.2.- Open-Source proxy/firewall to filter by Content-Type

Examples:

Pros:

  • Cheap.

Cons:

  • All traffic needs to be centralized.
  • Requires a man-in-the-middle setup for HTTPS connections.

Conclusions:

  • Needs further work from our side.
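The Content-Type decision such a proxy would make can be sketched as follows, applying the InfoInternet rule from above (text and pictures allowed, streaming/high-bandwidth blocked); the exact type lists and the default policy are assumptions, not a finished ruleset:

```python
# Decide allow/block from the HTTP Content-Type header alone.
ALLOWED_PREFIXES = ("text/", "image/")                        # assumed
BLOCKED_PREFIXES = ("video/", "audio/", "application/octet-stream")

def filter_decision(content_type: str) -> str:
    # Strip parameters such as "; charset=utf-8" and normalize case.
    ct = content_type.split(";")[0].strip().lower()
    if ct.startswith(ALLOWED_PREFIXES):
        return "allow"
    if ct.startswith(BLOCKED_PREFIXES):
        return "block"
    return "allow"  # default policy: open question (allow/block/inspect)

print(filter_decision("text/html; charset=utf-8"))  # allow
print(filter_decision("video/mp4"))                 # block
```

For HTTPS the header is only visible after man-in-the-middle interception, which is the main con listed above.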

4.1.- Traffic pattern based connection filtering

Research topic

Pros:

  • Works for both HTTP and HTTPS.

Cons:

  • Traffic patterns need to be generated for different content-type, bandwidth, etc.
  • Final implementation on Mikrotiks needs to be analyzed.
  • If not possible, all traffic would need to be centralized
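A minimal sketch of the idea, assuming streaming shows up as a sustained high byte rate (the thresholds are illustrative guesses; real patterns would come from the measurement work this research topic calls for):

```python
# Classify a flow as "streaming-like" from its traffic pattern alone;
# no payload is inspected, so this works for HTTP and HTTPS alike.
def looks_like_streaming(byte_counts_per_sec, min_rate=250_000, min_secs=10):
    """Flag a flow sustaining >= min_rate bytes/s for >= min_secs seconds."""
    sustained = 0
    for b in byte_counts_per_sec:
        sustained = sustained + 1 if b >= min_rate else 0
        if sustained >= min_secs:
            return True
    return False

# Bursty browsing vs. a steady video stream (per-second byte counts).
web_browsing = [400_000, 0, 0, 12_000, 0, 0, 300_000, 0, 0, 0, 0, 0]
video_stream = [500_000] * 15

print(looks_like_streaming(web_browsing))  # False
print(looks_like_streaming(video_stream))  # True
```

Whether such per-flow counters can run on the Mikrotiks themselves is exactly the open question noted above.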

IMPLEMENTATION PLAN

1.1.- Whitelist

  • Done.

1.2.- Blacklist of already known Content Delivery Network (CDN)

addresses (Akamai, Cloudflare, CloudFront, Wowza, IBM Cloud Video, Livestream, DaCast, etc.)

  • Done in principle.
  • Not yet automated for newly appearing CDN servers.

1.3.- RouterOS L7 filter

  • Needs student work / industrial development to evaluate different filters and check how they perform.
  • Filter updating scripts would need to be generated.
  • Mikrotik performance impact would have to be measured.

2.1.- Web crawler to analyze requested web pages and populate the blacklists.

Topics to do/outsource:

  • Different crawlers such as Apache Nutch have to be analyzed.
  • Scripts to get DNS requests for later analysis have to be developed.
  • need support team for long-term sustainability

3.1.- Commercial proxy/firewall to filter by Content-Type

  • Topology needs to be changed to centralize all traffic or at least the unauthenticated one.
  • Device needs to be configured.
  • Con: probably not scalable to the low-cost market we address.

3.2.- Open-Source proxy/firewall to filter by Content-Type

  • Different proxy/firewall solutions have to be analyzed to select those performing well.
  • Topology needs to be changed to centralize all traffic or at least the unauthenticated one.
  • Device needs to be configured.
  • Should be combined with traffic-pattern analysis, e.g. to detect video buffering and then reduce the speed to those sites. This could go together with a Mikrotik rule limiting traffic to e.g. 500 kbit/s.
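The 500 kbit/s cap could be enforced with a token bucket, which is also how router queues typically implement rate limits; the Python below only illustrates the logic, and the rate and burst values are illustrative:

```python
# Token-bucket rate limiter: each packet spends tokens, tokens refill
# at the configured rate, and packets are dropped when tokens run out.
class TokenBucket:
    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate = rate_bps        # refill rate, bits per second
        self.capacity = burst_bits  # maximum burst size, bits
        self.tokens = burst_bits
        self.last = 0.0

    def allow(self, nbits: float, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbits:
            self.tokens -= nbits
            return True
        return False

bucket = TokenBucket(rate_bps=500_000, burst_bits=500_000)
print(bucket.allow(400_000, now=0.0))  # True  (within the burst)
print(bucket.allow(400_000, now=0.1))  # False (only 150_000 tokens left)
print(bucket.allow(400_000, now=1.0))  # True  (refilled)
```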

4.1.- Traffic pattern based connection filtering

  • This will require a bachelor or master thesis to analyze traffic patterns and create a lightweight content based filter.
  • Analyze if it is possible to implement the content filter on the Mikrotiks.

Conclusion

Now:

  • continue with the whitelist filtering,
  • develop the web analysis (2.1: which web pages are called),
  • analyse the web pages for content (3.2 filtering plus 4.1 traffic patterns, possibly with Mikrotik throughput reduction)


Medium term:

  • test the blacklist approach, starting with the top 500++ web pages (2.1) and the pages they call,
  • add the L7 filtering approach in Mikrotik (needs to be implemented and tested; it did not work out at Kjeller)

Longer term:

  • combined analysis of 1.2 blacklist and 1.3, plus 3.2 and 4.1

New ideas

QR code scanning for wifi access code

QR code for voucher access, alternative: SMS

Cost calculation

Calculation of costs, using TZ as an example (ownCloud, confidential): https://owncloud.unik.no/index.php/apps/files/ajax/download.php?dir=%2F1-Projects%2FBasicInternet%2FTechnology%2FCost-Infrastructure&files=Infra_cost_Template_Tz.xlsx