• A big Package of “Architecture” Principles/Manifesto

    Defining a set of guiding principles is an important first step of any strategy.

    - Cloud Strategy, by Gregor Hohpe

    1. Agile manifesto

    We are uncovering better ways of developing
    software by doing it and helping others do it.
    Through this work we have come to value:

    1. Individuals and interactions over processes and tools
    2. Working software over comprehensive documentation
    3. Customer collaboration over contract negotiation
    4. Responding to change over following a plan

    That is, while there is value in the items on
    the right, we value the items on the left more.

    https://agilemanifesto.org/iso/en/manifesto.html

    2. 21 principles of enterprise architecture

    Four categories of principles

    • General principles
    • Information principles
    • Application principles
    • Technology principles

    General principles

    1. IT and business alignment
    2. Maximum benefits at the lowest cost and risk
    3. Business continuity
    4. Compliance with standards and policies
    5. Adoption of the best practices for the market

    Information principles

    1. Information treated as an asset
    2. Shared information
    3. Accessible information
    4. Common terminology and data definitions
    5. Information security

    Application principles

    1. Technological independence
    2. Easy-to-use applications
    3. Component reusability and simplicity
    4. Adaptability and flexibility
    5. Convergence with the enterprise architecture
    6. Enterprise architecture also applies to external applications
    7. Low-coupling interfaces
    8. Adherence to functional domains

    Technology principles

    1. Changes based on requirements
    2. Control of technical diversity and suppliers
    3. Interoperability

    https://developer.ibm.com/articles/enterprise-architecture-financial-sector/

    3. 6 pillars of the AWS Well-Architected Framework

    1. Operational Excellence
    2. Security
    3. Reliability
    4. Performance Efficiency
    5. Cost Optimization
    6. Sustainability

    https://aws.amazon.com/cn/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/

    4. The Twelve Factors

    1. Codebase
      One codebase tracked in revision control, many deploys
    2. Dependencies
      Explicitly declare and isolate dependencies
    3. Config
      Store config in the environment
    4. Backing services
      Treat backing services as attached resources
    5. Build, release, run
      Strictly separate build and run stages
    6. Processes
      Execute the app as one or more stateless processes
    7. Port binding
      Export services via port binding
    8. Concurrency
      Scale out via the process model
    9. Disposability
      Maximize robustness with fast startup and graceful shutdown
    10. Dev/prod parity
      Keep development, staging, and production as similar as possible
    11. Logs
      Treat logs as event streams
    12. Admin processes
      Run admin/management tasks as one-off processes

    https://12factor.net/
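
    As an illustration of factor 3 (Config), here is a minimal sketch of reading settings from the environment instead of hard-coding them. The variable names (APP_DB_HOST, APP_DB_PORT, APP_DEBUG), the Settings class and the defaults are assumptions made up for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; the field names are illustrative."""
    db_host: str
    db_port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Every value comes from the environment, with a fallback default,
    # so the same build can run unchanged in dev, staging and production.
    return Settings(
        db_host=env.get("APP_DB_HOST", "localhost"),
        db_port=int(env.get("APP_DB_PORT", "3306")),
        debug=env.get("APP_DEBUG", "false").lower() == "true",
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_settings({"APP_DB_HOST": "db.internal", "APP_DB_PORT": "3307"})
print(settings)
```

    Accepting the environment as a parameter also makes the loader easy to test without touching the real process environment.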

    5. Manifesto for software craftsmanship

    As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:

    1. Not only working software, but also well-crafted software.
    2. Not only responding to change, but also steadily adding value
    3. Not only individuals and interactions, but also a community of professionals
    4. Not only customer collaboration, but also productive partnerships

    That is, in pursuit of the items on the left we have found the items on the right to be indispensable.

    https://manifesto.softwarecraftsmanship.org/#/en

  • Business is constantly changing. How do you design a scalable Python application in the data science and data engineering world?

    There is no single mature enterprise-level framework in the Python data world comparable to Java’s enterprise frameworks such as Spring Boot or MicroProfile, so I use the dependency injection and configuration patterns to keep Python code clean.

    Use Case

    A financial project has an NLP processing pipeline. As more business cases are added, configurations start to overwrite each other or go missing, and dependencies between Python modules become cyclic.

    We build an example NLP service that follows the dependency injection principle. It consists of several services containing the NLP domain logic. The services depend on database and storage gateways from different providers. Meanwhile, the configuration can be inherited through Python @dataclass classes, supported by the Hydra (hydra.cc) framework.

    Refactoring

    A recap of high cohesion and loose coupling:

    1. Coupling is the degree of interdependence between software modules. Tightly coupled modules can be loosened with the dependency injection pattern.

    2. Complicated configuration can be extended, validated and inherited with the Hydra and pydantic frameworks.

    Configuration

    Assembly Process

    There are 3 main steps to handle configuration:

    1. Load the configuration from YAML into a Python dataclass with the Hydra framework, which was developed by a team at Facebook.
    2. Use the pydantic @validator decorator to check the values of your configuration.
    3. Set the configuration on the container, which then injects it into the different services. The DI library used here is Dependency Injector (python-dependency-injector).

    # pydantic v1-style validator; pydantic v2 names the equivalent decorator field_validator
    from pydantic import validator
    from pydantic.dataclasses import dataclass

    @dataclass
    class MySQLConfig:
        driver: str
        user: str
        port: int
        password: str

        # Without pre=True the validator runs after pydantic has coerced the value to int,
        # so the comparison below never sees a string.
        @validator('port')
        def check_port(cls, port):
            if port < 1024:
                # pydantic reports a ValueError raised here as a validation error
                raise ValueError(f"Port:{port} < 1024 is forbidden")
            return port
    

    From personal experience, I prefer dataclasses to the pydantic BaseModel. pydantic provides pydantic.dataclasses, which behaves like a standard dataclass and supports the @validator decorator.

    Further reading: this demo uses only a simple config. To split a complicated configuration across separate YAML files, see https://hydra.cc/docs/tutorials/structured_config/hierarchical_static_config/ for details. For example:

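    from dataclasses import dataclass, field
    import hydra
    from hydra.core.config_store import ConfigStore

    @dataclass
    class MySQLConfig:
        host: str = "localhost"
        port: int = 3306
    
    @dataclass
    class UserInterface:
        title: str = "My app"
        width: int = 1024
        height: int = 768
    
    @dataclass
    class MyConfig:
        # default_factory builds a fresh nested config per instance; newer Python
        # versions (3.11+) reject a dataclass instance used directly as a default.
        db: MySQLConfig = field(default_factory=MySQLConfig)
        ui: UserInterface = field(default_factory=UserInterface)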
    
    cs = ConfigStore.instance()
    cs.store(name="config", node=MyConfig)
    
    @hydra.main(config_path=None, config_name="config")
    def my_app(cfg: MyConfig) -> None:
        print(f"Title={cfg.ui.title}, size={cfg.ui.width}x{cfg.ui.height} pixels")
    
    if __name__ == "__main__":
        my_app()
    

    Hydra has an impressive feature set. For an overview, see the YouTube video “Configuration Management For Data Science Made Easy With Hydra”.

    Application Structure

    ./
    ├── src/
    │   ├── __init__.py
    │   ├── containers.py
    │   ├── gateway.py
    │   └── services.py
    ├── config.yaml
    ├── __main__.py
    └── requirements.txt
    
    https://github.com/wuqunfei/python-di-config

    Gateways

    from abc import ABC, abstractmethod
    from loguru import logger
    
    class DatabaseGateway(ABC):
        def __init__(self):
            ...
        @abstractmethod
        def save(self):
            ...
    class MysqlGateway(DatabaseGateway):
        def __init__(self):
            ...
        def save(self):
            logger.info("Saved in Mysql")
    
    class PostgresqlGateway(DatabaseGateway):
        def __init__(self):
            ...
        def save(self):
            logger.info("Saved in Postgresql")
    
    class ObjectStorageGateway(ABC):
        def __init__(self):
            ...
        @abstractmethod
        def download(self):
            ...
    
    class S3GateWay(ObjectStorageGateway):
        def download(self):
            logger.info("download from AWS S3 blob Storage")
    
    class AzureStoreGateWay(ObjectStorageGateway):
        def download(self):
            logger.info("download from Azure Object Storage")
    
    

    Services

    from abc import ABC, abstractmethod
    from loguru import logger
    
    from src.gateway import DatabaseGateway, ObjectStorageGateway
    
    class AbstractNLPService(ABC):
        def __init__(self, config: dict):
            self.config = config
    
        @abstractmethod
        def ocr_preprocess(self):
            ...
        @abstractmethod
        def tokenizer(self):
            ...
        @abstractmethod
        def chunker(self):
            ...
        @abstractmethod
        def post_process(self):
            ...
        def run_nlp(self):
            self.ocr_preprocess()
            self.tokenizer()
            self.chunker()
            self.post_process()
    
    class BankNLPService(AbstractNLPService):
        def __init__(self,
                     config: dict,
                     db_gateway: DatabaseGateway,
                     storage_gateway: ObjectStorageGateway):
            super().__init__(config)
            self.db_gateway = db_gateway
            self.storage_gateway = storage_gateway
    
        def ocr_preprocess(self):
            self.storage_gateway.download()
            logger.info(f"{self.__class__.__name__} OCR preprocess done")
    
        def tokenizer(self):
            logger.info(f"{self.__class__.__name__} Tokenizer done")
    
        def chunker(self):
            logger.info(f"{self.__class__.__name__} Chunker done")
    
        def post_process(self):
            logger.info(f"{self.__class__.__name__} post process done")
            logger.info(self.config)
            self.db_gateway.save()
    
    
    class InsuranceNLPService(AbstractNLPService):
        def __init__(self, config: dict):
            super().__init__(config)
    
        def ocr_preprocess(self):
            logger.info(f"{self.__class__.__name__} OCR preprocess done")
    
        def tokenizer(self):
            logger.info(f"{self.__class__.__name__} Tokenizer done")
    
        def chunker(self):
            logger.info(f"{self.__class__.__name__} Chunker done")
    
        def post_process(self):
            logger.info(f"{self.__class__.__name__} post process done")
    
        @abstractmethod
        def get_risk(self):
            ...
    
    
    class LifeNLPService(InsuranceNLPService):
        def get_risk(self):
            logger.info(f"{self.__class__.__name__} risk score 1.0 done")
    
    class CarNLPService(InsuranceNLPService):
        def get_risk(self):
            logger.info(f"{self.__class__.__name__} risk score 2.0 done")
    

    Container

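    from dependency_injector import containers, providers

    from src.gateway import DatabaseGateway, MysqlGateway, ObjectStorageGateway, S3GateWay
    from src.services import AbstractNLPService, BankNLPService, CarNLPService, LifeNLPService

    class MyContainer(containers.DeclarativeContainer):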
        config = providers.Configuration()
        '''Gateways as singleton'''
        mysql_gateway: DatabaseGateway = providers.Singleton(
            MysqlGateway
        )
        s3_gateway: ObjectStorageGateway = providers.Singleton(
            S3GateWay
        )
        '''Services factory '''
        nlp_service_factory: AbstractNLPService = providers.Factory(
            BankNLPService,
            config=config,
            db_gateway=mysql_gateway,
            storage_gateway=s3_gateway
    
        )
        life_nlp_factory: AbstractNLPService = providers.Factory(
            LifeNLPService,
            config=config
        )
        car_nlp_factory: AbstractNLPService = providers.Factory(
            CarNLPService,
            config=config
        )
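
    To make clear what the container automates, here is a minimal sketch of the same idea wired by hand, using only the standard library. InMemoryGateway and ReportService are hypothetical names invented for this example, and DatabaseGateway is reduced to a single method; this is not code from the repository.

```python
from abc import ABC, abstractmethod


class DatabaseGateway(ABC):
    """Same role as the DatabaseGateway above, reduced to one method."""

    @abstractmethod
    def save(self, record: dict) -> None:
        ...


class InMemoryGateway(DatabaseGateway):
    """Hypothetical test double: keeps records in a list instead of a database."""

    def __init__(self) -> None:
        self.records = []

    def save(self, record: dict) -> None:
        self.records.append(record)


class ReportService:
    """Hypothetical service that receives its gateway through the constructor."""

    def __init__(self, config: dict, db_gateway: DatabaseGateway) -> None:
        self.config = config
        self.db_gateway = db_gateway

    def run(self, text: str) -> None:
        tokens = text.split()
        self.db_gateway.save({"tokens": len(tokens), "user": self.config["user"]})


# Manual wiring: creating the dependencies and passing them in is the
# work a DI container takes over.
gateway = InMemoryGateway()
service = ReportService(config={"user": "root"}, db_gateway=gateway)
service.run("dependency injection keeps services testable")
print(gateway.records)  # [{'tokens': 5, 'user': 'root'}]
```

    Because the service only knows the abstract gateway, a test can pass in the in-memory double while production wiring passes a real database gateway.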
    

    Main Function

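    Let’s put it all together and run it.

    import hydra

    # Assumes the container lives in src/containers.py, as in the project tree above.
    from src.containers import MyContainer

    # MySQLConfig is the pydantic dataclass from the Configuration section.
    @hydra.main(config_path="", config_name="config")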
    def my_app(cfg: MySQLConfig) -> None:
        """1. to get config yaml by hydra"""
        cfg_dict = dict(cfg)
        """2. to validate the configuration by pydantic"""
        MySQLConfig(**cfg_dict)
        container = MyContainer()
        """3. to load configuration into container"""
        container.config.from_dict(cfg_dict)
        nlp = container.nlp_service_factory()
        nlp.run_nlp()
    
    if __name__ == "__main__":
        my_app()
    

    Final Results

    2022-02-10 20:11:59.873 | INFO     | src.gateway:download:43 - download from AWS S3 blob Storage
    2022-02-10 20:11:59.873 | INFO     | src.services:ocr_preprocess:48 - BankNLPService OCR preprocess done
    2022-02-10 20:11:59.873 | INFO     | src.services:tokenizer:51 - BankNLPService Tokenizer done
    2022-02-10 20:11:59.873 | INFO     | src.services:chunker:54 - BankNLPService Chunker done
    2022-02-10 20:11:59.873 | INFO     | src.services:post_process:57 - BankNLPService post process done
    2022-02-10 20:11:59.873 | INFO     | src.services:post_process:58 - {'driver': 'mydriver', 'user': 'root', 'port': 3306, 'password': 'foobar'}
    2022-02-10 20:11:59.873 | INFO     | src.gateway:save:20 - Saved in Mysql
    

    This is a simple practice for keeping Python code clean with 3 libraries:

    1. https://python-dependency-injector.ets-labs.org/
    2. https://hydra.cc/docs
    3. https://pydantic-docs.helpmanual.io/usage/dataclasses/

    The source code is available below. I hope it inspires us to write clean Python code.

    git clone git@github.com:wuqunfei/python-di-config.git
    
  • Personal Evolution & Meaningful relationships are rewards

    The things we strive for are just the bait… the struggle to get them with people that we care about gives us the personal evolution and the meaningful relationships that are the real rewards. (Principles: Life and Work, by Ray Dalio)

    In the end, I no longer wanted to get to the other side of the jungle to reach the rewards. I instead wanted to stay in the jungle, struggling to be successful with people I cared about.

  • The Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the “value” of a decision problem at a certain point in time in terms of the payoff from some initial choices and the “value” of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman’s “principle of optimality” prescribes.

    https://en.wikipedia.org/wiki/Bellman_equation
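
    For reference, one common way to write the equation for a deterministic problem is sketched below. The symbols are assumptions introduced here, not taken from the text above: x is the current state, a an action from the feasible set Γ(x), F(x, a) the immediate payoff, T(x, a) the next state, and β a discount factor between 0 and 1.

```latex
V(x) = \max_{a \in \Gamma(x)} \Big\{ F(x, a) + \beta \, V\big(T(x, a)\big) \Big\}
```

    The left-hand side is the value of the whole problem; the right-hand side is the payoff of the first choice plus the discounted value of the remaining problem.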

    In plain English:

    1. Identify the subproblems (sub-stages) in time or space.
    2. Clearly express the recurrence relation between the subproblems.
    3. Identify the state variables.
    4. Write the state transition equation (the Bellman equation).
    5. Define the optimal value function (or loss function) with its boundary conditions.
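    In the Chinese formulation (translated):

    1. Identify the multi-stage character of the problem.
    2. Decompose the problem into several sub-stages linked by recurrence relations.
    3. Choose the correct state variables.
    4. Define the state transition equation correctly.
    5. Find the recurrence relation of the optimal objective function and its boundary conditions.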

    In programming terms:

    1. Define the state(s).
    2. Define the recurrence relation(s).
    3. List all the state(s) transitions with their respective conditions.
    4. Define the base case(s).
    5. Implement a naive recursive solution.
    6. Optimize the recursive solution to caching (memoization).
    7. Remove the overhead of recursion with a bottom-up approach (tabulation).

    But the real difficulty lies in applying this to real problems.
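
    As a small illustration of the programming steps, here is a sketch on the classic minimum-coin-change problem. The denominations are an arbitrary assumption. The state is the remaining amount, the base case is amount 0, and the recurrence is: best(amount) = 1 + min over coins c of best(amount - c).

```python
from functools import lru_cache

COINS = (1, 3, 4)      # hypothetical coin denominations, chosen for illustration
INF = float("inf")     # marks an amount that cannot be paid


def min_coins_naive(amount: int) -> float:
    """Step 5: naive recursion that follows the recurrence directly."""
    if amount == 0:                      # base case
        return 0
    # Transition: pay one coin c (only if c <= amount), then solve the rest.
    return min((1 + min_coins_naive(amount - c) for c in COINS if c <= amount),
               default=INF)


@lru_cache(maxsize=None)
def min_coins_memo(amount: int) -> float:
    """Step 6: same recurrence, but each state is computed once and cached."""
    if amount == 0:
        return 0
    return min((1 + min_coins_memo(amount - c) for c in COINS if c <= amount),
               default=INF)


def min_coins_table(amount: int) -> float:
    """Step 7: bottom-up tabulation from the base case upward, no recursion."""
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in COINS:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount]


print(min_coins_naive(6), min_coins_memo(6), min_coins_table(6))  # 2 2 2
```

    With these denominations a greedy choice (4 + 1 + 1) needs three coins, while the recurrence finds 3 + 3, which is why the subproblem structure matters.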

  • To build an enterprise Kafka stream application in Python

    With high-frequency trading requirements, real-time processing and low latency become critical needs of a trading system.

    I was a software architect involved in upgrading a legacy system into a real-time gambling trading system for a billion-sized business. As our use of streams matured, new machine learning ideas became much easier to bring down to earth. In this series I want to share that five-year journey, painful but worthwhile, from zero to master on Kafka streams with Python. Besides high performance, the development tips combine stream system design, DevOps operations and quality assurance to build an enterprise Python application.

    There are eight sections in this series. It covers the concepts, patterns and architectures with five hands-on examples:

    • Get Started Stream Processing
    • Kafka Python Library Architecture
    • Stream Process Design Pattern
    • Stream Processing Operation
    • Building an Enterprise Application
    • Stream Application Test
    • Monitoring Stream Application
    • Stream System Capacity Design

    Furthermore, to close the gap between concepts and practice, all five examples use machine learning libraries, so that data scientists and engineers can move their models and services to an event-driven architecture faster and more easily. Meanwhile, for the business, stream technology opens a new landscape for extending into uncovered business models. In this case, does business drive technology, or does technology drive business?

    Take the Concorde airliner: the business wanted faster air travel as early as 1969, but no technology was mature enough to run it at that moment.

    So I applied real-time stream technology to exciting topics such as earthquakes, the stock market, chatbots and smart pricing, with the following examples:

    | Example | Stream Design Pattern | Dependency | Keywords | Description |
    |---|---|---|---|---|
    | HelloWorld | Single-event processing | Stream | Producer, Consumer | Simple event processing |
    | WordCount | Processing with local state | Stream, Table, Web service | State, expose state, web service | Simple event processing with state |
    | Earthquake | Processing with external lookup | MySQL, HttpClient, Cron jobs, Timers, de/serialization, schema registry | External DB, API, partitions, GIS UI, message serialization/deserialization | Simple earthquake data in a real-time GIS UI |
    | StockMarket | Multiphase processing/repartitioning | Stream, Table, Batch-Training, Real-time recommendation | External API, batch training, real-time recommendation, logs, monitoring | Simple stock market recommendation system |
    | FraudDetection | Multiphase processing | Stream, Table, Jupyter notebook, autoML | Windows, real-time prediction, metrics, autoML | Simple fraud detection in insurance claims |
    | Chatbot | Multiphase processing | Real-time Web & Websocket, NLP rasa, spacy | NLP, real-time response | Simple NLP processing and response in real time |
    | SmartPrice | Multiphase processing | Incremental learning model | Feature Store, incremental learning model | Simple incremental learning model |
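
    To give a feel for the “processing with local state” pattern of the WordCount example, here is a minimal sketch in plain Python. A list stands in for the stream and a Counter stands in for the state table; no Kafka library is used, and the function name is an assumption made up for this illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def word_count_stream(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Consume text events one by one and emit the updated count per word."""
    state = Counter()                     # local state, playing the role of a table
    for event in events:                  # single pass, one event at a time
        for word in event.lower().split():
            state[word] += 1
            yield word, state[word]       # emit the changed entry downstream


events = ["hello stream", "hello kafka"]  # stand-in for messages on a topic
updates = list(word_count_stream(events))
print(updates)  # [('hello', 1), ('stream', 1), ('hello', 2), ('kafka', 1)]
```

    In a real stream application the state would also need to survive restarts and repartitioning, which this in-memory sketch ignores.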

    In the end, this series aims to help you build confidence in adopting a stream system, choose between stream design patterns, avoid some apparent traps, and design your first production-ready, enterprise-ready stream application in Python. Let’s shift stream technology into new business models.

  • Wrong problem

    I am reading Programming Pearls again and wrote down some sentences that I find quite impressive, even though the book dates from 1999.

    The programmer’s main problem was not so much technical as psychological: he could not make progress because he was trying to solve the wrong problem. We finally solved his problem by breaking through his conceptual block and solving an easier problem.

    Conceptual blocks are “mental walls that block the problem-solver from correctly perceiving a problem or conceiving its solution.”

    These 2 sentences also made me realise that, of the many issues in life we cannot solve, 90% come down to a concept we do not yet know.

    For example, when I graduated I could not solve relationship and communication problems, because I believed that every IT project problem was purely a technology problem and nothing else mattered. After PMP training, I understood how wrong my thinking had been when I was young: running a project involves 9 domains, including communication, human resources, scope, risk, stakeholder, cost and time.

    Keep reading, keep self-reflecting.

  • Data Lake and Practice on AWS

    In the software industry, automation and innovation are the 2 biggest sources of a company’s competitiveness. Nowadays we are fascinated by the ambitions of AI and machine learning, but in reality a lot of jobs in big enterprise organisations are still about moving data from here to there, digital transformation and refactoring legacy systems. So building pipelines of all kinds is the key to accelerating how an enterprise implements its business strategy. Today’s topic is only the data pipeline. Remembering the time I used Oracle X-data at Shanghai Telecom, it makes sense to summarise my thinking on data pipelines.

    1. Challenges

    Variety, velocity and volume are the 3 Vs of the growing data challenge. Five years ago, in my company’s Hadoop system, the volume of 3 TB/day for 5 million Shanghai mobile users was the most difficult task. Now big data has evolved into batch processing and stream processing, and even into integrating artificial intelligence components in a single pipeline. With the rapid development of cloud technology (VM -> Microservice -> Serverless), building a big data platform is much easier than it was 5 years ago. Buying hardware and installing all the Hadoop components on premises was a terrible experience; with AWS EMR, anyone can get a Hadoop or Spark cluster in 5 minutes. However, there are too many tools on the market, from both cloud vendors and open source.

    2. Architectural Principles

    Build decoupled systems

    DataSet -> Store -> Process -> Store -> Analyze -> Answers.

    Choose the right tool or library for the job

    • the data structure of the storage
    • Latency acceptance
    • throughput requirements

    Log-centric and secure patterns

    • immutable logs or data lake
    • protection of user log data under GDPR
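
    A minimal sketch of the log-centric idea: events are only ever appended, and any answer is a view derived by replaying the log. The class and field names (ClickEvent, AppendOnlyLog, page_views) are assumptions for illustration; a real data lake would sit on durable storage rather than a Python list.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)            # frozen: an event cannot be changed after creation
class ClickEvent:                  # hypothetical event type
    user_id: str
    page: str


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for an immutable log or data lake."""

    def __init__(self) -> None:
        self._events: List[ClickEvent] = []

    def append(self, event: ClickEvent) -> None:
        self._events.append(event)     # the only write operation: no update, no delete

    def read_all(self) -> Tuple[ClickEvent, ...]:
        return tuple(self._events)     # readers get an immutable snapshot


def page_views(log: AppendOnlyLog) -> Dict[str, int]:
    """Materialised view: derived from the log and rebuildable at any time."""
    view: Dict[str, int] = {}
    for event in log.read_all():
        view[event.page] = view.get(event.page, 0) + 1
    return view


log = AppendOnlyLog()
log.append(ClickEvent("u1", "/home"))
log.append(ClickEvent("u2", "/home"))
log.append(ClickEvent("u1", "/pricing"))
print(page_views(log))  # {'/home': 2, '/pricing': 1}
```

    Because the raw events are never modified, a new question can be answered later by writing another view over the same log.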

    Be cost-conscious

    • Pay as you go, no hardware
    • Big data != big cost

    Integration with AI/ML

    • using AI to answer questions
    • AI/ML-based data platform

    3. Simplify Big Data Processing

    Screen Shot 2018-06-06 at 23.22.36.png

    4. Data Temperature Characteristics

    What data store should we use?

    • Data structure: fixed schema, schema-free, JSON, key/value
    • Access Pattern: Store Data in the format you will access it
    • Data Characteristics:  hot -> warm -> cold
    • Cost: right cost

    Comparison of Amazon data store components

    Build realtime analysis system on AWS

    Interactive and batch analysis

    All together in Amazon Data Lake

    Data Lake Reference Architecture

    Summary

    • Build decoupled systems: Data -> Store -> Process -> Store -> Analyse -> Answers
    • Use the right tool for the job
      • Data Structure
      • Latency
      • Throughput
      • Access pattern
    • Use Log-centric design patterns
      • Immutable logs, data lake, materialised views
    • Be cost-conscious
      • Big data != Big cost
    • Enable your applications with AI/ML

    Reference: AWS 2017, Big Data Architectural Patterns and Best Practices on AWS

    https://www.youtube.com/watch?v=a3713oGB6Zk

  • Practise DDD with Swagger API Contract

    Background

    As software development moves towards microservices, Domain-Driven Design (DDD) is a good pattern for mapping business domain concepts onto microservices. Different developers have different opinions on how to implement it; this post describes our practice with DDD, Spring Boot and the Swagger API in a simple and effective way.

    Architecture

    A typical enterprise application architecture consists of the following four conceptual layers: https://www.infoq.com/articles/ddd-in-practice

    • User Interface (Presentation Layer): Responsible for presenting information to the user and interpreting user commands.
    • Application Layer: This layer coordinates the application activity. It doesn’t contain any business logic. It does not hold the state of business objects, but it can hold the state of an application task’s progress.
    • Domain Layer: This layer contains information about the business domain. The state of business objects is held here. Persistence of the business objects and possibly their state is delegated to the infrastructure layer.
    • Infrastructure Layer: This layer acts as a supporting library for all the other layers. It provides communication between layers, implements persistence for business objects, contains supporting libraries for the user interface layer, etc.

    Screen Shot 2018-08-02 at 15.00.48.png

    We prefer a package structure to a Maven module structure, based on the following experience:

    • One microservice is an isolated business logic module; sharing a Java API module between microservices couples them tightly.
    • When we release 1 microservice, the release includes almost all of its Maven modules. We rarely release an update to the API, implementation, common or infra Maven module separately.
    • Keep it simple: the more modules there are, the more versions must be maintained in Nexus and the more developers have to understand.

    We prefer an API contract (swagger.yml) to hard-coded Java interfaces, for the following reasons:

    • A contract definition is the industry-standard approach in REST and gRPC.
    • swagger.yml can auto-generate interfaces, DTOs and documentation, which improves development efficiency: https://github.com/swagger-api/swagger-codegen
    • Spring Boot REST interfaces match the code generated from swagger.yml well.
    • With the Swagger contract, communicating with systems written in other languages, such as JavaScript, is easy.
    • Big cloud platforms such as AWS, Google and Kubernetes can deploy the API gateway pattern directly from swagger.yml.

    Spring Swagger Code Generation

    Swagger provides some easy tools at https://github.com/swagger-api/swagger-codegen. There are 2 steps to use it:

    • Add the swagger-codegen plugin to your Maven pom.xml
    <plugin>
        <groupId>io.swagger</groupId>
        <artifactId>swagger-codegen-maven-plugin</artifactId>
        <version>${dep.plugin.swagger-codegen.version}</version>
        <configuration>
            <inputSpec>src/main/resources/push.swagger.yml</inputSpec>
            <modelPackage>com.xxx.services.push.notification.view</modelPackage>
            <apiPackage>com.xxx.services.push.notification.view.api</apiPackage>
            <language>spring</language>
            <output>.</output>
            <generateApis>true</generateApis>
            <generateModelDocumentation>false</generateModelDocumentation>
            <generateSupportingFiles>false</generateSupportingFiles>
            <configOptions>
                <delegatePattern>false</delegatePattern>
                <sourceFolder>src/main/java</sourceFolder>
                <hideGenerationTimestamp>true</hideGenerationTimestamp>
                <useBeanValidation>false</useBeanValidation>
                <java8>false</java8>
            </configOptions>
        </configuration>
    </plugin>

    • Configure the plugin:

    • inputSpec is the location of your swagger.yml file
    • apiPackage is the package for the generated API
    • language must be set to spring
    • Prefer pure Java interfaces to Java 8 default interface implementations, so java8 is disabled

    Inheritance of the API Contract

    Screen Shot 2018-08-02 at 15.13.48.png

    Summary

    The main goal here is not to discuss DDD in depth but to show an effective practice for implementing a full backend and frontend project. We used this approach in a push notification service (Java backend) and a swipe game (TypeScript frontend).

    It removes over 80% of the duplicated work on API and class definitions and name conversion; only the DTO-to-model mapping remains a heavy programming task. With this cross-language, contract-based scaffold, developers can focus on the business service logic.

     

  • AWS Best Practise Map

    After using AWS for over 5 years, last weekend I read the article Architecting for the Cloud (AWS Best Practices), downloaded the PDF, and made a mind map to give an impression of AWS best practices and to share my personal thoughts on the future of the cloud.

    To be honest, the AWS PDF is not a real best-practice guide; it reads like a sales flyer. Still, some of the patterns are useful for understanding the currently popular solutions and the kinds of architecture problems they are meant to solve.

    AWS Best Practise.png

    To summarise this tree in short: choose the right data persistence for your application, couple your system loosely, and scale horizontally with high availability.

    Some patterns can be understood as one chain:

    1. [X] is code 
    2. Automation is the king
    3. Monitoring is self-reflection
    4. Availability is the baseline

    The first pattern is quite popular in current development, for example:

    • Infrastructure as Code(IAC)
    • Documentation as code(DAC)
    • Security as code(SAC)

    [X] is code 

    “X as code” is changing how we think. The implementation of IaC is a huge revolution in the DevOps world: with the magic of Ansible and Chef, maintaining thousands of nodes is no big deal. Our ugly Linux shell, Python and Ruby scripts from 3 years ago have already turned into well-structured, organised git repositories.

    Secondly, DaC is a new direction for cooperation between business and development. In the decoupling pattern, for example, good interfaces are the core. With rapidly changing requirements, the API is constantly changed by PMs, POs and developers, so how can we stay in sync and share the same understanding? OpenAPI is an industry standard. An API can be described in a .yml file that is used between humans and applications: humans can modify it, and it constrains the implementation of the business requirements.

    Thirdly, an AWS environment gives you the opportunity to capture your security controls in a script that defines a “Golden Environment.” This means you can create an AWS CloudFormation script that captures your security policy and reliably deploys it. Security best practices can now be reused among multiple projects and become part of your continuous integration pipeline.

    Screen Shot 2017-11-01 at 13.09.36.png

    Automation is the king

    Furthermore, code has versions. If something can be expressed as code, all of its changes can be managed. If something can be managed, we can automate it. Peter Drucker predicted in 1950 that the biggest challenges for the economy after the Second World War would be, first, the automation of systems and, second, the free will of labour.

    A machine has no capacity for introspection, but humans do. So when building a system, we have to monitor logs, cost and performance in a feedback system, using AWS CloudWatch, events, alarms and so on. Detecting failure is the first step, just as people self-reflect. Then we can design a good strategy or pattern to fail gracefully or to change our business idea.

    Monitoring is self-reflection

    There are 3 kinds of monitoring:

    1. Log systems like Kibana, Graylog2
    2. Application performance management (APM) systems like AWS CloudWatch, AppDynamics
    3. User behaviour systems like Google Analytics, Firebase, Woopra

    In general, we are never short of tools. What we lack is the knowledge of which key indicators or metrics we need to measure. As a saying often attributed to Peter Drucker goes, “If you can’t measure it, you can’t manage it”.
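
    To show what choosing a metric can look like in practice, the following sketch derives two common indicators, availability and 95th-percentile latency, from a handful of request records. The sample data and both function names are invented for the example, and the nearest-rank percentile is only one of several possible definitions.

```python
import math

# Hypothetical request records: (HTTP status code, latency in milliseconds).
REQUESTS = [
    (200, 120), (200, 95), (500, 30), (200, 210), (200, 180),
    (404, 40), (200, 150), (503, 25), (200, 130), (200, 160),
]


def availability(requests):
    """Share of requests that did not fail with a server-side (5xx) error."""
    ok = sum(1 for status, _ in requests if status < 500)
    return ok / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


latencies = [latency for _, latency in REQUESTS]
print("availability:", availability(REQUESTS))
print("p95 latency (ms):", percentile(latencies, 95))
```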

    In step 8 of the Lean Canvas, we have to list the key metrics that tell you how your business is doing. This may be a good starting point for redefining the KPIs of our applications and systems.

    Availability is the baseline

    In the end, high availability means no downtime. Whatever strategy or methodology you use, keeping the system up is the baseline. If we borrow the concepts of positive and negative feedback from cybernetics, availability in software is almost entirely about negative feedback.

    AWS Auto Scaling or Elastic Beanstalk acts like the effector and controller, reducing the system error when performance is insufficient. But the enterprise software industry still operates at a low level of feedback in its current software.
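
    The negative feedback idea can be sketched as a small proportional controller: measure the load, compare it with a set point, and adjust the capacity so that the error shrinks. This is a toy simulation under assumed numbers and names, not a description of how AWS Auto Scaling is implemented internally.

```python
import math


def desired_instances(current, measured_cpu, target_cpu, minimum=1, maximum=10):
    """Proportional controller: scale capacity by measured load over target load."""
    wanted = math.ceil(current * measured_cpu / target_cpu)
    return max(minimum, min(maximum, wanted))


# Simulated plant: the total load is fixed, so average CPU falls as we scale out.
TOTAL_LOAD = 360.0  # hypothetical sum of CPU percent across all instances
TARGET_CPU = 60.0   # set point: desired average CPU percent per instance

instances = 2
history = [instances]
for _ in range(5):
    measured = TOTAL_LOAD / instances                                # sensor
    instances = desired_instances(instances, measured, TARGET_CPU)   # controller and effector
    history.append(instances)

print(history)
```

    In this simulation the instance count jumps from 2 to 6 in one step and then stays there, because at 6 instances the measured CPU equals the set point and the error is zero.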

    With cloud capabilities, I believe software architecture can be much easier than in earlier times or with in-house solutions. In the future, I hope more automated feedback actions can be written as algorithms, which means another "[X] is code". Returning to Peter Drucker's prediction: in a highly automated system, the free-willed labour could be a machine.

  • fingerprint-internet_SS_105959135_101113

    Mobile fingerprinting is very popular on both iOS and Android devices. But for an ordinary client, such as a desktop or mobile browser, how can we distinguish users without cookie technology?

    On the Internet, Nobody Knows You’re a Dog

    To be honest, we can know whether you are a cat or a dog. This practice tries to build a reinforced fingerprint solution without cookies, using the latest HTML5, the Android Device ID SDK, the Apple Device ID SDK, the Google Firebase Instance ID SDK, and a GeoService. This article tries to answer these questions:

    1. What is the problem? What is a device fingerprint?
    2. Which kinds of attributes can identify a unique device?
    3. The algorithm of web fingerprinting
    4. Fingerprints are not unique: a double HASH mechanism
    5. Make the user agent readable in the backend
    6. GeoService to recognize the IP location
    7. Send an SMS or email to the customer
    8. Database schema design
    9. Simple system architecture

    1. What is fingerprinting? Which problem can it solve for us?

    browser fingerprinting is the capability of a site to identify or re-identify a visiting user, user agent or device via configuration settings or other observable characteristics. — w3c

    The W3C's Fingerprinting Guidance for Web Specification Authors is a draft. Fingerprinting can be used as a security measure (e.g. as a means of authenticating the user); Google and Apple use it for user authentication and trusted devices. It uniquely identifies and tracks every device that accesses your site. Fraud detection is the most common use. There are 3 types of web fingerprinting:

    1. Passive fingerprinting is browser fingerprinting based on characteristics observable in the contents of Web requests, without the use of any code executing on the client side.
    2. Active fingerprinting covers techniques where a site runs JavaScript or other code on the local client to observe additional characteristics about the browser.
    3. Cookie-like fingerprinting: user agents and devices may also be re-identified by a site that first sets and later retrieves state stored by a user agent or device. This allows re-identification of a user or inferences about a user in the same way that HTTP cookies allow state management for the stateless HTTP protocol.
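
    The passive variant can be sketched on the server side with nothing but the request headers. The selection of headers, the function name, and the use of SHA-256 are assumptions made for this example; real deployments usually combine many more signals.

```python
import hashlib

# Headers a browser sends on every request, so no client-side code is needed.
PASSIVE_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def passive_fingerprint(headers):
    """Hash a fixed, ordered selection of request headers into a hex digest."""
    values = [headers.get(name, "") for name in PASSIVE_HEADERS]
    return hashlib.sha256("\n".join(values).encode("utf-8")).hexdigest()


# Example headers; a web framework would supply these from the real request.
request_headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T)",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.8",
    "Accept-Encoding": "gzip, deflate",
}
fingerprint = passive_fingerprint(request_headers)
print(fingerprint)
```

    The header lookup here is case-sensitive, so the dictionary keys must match the names in PASSIVE_HEADERS exactly.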

    2. Which kinds of characteristics can identify a unique device?

    The more characteristics and attributes we collect, the higher the accuracy, but we must respect data privacy laws such as the Data Protection Directive in the EU. This practice does not use cookie technology and respects the customer's privacy: we use a HASH algorithm to avoid collecting the user's web information directly. In the end, user characteristics exist only as a HASH in our database, which makes them deliberately difficult to reconstruct.

    As an example, in active mode we can use a JavaScript library such as fingerprintjs2 on the client side to combine the user's characteristics and hash them into a single value. For more detail on this library, please watch this video: https://player.vimeo.com/video/151208427. It is a very simple and clear solution used on many websites.

    var fingerPrintingHash = null;
    new Fingerprint2().get(function(result, components){
      fingerPrintingHash = result;
      console.log(result);
    });
    

     

    3. The algorithm of web fingerprinting

    There is a famous paper on this topic, https://panopticlick.eff.org/static/browser-uniqueness.pdf

    How unique is your browser?

    The paper's test site is https://panopticlick.eff.org/. In its sample, 94.2% of browsers with Flash or Java were unique, and the paper describes a simple algorithm for recognising a browser again after its fingerprint has changed.

    If you need the details, please read the paper. How well the algorithm works depends on the characteristics you collect; we don't use supercookies or some plugins. If you go for the simple solution and hash a few stable characteristics of the browser, you don't need this algorithm: a single hash is the unique key.
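
    For readers who do want to follow a changing fingerprint, the sketch below shows the general shape of such a matching step. It is a simplified illustration loosely modelled on the idea in the paper, not a reproduction of it: the component names, the rule of exactly one changed component, and the 0.85 similarity threshold are all assumptions of this example.

```python
from difflib import SequenceMatcher


def match_changed_fingerprint(new, known, min_similarity=0.85):
    """Return the id of the stored fingerprint that `new` most plausibly evolved from.

    `new` maps component names to strings; `known` maps an id to such a dict.
    A candidate must differ from `new` in exactly one component, that component
    must still look similar, and the candidate must be the only one found.
    """
    candidates = []
    for fp_id, old in known.items():
        changed = [key for key in new if new[key] != old.get(key)]
        if len(changed) != 1:
            continue
        key = changed[0]
        similarity = SequenceMatcher(None, new[key], old.get(key, "")).ratio()
        if similarity >= min_similarity:
            candidates.append(fp_id)
    return candidates[0] if len(candidates) == 1 else None


UA = ("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/{} Mobile Safari/537.36")
known = {
    "device-1": {"user_agent": UA.format("59.0.3071.115"), "timezone": "-60", "screen": "1080x1920x24"},
    "device-2": {"user_agent": "Mozilla/5.0 (Macintosh) Safari/603.2.4", "timezone": "-60", "screen": "1440x900x24"},
}
upgraded = {"user_agent": UA.format("60.0.3112.78"), "timezone": "-60", "screen": "1080x1920x24"}
stranger = {"user_agent": "curl/7.54.0", "timezone": "-60", "screen": "1080x1920x24"}

print(match_changed_fingerprint(upgraded, known))
print(match_changed_fingerprint(stranger, known))
```

    Here the upgraded browser is matched to device-1 because only its version string changed, while the unrelated client matches nothing.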

    4. Fingerprinting is not unique: a double HASH mechanism

    So far we have only covered the web. On a mobile device, the native platform offers more options for uniquely identifying a device, and the MD5 HASH can be computed on the server side.
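
    A minimal server-side sketch of the double HASH idea follows: the web fingerprint and the native device token are concatenated and hashed once more with MD5. The function name and the sample values are hypothetical. MD5 is used because the text names it; it serves only as an identifier here, not as a security primitive.

```python
import hashlib


def reinforced_device_id(web_hash_signature, firebase_device_token):
    """MD5 over the concatenation of the web hash and the native device token."""
    combined = web_hash_signature + firebase_device_token
    return hashlib.md5(combined.encode("utf-8")).hexdigest()


# Sample values standing in for a real browser hash and a real device token.
device_id = reinforced_device_id("9f86d081884c7d65", "example-device-token")
print(device_id)
```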

    Both iOS and Android provide a UUID for their devices. On older systems we could read the IMEI (International Mobile Equipment Identity) directly, but for security reasons iOS and Android have stopped allowing apps to access it. It is still quite useful information: you can dial *#06# on your phone to find it. For example, see https://support.apple.com/en-us/HT204073.

    • Android device unique id
    import android.provider.Settings.Secure;
    String androidId = Secure.getString(getContentResolver(), Secure.ANDROID_ID);
    

    https://developer.android.com/reference/android/provider/Settings.Secure.html#ANDROID_ID

    • iOS device unique id
    import UIKit
    let deviceId = UIDevice.current.identifierForVendor?.uuidString
    

    https://developer.apple.com/documentation/uikit/uidevice

    • Firebase unique id

    Google's Firebase Instance ID provides a unique identifier for each app instance and a mechanism to authenticate and authorize actions, for example sending FCM messages as push notifications. In fact, FCM also provides a device token that identifies each device for push notifications. It supports the Web, Android, and iOS platforms.

    FirebaseInstanceId.getInstance().getToken();
    FirebaseInstanceId.getInstance().getId();
    

    5. Make UserAgent readable

    The user agent information is sent in the HTTP request, for example:

    Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Mobile Safari/537.36

    We have to split it into device, OS, browser, and so on, each with its version. There are some easy libraries that make it readable for developers.

    With UADetector (http://uadetector.sourceforge.net/), you can easily get the OS, the device, and the web browser.

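    <dependency>
        <groupId>net.sf.uadetector</groupId>
        <artifactId>distribution</artifactId>
        <version>2014.10</version>
    </dependency>

    import net.sf.uadetector.ReadableUserAgent;
    import net.sf.uadetector.UserAgentStringParser;
    import net.sf.uadetector.service.UADetectorServiceFactory;

    // Parse the User-Agent header of the current request.
    UserAgentStringParser parser = UADetectorServiceFactory.getResourceModuleParser();
    ReadableUserAgent agent = parser.parse(request.getHeader("User-Agent"));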
    

    In Python, you can use the user_agents package:

    pip install user_agents

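    import user_agents

    readAgent = user_agents.parse("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T)")
    os = readAgent.os.family + ", " + readAgent.os.version_string
    # device.brand can be None for unrecognised devices, so fall back to a label.
    device = readAgent.device.family + ", " + (readAgent.device.brand or "unknown")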
    

    6. GeoService to convert IP into a location and recognize IP proxy.

    Many providers offer easy and powerful IP geolocation services. MaxMind, for example, has an online service that we can call directly with a license, and it provides SDKs and APIs for Java, Python, etc.

    pip install geoip2

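    import logging

    import geoip2.webservice

    logger = logging.getLogger(__name__)

    def decodegeo(ip="5.146.199.100"):
        """Look up (country, state, city) for an IP; values stay None on failure."""
        country, state, city = None, None, None
        try:
            # Placeholders: use your own MaxMind account ID (an integer) and license key.
            geoservice = geoip2.webservice.Client(42, "license token")
            response = geoservice.insights(ip)
            country = response.country.name
            state = response.subdivisions.most_specific.name
            city = response.city.name
        except Exception as e:
            logger.error(e)
        return country, state, city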
    

    MaxMind also offers a fraud detection service, https://www.maxmind.com/en/minfraud-services, which you can try as well.

    7. Send an SMS, email, or push notification to the customer on a new login to the system

    We can use a simple SMTP library to send an email to customers directly, but in today's cloud world you can use Amazon SES or SNS.

    http://docs.aws.amazon.com/zh_cn/ses/latest/DeveloperGuide/send-using-sdk-python.html

    pip install boto3

    import boto3
    
    # Replace sender@example.com with your "From" address.
    # This address must be verified with Amazon SES.
    sender = "sender@example.com"
    
    # Replace recipient@example.com with a "To" address. If your account 
    # is still in the sandbox, this address must be verified.
    recipient = "recipient@example.com"
    
    # If necessary, replace us-west-2 with the AWS Region you're using for Amazon SES.
    awsregion = "us-west-2"
    
    # The subject line for the email.
    subject = "Amazon SES Test (SDK for Python)"
    
    # The HTML body of the email.
    htmlbody = """<h1>Amazon SES Test (SDK for Python)</h1><p>This email was sent with 
                <a href='https://aws.amazon.com/ses/'>Amazon SES</a> using the 
                <a href='https://aws.amazon.com/sdk-for-python/'>AWS SDK for Python (Boto)</a>.</p>"""
    
    # The email body for recipients with non-HTML email clients.  
    textbody = "This email was sent with Amazon SES using the AWS SDK for Python (Boto)"
    
    # The character encoding for the email.
    charset = "UTF-8"
    
    # Create a new SES resource and specify a region.
    client = boto3.client('ses',region_name=awsregion)
    
    # Try to send the email.
    try:
        #Provide the contents of the email.
        response = client.send_email(
            Destination={
                'ToAddresses': [
                    recipient,
                ],
            },
            Message={
                'Body': {
                    'Html': {
                        'Charset': charset,
                        'Data': htmlbody,
                    },
                    'Text': {
                        'Charset': charset,
                        'Data': textbody,
                    },
                },
                'Subject': {
                    'Charset': charset,
                    'Data': subject,
                },
            },
            Source=sender,
        )
    # Display an error if something goes wrong.
    except Exception as e:
        print("Error: ", e)
    else:
        print("Email sent!")
    

    Send an SMS to the user's phone

    import boto3
    sns = boto3.client('sns')
    number = '+17702233322'
    sns.publish(PhoneNumber = number, Message='example text message' )

    If you need more advanced notification and communication with customers, Twilio is another good choice: https://www.twilio.com/

    8. Database schema design

    This schema is the MVP of the device identity. It holds 2 HASH values, web_hash_signature and firebase_device_token, and each of them is a unique key. To simplify, MD5(web_hash_signature + firebase_device_token) is also a possible reinforcement solution.

    CREATE TABLE `device` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `user_id` varchar(32) NOT NULL,
    `user_login_name` varchar(32) NOT NULL,
    `email` varchar(50) NOT NULL,
    `os` varchar(32) NOT NULL,
    `date` datetime NOT NULL,
    `device` varchar(32) NOT NULL,
    `country` varchar(32) NOT NULL,
    `state` varchar(32) NOT NULL,
    `web_hash_signature` varchar(64) NOT NULL,
    `firebase_device_token` varchar(200) NOT NULL,
    `phonenumber` varchar(45) NOT NULL,
    `ip` varchar(15) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `signature_UNIQUE` (`web_hash_signature`),
    UNIQUE KEY `devicetoken_UNIQUE` (`firebase_device_token`),
    UNIQUE KEY `phonenumber_UNIQUE` (`phonenumber`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
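
    To illustrate how this table could drive the login notification, the sketch below uses an in-memory SQLite database with only a few of the columns above. The function names and the sample values are hypothetical, and SQLite merely stands in for the MySQL table so that the example runs on its own.

```python
import sqlite3


def open_store():
    """Create an in-memory table with a subset of the columns from the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " web_hash_signature TEXT NOT NULL UNIQUE,"
        " country TEXT NOT NULL)"
    )
    return conn


def register_login(conn, user_id, web_hash_signature, country):
    """Return True if the device hash is new (and store it), False if it is known."""
    row = conn.execute(
        "SELECT user_id FROM device WHERE web_hash_signature = ?",
        (web_hash_signature,),
    ).fetchone()
    if row is not None:
        return False
    conn.execute(
        "INSERT INTO device (user_id, web_hash_signature, country) VALUES (?, ?, ?)",
        (user_id, web_hash_signature, country),
    )
    conn.commit()
    return True


conn = open_store()
first = register_login(conn, "user-1", "9f86d081884c7d65", "Germany")
second = register_login(conn, "user-1", "9f86d081884c7d65", "Germany")
print(first, second)
```

    A return value of True marks the moment to send the new-login notification; a later login from the same device returns False and stays silent.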
    

    With this you have almost all the basic fields needed to send a login notification email like those from Google, Apple, or Airbnb. This is an example of an Airbnb login notification email; it needs country, state, city, device, OS, and time data.

    [Figure: Airbnb login notification email]

    9. Simple System Architecture

    [Figure: simple system architecture diagram]

    In summary, this is the practice and solution I use in my web and mobile systems to identify the customer and prevent fraudulent logins as a first step. Some problems still need consideration in the future: performance design such as a cache mechanism, basing the web HASH only on stable characteristics (excluding the browser version, which customers change all the time), and a function to invalidate sessions or tokens in the web system.
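
    As a possible starting point for the stable-characteristics idea, the sketch below strips product version numbers from the user agent before hashing, so that a browser upgrade does not change the resulting key. The regular expression, the component names, and the function names are assumptions of this example, and the OS version is deliberately left in place.

```python
import hashlib
import re

# Version numbers that directly follow a slash, such as the digits in Chrome/59.0.3071.115.
PRODUCT_VERSION = re.compile(r"(?<=/)\d+(?:\.\d+)*")


def stable_user_agent(user_agent):
    """Remove product version numbers so a browser upgrade keeps the same value."""
    return PRODUCT_VERSION.sub("", user_agent)


def stable_web_hash(components):
    """Hash the characteristics in sorted key order, with a normalised user agent."""
    normalised = dict(components)
    normalised["user_agent"] = stable_user_agent(components.get("user_agent", ""))
    joined = "\n".join(key + "=" + normalised[key] for key in sorted(normalised))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


UA = ("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/{} Mobile Safari/537.36")
before = stable_web_hash({"user_agent": UA.format("59.0.3071.115"), "timezone": "-60"})
after = stable_web_hash({"user_agent": UA.format("60.0.3112.78"), "timezone": "-60"})
print(before == after)
```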