• The Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming.[1] It writes the “value” of a decision problem at a certain point in time in terms of the payoff from some initial choices and the “value” of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman’s “principle of optimality” prescribes.[2]

    https://en.wikipedia.org/wiki/Bellman_equation

    In English understanding,

    1. Identify the subproblems/sub-phases in time or space
    2. Clearly express the recurrence relation between subproblems
    3. Identify the state variables
    4. Write the state transition equation (the Bellman equation)
    5. Define the value/loss function to optimise, together with its boundary conditions

    In Chinese understanding (translated),

    1. Identify the multi-stage nature of the problem
    2. Decompose the problem into sub-stages linked by a recurrence relation
    3. Choose the correct state variables
    4. Define the state transition equation correctly
    5. Find the recurrence relation of the optimal value function and its boundary conditions

    In Programming understanding,

    1. Define the state(s).
    2. Define the recurrence relation(s).
    3. List all the state transitions with their respective conditions.
    4. Define the base case(s).
    5. Implement a naive recursive solution.
    6. Optimize the recursive solution with caching (memoization).
    7. Remove the overhead of recursion with a bottom-up approach (tabulation).

    But the real difficulty lies in applying these steps to real problems.
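
    As a minimal sketch of steps 5 to 7 above, here is the classic coin-change problem in Python (the problem choice and function names are my own illustration, not from the original list): first a naive recursion, then memoization, then tabulation.

    from functools import lru_cache

    coins = [1, 2, 5]

    # Step 5: naive recursion -- fewest coins that sum to `amount` (exponential time).
    def min_coins_naive(amount):
        if amount == 0:
            return 0
        return min(1 + min_coins_naive(amount - c) for c in coins if c <= amount)

    # Step 6: the same recursion with caching (memoization).
    @lru_cache(maxsize=None)
    def min_coins_memo(amount):
        if amount == 0:
            return 0
        return min(1 + min_coins_memo(amount - c) for c in coins if c <= amount)

    # Step 7: bottom-up tabulation removes the recursion overhead.
    def min_coins_table(amount):
        dp = [0] + [float("inf")] * amount      # dp[a] = fewest coins that sum to a
        for a in range(1, amount + 1):
            for c in coins:
                if c <= a:
                    dp[a] = min(dp[a], dp[a - c] + 1)
        return dp[amount]

    print(min_coins_naive(11), min_coins_memo(11), min_coins_table(11))  # 3 3 3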

  • To build an enterprise Kafka stream application in Python

    For high-frequency trading, real-time processing and low latency become the critical requirements of the trading system.

    I was a software architect involved in upgrading a legacy system into a real-time gambling trading system for a billion-sized business. As our use of streams matured, new machine learning ideas became much more practical. Today I want to share this five-year, painful but worthwhile journey from zero to mastery of Kafka streams with Python in this series. Beyond raw performance, the development tips combine stream system design, DevOps operation and quality assurance to build an enterprise Python application.

    There are eight sections in this series, covering the concepts, patterns and architectures with five hands-on examples:

    • Get Started Stream Processing
    • Kafka Python Library Architecture
    • Stream Process Design Pattern
    • Stream Processing Operation
    • Building an Enterprise Application
    • Stream Application Test
    • Monitoring Stream Application
    • Stream System Capacity Design

    Furthermore, to close the gap between concepts and practice, the five examples with machine learning libraries are meant to inspire data scientists and engineers to transform their models and services into an event-driven architecture faster and more easily. For the business side, stream technology also provides a new landscape to extend business models that are not yet covered. In this case, does business drive technology, or does technology drive business?

    Like the Concorde: the business wanted faster air travel back in 1969, but there was no mature technology ready to run it at that moment.

    So I applied real-time stream technology to exciting topics such as earthquakes, the stock market, chatbots and smart pricing, with the following examples:

    Example | Stream Design Pattern | Dependency | Keywords | Description
    HelloWorld | Single-event processing | Stream | Producer, consumer | Simple event processing
    WordCount | Processing with local state | Stream, table, web service | State, expose state, web service | Simple event processing with state
    Earthquake | Processing with external lookup | MySQL, HttpClient, cron jobs, timers, de/serialization, schema registry | External db, API, partitions, GIS UI, message serialization/deserialization | Simple earthquake data in a real-time GIS UI
    StockMarket | Multiphase processing/repartitioning | Stream, table, batch training, real-time recommendation | External API, batch training, real-time recommendation, logs, monitoring | Simple stock market recommendation system
    FraudDetection | Multiphase processing | Stream, table, Jupyter notebook, AutoML | Windows, real-time prediction, metrics, AutoML | Simple fraud detection for insurance claims
    Chatbot | Multiphase processing | Real-time web & WebSocket, NLP (rasa, spacy) | NLP, real-time response | Simple NLP processing and response in real time
    SmartPrice | Multiphase processing | Incremental learning model, feature store | Incremental learning model | Simple incremental learning model
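
    To make the HelloWorld row concrete, here is a minimal single-event producer/consumer sketch (my own illustration, assuming the kafka-python client and a broker on localhost:9092; the topic name is a placeholder and the series itself may use a different client library):

    # pip install kafka-python  (assumed client; confluent-kafka would work similarly)
    from kafka import KafkaProducer, KafkaConsumer

    BROKER = "localhost:9092"   # assumed local broker
    TOPIC = "hello-world"       # hypothetical topic name

    # Producer: publish a single event.
    producer = KafkaProducer(bootstrap_servers=BROKER)
    producer.send(TOPIC, b"hello stream")
    producer.flush()

    # Consumer: read the topic from the beginning and print each event.
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             auto_offset_reset="earliest", consumer_timeout_ms=5000)
    for message in consumer:
        print(message.value.decode("utf-8"))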

    In the end, this series should help you build confidence in adopting a stream system, choose between different stream design patterns, avoid some obvious traps, and design your first production-ready and enterprise-ready stream application in Python. Let’s turn stream technology into new business models.

  • Wrong problem

    I am reading Programming Pearls again. I wrote down some sentences that I find quite impressive, even though it is a 1999 book.

    the programmer’s main problem was not so much technical as psychological: he could not make progress because he was trying to solve the wrong problem. We finally solved his problem by breaking through his conceptual block and solving an easier problem.

    Conceptual blocks are ‘mental walls that block the problem-solver from correctly perceiving a problem or conceiving its solution.’

    These two sentences also remind me that for many issues in our life that we can’t solve, 90% of the time the block is a concept I did not know.

    For example, I couldn’t solve relationship and communication problems when I graduated, because I believed that every IT project problem was only a technology problem and nothing else mattered. But after PMP training, I really understood how wrong my thinking was when I was young: running a project means working across nine knowledge areas, including communication, human resources, scope, risk, stakeholders, cost and time.

    Keep reading, keep self-reflecting.

  • Data Lake and Practise on AWS

    In the software industry, automation and innovation are the two biggest areas of competition between companies. Nowadays we are fascinated by the ambitions of AI and machine learning, but on the other side, the reality is that a lot of jobs and tasks are still about moving data from here to there, digital transformation, and refactoring legacy systems in big enterprise organisations. So building all kinds of pipelines is key to helping an enterprise implement its business strategy. Today’s topic is only the data pipeline; remembering the time I was using Oracle Exadata at Shanghai Telecom, it makes sense to summarise my thinking on data pipelines.

    1. Challenges

    Variety, velocity and volume are the 3 Vs of the growing data challenge. I remember five years of my company’s Hadoop system, where the volume of 3 TB/day for 5 million Shanghai mobile users was the hardest task. Now big data has evolved into batch processing and stream processing, and even into how to integrate artificial intelligence components into one pipeline. With cloud technology developing rapidly (VM -> microservice -> serverless), building a big data platform is much easier than five years ago; buying hardware and installing all the Hadoop components on-premises was a terrible experience, whereas with AWS EMR anyone can get a Hadoop or Spark cluster in five minutes. However, there are too many tools on the market, whether cloud solutions or open-source solutions.

    2. Architectural Principles

    Build decoupled systems

    DataSet -> Store -> Process -> Store -> Analyze -> Answers.

    Choose the right tool or library for the job

    • the data structure of the storage
    • Latency acceptance
    • throughput requirements

    Log-centric and secure patterns

    • immutable logs or data lake
    • data protection of user log with GDPR

    Be cost-conscious

    • Pay as you go, no hardware
    • Big data != big cost

    Integration with AI/ML

    • using AI to answer questions
    • AI/ML-based data platform

    3. Simplify Big Data Processing


    4. Data Temperature Characteristics

    What data store should we use?

    • Data structure: fixed schema, schema-free, JSON, key/value
    • Access Pattern: Store Data in the format you will access it
    • Data Characteristics:  hot -> warm -> cold
    • Cost: right cost

    Amazon Components Compare in data store

    Build realtime analysis system on AWS

    Interactive and batch analysis

    All together in Amazon Data Lake

    Data Lake Reference Architecture

    Summary

    • Building a decoupled system: Data -> Store -> Process -> Store -> Analyse -> Answers
    • Use the right tool for the job
      • Data Structure
      • Latency
      • Throughput
      • Access pattern
    • Use Log-centric design patterns
      • Immutable logs, data lake, materialised views
    • Be cost-conscious
      • Big data != Big cost
    • Enable your applications with AI/ML

    Reference: AWS (2017), Big Data Architectural Patterns and Best Practices on AWS

    https://www.youtube.com/watch?v=a3713oGB6Zk

  • Practise DDD with Swagger API Contract

    Background

    Owing to the microservice direction of software development, Domain-Driven Design (DDD) is a good pattern for mapping business domain concepts onto microservices. Different developers have different opinions on how to implement it; this blog describes our practice with DDD, Spring Boot and the Swagger API in a simple and effective way.

    Architecture

    A typical enterprise application architecture consists of the following four conceptual layers: https://www.infoq.com/articles/ddd-in-practice

    • User Interface (Presentation Layer): Responsible for presenting information to the user and interpreting user commands.
    • Application Layer: This layer coordinates the application activity. It doesn’t contain any business logic. It does not hold the state of business objects, but it can hold the state of an application task’s progress.
    • Domain Layer: This layer contains information about the business domain. The state of business objects is held here. Persistence of the business objects and possibly their state is delegated to the infrastructure layer.
    • Infrastructure Layer: This layer acts as a supporting library for all the other layers. It provides communication between layers, implements persistence for business objects, contains supporting libraries for the user interface layer, etc.


    We prefer to use a package structure instead of a Maven multi-module structure; here are some of our experiences:

    • One microservice is an isolated business-logic module; sharing a Java API module between microservices couples them tightly.
    • When we release one microservice we almost always release all of its Maven modules; there is no case where we release the API, implementation, common or infra module separately.
    • Keep it simple: more modules mean more version-maintenance work in Nexus and more for developers to understand.

    We prefer an API contract (swagger.yml) instead of hard-coded Java interfaces; here are some practices:

    • Industry-standard contract definition in REST or gRPC
    • swagger.yml can auto-generate interfaces, DTOs and documentation to improve development efficiency: https://github.com/swagger-api/swagger-codegen
    • The Spring Boot REST interface matches the swagger.yml code generation perfectly.
    • With the Swagger contract, it is easy to communicate with systems in other languages, such as JavaScript.
    • Big cloud platforms such as AWS, Google and Kubernetes can deploy the API gateway pattern from swagger.yml directly.

    Spring Swagger Code Generation

    Swagger provides some easy tools at https://github.com/swagger-api/swagger-codegen. There are two steps to use it:

    • Add the swagger-codegen plugin to your Maven pom.xml:
    <plugin>
        <groupId>io.swagger</groupId>
        <artifactId>swagger-codegen-maven-plugin</artifactId>
        <version>${dep.plugin.swagger-codegen.version}</version>
        <configuration>
            <inputSpec>src/main/resources/push.swagger.yml</inputSpec>
            <modelPackage>com.xxx.services.push.notification.view</modelPackage>
            <apiPackage>com.xxx.services.push.notification.view.api</apiPackage>
            <language>spring</language>
            <output>.</output>
            <generateApis>true</generateApis>
            <generateModelDocumentation>false</generateModelDocumentation>
            <generateSupportingFiles>false</generateSupportingFiles>
            <configOptions>
                <delegatePattern>false</delegatePattern>
                <sourceFolder>src/main/java</sourceFolder>
                <hideGenerationTimestamp>true</hideGenerationTimestamp>
                <useBeanValidation>false</useBeanValidation>
                <java8>false</java8>
            </configOptions>
        </configuration>
    </plugin>

    Plugin configuration notes (inputSpec is your swagger.yml location):

    • inputSpec points to the Swagger file
    • apiPackage/modelPackage define the generated packages
    • setting the language to spring is important
    • we prefer pure Java interfaces instead of Java 8 default-method implementations, so java8 is disabled

    Inheritance of API Contract


    Summary

    The main goal is not to discuss DDD in depth; it is a more effective practice for implementing a full backend and frontend project. We used this approach in a push notification service (Java backend) and a swipe game (TypeScript frontend).

    It removed over 80% of the duplicated API, class definition and naming work; only the DTO-to-model mapping remains a heavy programming task. This scaffold lets developers focus on the business service logic and work across languages easily through the contract.

     

  • AWS Best Practise Map

    After using AWS for over five years, last weekend I read the article Architecting for the Cloud (AWS best practices), downloaded the PDF, and made some mind maps to give an impression of the AWS best practices, sharing my personal thoughts on the future of the cloud.

    To be honest, the AWS PDF is not a real best-practice guide; it reads like a sales flyer. But anyway, some of the patterns are useful for understanding which kinds of architecture problems the currently popular solutions are trying to solve.

    (Mind map: AWS Best Practise)

    To summarise this tree in short: choose the right data persistence for the application, loosely couple your system, and scale horizontally with high availability.

    Some patterns can be understood in one Chain

    1. [X] is code 
    2. Automation is the king
    3. Monitoring is self-reflection
    4. Availability is the baseline

    The first pattern is quite popular in the current development period, for example:

    • Infrastructure as Code(IAC)
    • Documentation as code(DAC)
    • Security as code(SAC)

    [X] is code 

    X as code is changing our cognition. The implementation of IaC was a huge revolution in the DevOps world: with the magic of Ansible and Chef, maintaining thousands of nodes is no big deal. Three years ago our ugly Linux shell and Python or Ruby scripts were already turned into well-structured and organised git repositories in our system.

    Secondly, DaC is a new direction for cooperation between business and development. For example, in the decoupling pattern, good interfaces are the core. With rapidly changing requirements, the API is constantly changed by PMs, POs and developers, so how can we stay in sync and share the same understanding? OpenAPI is an industry standard: the contract lives in a .yml file that sits between humans and applications. Humans can modify it, and it constrains the implementation of the business requirements.

    Thirdly, the AWS environment gives you the opportunity to capture all of this in a script that defines a “Golden Environment.” This means you can create an AWS CloudFormation script that captures your security policy and reliably deploys it. Security best practices can then be reused among multiple projects and become part of your continuous integration pipeline.
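
    As a small, hedged illustration of that idea (my own sketch, assuming boto3 and a hypothetical template file security-baseline.yml; it is not taken from the AWS paper), deploying such a “Golden Environment” template is a single API call that a CI pipeline can repeat for every project:

    # pip install boto3 -- deploy a (hypothetical) golden-environment template as code.
    import boto3

    cloudformation = boto3.client("cloudformation", region_name="eu-west-1")   # assumed region

    with open("security-baseline.yml") as f:       # hypothetical template capturing the security policy
        template_body = f.read()

    cloudformation.create_stack(
        StackName="security-baseline",             # hypothetical stack name
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],     # needed when the template creates IAM resources
    )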

    Automation is the king

    Furthermore, code has versions. If something can be expressed as code, all its changes can be managed; if something can be managed, we can automate it. Peter Drucker predicted in the 1950s that the biggest challenges of the economy after the Second World War would be, first, the automation of systems and, second, the free will of labour.

    Machines do not have introspective psychology, but humans do. So when building a system we have to monitor the logs, cost and performance in a feedback system: AWS CloudWatch, events, alarms, and so on. Detecting failure is the first step, just as people can self-reflect. Then we can design a good strategy or pattern to achieve graceful failure, or change our business idea.


    Monitoring is self-reflection

    There are three kinds of monitoring:

    1. Log systems like Kibana, Graylog2
    2. Application performance management (APM) systems like AWS CloudWatch, AppDynamics
    3. User behaviour systems like Google Analytics, Firebase, Woopra

    In general, we are never short of tools. But we lack the knowledge of which key indices or metrics we need to measure. As Peter Drucker famously observed, “If you can’t measure it, you can’t manage it”.
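
    To make the CloudWatch point concrete, here is a minimal, hedged boto3 sketch that turns one metric into an alarm (the region, instance id and SNS topic are hypothetical placeholders):

    # pip install boto3 -- create one CloudWatch alarm on EC2 CPU utilisation.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")   # assumed region

    cloudwatch.put_metric_alarm(
        AlarmName="high-cpu-example",              # hypothetical alarm name
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],   # placeholder instance
        Statistic="Average",
        Period=300,                                # 5-minute samples
        EvaluationPeriods=2,                       # two consecutive breaches
        Threshold=80.0,                            # alert above 80% CPU
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],        # placeholder SNS topic
    )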


    In Lean Canvas step 8, we have to list the key metrics that tell you how your business is doing. Maybe this is a good starting point to redefine the KPIs of our applications and systems.

    Availability is the baseline

    In the end, high availability means no downtime. Whatever strategy or methodology you use, keeping the system from going down is the baseline. If we borrow the concepts of positive and negative feedback from cybernetics, availability in software is almost always about dealing with negative feedback.

    AWS Auto Scaling or Elastic Beanstalk act like the effector and controller that reduce system errors when performance is not enough. But the enterprise software industry is still at a low level of feedback systems in current software.

    With cloud capabilities, I believe software architecture can be much easier than in the earlier age of in-house solutions. In the future I hope more computer feedback actions can be written as algorithms, which means another [X] is code. Regarding Peter Drucker’s prediction: highly automated systems will be run by machines, and even free-will labour may become machine work.

  • Device fingerprint without cookies

    Fingerprint sensors are very popular on both iOS and Android devices. But for a normal device, such as a desktop or mobile browser, how can we distinguish a user without cookie technology?

    On the Internet, Nobody Knows You’re a Dog

    To be honest, we can know whether you are a cat or a dog. This practice tries to build a reinforced fingerprint solution with the latest HTML5, the Android device ID SDK, the Apple device ID SDK, the Google Firebase Instance ID SDK and a geo service, without cookies. This article tries to answer these questions:

    1. What is the problem? What is a device fingerprint?
    2. Which kinds of attributes can identify a unique device?
    3. The algorithm of web fingerprinting
    4. The fingerprint is not unique: a double-hash mechanism
    5. Make the user agent readable in the backend
    6. GeoService to recognize the IP location
    7. Send an SMS or email to the customer
    8. Database schema design
    9. Simple system architecture

    1. What is fingerprinting? Which problems can it solve for us?

    browser fingerprinting is the capability of a site to identify or re-identify a visiting user, user agent or device via configuration settings or other observable characteristics. — w3c

    Fingerprinting Guidance for Web Specification Authors is the draft. Fingerprinting can be used as a security measure (e.g. as a means of authenticating the user); Google and Apple use it for user authentication and trusted devices. It uniquely identifies and tracks every device that accesses your site, and fraud detection is the most common usage. There are three types of web fingerprinting:

    1. Passive fingerprinting is browser fingerprinting based on characteristics observable in the contents of Web requests, without the use of any code executing on the client side.
    2. Active fingerprinting, we also consider techniques where a site runs JavaScript or other code on the local client to observe additional characteristics about the browser.
    3. user agents and devices may also be re-identified by a site that first sets and later retrieves state stored by a user agent or device. This cookie-like fingerprinting allows re-identification of a user or inferences about a user in the same way that HTTP cookies allow state management for the stateless HTTP protocol.

    2. Which kinds of characteristics can identify a unique device?

    The more characteristics and attributes, the more accuracy, but we must respect data privacy laws such as the Data Protection Directive in the EU. This practice doesn’t use cookie technology and respects the customer’s privacy: we use a hash algorithm to avoid collecting the user’s web information directly. In the end, the user characteristics are only a hash in our database, which is deliberately difficult to reconstruct.

    This is an example: in active mode we can use a JavaScript library such as fingerprintjs2 on the client side to combine and hash the user’s characteristics into a single hash value. For more detail on this library, please watch this video: https://player.vimeo.com/video/151208427. It is a very simple and clear solution used on a lot of web sites.

    var fingerPrintingHash = null;
    new Fingerprint2().get(function(result, components){
      fingerPrintingHash = result;
      console.log(result);
    });
    

     

    3. The algorithm of web fingerprinting

    There is a famous paper: https://panopticlick.eff.org/static/browser-uniqueness.pdf

    How unique is your browser?

    This is the uniqueness test from that paper: https://panopticlick.eff.org/. With a simple algorithm, 94.2% of browsers with Flash or Java were unique in their sample.


    If you need to know the details, please read the paper; the result depends on the characteristics you collect, and we don’t use supercookies or plugins. For a simple solution based on some stable characteristics of the browser, you don’t need this algorithm: a single hash is the unique key.

    4. The fingerprint is not unique: a double-hash mechanism

    The paragraphs above only cover the web situation, but the device may be a mobile phone. There are more options that can uniquely identify a device on a native mobile system, and this MD5 hash can be computed on the server side.
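
    A minimal server-side sketch of the idea (my own illustration; the chosen header fields and hashing are assumptions, not a prescription from this post): combine a few stable request characteristics into a web hash, then reinforce it with the device token.

    import hashlib

    def web_hash(user_agent, accept_language, timezone):
        """Hash a few stable, passively observable characteristics into one fingerprint."""
        raw = "|".join([user_agent, accept_language, timezone])
        return hashlib.md5(raw.encode("utf-8")).hexdigest()

    def reinforced_key(web_hash_signature, firebase_device_token):
        """Double hash: MD5(web_hash_signature + firebase_device_token) as the combined key."""
        return hashlib.md5((web_hash_signature + firebase_device_token).encode("utf-8")).hexdigest()

    w = web_hash("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T)", "en-US", "Europe/Berlin")
    print(reinforced_key(w, "example-firebase-token"))   # placeholder device token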


    Both iOS and Android support a UUID for their devices. In older systems we could read the IMEI (International Mobile Equipment Identity) directly, but owing to security considerations iOS and Android have stopped exposing it in code. It is still quite useful information: you can dial *#06# on your phone to find it. See for example https://support.apple.com/en-us/HT204073.

    • Android device unique id
    String androidId = Secure.getString(getContentResolver(), Secure.ANDROID_ID);
    

    https://developer.android.com/reference/android/provider/Settings.Secure.html#ANDROID_ID

    • IOS device unique id
    let device_id = UIDevice.currentDevice().identifierForVendor?.UUIDString
    

    https://developer.apple.com/documentation/uikit/uidevice

    • Firebase unique id

    Google’s Firebase Instance ID provides a unique identifier for each app instance and a mechanism to authenticate and authorize actions, for example for FCM messages in push notifications. In fact, FCM also provides a device token id to uniquely identify each device for push notifications. It supports the Web, Android and iOS platforms.

    FirebaseInstanceId.getInstance().getToken();
    FirebaseInstanceId.getInstance().getId();
    

    5. Make UserAgent readable

    The user agent information is in the HTTP request, like:

    Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Mobile Safari/537.36

    We have to split it into device, OS, browser, etc., with versions. There are some easy libraries to make it readable for developers.

    With http://uadetector.sourceforge.net/ you can get the OS, device and web browser easily.

    <dependency>
        <groupId>net.sf.uadetector</groupId>
        <artifactId>distribution</artifactId>
        <version>2014.10</version>
    </dependency>
    UserAgentStringParser parser = UADetectorServiceFactory.getResourceModuleParser();
    ReadableUserAgent agent = parser.parse(request.getHeader("User-Agent"));
    

    In Python, you can:

    pip install user_agents

    import user_agents

    readAgent = user_agents.parse("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T)")
    os = readAgent.os.family + ", " + readAgent.os.version_string
    device = readAgent.device.family + ", " + readAgent.device.brand
    

    6. GeoService to convert IP into a location and recognize IP proxy.

    There are many services providing easy and powerful IP geo lookups, such as MaxMind; there is an online service we can call directly with a license, and Java, Python, etc. SDKs and an API are available.

    pip install geoip2

    import logging
    import geoip2.webservice

    logger = logging.getLogger(__name__)

    def decodegeo(ip):
        country, state, city = None, None, None
        try:
            # MaxMind account id and license key (placeholders)
            geoservice = geoip2.webservice.Client("userId", 'license token')
            response = geoservice.insights(ip)
            country = response.country.name
            state = response.subdivisions.most_specific.name
            city = response.city.name
        except Exception, e:
            logger.error(e)
        return country, state, city

    country, state, city = decodegeo("5.146.199.100")
    

    In fact, MaxMind also offers a fraud service, https://www.maxmind.com/en/minfraud-services, which you can also try.

    7. Send an SMS, email or push notification to the customer on a new login to the system

    We can use a simple SMTP library to send an email to the customer directly, but in the current cloud world you can use Amazon SES or SNS.

    http://docs.aws.amazon.com/zh_cn/ses/latest/DeveloperGuide/send-using-sdk-python.html

    pip install boto3

    import boto3
    
    # Replace sender@example.com with your "From" address.
    # This address must be verified with Amazon SES.
    sender = "sender@example.com"
    
    # Replace recipient@example.com with a "To" address. If your account 
    # is still in the sandbox, this address must be verified.
    recipient = "recipient@example.com"
    
    # If necessary, replace us-west-2 with the AWS Region you're using for Amazon SES.
    awsregion = "us-west-2"
    
    # The subject line for the email.
    subject = "Amazon SES Test (SDK for Python)"
    
    # The HTML body of the email.
    htmlbody = """<h1>Amazon SES Test (SDK for Python)</h1><p>This email was sent with 
                <a href='https://aws.amazon.com/ses/'>Amazon SES</a> using the 
                <a href='https://aws.amazon.com/sdk-for-python/'>AWS SDK for Python (Boto)</a>.</p>"""
    
    # The email body for recipients with non-HTML email clients.  
    textbody = "This email was sent with Amazon SES using the AWS SDK for Python (Boto)"
    
    # The character encoding for the email.
    charset = "UTF-8"
    
    # Create a new SES resource and specify a region.
    client = boto3.client('ses',region_name=awsregion)
    
    # Try to send the email.
    try:
        #Provide the contents of the email.
        response = client.send_email(
            Destination={
                'ToAddresses': [
                    recipient,
                ],
            },
            Message={
                'Body': {
                    'Html': {
                        'Charset': charset,
                        'Data': htmlbody,
                    },
                    'Text': {
                        'Charset': charset,
                        'Data': textbody,
                    },
                },
                'Subject': {
                    'Charset': charset,
                    'Data': subject,
                },
            },
            Source=sender,
        )
    # Display an error if something goes wrong.	
    except Exception as e:
        print "Error: ", e	
    else:
        print "Email sent!"
    

    Send SMS to user phone

    import boto3
    sns = boto3.client('sns')
    number = '+17702233322'
    sns.publish(PhoneNumber = number, Message='example text message' )

    If you need advanced notification and communication with customers, using Twilio is another good choice: https://www.twilio.com/

     8. Database schema design

    This schema is the MVP of device identity. There are two hashes: web_hash_signature and firebase_device_token are the unique keys. To simplify, MD5(web_hash_signature + firebase_device_token) is also a reinforcement solution.

    CREATE TABLE `device` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `user_id` varchar(32) NOT NULL,
    `user_login_name` varchar(32) NOT NULL,
    `email` varchar(50) NOT NULL,
    `os` varchar(32) NOT NULL,
    `date` datetime NOT NULL,
    `device` varchar(32) NOT NULL,
    `country` varchar(32) NOT NULL,
    `state` varchar(32) NOT NULL,
    `web_hash_signature` varchar(64) NOT NULL,
    `firebase_device_token` varchar(200) NOT NULL,
    `phonenumber` varchar(45) NOT NULL,
    `ip` varchar(15) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `signature_UNIQUE` (`web_hash_signature`),
    UNIQUE KEY `devicetoken_UNIQUE` (`firebase_device_token`),
    UNIQUE KEY `phonenumber_UNIQUE` (`phonenumber`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
    

    Then you have almost all the basic fields needed to send a Google-, Apple- or Airbnb-style login notification email. This is an example of the Airbnb login notification email; it needs country, state, city, device, OS and time data.


    9. Simple System Architecture


    In summary, this is a practice and solution used in my web and mobile systems to identify the customer and prevent fraudulent logins as a first step. Some problems still need consideration: performance design such as a cache mechanism, basing the web hash on stable characteristics (excluding the browser version, which customers change all the time), and an invalidate-session or invalidate-token function in the web system.

  • Programming is all about problem solving. 

    • Tip 1: Always consider the context.
      • Everything in the world is interconnected, so always consider the context. For example, you see a tree standing alone on the ground, but in fact it is part of two separate systems: one of leaves and air, the other of roots and soil. We are always part of such systems, whether we know it or not.
    • Tip 2: Use rules for novices, intuition for experts.
      • Use rules for novices; experts rely on intuition.
    • Tip 3: Know what you don’t know.
      • Only by knowing what you don’t know can you expand beyond it.
    • Tip 4: Learn by watching and imitating.
      • Learning happens only through watching and imitating.
    • Tip 5: Keep practicing in order to remain expert.
      • Only constant practice keeps you at an expert level.
    • Tip 6: Avoid formal methods if you need creativity, intuition, or inventiveness.
      • If you need creativity, intuition or inventiveness, avoid traditional formal methods.
    • Tip 7: Learn the skill of learning.
      • Learn the skill of learning; the skill of learning itself also needs to be learned.
    • Tip 8: Capture all ideas to get more of them.
      • Capture all ideas in order to get more out of them.
    • Tip 9: Learn by synthesis as well as by analysis.
      • Learning comes from both synthesis and analysis.
    • Tip 10: Strive for good design; it really works better.
    • Tip 11: Rewire your brain with belief and constant practice.
    • Tip 12: Add sensory experience to engage more of your brain.
    • Tip 13: Lead with R-mode; follow with L-mode.
    • Tip 14: Use metaphor as the meeting place between L-mode and R-mode.
    • Tip 15: Cultivate humor to build stronger metaphors.
    • Tip 16: Step away from the keyboard to solve hard problems.
    • Tip 17: Change your viewpoint to solve the problem.
    • Tip 18: Watch the outliers: “rarely” doesn’t mean “never.”
    • Tip 19: Be comfortable with uncertainty.
    • Tip 20: Trust ink over memory; every mental read is a write.
    • Tip 21: Hedge your bets with diversity.
    • Tip 22: Allow for different bugs in different people.
    • Tip 23: Act like you’ve evolved: breathe, don’t hiss.
    • Tip 24: Trust intuition, but verify.
    • Tip 25: Create SMART objectives to reach your goals.
    • Tip 26: Plan your investment in learning deliberately.
    • Tip 27: Discover how you learn best.
    • Tip 28: Form study groups to learn and teach.
    • Tip 29: Read deliberately.
    • Tip 30: Take notes with both R-mode and L-mode.
    • Tip 31: Write on: documenting is more important than documentation.
    • Tip 32: See it. Do it. Teach it.
    • Tip 33: Play more in order to learn more.
    • Tip 34: Learn from similarities; unlearn from differences.
    • Tip 35: Explore, Invent, and apply in your environment—safely.
    • Tip 36: See without judging and then act.
    • Tip 37: Give yourself permission to fail; it’s the path to success.
    • Tip 38: Groove your mind for success.
    • Tip 39: Learn to pay attention.
    • Tip 40: Make thinking time.
    • Tip 41: Use a wiki to manage information and knowledge.
    • Tip 42: Establish rules of engagement to manage interruptions.
    • Tip 43: Send less email, and you’ll receive less email.
    • Tip 44: Choose your own tempo for an email conversation.
    • Tip 45: Mask interrupts to maintain focus.
    • Tip 46: Use multiple monitors to avoid context switching.
    • Tip 47: Optimize your personal workflow to maximize context.
    • Tip 48: Grab the wheel. You can’t steer on autopilot.
  • Good at talking vs. good at doing

    There was a popular book to read five years ago, but honestly I like the book’s title more than its content. Six years ago, Seth Godin wrote an article, Good at talking vs. good at doing, on his blog.

    This is the chasm of the new marketing.

    The marketing department used to be in charge of talking. Ads are talking. Flyers are talking. Billboards are talking. Trade shows are talking.

    Now, of course, marketing can’t talk so much, because people can’t be easily forced to listen. So the only option is to be in charge of doing. Which means the product, the service, the interaction, the effluent and other detritus left behind when you’re done.

    In organizations there are always many meetings and workshops; why don’t we stop them and do something interesting to change the world and make it better?

    “Die Philosophen haben die Welt nur verschieden interpretiert; es kömmt drauf an, sie zu verändern.” – 11th Thesis on Feuerbach, original version, MEW 3, p. 535, 1845. (“The philosophers have only interpreted the world in various ways; the point is to change it.”)

    Even though I don’t like Karl Marx that much, this saying is direct: zu verändern (to change) is what matters most.

    Talk is cheap. Show me the code. (Linus Torvalds)

  • From C10K to C100K problem, push 1,000,000 messages to web clients with 1 node
    • Motivation

      The current web browser communication protocol is limited to the HTTP request/response paradigm: the browser requests and the server responds. What if we want to use a different communication paradigm? For example, what if we want two-way communication where the server sends a request and the browser responds? A common use case is the server notifying the client that an event has occurred.

      WebSocket is a technology providing bi-directional, full-duplex communication channels over a single Transmission Control Protocol (TCP) socket. In addition, because WebSockets can co-exist with other HTTP traffic on ports 80 and 443, firewalls do not have to be reconfigured.

    • Challenge of 1 million connection

    In 2011 WhatsApp published a blog post about the number of connections on one machine. We know they built it with Erlang, but it is still very interesting to take on the challenge of building a single server that can handle over 1 million established connections yourself.


    • Design Test Scenario


    I am using an AWS m4.2xlarge as the WebSocket server and 20 t2.micro servers as clients, because of the port limit (65535) imposed by Linux without virtual network interfaces.


    Because of some EC2 network limits, the virtual network interface doesn’t actually work there. On a local network you can create your virtual network cards easily this way:

    ifconfig eth0:1 192.168.0.1 netmask 255.255.255.0 up
    ifconfig eth0:2 192.168.0.2 netmask 255.255.255.0 up

    The message content is the current system timestamp, and the server push frequency is once per second.

    • Java Options

    export JAVA_OPTS="-Xms16G -Xmx16G -Xss1M -XX:+UseParallelGC"

    • TCP Optimization

    Server-side optimisation; this configuration can handle almost 2,000,000 connections.

    vi /etc/sysctl.conf

    net.ipv4.tcp_wmem = 4096 87380 4161536
    net.ipv4.tcp_rmem = 4096 87380 4161536
    net.ipv4.tcp_mem = 786432 2097152 3145728
    fs.file-max = 2000000
    fs.nr_open = 2000000

    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_fin_timeout = 30

    You can easily google the meaning of these settings, but be careful when turning on tcp_tw_recycle.

    It can cause network reset errors because, with tcp_tw_recycle enabled, the kernel checks the timestamps of incoming packets. We can use this setting to ignore the timestamps:

    net.ipv4.tcp_timestamps=0

    • TCP/IP Range Optimization

    net.ipv4.tcp_max_syn_backlog = 8192       # default is 1024
    net.ipv4.ip_local_port_range = 1024 65535
    net.ipv4.tcp_keepalive_time = 1200        # default is 2 hours
    net.ipv4.tcp_max_tw_buckets = 5000        # default is 1800

    Use this command to apply the network changes:

    /sbin/sysctl -p

    • Linux Optimization

    vi /etc/security/limits.conf

    * hard nofile 2000000
    * soft nofile 2000000
    root hard nofile 2000000
    root soft nofile 2000000

    There is a hack to adjust the stack size of each thread; use this command to check it:

    ulimit -s

    Reducing the size makes each thread lighter, which helps improve the throughput.

    • Client Side Implementation

    We can use Firefox or Chrome to simulate a simple client like this. It will print the current date and time as a string, e.g. “Tue Sep 20 2016 01:40:17 GMT+0200 (CEST)”.

    var handler = new WebSocket("ws://ipaddress:port/endpoint");
    handler.onmessage = function (event) {
        console.log(new Date(event.data));
    };

    But in the real test environment, I am using https://github.com/smallnest/C1000K-Servers as the reference to test with.
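
    If you prefer to script the clients instead of using a browser, here is a minimal, hedged Python sketch (my own illustration, assuming the third-party websockets package; the endpoint and connection count are placeholders) that opens many connections from one client machine and prints whatever the server pushes:

    # pip install websockets -- open many WebSocket connections from one client machine.
    import asyncio
    import websockets

    SERVER_URI = "ws://ipaddress:port/endpoint"   # placeholder, same endpoint as above
    CONNECTIONS = 1000                            # connections per client machine

    async def client(i):
        async with websockets.connect(SERVER_URI) as ws:
            async for message in ws:              # the server pushes one timestamp per second
                print(i, message)

    async def main():
        await asyncio.gather(*(client(i) for i in range(CONNECTIONS)))

    asyncio.run(main())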

    • Server Side Implementation

    On the server side, I chose Netty to build the server.


    Please read https://github.com/smallnest/C1000K-Servers in detail. We can use this command to get statistics on the network status:

    netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

    LAST_ACK 13
    SYN_RECV 468
    ESTABLISHED 90
    FIN_WAIT1 259
    FIN_WAIT2 40
    CLOSING 34
    TIME_WAIT 28322

    • Metrics

    With the beautiful metrics from datadoghq.com, during a 2-hour test the system load was very low. The network traffic was also at a normal level.



    • Conclusion

    Firstly, I should thank @smallnest (https://github.com/smallnest) for answering some of my questions; his implementation in Scala is also clean and simple. Secondly, the results should encourage you to build a C100K server yourself; the connection-scaling problem can be conquered in the end.