import React from "react";

export default function NewYear() {
  // pre-line keeps the line breaks of the numbered notes in txt visible
  return <div style={{ whiteSpace: "pre-line" }} dangerouslySetInnerHTML={{ __html: txt }} />;
}



const txt = `
<h5>NoSQL definition:</h5><p>
1- Usually do not require designing a data schema in advance
2- Do not rely on a strict relational data model
3- Are distributed across a cluster of machines
4- Adopt data partitioning and data replication to improve performance
5- Do not guarantee strong consistency but rely on weaker notions of consistency

</p><h5>Aggregated data model:</h5><p>
1- Pieces of information that are accessed together are stored together to reduce the number of data accesses
2- You must decide in advance the pattern you will use to access the data
3- Data models: key-value, document-based, column-family
4- Adv: data retrieval is easier and more efficient in distributed contexts
5- Adv: there is no need to access multiple tables distributed over multiple nodes and perform complicated joins among the nodes to identify the requested data
6- Disadv: data duplication, and lack of flexibility when the data access pattern changes. It is a poor fit if you want to slice and dice the data in different ways
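</p><p>Here is a minimal sketch of the trade-off in plain JavaScript. It assumes in-memory objects as a stand-in for a data store, and the order/customer data and all function names are hypothetical. The relational-style version has to join three collections, while the aggregated version reads one value under one key.</p><pre><code>// Hypothetical data, for illustration only.
// Relational style: three "tables" that must be joined at read time.
const customers = { c1: { name: "Alice" } };
const orders = { o1: { customerId: "c1" } };
const orderLines = [
  { orderId: "o1", sku: "p1", qty: 2 },
  { orderId: "o1", sku: "p2", qty: 1 },
];
function getOrderWithJoins(orderId) {
  const order = orders[orderId];
  return {
    customer: customers[order.customerId],
    lines: orderLines.filter((l) => l.orderId === orderId),
  };
}
// Aggregated style: everything read together is stored together under one key.
const orderAggregates = {
  o1: {
    customer: { name: "Alice" }, // copied into every order of this customer
    lines: [{ sku: "p1", qty: 2 }, { sku: "p2", qty: 1 }],
  },
};
function getOrderAggregate(orderId) {
  return orderAggregates[orderId]; // a single lookup, no join
}</code></pre><p>Both calls return the same customer and order lines for <code>"o1"</code>. The aggregate avoids the join, but it pays for this by copying the customer data into every order, which is the duplication listed in point 6.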

</p><h5>Detailed data model:</h5><p>
1- Information is broken into smaller units, allowing developers to work with those small units more precisely
2- Data model: graph-based data model
3- Adv: suitable when:
a. Data are subject to frequent changes
b. Relationships among nodes must be navigated
c. Access patterns are not well defined in advance
4- Disadv: partitioning of data is not easy to achieve in a distributed context
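</p><p>Navigating relationships (point 3b) can be illustrated with an adjacency list and a breadth-first traversal. This is only an in-memory sketch. The <code>follows</code> graph and the <code>reachable</code> function are hypothetical and do not reflect the API of any particular graph database.</p><pre><code>// Hypothetical graph: each node is a small unit holding its outgoing edges.
const follows = {
  alice: ["bob", "carol"],
  bob: ["dave"],
  carol: ["dave"],
  dave: [],
};
// Breadth-first traversal: maps every node reachable from start
// to its distance in hops.
function reachable(graph, start) {
  const dist = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of graph[node] || []) {
      if (!dist.has(next)) {
        dist.set(next, dist.get(node) + 1);
        queue.push(next);
      }
    }
  }
  return dist;
}</code></pre><p><code>reachable(follows, "alice")</code> maps bob and carol to 1 hop and dave to 2 hops. Adding a relationship only touches one edge list, which suits data that change frequently.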

</p><h5>Scaling: </h5><p>
1- Scale up (vertically): enlarge the machines; this is expensive and requires higher agreement costs
2- Scale out (horizontally): create clusters of machines that work together using approaches and technologies typical of P2P systems. Each node of the cluster maintains a portion of the information to be handled. New nodes can be added when needed, and nodes can be removed.

</p><h5>Partitioning: </h5><p>
1- The process of splitting the data in a database into smaller disjoint chunks and spreading them over different nodes of the cluster

</p><h5>Two kinds of partitioning: </h5><p>
1- Horizontal (sharding): typically used in NoSQL systems; divides the data at the row level into disjoint partitions
2- Vertical: divides the data into disjoint partitions based on predefined groups of columns that are accessed together
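</p><p>The two kinds of partitioning can be contrasted on a tiny table. This sketch assumes that rows are plain objects and uses an arbitrary odd/even rule as the sharding criterion. The table, the column groups, and the function names are all hypothetical.</p><pre><code>// Hypothetical "users" table: one object per row.
const users = [
  { id: 1, name: "Ann", city: "Rome", bio: "likes graphs" },
  { id: 2, name: "Bob", city: "Oslo", bio: "likes logs" },
  { id: 3, name: "Cid", city: "Lima", bio: "likes maps" },
  { id: 4, name: "Dee", city: "Pune", bio: "likes keys" },
];
// Horizontal partitioning (sharding): whole rows go to disjoint shards.
// The rule used here (odd ids to shard 0, even ids to shard 1) is arbitrary.
function shardRows(rows) {
  const shards = [[], []];
  for (const row of rows) shards[row.id % 2 === 1 ? 0 : 1].push(row);
  return shards;
}
// Vertical partitioning: each row is split into predefined column groups
// ("profile" = name, city; "text" = bio). The id is repeated in every group
// so that the pieces of a row can be put back together.
function splitColumns(rows) {
  return {
    profile: rows.map((r) => ({ id: r.id, name: r.name, city: r.city })),
    text: rows.map((r) => ({ id: r.id, bio: r.bio })),
  };
}</code></pre><p><code>shardRows(users)</code> puts rows 1 and 3 in the first shard and rows 2 and 4 in the second. <code>splitColumns(users)</code> keeps all four rows but stores name/city and bio in separate partitions.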

</p><h5>Key-oriented and traversal-oriented sharding:</h5><p>
1- Key-oriented: in systems that use the aggregation strategy, each partition is associated with an identifier (shard key) that is used for storage and data lookup
2- Traversal-oriented: in systems that use the graph-based strategy, data lookup is based on analyzing the relationships between the items contained in the partition

</p><h5>Key-oriented sharding has two types:</h5><p>
1- Key-oriented static sharding: only considers static information
2- Key-oriented workload-aware sharding: considers dynamic data and the query workload

</p><h5>Range-based partitioning:</h5><p>
1- Data items are clustered into contiguous intervals of the shard key
2- Adv: range queries on short intervals are handled efficiently, as they require communicating with only one or a few nodes
3- Disadv: nodes hosting popular keys (like ‘E’) have a higher workload
4- Disadv: a non-uniform distribution of keys causes an unbalanced data distribution among nodes
5- Data warehouses, online games, and web servers favor this approach because they need to access pieces of information by a specific key or by a list of ordered keys
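</p><p>Here is a minimal sketch of range-based lookup. It assumes string shard keys and four nodes whose interval boundaries are fixed in advance; the boundaries and function names are hypothetical. Finding the owner of a key is a binary search, and a range query only visits the nodes between the owners of its endpoints.</p><pre><code>// Hypothetical layout with 4 nodes: string shard keys, and each node owns a
// contiguous interval. bounds[i] is the first key that node i does NOT own:
// node 0: keys before "F", node 1: "F" up to "M", node 2: "M" up to "T",
// node 3: "T" and after.
const bounds = ["F", "M", "T"];
// Binary search over the sorted bounds to find the node owning key.
function nodeFor(key) {
  let lo = 0;
  let hi = bounds.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (key < bounds[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}
// A range query only involves the nodes between the owners of its two
// endpoints, so a short range touches one or a few nodes.
function nodesForRange(from, to) {
  const nodes = [];
  for (let n = nodeFor(from); n <= nodeFor(to); n++) nodes.push(n);
  return nodes;
}</code></pre><p><code>nodesForRange("Ada", "Bob")</code> returns <code>[0]</code> and <code>nodesForRange("Kim", "Nia")</code> returns <code>[1, 2]</code>. If most keys fall before "F", node 0 receives most of the data, which is the imbalance described in point 4.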

</p><h5>Simple hashing partitioning: </h5><p>
1- Data item keys are (pseudo-)randomly hashed to their hosting nodes via simple hashing schemes such as modulo: hosting_node_num = key % num_of_nodes
2- Adv: no lookup table is needed, because the hash function computes the hosting node directly, which makes data lookup efficient
3- Disadv: inserting or removing nodes requires redistributing data items, because the key-to-node mapping changes for most keys and the data must be heavily reshuffled
4- Disadv: poor data locality; since data items are randomly distributed, the number of nodes to be contacted to answer a query can increase dramatically
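</p><p>The reshuffling cost in point 3 can be measured directly. This sketch assumes integer keys and uses the modulo formula above; the function names are hypothetical. It counts how many of 1000 keys change node when a fifth node is added to a 4-node cluster.</p><pre><code>// Simple modulo hashing, assuming integer keys:
// hosting_node_num = key % num_of_nodes.
function hostingNode(key, numNodes) {
  return key % numNodes;
}
// Counts how many keys change hosting node when the cluster grows or
// shrinks from oldN to newN nodes.
function movedKeys(keys, oldN, newN) {
  return keys.filter((k) => hostingNode(k, oldN) !== hostingNode(k, newN)).length;
}
const keys = Array.from({ length: 1000 }, (_, i) => i); // keys 0..999
console.log(movedKeys(keys, 4, 5)); // 800: going from 4 to 5 nodes moves 80% of the keys</code></pre><p>A key keeps its node only when <code>key % 4</code> equals <code>key % 5</code>. That holds for 4 residues out of every 20, so 800 of the 1000 keys move.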

</p><h5>Consistent hashing: </h5><p>
1- Treats the output range of the hash function as a ring, onto which both node ids and keys are (pseudo-)randomly hashed
2- The hosting node of an item is the first node encountered when walking clockwise from the position of the data item on the ring
3- Inserting or removing a node redistributes only an O(1/N) fraction of the data items (N = number of nodes), and only between that node and its immediate successor
4- Still has poor data locality
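</p><p>Here is a minimal consistent-hashing ring, sketched under stated assumptions: FNV-1a stands in for the hash function, each node has a single position on the ring, and the class and node names are hypothetical. Adding a node should only move keys onto that node.</p><pre><code>// FNV-1a (32-bit) string hash, used only as a deterministic stand-in for
// the hash function; the choice of hash is an assumption of this sketch.
function hash(str) {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}
// Ring of nodes, each placed at position hash(nodeId), sorted clockwise.
class Ring {
  constructor(nodeIds) {
    this.points = nodeIds
      .map((id) => ({ id, pos: hash(id) }))
      .sort((a, b) => a.pos - b.pos);
  }
  // Hosting node: first node at or after the key's position, wrapping
  // around to the first node when the end of the ring is reached.
  lookup(key) {
    const p = hash(key);
    const owner = this.points.find((pt) => pt.pos >= p);
    return (owner || this.points[0]).id;
  }
}
const keys = Array.from({ length: 1000 }, (_, i) => "key" + i);
const before = new Ring(["nodeA", "nodeB", "nodeC", "nodeD"]);
const after = new Ring(["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"]);
// Keys whose hosting node changed after adding nodeE: all of them now
// live on nodeE, taken from its immediate successor.
const moved = keys.filter((k) => before.lookup(k) !== after.lookup(k));</code></pre><p>Every key in <code>moved</code> now resolves to <code>nodeE</code>, and all other keys keep their old node. With only five positions on the ring, the moved fraction can differ noticeably from 1/N in a single run.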

</p><h5>Data replication: </h5><p>
1- Mirroring the same partition of data on several nodes of the cluster, so that each piece of data can be found in multiple places; this increases throughput and provides fault tolerance and high availability
2- The replication factor is the number of copies kept of each partition; it also determines the level of fault tolerance, i.e. how many replicas can fail (replication factor - 1) before the partition is considered lost
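</p><p>The relation between replication factor and fault tolerance can be sketched as follows. The sketch assumes, hypothetically, that each partition is stored on consecutive nodes of a 5-node cluster; the node names and functions are illustrative only.</p><pre><code>// Hypothetical 5-node cluster; this sketch assumes each partition is stored
// on rf consecutive nodes starting from its primary node.
const NODES = ["n0", "n1", "n2", "n3", "n4"];
function replicasFor(primaryIndex, rf) {
  const out = [];
  for (let i = 0; i < rf; i++) out.push(NODES[(primaryIndex + i) % NODES.length]);
  return out;
}
// The partition is lost only when every one of its replicas has failed.
function isLost(replicas, failedNodes) {
  return replicas.every((n) => failedNodes.has(n));
}
const replicas = replicasFor(3, 3); // ["n3", "n4", "n0"]
isLost(replicas, new Set(["n3", "n4"])); // false: rf - 1 = 2 failures are tolerated
isLost(replicas, new Set(["n3", "n4", "n0"])); // true: all copies are gone</code></pre><p>With a replication factor of 3, any two nodes holding the partition can fail and the data is still available from the third copy.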

</p><h5>Master/Slave replication: </h5><p>
1- Allows the data on one node of the cluster (master) to be replicated to one or more other nodes of the cluster (slaves)
2- The master handles write/update operations, whereas the slaves handle read operations
3- The slaves can either wait for the master to send them updates (push modality) or poll the master to check for updates (pull modality)
4- Replication can be either synchronous (the changes are made on both master and slaves at the same time) or asynchronous (the changes are queued up and written later)
5- The master is a bottleneck since it is in charge of all insert/update operations; therefore, this approach is not suitable for applications with heavy write traffic
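</p><p>Here is a minimal in-memory sketch of master/slave replication with both synchronous and asynchronous propagation. The <code>Replica</code> and <code>Master</code> classes are hypothetical and only illustrate the flow: writes go to the master, and reads can be served by the slaves.</p><pre><code>// Minimal in-memory sketch; Replica and Master are hypothetical classes,
// not the API of a real database.
class Replica {
  constructor() {
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}
class Master extends Replica {
  constructor(slaves, synchronous) {
    super();
    this.slaves = slaves;
    this.synchronous = synchronous;
    this.pending = []; // changes queued for asynchronous replication
  }
  // Only the master accepts writes.
  write(key, value) {
    this.data.set(key, value);
    if (this.synchronous) {
      // Synchronous: master and slaves are updated together.
      for (const s of this.slaves) s.data.set(key, value);
    } else {
      this.pending.push([key, value]);
    }
  }
  // Asynchronous: queued changes are pushed to the slaves later.
  flush() {
    for (const [key, value] of this.pending) {
      for (const s of this.slaves) s.data.set(key, value);
    }
    this.pending = [];
  }
}
const slaves = [new Replica(), new Replica()];
const master = new Master(slaves, false);
master.write("x", 1);
slaves[0].read("x"); // undefined: the change is still queued
master.flush();
slaves[0].read("x"); // 1</code></pre><p>With <code>synchronous</code> set to <code>true</code>, <code>write</code> updates the slaves immediately, so a slave never serves a stale value in this sketch.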

</p><h5>P2P replication:</h5><p>
1- Each node can serve write/update requests and must propagate the changes to the other nodes holding a copy of the data
2- This approach removes the master bottleneck
3- System performance increases, but inconsistencies may occur more frequently when updates to copies on different nodes are propagated slowly
4- Two replicas of the same record stored on different nodes can be updated at the same time, generating a write-write conflict
5- The main approaches to detect and solve conflicts are:
a. Timestamp approach: a timestamp is associated with each modified replica, and the conflict is resolved by selecting the replica with the most recent timestamp (last write wins)
b. Vector-clock approach: a vector clock is a data structure used to determine the partial ordering of write operations and to detect causality relationships between events, which helps to resolve the conflicts
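</p><p>Both conflict-handling approaches can be sketched in a few lines. The version objects, clock format, and function names below are assumptions made for illustration. Approach (a) keeps the version with the highest timestamp. Approach (b) compares two vector clocks to tell causally ordered writes apart from concurrent ones.</p><pre><code>// (a) Timestamp approach, last write wins: keep the version with the most
// recent timestamp. Versions are hypothetical { value, ts } objects.
function lastWriteWins(a, b) {
  return a.ts >= b.ts ? a : b;
}
// (b) Vector clocks: one counter per node, e.g. { n1: 2, n2: 0 }.
// Returns "before", "after", "equal" or "concurrent"; "concurrent" means
// neither write saw the other, i.e. a write-write conflict.
function compare(vcA, vcB) {
  const nodes = new Set([...Object.keys(vcA), ...Object.keys(vcB)]);
  let aBehind = false;
  let bBehind = false;
  for (const n of nodes) {
    const a = vcA[n] || 0;
    const b = vcB[n] || 0;
    if (a < b) aBehind = true;
    if (b < a) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}
// After resolving a conflict, the new version carries the element-wise
// maximum of both clocks, so neither old version is concurrent with it.
function merge(vcA, vcB) {
  const out = { ...vcA };
  for (const [n, c] of Object.entries(vcB)) out[n] = Math.max(out[n] || 0, c);
  return out;
}
compare({ n1: 1 }, { n1: 2 }); // "before"
compare({ n1: 2, n2: 0 }, { n1: 1, n2: 1 }); // "concurrent"</code></pre><p>Only a <code>"concurrent"</code> result signals a real write-write conflict. When one clock is <code>"before"</code> the other, the later version can simply replace the earlier one. Last write wins, in contrast, always picks a winner and may silently discard a concurrent update.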

</p><h5>Eventual consistency:</h5><p>
1- Since NoSQL systems work on clusters of machines, network partitions can occur because of network failures or delays in communication between nodes
2- As a result, while a partition is in place, a system cannot offer both availability and consistency at the same time; this impossibility is expressed by the CAP theorem
3- Eventual consistency is a form of weak consistency that guarantees consistency after a certain amount of time: if no new updates are made, all replicas eventually converge to the same value
4- Eventual consistency is a good compromise between strongly consistent systems, which have significant latency when answering user requests, and weakly consistent systems, which provide fast answers that might not be accurate
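</p><p>Eventual consistency can be illustrated with two replicas that accept writes locally and reconcile later. This is a sketch under simplifying assumptions: a hypothetical <code>sync</code> step plays the role of background propagation and resolves conflicts with last write wins.</p><pre><code>// Two replicas of a key-value store, each holding { value, ts } versions.
// Writes are applied locally and propagated later by a hypothetical
// anti-entropy step, so reads in between may be stale.
function write(replica, key, value, ts) {
  replica.set(key, { value, ts });
}
// Anti-entropy: for every key, both replicas keep the newest version
// (last write wins). Once no new writes arrive and sync has run, they agree.
function sync(r1, r2) {
  for (const key of new Set([...r1.keys(), ...r2.keys()])) {
    const a = r1.get(key);
    const b = r2.get(key);
    const newest = !a ? b : !b ? a : a.ts >= b.ts ? a : b;
    r1.set(key, newest);
    r2.set(key, newest);
  }
}
const r1 = new Map();
const r2 = new Map();
write(r1, "cart", "book", 1);
r2.get("cart"); // undefined: a stale read on the second replica
sync(r1, r2);
r2.get("cart").value; // "book": the replicas have converged</code></pre><p>Between the write and the sync, the two replicas disagree. Once writes stop and sync has run, every read returns the same value.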

</p><h5>CAP:</h5><p>
1- Consistency: a lookup query always returns the current value associated with the search key, regardless of the node from which the request is issued
2- Availability: a lookup query always succeeds and returns a value associated with the search key, regardless of the freshness of the value
3- Partition tolerance: the previous two properties hold even when network failures prevent some machines from communicating with each other

</p><h5>Characteristics of NoSQL:</h5><p>
1- Non-relational: do not rely on a strict relational data model, do not require designing the schema in advance, and do not use the SQL language
2- Distributed: use clusters of machines that exchange messages and collaborate on data storage and query workload
3- Open source
4- Horizontally scalable: data partitioning and data replication
5- Weaker notion of consistency: do not fully support ACID properties
6- Schema-less, which makes the system more flexible

</p><h5>Map-reduce:</h5><p>
1- Suppose I have a very big file to process: I can split it into chunks (usually 64 MB each), and each chunk can be associated with a map function.
2- The map function is a simple function that associates a key (in our example, a word) with a value (in our example, the count), e.g. &lt;How, 1 1&gt;
3- The output of the mapper machines is sent to other machines, called reducers, where each reducer is responsible for a key
4- The output is a set of key-value pairs, where each key is a word and the value is the count of that word
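</p><p>The word-count example above can be sketched in a single process. The function names are hypothetical, words are assumed to be separated by single spaces, and the map, shuffle and reduce phases run one after another rather than on separate machines.</p><pre><code>// Single-process sketch of word count; in a real cluster each chunk would
// go to a different mapper and each key to a reducer.
function mapChunk(chunk) {
  // Emit a (word, 1) pair for every word in the chunk.
  return chunk
    .split(" ")
    .filter((w) => w !== "")
    .map((w) => [w, 1]);
}
// Shuffle: group the values of all mappers by key, e.g. "How" -> [1, 1].
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}
// Reduce: called once per key, sums the counts.
function reduceKey(key, values) {
  return [key, values.reduce((sum, v) => sum + v, 0)];
}
function wordCount(chunks) {
  const groups = shuffle(chunks.flatMap((c) => mapChunk(c)));
  return new Map([...groups].map(([key, values]) => reduceKey(key, values)));
}
const counts = wordCount(["How are you", "How old are you"]);
counts.get("How"); // 2</code></pre><p>The result maps How, are and you to 2 and old to 1.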

</p><h5>Characteristics of NoSQL:</h5><p>
1- Schema-less: they usually do not provide a fixed schema, or do not force you to define a schema in advance. This makes them flexible.
2- They do not use the SQL language and do not support join operations, since joins are very costly.
3- To provide horizontal sharding, we can use:
a. Key-value-oriented sharding approach: partitioning the data into chunks and accessing the data through the chunk, e.g. with the range-based approach, simple hashing, or consistent hashing
b. Traversal-oriented sharding approach: used for graphs. Whenever I have to shard a graph, the sharding approach tries to take into account the traversals among nodes that belong to the graph, and this is an NP-hard problem. Very few NoSQL systems allow you to consider traversal-oriented sharding
4- Data sharding can be horizontal (typical in MongoDB) or vertical (typical in Cassandra, a column-family store)
5- Data replication: replicate the same chunk of data on different machines to support parallel access to the data and make the system more efficient.
a. Master/slave approach: all updates occur on the master, which is in charge of propagating the modifications to the slaves, while the slaves serve read operations. It is good when you have a high number of read operations
b. P2P approach: all nodes in the cluster can execute read/write operations, but conflicts may occur, so you have to provide approaches to detect and solve them:
i. Timestamp approach
ii. Vector-clock approach
6- Supports eventual consistency: the system is not able to guarantee that an access from any node will return the right (most recent) answer (see the CAP theorem).
7- Map-reduce
</p>
`;