Parallelizing Assignment Problem with DNA Strands

Document Type: Research Paper

Authors

1 Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.

2 Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Abstract

Background:Many problems of combinatorial optimization, which are solvable only in exponential time, are known to be Non-Deterministic Polynomial hard (NP-hard). With the advent of parallel machines, new opportunities have been emerged to develop the effective solutions for NP-hard problems. However, solving these problems in polynomial time needs massive parallel machines and is not applicable up to now.
Objectives: DNA (Deoxyribonucleic acid) computing provides a fantastic method to solve NP-hard problems in polynomial time. Accordingly, one of the famous NP-hard problems is assignment problem, which is designed to find the best assignment of n jobs to n persons in a way that it could maximize the profit or minimize the cost.
Material and Methods: Applying bio molecular operations of Adelman Lipton model, a novel parallel DNA algorithm have been proposed for solving the assignment problem.
Results: The proposed algorithm can solve the problem in  time complexity, and just O(n) initial DNA strand in comparison with  initial sequence, which is used by the other methods.
Conclusions: In this article, using DNA computing, we proposed a parallel DNA algorithm to solve the assignment problem in linear time.

Keywords

Main Subjects


1. Background

Deoxyribonucleic acid (DNA) is a molecule that carries most of the genetic instructions used in the development, functioning, and reproduction of all known living organisms and many viruses. Just one gram of DNA can hold about 680 peta bytes (6.8* 1017) equal to one billion ( 1 ) Compact Disc (CD), which take 163,000 centuries to be listened.

Considering parallelism aspect, it also is capable of parallelizing 3*1014 molecules at a time ( 2 ), and this enormous parallelism persuades the researchers to build a model by the use of DNA strands to solve NP-hard problems ( 3 ). Considering the similarities in biological and mathematical operations, and the facts that DNA is stable and also predictable in its reactions, this macromolecule can be used to encode information for mathematical systems.

In 1994, Adelman ( 4 ) designed the first model of molecular computation by solving a seven-node instances of Hamiltonian path problem, and then showed the enormous parallel power of DNA computation. In 1995, Lipton ( 5 ) solved satisfiability problem by Adelman’s method and designed a new method called Adelman-Lipton model, which was consisted of three levels as follows:

a) Producing specific sequences and putting them in a test tube.

b) Performing a number of operations on the DNA sequences inside the tube and removing unwanted sequences, which certainly cannot be considered as the final answer of the problem.

c) Reading and reporting the DNA sequences, if any exists, in the final tube as final answer.

Ouyang ( 6 ) solved Maximal clique problem in 1997 by designing Restriction enzyme model. Roweis designed sticker model ( 7 ) in 1998. The Self-assembly model was designed by Winfree in 1998 ( 8 ). The Surface based model ( 9 ) was designed by Smith in 1998. And finally, Sakomoto proposed the Hairpin model in 2000 ( 10 ).

As the other applications of DNA computing we can point to Zhang's image encryption algorithm ( 11 ) or Patel multiple image encryption ( 12 ) which is based on DNA computing or novel data encryption scheme which is proposed by Patro ( 13 ) by using DNA computing.

Many efforts have been done by researchers to solve NP-hard problems via these models such as knapsack ( 14 ), n-queen ( 15 ), binpacking ( 16 ), maximal clique problem ( 17 ), graph vertex coloring problem ( 18 ), maximal matching problem ( 19 ), maximal connected subgraph problem ( 20 ), maximum complete bipartite subgraph ( 21 ), maximum independent set problem ( 22 ), maximum matching problem ( 23 ), maximum k-colourable subgraph ( 24 ), hamiltonian path ( 25 , 26 ), minimum k-suppliers ( 27 ).

2. Objectives

Assignment is another famous NP-hard problem ( 28 ), which has been solved in different versions by researchers as Wang ( 29 ) who solved the unbalanced version of assignment, or Yang ( 30 ) who solved the quadratic version of assignment.

Based on Adleman-Lipton model, we proposed a parallel DNA algorithm with O(n2) time complexity for assignment problem, and O(n) initial DNA strand in comparison with Onn initial DNA strands of similar solutions such as that was proposed by Wang et al ( 31 ).

3. Materials and Methods

In Assignment problem, we have n jobs, which can be performed by n people. Each of the people will do each one of the jobs with a specific wage. The objective of this study was to assign each job to one person in such a way that the total wages become minimized.

As an example, consider four teachers (T1-T4) from four different cities (CT1-CT4) and also four universities (U1-U4) in four different other cities (CU1-CU4).

Table 1 shows the ticket price for each flight from cities CT1-CT4 to cities CU1-CU4. The goal is to send each teacher to one of the universities in a way that the total price of purchased ticket becomes minimized.

City CT1 CT1 CT1 CT1
CU1 40 140 90 180
CU2 120 20 70 110
CU3 80 25 25 115
CU4 80 180 130 140
Table 1. Ticket prices for the example described in the text.

We can represent Table 1 as a cost matrix:

[ 40 140 90 180 120 20 70 110 80 25 25 115 80 180 130 40 ]

One possible assignment is as follow:

[ 40 140 90 180 120 20 70 110 80 25 25 115 80 180 130 40 ]

Where selected flights highlighted as yellow. The total cost of this assignment is: 90$+110$+25$+80$=305$. But the best possible assignment is:

[ 40 140 90 180 120 20 70 110 80 25 25 115 80 180 130 40 ]

Which will cost 40$+20$+25$+140$=225$ and means to send T1 to U1, T2 to U2, T3 to U3, and T4 to U4.

3.1. Adelman-Lipton Model:

Different operations of Adelman-Lipton model:

  • Merge (T1, T2): The content of tube T1 will be merged with T2 and will also be moved into T1.
  • Copy (T1, T2): The content of T1 will be copied into the empty tube T2.
  • Detect (T): Determining whether any sequence exists in tube T or not.
  • Extract (T1, X, T2): The subset of T1, which contains the X sequence will be transferred into T2.
  • Select (T1, X, T2): The subset of T1 containing sequences longer than X will be transferred into T2.
  • Selecting (T1, L, T2): It will select sequences of size L from tube T1 and put them in tube T2.
  • SelectMax (T1, T2): The longest sequence in tube T1 will be chosen and moved into tube T2.
  • SelectMin (T1, T2): The shortest sequence in tube T1 will be chosen and moved into tube T2.
  • Annealing (T): ssDNA will be converted to dsDNA by freezing the tube.
  • Denaturing (T): dsDNA will be converted to ssDNA by freezing the tube.
  • Discard (T): The content of tube T will be eliminated.
  • PCR (T, T1): A lot of copies of DNA strands will be made and placed in tube T1.
  • Append Tail (T, Z): Sequence Z will be appended to the end of all sequences in tube T.
  • Read (T): A sequence of tube T will be read.

3.2. DNA Algorithm for Assignment Problem

First of all, all nn combinations of people and jobs will be generated as different DNA strands. Then, the strands having all people will be separated and the others will be discarded. After that, among remained stands, those that have all jobs will be separated and the others will be discarded. The shortest sequences will be regarded as the final solutions.

The proposed algorithm consists of six steps:

  • Building 2n unique10-mer sequences (sequence with length 10) for each of n people and n jobs.
  • Building all n2 possible combinations of jobs with people and then adding Ci-mer sequences of G nucleotides (Ci is the cost or profit of performing Ji by Pi).
  • Construct all nn possible combinations of choosing n jobs by n people.
  • Eliminate all sequences which do not consist of all people.
  • Among remaining sequences of previous step, keep only that were consisted of all jobs.
  • Consider the shortest sequence as the optimal solution.

3.3. Strand Design:

First of all, for each person (Pi) and each job (Ji) of assignment problem produce a distinct 10mer sequence. We can produce 59049 different 10mer sequences using 3 nucleotides a, t, c. Among them, we should select the sequences, which have the most difference with each other and then put these 2n sequences in 2n tube and perform the Polymerase Chain Reaction (PCR) on them. PCR, developed in 1983 by Kary Mullis ( 32 ), is a molecular biology technology, which generates a lot of copies from a single piece of DNA sequence. We have to generate n2 sequences by dot product of n jobs to n person’s sequences.

Finally, equal to the cost of assigning a person to a job, we should add Ci-mer sequences of G nucleotides to the related sequences. For making the length of sequences shorter according to the costs, it is easily possible to make a rule. Such as subtracting the minimum cost can form the cost belonged to all people.

As an example, in our problem, the minimum ticket price was 20$ and the costs were differed at least 5$ with each other. So, we subtracted 20 from all ticket prices and then added a G for each5$. As you can see in Table 2 for the four teachers, we used four different 10mer, and for the four universities, we used four others different 10mer.

DNA Sequence DNA Sequence
T1 5' TCTATAACTA 3' U1 5' AACTACATTT 3'
T2 5' TAACACTATT 3' U2 5' CATCATTTAC 3'
T3 5' ACTAATCTCT 3' U3 U35' ATTACTCCTA 3'
T4 5' CACACATCTA 3' U4 5' TACCTCAACT 3'
Table 2. Sequences chosen to represent the elements of the assignment problem, which has been described in the Table 1.

Then, as it is shown in Table 3, we should combine all possible ways and then add one G for each 5$ to all of the sequences and also remove 4G from all of them because of subtracting minimum.

Length DNA Sequence
T1U1 40 5' TCTATAACTAAACTACATTTGGGGGGGGGGGGGGGGGGGG 3'
T1U2 32 5' TCTATAACTACATCATTTACGGGGGGGGGGGG 3'
T1U3 24 5' TCTATAACTAATTACTCCTAGGGG 3'
T1U4 32 5' TCTATAACTATACCTCAACTGGGGGGGGGGGG 3'
T2U1 20 5' TAACACTATTAACTACATTT 3'
T2U2 52 5' TAACACTATTCATCATTTACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG 3'
T2U3 44 5' TAACACTATTATTACTCCTAGGGGGGGGGGGGGGGGGGGGGGGG 3'
T2U4 21 5' TAACACTATTTACCTCAACTG 3'
T3U1 30 5' ACTAATCTCTAACTACATTTGGGGGGGGGG 3'
T3U2 42 5' ACTAATCTCTCATCATTTACGGGGGGGGGGGGGGGGGGGGGG 3'
T3U3 34 5' ACTAATCTCTATTACTCCTAGGGGGGGGGGGGGG 3'
T3U3 21 5' ACTAATCTCTTACCTCAACTG 3'
T4U1 38 5' CACACATCTAAACTACATTTGGGGGGGGGGGGGGGGGG 3'
T4U2 44 5' CACACATCTACATCATTTACGGGGGGGGGGGGGGGGGGGGGGGG 3'
T4U3 52 5' CACACATCTAATTACTCCTAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG 3'
T4U4 39 5' CACACATCTATACCTCAACTGGGGGGGGGGGGGGGGGGG 3'
Table 3. Sequences chosen to represent the elements of assignment problem.

3.4. Sample Space:

Put each of the related strands to the first person in a separated tube.

  • For (i in 1 to n)
    1. ° Merge (T, Ti)
    2. ° Discard (Ti)
  • For (i in 2 to n)
    1. ° For (j in 1 to n)
      1. ▪ PCR (T, Tj)
      2. ▪ AppendTail (Tj, Pi Jj)
      3. ▪ Merge (Temp, Tj)
      4. ▪ Discard (Tj)
    2. ° Discard (T)
    3. ° Copy (Temp, T)
    4. ° Discard (Temp)

The trace of the top algorithm for three people and three job is shown as an example in Table 4.

Eliminate all sequences, which do not consist of all people.

Round Tube T1 Tube T2 Tube T3 Tube T
i=1 P1J1 P1J2 P1J3 P1J1,P1J2,P1J3
i=2, j=1 P1J1P2J1,P1J2 P2J1,P1J3 P2J1 P1J1,P1J2,P1J3
i=2, j=2 P1J1P2J2,P1J2 P2J2, P1J3 P2J2 P1J1,P1J2,P1J3
i=2, j=3 P1J1P2J3,P1J2P2J3,P1J3P2J3 P1J1,P1J2,P1J3
i=3, j=1 P1J1P2J1P3J1,P1 J2P2J1P3J1, P1J3P2J1P3J1,P1J1P2J2P3J1, P1J2P2J2P3J1,P1J3P2J2P3J1, P1J1P2J3P3J1,P1J2P2J3P3J1, P1J3P2J3P3J1 P1J1P2J1,P1 J2P2J1, P1J3P2J1,P1J1P2J2, P1J2P2J2,P1J3P2J2, P1J1P2J3,P1J2P2J3, P1J3P2J3
i=3, j=3 P1J1P2J1P3J3,P1 J2P2J1P3J3, P1J3P2J1P3J3,P1J1P2J2P3J3, P1J2P2J2P3J3,P1J3P2J2P3J3, P1J1P2J3P3J3,P1J2P2J3P3J3, P1J3P2J3P3J3 P1J1P2J1,P1 J2P2J1, P1J3P2J1,P1J1P2J2, P1J2P2J2,P1J3P2J2, P1J1P2J3,P1J2P2J3, P1J3P2J3
Final P1J1P2J1P3J1,P1J2P2J1P3J1,P1J3P2J1P3J1,P1J1P2J2P3J1,P1J2P2J2P3J1,P1J3P2J2P3J1,P1J1P2J3P3J1, P1J2P2J3P3J1,P1J3P2J3P3J1P1J1P2J1P3J2,P1J2P2J1P3J2,P1J3P2J1P3J2,P1J1P2J2P3J2,P1J2P2J2P3J2, P1J3P2J2P3J2,P1J1P2J3P3J2,P1J2P2J3P3J2,P1J3P2J3P3J2,P1J1P2J1P3J3,P1J2P2J1P3J3,P1J3P2J1P3J3, P1J1P2J2P3J3,P1J2P2J2P3J3,P1J3P2J2P3J3,P1J1P2J3P3J3,P1J2P2J3P3J3,P1J3P2J3P3J3
Table 4. Trace for three people and three jobs
  • For (i in 1 to n)
    1. °Extract (T, Pi, Temp)
    2. °Discard(T)
    3. °Copy (Temp, T)
    4. °Discard (Temp)

Eliminate all sequences, which do not consist of all jobs.

  • For (i in 1 to n)
    1. ° Extract (T, Jobi, Temp)
    2. ° Discard(T)
    3. ° Copy (Temp, T)
    4. ° Discard (Temp)

Final Answer

  • SelectMin (T, Temp)
  • Read (Temp)

4. Results

4.1. The solutions of assignment problem with n jobs and individuals can be obtained by above mentioned DNA molecules operations.

Proof. At first, all combinations of the n job assignments were generated in the sample space. Then, the sequences containing all people were kept and this guaranteed that none of the people are jobless. And finally, among the remaining sequences, the sequences containing all the jobs were kept and this guaranteed that all of the jobs will be done at the end. Therefore, for sure, all possible assignments are in final tube, and as the cost of each assignment was attached to each of them in the strain design phase, so the strand with minimum length among the strands of final tube would be the minimum cost assigning solution.

4.2. The time complexity of the proposed algorithm for solving assignment problem with n jobs and individuals is n2.

Proof. As the complexity of every biological operation is O(1) ( 33 ), the time complexity of algorithm can be calculated by adding the time complexity of all steps as follows:

T (Sample space): O(n) for the first loop and On2 for the nested loop.

T (Persons): Four biological operations in each step, which become O(1), and in n step it would be O(n).

T (Jobs): Four biological operations in each step, which become O(1), and in n step it would be O(n).

T (Last step): Two biological operations, which become O(1).

So, the time complexity of the algorithm will be arrived by equation 1 as follows:

T(n) = O(n)+O(n2)+O(n)+O(n)+O(1)= O(n2)

5. Discussion

In the proposed method, by using adenine, thymine, and cytosine; we produced distinct 10mer sequences equal to the number of people and jobs of assignment problem. Then, by using polymerase chain reaction on them, we produced n2 sequences equal to the dot product of n persons to n jobs. Finally, by adding guanine equal to the cost of assigning a job to a person to the related sequence, we got the desired sequence.

Wang et al.( 31 ), Kang et al.( 34 ), Shu et al.( 35 ), and Ebrahim et al.( 36 ), Tsaftaris et al.( 37 ), Rashid et al. ( 38 ) all used the same model to solve the assignment problem using nn distinct sequence with the time complexity of O(n2). The proposed algorithm reduced the initial cost to a great extent by using just 2n initial sequences in comparison with nn initial sequences, which is used by the other methods and solve the problem with the same time complexity.

6. Conclusion

Ultra-efficient parallelism of the methods of DNA computing in contrast with the obvious limitations in storage, speed, intelligence, and miniaturization of electronic computers is considerable. Although there are still some problems that need a further study in biologic technology, it is still possible to solve a lot of NP-hard problems in linear time via DNA computing. In this article, we highlighted a DNA computing model with biological operations based on Adelman-Lipton model to solve Assignment problem with the time complexity of O(n2) with just buying 2n 10mer sequences and making the other needed sequences in our lab in comparison with the previous methods which need a lot of different sequences.

We hope that, in near future, molecular computer become usable instead of electronic computers, which can cause DNA computing solutions become applicable.

References

  1. Erlich Y, Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science. 2017; 355(6328):950-954. DOI
  2. Amos M, Păun G, Rozenberg G, Salomaa A. Topics in the theory of DNA computing. TCS. 2002; 287(1):3-38. DOI
  3. Knuth DE. Postscript about NP-hard problems. ACM SIGACT News. 1974; 6(2):15-16. DOI
  4. Adleman LM. Molecular computation of solutions to combinatorial problems. Science. 1994;1021-1024.
  5. Lipton RJ. DNA solution of hard computational problems. Science. 1995; 268(5210):542-545. DOI
  6. Ouyang Q, Kaplan PD, Liu S, Libchaber A. DNA solution of the maximal clique problem. Science. 1997; 278(5337):446-449. DOI
  7. Roweis S, Winfree E, Burgoyne R, Chelyapov NV, Goodman MF, Rothemund PW, et al. A sticker-based model for DNA computation. JCB. 1998; 5(4):615-629. DOI
  8. Winfree E, Liu F, Wenzler LA, Seeman NC. Design and self-assembly of two-dimensional DNA crystals. Nature. 1998; 394(6693):539. DOI
  9. Smith LM, Corn RM, Condon AE, Lagally MG, Frutos AG, Liu Q, et al. A surface-based approach to DNA computation. JCB. 1998; 5(2):255-267.
  10. Sakamoto K, Gouzu H, Komiya K, Kiga D, Yokoyama S, Yokomori T, et al. Molecular computation by DNA hairpin formation. Science. 2000; 288(5469):1223-1226. DOI
  11. Zhang Y. The image encryption algorithm based on chaos and DNA computing. MTA. 2018; 77(16):21589-21615. DOI
  12. Patel A, Parikh M. Multiple Image Encryption Using Chaotic Map And DNA Computing. 2018.
  13. Patro KAK, Acharya B. Novel data encryption scheme using DNA computing. Advances of DNA Computing in Cryptography. CRC. p.69-110.DOI
  14. Darehmiraki M, Nehi HM. Molecular solution to the 0–1 knapsack problem based on DNA computing. AMC. 2007; 187(2):1033-1037. DOI
  15. Wang Z, Huang D, Tan J, Liu T, Zhao K, Li L. A parallel algorithm for solving the n-queens problem based on inspired computational model. BS. 2015; 131:22-29. DOI
  16. Sanches CAA, Soma NY. A polynomial-time DNA computing solution for the Bin-Packing Problem. AMC. 2009; 215(6):2055-2062. DOI
  17. Lingjing S, Zhichao S, Liuqing W, Yafei D. A Molecular Computing Model for Maximum Clique Problem Based on DNA Nanoparticles. JCTN. 2014; 11(10):2120-2124. DOI
  18. Xu J, Qiang X, Zhang K, Zhang C, Yang J. A DNA computing model for the graph vertex coloring problem based on a probe graph. Engineering. 2018; 4(1):61-77. DOI
  19. Wang Z, Ji Z, Su Z, Wang X, Zhao K. Solving the maximal matching problem with DNA molecules in Adleman–Lipton model. IJB. 2016; 9(02):1650019. DOI
  20. Guo N, Pu J, Wang Z, Huang D, Li L, Zhao K. A New Algorithm to Solve the Maximal Connected Subgraph Problem Based on Parallel Molecular Computing. JCTN. 2016; 13(10):7692-7695. DOI
  21. Yin C, Pu J, Wang Z, Wang X. Solving the Maximum Complete Bipartite Subgraph Problem Based on Molecule Parallel Supercomputing. JCTN. 2016; 13(1):314-318.
  22. CHEN J, NING A, ZHI Z, HU L, ZHANG H. Exact algorithm for maximum independent set problem in graph theory. CEA. 2016; 1:5.
  23. Yang J, Yin Z, Huang K, Cui J. The Maximum Matching Problem Based on Self-Assembly Model of Molecular Beacon. NNL. 2018; 10(2):213-218. DOI
  24. Moazzam A, Dalvand B. Molecular solutions for the Maximum K-colourable Sub graph Problem in Adleman-Lipton model. arXiv preprint arXiv:161007294. 2016.
  25. A new bionic method inspired by DNA computation to solve the hamiltonian path problem. ICIA. 2017: IEEE.DOI
  26. Mahasinghe A, Hua R, Dinneen MJ, Goyal R. Solving the Hamiltonian cycle problem using a quantum computer. Computing. 2019; 29:40. DOI
  27. Solving Minimum k-supplier in Adleman-Lipton model. CSCI. 2017: IEEE.DOI
  28. König D. Über graphen und ihre anwendung auf determinantentheorie und mengenlehre. MA. 1916; 77(4):453-465. DOI
  29. Wang Z, Pu J, Cao L, Tan J. A parallel biological optimization algorithm to solve the unbalanced assignment problem based on DNA molecular computing. IJMS. 2015; 16(10):25338-25352.
  30. Yang X, Lu Q, Li C, Liao X. Biological computation of the solution to the quadratic assignment problem. AMC. 2008; 200(1):369-377. DOI
  31. Wang Z, Tan J, Huang D, Ren Y, Ji Z. A biological algorithm to solve the assignment problem based on DNA molecules computation. AMC. 2014; 244:183-190. DOI
  32. Mullis K, Erlich H, Arnheim N, Horn G, Saiki R, Scharf S. One of the first Polymerase Chain Reaction (PCR) patents. Google Patents. 1987.
  33. Chang W-L, Ren T-T, Luo J, Feng M, Guo M, Lin KW. Quantum algorithms for biomolecular solutions of the satisfiability problem on a quantum machine. TNB. 2008; 7(3):215-222. DOI
  34. Kang Z, Tong X, Jin X. Algorithm of DNA computing on optimal assignment problems. JSEE. 2007; 29:1183-1187.
  35. Shu J-J, Wang Q-W, Yong K-Y. DNA-based computing of strategic assignment problems. PRL. 2011; 106(18):188702.
  36. Solving unconstraint assignment problem by a molecular-based computing algorithm. ISIE. 2004: IEEE.DOI
  37. A DNA computing approach to solve Task Assignment problem in Real Time Distributed computing System. MMC. 2007: Citeseer.
  38. Ibrahim GJ, Rashid TA, Sadiq AT. Evolutionary DNA Computing Algorithm for Job Scheduling Problem. IETEJS. 2018; 64(4):514-527. DOI